Bartlett, P.L., Long, P.M., Lugosi, G. and Tsigler, A. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 2020.
Abstract: The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. By studying examples of data covariance properties that this characterization shows are required for benign overfitting, we find an important role for finite-dimensional data: the accuracy of the minimum norm interpolating prediction rule approaches the best possible accuracy for a much narrower range of properties of the data distribution when the data lie in an infinite-dimensional space vs. when the data lie in a finite-dimensional space with dimension that grows faster than the sample size.
BibTeX:
@article{Bartlett2020,
  author = {Bartlett, P. L. and Long, P. M. and Lugosi, G. and Tsigler, A.},
  title = {Benign overfitting in linear regression},
  journal = {Proceedings of the National Academy of Sciences},
  publisher = {National Academy of Sciences},
  year = {2020},
  url = {https://www.pnas.org/content/early/2020/04/22/1907378117},
  doi = {https://doi.org/10.1073/pnas.1907378117}
}
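A minimal NumPy sketch of the minimum norm interpolating prediction rule discussed in the abstract above, under assumptions of my own choosing (the sample size, ambient dimension, covariance spectrum and noise level below are illustrative, not taken from the paper):
import numpy as np

# Minimum-norm interpolation in an overparameterized linear regression (illustrative setup).
rng = np.random.default_rng(0)
n, d = 100, 2000                                   # sample size << ambient dimension
eigs = 1.0 / (1.0 + np.arange(d))                  # a slowly decaying covariance spectrum
X = rng.standard_normal((n, d)) * np.sqrt(eigs)    # rows ~ N(0, diag(eigs))
w_star = np.zeros(d); w_star[0] = 1.0
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum-norm interpolator: the least-norm solution of X w = y.
w_hat = np.linalg.pinv(X) @ y

print("train residual:", np.linalg.norm(X @ w_hat - y))    # ~ 0: a perfect fit to noisy data
X_test = rng.standard_normal((10_000, d)) * np.sqrt(eigs)
y_test = X_test @ w_star
print("test MSE:", np.mean((X_test @ w_hat - y_test) ** 2))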
Bekkers, E.J. B-Spline CNNs on Lie groups. International Conference on Learning Representations, 2020.
BibTeX:
@inproceedings{Bekkers2020,
  author = {E. J. Bekkers},
  title = {B-Spline CNNs on Lie groups},
  booktitle = {International Conference on Learning Representations},
  year = {2020},
  url = {https://openreview.net/forum?id=H1gBhkBFDH}
}
Belkin, M., Ma, S. and Mandal, S. To Understand Deep Learning We Need to Understand Kernel Learning. Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 541-549, 2018.
Abstract: Generalization performance of classifiers in deep learning has recently become a subject of intense study. Deep models, which are typically heavily over-parametrized, tend to fit the training data exactly. Despite this ``overfitting'', they perform well on test data, a phenomenon not yet fully understood. The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using six real-world and two synthetic datasets, we establish experimentally that kernel machines trained to have zero classification error or near zero regression error (interpolation) perform very well on test data. We proceed to give a lower bound on the norm of zero loss solutions for smooth kernels, showing that they increase nearly exponentially with data size. None of the existing bounds produce non-trivial results for interpolating solutions. We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels, a finding that parallels results recently reported for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. Similar performance of overfitted Laplacian and Gaussian classifiers on test data suggests that generalization is tied to the properties of the kernel function rather than the optimization process. Some key phenomena of deep learning are manifested similarly in kernel methods in the modern ``overfitted'' regime. The combination of the experimental and theoretical results presented in this paper indicates a need for new theoretical ideas for understanding properties of classical kernel methods. We argue that progress on understanding deep learning will be difficult until more tractable ``shallow'' kernel methods are better understood.
BibTeX:
@inproceedings{Belkin2018a,
  author = {Belkin, M. and Ma, S. and Mandal, S.},
  title = {To Understand Deep Learning We Need to Understand Kernel Learning},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  publisher = {PMLR},
  year = {2018},
  volume = {80},
  pages = {541--549},
  url = {https://proceedings.mlr.press/v80/belkin18a.html}
}
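A sketch of the kind of interpolating kernel machine the abstract above describes: a Laplacian-kernel regressor fit with essentially no regularization, so it interpolates the training data. The data, bandwidth, and jitter value are illustrative choices, not the paper's experimental setup:
import numpy as np

# Interpolating ("ridgeless") kernel regression with a Laplacian kernel (illustrative data).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)      # noisy targets

def laplacian_kernel(A, B, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||_1)
    dists = np.abs(A[:, None, :] - B[None, :, :]).sum(-1)
    return np.exp(-gamma * dists)

K = laplacian_kernel(X, X)
alpha = np.linalg.solve(K + 1e-10 * np.eye(n), y)        # tiny jitter only: near-interpolation

print("max train error:", np.abs(K @ alpha - y).max())   # near zero: the fit interpolates
X_test = rng.standard_normal((1000, d))
y_hat = laplacian_kernel(X_test, X) @ alpha              # predictions on fresh inputs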
Belkin, M., Rakhlin, A. and Tsybakov, A.B. Does data interpolation contradict statistical optimality? Proceedings of Machine Learning Research, Vol. 89, pp. 1611-1619, 2019.
Abstract: We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.
BibTeX:
@inproceedings{Belkin2019,
  author = {Belkin, M. and Rakhlin, A. and Tsybakov, A. B.},
  title = {Does data interpolation contradict statistical optimality?},
  booktitle = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  year = {2019},
  volume = {89},
  pages = {1611--1619},
  url = {http://proceedings.mlr.press/v89/belkin19a.html}
}
Charalampopoulos, A.-T.G. and Sapsis, T.P. Machine-learning energy-preserving nonlocal closures for turbulent fluid flows and inertial tracers. 2021.
BibTeX:
@misc{Charalampopoulos2021,
  author = {A.-T. G. Charalampopoulos and T. P. Sapsis},
  title = {Machine-learning energy-preserving nonlocal closures for turbulent fluid flows and inertial tracers},
  year = {2021}
}
Chen, Z., Zhang, J., Arjovsky, M. and Bottou, L. Symplectic Recurrent Neural Networks. 2019.
Abstract: We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. An SRNN models the Hamiltonian function of the system by a neural network and furthermore leverages symplectic integration, multiple-step training and initial state optimization to address the challenging numerical issues associated with Hamiltonian systems. We show SRNNs succeed reliably on complex and noisy Hamiltonian systems. We also show how to augment the SRNN integration scheme in order to handle stiff dynamical systems such as bouncing billiards.
BibTeX:
@article{Chen2019,
  author = {Chen, Z. and Zhang, J. and Arjovsky, M. and Bottou, L.},
  title = {Symplectic Recurrent Neural Networks},
  year = {2019},
  url = {http://arxiv.org/abs/1909.13334}
}
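A minimal PyTorch sketch of the central idea in the abstract above: the Hamiltonian is modeled by neural networks and the dynamics are advanced with a symplectic (leapfrog) integrator. A separable form H(q, p) = T(p) + V(q) is assumed here, and the network sizes and step size are illustrative; this is not the authors' exact SRNN architecture or training scheme:
import torch

# Neural Hamiltonian H(q, p) = T(p) + V(q), integrated with a symplectic leapfrog step.
T_net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
V_net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def grad(net, x):
    # Gradient of a scalar network output with respect to its input (kept differentiable).
    if not x.requires_grad:
        x = x.requires_grad_(True)
    g, = torch.autograd.grad(net(x).sum(), x, create_graph=True)
    return g

def leapfrog_step(q, p, dt=0.05):
    # Hamilton's equations: dq/dt = dH/dp = dT/dp,  dp/dt = -dH/dq = -dV/dq
    p_half = p - 0.5 * dt * grad(V_net, q)
    q_next = q + dt * grad(T_net, p_half)
    p_next = p_half - 0.5 * dt * grad(V_net, q_next)
    return q_next, p_next

# Unroll a short trajectory; a training loop would compare unrolled states to observed ones.
q, p = torch.tensor([[1.0]]), torch.tensor([[0.0]])
for _ in range(10):
    q, p = leapfrog_step(q, p)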
Erichson, N.B., Muehlebach, M. and Mahoney, M.W. Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction. 2019.
Abstract: In addition to providing high-profile successes in computer vision and natural language processing, neural networks also provide an emerging set of techniques for scientific problems. Such data-driven models, however, typically ignore physical insights from the scientific system under consideration. Among other things, a physics-informed model formulation should encode some degree of stability or robustness or well-conditioning (in that a small change of the input will not lead to drastic changes in the output), characteristic of the underlying scientific problem. We investigate whether it is possible to include physics-informed prior knowledge for improving the model quality (e.g., generalization performance, sensitivity to parameter tuning, or robustness in the presence of noisy data). To that extent, we focus on the stability of an equilibrium, one of the most basic properties a dynamic system can have, via the lens of Lyapunov analysis. For the prototypical problem of fluid flow prediction, we show that models preserving Lyapunov stability improve the generalization error and reduce the prediction uncertainty.
BibTeX:
@article{Erichson2019,
  author = {Erichson, N. B. and Muehlebach, M. and Mahoney, M. W.},
  title = {Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction},
  year = {2019},
  url = {http://arxiv.org/abs/1905.10866}
}
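One simple way to encode the kind of stability prior the abstract above discusses is sketched below: an autoencoder with a linear latent dynamics operator plus a penalty that pushes the operator's spectral norm below one, so the learned equilibrium is stable. This is an illustration under my own assumptions (state dimension, latent size, penalty form), not the authors' exact parameterization:
import torch

# Autoencoder with linear latent dynamics and a stability-promoting spectral-norm penalty.
latent = 8
enc = torch.nn.Linear(64, latent)                        # encoder for a 64-dimensional state
dec = torch.nn.Linear(latent, 64)                        # decoder back to state space
Omega = torch.nn.Parameter(0.1 * torch.randn(latent, latent))   # latent dynamics operator

def loss(x_t, x_next, lam=1.0):
    z_t = enc(x_t)
    pred = dec(z_t @ Omega.T)                            # one-step prediction in latent space
    recon = dec(z_t)                                     # plain reconstruction
    stability = torch.relu(torch.linalg.matrix_norm(Omega, ord=2) - 0.99) ** 2
    return ((pred - x_next) ** 2).mean() + ((recon - x_t) ** 2).mean() + lam * stability

opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()) + [Omega], lr=1e-3)
# Each training step: opt.zero_grad(); loss(x_t, x_next).backward(); opt.step()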
Jacot, A., Gabriel, F. and Hongler, C. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems, Vol. 31, pp. 8571-8580, 2018.
BibTeX:
@inproceedings{Jacot2018,
  author = {Jacot, A. and Gabriel, F. and Hongler, C.},
  title = {Neural Tangent Kernel: Convergence and Generalization in Neural Networks},
  booktitle = {Advances in Neural Information Processing Systems},
  publisher = {Curran Associates, Inc.},
  year = {2018},
  volume = {31},
  pages = {8571--8580},
  url = {https://proceedings.neurips.cc/paper/2018/file/5a4be1fa34e62bb8a6ec6b91d2462f5a-Paper.pdf}
}
Lagaris, I.E., Likas, A. and Fotiadis, D.I. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, Vol. 9(5), pp. 987-1000, 1998.
Abstract: We present a method to solve initial and boundary value problems using artificial neural networks. A trial solution of the differential equation is written as a sum of two parts. The first part satisfies the initial/boundary conditions and contains no adjustable parameters. The second part is constructed so as not to affect the initial/boundary conditions. This part involves a feedforward neural network containing adjustable parameters (the weights). Hence by construction the initial/boundary conditions are satisfied and the network is trained to satisfy the differential equation. The applicability of this approach ranges from single ordinary differential equations (ODEs), to systems of coupled ODEs and also to partial differential equations (PDEs). In this article, we illustrate the method by solving a variety of model problems and present comparisons with solutions obtained using the Galerkin finite element method for several cases of partial differential equations. With the advent of neuroprocessors and digital signal processors the method becomes particularly interesting due to the expected essential gains in the execution speed.
BibTeX:
@article{Lagaris1998,
  author = {I.E. Lagaris and A. Likas and D.I. Fotiadis},
  title = {Artificial neural networks for solving ordinary and partial differential equations},
  journal = {IEEE Transactions on Neural Networks},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  year = {1998},
  volume = {9},
  number = {5},
  pages = {987--1000},
  doi = {https://doi.org/10.1109/72.712178}
}
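The trial-solution construction described in the abstract above can be sketched for a single ODE: for dy/dx = -y with y(0) = 1, take psi(x) = 1 + x * N(x; theta), so the initial condition holds by construction and only the residual of the equation is minimized. The particular ODE, network size, and optimizer settings here are illustrative, not the article's examples:
import torch

# Trial solution psi(x) = 1 + x * N(x) for y' = -y, y(0) = 1; train on the residual only.
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(0.0, 2.0, 50).reshape(-1, 1).requires_grad_(True)
for _ in range(2000):
    psi = 1.0 + x * net(x)                                   # satisfies psi(0) = 1 exactly
    dpsi, = torch.autograd.grad(psi.sum(), x, create_graph=True)
    loss = ((dpsi + psi) ** 2).mean()                        # residual of y' = -y
    opt.zero_grad(); loss.backward(); opt.step()

print(float(psi[-1]))   # should approach exp(-2) ≈ 0.135 at x = 2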
Lu, L., Jin, P., Pang, G., Zhang, Z. and Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, Vol. 3(3), pp. 218-229, 2021.
BibTeX:
@article{Lu2021,
  author = {Lu Lu and Pengzhan Jin and Guofei Pang and Zhongqiang Zhang and George Em Karniadakis},
  title = {Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators},
  journal = {Nature Machine Intelligence},
  publisher = {Springer Science and Business Media LLC},
  year = {2021},
  volume = {3},
  number = {3},
  pages = {218--229},
  doi = {https://doi.org/10.1038/s42256-021-00302-5}
}
Qian, E., Kramer, B., Peherstorfer, B. and Willcox, K. Lift & Learn: Physics-informed machine learning for large-scale nonlinear dynamical systems. Physica D: Nonlinear Phenomena, Vol. 406, 132401, 2020.
Abstract: We present Lift & Learn, a physics-informed method for learning low-dimensional models for large-scale dynamical systems. The method exploits knowledge of a system's governing equations to identify a coordinate transformation in which the system dynamics have quadratic structure. This transformation is called a lifting map because it often adds auxiliary variables to the system state. The lifting map is applied to data obtained by evaluating a model for the original nonlinear system. This lifted data is projected onto its leading principal components, and low-dimensional linear and quadratic matrix operators are fit to the lifted reduced data using a least-squares operator inference procedure. Analysis of our method shows that the Lift & Learn models are able to capture the system physics in the lifted coordinates at least as accurately as traditional intrusive model reduction approaches. This preservation of system physics makes the Lift & Learn models robust to changes in inputs. Numerical experiments on the FitzHugh--Nagumo neuron activation model and the compressible Euler equations demonstrate the generalizability of our model.
BibTeX:
@article{Qian2020,
  author = {E. Qian and B. Kramer and B. Peherstorfer and K. Willcox},
  title = {Lift & Learn: Physics-informed machine learning for large-scale nonlinear dynamical systems},
  journal = {Physica D: Nonlinear Phenomena},
  year = {2020},
  volume = {406},
  pages = {132401},
  url = {http://www.sciencedirect.com/science/article/pii/S0167278919307651},
  doi = {https://doi.org/10.1016/j.physd.2020.132401}
}
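The least-squares operator-inference step described in the abstract above can be sketched as follows: given reduced, lifted state snapshots W and their time derivatives Wdot, fit dw/dt ≈ A w + H (w ⊗ w) by linear least squares. The snapshots below are synthetic placeholders, and the lifting map and POD projection are assumed to have been applied already:
import numpy as np

# Least-squares fit of linear (A) and quadratic (H) reduced operators from lifted data.
rng = np.random.default_rng(0)
r, K = 5, 500
W = rng.standard_normal((r, K))       # reduced lifted states (placeholder data)
Wdot = rng.standard_normal((r, K))    # their time derivatives (placeholder data)

quad = np.einsum("ik,jk->ijk", W, W).reshape(r * r, K)   # columns are w_k ⊗ w_k
D = np.vstack([W, quad])                                 # data matrix, (r + r^2) x K
O, *_ = np.linalg.lstsq(D.T, Wdot.T, rcond=None)         # solve D^T [A H]^T = Wdot^T
A, H = O[:r].T, O[r:].T                                  # linear and quadratic operators
print(A.shape, H.shape)                                  # (r, r) and (r, r*r)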
Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena, Vol. 404, 132306, 2020.
Abstract: Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of ``unrolling'' an RNN is routinely presented without justification throughout the literature. The goal of this tutorial is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in Signal Processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the ``Vanilla LSTM'' network (the nickname ``Vanilla LSTM'' symbolizes this model's flexibility and generality; Greff et al., 2015) through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasizes ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this treatise valuable as well.
BibTeX:
@article{Sherstinsky2020,
  author = {A. Sherstinsky},
  title = {Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network},
  journal = {Physica D: Nonlinear Phenomena},
  year = {2020},
  volume = {404},
  pages = {132306},
  url = {http://www.sciencedirect.com/science/article/pii/S0167278919305974},
  doi = {https://doi.org/10.1016/j.physd.2019.132306}
}
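A single step of the standard ("Vanilla") LSTM cell that the article above derives, sketched in NumPy using the common gate convention (input, forget, cell candidate, output); the shapes and initialization are illustrative, and the article's own notation may differ:
import numpy as np

# One step of a standard LSTM cell; W: (4H, D) input weights, U: (4H, H) recurrent weights.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2*H])          # forget gate
    g = np.tanh(z[2*H:3*H])        # candidate cell state
    o = sigmoid(z[3*H:4*H])        # output gate
    c = f * c_prev + i * g         # cell state update
    h = o * np.tanh(c)             # hidden state (cell output)
    return h, c

# Usage: D = 3 inputs, H = 4 hidden units, one step on random data.
rng = np.random.default_rng(0)
D, H = 3, 4
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H),
                 rng.standard_normal((4*H, D)), rng.standard_normal((4*H, H)), np.zeros(4*H))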
Shin, Y., Zhang, Z. and Karniadakis, G.E. Error estimates of residual minimization using neural networks for linear PDEs. 2020.
BibTeX:
@misc{Shin2020,
  author = {Y. Shin and Z. Zhang and G. E. Karniadakis},
  title = {Error estimates of residual minimization using neural networks for linear PDEs},
  year = {2020}
}
Zang, Y., Bao, G., Ye, X. and Zhou, H. Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics, Vol. 411, 109409, 2020.
Abstract: Solving general high-dimensional partial differential equations (PDE) is a long-standing challenge in numerical mathematics. In this paper, we propose a novel approach to solve high-dimensional linear and nonlinear PDEs defined on arbitrary domains by leveraging their weak formulations. We convert the problem of finding the weak solution of PDEs into an operator norm minimization problem induced from the weak formulation. The weak solution and the test function in the weak formulation are then parameterized as the primal and adversarial networks respectively, which are alternately updated to approximate the optimal network parameter setting. Our approach, termed as the weak adversarial network (WAN), is fast, stable, and completely mesh-free, which is particularly suitable for high-dimensional PDEs defined on irregular domains where the classical numerical methods based on finite differences and finite elements suffer the issues of slow computation, instability and the curse of dimensionality. We apply our method to a variety of test problems with high-dimensional PDEs to demonstrate its promising performance.
BibTeX:
@article{Zang2020,
  author = {Y. Zang and G. Bao and X. Ye and H. Zhou},
  title = {Weak adversarial networks for high-dimensional partial differential equations},
  journal = {Journal of Computational Physics},
  year = {2020},
  volume = {411},
  pages = {109409},
  url = {http://www.sciencedirect.com/science/article/pii/S0021999120301832},
  doi = {https://doi.org/10.1016/j.jcp.2020.109409}
}
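The alternating primal/adversarial structure described in the abstract above can be sketched on a toy 1D Poisson problem -u'' = f on (0,1) with u(0) = u(1) = 0: the weak residual ∫ u' φ' dx - ∫ f φ dx is maximized over a test-function network φ and minimized over a solution network u. The quadrature, boundary handling, network sizes, and normalization below are simplified illustrations, not the authors' full scheme:
import math
import torch

# Primal network u and adversarial test network phi, updated alternately on the weak form.
u_net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
phi_net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt_u = torch.optim.Adam(u_net.parameters(), lr=1e-3)
opt_phi = torch.optim.Adam(phi_net.parameters(), lr=1e-3)

def f(x):
    return (math.pi ** 2) * torch.sin(math.pi * x)       # exact solution: sin(pi x)

def weak_residual():
    x = torch.rand(256, 1, requires_grad=True)           # Monte Carlo quadrature points
    bdry = x * (1 - x)                                    # enforce zero boundary values
    u, phi = bdry * u_net(x), bdry * phi_net(x)
    du, = torch.autograd.grad(u.sum(), x, create_graph=True)
    dphi, = torch.autograd.grad(phi.sum(), x, create_graph=True)
    R = (du * dphi - f(x) * phi).mean()                   # Monte Carlo estimate of the weak form
    return R ** 2 / (phi ** 2).mean()                     # normalized operator-norm proxy

for _ in range(1000):
    opt_phi.zero_grad(); (-weak_residual()).backward(); opt_phi.step()   # adversary ascends
    opt_u.zero_grad(); weak_residual().backward(); opt_u.step()          # primal descends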
Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O. Understanding Deep Learning (Still) Requires Rethinking Generalization. Commun. ACM, Vol. 64(3), pp. 107-115, 2021.
Abstract: Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. We supplement this republication with a new section at the end summarizing recent progress in the field since the original version of this paper.
BibTeX:
@article{Zhang2021,
  author = {Zhang, C. and Bengio, S. and Hardt, M. and Recht, B. and Vinyals, O.},
  title = {Understanding Deep Learning (Still) Requires Rethinking Generalization},
  journal = {Commun. ACM},
  publisher = {Association for Computing Machinery},
  year = {2021},
  volume = {64},
  number = {3},
  pages = {107--115},
  url = {https://doi.org/10.1145/3446776},
  doi = {https://doi.org/10.1145/3446776}
}
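The random-label experiment described in the abstract above can be reproduced in miniature: a sufficiently over-parameterized network fits labels that carry no signal at all. The dataset, model size, library, and training settings below are illustrative choices, not the paper's ImageNet/CIFAR setup:
import numpy as np
from sklearn.neural_network import MLPClassifier

# Fit completely random labels on unstructured random inputs with an over-parameterized MLP.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))          # unstructured random "images"
y = rng.integers(0, 10, size=500)           # completely random labels

clf = MLPClassifier(hidden_layer_sizes=(2048,), alpha=0.0, max_iter=5000, tol=1e-6,
                    random_state=0)
clf.fit(X, y)
print("training accuracy on random labels:", clf.score(X, y))   # typically reaches 1.0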