SCIENCE CHINA Information Sciences, Volume 64, Issue 1: 116101 (2021) https://doi.org/10.1007/s11432-020-2885-6

Why over-parameterization of deep neural networks does not overfit?

  • Received: Apr 11, 2020
  • Accepted: Apr 15, 2020
  • Published: Sep 14, 2020




This work was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61751306, 61921006). The author thanks Shen-Huan LYU and Zhi-Hao TAN for discussions and help with the figures.


[1] Neyshabur B, Tomioka R, Srebro N. Norm-based capacity control in neural networks. In: Proceedings of the 28th Conference on Learning Theory, Paris, 2015. 1376--1401.

[2] Zhang C Y, Bengio S, Hardt M, et al. Understanding deep learning requires rethinking generalization. In: Proceedings of the 5th International Conference on Learning Representations, Toulon, 2017.

[3] Nagarajan V, Kolter J Z. Uniform convergence may be unable to explain generalization in deep learning. In: Proceedings of Advances in Neural Information Processing Systems, 2019. 11615--11626.

[4] Lawrence S, Giles C L, Tsoi A C. Lessons in neural network training: overfitting may be harder than expected. In: Proceedings of the 14th National Conference on Artificial Intelligence, Providence, 1997. 540--545.

[5] Liu Y Y, Starzyk J A, Zhu Z. Optimized approximation algorithm in neural networks without overfitting. IEEE Trans Neural Netw, 2008, 19: 983--995.

[6] Kulis B. Metric learning: a survey. Found Trends Mach Learn, 2013, 5: 287--363.

[7] Davis J V, Kulis B, Jain P, et al. Information-theoretic metric learning. In: Proceedings of the 24th International Conference on Machine Learning, Corvallis, 2007. 209--216.

  • Figure 1

    (Color online) (a) A decompositional view of deep neural networks; (b) a typical performance plot showing that over-parameterization of the classifier construction (CC) part can lead to overfitting (replotted from the experimental results presented in [4]); (c) a typical performance plot showing that over-parameterization of the feature space transformation (FST) part does not necessarily lead to overfitting.