Moreover, sub-Gaussian concentration shows that the fluctuation of the entropy $H(P)$ around its mean under both priors is at most $\widetilde{O}(\sqrt{k}/n)$ with high probability. Since $\widetilde{O}(\sqrt{k}/n) = o(k/(n\log k))$ whenever $k$ is polynomially large, the fluctuation is indeed negligible compared with the mean difference. Careful analysis also shows that the contribution of the coordinate reserved for normalization to the entropy difference is negligible.
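To see this concentration phenomenon numerically, the following toy simulation (with hypothetical parameters k and Lam standing in for the alphabet size and the support bound of the priors, not the exact construction in the proof) draws i.i.d. coordinates from a prior on $[0,\Lambda]$ and confirms that the standard deviation of $\sum_i -p_i\log p_i$ is smaller than its mean by a factor of order $\sqrt{k}$:

    import numpy as np

    # Toy check of the concentration step: with k coordinates drawn i.i.d. from a
    # prior on (0, Lam], the entropy-like sum has mean of order k*Lam*log(1/Lam)
    # but standard deviation only of order sqrt(k)*Lam*log(1/Lam).
    rng = np.random.default_rng(0)
    k, Lam = 100_000, 1e-4                   # hypothetical alphabet size and support bound
    sums = []
    for _ in range(200):
        p = Lam * (1 - rng.random(k))        # i.i.d. draws in (0, Lam], avoiding log(0)
        sums.append(np.sum(-p * np.log(p)))
    print(np.mean(sums), np.std(sums))       # std/mean is of order 1/sqrt(k)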
The last requirement calls for a proper modification of the priors constructed in Lemma 5 to satisfy the mean constraint. This can be done via the following change-of-measure trick: let $\nu_0, \nu_1$ be the priors constructed in Lemma 5, which attain the maximal expectation difference $2E_K(x\log x; [0,1])$ for the entropy functional among all pairs of priors on $[0,1]$ with matched moments up to degree $K$; the value of this best approximation error is summarized in the following lemma.
Lemma 11 For $\varphi(x) = x\log x$, the best degree-$K$ polynomial approximation error on $[0,1]$ satisfies $E_K(\varphi; [0,1]) = \Theta(K^{-2})$.
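As a quick numerical sanity check of Lemma 11 (illustrative only, not part of the proof), one may approximate $\varphi(x) = x\log x$ on $[0,1]$ by Chebyshev interpolation, which is known to be within a logarithmic factor of the best polynomial approximation; the rescaled errors $K^2 \cdot E_K$ should then stay roughly constant up to log factors:

    import numpy as np

    def phi(x):
        # x*log(x), continuously extended by 0 at x = 0
        out = np.zeros_like(x)
        pos = x > 0
        out[pos] = x[pos] * np.log(x[pos])
        return out

    def cheb_error(K, ngrid=20001):
        # uniform error of degree-K Chebyshev interpolation of phi on [0,1],
        # a near-best proxy for the minimax error E_K(phi; [0,1])
        j = np.arange(K + 1)
        nodes = 0.5 * (1 + np.cos((2 * j + 1) * np.pi / (2 * (K + 1))))
        coef = np.polynomial.chebyshev.chebfit(2 * nodes - 1, phi(nodes), K)
        grid = np.linspace(0, 1, ngrid)
        fit = np.polynomial.chebyshev.chebval(2 * grid - 1, coef)
        return np.max(np.abs(phi(grid) - fit))

    for K in [4, 8, 16, 32, 64]:
        print(K, K**2 * cheb_error(K))   # roughly constant (up to logs): E_K ~ K^{-2}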
Next we construct the priors $\mu_0, \mu_1$ as follows: for $i \in \{0,1\}$,
$$\mu_i := (1-\alpha)\,\delta_0 + \alpha\cdot\mathsf{law}(\Lambda X_i), \qquad X_i \sim \nu_i,$$
where $\Lambda > 0$ is the scaling parameter of the support and the mixing weight $\alpha \in (0,1]$ is chosen so that $\alpha\Lambda\,\mathbb{E}_{\nu_0}[X]$ equals the prescribed mean (recall that $\mathbb{E}_{\nu_0}[X] = \mathbb{E}_{\nu_1}[X]$ by the matched first moments). Then it is easy to show that both $\mu_0$ and $\mu_1$ are probability measures, have matched moments up to degree $K$, and have the prescribed mean.
Hence, the fourth requirement is fulfilled without hurting the previous ones: the common point mass at zero and the common scaling preserve the moment matching, and the difference in the entropy functional is only rescaled by $\alpha\Lambda$, for the linear term $x\log\Lambda$ cancels thanks to the matched first moments.
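The following minimal sketch illustrates the trick numerically under hypothetical parameters (the scaling Lam, the target mean, and a toy pair of priors not taken from Lemma 5): here $\nu_1$ is Uniform$[0,1]$ and $\nu_0$ is its $m$-point Gauss-Legendre quadrature, which matches moments up to degree $K = 2m-1$ by construction; after mixing with $\delta_0$ and rescaling, the moments remain matched and both means equal the prescribed value.

    import numpy as np

    m = 4                                   # quadrature size; moments match up to K = 2m - 1
    K = 2 * m - 1
    x, w = np.polynomial.legendre.leggauss(m)
    x, w = (x + 1) / 2, w / 2               # Gauss-Legendre nodes/weights on [0, 1]

    Lam, target_mean = 0.1, 0.02            # hypothetical scaling and mean constraint
    common_mean = 0.5                       # shared mean of nu_0, nu_1 (matched first moments)
    alpha = target_mean / (Lam * common_mean)   # mixing weight, here 0.4

    # moments of mu_i = (1 - alpha) * delta_0 + alpha * law(Lam * X), X ~ nu_i, j >= 1;
    # the point mass at zero contributes nothing to moments of positive degree
    mom0 = np.array([alpha * (w @ (Lam * x) ** j) for j in range(1, K + 1)])  # quadrature nu_0
    mom1 = np.array([alpha * Lam ** j / (j + 1) for j in range(1, K + 1)])    # Uniform[0,1] nu_1

    print(np.max(np.abs(mom0 - mom1)))      # ~1e-17: moment matching is preserved
    print(mom0[0], target_mean)             # first moments both equal the target mean

Any pair of moment-matched priors may replace the toy quadrature pair here; the mixing step is oblivious to how $\nu_0, \nu_1$ were obtained.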
In summary, Theorem 1 holds with the above choice of priors, and we arrive at the desired lower bound $n = \Omega(\frac{k}{\varepsilon\log k})$ for estimating the Shannon entropy within accuracy $\varepsilon$.
4. Bibliographic Notes
The method of two fuzzy hypotheses (Theorem 1) is systematically used in Ingster and Suslina (2012) for nonparametric testing, and the current form of the theorem is taken from Theorem 2.14 of Tsybakov (2009). Statistical closeness of Gaussian mixture models via moment matching (Theorem 3) was established in Cai and Low (2011), Hardt and Price (2015), and Wu and Yang (2018) for the $\chi^2$-divergence, while the result for the TV distance is new. For Theorem 4, the TV version was established in part by Jiao, Venkat, Han and Weissman (2015) and Jiao, Han and Weissman (2018), while the $\chi^2$ version is new here. When the number of matched moments is large compared with the support of the priors, a stronger bound on the TV distance without the square root is also available in Wu and Yang (2016). For more properties of Hermite and Charlier polynomials, we refer to Labelle and Yeh (1989).
For Gaussian examples, the proper learning of two-component Gaussian mixtures was established in Hardt and Price (2015). The $\ell_1$ norm estimation problem was taken from Cai and Low (2011), which was further motivated by Lepski, Nemirovski and Spokoiny (1999). The Gaussian mean testing example under the $\ell_2$ metric was taken from Ingster and Suslina (2012). For technical lemmas, the two proofs of Lemma 5 are taken from Lepski, Nemirovski and Spokoiny (1999) and Wu and Yang (2016), respectively, and Lemma 6 is due to Bernstein (1912).
For Poisson examples, the non-asymptotic equivalence between the i.i.d. sampling model and the Poissonized model was due to Jiao, Venkat, Han and Weissman (2015). The tight bounds for the generalized uniformity testing problem were due to Batu and Canonne (2017) for constant $\varepsilon$, and to Diakonikolas, Kane and Stewart (2018) for general $\varepsilon$; their proof is greatly simplified here thanks to Theorem 4. For Shannon entropy estimation, the optimal sample complexity was obtained in Valiant and Valiant (2011), and the minimax risk was obtained independently in Jiao, Venkat, Han and Weissman (2015) and Wu and Yang (2016). For the tools from approximation theory used to establish Lemmas 10 and 11, we refer to the books of DeVore and Lorentz (1993) and Ditzian and Totik (2012).
Yuri Ingster and Irina A. Suslina. Nonparametric goodness-of-fit testing under Gaussian models. Vol. 169. Springer Science & Business Media, 2012.
Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
T. Tony Cai and Mark G. Low. Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. The Annals of Statistics 39.2 (2011): 1012–1041.
Moritz Hardt and Eric Price. Tight bounds for learning a mixture of two Gaussians. Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing. ACM, 2015.
Yihong Wu and Pengkun Yang. Optimal estimation of Gaussian mixtures via denoised method of moments. arXiv preprint arXiv:1807.07237 (2018).
Jiantao Jiao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory 61.5 (2015): 2835–2885.
Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the $L_1$ distance. IEEE Transactions on Information Theory 64.10 (2018): 6672–6706.
Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62.6 (2016): 3702–3720.
Jacques Labelle and Yeong Nan Yeh. The combinatorics of Laguerre, Charlier, and Hermite polynomials. Studies in Applied Mathematics 80.1 (1989): 25–36.
Oleg Lepski, Arkady Nemirovski, and Vladimir Spokoiny. On estimation of the $L_r$ norm of a regression function. Probability Theory and Related Fields 113.2 (1999): 221–253.
Serge Bernstein. Sur l'ordre de la meilleure approximation des fonctions continues par des polynômes de degré donné. Vol. 4. Hayez, imprimeur des académies royales, 1912.
Tugkan Batu and Clément L. Canonne. Generalized uniformity testing. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2017.
Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Sharp bounds for generalized uniformity testing. Advances in Neural Information Processing Systems. 2018.
Gregory Valiant and Paul Valiant. The power of linear estimators. 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science. IEEE, 2011.
Ronald A. DeVore and George G. Lorentz. Constructive approximation. Vol. 303. Springer Science & Business Media, 1993.
Zeev Ditzian and Vilmos Totik. Moduli of smoothness. Vol. 9. Springer Science & Business Media, 2012.