As noted in the previous section, classical algorithms can in many cases be studied using the asymptotic methods of mathematical statistics, in particular the CLT and methods of inheritance of convergence. The detachment of classical mathematical statistics from the needs of applied research shows, among other things, in the fact that widely used monographs lack the mathematical apparatus needed for the study of two-sample statistics. The point is that one has to pass to the limit not in one parameter but in two, the sizes of the two samples. An appropriate theory had to be developed: the theory of inheritance of convergence, set out in our monograph.

However, the results of such a study have to be applied at finite sample sizes. This transition raises a whole range of problems. Some of them were discussed in connection with the study of the properties of statistics constructed from samples from specific distributions.

However, when discussing the effect of deviations from the initial assumptions on the properties of statistical procedures, additional problems arise. What deviations are considered typical? Should we focus on the most “harmful” deviations that distort the properties of algorithms to the greatest extent, or should we focus on the “typical” deviations?

With the first approach we get a guaranteed result, but the "price" of this result may be too high. As an example, consider the universal Berry-Esseen inequality for the error in the CLT. A.A. Borovkov quite rightly emphasizes that "the rate of convergence in real problems, as a rule, turns out to be better."
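The gap between a guaranteed bound and the actual error can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes sums of n = 12 independent Uniform(0, 1) variables (a case chosen here, not taken from the text), computes their exact distribution function by the Irwin-Hall formula, and compares the largest deviation from the normal distribution function on a grid with the Berry-Esseen bound C·ρ/(σ³√n). The constant C = 0.7655 is an assumed admissible value from the literature; sharper constants have been published.

```python
import math

def irwin_hall_cdf(x, n):
    """Exact CDF of the sum of n independent Uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** n
    return total / math.factorial(n)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clt_error_uniform(n, grid=2000):
    """Largest gap, over a grid, between the CDF of the sum and its normal approximation."""
    mean, sd = n / 2.0, math.sqrt(n / 12.0)
    worst = 0.0
    for i in range(grid + 1):
        x = n * i / grid
        gap = abs(irwin_hall_cdf(x, n) - normal_cdf((x - mean) / sd))
        worst = max(worst, gap)
    return worst

def berry_esseen_bound(n, c=0.7655):
    """Bound c * rho / (sigma^3 * sqrt(n)) for Uniform(0, 1) summands; c is an assumed constant."""
    sigma3 = (1.0 / 12.0) ** 1.5   # sigma^3, with variance 1/12
    rho = 1.0 / 32.0               # E|U - 1/2|^3
    return c * rho / (sigma3 * math.sqrt(n))

n = 12
actual = clt_error_uniform(n)
bound = berry_esseen_bound(n)
print(f"n={n}: actual error on grid {actual:.4f}, guaranteed bound {bound:.4f}")
```

In this particular case the guaranteed bound exceeds the observed error by roughly two orders of magnitude, which is the kind of "price" the text refers to.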

The second approach raises the question of what deviations are considered "typical". One can try to answer this question by analyzing large arrays of real data. It is only natural that the responses of different research groups will differ, as can be seen, for example, in the results presented in the article.

One false idea is to restrict the analysis of possible deviations to some specific parametric family: the Weibull-Gnedenko distributions, the three-parameter family of gamma distributions, etc. As far back as 1927, Academician of the USSR Academy of Sciences S.N. Bernstein discussed the methodological error of reducing all empirical distributions to the four-parameter Pearson family. However, parametric methods of statistics are still very popular, especially among applied specialists, and the blame for this misconception lies primarily with the teachers of statistical methods (see below, as well as the article).

15. Choosing one of the many criteria to test a particular hypothesis

In many cases, many methods have been developed to solve a specific practical problem, and a specialist in mathematical research methods is faced with a problem: which of them should be offered to an applied specialist for analyzing specific data?

As an example, consider the problem of testing the homogeneity of two independent samples. Many criteria are known for solving it: Student, Cramer-Welch, Lord, chi-square, Wilcoxon (Mann-Whitney), Van der Waerden, Savage, N.V. Smirnov, omega-square type (Lehmann-Rosenblatt), G.V. Martynov, and others. Which one to choose?

The idea of "voting" naturally comes to mind: test according to many criteria and then decide "by majority vote". From the point of view of statistical theory, such a procedure simply leads to the construction of yet another criterion, which a priori is no better than the previous ones but is more difficult to study. On the other hand, if the decisions of all the statistical criteria considered, which are based on different principles, coincide, then, in accordance with the concept of stability, this increases confidence in the overall decision.

There is a widespread opinion, especially among mathematicians, that one must find optimal methods, solutions, etc. This opinion is false and harmful. The point is that optimality usually disappears when the initial assumptions are violated. Thus, the arithmetic mean is an optimal estimate of the mathematical expectation only when the underlying distribution is normal, whereas it is always a consistent estimate, provided only that the mathematical expectation exists. On the other hand, for an arbitrary method of estimation or hypothesis testing one can usually formulate the concept of optimality so that the method under consideration becomes optimal from this specially chosen point of view. Take, for example, the sample median as an estimate of the mathematical expectation. It is, of course, optimal, albeit in a different sense than the arithmetic mean (which is optimal for a normal distribution). Namely, for the Laplace distribution the sample median is the maximum likelihood estimate, and therefore optimal (in the sense specified in the monograph).
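How optimality depends on the assumed distribution can be seen in a small Monte Carlo experiment. The sketch below is an illustration only; the sample size, the number of replications and the seed are arbitrary choices made here. It estimates the mean squared error of the sample mean and of the sample median for samples from a standard normal law and from a Laplace law centered at zero.

```python
import random
import statistics

def mse(estimator, sampler, n, reps, rng):
    """Monte Carlo mean squared error of an estimator of the true center 0."""
    total = 0.0
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        total += estimator(sample) ** 2
    return total / reps

def normal(rng):
    return rng.gauss(0.0, 1.0)

def laplace(rng):
    # The difference of two independent Exp(1) variables has a Laplace(0, 1) law.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

rng = random.Random(12345)   # fixed seed so that the run is reproducible
n, reps = 101, 2000          # illustrative values
results = {}
for name, sampler in (("normal", normal), ("laplace", laplace)):
    results[name] = {
        "mean": mse(statistics.fmean, sampler, n, reps, rng),
        "median": mse(statistics.median, sampler, n, reps, rng),
    }
    print(name, results[name])
```

Under the normal law the mean has the smaller error, under the Laplace law the median does, so neither estimate is "optimal" without a statement about the distribution.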

The homogeneity criteria were analyzed in the monograph. There are several natural approaches to comparing criteria, based on asymptotic relative efficiency according to Bahadur, Hodges-Lehmann, and Pitman. It turned out that each criterion is optimal for a corresponding alternative or for a suitable distribution over a set of alternatives. Moreover, the mathematical calculations usually use the shift alternative, which is relatively rare in the practice of analyzing real statistical data (in connection with the Wilcoxon criterion, this alternative was discussed and criticized by us). The bottom line is sad: the brilliant mathematical technique demonstrated in the monograph does not make it possible to give recommendations for choosing a criterion for testing homogeneity when analyzing real data. In other words, from the point of view of an applied specialist, i.e. for the analysis of specific data, the monograph is useless. The brilliant mastery of mathematics and the enormous diligence demonstrated by its author, alas, contributed nothing to practice.

Of course, every practicing statistician solves the problem of choosing a statistical criterion for himself in one way or another. Based on a number of methodological considerations, we opted for the omega-square (Lehmann-Rosenblatt) criterion, which is consistent against any alternative. However, a feeling of dissatisfaction remains because this choice is insufficiently justified.
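For orientation, a two-sample statistic of omega-square type can be computed in a few lines. The sketch below uses one common form of the statistic, T = mn/(m+n)² · Σ (F_m(z) − G_n(z))², where F_m and G_n are the empirical distribution functions of the two samples and the sum runs over all points z of the pooled sample. This form and the function name are assumptions of the illustration; the text itself does not spell out the formula, and critical values are not computed here.

```python
from bisect import bisect_right

def lehmann_rosenblatt(x, y):
    """Two-sample omega-square type statistic:
    (m*n / (m+n)^2) * sum over pooled points of (F_m - G_n)^2."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    total = 0.0
    for z in xs + ys:
        f = bisect_right(xs, z) / m   # empirical CDF of the first sample at z
        g = bisect_right(ys, z) / n   # empirical CDF of the second sample at z
        total += (f - g) ** 2
    return m * n * total / (m + n) ** 2

same = lehmann_rosenblatt([1, 2, 3], [1, 2, 3])
separated = lehmann_rosenblatt([1, 2, 3], [4, 5, 6])
print(same, separated)
```

Identical samples give zero, while completely separated samples give the largest value possible for these sample sizes.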

ASYMPTOTIC EFFICIENCY OF A TEST

A concept that makes it possible, in the case of large samples, to compare quantitatively two different statistical tests used to test one and the same statistical hypothesis. The need to measure the efficiency of tests arose in the 1930s and 1940s, when computationally simple but inefficient ...

Encyclopedia of Mathematics. Moscow: Soviet Encyclopedia. I. M. Vinogradov. 1977-1985.


1 Entropy and information distance

1.1 Basic definitions and notation.

1.2 Entropy of discrete distributions with bounded expectation.

1.3 Logarithmic generalized metric on the set of discrete distributions.

1.4 Compactness of functions of a countable set of arguments

1.5 Continuity of the Kullback - Leibler - Sanov information distance

1.6 Conclusions.

2 Probabilities of large deviations

2.1 Probabilities of large deviations of functions of the numbers of cells with a given filling.

2.1.1 Local limit theorem.

2.1.2 The integral limit theorem.

2.1.3 Information distance and large deviation probabilities of separable statistics

2.2 Probabilities of large deviations of separable statistics that do not satisfy Cramer's condition.

2.3 Conclusions.

3 Asymptotic properties of goodness-of-fit tests

3.1 Goodness-of-fit tests for a scheme of sampling without replacement

3.2 Asymptotic relative efficiency of goodness-of-fit tests.

3.3 Criteria based on the numbers of cells in generalized allocation schemes.

3.4 Conclusions.


Dissertation introduction (part of the abstract) on the topic "Asymptotic properties of goodness-of-fit tests for hypothesis testing in a scheme of sampling without replacement, based on cell occupancy in a generalized allocation scheme"

The object of research and the relevance of the topic. In the theory of statistical analysis of discrete sequences, a special place is occupied by goodness-of-fit criteria for testing the (possibly composite) null hypothesis that, for a random sequence $(X_i)_{i=1}^{n}$ such that

$X_i \in I_M$, $i = 1, \dots, n$, where $I_M = \{0, 1, \dots, M\}$, for any $i = 1, \dots, n$ and any $k \in I_M$ the probability of the event

$\{X_i = k\}$ does not depend on $i$. This means that the sequence is stationary in a certain sense.

In a number of applied problems, the sequence $(X_i)$ is taken to be the sequence of colors of balls drawn without replacement, until exhaustion, from an urn containing $n_k - 1 > 0$ balls of color $k$, $k \in I_M$. The set of all such selections is denoted $\mathfrak{M}(n_0 - 1, \dots, n_M - 1)$. Let the urn contain $n - 1$ balls in total, $\sum_{k=0}^{M} (n_k - 1) = n - 1$.

We denote by $r^{(k)} = (r^{(k)}_1, \dots, r^{(k)}_{n_k - 1})$ the sequence of positions of the balls of color $k$ in the sample. Consider the sequence $h^{(k)} = (h^{(k)}_1, \dots, h^{(k)}_{n_k})$, which is defined using the distances between the positions of adjacent balls of color $k$ in such a way that

$$\sum_{\nu=1}^{n_k} h^{(k)}_\nu = n.$$

The set of sequences $h^{(k)}$ for all $k \in I_M$ uniquely determines the sequence $(X_i)$. The sequences $h^{(k)}$ for different $k$ depend on each other; in particular, any one of them is uniquely determined by all the others. If the cardinality of the set $I_M$ equals 2, then the sequence of colors of the balls is uniquely determined by the sequence of distances between the positions of neighboring balls of one fixed color. Let an urn containing $n - 1$ balls of two different colors contain $N - 1$ balls of color 0. One can establish a one-to-one correspondence between the set $\mathfrak{M}(N - 1, n - N)$ and the set $\mathfrak{R}_{n,N}$ of vectors $h(n, N) = (h_1, \dots, h_N)$ with positive integer components such that

$$h_1 + \dots + h_N = n. \tag{0.1}$$

The set $\mathfrak{R}_{n,N}$ corresponds to the set of all distinct partitions of the positive integer $n$ into $N$ ordered summands.

Having specified a probability distribution on the set of vectors $\mathfrak{R}_{n,N}$, we obtain the corresponding probability distribution on the set $\mathfrak{M}(N - 1, n - N)$. The set $\mathfrak{R}_{n,N}$ is a subset of the set of vectors with non-negative integer components satisfying (0.1). As probability distributions on this set of vectors, the thesis considers distributions of the form

$$P\{h(n, N) = (r_1, \dots, r_N)\} = P\Big\{\xi_\nu = r_\nu,\ \nu = 1, \dots, N \,\Big|\, \sum_{\nu=1}^{N} \xi_\nu = n\Big\}, \tag{0.2}$$

where $\xi_1, \dots, \xi_N$ are independent non-negative integer-valued random variables.

Distributions of the form (0.2) are called in /24/ generalized schemes of allocation of $n$ particles to $N$ cells. In particular, if the random variables $\xi_1, \dots, \xi_N$ in (0.2) are distributed according to Poisson laws with parameters $\lambda_1, \dots, \lambda_N$, respectively, then the vector $h(n, N)$ has a multinomial distribution with outcome probabilities

$$p_\nu = \frac{\lambda_\nu}{\lambda_1 + \dots + \lambda_N}, \quad \nu = 1, \dots, N.$$

If the random variables $\xi_1, \dots, \xi_N$ in (0.2) are identically distributed according to a geometric law with parameter $p$, where $p$ is any number in the interval $0 < p < 1$, then, as noted in /25/, /26/, the resulting generalized allocation scheme corresponds to the uniform distribution on the set $\mathfrak{R}_{n,N}$. By virtue of the one-to-one correspondence between the set $\mathfrak{M}(N - 1, n - N)$ and the set $\mathfrak{R}_{n,N}$, we obtain the uniform distribution on the set of selections without replacement. Here the vector of distances between the positions of balls of one color corresponds one-to-one to the frequency vector in the generalized allocation scheme, and, accordingly, the number of distances of length $r$ corresponds to the number of cells containing exactly $r$ particles. To test, from a single sequence, the hypothesis that it was obtained by sampling without replacement, with every such sample having the same probability, one can test the hypothesis that the vector of distances between the positions of balls of color 0 is distributed as the frequency vector in the corresponding generalized scheme of allocation of $n$ particles to $N$ cells.
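Both special cases can be checked by brute force for small parameters. The sketch below enumerates all cell vectors with a given sum and computes the conditional law of independent identically distributed variables given their sum, using exact fractions. The helper name `conditional_law`, the parameter values, and the choice of a geometric law on the values 1, 2, ... (so that the components are positive, as in the set of ordered partitions) are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def conditional_law(weight, n, N, support):
    """Law of (xi_1, ..., xi_N) given xi_1 + ... + xi_N = n for i.i.d. xi.
    `weight(k)` may be the pmf of xi up to a constant factor."""
    weights = {}
    for h in product(support, repeat=N):
        if sum(h) == n:
            w = Fraction(1)
            for k in h:
                w *= weight(k)
            weights[h] = w
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Geometric law on 1, 2, ...: the conditional law is uniform on ordered partitions.
p = Fraction(1, 3)
geometric = lambda k: (1 - p) * p ** (k - 1)
law_geo = conditional_law(geometric, 7, 3, range(1, 8))
print(len(law_geo), set(law_geo.values()))

# Poisson law with a common parameter: the conditional law is multinomial
# with equal cell probabilities (the factor exp(-lam) cancels).
lam = Fraction(2)
poisson = lambda k: lam ** k / factorial(k)
law_poi = conditional_law(poisson, 4, 2, range(0, 5))
print(law_poi[(1, 3)])
```

For n = 7 and N = 3 there are 15 ordered partitions, each with probability 1/15, whatever the value of p.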

As noted in /14/, /38/, a special place in testing hypotheses about the distribution of the frequency vectors $h(n, N) = (h_1, \dots, h_N)$ in generalized schemes of allocation of $n$ particles to $N$ cells is occupied by criteria based on statistics of the form

$$L_N(h(n, N)) = \sum_{\nu=1}^{N} f_\nu(h_\nu), \tag{0.3}$$

$$\Phi_N = \varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big), \tag{0.4}$$

where $f_\nu$, $\nu = 1, 2, \dots$, and $\varphi$ are some real-valued functions, and

$$\mu_r = \sum_{\nu=1}^{N} \mathbf{1}\{h_\nu = r\}, \quad r = 0, 1, \dots$$

The quantities $\mu_r$ were called in /27/ the numbers of cells containing exactly $r$ particles.

Statistics of the form (0.3) are called in /30/ separable (additively separable) statistics. If the functions $f_\nu$ in (0.3) do not depend on $\nu$, such statistics were called in /31/ symmetric separable statistics.

For any $r$, the statistic $\mu_r$ is a symmetric separable statistic. From the equality

$$\sum_{\nu=1}^{N} f(h_\nu) = \sum_{r=0}^{\infty} f(r)\, \mu_r \tag{0.5}$$

it follows that the class of symmetric separable statistics of the $h_\nu$ coincides with the class of linear functions of the $\mu_r$. Moreover, the class of functions of the form (0.4) is broader than the class of symmetric separable statistics.
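The passage from cell fillings to occupancy counts is easy to make concrete. In the sketch below, a cell vector h is turned into the counts of cells with filling r, and a symmetric separable statistic is computed in two ways: by summing f over the cells and as a linear function of the occupancy counts. The function names and the sample data are illustrative assumptions.

```python
from collections import Counter

def occupancy_counts(h):
    """Map r -> number of cells whose filling equals r."""
    return Counter(h)

def separable_statistic(h, f):
    """Sum of f over the cells."""
    return sum(f(h_v) for h_v in h)

def via_occupancy(h, f):
    """The same statistic written as a linear function of the occupancy counts."""
    return sum(f(r) * mu_r for r, mu_r in occupancy_counts(h).items())

h = [2, 0, 1, 2, 5, 0, 0, 1]      # fillings of N = 8 cells, n = 11 particles
f = lambda r: r * r
direct = separable_statistic(h, f)
linear = via_occupancy(h, f)
print(dict(occupancy_counts(h)), direct, linear)
```

Both computations give the same value, which is the content of the identity between the sum over cells and the sum over fillings.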

Let $H_0 = (H_0(n, N))$ be a sequence of simple null hypotheses stating that the distribution of the vector $h(n, N)$ is (0.2), where the random variables $\xi_1, \dots, \xi_N$ in (0.2) are identically distributed with $P\{\xi_1 = k\} = p_k$, $k = 0, 1, 2, \dots$, and the parameters $n$, $N$ vary in the central region.

Consider some $\beta \in (0, 1)$ and a sequence of, generally speaking, composite alternatives

$H = (H(n, N))$ such that there exists $a_{n,N}(\beta)$, the maximal number for which, for any simple hypothesis $H_1 \in H(n, N)$, the inequality

$$P\{\Phi_N > a_{n,N}(\beta)\} \ge \beta$$

holds. We will reject the hypothesis $H_0(n, N)$ if $\Phi_N > a_{n,N}(\beta)$. If the limit

$$\lim_{N \to \infty} -\frac{1}{N} \ln P\{\Phi_N > a_{n,N}(\beta)\} = \varkappa(\beta, H)$$

exists, where the probability for each $N$ is calculated under the hypothesis $H_0(n, N)$, then the value $\varkappa(\beta, H)$ is called in /38/ the index of the criterion $\varphi$ at the point $(\beta, H)$. The latter limit may, generally speaking, not exist. Therefore, in the dissertation, in addition to the criterion index, the quantity

$$\liminf_{N \to \infty} \Big(-\frac{1}{N} \ln P\{\Phi_N > a_{n,N}(\beta)\}\Big)$$

is considered, called the lower index of the criterion; here $\liminf_{N \to \infty} a_N$ and $\limsup_{N \to \infty} a_N$ denote, respectively, the lower and upper limits of a sequence $(a_N)$ as $N \to \infty$.

If the criterion index exists, then the lower index of the criterion coincides with it. The lower index of the criterion always exists. The larger the value of the criterion index (lower criterion index), the better the statistical criterion in the sense considered. In /38/ the problem of constructing goodness-of-fit criteria for generalized allocation schemes with the greatest value of the criterion index was solved in the class of criteria that reject the hypothesis $H_0(n, N)$ when $\varphi\big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots, \frac{\mu_m}{N}\big) > a_N$, where $m > 0$ is some fixed number, the sequence of constants $a_N$ is chosen from a given value of the power of the criterion for a sequence of alternatives, and $\varphi$ is a real function of $m + 1$ arguments.

The criterion indices are determined by the probabilities of large deviations. As was shown in /38/, the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of separable statistics, under Cramér's condition for the random variable $f(\xi)$, is determined by the corresponding Kullback-Leibler-Sanov information distance (a random variable $\eta$ satisfies Cramér's condition if for some $H > 0$ the moment generating function $\mathsf{M} e^{t\eta}$ is finite in the interval $|t| < H$ /28/).

The question of the probabilities of large deviations of statistics depending on an unbounded number of the $\mu_r$, as well as of arbitrary separable statistics that do not satisfy Cramér's condition, remained open. This did not make it possible to solve completely the problem of constructing criteria for testing hypotheses in generalized allocation schemes with the highest rate of convergence to zero of the probability of a type I error with approaching alternatives in the class of criteria based on statistics of the form (0.4). The relevance of the dissertation research is determined by the need to complete the solution of this problem.

The aim of the thesis is to construct goodness-of-fit criteria with the highest criterion index (lower criterion index) for testing hypotheses in a scheme of sampling without replacement, in the class of criteria that reject the hypothesis $H_0(n, N)$ when

$$\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big) > a_N, \tag{0.7}$$

where $\varphi$ is a function of a countable number of arguments and the parameters $n$, $N$ vary in the central region.

In accordance with the purpose of the study, the following tasks were set:

Investigate the properties of the entropy and of the Kullback-Leibler-Sanov information distance for discrete distributions with a countable number of outcomes;

Investigate the probabilities of large deviations of statistics of the form (0.4);

Investigate the probabilities of large deviations of symmetric separable statistics (0.3) that do not satisfy Cramer's condition;

Find a statistic such that the goodness-of-fit test constructed on its basis for testing hypotheses in generalized allocation schemes has the highest index value in the class of criteria of the form (0.7).

Scientific novelty:

Scientific and practical value. A number of questions about the behavior of the probabilities of large deviations in generalized allocation schemes have been resolved. The results obtained can be used in teaching in the specialties of mathematical statistics and information theory and in the study of statistical procedures for the analysis of discrete sequences; they were used in /3/, /21/ to justify the security of one class of information systems.

Provisions for defense:

Reduction of the problem of testing, from a single sequence of ball colors, the hypothesis that this sequence was obtained by sampling without replacement, until exhaustion, from an urn containing balls of two colors, with every such selection equally probable, to the construction of goodness-of-fit tests for hypotheses in the corresponding generalized allocation scheme;

Continuity of the Kullback - Leibler - Sanov entropy and information distance functions on an infinite-dimensional simplex with the introduced logarithmic generalized metric;

A theorem on the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations for symmetric separable statistics that do not satisfy the Cramer condition in the generalized allocation scheme in the semi-axial case;

The theorem on the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations for statistics of the form (0.4);

Construction of a goodness-of-fit test for testing hypotheses in generalized allocation schemes with the highest index value in the class of criteria of the form (0.7).

Approbation of the work. The results were presented at seminars of the Department of Discrete Mathematics of the V. A. Steklov Mathematical Institute of the RAS and of the Information Security Department of the S. A. Lebedev ITMiVT of the RAS, and at:

Fifth All-Russian Symposium on Applied and Industrial Mathematics. Spring session, Kislovodsk, May 2 - 8, 2004;

Sixth International Petrozavodsk Conference "Probabilistic Methods in Discrete Mathematics" June 10 - 16, 2004;

Second International Conference "Information Systems and Technologies (IST'2004)", Minsk, November 8-10, 2004;

International conference "Modern Problems and new Trends in Probability Theory", Chernivtsi, Ukraine, June 19 - 26, 2005.

The main results of the work were used in the research project "Apologia" carried out by the S. A. Lebedev ITMiVT of the RAS in the interests of the Federal Service for Technical and Export Control of the Russian Federation, and were included in the report on the implementation of the R&D stage /21/. Some results of the dissertation were included in the research report "Development of mathematical problems of cryptography" of the Academy of Cryptography of the Russian Federation for 2004 /22/.

The author expresses his deep gratitude to the scientific adviser, Doctor of Physics and Mathematics A.F. Ronzhin, and to the scientific adviser, Doctor of Physics and Mathematics, Senior Researcher A.V. Knyazev. The author expresses his gratitude to Professor A.M. Mathematical Sciences Kruglov IA for the attention shown to this work, and a number of valuable remarks.

The structure and content of the work.

The first chapter examines the properties of entropy and information distance for distributions on the set of non-negative integers.

In the first section of the first chapter, the notation is introduced and the necessary definitions are given. In particular, the following notation is used: $x = (x_0, x_1, \dots)$ is an infinite-dimensional vector with a countable number of components;

$H(x) = -\sum_{\nu=0}^{\infty} x_\nu \ln x_\nu$; $\operatorname{trunc}_m(x) = (x_0, x_1, \dots, x_m, 0, 0, \dots)$; $\Omega^* = \{x : x_\nu \ge 0,\ \nu = 0, 1, \dots,\ \sum_{\nu=0}^{\infty} x_\nu \le 1\}$; $\Omega = \{x : x_\nu \ge 0,\ \nu = 0, 1, \dots,\ \sum_{\nu=0}^{\infty} x_\nu = 1\}$; $\Omega_\gamma = \{x \in \Omega : \sum_{\nu=0}^{\infty} \nu x_\nu = \gamma\}$;

Ml = o Ue> 1 | 5 € o< Ml - 7МГ1 < 00}. Clearly, the set $\Omega$ corresponds to the family of probability distributions on the set of non-negative integers, and $\Omega_\gamma$ to the family of probability distributions on the set of non-negative integers with mathematical expectation $\gamma$.

If $y \in \Omega$, then for $\varepsilon > 0$ we denote by $O_\varepsilon(y)$ the set

$$O_\varepsilon(y) = \{x : x_\nu \le y_\nu e^{\varepsilon} \text{ for all } \nu = 0, 1, \dots\}.$$

In the second section of the first chapter, a theorem on the boundedness of the entropy of discrete distributions with bounded mathematical expectation is proved.

Theorem 1. On the boundedness of the entropy of discrete distributions with bounded expectation.

For any $x \in \Omega_\gamma$

$$H(x) \le F(\gamma).$$

If $x \in \Omega_\gamma$ corresponds to the geometric distribution with mathematical expectation $\gamma$, that is, $x_\nu = (1 - p)\, p^{\nu}$, $\nu = 0, 1, \dots$, where $p = \dfrac{\gamma}{1 + \gamma}$, then the equality

$$H(x) = F(\gamma)$$

holds.
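The extremal role of the geometric distribution can be checked numerically. The text as extracted does not reproduce the function F, so the sketch below assumes that F(γ) is the entropy of the geometric law with mean γ, for which the standard closed form is (1 + γ) ln(1 + γ) − γ ln γ. It compares this value with the entropies of two other distributions on the non-negative integers with the same mean; the truncation length and the competing distributions are arbitrary choices.

```python
import math

def entropy(x):
    """H(x) = -sum x_v ln x_v, with the convention 0 ln 0 = 0."""
    return -sum(v * math.log(v) for v in x if v > 0)

def geometric(gamma, size):
    p = gamma / (1.0 + gamma)
    return [(1 - p) * p ** v for v in range(size)]

def poisson(gamma, size):
    return [math.exp(-gamma + v * math.log(gamma) - math.lgamma(v + 1))
            for v in range(size)]

def geometric_entropy(gamma):
    # Closed form for the entropy of the geometric law with mean gamma
    # (assumed here to be what the text denotes by F(gamma)).
    return (1 + gamma) * math.log(1 + gamma) - gamma * math.log(gamma)

gamma = 2.0
h_geo = entropy(geometric(gamma, 400))             # truncated far in the tail
h_poi = entropy(poisson(gamma, 400))
h_two = entropy([1 - gamma / 4, 0, 0, 0, gamma / 4])   # mass on {0, 4}, mean gamma
print(h_geo, geometric_entropy(gamma), h_poi, h_two)
```

The geometric law reproduces the closed form, and both competitors with the same mean have strictly smaller entropy.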

The statement of the theorem can be viewed as the result of a formal application of the method of Lagrange multipliers for a conditional extremum in the case of an infinite number of variables. The theorem that the only distribution on the set $\{k, k + 1, k + 2, \dots\}$ with a given mathematical expectation and maximum entropy is the geometric distribution with that expectation is given (without proof) in /47/. The author, however, has given a rigorous proof.

In the third section of the first chapter, a definition of a generalized metric is given - a metric that admits infinite values.

For $x, y \in \Omega$, the function $\rho(x, y)$ is defined as the minimal $\varepsilon \ge 0$ with the property $y_\nu e^{-\varepsilon} \le x_\nu \le y_\nu e^{\varepsilon}$ for all $\nu = 0, 1, \dots$ If no such $\varepsilon$ exists, we set $\rho(x, y) = \infty$.
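For vectors with finitely many nonzero components this generalized metric is simply the largest absolute difference of the logarithms of the components. The sketch below is an illustration for finite lists of equal length; the function name and the treatment of components that vanish in both vectors (they impose no constraint) are assumptions made here.

```python
import math

def log_metric(x, y):
    """Smallest eps >= 0 with y_v * exp(-eps) <= x_v <= y_v * exp(eps) for all v,
    or infinity if no such eps exists."""
    eps = 0.0
    for xv, yv in zip(x, y):
        if xv == 0 and yv == 0:
            continue                 # both components vanish: no constraint
        if xv == 0 or yv == 0:
            return math.inf          # supports differ: the distance is infinite
        eps = max(eps, abs(math.log(xv / yv)))
    return eps

print(log_metric([0.5, 0.5, 0.0], [0.25, 0.75, 0.0]))   # ln 2
print(log_metric([1.0, 0.0], [0.5, 0.5]))               # inf
```

Two distributions are at finite distance only if they have the same support, which is why infinite values have to be admitted.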

It is proved that the function $\rho(x, y)$ is a generalized metric on the family of distributions on the set of non-negative integers, as well as on the entire set $\Omega^*$. Instead of $e$ in the definition of the metric $\rho(x, y)$, one can use any other positive number different from 1; the resulting metrics differ by a multiplicative constant. Let $J(x, y)$ denote the information distance

$$J(x, y) = \sum_{\nu=0}^{\infty} x_\nu \ln \frac{x_\nu}{y_\nu}.$$

Hereinafter it is assumed that $0 \ln 0 = 0$ and $0 \ln \frac{0}{0} = 0$. The information distance is defined for $x$, $y$ such that $x_\nu = 0$ for all $\nu$ for which $y_\nu = 0$. If this condition is not satisfied, we set $J(x, y) = \infty$. Let $A \subseteq \Omega$. Then we denote

$$J(A, y) = \inf_{x \in A} J(x, y).$$
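The conventions for zero components translate directly into code. The sketch below works with finite lists of equal length and, for the distance to a set, with a finite collection of candidate vectors so that the infimum becomes a minimum; both restrictions and the function names are assumptions of the illustration.

```python
import math

def information_distance(x, y):
    """J(x, y) = sum x_v ln(x_v / y_v) with 0 ln 0 = 0 and 0 ln(0/0) = 0;
    infinite if x puts mass where y has none."""
    total = 0.0
    for xv, yv in zip(x, y):
        if xv == 0:
            continue
        if yv == 0:
            return math.inf
        total += xv * math.log(xv / yv)
    return total

def distance_to_set(A, y):
    """J(A, y) for a finite collection A of vectors."""
    return min(information_distance(x, y) for x in A)

uniform = [0.5, 0.5]
print(information_distance([1.0, 0.0], uniform))     # ln 2
print(information_distance(uniform, [1.0, 0.0]))     # inf
print(distance_to_set([[1.0, 0.0], [0.5, 0.5]], uniform))
```

The example also shows that the information distance is not symmetric in its arguments.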

In the fourth section of the first chapter, the notion of compactness is defined for functions on the set $\Omega^*$. Compactness of a function of a countable number of arguments means that the value of the function can be approximated, to any degree of accuracy, by its values at points where only a finite number of arguments are nonzero. The compactness of the entropy and information distance functions is proved.

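1. For any $0 < \gamma < \infty$ the function $H(x)$ is compact on $\Omega_{[\gamma]}$.

2. If for some $0 < \gamma_0 < \infty$

$p \in \Omega_{\gamma_0}$, then for any $0 < \gamma < \infty$, $r > 0$ the function $x \mapsto J(x, p)$ is compact on the set $\Omega_{[\gamma]} \cap O_r(p)$.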

In the fifth section of the first chapter, the properties of the information distance defined on an infinite-dimensional space are considered. Compared with the finite-dimensional case, the situation with the continuity of the information distance function changes qualitatively. It is shown that the information distance function is not continuous on the set $\Omega$ in any of the metrics

$$\rho_1(x, y) = \sum_{\nu=0}^{\infty} |x_\nu - y_\nu|, \quad \rho_2(x, y) = \Big(\sum_{\nu=0}^{\infty} (x_\nu - y_\nu)^2\Big)^{1/2}, \quad \rho_3(x, y) = \sup_{\nu} |x_\nu - y_\nu|.$$
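The loss of continuity can be seen on a simple family of examples. In the sketch below, which is an illustration constructed here rather than the argument of the thesis, the reference distribution p has very light tails (weights proportional to exp(−ν²), truncated at an assumed level K), and x(m) is obtained from p by moving mass 1/m to the single point m. The sum-of-absolute-differences distance between x(m) and p is at most 2/m and tends to zero, and it dominates the other two distances, while the information distance grows roughly like m.

```python
import math

K = 25  # assumed truncation level; p lives on {0, ..., K}
weights = [math.exp(-v * v) for v in range(K + 1)]
norm = sum(weights)
p = [w / norm for w in weights]

def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def information_distance(x, y):
    # All components of y are positive here, so only 0 ln 0 = 0 needs care.
    return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

def perturbed(m):
    """Shrink p by the factor 1 - 1/m and put mass 1/m at the point m."""
    x = [(1 - 1 / m) * pv for pv in p]
    x[m] += 1 / m
    return x

rows = [(m, l1_distance(perturbed(m), p), information_distance(perturbed(m), p))
        for m in (5, 10, 20)]
for m, d1, j in rows:
    print(f"m={m:2d}  l1 distance={d1:.3f}  information distance={j:.2f}")
```

The two columns move in opposite directions, so closeness in these metrics says nothing about the information distance.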

The following inequalities are proved for the functions of entropy H (x) and information distance J (x, p):

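1. For any $x, x' \in \Omega$

$$|H(x) - H(x')| \le \big(e^{\rho(x, x')} - 1\big)\big(H(x) + H(x')\big).$$

2. If for some $x, p \in \Omega$ there exists an $\varepsilon > 0$ such that $x \in O_\varepsilon(p)$, then for any $x' \in \Omega$

$$|J(x, p) - J(x', p)| \le \big(e^{\rho(x, x')} - 1\big)\big(H(x) + H(x') + e^{\varepsilon} H(p)\big).$$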

Taking into account Theorem 1, these inequalities imply the uniform continuity of the entropy and information distance functions on the corresponding subsets of $\Omega$ in the metric $\rho(x, y)$, namely:

1. For any $\gamma$ such that $0 < \gamma < \infty$, the function $H(x)$ is uniformly continuous on $\Omega_{[\gamma]}$ in the metric $\rho(x, y)$;

2. If for some $\gamma_0$, $0 < \gamma_0 < \infty$,

$p \in \Omega_{\gamma_0}$, then for any $0 < \gamma < \infty$ and $\varepsilon > 0$ the function

$x \mapsto J(x, p)$ is uniformly continuous on the set $\Omega_{[\gamma]} \cap O_\varepsilon(p)$ in the metric $\rho(x, y)$.

A definition of a non-extremal function is given. The non-extremality condition means that the function has no local extrema, or that the function takes the same value at all local minima (local maxima). The non-extremality condition thus weakens the requirement that there be no local extrema. For example, the function $\sin x$ on the set of real numbers has local extrema but satisfies the non-extremality condition.

Let, for some $\gamma > 0$, the region $A$ be given by the condition

$$A = \{x \in \Omega_\gamma : \varphi(x) > a\}, \tag{0.9}$$

where $\varphi(x)$ is a real-valued function and $a$ is a real constant, $\inf \varphi(x) < a < \sup \varphi(x)$.

The following question was studied: under what conditions on the function $\varphi$, as the parameters $n$, $N$ vary in the central region, $n/N \to \gamma$, do there exist, for all sufficiently large values of the parameters, non-negative integers $k_0, k_1, \dots, k_n$ such that $k_0 + k_1 + \dots + k_n = N$, $k_1 + 2 k_2 + \dots + n k_n = n$ and

$$\varphi\Big(\frac{k_0}{N}, \frac{k_1}{N}, \dots, \frac{k_n}{N}, 0, 0, \dots\Big) > a.$$

It is shown that for this it suffices that the function $\varphi$ be non-extremal, compact and continuous in the metric $\rho(x, y)$, and that for at least one point $x$ satisfying (0.9) there exist, for some $\varepsilon > 0$, a finite moment of order $1 + \varepsilon$, with $x_\nu > 0$ for every $\nu = 0, 1, \dots$

In the second chapter, the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of functions of $\mu = (\mu_0, \mu_1, \dots, \mu_n, 0, \dots)$, the numbers of cells with a given filling, is investigated in the central region of variation of the parameters $N$, $n$. Rough asymptotics of the probabilities of large deviations are sufficient for studying the indices of goodness-of-fit tests.

Let the random variables $\xi_\nu$ in (0.2) be identically distributed, and let

$P(z)$, the generating function of the random variable $\xi_1$, converge in a circle of radius $1 < R \le \infty$. Following /38/, for $0 < z < R$ we denote by $\xi(z)$ a random variable such that

Ml + £ \u003d £ i1 + ex „< 00.

0.10) k] \u003d Pk, k \u003d 0.1 ,.

We denote

If the equation $\mathsf{M}\,\xi(z) = \gamma$ has a solution, then it is unique /38/. Throughout what follows we assume that $p_k > 0$, $k = 0, 1, \dots$

In the first subsection of the first section of the second chapter, we find the asymptotics of the logarithms of probabilities of the form

$$\ln P\{\mu_0 = k_0, \dots, \mu_n = k_n\}.$$

The following theorem is proved.

Theorem 2. A rough local theorem on the probabilities of large deviations. Let $n, N \to \infty$ so that $n/N \to \gamma$, $0 < \gamma < \infty$, let there exist $z_\gamma$, a root of the equation $\mathsf{M}\,\xi(z) = \gamma$, and let the random variable $\xi(z_\gamma)$ have positive variance. Then for any $k \in \Omega(n, N)$

$$\ln P\{\mu = k\} = -N J\Big(\frac{k}{N}, p(z_\gamma)\Big) + O(\sqrt{N} \ln N).$$

The assertion of the theorem follows directly from the formula for the joint distribution of $\mu_0, \dots, \mu_n$ in /26/ and the following estimate: if non-negative integers $\mu_1, \dots, \mu_n$ satisfy the condition

$\mu_1 + 2\mu_2 + \dots + n\mu_n = n$, then the number of nonzero values among them is $O(\sqrt{n})$, since $m$ nonzero values contribute at least $1 + 2 + \dots + m = m(m+1)/2$ to the sum. This is a rough estimate that does not claim to be new. The number of nonzero $\mu_r$ in generalized allocation schemes does not exceed the maximum filling of the cells, which in the central region, with probability tending to 1, does not exceed $O(\ln n)$ /25/, /27/. Nevertheless, the estimate $O(\sqrt{n})$ holds with probability 1, and it is sufficient for obtaining rough asymptotics.
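A rough local statement of this kind says that the logarithm of the probability of an occupancy vector k is, to first order, minus N times the information distance between the empirical vector k/N and a reference distribution with mean n/N. This can be checked exactly in one special case chosen here for illustration: when all vectors of non-negative cell fillings with sum n are equally likely (the case of geometric summands on 0, 1, ...), the probability of an occupancy vector is a ratio of a multinomial coefficient to a binomial coefficient, and the reference distribution is geometric with mean n/N. The parameter values and the function names are assumptions of the sketch.

```python
import math

def log_prob_occupancy(k, n, N):
    """Exact ln P(mu = k) when every vector (h_1, ..., h_N) of non-negative
    integers with sum n is equally likely."""
    assert sum(k) == N and sum(r * kr for r, kr in enumerate(k)) == n
    # Number of cell vectors with the given occupancy counts: N! / prod(k_r!).
    log_arrangements = math.lgamma(N + 1) - sum(math.lgamma(kr + 1) for kr in k)
    # Total number of cell vectors: binomial(n + N - 1, N - 1).
    log_total = math.lgamma(n + N) - math.lgamma(n + 1) - math.lgamma(N)
    return log_arrangements - log_total

def distance_to_geometric(k, n, N):
    """Information distance between k/N and the geometric law with mean n/N."""
    gamma = n / N
    q = gamma / (1 + gamma)
    total = 0.0
    for r, kr in enumerate(k):
        if kr:
            x = kr / N
            total += x * math.log(x / ((1 - q) * q ** r))
    return total

N, n = 2000, 2000
k = [1000, 300, 400, 300]        # k_0 + ... + k_3 = N, k_1 + 2 k_2 + 3 k_3 = n
rate = -log_prob_occupancy(k, n, N) / N
J = distance_to_geometric(k, n, N)
print(f"-ln P / N = {rate:.4f},  information distance = {J:.4f}")
```

For these values the two numbers differ only in the third decimal place; the difference is the lower-order remainder divided by N.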

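In the second subsection of the first section of the second chapter, the value of the limit

$$\lim_{N \to \infty} -\frac{1}{N} \ln P\Big\{\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big) > a_N\Big\}$$

is found, where $a_N$ is a sequence of real numbers converging to some $a \in \mathbb{R}$ and $\varphi(x)$ is a real-valued function. The following theorem is proved.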

Theorem 3. A rough integral theorem on the probabilities of large deviations. Let the conditions of Theorem 2 be satisfied, and for some $r > 0$, $c > 0$ let the real function $\varphi(x)$ be compact and uniformly continuous in the metric $\rho$ on the set

$A = O_{r+c}(p(z_\gamma)) \cap \Omega_{[\gamma + c]}$ and satisfy the non-extremality condition on the set $\Omega_\gamma$. If for some constant $a$ such that $\inf_{x \in \Omega_\gamma} \varphi(x) < a < \sup_{x \in \Omega_\gamma} \varphi(x)$ there exists a vector $p_a \in \Omega_\gamma \cap O_r(p(z_\gamma))$ such that

$\varphi(p_a) \ge a$ and $J(\{x \in \Omega_\gamma : \varphi(x) \ge a\}, p(z_\gamma)) = J(p_a, p(z_\gamma))$, then for any sequence $a_N$ converging to $a$

$$\lim_{N \to \infty} -\frac{1}{N} \ln P\Big\{\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big) > a_N\Big\} = J(p_a, p(z_\gamma)). \tag{0.11}$$

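Under additional restrictions on the function $\varphi(x)$, the information distance $J(p_a, p(z_\gamma))$ in (2.3) can be calculated more explicitly. Namely, the following theorem holds.

Theorem 4. On information distance. Let, for some $0 < \gamma < \infty$ and some $r > 0$, $c > 0$, the real function $\varphi(x)$ and its first-order partial derivatives be compact and uniformly continuous in the generalized metric $\rho(x, y)$ on the set

$A = O_r(p) \cap \Omega_{[\gamma + c]}$; let there exist $T > 0$, $R > 0$ such that for all $|t| < T$, $0 < z < R$, $x \in A$

$$\sum_{\nu=0}^{\infty} p_\nu z^\nu \exp\Big\{t \frac{\partial \varphi}{\partial x_\nu}(x)\Big\} < \infty, \tag{0.12}$$

$$\sum_{\nu=0}^{\infty} p_\nu z^\nu \Big|\frac{\partial \varphi}{\partial x_\nu}(x)\Big| \exp\Big\{t \frac{\partial \varphi}{\partial x_\nu}(x)\Big\} < \infty,$$

and, for some $\varepsilon > 0$,

$$\sum_{\nu=0}^{\infty} p_\nu \nu^{1+\varepsilon} z^\nu \exp\Big\{t \frac{\partial \varphi}{\partial x_\nu}(x)\Big\} < \infty; \tag{0.13}$$

let there exist a unique vector $x(z, t)$ satisfying the system of equations $x_\nu(z, t) = p_\nu z^\nu \exp\{t \frac{\partial \varphi}{\partial x_\nu}(x(z, t))\}$, $\nu = 0, 1, \dots$; let the function $\varphi(x)$ satisfy the non-extremality condition on the set $A$; let $a$ be some constant, $\varphi(p) < a < \sup \varphi(x(z, t))$, and let there exist $z_a$, $t_a$ such that

$$\sum_{\nu=0}^{\infty} \nu\, p_\nu(z_a, t_a) = \gamma, \qquad \varphi(p(z_a, t_a)) = a.$$

Then $p(z_a, t_a) \in \Omega_\gamma$ and

$$J(\{x \in A : \varphi(x) = a\}, p) = J(p(z_a, t_a), p) = \gamma \ln z_a + t_a \sum_{\nu=0}^{\infty} p_\nu(z_a, t_a) \frac{\partial \varphi}{\partial x_\nu}(x(z_a, t_a)) - \ln \sum_{\nu=0}^{\infty} p_\nu z_a^\nu \exp\Big\{t_a \frac{\partial \varphi}{\partial x_\nu}(p(z_a, t_a))\Big\}.$$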

If $\varphi(x)$ is a linear function and the function $f(x)$ is defined by equality (0.5), then condition (0.12) turns into Cramér's condition for the random variable $f(\xi(z))$. Condition (0.13) is a form of condition (0.10) and is used to prove that domains of the form $\{x: \varphi(x) > a\}$ contain at least one point of $\Omega(n, N)$ for all sufficiently large $n$, $N$.

Let $\eta(n, N) = (\eta_1, \dots, \eta_N)$ be the frequency vector in the generalized allocation scheme (0.2). As a consequence of Theorems 3 and 4, the following theorem is formulated.

Theorem 5. A rough integral theorem on the probabilities of large deviations of symmetric separable statistics in a generalized allocation scheme.

Let $n, N \to \infty$ so that $n/N \to \gamma$, $0 < \gamma < \infty$; let $z_\gamma$ be a root of the equation $M\xi(z) = \gamma$; let the random variable $\xi(z_\gamma)$ have positive variance and maximal lattice span 1; let $a$ be a constant and $f(x)$ a real function with $a < Mf(\xi(z_\gamma))$; let there exist $T > 0$, $R > 0$ such that for all $|t| < T$, $0 < z < R$,

$\sum_{\nu=0}^{\infty} p_\nu z^\nu e^{t f(\nu)} < \infty$,

and let there exist $z_a$, $t_a$ such that

$\sum_{\nu=0}^{\infty} \nu\, p_\nu(z_a, t_a) = \gamma$, $\sum_{\nu=0}^{\infty} f(\nu)\, p_\nu(z_a, t_a) = a$.

Then for any sequence $a_N$ converging to $a$,

$\lim -\frac{1}{N} \ln P\Big(\frac{1}{N} \sum_{i=1}^{N} f(\eta_i) > a_N\Big) = J(p(z_a, t_a), p(z_\gamma))$

$= \gamma \ln z_a + t_a a - \ln \sum_{\nu=0}^{\infty} p_\nu z_a^\nu e^{t_a f(\nu)}$.

This theorem was first proved by A. F. Ronzhin in /38/ using the saddle-point method.

In the second section of the second chapter, we study the probabilities of large deviations of separable statistics in generalized allocation schemes when Cramér's condition fails for the random variable $f(\xi(z))$. Cramér's condition for $f(\xi(z))$ is not satisfied, in particular, if $\xi(z)$ is a Poisson random variable and $f(x) = x^2$. Note that Cramér's condition for the separable statistics themselves in generalized allocation schemes is always satisfied, since for any fixed $n$, $N$ the number of possible outcomes in these schemes is finite.
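The failure of Cramér's condition for a Poisson variable with $f(x) = x^2$ can be seen from the terms of the series $\sum_k P(\xi = k) e^{t f(k)}$. The sketch below is an illustration with hypothetical names and arbitrarily chosen values of the Poisson parameter and of $t$: it evaluates the logarithm of the k-th term. For $f(x) = x^2$ the term $t k^2$ eventually dominates $\ln k!$, so the terms grow without bound and the series diverges for every $t > 0$; for the linear $f(x) = x$ the terms keep decreasing.

```python
import math


def log_series_term(k, lam, t, f):
    """Logarithm of P(xi = k) * exp(t * f(k)) for xi ~ Poisson(lam)."""
    log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return log_pmf + t * f(k)


lam, t = 1.0, 0.01          # illustrative values, not taken from the source
ks = (10, 100, 1000)

# f(x) = x^2: the log-terms first decrease, then grow without bound.
square_terms = [log_series_term(k, lam, t, lambda x: x * x) for k in ks]

# f(x) = x: the log-terms decrease, the series converges for every t.
linear_terms = [log_series_term(k, lam, t, lambda x: x) for k in ks]
```

A positive log-term at large k means the corresponding summand exceeds 1, which already rules out convergence of the exponential moment.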

As noted in /2/, if Cramér's condition is not satisfied, then finding the asymptotic behavior of the probabilities of large deviations of sums of identically distributed random variables requires additional conditions of regular variation on the distribution of the summands. In this work we consider the case corresponding to condition (3) in /2/, that is, the semi-exponential case. Let $P(\xi_1 = k) > 0$ for all $k = 0, 1, \dots$, and let the function $p(k) = -\ln P(\xi_1 = k)$ be extendable to a function of a continuous argument that is regularly varying of order $p$, $0 < p < \infty$ /45/, that is, a positive function such that $p(tx)/p(t) \to x^p$ as $t \to \infty$.

Let the function $f(x)$, for sufficiently large values of the argument, be a positive, strictly increasing, regularly varying function of a given order. We define the function $\psi(x)$ by setting, for sufficiently large $x$, $\psi(x) = p(f^{-1}(x))$.

On the rest of the real line, $\psi(x)$ can be specified in an arbitrary bounded measurable way.

Then the random variable $f(\xi_1)$ has moments of every order and does not satisfy Cramér's condition, $p(x) = o(x)$ as $x \to \infty$, and the following theorem holds.

Theorem 6. Suppose that for sufficiently large $x$ the function $\psi(x)$ is monotonically non-decreasing and the function $\psi(x)/x$ is monotonically non-increasing, and that $n, N \to \infty$ so that $n/N \to \lambda$, $0 < \lambda < \infty$; let $z_\lambda$ be the unique root of the equation $M\xi_1(z) = \lambda$. Then for any $c > b(z_\lambda)$, where $b(z) = Mf(\xi_1(z))$, there exists the limit

$\lim \dots \ln P(L_N(\eta(n, N)) > cN) = -(c - b(z_\lambda))^{\dots}$.

From Theorem 6 it follows that if Cramér's condition is not satisfied, then $\lim_{N \to \infty} \frac{1}{N} \ln P(L_N(\eta(n, N)) > cN) = 0$, which proves the conjecture stated in /39/. Thus, the index of a goodness-of-fit test in generalized allocation schemes is always zero when Cramér's condition fails. At the same time, within the class of tests whose statistics satisfy Cramér's condition, tests with a nonzero index have been constructed. Hence, for testing hypotheses against non-converging alternatives, it is asymptotically inefficient in this sense to build goodness-of-fit tests on statistics that do not satisfy Cramér's condition, for example the chi-square test in a multinomial scheme. A similar conclusion was reached in /54/ from a comparison of the chi-square and maximum likelihood ratio statistics in the multinomial scheme.

The third chapter solves the problem of constructing goodness-of-fit tests with the highest criterion index for testing hypotheses in generalized allocation schemes. Using the results of the first and second chapters on the properties of entropy, information distance and probabilities of large deviations, the third chapter finds a function of the form (0.4) such that the goodness-of-fit test built on it has the highest exact index in the class of tests under consideration. The following theorem is proved.

Theorem 7. On the existence of an index. Let the conditions of Theorem 3 be satisfied, let $0 < \beta < 1$, let $H = H_{p(1)}, H_{p(2)}, \dots$ be a sequence of alternative distributions, and let $a_\varphi(\beta, N)$ be the maximum number for which, under the hypothesis $H_{p(N)}$, the inequality ... holds. Suppose there exists the limit $\lim_{N \to \infty} a_\varphi(\beta, N) = a$. Then at the point $(\beta, H)$ there exists the criterion index of $\varphi$,

$J_\varphi(\beta, H) = J(\{\varphi(x) > a,\ x \in \dots\}, \dots)$.

Shi)<ШН)> where w / fo fh h v ^ l ^

The Conclusion sets out the results obtained in their correlation with the general goal and specific tasks set in the dissertation, formulates conclusions on the results of the dissertation research, indicates the scientific novelty, theoretical and practical value of the work, as well as specific scientific tasks that were identified by the author and the solution of which seems to be relevant ...

A brief review of the literature on the research topic. In the thesis, the problem of constructing goodness-of-fit criteria in generalized allocation schemes with the highest value of the criterion index in the class of functions of the form (0.4) with non-converging alternatives is considered.

Generalized allocation schemes were introduced by V. F. Kolchin in /24/. The quantities $\mu_r$ in the multinomial scheme were called the numbers of cells containing $r$ particles and were studied in detail in the monograph by V. F. Kolchin, B. A. Sevastyanov and V. P. Chistyakov /27/. The quantities $\mu_r$ in generalized allocation schemes were investigated by V. F. Kolchin in /25/, /26/. Statistics of the form (0.3) were first considered by Yu. I. Medvedev in /30/ and were called separable (additively separable) statistics. If the functions in (0.3) do not depend on the index, such statistics were called symmetric separable statistics in /31/. The asymptotics of the moments of separable statistics in generalized allocation schemes was obtained by G. I. Ivchenko in /9/. Limit theorems for a generalized allocation scheme were also considered in /23/. Surveys of limit theorems and goodness-of-fit tests in discrete probabilistic schemes of type (0.2) were given by V. A. Ivanov, G. I. Ivchenko and Yu. I. Medvedev in /8/ and by G. I. Ivchenko, Yu. I. Medvedev and A. F. Ronzhin in /14/. Goodness-of-fit tests for generalized allocation schemes were considered by A. F. Ronzhin in /38/.

In these works the properties of statistical tests were compared in terms of relative asymptotic efficiency. Both the case of converging (contiguous) hypotheses (efficiency in the sense of Pitman) and the case of non-converging hypotheses (efficiency in the sense of Bahadur, Hodges-Lehmann and Chernoff) were considered. The connection between the different kinds of relative efficiency of statistical tests is discussed, for example, in /49/. As follows from the results of Yu. I. Medvedev in /31/ on the distribution of separable statistics in a multinomial scheme, the test based on the chi-square statistic has the greatest asymptotic power under converging hypotheses in the class of separable statistics of the outcome frequencies in the multinomial scheme. This result was generalized by A. F. Ronzhin to schemes of type (0.2) in /38/. I. I. Viktorova and V. P. Chistyakov in /4/ constructed an optimal test for the multinomial scheme in the class of linear functions of $\mu_r$. A. F. Ronzhin in /38/ constructed a test that, for a sequence of alternatives not approaching the null hypothesis, minimizes the logarithmic rate at which the probability of an error of the first kind tends to zero in the class of statistics of the form (0.6). The relative efficiency of the chi-square and maximum likelihood ratio statistics for converging and non-converging hypotheses was compared in /54/.

The dissertation considers the case of non-converging hypotheses. Studying the relative statistical efficiency of tests under non-converging hypotheses requires studying the probabilities of super-large deviations, of order $O(\sqrt{n})$. This problem was first solved for a multinomial distribution with a fixed number of outcomes by I. N. Sanov in /40/. The asymptotic optimality of goodness-of-fit tests for simple and composite hypotheses for a multinomial distribution with a finite number of outcomes and non-converging alternatives was considered in /48/. Properties of the information distance were previously considered by Kullback and Leibler /29/, /53/, by I. N. Sanov /40/ and by Hoeffding /48/. In these papers the continuity of the information distance was considered on finite-dimensional spaces in the Euclidean metric. A number of authors considered sequences of spaces of increasing dimension, for example Yu. V. Prokhorov /37/ or V. I. Bogachev and A. V. Kolesnikov /1/. Rough (up to logarithmic equivalence) theorems on the probabilities of large deviations of separable statistics in generalized allocation schemes under Cramér's condition were obtained by A. F. Ronzhin in /38/. A. N. Timashev in /42/, /43/ obtained exact (up to equivalence) multidimensional integral and local limit theorems on the probabilities of large deviations of the vector $(\mu_{r_1}(n, N), \dots, \mu_{r_s}(n, N))$, where $s, r_1, \dots, r_s$ are fixed integers,

$0 < r_1 < \dots < r_s$.
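What a rough (logarithmic) large-deviation asymptotic means can be illustrated in the simplest multinomial case with two outcomes. The sketch below is an illustration only, with hypothetical function names and arbitrarily chosen values of p and a; it does not reproduce any computation from the cited works. It compares $-\frac{1}{n} \ln P(X \ge an)$ for a binomial $X$ with the Kullback-Leibler information distance between the two-point laws $(a, 1-a)$ and $(p, 1-p)$, which is the limit given by Sanov-type theorems.

```python
import math


def log_binomial_tail(n, p, a):
    """log P(X >= a*n) for X ~ Binomial(n, p), computed with log-sum-exp."""
    k_min = math.ceil(a * n)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def information_distance(a, p):
    """Kullback-Leibler distance between the two-point laws (a, 1-a) and (p, 1-p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


p, a = 0.3, 0.5  # illustrative values with a > p, so the tail is a large deviation
rates = {n: -log_binomial_tail(n, p, a) / n for n in (100, 1000, 2000)}
limit = information_distance(a, p)
```

The normalized log-probabilities approach the information distance from above as n grows; the remaining gap is of order (ln n)/n, which is exactly what "up to logarithmic equivalence" leaves unresolved.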

Probabilities of large deviations when Cramér's condition fails were studied for independent random variables by A. V. Nagaev /35/. The method of conjugate distributions is described by Feller /45/.

Statistical problems of hypothesis testing and parameter estimation under sampling without replacement were considered, in a slightly different setting, by G. I. Ivchenko, V. V. Levin and E. E. Timonina /10/, /15/. They solved estimation problems for a finite population whose size is unknown and proved the asymptotic normality of multidimensional S-statistics from s independent samples drawn without replacement. Random variables associated with repetitions in sequences of independent trials were studied by A. M. Zubkov, V. G. Mikhailov and A. M. Shoitov in /6/, /7/, /32/, /33/, /34/. The main statistical problems of estimation and hypothesis testing within the general Markov-Pólya model were analyzed by G. I. Ivchenko and Yu. I. Medvedev in /13/, and a probabilistic analysis of this model was given in /11/. A method of specifying non-uniform probability measures on a set of combinatorial objects that is not reducible to the generalized allocation scheme (0.2) was described by G. I. Ivchenko and Yu. I. Medvedev in /12/. A number of problems in probability theory whose answers can be obtained by computation with recurrence formulas were indicated by A. M. Zubkov in /5/.

Inequalities for the entropy of discrete distributions were obtained in /50/ (cited from the abstract by A. M. Zubkov in RZhMat). If $(p_n)_{n=0}^{\infty}$ is a probability distribution,

$P_n = \sum_{k=n}^{\infty} p_k$,

$A = \sup_{n \ge 0} \frac{P_{n+1}}{p_n} < \infty$ (0.14)

and

$F(x) = (x + 1) \ln(x + 1) - x \ln x$, then for the entropy $H$ of this probability distribution,

$H = -\sum_{k=0}^{\infty} p_k \ln p_k$, the inequalities

$H + (\ln \dots) \sum_{n=0}^{\infty} (A p_n - P_{n+1}) \le F(A) \le H + \sum_{n=0}^{\infty} (A p_n - P_{n+1}) (\ln \dots)$

hold, and the inequalities turn into equalities if

$p_n = \frac{A^n}{(A + 1)^{n+1}}$, $n \ge 0$. (0.15)

Note that the extremal distribution (0.15) is a geometric distribution with mathematical expectation A, and the function F (A) of the parameter (0.14) coincides with the function of the mathematical expectation in Theorem 1.
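The statement about the extremal distribution can be verified numerically. The sketch below is an illustration with hypothetical function names and an arbitrarily chosen value of A: it writes the geometric distribution with mean $A$ as $p_n = A^n / (A + 1)^{n+1}$ and checks that its (truncated) entropy agrees with $F(A) = (A + 1) \ln(A + 1) - A \ln A$ and that its mean equals $A$.

```python
import math


def F(x):
    """F(x) = (x + 1) ln(x + 1) - x ln x, as in the text."""
    return (x + 1) * math.log(x + 1) - x * math.log(x)


def geometric_pmf(A, n):
    """p_n = A^n / (A + 1)^(n + 1), written via q = A / (A + 1) to avoid overflow."""
    q = A / (A + 1)
    return q ** n / (A + 1)


def truncated_entropy_and_mean(A, terms=500):
    """Entropy and mean of the geometric law, summed over the first `terms` outcomes."""
    probs = [geometric_pmf(A, n) for n in range(terms)]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    mean = sum(n * p for n, p in enumerate(probs))
    return entropy, mean


A = 2.0  # illustrative value
entropy, mean = truncated_entropy_and_mean(A)
```

For this distribution the tail sum is $P_{n+1} = (A / (A + 1))^{n+1}$, so the ratio $P_{n+1} / p_n$ equals $A$ for every $n$, which is consistent with the parameter in (0.14).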


Thesis conclusion on the topic "Probability theory and mathematical statistics", Kolodzey, Alexander Vladimirovich

3.4. Conclusions

In this chapter, based on the results of the previous chapters, we constructed a goodness-of-fit test for testing hypotheses in generalized allocation schemes that has the highest logarithmic rate of convergence to zero of the probability of an error of the first kind, given a fixed probability of an error of the second kind and non-converging alternatives.

Conclusion

The aim of the thesis was to construct goodness-of-fit tests for testing hypotheses under sampling without replacement from an urn containing balls of two colors. The author chose to study statistics based on the frequencies of distances between balls of the same color. In this setting, the problem reduces to testing hypotheses in a suitable generalized allocation scheme.

In the dissertation:

The properties of entropy and information distance of discrete distributions with an unlimited number of outcomes with a limited mathematical expectation are investigated;

A rough (up to logarithmic equivalence) asymptotics of probabilities of large deviations for a wide class of statistics in a generalized allocation scheme is obtained;

Based on the results obtained, a criterion function with the highest logarithmic rate of tending to zero of the probability of a type I error with a fixed probability of a type II error and non-converging alternatives was constructed;

It is proved that statistics that do not satisfy the Cramer condition have a lower rate of convergence to zero of the probabilities of large deviations compared with statistics that satisfy this condition.

The scientific novelty of the work is as follows.

The concept of a generalized metric is introduced: a function that may take infinite values and satisfies the axioms of identity, symmetry and the triangle inequality. A generalized metric is found, and sets are indicated, on which the entropy and information distance functions defined on a family of discrete distributions with a countable number of outcomes are continuous in this metric;

In the generalized allocation scheme, a rough (up to logarithmic equivalence) asymptotics is found for the probabilities of large deviations for statistics of the form (0.4) satisfying the corresponding form of the Cramer condition;

In the generalized allocation scheme, a rough (up to logarithmic equivalence) asymptotics is found for the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition;

In the class of criteria of the form (0.7), a criterion with the highest value of the criterion index is constructed.

In this work a number of questions about the behavior of probabilities of large deviations in generalized allocation schemes are resolved. The results can be used in teaching mathematical statistics and information theory and in the study of statistical procedures for the analysis of discrete sequences; they were used in /3/, /21/ to substantiate the security of one class of information systems.

However, a number of questions remain open. The author restricted attention to the central zone of variation of the parameters $n$, $N$ of generalized schemes for allocating $n$ particles to $N$ cells. If the support of the distribution of the random variables generating the generalized allocation scheme (0.2) is not a set of the form $r, r + 1, r + 2, \dots$, then proving the continuity of the information distance function and studying the probabilities of large deviations requires taking into account the arithmetic structure of the support, which was not considered in this work. For the practical application of tests built on the proposed function with the maximum index, its distribution must be studied both under the null hypothesis and under alternatives, including converging ones. It is also of interest to carry the developed methods over, and to generalize the results, to probabilistic schemes other than generalized allocation schemes.

If $\mu_1, \dots, \mu_n$ are the frequencies of the distances between the positions of outcome 0 in the binomial scheme with outcome probabilities $p_0$, $1 - p_0$, then it can be shown that in this case

$P(\mu_1 = k_1, \dots, \mu_n = k_n) = I\Big(\sum_{i=1}^{n} i k_i = n\Big) \dots$, (3.3)

where

$\theta = p_0^{-1}(1 - p_0)$, ...

From the formula for the joint distribution of the quantities $\mu_r$ in a generalized allocation scheme, proved in /26/, it follows that the distribution (3.3) cannot, in general, be represented as the joint distribution of the quantities $\mu_r$ in any generalized scheme of allocating particles to cells. This distribution is a special case of the distributions on sets of combinatorial objects introduced in /12/. Carrying the results of the dissertation for generalized allocation schemes over to this case appears to be a topical problem; it was discussed in /52/.

If the number of outcomes in a scheme of sampling without replacement or in a multinomial allocation scheme is greater than two, the joint distribution of the frequencies of distances between adjacent identical outcomes can no longer be represented in such a simple way. So far only the expectation and variance of the number of such distances have been computed /51/.

References for the dissertation research of Candidate of Physical and Mathematical Sciences Kolodzey, Alexander Vladimirovich, 2006

1. Bogachev V. I., Kolesnikov A. V. Nonlinear transformations of convex measures and entropy of Radon-Nikodym densities // Doklady Akademii Nauk. 2004. Vol. 207, No. 2. P. 155-159.

2. Vidyakin V. V., Kolodzei A. V. Statistical detection of covert channels in data transmission networks // Abstracts of the II Int. Conf. "Information Systems and Technologies IST'2004" (Minsk, 8-10 October 2004). Minsk: BSU, 2004. Part 1. P. 116-117.

3. Viktorova I. I., Chistyakov V. P. Some generalizations of the empty boxes test // Teor. Veroyatn. i ee Primen. 1966. Vol. 11, No. 2. P. 306-313.

4. Zubkov A. M. Recurrence formulas for computing functionals of discrete random variables // Obozrenie Prikl. i Promyshl. Mat. 1996. Vol. 3, No. 4. P. 567-573.

5. Zubkov A. M., Mikhailov V. G. Limit distributions of random variables associated with long repetitions in a sequence of independent trials // Teor. Veroyatn. i ee Primen. 1974. Vol. 19, No. 1. P. 173-181.

6. Zubkov A. M., Mikhailov V. G. On repetitions of s-chains in a sequence of independent variables // Teor. Veroyatn. i ee Primen. 1979. Vol. 24, No. 2. P. 267-273.

7. Ivanov V. A., Ivchenko G. I., Medvedev Yu. I. Discrete problems in probability theory // Itogi Nauki i Tekhniki. Ser. Teor. Veroyatn., Mat. Stat., Teor. Kibern. Vol. 23. Moscow: VINITI, 1984. P. 3-60.

8. Ivchenko G. I. On the moments of separable statistics in a generalized allocation scheme // Mat. Zametki. 1986. Vol. 39, No. 2. P. 284-293.

9. Ivchenko G. I., Levin V. V. Asymptotic normality in a scheme of sampling without replacement // Teor. Veroyatn. i ee Primen. 1978. Vol. 23, No. 1. P. 97-108.

10. Ivchenko G. I., Medvedev Yu. I. On the Markov-Pólya urn scheme: from 1917 to the present day // Obozrenie Prikl. i Promyshl. Mat. 1996. Vol. 3, No. 4. P. 484-511.

11. Ivchenko G. I., Medvedev Yu. I. Random combinatorial objects // Doklady Akademii Nauk. 2004. Vol. 396, No. 2. P. 151-154.

12. Ivchenko G. I., Medvedev Yu. I. Statistical problems associated with organizing control over the generation of discrete random sequences // Diskretn. Mat. 2000. Vol. 12, No. 2. P. 3-24.

13. Ivchenko G. I., Medvedev Yu. I., Ronzhin A. F. Separable statistics and goodness-of-fit tests for multinomial samples // Trudy Mat. Inst. AN SSSR. 1986. Vol. 177. P. 60-74.

14. Ivchenko G. I., Timonina E. E. On estimation when sampling from a finite population // Mat. Zametki. 1980. Vol. 28, No. 4. P. 623-633.

15. Kolodzei A. V. A theorem on the probabilities of large deviations for separable statistics that do not satisfy Cramér's condition // Diskretn. Mat. 2005. Vol. 17, No. 2. P. 87-94.

16. Kolodzei A. V. Entropy of discrete distributions and probabilities of large deviations of functions of cell occupancies in generalized allocation schemes // Obozrenie Prikl. i Promyshl. Mat. 2005. Vol. 12, No. 2. P. 248-252.

17. Kolodzei A. V. Statistical tests for detecting covert channels based on changing the order of messages // Research work "Apology": Report / FSTEC RF, Head A. V. Knyazev. Inv. No. 7 (for official use). Moscow, 2004. P. 96-128.

18. Kolodzei A. V., Ronzhin A. F. On some statistics related to testing the homogeneity of random discrete sequences // Research work "Development of mathematical problems of cryptography" No. 4, 2004: Report / AK RF. Moscow, 2004 ...

19. Kolchin A. V. Limit theorems for a generalized allocation scheme // Diskretn. Mat. 2003. Vol. 15, No. 4. P. 148-157.

20. Kolchin V. F. One class of limit theorems for conditional distributions // Lit. Mat. Sb. 1968. Vol. 8, No. 1. P. 111-126.

21. Kolchin V. F. Random graphs. 2nd ed. Moscow: Fizmatlit, 2004. 256 p.

22. Kolchin V. F. Random mappings. Moscow: Nauka, 1984. 208 p.

23. Kolchin V. F., Sevast'yanov B. A., Chistyakov V. P. Random allocations. Moscow: Nauka, 1976. 223 p.

24. Cramér H. // Uspekhi Mat. Nauk. 1944. No. 10. P. 166-178.

25. Kullback S. Information theory and statistics. Moscow: Nauka, 1967. 408 p.

26. Medvedev Yu. I. Some theorems on the asymptotic distribution of the chi-square statistic // Dokl. AN SSSR. 1970. Vol. 192, No. 5. P. 997-989.

27. Medvedev Yu. I. Separable statistics in the multinomial scheme. I; II // Teor. Veroyatn. i ee Primen. 1977. Vol. 22, No. 1. P. 3-17; 1977. Vol. 22, No. 3. P. 623-631.

28. Mikhailov V. G. Limit distributions of random variables associated with multiple long repetitions in a sequence of independent trials // Teor. Veroyatn. i ee Primen. 1974. Vol. 19, No. 1. P. 182-187.

29. Mikhailov V. G. A central limit theorem for the number of incomplete long repetitions // Teor. Veroyatn. i ee Primen. 1975. Vol. 20, No. 4. P. 880-884.

30. Mikhailov V. G., Shoitov A. M. Structural equivalence of s-chains in random discrete sequences // Diskretn. Mat. 2003. Vol. 15, No. 4. P. 7-34.

31. Nagaev A. V. Integral limit theorems taking into account the probabilities of large deviations. I // Teor. Veroyatn. i ee Primen. 1969. Vol. 14, No. 1. P. 51-63.

32. Petrov V. V. Sums of independent random variables. Moscow: Nauka, 1972. 416 p.

33. Prokhorov Yu. V. Limit theorems for sums of random vectors whose dimension tends to infinity // Teor. Veroyatn. i ee Primen. 1990. Vol. 35, No. 4. P. 751-753.

34. Ronzhin A. F. Tests for generalized particle allocation schemes // Teor. Veroyatn. i ee Primen. 1988. Vol. 33, No. 1. P. 94-104.

35. Ronzhin A. F. A theorem on the probabilities of large deviations for separable statistics and its statistical application // Mat. Zametki. 1984. Vol. 36, No. 4. P. 610-615.

36. Sanov I. N. On the probabilities of large deviations of random variables // Mat. Sb. 1957. Vol. 42, No. 1 (84). P. 11-44.

37. Seneta E. Regularly varying functions. Moscow: Nauka, 1985. 144 p.

38. Timashev A. N. A multidimensional integral theorem on large deviations in an equiprobable allocation scheme // Diskretn. Mat. 1992. Vol. 4, No. 4. P. 74-81.

39. Timashev A. N. A multidimensional local theorem on large deviations in an equiprobable allocation scheme // Diskretn. Mat. 1990. Vol. 2, No. 2. P. 143-149.

40. Fedoryuk M. V. The saddle-point method. Moscow: Nauka, 1977. 368 p.

41. Feller W. An introduction to probability theory and its applications. Vol. 2. Moscow: Mir, 1984. 738 p.

42. Shannon C. A mathematical theory of communication // Works on information theory and cybernetics (translated from English). Moscow: IL, 1963. P. 243-332.

43. Conrad K. Probability Distribution and Maximum Entropy // http://www.math.uconn.edu/~kconrad/blurbs/entropypost.pdf

44. Hoeffding W. Asymptotically optimal tests for multinomial distribution // Ann. Math. Statist. 1965. Vol. 36. P. 369-408.

45. Inglot T., Kallenberg W. C. M., Ledwina T. Vanishing shortcoming and asymptotic relative efficiency // Ann. Statist. 2000. Vol. 28. P. 215-238.

46. Jurdas C., Pecaric J., Roki R., Sarapa N. On an inequality for the entropy of probability distribution // Math. Inequal. Appl. 2001. Vol. 4, No. 2. P. 209-214. (RZhMat. 2005. 05.07-13B.16).

47. Kolodzey A. V., Ronzhin A. F. Goodness of Fit Tests for Random Combinatoric Objects // Abstracts of the Int. Conf. "Modern Problems and New Trends in Probability Theory" (Chernivtsi, 19-26 June 2005). Kiev: Institute of Mathematics, 2005. Part 1. P. 122.

48. Kullback S., Leibler R. A. On information and sufficiency // Ann. Math. Statist. 1951. Vol. 22. P. 79-86.

49. Quine M. P., Robinson J. Efficiencies of chi-square and likelihood ratio goodness-of-fit tests // Ann. Statist. 1985. Vol. 13, No. 2. P. 727-742.


Definition. The direction defined by a nonzero vector is called an asymptotic direction relative to a second-order line if every straight line of this direction (that is, parallel to the vector) either has at most one common point with the line or is contained in it.

? How many common points can a second-order line have with a straight line of asymptotic direction relative to it?

In the general theory of second-order lines it is proved that if

then a nonzero vector specifies an asymptotic direction relative to the line

(general criterion for asymptotic direction).

For lines of the second order

if, then there are no asymptotic directions,

if then there are two asymptotic directions,

if then there is only one asymptotic direction.
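The three cases above can be sketched in code. The formulas are not legible in the source, so the sketch assumes the standard notation: the line is $a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33} = 0$, a vector $(\alpha, \beta)$ has asymptotic direction when $a_{11}\alpha^2 + 2a_{12}\alpha\beta + a_{22}\beta^2 = 0$, and the number of directions is decided by the sign of $\delta = a_{11}a_{22} - a_{12}^2$. The function names are hypothetical, and the coefficients in the usage lines are chosen only so that the ratios $t = \alpha/\beta$ come out as 4 and 1, the roots reported in the worked task of this section.

```python
import math


def quadratic_form(a11, a12, a22, alpha, beta):
    """Value of a11*alpha^2 + 2*a12*alpha*beta + a22*beta^2."""
    return a11 * alpha ** 2 + 2 * a12 * alpha * beta + a22 * beta ** 2


def asymptotic_directions(a11, a12, a22):
    """Return vectors (alpha, beta) of asymptotic direction for the given leading coefficients."""
    if a11 == 0 and a12 == 0 and a22 == 0:
        raise ValueError("not a second-order line")
    delta = a11 * a22 - a12 ** 2
    if delta > 0:
        return []                      # elliptic type: no asymptotic directions
    if delta == 0:                     # parabolic type: exactly one direction
        return [(-a12, a11)] if a11 != 0 else [(1, 0)]
    # Hyperbolic type: two directions.
    if a11 != 0:
        root = math.sqrt(-delta)
        return [(-a12 + root, a11), (-a12 - root, a11)]
    # a11 == 0: the form factors as beta * (2*a12*alpha + a22*beta).
    return [(1, 0), (a22, -2 * a12)]


hyperbolic = asymptotic_directions(1, -2.5, 4)   # t = alpha / beta equals 4 and 1
elliptic = asymptotic_directions(1, 0, 1)
parabolic = asymptotic_directions(1, 1, 1)
```

Dividing the criterion by $\beta^2$ and setting $t = \alpha/\beta$ gives the quadratic equation $a_{11}t^2 + 2a_{12}t + a_{22} = 0$, which is the route taken in the worked task; the case $\beta = 0$ has to be examined separately, as the code does for $a_{11} = 0$.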

The following lemma turns out to be useful (criterion for the asymptotic direction of a line of parabolic type).

Lemma. Let be a line of parabolic type.

A nonzero vector has an asymptotic direction

relatively . (5)

(Task. Prove the lemma.)

Definition. A straight line of asymptotic direction is called an asymptote of a second-order line if it either does not intersect the line or is contained in it.

Theorem. If a vector has an asymptotic direction relative to the line, then the asymptote parallel to the vector is determined by the equation

We fill in the table.

TASKS.

1. Find vectors of asymptotic directions for the following lines of the second order:

4 - hyperbolic type, two asymptotic directions.

We will use the asymptotic direction criterion:

Has an asymptotic direction relative to this line 4.

If = 0, then = 0, that is, the vector is zero. Then divide by . We get a quadratic equation: , where t = . We solve this quadratic equation and find two solutions: t = 4 and t = 1. Then the asymptotic directions of the line .

(There are two ways to consider, since the line is parabolic.)

2. Find out if the coordinate axes have asymptotic directions relative to the lines of the second order:

3. Write the general equation of the second order line for which

a) the abscissa axis has an asymptotic direction;

b) Both coordinate axes have asymptotic directions;

c) the coordinate axes have asymptotic directions and O is the center of the line.

4. Write the equations of the asymptotes for the lines:


5. Prove that if a second-order line has two non-parallel asymptotes, then their intersection point is the center of this line.

Hint: since there are two non-parallel asymptotes, there are two asymptotic directions; hence , and therefore the line is central.

Write down the equations of the asymptotes in general form and the system for finding the center. The statement then follows directly.

6. (No. 920) Write the equation of the hyperbola passing through the point A(0, -5) and having asymptotes x - 1 = 0 and 2x - y + 1 = 0.

Hint. Use the statement from the previous task.

Homework. No. 915 (c, d, f), No. 916 (c, d, e), No. 920 (if you didn't have time);
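One possible way to check an answer to this task is sketched below. It takes a different route from the hint: it relies on the standard fact that a hyperbola with asymptotes $l_1 = 0$ and $l_2 = 0$ has an equation of the form $l_1 \cdot l_2 = k$ with a nonzero constant $k$, and fixes $k$ from the point A. The function names are hypothetical.

```python
def asymptote_1(x, y):
    """Left-hand side of the first asymptote, x - 1 = 0."""
    return x - 1


def asymptote_2(x, y):
    """Left-hand side of the second asymptote, 2x - y + 1 = 0."""
    return 2 * x - y + 1


A = (0, -5)
k = asymptote_1(*A) * asymptote_2(*A)  # constant fixed by the point A


def hyperbola(x, y):
    """Left-hand side of (x - 1)(2x - y + 1) - k = 0."""
    return asymptote_1(x, y) * asymptote_2(x, y) - k


def expanded(x, y):
    """The same left-hand side after expanding the product."""
    return 2 * x * x - x * y - x + y + 5
```

Under this approach the constant is $k = -6$, so the equation reads $(x - 1)(2x - y + 1) + 6 = 0$, that is, $2x^2 - xy - x + y + 5 = 0$.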

Cheat sheets;

Silaev, Tymoshenko. Practical exercises in geometry,

1 semester. P.67, questions 1-8, p.70, questions 1-3 (orally).

DIAMETERS OF A SECOND-ORDER LINE.

CONJUGATE DIAMETERS.

An affine coordinate system is given.

Definition. The diameter of a second-order line conjugate to a vector of non-asymptotic direction relative to the line is the set of midpoints of all chords of the line parallel to that vector.

In the lecture it was proved that the diameter is a straight line, and its equation was obtained

Recommendations: show (on an ellipse) how it is constructed: fix a non-asymptotic direction; draw [two] straight lines of this direction that intersect the line; find the midpoints of the chords cut off; draw a straight line through the midpoints. This is the diameter.

Discuss:

1. Why is a vector of non-asymptotic direction taken in the definition of a diameter? If the students cannot answer, ask them to construct a diameter, for example, for a parabola.

2. Does any second-order line have at least one diameter? Why?

3. The lecture proved that the diameter is a straight line. The midpoint of which chord is point M in the figure?


4. Look at the parentheses in equation (7). What do they resemble?

Conclusion: 1) every center lies on every diameter;

2) if the line has a straight line of centers, then it has a single diameter.

5. What direction do the diameters of a parabolic line have? (Asymptotic.)

Proof (probably given in the lecture).

Let the diameter d, given by equation (7`), be conjugate to a vector of non-asymptotic direction. Its direction vector can be read off from the coefficients of (7`). Let us show that this vector has an asymptotic direction, using the criterion for an asymptotic direction vector of a parabolic line (see (5)). Substitute and verify (do not forget that the line is parabolic).
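The substitution can be checked on a concrete example. The sketch below assumes that (7`) has the form (a11 p1 + a12 p2) x + (a12 p1 + a22 p2) y + (a13 p1 + a23 p2) = 0, so that its direction vector is (-(a12 p1 + a22 p2), a11 p1 + a12 p2). It tests asymptoticity by the general definition (the quadratic part vanishes on the vector) rather than by criterion (5), which is not reproduced here. The sample line and the function names are hypothetical.

```python
def diameter_direction(a11, a12, a22, p):
    """Direction vector of the line (a11 p1 + a12 p2) x + (a12 p1 + a22 p2) y + ... = 0."""
    p1, p2 = p
    return (-(a12 * p1 + a22 * p2), a11 * p1 + a12 * p2)

def quadratic_form(a11, a12, a22, v):
    """Quadratic part a11 v1^2 + 2 a12 v1 v2 + a22 v2^2; zero means asymptotic direction."""
    v1, v2 = v
    return a11 * v1 * v1 + 2 * a12 * v1 * v2 + a22 * v2 * v2

# Hypothetical parabolic line with quadratic part x^2 - 4xy + 4y^2 = (x - 2y)^2
a11, a12, a22 = 1, -2, 4
delta = a11 * a22 - a12 * a12  # zero for a parabolic line

results = []
for p in [(1, 0), (0, 1), (1, 1), (3, -5)]:
    if quadratic_form(a11, a12, a22, p) == 0:
        continue  # p is asymptotic, so no diameter is conjugate to it
    v = diameter_direction(a11, a12, a22, p)
    results.append((p, v, quadratic_form(a11, a12, a22, v)))

print(delta)
for row in results:
    print(row)  # the last entry of every row is 0
```

In general, the quadratic form at the direction vector equals delta times the quadratic form at p, which is why the result vanishes exactly when delta = 0.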

6. How many diameters does a parabola have? How are they positioned relative to one another? How many diameters do the other parabolic lines have? Why?

7. How can one construct the common diameter of certain pairs of second-order lines (see questions 30, 31 below)?

8. Fill in the table; be sure to draw pictures.

1. Write an equation for the set of midpoints of all chords parallel to a given vector.

2. Write the equation for the diameter d passing through the point K (1, -2) for the line.

Solution steps:

1st way.

1. Determine the type of the line (to know how its diameters behave).

In this case the line is central, so all diameters pass through its center C.

2. Write the equation of the straight line passing through the two points K and C. This is the required diameter.

2nd way.

1. We write the equation of the diameter d in the form (7`).

2. Substituting the coordinates of the point K into this equation, we find the relationship between the coordinates of the vector conjugate to the diameter d.

3. We set this vector, taking into account the found dependence, and compose the equation for the diameter d.

In this problem, it is easier to calculate in the second way.
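Both ways can be compared on an example. Since the equation of the line in this problem is not reproduced here, the sketch below uses a hypothetical central line, x^2 + 2xy - y^2 - 6x - 2y + 1 = 0, together with the point K(1, -2) from the problem. The coefficient notation and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

# Hypothetical central line, NOT the one from the problem:
# x^2 + 2xy - y^2 - 6x - 2y + 1 = 0, read as
# a11 x^2 + 2 a12 xy + a22 y^2 + 2 a13 x + 2 a23 y + a33 = 0.
a11, a12, a22, a13, a23 = 1, 1, -1, -3, -1
K = (1, -2)

def first_way():
    """Find the center C, then the straight line through K and C."""
    delta = a11 * a22 - a12 * a12  # nonzero for a central line
    # Cramer's rule for a11 x + a12 y + a13 = 0, a12 x + a22 y + a23 = 0
    cx = Fraction(-a13 * a22 + a12 * a23, delta)
    cy = Fraction(-a11 * a23 + a12 * a13, delta)
    A, B = cy - K[1], K[0] - cx
    return (A, B, -(A * K[0] + B * K[1]))

def second_way():
    """Pick p with p1*F1(K) + p2*F2(K) = 0, then write the diameter conjugate to p."""
    f1 = a11 * K[0] + a12 * K[1] + a13
    f2 = a12 * K[0] + a22 * K[1] + a23
    p1, p2 = f2, -f1  # one solution of the linear condition (K is not the center here)
    return (a11 * p1 + a12 * p2, a12 * p1 + a22 * p2, a13 * p1 + a23 * p2)

def same_line(u, v):
    """True if the coefficient triples (A, B, C) of two lines are proportional."""
    return (u[0] * v[1] == u[1] * v[0] and u[0] * v[2] == u[2] * v[0]
            and u[1] * v[2] == u[2] * v[1])

line1, line2 = first_way(), second_way()
print(line1)  # coefficients of 3x - y - 5 = 0
print(line2)  # (6, -2, -10), the same line up to a factor of 2
print(same_line(line1, line2))
```

The second way needs no division and no center, which is consistent with the remark that it is the easier computation.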

3. Write the equation for the diameter parallel to the abscissa axis.

4. Find the midpoint of the chord cut off by the second-order line

on the straight line x + 3y - 12 = 0.

Hint for the solution: Of course, one can find the intersection points of the given straight line with the second-order line, and then the midpoint of the resulting segment. The desire to do so disappears if we take, for example, the straight line x + 3y - 2009 = 0.
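The shorter route is presumably to intersect the straight line with the diameter conjugate to its direction vector, since the midpoint of the chord lies on both. The sketch below illustrates this on a hypothetical second-order line, x^2 + 4y^2 - 100 = 0, because the equation from the problem is not reproduced here. It then compares the answer with the brute-force midpoint. The coefficient notation and the function names are assumptions.

```python
from fractions import Fraction
import math

# Hypothetical second-order line x^2 + 4y^2 - 100 = 0, read as
# a11 x^2 + 2 a12 xy + a22 y^2 + 2 a13 x + 2 a23 y + a33 = 0.
a11, a12, a22, a13, a23, a33 = 1, 0, 4, 0, 0, -100
A, B, C = 1, 3, -12   # straight line x + 3y - 12 = 0
p1, p2 = -B, A        # its direction vector

def midpoint_via_diameter():
    # Diameter conjugate to p: d1*x + d2*y + d3 = 0
    d1 = a11 * p1 + a12 * p2
    d2 = a12 * p1 + a22 * p2
    d3 = a13 * p1 + a23 * p2
    # det equals the quadratic part evaluated at p; it is zero only for an asymptotic direction
    det = A * d2 - B * d1
    return (Fraction(-C * d2 + B * d3, det), Fraction(-A * d3 + C * d1, det))

def midpoint_brute_force():
    x0, y0 = 12.0, 0.0  # a point on the straight line
    # Substitute (x0 + t*p1, y0 + t*p2) into the curve: qa*t^2 + qb*t + qc = 0
    qa = a11 * p1 * p1 + 2 * a12 * p1 * p2 + a22 * p2 * p2
    qb = 2 * (p1 * (a11 * x0 + a12 * y0 + a13) + p2 * (a12 * x0 + a22 * y0 + a23))
    qc = (a11 * x0 * x0 + 2 * a12 * x0 * y0 + a22 * y0 * y0
          + 2 * a13 * x0 + 2 * a23 * y0 + a33)
    root = math.sqrt(qb * qb - 4 * qa * qc)
    ts = [(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)]
    xs = [x0 + t * p1 for t in ts]
    ys = [y0 + t * p2 for t in ts]
    return (sum(xs) / 2, sum(ys) / 2)

exact_mid = midpoint_via_diameter()
brute_mid = midpoint_brute_force()
print(exact_mid)  # (Fraction(48, 13), Fraction(36, 13))
print(brute_mid)
```

The diameter route involves only a 2 by 2 linear system, so changing the free term of the straight line (for example to -2009) changes nothing in the amount of work.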

Exact Tests provides two additional methods for calculating significance levels for the statistics available through the Crosstabs and Nonparametric Tests procedures. These methods, the exact and Monte Carlo methods, provide a means for obtaining accurate results when your data fail to meet any of the underlying assumptions necessary for reliable results using the standard asymptotic method. Available only if you have purchased the Exact Tests Options.

Example. Asymptotic results obtained from small datasets or sparse or unbalanced tables can be misleading. Exact tests enable you to obtain an accurate significance level without relying on assumptions that might not be met by your data. For example, results of an entrance exam for 20 fire fighters in a small township show that all five white applicants received a pass result, whereas the results for Black, Asian and Hispanic applicants are mixed. A Pearson chi-square testing the null hypothesis that results are independent of race produces an asymptotic significance level of 0.07. This result leads to the conclusion that exam results are independent of the race of the examinee. However, because the data contain only 20 cases and the cells have expected frequencies of less than 5, this result is not trustworthy. The exact significance of the Pearson chi-square is 0.04, which leads to the opposite conclusion. Based on the exact significance, you would conclude that exam results and race of the examinee are related. This demonstrates the importance of obtaining exact results when the assumptions of the asymptotic method cannot be met. The exact significance is always reliable, regardless of the size, distribution, sparseness, or balance of the data.

Statistics. Asymptotic significance, Monte Carlo approximation with confidence level, or exact significance.

  • Asymptotic. The significance level based on the asymptotic distribution of a test statistic. Typically, a value of less than 0.05 is considered significant. The asymptotic significance is based on the assumption that the data set is large. If the data set is small or poorly distributed, this may not be a good indication of significance.
  • Monte Carlo Estimate. An unbiased estimate of the exact significance level, calculated by repeatedly sampling from a reference set of tables with the same dimensions and row and column margins as the observed table. The Monte Carlo method allows you to estimate exact significance without relying on the assumptions required for the asymptotic method. This method is most useful when the data set is too large to compute exact significance, but the data do not meet the assumptions of the asymptotic method.
  • Exact. The probability of the observed outcome or an outcome more extreme is calculated exactly. Typically, a significance level less than 0.05 is considered significant, indicating that there is some relationship between the row and column variables.
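To make the difference between the exact and Monte Carlo methods concrete, here is a minimal sketch for a 2x2 table with fixed margins. It is not a reproduction of the procedure implemented in the software. The table is hypothetical (it is not the fire fighter data, which the text does not list), and all function names are assumptions. The exact significance sums the probabilities of every table with the observed margins whose Pearson chi-square is at least the observed one; the Monte Carlo estimate samples such tables at random.

```python
import math
import random

def chi_square(a, r1, r2, c1):
    """Pearson chi-square of the 2x2 table with top-left cell a, row totals r1, r2, first column total c1."""
    n = r1 + r2
    c2 = n - c1
    cells = [(a, r1, c1), (r1 - a, r1, c2), (c1 - a, r2, c1), (r2 - c1 + a, r2, c2)]
    return sum((obs - r * c / n) ** 2 / (r * c / n) for obs, r, c in cells)

def exact_p(a_obs, r1, r2, c1):
    """Exact significance: enumerate all tables with the observed margins."""
    n = r1 + r2
    stat = chi_square(a_obs, r1, r2, c1)
    total = math.comb(n, c1)
    p = 0.0
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        if chi_square(a, r1, r2, c1) >= stat - 1e-12:
            # hypergeometric probability of this table given the margins
            p += math.comb(r1, a) * math.comb(r2, c1 - a) / total
    return p

def monte_carlo_p(a_obs, r1, r2, c1, samples=10000, seed=0):
    """Monte Carlo estimate: sample tables with the same margins by shuffling column labels."""
    rng = random.Random(seed)
    stat = chi_square(a_obs, r1, r2, c1)
    labels = [1] * c1 + [0] * (r1 + r2 - c1)  # 1 marks membership in the first column
    hits = 0
    for _ in range(samples):
        rng.shuffle(labels)
        a = sum(labels[:r1])  # top-left cell of the sampled table
        if chi_square(a, r1, r2, c1) >= stat - 1e-12:
            hits += 1
    return hits / samples

# Hypothetical table: row 1 = (5, 0), row 2 = (7, 8); 20 cases in total
p_exact = exact_p(5, 5, 15, 12)
p_mc = monte_carlo_p(5, 5, 15, 12)
print(p_exact, p_mc)
```

For a table this small full enumeration is cheap; sampling becomes attractive only when the number of tables with the given margins is too large to enumerate.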
