As noted in the previous section, classical algorithms can in many cases be studied by the asymptotic methods of mathematical statistics, in particular by the central limit theorem (CLT) and methods of inheritance of convergence. Classical mathematical statistics has drifted away from the needs of applied research; one symptom is that popular monographs lack the mathematical apparatus needed, for example, to study two-sample statistics. The point is that the passage to the limit has to be taken not in one parameter but in two, namely the sizes of the two samples. We had to develop an appropriate theory, the theory of inheritance of convergence, set out in our monograph.

However, the results of such a study have to be applied to finite sample sizes. A whole range of problems is associated with this transition. Some of them were discussed in connection with the study of the properties of statistics constructed from samples from specific distributions.

However, when discussing the influence of deviations from initial assumptions on the properties of statistical procedures, additional problems arise. What deviations are considered typical? Should one focus on the most "harmful" deviations that distort the properties of algorithms to the greatest extent, or should one focus on "typical" deviations?

With the first approach, we get a guaranteed result, but the "price" of this result may be unnecessarily high. As an example, we point to the universal Berry-Esseen inequality for the error in the CLT. A.A. Borovkov quite rightly emphasizes that "the rate of convergence in real problems, as a rule, turns out to be better."
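The gap between the guaranteed bound and the actual error can be illustrated numerically. The sketch below is not taken from the source: it assumes sums of independent Exp(1) variables (whose distribution is available in closed form), approximates the supremum on a finite grid, and assumes the value 0.4748 as an admissible constant in the Berry-Esseen inequality for identically distributed summands. All function names are hypothetical.

```python
import math

C_BE = 0.4748  # assumed admissible Berry-Esseen constant (i.i.d. case)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def erlang_cdf(x, n):
    """CDF at x of a sum of n independent Exp(1) variables."""
    if x <= 0.0:
        return 0.0
    # P(S_n <= x) = 1 - exp(-x) * sum_{k < n} x^k / k!
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def actual_error(n, grid=4000):
    """Grid approximation of sup_x |P((S_n - n)/sqrt(n) <= x) - Phi(x)|."""
    worst = 0.0
    for i in range(grid + 1):
        x = -6.0 + 12.0 * i / grid
        s = n + x * math.sqrt(n)
        worst = max(worst, abs(erlang_cdf(s, n) - normal_cdf(x)))
    return worst

def berry_esseen_bound(n):
    rho = 12.0 / math.e - 2.0  # E|X - 1|^3 for X ~ Exp(1); the variance is 1
    return C_BE * rho / math.sqrt(n)

if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, actual_error(n), berry_esseen_bound(n))
```

For this particular distribution the printed actual error stays well below the universal bound, which is the point of the quoted remark; other distributions will give other ratios.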

In the second approach, the question arises which deviations are considered "typical". You can try to answer this question by analyzing large arrays of real data. It is quite natural that the answers of different research groups will differ, as can be seen, for example, from the results presented in the article.

One mistaken idea is to restrict the analysis of possible deviations to a single parametric family: the Weibull-Gnedenko distributions, the three-parameter family of gamma distributions, and so on. As early as 1927, Academician S.N. Bernstein of the USSR Academy of Sciences discussed the methodological error of reducing all empirical distributions to the four-parameter Pearson family. Nevertheless, parametric methods of statistics remain very popular, especially among applied scientists, and the blame for this misconception lies primarily with teachers of statistical methods (see below, as well as the article).

15. Choosing one of many criteria to test a particular hypothesis

Often many methods have been developed to solve one specific practical problem, and a specialist in mathematical research methods faces the question: which of them should be offered to an applied researcher for analyzing specific data?

As an example, consider the problem of checking the homogeneity of two independent samples. As is well known, many criteria can be offered for its solution: Student, Cramer-Welch, Lord, chi-square, Wilcoxon (Mann-Whitney), van der Waerden, Savage, N.V. Smirnov, the omega-square type (Lehmann-Rosenblatt), G.V. Martynov, and others. Which one should be chosen?

The idea of "voting" naturally comes to mind: test by many criteria, and then decide "by a majority of votes". From the point of view of statistical theory, such a procedure simply leads to the construction of yet another criterion, which is a priori no better than the previous ones but is more difficult to study. On the other hand, if the decisions coincide for all the statistical criteria considered, which are based on different principles, then, in accordance with the concept of stability, this increases confidence in the overall decision obtained.

There is a widespread opinion, especially among mathematicians, that one must search for optimal methods, solutions, etc.; this opinion is false and harmful. The fact is that optimality usually disappears when the initial assumptions are violated. Thus, the arithmetic mean as an estimate of the mathematical expectation is optimal only when the original distribution is normal, whereas it is a consistent estimate whenever the mathematical expectation exists. On the other hand, for an arbitrary method of estimation or hypothesis testing, one can usually formulate the concept of optimality in such a way that the method under consideration becomes optimal from this specially chosen point of view. Take, for example, the sample median as an estimate of the mathematical expectation. It is, of course, optimal, although in a different sense than the arithmetic mean (which is optimal for a normal distribution). Namely, for the Laplace distribution the sample median is the maximum likelihood estimate, and therefore optimal (in the sense specified in the monograph).
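How the ranking of the mean and the median flips with the underlying distribution can be seen from their classical large-sample variances. The sketch below is an illustration added here, not the author's calculation: it assumes the standard asymptotic formulas n·Var(mean) → σ² and n·Var(median) → 1/(4 f(m)²), where f(m) is the density at the median, and the function name is hypothetical.

```python
import math

def are_median_vs_mean(density_at_median, variance):
    """Asymptotic variance of the sample mean divided by that of the sample median.

    A value below 1 favors the mean, a value above 1 favors the median.
    """
    return variance * 4.0 * density_at_median ** 2

# Standard normal: density at the median is 1/sqrt(2*pi), variance is 1.
normal_ratio = are_median_vs_mean(1.0 / math.sqrt(2.0 * math.pi), 1.0)

# Laplace distribution with scale 1: density at the median is 1/2, variance is 2.
laplace_ratio = are_median_vs_mean(0.5, 2.0)

if __name__ == "__main__":
    print(normal_ratio, laplace_ratio)
```

Under these formulas the mean is the more accurate estimate for normal data and the median for Laplace data, so "optimal" depends entirely on which model is assumed.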

The homogeneity criteria have been analyzed in a monograph. There are several natural approaches to comparing criteria, based on asymptotic relative efficiency in the sense of Bahadur, Hodges-Lehmann, or Pitman. It turned out that each criterion is optimal for a corresponding alternative or for a suitable distribution on the set of alternatives. At the same time, the mathematical calculations usually use the shift alternative, which is relatively rare in the practice of analyzing real statistical data (in connection with the Wilcoxon criterion, we have discussed and criticized this alternative). The result is sad: the brilliant mathematical technique demonstrated in that monograph does not yield recommendations for choosing a homogeneity test when analyzing real data. In other words, from the point of view of the applied worker, i.e. for the analysis of specific data, the monograph is useless. The brilliant mastery of mathematics and the great diligence demonstrated by its author, alas, brought nothing to practice.

Of course, every practicing statistician solves the problem of choosing a statistical criterion in one way or another. Based on a number of methodological considerations, we opted for the omega-square type criterion (Lehmann-Rosenblatt), which is consistent against any alternative. However, a feeling of dissatisfaction remains because this choice is insufficiently justified.

ASYMPTOTIC EFFICIENCY OF A CRITERION

A concept that makes it possible, in the case of large samples, to compare quantitatively two different statistical criteria used to test one and the same statistical hypothesis. The need to measure the efficiency of criteria arose in the 1930s and 1940s, when criteria appeared that were computationally simple but inefficient.

Mathematical encyclopedia. - M.: Soviet Encyclopedia. I. M. Vinogradov. 1977-1985.


1 Entropy and information distance

1.1 Basic definitions and notation.

1.2 Entropy of discrete distributions with limited expectation.

1.3 Logarithmic generalized metric on the set of discrete distributions.

1.4 Compactness of functions of a countable set of arguments

1.5 Continuity of Kullback-Leibler-Sanov information distance

1.6 Conclusions.

2 Large deviation probabilities

2.1 Probabilities of large deviations of functions of the numbers of cells with a given filling.

2.1.1 Local limit theorem.

2.1.2 Integral limit theorem.

2.1.3 Information distance and probabilities of large deviations of separable statistics

2.2 Large deviation probabilities of separable statistics that do not satisfy the Cramer condition.

2.3 Conclusions.

3 Asymptotic properties of goodness-of-fit tests

3.1 Goodness-of-fit criteria for a sampling scheme without replacement

3.2 Asymptotic relative efficiency of goodness-of-fit tests.

3.3 Criteria based on the numbers of cells in generalized allocation schemes.

3.4 Conclusions.


Introduction to the thesis (part of the abstract) on the topic "Asymptotic properties of goodness-of-fit criteria for testing hypotheses in a selection scheme without replacement, based on the filling of cells in a generalized allocation scheme"

The object of research and the relevance of the topic. In the theory of statistical analysis of discrete sequences, a special place is occupied by goodness-of-fit tests for a possibly composite null hypothesis, namely that for a random sequence $(X_i)_{i=1}^{n}$ such that $X_i \in I_M$, $i = 1, \dots, n$, where $I_M = \{0, 1, \dots, M\}$, for any $i = 1, \dots, n$ and any $k \in I_M$ the probability of the event $\{X_i = k\}$ does not depend on $i$. This means that the sequence is in some sense stationary.

In a number of applied problems, the sequence $(X_i)$ is taken to be the sequence of colors of the balls drawn without replacement, until exhaustion, from an urn containing $n_k - 1 > 0$ balls of color $k$, $k \in I_M$. Let the urn contain $n - 1$ balls in total:

$$\sum_{k=0}^{M} (n_k - 1) = n - 1.$$

Denote by $r^{(k)} = (r^{(k)}_1, \dots, r^{(k)}_{n_k - 1})$ the sequence of positions of the balls of color $k$ in the sample. Consider the sequence $h^{(k)} = (h^{(k)}_1, \dots, h^{(k)}_{n_k})$, where

$$h^{(k)}_j = r^{(k)}_j - r^{(k)}_{j-1}, \quad j = 1, \dots, n_k, \qquad r^{(k)}_0 = 0, \quad r^{(k)}_{n_k} = n.$$

The sequence $h^{(k)}$ is thus defined by the distances between the positions of adjacent balls of color $k$, in such a way that

$$\sum_{j=1}^{n_k} h^{(k)}_j = n.$$

The set of sequences $h^{(k)}$ for all $k \in I_M$ uniquely determines the sequence $(X_i)$. The sequences $h^{(k)}$ for different $k$ are mutually dependent. In particular, any of them is uniquely determined by all the others. If the cardinality of the set $I_M$ is equal to 2, then the sequence of colors of the balls is uniquely determined by the sequence of distances between the positions of adjacent balls of one fixed color. Let an urn containing $n - 1$ balls of two different colors contain $N - 1$ balls of color 0. One can establish a one-to-one correspondence between the set $\mathfrak{M}(N - 1, n - N)$ and the set $\mathfrak{H}_{n,N}$ of vectors $h(n, N) = (h_1, \dots, h_N)$ with positive integer components such that

$$h_1 + \dots + h_N = n. \tag{0.1}$$

The set $\mathfrak{H}_{n,N}$ corresponds to the set of all distinct partitions of the positive integer $n$ into $N$ ordered terms.
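The one-to-one correspondence described above can be checked by brute force for small parameters. The following is a minimal sketch with hypothetical function names: it places $N - 1$ balls of color 0 among $n - 1$ draws in every possible way, computes the distances between adjacent color-0 positions (with the sentinels 0 and $n$ added at the ends), and lets one confirm that the results are exactly the ordered partitions of $n$ into $N$ positive terms.

```python
from itertools import combinations
from math import comb

def distance_vector(positions, n):
    """Gaps between consecutive positions of color-0 balls, with sentinels 0 and n."""
    marks = [0, *positions, n]
    return tuple(b - a for a, b in zip(marks, marks[1:]))

def all_distance_vectors(n, N):
    """Distance vectors for every placement of N - 1 color-0 balls among n - 1 draws."""
    return [distance_vector(pos, n) for pos in combinations(range(1, n), N - 1)]

vectors = all_distance_vectors(n=7, N=3)

if __name__ == "__main__":
    # Every placement gives a different vector, so the map is one-to-one.
    print(len(vectors), len(set(vectors)), comb(7 - 1, 3 - 1))
```

Each vector has $N$ positive components summing to $n$, and the number of vectors equals the binomial coefficient $\binom{n-1}{N-1}$, the number of ordered partitions of $n$ into $N$ positive terms.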

Having specified a probability distribution on the set of vectors $\mathfrak{H}_{n,N}$, we obtain the corresponding probability distribution on the set $\mathfrak{M}(N - 1, n - N)$. The set $\mathfrak{H}_{n,N}$ is a subset of the set of vectors with non-negative integer components satisfying (0.1). As probability distributions on the set of vectors, the dissertation considers distributions of the form

$$P(h(n, N) = (r_1, \dots, r_N)) = P\Big(\xi_\nu = r_\nu,\ \nu = 1, \dots, N \,\Big|\, \sum_{\nu=1}^{N} \xi_\nu = n\Big), \tag{0.2}$$

where $\xi_1, \dots, \xi_N$ are independent non-negative integer random variables.

Distributions of the form (0.2) are called in /24/ generalized schemes for allocating $n$ particles to $N$ cells. In particular, if the random variables $\xi_1, \dots, \xi_N$ in (0.2) are distributed according to Poisson laws with parameters $\lambda_1, \dots, \lambda_N$, respectively, then the vector $h(n, N)$ has a multinomial (polynomial) distribution with outcome probabilities

$$p_\nu = \frac{\lambda_\nu}{\lambda_1 + \dots + \lambda_N}, \quad \nu = 1, \dots, N.$$

If the random variables $\xi_1, \dots, \xi_N$ in (0.2) are identically distributed according to a geometric law with parameter $p$, where $p$ is any number in the interval $0 < p < 1$, then, as noted in /25/, /26/, the resulting generalized allocation scheme corresponds to the uniform distribution on the set $\mathfrak{H}_{n,N}$. By virtue of the one-to-one correspondence between the set $\mathfrak{M}(N - 1, n - N)$ and the set $\mathfrak{H}_{n,N}$, we obtain the uniform distribution on the set of selections without replacement. Here the vector of distances between the positions of balls of one color corresponds one-to-one to the frequency vector in the generalized allocation scheme, and, accordingly, the number of distances of length $r$ corresponds to the number of cells containing exactly $r$ particles. To test, from a single sequence, the hypothesis that it was obtained as a result of sampling without replacement and that each such sample has the same probability, one can test the hypothesis that the vector of distances between the positions of balls of color 0 is distributed as the frequency vector in the corresponding generalized scheme of allocating $n$ particles to $N$ cells.
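The uniformity claim for geometric summands can be verified exactly on a small case. The sketch below is only an illustration: the function names are hypothetical, and the parametrization $P(\xi = k) = (1 - p)p^{k-1}$, $k = 1, 2, \dots$, is an assumption, since the source does not spell out the formula of the geometric law. Rational arithmetic is used so that equality of probabilities can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

def conditional_distribution(pmf, n, N):
    """Law of N i.i.d. variables with the given pmf, conditioned on their sum being n."""
    weights = {}
    for r in product(range(n + 1), repeat=N):
        if sum(r) == n:
            w = Fraction(1)
            for k in r:
                w *= pmf(k)
            if w:  # keep only outcomes of positive probability
                weights[r] = w
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

def geometric_pmf(p):
    # Assumed parametrization: P(xi = k) = (1 - p) * p**(k - 1) for k = 1, 2, ...
    return lambda k: (1 - p) * p ** (k - 1) if k >= 1 else Fraction(0)

law = conditional_distribution(geometric_pmf(Fraction(1, 3)), n=6, N=3)

if __name__ == "__main__":
    print(len(law), set(law.values()))
```

The reason is visible in the code: every admissible vector receives the same unnormalized weight $(1-p)^N p^{n-N}$, so the conditional law does not depend on $p$ and is uniform.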

As noted in /14/, /38/, a special place in testing hypotheses about the distribution of the frequency vectors $h(n, N) = (h_1, \dots, h_N)$ in generalized schemes for allocating $n$ particles to $N$ cells is occupied by criteria based on statistics of the form

$$L_N(h(n, N)) = \sum_{\nu=1}^{N} f_\nu(h_\nu), \tag{0.3}$$

$$\varphi_N = \varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big), \tag{0.4}$$

where $f_\nu$, $\nu = 1, 2, \dots$, and $\varphi$ are some real-valued functions, and

$$\mu_r = \sum_{\nu=1}^{N} I\{h_\nu = r\}, \quad r = 0, 1, \dots$$

The values $\mu_r$ were called in /27/ the numbers of cells containing exactly $r$ particles.

Statistics of the form (0.3) are called in /30/ separable (additively separable) statistics. If the functions $f_\nu$ in (0.3) do not depend on $\nu$, then such statistics were called in /31/ symmetric separable statistics.

For any $r$, the statistic $\mu_r$ is a symmetric separable statistic. From the equality

$$\sum_{\nu=1}^{N} f(h_\nu) = \sum_{r \ge 0} f(r)\,\mu_r \tag{0.5}$$

it follows that the class of symmetric separable statistics of the $h_\nu$ coincides with the class of linear functions of the $\mu_r$. Moreover, the class of functions of the form (0.4) is wider than the class of symmetric separable statistics.

Let $H_0 = (H_0(n, N))$ be a sequence of simple null hypotheses that the distribution of the vector $h(n, N)$ is (0.2), where the random variables $\xi_1, \dots, \xi_N$ in (0.2) are identically distributed, $P(\xi_1 = k) = p_k$, $k = 0, 1, 2, \dots$, and the parameters $n$ and $N$ vary in the central region.

Consider some $\beta \in (0, 1)$ and a sequence of, generally speaking, composite alternatives $H_1 = (H_1(n, N))$ such that there exists $a_{n,N}(\beta)$, the maximal number for which, for any simple hypothesis $H \in H_1(n, N)$, the inequality

$$P_H(\varphi_N > a_{n,N}(\beta)) \ge \beta$$

holds, where $\varphi_N$ is a statistic of the form (0.4). We reject the hypothesis $H_0(n, N)$ if $\varphi_N > a_{n,N}(\beta)$. If the limit

$$\lim_{N \to \infty} -\frac{1}{N} \ln P(\varphi_N > a_{n,N}(\beta)) = u(\beta, H_1)$$

exists, where the probability for each $N$ is calculated under the hypothesis $H_0(n, N)$, then the value $u(\beta, H_1)$ is called in /38/ the index of the criterion $\varphi$ at the point $(\beta, H_1)$. This limit may, generally speaking, not exist. Therefore, in addition to the criterion index, the dissertation considers the value

$$\liminf_{N \to \infty} \Big(-\frac{1}{N} \ln P(\varphi_N > a_{n,N}(\beta))\Big),$$

the lower index of the criterion. Here $\liminf_{N \to \infty}$ and $\limsup_{N \to \infty}$ denote, respectively, the lower and upper limits of a sequence as $N \to \infty$.

If the criterion index exists, then the lower index of the criterion coincides with it. The lower index of the criterion always exists. The greater the value of the criterion index (lower index of the criterion), the better the statistical criterion in the sense considered. In /38/ the problem was solved of constructing goodness-of-fit criteria for generalized allocation schemes with the highest value of the criterion index in the class of criteria that reject the hypothesis $H_0(n, N)$ when

$$\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots, \frac{\mu_m}{N}\Big) > c_N,$$

where $m > 0$ is some fixed number, $c_N$ is a sequence of constants chosen from a given value of the power of the criterion for a sequence of alternatives, and $\varphi$ is a real function of $m + 1$ arguments.

The criterion indices are determined by the probabilities of large deviations. As was shown in /38/, the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of separable statistics, under the Cramer condition for the random variable $f(\xi(z))$, is determined by the corresponding Kullback-Leibler-Sanov information distance (a random variable $\eta$ satisfies the Cramer condition if, for some $H > 0$, the moment generating function $M e^{t\eta}$ is finite in the interval $|t| < H$ /28/).
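The statement that an information distance governs the rough asymptotics of large deviation probabilities can be illustrated on the simplest possible case. The sketch below is a stand-in, not the generalized allocation scheme of the dissertation: it assumes a sum of independent Bernoulli variables, for which the exponential rate of $P(S_n \ge na)$ is the Kullback-Leibler distance between Bernoulli($a$) and Bernoulli($p$). Function names are hypothetical.

```python
import math

def kl_bernoulli(a, p):
    """Kullback-Leibler distance between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_binomial_tail(n, p, k_min):
    """ln P(S_n >= k_min) for S_n ~ Binomial(n, p), computed by log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def empirical_rate(n, p, k_min):
    """-(1/n) ln P(S_n >= k_min), to be compared with kl_bernoulli(k_min / n, p)."""
    return -log_binomial_tail(n, p, k_min) / n

if __name__ == "__main__":
    for n in (200, 2000):
        k = 7 * n // 10
        print(n, empirical_rate(n, 0.5, k), kl_bernoulli(0.7, 0.5))
```

The two printed columns approach each other as $n$ grows; the remaining difference is of order $(\ln n)/n$, which is exactly what "up to logarithmic equivalence" leaves out.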

The question of the probabilities of large deviations of statistics depending on an unbounded number of the $\mu_r$, as well as of arbitrary separable statistics that do not satisfy the Cramer condition, remained open. This did not make it possible to solve completely the problem of constructing criteria for testing hypotheses in generalized allocation schemes with the highest rate of convergence to zero of the probability of an error of the first kind under approaching alternatives, in the class of criteria based on statistics of the form (0.4). The relevance of the dissertation research is determined by the need to complete the solution of this problem.

The purpose of the dissertation is to construct goodness-of-fit criteria with the highest value of the criterion index (lower index of the criterion) for testing hypotheses in the sampling scheme without replacement, in the class of criteria that reject the hypothesis $H_0(n, N)$ when

$$\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big) > c_N, \tag{0.7}$$

where $\varphi$ is a function of a countable number of arguments, and the parameters $n$, $N$ vary in the central region.

In accordance with the purpose of the study, the following tasks were set:

Investigate the properties of entropy and Kullback - Leibler - Sanov information distance for discrete distributions with a countable number of outcomes;

Investigate the probabilities of large deviations of statistics of the form (0.4);

Investigate the probabilities of large deviations of symmetric separable statistics (0.3) that do not satisfy the Cramer condition;

Find a statistic such that the goodness-of-fit criterion constructed on its basis for testing hypotheses in generalized allocation schemes has the largest index value in the class of criteria of the form (0.7).

Scientific novelty:

Scientific and practical value. The dissertation resolves a number of questions about the behavior of large deviation probabilities in generalized allocation schemes. The results obtained can be used in teaching in the specialties of mathematical statistics and information theory and in the study of statistical procedures for the analysis of discrete sequences; they were used in /3/, /21/ to justify the security of one class of information systems.

Provisions submitted for defense:

Reduction of the problem of testing, from a single sequence of ball colors, the hypothesis that this sequence was obtained by sampling without replacement until exhaustion from an urn containing balls of two colors, with each such sample having the same probability, to the construction of goodness-of-fit criteria for testing hypotheses in the corresponding generalized allocation scheme;

Continuity of the functions of entropy and Kullback - Leibler - Sanov information distance on an infinite-dimensional simplex with the introduced logarithmic generalized metric;

A theorem on the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition in the generalized allocation scheme in the semi-exponential case;

A theorem on the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations for statistics of the form (0.4);

Construction of a goodness-of-fit criterion for testing hypotheses in generalized allocation schemes with the largest index value in the class of criteria of the form (0.7).

Approbation of the work. The results were reported at the seminars of the Department of Discrete Mathematics of the V. A. Steklov Mathematical Institute of the RAS and of the Department of Information Security of the S. A. Lebedev ITMiVT of the RAS, and at:

Fifth All-Russian Symposium on Applied and Industrial Mathematics. Spring session, Kislovodsk, May 2 - 8, 2004;

Sixth International Petrozavodsk Conference "Probabilistic Methods in Discrete Mathematics" June 10 - 16, 2004;

Second International Conference "Information Systems and Technologies (IST'2004)", Minsk, November 8-10, 2004;

International conference "Modern Problems and new Trends in Probability Theory", Chernivtsi, Ukraine, June 19 - 26, 2005.

The main results of the work were used in the research project "Apologia", carried out by the S. A. Lebedev ITMiVT of the RAS in the interests of the Federal Service for Technical and Export Control of the Russian Federation, and were included in the report on the implementation of the research stage /21/. Some results of the dissertation were included in the research report "Development of mathematical problems of cryptography" of the Academy of Cryptography of the Russian Federation for 2004 /22/.

The author expresses his deep gratitude to his scientific adviser, Doctor of Physical and Mathematical Sciences A. F. Ronzhin, to his scientific consultant, Doctor of Physical and Mathematical Sciences, Senior Researcher A. V. Knyazev, and to I. A. Kruglov for the attention shown to the work and a number of valuable remarks.

Structure and content of the work.

The first chapter investigates the properties of entropy and information distance for distributions on the set of non-negative integers.

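In the first section of the first chapter, the notation is introduced and the necessary definitions are given. In particular, the following notation is used: $x = (x_0, x_1, \dots)$ is an infinite-dimensional vector with a countable number of components;

$$H(x) = -\sum_{\nu=0}^{\infty} x_\nu \ln x_\nu, \qquad \operatorname{trunc}_m(x) = (x_0, x_1, \dots, x_m, 0, 0, \dots);$$

$$\Omega^* = \Big\{x : x_\nu \ge 0,\ \nu = 0, 1, \dots,\ \sum_{\nu=0}^{\infty} x_\nu \le 1\Big\}; \qquad \Omega = \Big\{x : x_\nu \ge 0,\ \nu = 0, 1, \dots,\ \sum_{\nu=0}^{\infty} x_\nu = 1\Big\};$$

$$\Omega_\gamma = \Big\{x \in \Omega : \sum_{\nu=0}^{\infty} \nu x_\nu = \gamma\Big\};$$

Ml = o Ue>1|5 € o< Ml - 7МГ1 < 00}.

It is clear that the set $\Omega$ corresponds to the family of probability distributions on the set of non-negative integers, and $\Omega_\gamma$ to the family of probability distributions on the set of non-negative integers with mathematical expectation $\gamma$.

If $y \in \Omega$, then for $\varepsilon > 0$ we denote by $O_\varepsilon(y)$ the set

$$O_\varepsilon(y) = \{x : y_\nu e^{-\varepsilon} \le x_\nu \le y_\nu e^{\varepsilon} \text{ for all } \nu = 0, 1, \dots\}.$$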

In the second section of the first chapter, we prove a theorem on the boundedness of the entropy of discrete distributions with bounded mathematical expectation.

Theorem 1. On the boundedness of the entropy of discrete distributions with bounded mathematical expectation.

For any $x \in \Omega_\gamma$

$$H(x) \le F(\gamma).$$

If $x \in \Omega_\gamma$ corresponds to a geometric distribution with mathematical expectation $\gamma$, that is,

$$x_\nu = \frac{1}{1 + \gamma} \Big(\frac{\gamma}{1 + \gamma}\Big)^{\nu}, \quad \nu = 0, 1, \dots,$$

then the equality

$$H(x) = F(\gamma)$$

holds.

The assertion of the theorem can be viewed as the result of a formal application of the method of Lagrange multipliers for a conditional extremum in the case of an infinite number of variables. The theorem that the only distribution on the set $\{k, k + 1, k + 2, \dots\}$ with a given mathematical expectation and maximum entropy is a geometric distribution with that mathematical expectation is given (without proof) in /47/. The author, however, gives a rigorous proof.
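The extremal role of the geometric distribution is easy to check numerically. The sketch below is an added illustration with hypothetical function names; it relies on the well-known closed form $(1 + \gamma)\ln(1 + \gamma) - \gamma \ln \gamma$ for the entropy of the geometric distribution on $\{0, 1, \dots\}$ with mean $\gamma$ (whether this coincides with the function $F$ of Theorem 1 is an assumption, since the source does not reproduce the definition of $F$), and compares it with the entropy of a Poisson distribution with the same mean. The infinite sums are truncated, which is an approximation.

```python
import math

def entropy(probabilities):
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def geometric(mean, terms=2000):
    """Truncated geometric distribution on {0, 1, ...} with the given mean."""
    q = mean / (1.0 + mean)
    return [(1.0 - q) * q ** k for k in range(terms)]

def poisson(mean, terms=200):
    """Truncated Poisson distribution with the given mean."""
    return [math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
            for k in range(terms)]

def geometric_entropy_closed_form(mean):
    return (1.0 + mean) * math.log(1.0 + mean) - mean * math.log(mean)

if __name__ == "__main__":
    g = 3.0
    print(entropy(geometric(g)), geometric_entropy_closed_form(g), entropy(poisson(g)))
```

For mean 3 the truncated geometric entropy matches the closed form, and the Poisson entropy is strictly smaller, as the maximum-entropy property requires.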

In the third paragraph of the first chapter, a definition of a generalized metric is given - a metric that admits infinite values.

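For $x, y \in \Omega$, the function $\rho(x, y)$ is defined as the minimal $\varepsilon > 0$ with the property $y_\nu e^{-\varepsilon} \le x_\nu \le y_\nu e^{\varepsilon}$ for all $\nu = 0, 1, \dots$ If no such $\varepsilon$ exists, it is assumed that $\rho(x, y) = \infty$.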

It is proved that the function $\rho(x, y)$ is a generalized metric on the family of distributions on the set of non-negative integers, as well as on the entire set $\Omega^*$. Instead of $e$ in the definition of the metric $\rho(x, y)$, one can use any other positive number other than 1. The resulting metrics differ by a multiplicative constant. Denote by $J(x, y)$ the information distance

$$J(x, y) = \sum_{\nu=0}^{\infty} x_\nu \ln \frac{x_\nu}{y_\nu}.$$

Here and below it is assumed that $0 \ln 0 = 0$ and $0 \ln \frac{0}{0} = 0$. The information distance is defined for those $x$, $y$ for which $x_\nu = 0$ for all $\nu$ such that $y_\nu = 0$. If this condition is not satisfied, we set $J(x, y) = \infty$. Let $A \subseteq \Omega$. Then we denote

$$J(A, y) = \inf_{x \in A} J(x, y).$$
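For vectors with finitely many nonzero components, the logarithmic generalized metric and the information distance defined above can be computed directly. The sketch below uses hypothetical function names and represents distributions as finite lists of equal length; the zero value returned for identical vectors is the natural limiting case of the "minimal $\varepsilon$" definition.

```python
import math

def log_metric(x, y):
    """Smallest eps with y_v*exp(-eps) <= x_v <= y_v*exp(eps) for all v; inf if none exists."""
    eps = 0.0
    for xv, yv in zip(x, y):
        if xv == 0.0 and yv == 0.0:
            continue
        if xv == 0.0 or yv == 0.0:
            return math.inf  # supports differ, so no finite eps works
        eps = max(eps, abs(math.log(xv / yv)))
    return eps

def information_distance(x, y):
    """J(x, y) = sum_v x_v ln(x_v / y_v) with 0 ln 0 = 0; inf if x has mass where y has none."""
    total = 0.0
    for xv, yv in zip(x, y):
        if xv == 0.0:
            continue
        if yv == 0.0:
            return math.inf
        total += xv * math.log(xv / yv)
    return total

if __name__ == "__main__":
    x, y = [0.5, 0.5, 0.0], [0.25, 0.75, 0.0]
    print(log_metric(x, y), information_distance(x, y))
```

Since each term of $J(x, y)$ is at most $x_\nu \rho(x, y)$, the information distance between two distributions never exceeds their logarithmic distance, which is one reason this metric is convenient for continuity statements. The example also shows that the metric can be infinite while the information distance is finite.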

In the fourth section of the first chapter, a definition is given of the compactness of functions defined on the set $\Omega^*$. The compactness of a function of a countable number of arguments means that the value of the function can be approximated, with any degree of accuracy, by the values of this function at points where only a finite number of arguments are nonzero. The compactness of the entropy and information distance functions is proved.

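1. For any $0 < \gamma < \infty$ the function $H(x)$ is compact on $\Omega_\gamma$.

2. If for some $0 < \gamma_0 < \infty$

$$p \in \Omega_{\gamma_0},$$

then for any $0 < \gamma < \infty$, $r > 0$ the function $\varphi(x) = J(x, p)$ is compact on $\Omega_\gamma \cap O_r(p)$.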

In the fifth section of the first chapter, the properties of the information distance defined on an infinite-dimensional space are considered. Compared to the finite-dimensional case, the situation with the continuity of the information distance function changes qualitatively. It is shown that the information distance function is not continuous on the set $\Omega$ in any of the metrics

$$\rho_1(x, y) = \sum_{\nu=0}^{\infty} |x_\nu - y_\nu|,$$

$$\rho_2(x, y) = \Big(\sum_{\nu=0}^{\infty} (x_\nu - y_\nu)^2\Big)^{1/2},$$

$$\rho_3(x, y) = \sup_{\nu} |x_\nu - y_\nu|.$$

The following inequalities for the entropy $H(x)$ and the information distance $J(x, p)$ are proved:

1. For any $x, x' \in \Omega$

$$|H(x) - H(x')| \le (e^{\rho(x, x')} - 1)(H(x) + H(x')).$$

2. If for some $x, p \in \Omega$ there exists $\varepsilon > 0$ such that $x \in O_\varepsilon(p)$, then for any $x' \in \Omega$

$$|J(x, p) - J(x', p)| \le (e^{\rho(x, x')} - 1)(H(x) + H(x') + e^{\varepsilon} H(p)).$$

From these inequalities, taking into account Theorem 1, it follows that the entropy and information distance functions are uniformly continuous on the corresponding subsets of $\Omega$ in the metric $\rho(x, y)$, namely:

1. For any $\gamma$ such that $0 < \gamma < \infty$, the function $H(x)$ is uniformly continuous on $\Omega_\gamma$ in the metric $\rho(x, y)$;

2. If for some $\gamma_0$, $0 < \gamma_0 < \infty$,

$$p \in \Omega_{\gamma_0},$$

then for any $0 < \gamma < \infty$ and $\varepsilon > 0$ the function $\varphi(x) = J(x, p)$ is uniformly continuous on the set $\Omega_\gamma \cap O_\varepsilon(p)$ in the metric $\rho(x, y)$.

The definition of non-extremality of a function is given. The non-extremality condition means that the function either has no local extrema or takes the same value at all its local minima (and the same value at all its local maxima). The non-extremality condition thus weakens the requirement that there be no local extrema. For example, the function $\sin x$ on the set of real numbers has local extrema, but satisfies the non-extremality condition.

Let, for some $\gamma > 0$, the region $A$ be given by the condition

$$A = \{x \in \Omega_\gamma : \varphi(x) > a\}, \tag{0.9}$$

where $\varphi(x)$ is a real-valued function and $a$ is some real constant, $\inf \varphi(x) < a < \sup \varphi(x)$.

The following question was studied: under what conditions on the function $\varphi$, as the parameters $n$, $N$ vary in the central region, $n/N \to \gamma$, do there exist, for all sufficiently large values of the parameters, non-negative integers $k_0, k_1, \dots, k_n$ such that $k_0 + k_1 + \dots + k_n = N$, $k_1 + 2k_2 + \dots + nk_n = n$ and

$$\varphi\Big(\frac{k_0}{N}, \frac{k_1}{N}, \dots, \frac{k_n}{N}, 0, 0, \dots\Big) > a.$$

It is proved that for this it suffices to require that the function $\varphi$ be non-extremal, compact and continuous in the metric $\rho(x, y)$, and also that for at least one point $x$ satisfying (0.9) there exist, for some $\varepsilon > 0$, a finite moment of order $1 + \varepsilon$, and $x_\nu > 0$ for every $\nu = 0, 1, \dots$

In the second chapter, we study the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of functions of $\mu = (\mu_0, \dots, \mu_n, 0, \dots)$, the numbers of cells with a given filling, in the central region of the parameters $N$, $n$. The rough asymptotics of the probabilities of large deviations is sufficient for studying the indices of goodness-of-fit criteria.

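Let the random variables $\xi_\nu$ in (0.2) be identically distributed,

$$P(\xi_1 = k) = p_k, \quad k = 0, 1, \dots,$$

and let $P(z)$, the generating function of the random variable $\xi_1$, converge in a circle of radius $1 < R < \infty$. Following /38/, for $0 < z < R$ we denote by $\xi(z)$ a random variable such that

$$P(\xi(z) = k) = \frac{p_k z^k}{P(z)}, \quad k = 0, 1, \dots$$

Denote $p_k(z) = P(\xi(z) = k)$ and $p(z) = (p_0(z), p_1(z), \dots)$.

If there is a solution $z_\gamma$ of the equation $M\xi(z) = \gamma$, then it is unique /38/. Everywhere below we will assume that $p_k > 0$, $k = 0, 1, \dots$

The moment condition of order $1 + \varepsilon$ for a point $x$ reads

$$M_{1+\varepsilon} = \sum_{i=0}^{\infty} i^{1+\varepsilon} x_i < \infty. \tag{0.10}$$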

In the first subsection of the first section of the second chapter, the asymptotics of the logarithms of probabilities of the form

$$\ln P(\mu_0 = k_0, \dots, \mu_n = k_n)$$

is found. The following theorem is proved.

Theorem 2. A rough local theorem on the probabilities of large deviations. Let $n, N \to \infty$ so that $n/N \to \gamma$, $0 < \gamma < \infty$, let there exist $z_\gamma$, the root of the equation $M\xi(z) = \gamma$, and let the random variable $\xi(z_\gamma)$ have positive variance. Then for any $k \in \Omega(n, N)$

$$\ln P(\mu = k) = -N J\Big(\frac{k}{N}, p(z_\gamma)\Big) + O(\sqrt{N} \ln N).$$

The statement of the theorem follows directly from the formula for the joint distribution of $\mu_0, \dots, \mu_n$ in /26/ and the following estimate: if non-negative integers $k_1, \dots, k_n$ satisfy the condition

$$k_1 + 2k_2 + \dots + nk_n = n,$$

then the number of nonzero values among them is $O(\sqrt{n})$. This is a rough estimate that does not claim to be new. The number of nonzero $\mu_r$ in generalized allocation schemes does not exceed the maximum filling of the cells, which in the central region, with probability tending to 1, does not exceed $O(\ln n)$ /25/, /27/. Nevertheless, the estimate $O(\sqrt{n})$ holds with probability 1, and it is sufficient for obtaining the rough asymptotics.
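The $O(\sqrt{n})$ estimate follows from a simple counting argument, sketched here with a hypothetical function name: making $m$ of the values $k_r$ nonzero costs at least $1 + 2 + \dots + m = m(m+1)/2$ in the weighted sum, and any surplus can be absorbed by increasing $k_1$, so the largest possible number of nonzero values is the largest $m$ with $m(m+1)/2 \le n$.

```python
import math

def max_nonzero_counts(n):
    """Largest number of nonzero k_r among solutions of k_1 + 2*k_2 + ... + n*k_n = n."""
    # Cheapest way to make m of the k_r nonzero: k_1 = ... = k_m = 1, costing m(m+1)/2.
    m = 0
    while (m + 1) * (m + 2) // 2 <= n:
        m += 1
    return m

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, max_nonzero_counts(n), math.sqrt(2 * n))
```

Because $m(m+1)/2 \le n$ implies $m < \sqrt{2n}$, the count is always below $\sqrt{2n}$, in line with the estimate used in the proof.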

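In the second subsection of the first section of the second chapter, the value of the limit

$$\lim_{N \to \infty} -\frac{1}{N} \ln P\Big(\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big) > a_N\Big)$$

is found, where $a_N$ is a sequence of real numbers converging to some $a \in \mathbb{R}$, and $\varphi(x)$ is a real-valued function. The following theorem is proved.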

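Theorem 3. A rough integral theorem on the probabilities of large deviations. Let the conditions of Theorem 2 be satisfied, and for some $r > 0$, $\varsigma > 0$ let the real function $\varphi(x)$ be compact and uniformly continuous in the metric $\rho$ on the set

$$A = O_{r+\varsigma}(p(z_\gamma)) \cap \Omega_{[0, \gamma+\varsigma]}$$

and satisfy the non-extremality condition on the set $\Omega_\gamma$. If for some constant $a$ such that

$$\inf_{x \in \Omega_\gamma} \varphi(x) < a < \sup_{x \in \Omega_\gamma} \varphi(x)$$

there exists a vector $p_a \in \Omega_\gamma \cap O_r(p(z_\gamma))$ such that

$$\varphi(p_a) \ge a \quad \text{and} \quad J(\{x : \varphi(x) \ge a,\ x \in \Omega_\gamma\}, p(z_\gamma)) = J(p_a, p(z_\gamma)),$$

then for any sequence $a_N$ converging to $a$

$$\lim_{N \to \infty} -\frac{1}{N} \ln P\Big(\varphi\Big(\frac{\mu_0}{N}, \frac{\mu_1}{N}, \dots\Big) > a_N\Big) = J(p_a, p(z_\gamma)). \tag{0.11}$$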

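Under additional restrictions on the function $\varphi(x)$, the information distance $J(p_a, p(z_\gamma))$ in (2.3) can be calculated more specifically. Namely, the following theorem is true.

Theorem 4. On the information distance. Let, for some $0 < \gamma < \infty$ and some $r > 0$, $\varsigma > 0$, the real function $\varphi(x)$ and its first-order partial derivatives be compact and uniformly continuous in the generalized metric $\rho(x, y)$ on the set

$$A = O_r(p) \cap \Omega_{[0, \gamma+\varsigma]}, \quad p \in \Omega_\gamma;$$

let there exist $T > 0$, $R > 0$ such that for all $|t| < T$, $0 < z < R$, $x \in A$

$$\sum_{\nu=0}^{\infty} p_\nu z^\nu \exp\Big(t \frac{\partial}{\partial x_\nu} \varphi(x)\Big) < \infty, \tag{0.12}$$

$$\sum_{\nu=0}^{\infty} p_\nu z^\nu \Big|\frac{\partial}{\partial x_\nu} \varphi(x)\Big| \exp\Big(t \frac{\partial}{\partial x_\nu} \varphi(x)\Big) < \infty,$$

and, for some $\varepsilon > 0$,

$$\sum_{\nu=0}^{\infty} p_\nu \nu^{1+\varepsilon} z^\nu \exp\Big(t \frac{\partial}{\partial x_\nu} \varphi(x)\Big) < \infty, \tag{0.13}$$

and let there exist a unique vector $x(z, t)$ satisfying the system of equations

$$x_\nu(z, t) = p_\nu z^\nu \exp\Big\{t \frac{\partial}{\partial x_\nu} \varphi(x(z, t))\Big\}, \quad \nu = 0, 1, \dots;$$

let the function $\varphi(x)$ satisfy the non-extremality condition on the set $A$, let $a$ be some constant,

$$\varphi(p) < a < \sup_{0 < z < R,\ |t| < T} \varphi(x(z, t)),$$

and let there exist $z_a$, $t_a$ such that

$$\sum_{\nu=0}^{\infty} \nu p_\nu(z_a, t_a) = \gamma, \qquad \varphi(p(z_a, t_a)) = a.$$

Then $p(z_a, t_a) \in \Omega_\gamma$ and

$$J(\{x \in A : \varphi(x) = a\}, p) = J(p(z_a, t_a), p) = \gamma \ln z_a + t_a \sum_{\nu=0}^{\infty} p_\nu(z_a, t_a) \frac{\partial}{\partial x_\nu} \varphi(x(z_a, t_a)) - \ln \sum_{\nu=0}^{\infty} p_\nu z_a^\nu \exp\Big(t_a \frac{\partial}{\partial x_\nu} \varphi(p(z_a, t_a))\Big).$$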

If the function $\varphi(x)$ is a linear function, and the function $f(x)$ is defined by means of equality (0.5), then condition (0.12) becomes the Cramer condition for the random variable $f(\xi(z))$. Condition (0.13) is a form of condition (0.10) and is used to prove that regions of the form $\{x \in \Omega_\gamma : \varphi(x) > a\}$ contain at least one point of $\Omega(n, N)$ for all sufficiently large $n$, $N$.

Let $h(n, N) = (h_1, \dots, h_N)$ be the frequency vector in the generalized allocation scheme (0.2). As a consequence of Theorems 3 and 4, the following theorem is formulated.

Theorem 5. A rough integral theorem on the probabilities of large deviations of symmetric separable statistics in a generalized allocation scheme.

Let n, N → ∞ so that n/N → γ, 0 < γ < ∞; let z_γ be a root of the equation Mξ(z) = γ; let the random variable ξ(z_γ) have positive variance and maximal span of the distribution equal to 1; let a be some constant, f(x) a real function, a < Mf(ξ(z_γ)); and let there exist T > 0, R > 0 such that for all |t| < T, 0 < z < R,

Σ_{ν=0}^∞ p_ν z^ν e^{t f(ν)} < ∞,

and let there exist z_a, t_a such that

Σ_{ν=0}^∞ ν p_ν(z_a, t_a) = γ, Σ_{ν=0}^∞ f(ν) p_ν(z_a, t_a) = a.

Then for any sequence a_N converging to a,

lim −(1/N) ln P((1/N) Σ f(h_ν) > a_N) = J(p(z_a, t_a), p(z_γ))

= γ ln z_a + t_a a − ln Σ_{ν=0}^∞ p_ν z_a^ν e^{t_a f(ν)}.

This theorem was first proved by A. F. Ronzhin in /38/ using the saddle point method.

In the second section of the second chapter, we study the probabilities of large deviations of separable statistics in generalized allocation schemes when Cramér's condition fails for the random variable f(ξ(z)). Cramér's condition for f(ξ(z)) is not satisfied, in particular, if ξ(z) is a Poisson random variable and f(x) = x². Note that Cramér's condition for the separable statistics themselves in generalized allocation schemes is always satisfied, since for any fixed n, N the number of possible outcomes in these schemes is finite.
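The failure of Cramér's condition in the Poisson example can be checked directly. The following is a short sketch, assuming ξ is Poisson with parameter λ > 0 (the symbol λ is introduced here only for illustration) and f(x) = x²:

```latex
\mathbf{E}\, e^{t f(\xi)}
  = \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^{k}}{k!}\, e^{t k^{2}},
  \qquad t > 0 .
% Logarithm of the k-th term of the series, using \ln k! \le k \ln k for k \ge 1:
\ln\!\Bigl(\frac{\lambda^{k}}{k!}\, e^{t k^{2}}\Bigr)
  \;\ge\; t k^{2} + k \ln \lambda - k \ln k
  \;\xrightarrow[k \to \infty]{}\; +\infty .
```

The terms of the series therefore do not tend to zero, so the series diverges for every t > 0 and the exponential moment required by Cramér's condition does not exist.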

As noted in /2/, if Cramér's condition is not satisfied, then finding the asymptotics of the probabilities of large deviations of sums of identically distributed random variables requires additional regular-variation conditions on the distribution of the summands. In this work, the case corresponding to condition (3) in /2/, that is, the semi-exponential case, is considered. Let P(ξ_1 = k) > 0 for all k = 0, 1, ..., and let the function p(k) = −ln P(ξ_1 = k) be extendable to a function of a continuous argument that is regularly varying of order p, 0 < p < ∞ /45/, that is, a positive function such that p(tx)/p(t) → x^p as t → ∞.

Let the function f(x) for sufficiently large values ​​of the argument be a positive, strictly increasing, regularly varying order function.

On the rest of the real axis, ip(x) can be given in an arbitrary bounded measurable way.

Then s. V. /(£i) has moments of any order and does not satisfy Cramer's condition, p(x) = o(x) as x -> ω, and the following theorem holds. fg^ktion is monotonically non-increasing, n, N -> oo, so that jj - A, 0< Л < оо; гд - единственный корень уравнения M^i(^) = Л, тогда для любого с >b(z\), where b(z) = M/(£i(.z)), there exists the limit CN) = -(c - b(z\))4.

It follows from Theorem 6 that, if Cramér's condition is not satisfied, then lim_{N→∞} (1/N) ln P(L_N(h(n, N)) > cN) = 0, which proves the conjecture stated in /39/. Thus, the index of a goodness-of-fit criterion in generalized allocation schemes is always equal to zero when Cramér's condition is not met. At the same time, criteria with a non-zero index are constructed in the class of criteria satisfying Cramér's condition. From this we can conclude that using criteria whose statistics do not satisfy Cramér's condition, for example, the chi-square test in a multinomial scheme, to construct goodness-of-fit tests for hypotheses with non-approaching alternatives is asymptotically inefficient in this sense. A similar conclusion was reached in /54/ from a comparison of the chi-square statistic and the maximum likelihood ratio statistic in a multinomial scheme.

In the third chapter, we solve the problem of constructing goodness-of-fit criteria with the largest value of the criterion index (the largest value of the lower index of the criterion) for testing hypotheses in generalized allocation schemes. Based on the results of the first and second chapters on the properties of entropy functions, information distance, and probabilities of large deviations, a function of the form (0.4) is found such that the goodness-of-fit criterion built on it has the largest value of the exact lower index in the class of criteria under consideration. The following theorem is proved.

Theorem 7. On the existence of an index. Let the conditions of Theorem 3 be satisfied, 0< /3 < 1, Н = Hp(i),Hp(2>,. is a sequence of alternative distributions, a,φ((3, N) is the maximum number for which, under the hypothesis Нр<ло выполнено неравенство существует предел lim^-оо о>φ(P, N) - a. Then at the point (/3, H) there exists an index of the criterion φ

3ff, H) = 3((φ(x) >a, x£ ^.PW).

sh)<ШН)>where w/fo fh h v^l ^

The Conclusion relates the results obtained to the general goal and the specific tasks set in the dissertation, formulates conclusions based on the results of the research, and indicates the scientific novelty and the theoretical and practical value of the work, as well as specific scientific problems identified by the author whose solution appears relevant.

Brief review of the literature on the research topic. The dissertation considers the problem of constructing goodness-of-fit criteria in generalized allocation schemes with the largest value of the criterion index in the class of functions of the form (0.4) with non-approaching alternatives.

Generalized allocation schemes were introduced by V. F. Kolchin in /24/. The quantities μ_r in the multinomial scheme were called the numbers of cells with r particles and were studied in detail in the monograph by V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov /27/. The quantities μ_r in generalized allocation schemes were studied by V. F. Kolchin in /25/, /26/. Statistics of the form (0.3) were first considered by Yu. I. Medvedev in /30/ and were called separable (additively separable) statistics. If the functions f_ν in (0.3) do not depend on ν, such statistics were called symmetric separable statistics in /31/. The asymptotics of the moments of separable statistics in generalized allocation schemes was obtained by G. I. Ivchenko in /9/. Limit theorems for a generalized allocation scheme were also considered in /23/. Reviews of results on limit theorems and goodness of fit in discrete probabilistic schemes of type (0.2) were given by V. A. Ivanov, G. I. Ivchenko, and Yu. I. Medvedev in /8/ and by G. I. Ivchenko, Yu. I. Medvedev, and A. F. Ronzhin in /14/. Goodness-of-fit criteria for generalized allocation schemes were considered by A. F. Ronzhin in /38/.

In these works, the properties of statistical tests were compared in terms of relative asymptotic efficiency. Two cases were considered: approaching (contiguous) hypotheses, with efficiency in the sense of Pitman, and non-approaching hypotheses, with efficiency in the sense of Bahadur, Hodges-Lehmann, and Chernoff. The connection between the various types of relative efficiency of statistical criteria is discussed, for example, in /49/. As follows from the results of Yu. I. Medvedev in /31/ on the distribution of separable statistics in a multinomial scheme, the test based on the chi-square statistic has the highest asymptotic power under approaching hypotheses in the class of separable statistics of the outcome frequencies in a multinomial scheme. This result was generalized by A. F. Ronzhin to schemes of type (0.2) in /38/. I. I. Viktorova and V. P. Chistyakov in /4/ constructed an optimal criterion for a multinomial scheme in the class of linear functions of μ_r. A. F. Ronzhin in /38/ constructed a criterion that, for a sequence of alternatives not approaching the null hypothesis, minimizes the logarithmic rate at which the probability of an error of the first kind tends to zero in the class of statistics of the form (0.6). A comparison of the relative performance of the chi-square statistic and the maximum likelihood ratio statistic for approaching and non-approaching hypotheses was made in /54/.

The dissertation considers the case of non-approaching hypotheses. The study of the relative statistical efficiency of criteria under non-approaching hypotheses requires the study of the probabilities of superlarge deviations, of order O(√n). Such a problem was first solved for a multinomial distribution with a fixed number of outcomes by I. N. Sanov in /40/. The asymptotic optimality of goodness-of-fit criteria for testing simple and composite hypotheses for a multinomial distribution with a finite number of outcomes and non-approaching alternatives was considered in /48/. Properties of the information distance were previously considered by Kullback and Leibler /29/, /53/ and I. N. Sanov /40/, as well as by Hoeffding /48/. In these papers, the continuity of the information distance was considered on finite-dimensional spaces in the Euclidean metric. A number of authors also considered sequences of spaces of increasing dimension, for example, Yu. V. Prokhorov /37/ or V. I. Bogachev and A. V. Kolesnikov /1/. Rough (up to logarithmic equivalence) theorems on the probabilities of large deviations of separable statistics in generalized allocation schemes under Cramér's condition were obtained by A. F. Ronzhin in /38/. A. N. Timashev in /42/, /43/ obtained exact (up to equivalence) multidimensional integral and local limit theorems on the probabilities of large deviations of the vector (μ_{r_1}(n, N), ..., μ_{r_s}(n, N)), where s, r_1, ..., r_s are fixed integers, 0 ≤ r_1 < ... < r_s.

The study of the probabilities of large deviations when Cramer's condition is not met for the case of independent random variables was carried out in the works of A. V. Nagaev /35/. The method of conjugate distributions is described by Feller /45/.

Statistical problems of hypothesis testing and parameter estimation in a sampling scheme without replacement were considered, in a slightly different formulation, by G. I. Ivchenko, V. V. Levin, and E. E. Timonina /10/, /15/, where estimation problems were solved for a finite population with an unknown number of elements and the asymptotic normality of multivariate S-statistics from s independent samples in a sampling scheme without replacement was proved. Random variables associated with repetitions in sequences of independent trials were studied by A. M. Zubkov, V. G. Mikhailov, and A. M. Shoitov in /6/, /7/, /32/, /33/, /34/. The main statistical problems of estimation and hypothesis testing within the general Markov-Pólya model were analyzed by G. I. Ivchenko and Yu. I. Medvedev in /13/; a probabilistic analysis of this model was given in /11/. A method for specifying non-equiprobable measures on a set of combinatorial objects that is not reducible to a generalized allocation scheme (0.2) was described by G. I. Ivchenko and Yu. I. Medvedev in /12/. A number of problems in probability theory in which the answer can be obtained by calculations with recurrence formulas are indicated by A. M. Zubkov in /5/.

The inequalities for the entropy of discrete distributions were obtained in /50/ (quoted from an abstract by A. M. Zubkov in RZhMat). If (pn)^Lo is a probability distribution, oo

Pp \u003d E Rk, k \u003d tg

A = supp^Pn+i< оо (0.14) п>0 and

F(x) = (x + 1) ln(x + 1) - x ln x, then for the entropy R of this probability distribution

00 i \u003d - 5Z Pk ^ Pk k \u003d 0, the inequalities are valid -L 1 00 00 P

I + (In -f-) £ (Arp - Rp + 1)< F(А) < Я + £ (АРп - P„+i)(ln

L D p \u003d P -t p.4-1 and inequalities turn into equalities if

p_n = A^n / (A + 1)^(n+1), n ≥ 0. (0.15)

Note that the extremal distribution (0.15) is a geometric distribution with the expectation A, and the function F(A) of the parameter (0.14) coincides with the function of the expectation in Theorem 1.
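The remark about the geometric distribution can be checked numerically. The sketch below assumes the geometric distribution is parametrized as p_n = A^n / (A + 1)^(n+1), n ≥ 0, which has expectation A; the function names and the value A = 2.5 are hypothetical and chosen only for illustration. It compares the entropy of a truncated version of this distribution with F(A) = (A + 1) ln(A + 1) - A ln A.

```python
import math


def F(x):
    """F(x) = (x + 1) ln(x + 1) - x ln x, with F(0) taken as 0."""
    if x == 0:
        return 0.0
    return (x + 1) * math.log(x + 1) - x * math.log(x)


def geometric_pmf(mean, n_terms):
    """First n_terms probabilities p_n = mean**n / (mean + 1)**(n + 1), n >= 0."""
    q = mean / (mean + 1.0)          # ratio of consecutive probabilities
    return [(1.0 - q) * q ** n for n in range(n_terms)]


def entropy(probs):
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


A = 2.5                               # hypothetical expectation
probs = geometric_pmf(A, 2000)        # truncation; the neglected tail is negligible
mean_numeric = sum(n * p for n, p in enumerate(probs))

print("numeric mean:", mean_numeric)
print("entropy     :", entropy(probs))
print("F(A)        :", F(A))
```

The two printed entropy values should agree to many decimal places, which is the equality case described in the text.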


Dissertation conclusion on the topic "Probability Theory and Mathematical Statistics", Kolodzei, Alexander Vladimirovich

3.4. Conclusions

In this chapter, based on the results of the previous chapters, it was possible to construct a goodness-of-fit test for testing hypotheses in generalized allocation schemes with the highest logarithmic rate of convergence to zero of the probability of a type I error, for a fixed probability of a type II error and non-approaching alternatives.

Conclusion

The purpose of the dissertation was to construct goodness-of-fit criteria for testing hypotheses in a sampling scheme without replacement from an urn containing balls of two colors. The author decided to study statistics based on the frequencies of distances between balls of the same color. In this formulation, the problem was reduced to testing hypotheses in a suitable generalized allocation scheme.

In the dissertation:

The properties of entropy and informational distance of discrete distributions with an unlimited number of outcomes with a limited mathematical expectation are investigated;

A rough (up to logarithmic equivalence) asymptotics for the probabilities of large deviations of a wide class of statistics in a generalized allocation scheme is obtained;

On the basis of the obtained results, a criterion function with the highest logarithmic rate of convergence to zero of the probability of an error of the first kind is constructed for a fixed probability of an error of the second kind and non-approaching alternatives;

It is proved that statistics that do not satisfy the Cramer condition have a lower rate of tending to zero of the probabilities of large deviations compared to statistics that satisfy such a condition.

The scientific novelty of the work is as follows.

The concept of a generalized metric is introduced: a function that admits infinite values and satisfies the axioms of identity, symmetry, and the triangle inequality. A generalized metric is found, and sets are indicated on which the entropy and information distance functions, defined on a family of discrete distributions with a countable number of outcomes, are continuous in this metric;

In the generalized allocation scheme, a rough (up to logarithmic equivalence) asymptotics is found for the probabilities of large deviations of statistics of the form (0.4) satisfying the corresponding form of Cramer's condition;

In the generalized allocation scheme, a rough (up to logarithmic equivalence) asymptotics is found for the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition;

In the class of criteria of the form (0.7), a criterion with the largest value of the criterion index is constructed.

In the paper, a number of questions about the behavior of large deviation probabilities in generalized allocation schemes are solved. The results obtained can be used in the educational process in the specialties of mathematical statistics and information theory, in the study of statistical procedures for the analysis of discrete sequences and were used in /3/, /21/ when justifying the security of one class of information systems.

However, a number of questions remain open. The author limited himself to the central zone of variation of the parameters n, N of generalized schemes for allocating n particles to N cells. The case where the support of the distribution of the random variables generating the generalized allocation scheme (0.2) is not a set of the form r, r + 1, r + 2, ... was not considered. For practical application of criteria built on the proposed function with the maximum value of the index, its distribution must be studied both under the null hypothesis and under alternatives, including approaching ones. It is also of interest to carry the developed methods over, and to generalize the results obtained, to probabilistic schemes other than generalized allocation schemes.

If - the frequencies of the distances between the outcome numbers 0 in the binomial scheme with the probabilities of outcomes r0> 1 - Rho, then it can be shown that in this case

Pb = kh.t fin = kn) = I(± iki = n)(kl + --, (3.3) v=\ K\ \ . Kn\ where

O* = Po~1(1 ~Po),v =

From the analysis of the formula for the joint distribution of the quantities μ_r in a generalized allocation scheme, proved in /26/, it follows that the distribution (3.3) cannot, in general, be represented as the joint distribution of the quantities μ_r in any generalized scheme of allocating particles to cells. This distribution is a special case of the distributions on a set of combinatorial objects introduced in /12/. Carrying the results of the dissertation for generalized allocation schemes over to this case, which was discussed in /52/, appears to be an urgent task.

If the number of outcomes in a sampling scheme without replacement or in a multinomial allocation scheme is greater than two, then the joint distribution of the frequencies of distances between adjacent identical outcomes can no longer be represented in such a simple way. So far, it has been possible to calculate only the mathematical expectation and the variance of the number of such distances /51/.

List of references for dissertation research Candidate of Physical and Mathematical Sciences Kolodzey, Alexander Vladimirovich, 2006

1. V. I. Bogachev and A. V. Kolesnikov, “Nonlinear transformations of convex measures and the entropy of Radon-Nikodym densities,” Dokl. - 2004. - T. 207. - 2. - S. 155 - 159.

2. V. V. Vidyakin and A. V. Kolodzey, “Statistical detection of covert channels in data transmission networks,” Tez. report II Intern. conf. "Information systems and technologies IST" 2004 "(Minsk, 8-10 Oct. 2004) Minsk: BGU, 2004. - Part 1. - P. 116 - 117.

3. I. I. Viktorova and V. P. Chistyakov, “Some generalizations of the empty box criterion,” Teor. Veroyatnost. and its application. - 1966. - T. XI. - 2. S. 306-313.

4. A. M. Zubkov, “Recursive Formulas for Computing Functionals of Ods of Discrete Random Variables,” Obozrenie Prikl. and industrial math. 1996. - T. 3. - 4. - S. 567 - 573.

5. A. M. Zubkov and V. G. Mikhailov, “Limit distributions of random variables associated with long repetitions in a sequence of independent trials,” Teor. Veroyatnost. and its application. - 1974. - T. XIX. 1. - S. 173 - 181.

6. A. M. Zubkov and V. G. Mikhailov, “On repetitions of s-strings in a sequence of independent variables,” Teor. Veroyatnost. and its application. - 1979. T. XXIV. - 2. - S. 267 - 273.

7. V. A. Ivanov, G. I. Ivchenko, and Yu. I. Medvedev, “Discrete Problems in Probability Theory,” Itogi Nauki i Tekhniki. Ser. theory of probability, math. statistician, theor. cybern. T. 23. - M.: VINITI, 1984. S. 3 -60.

8. G. I. Ivchenko, “On the moments of separable statistics in a generalized allocation scheme,” Math. notes. 1986. - T. 39. - 2. - S. 284 - 293.

9. G. I. Ivchenko and V. V. Levin, “Asymptotic normality in a selection scheme without replacement,” Teor. Veroyatnost. and its applied. - 1978.- T. XXIII. 1. - S. 97 - 108.

10. G. I. Ivchenko and Yu. I. Medvedev, “On the Markov-Poya urn scheme: from 1917 to the present day,” Obozrenie prikl. and industrial math. - 1996.- T. 3. 4. - S. 484-511.

11. G. I. Ivchenko and Yu. I. Medvedev, “Random combinatorial objects,” Dokl. 2004. - T. 396. - 2. - S. 151 - 154.

12. G. I. Ivchenko and Yu. I. Medvedev, “Statistical Problems Related to the Organization of Control over the Processes of Generation of Discrete Random Sequences,” Diskretn. math. - 2000. - T. 12. - 2. S. 3 - 24.

13. G. I. Ivchenko, Yu. I. Medvedev, and A. F. Ronzhin, “Separable statistics and goodness of fit tests for polynomial samples,” Trudy Mat. Institute of the Academy of Sciences of the USSR. 1986. - T. 177. - S. 60 - 74.

14. G. I. Ivchenko and E. E. Timonina, “On estimation when choosing from a finite population,” Math. notes. - 1980. - T. 28. - 4. - S. 623 - 633.

15. A. V. Kolodzei, “Large deviation probabilities theorem for separable statistics that do not satisfy the Cramer condition,” Diskretn. math. 2005. - T. 17. - 2. - S. 87 - 94.

16. A. V. Kolodzei, “Entropy of Discrete Distributions and Probabilities of Large Deviations of Functions from Filling Cells in Generalized Allocation Schemes,” Obozrenie Prikl. and industrial math. - 2005. - T. 12. 2. - S. 248 - 252.

17. Kolodzey A. V. Statistical criteria for detecting covert channels based on changing the order of messages // Research work "Apology": Report / FSTEC RF, Head A. V. Knyazev. Inv. 7 chipboard - M., 2004. - S. 96 - 128.

18. Kolodzey A. V., Ronzhin A. F. On some statistics related to checking the homogeneity of random discrete sequences // Research work "Development of mathematical problems of cryptography" N 4 2004.: Report / AC RF, - M., 2004 .

19. A. V. Kolchin, “Limit theorems for a generalized allocation scheme,” Diskretn. math. 2003. - T. 15. - 4. - S. 148 - 157.

20. V. F. Kolchin, “One class of limit theorems for conditional distributions,” Lit. math. Sat. - 1968. - T. 8. - 1. - S. 111 - 126.

21. V. F. Kolchin, Random Graphs. 2nd ed. - M.: FIZMATLIT, 2004. - 256s.

22. V. F. Kolchin, Random mappings. - M.: Nauka, 1984. - 208s.

23. V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov, Random Allocations. M.: Nauka, 1976. - 223p.

24. H. Cramér, Uspekhi Mat. Nauk. - 1944. - Vyp. 10. - S. 166 - 178.

25. Kullback S. Information Theory and Statistics. - M.: Nauka, 1967. - 408s.

26. Yu. I. Medvedev, “Some theorems on the asymptotic distribution of the chi-square statistic,” Dokl. Academy of Sciences of the USSR. - 1970. - T. 192. 5. - S. 997 - 989.

27. Yu. I. Medvedev, Separable statistics in a polynomial scheme I; II. // Theory Prob. and her example. - 1977. - T. 22. - 1. - S. 3 - 17; 1977. T. 22. - 3. - S. 623 - 631.

28. V. G. Mikhailov, “Limit distributions of random variables associated with multiple long repetitions in a sequence of independent trials,” Teor. Veroyatnost. and its application. - 1974. T. 19. - 1. - S. 182 - 187.

29. V. G. Mikhailov, “Central limit theorem for the number of incomplete long repetitions,” Teor. Veroyatnost. and its application. - 1975. - T. 20. 4. - S. 880 - 884.

30. V. G. Mikhailov and A. M. Shoitov, “Structural equivalence of s-strings in random discrete sequences,” Diskret. math. 2003. - T. 15, - 4. - S. 7 - 34.

31. Nagaev A.V. Integral limit theorems taking into account the probabilities of large deviations. I. // Teor. Veroyatnost. and its applied. -1969. T. 14. 1. - S. 51 - 63.

32. V. V. Petrov, Sums of Independent Random Variables. - M.: Nauka, 1972. 416s.

33. Yu. V. Prokhorov, “Limit theorems for sums of random vectors whose dimension tends to infinity,” Teor. Veroyatnost. and its application. 1990. - T. 35. - 4. - S. 751 - 753.

34. Ronzhin A.F. Criteria for generalized particle placement schemes // Teor. Veroyatnost. and its application. - 1988. - T. 33. - 1. - S. 94 - 104.

35. Ronzhin A.F. A theorem on probabilities of large deviations for separable statistics and its statistical application // Math. notes. 1984. - T. 36. - 4. - S. 610 - 615.

36. I. N. Sanov, “On the probabilities of large deviations of random variables,” Math. Sat. 1957. - T. 42. - 1 (84). - S. I - 44.

37. Seneta E. Regularly varying functions. M.: Nauka, 1985. - 144 p.

38. A. N. Timashev, “Multidimensional integral theorem on large deviations in an equiprobable allocation scheme,” Diskreta, Mat. - 1992. T. 4. - 4. - S. 74 - 81.

39. A. N. Timashev, “Multidimensional local large deviation theorem in an equiprobable allocation scheme,” Diskretn. math. - 1990. T. 2. - 2. - S. 143 - 149.

40. Fedoruk M.V. The saddle-point method. M.: Nauka, 1977. 368s.

41. Feller W. An Introduction to Probability Theory and Its Applications. T. 2. - M.: Mir, 1984. 738s.

42. Shannon C. Mathematical theory of communication // Works on information theory and cybernetics: Per. from English. / M., IL, 1963, p. 243 - 332.

43. Conrad K. Probability Distribution and Maximum Entropy // http://www.math.uconn.edu/~kconrad/blurbs/entropypost.pdf

44. Hoeffding W. Asymptotically optimal tests for multinomial distribution, Ann. Math. statist. 1965. - T. 36. - C. 369 - 408.

45. Inglot T., Kallenberg W. C. M., Ledwina T. Vanishing shortcoming and asymptotic relative efficiency // Ann. statist. - 2000. - T. 28. - C. 215 - 238.

46. Jardas C., Pecaric J., Roki R., Sarapa N., On an inequality for the entropy of probability distribution, Math. Inequal. and Appl. - 2001. T. 4. - 2. - C. 209 - 214. (RZhMat. - 2005. - 05.07-13B.16).

47. Kolodzey A. V., Ronzhin A. F., Goodness of Fit Tests for Random Combinatoric Objects, Tez. report int. conf. Modern Problems and new Trends in Probability Theory, (Chernivtsi, June 19 - 26, 2005) - Kyiv: Institute of Mathematics, 2005. Part 1. P. 122.

48. Kullback S. and Leibler R. A. On information and sufficiency // Ann. Math. statist. 1951. - T. 22. - C. 79 - 86.

49. Quine M.P., Robinson J. Efficiencies of chi-square and likelihood ratio goodness of fit tests, Ann. statist. 1985. - T. 13. - 2. - C. 727 - 742.


Definition. The direction defined by a non-zero vector is called an asymptotic direction with respect to a second-order line if every straight line of this direction (that is, parallel to the vector) either has at most one common point with the second-order line or is contained in it.

? How many common points can a line of the second order and a straight line of asymptotic direction relative to this line have?

In the general theory of second-order lines, it is proved that if

Then the non-zero vector ( defines the asymptotic direction with respect to the line

(general criterion for asymptotic direction).

For second-order lines:

if the line is of elliptic type, then there are no asymptotic directions,

if the line is of hyperbolic type, then there are two asymptotic directions,

if the line is of parabolic type, then there is only one asymptotic direction.

The following lemma turns out to be useful (criterion for the asymptotic direction of a line of parabolic type).

Lemma . Let be a line of parabolic type.

A non-zero vector has an asymptotic direction

relatively . (5)

(Problem. Prove the lemma.)

Definition. A straight line of asymptotic direction is called an asymptote of a second-order line if it either does not intersect this line or is contained in it.

Theorem . If has an asymptotic direction with respect to , then the asymptote parallel to the vector is determined by the equation

We fill in the table.

TASKS.

1. Find the asymptotic direction vectors for the following second order lines:

4 - hyperbolic type, two asymptotic directions.

Let us use the asymptotic direction criterion:

Has an asymptotic direction with respect to the given line 4 .

If =0, then =0, that is, zero. Then Divide by We get a quadratic equation: , where t = . We solve this quadratic equation and find two solutions: t = 4 and t = 1. Then the asymptotic directions of the line .
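The reduction to a quadratic equation in the ratio t can be sketched in code. The block below assumes the standard criterion that a vector (α, β) has asymptotic direction for the line a11 x² + 2 a12 xy + a22 y² + ... = 0 exactly when a11 α² + 2 a12 αβ + a22 β² = 0; the coefficient names, the function name, and the sample coefficients are assumptions made for illustration, since the original formulas are not preserved in this text.

```python
import math


def asymptotic_directions(a11, a12, a22):
    """Direction vectors (alpha, beta) solving
    a11*alpha**2 + 2*a12*alpha*beta + a22*beta**2 = 0.

    Returns 0, 1 or 2 vectors (elliptic, parabolic, hyperbolic type).
    Exact comparison with zero is used, so integer or simple rational
    coefficients are assumed.
    """
    if a11 == 0 and a12 == 0 and a22 == 0:
        raise ValueError("not a second-order line")
    if a11 != 0:
        # Put beta = 1 and solve a11*t**2 + 2*a12*t + a22 = 0 for t = alpha/beta.
        d = a12 * a12 - a11 * a22
        if d < 0:
            return []                              # elliptic type
        if d == 0:
            return [(-a12 / a11, 1.0)]             # parabolic type
        r = math.sqrt(d)
        return sorted([((-a12 - r) / a11, 1.0),
                       ((-a12 + r) / a11, 1.0)])   # hyperbolic type
    # a11 == 0: the form factors as beta * (2*a12*alpha + a22*beta) = 0.
    directions = [(1.0, 0.0)]
    if a12 != 0:
        directions.append((-a22 / (2 * a12), 1.0))
    return directions


# Hypothetical coefficients chosen so that the quadratic is t**2 - 5*t + 4 = 0.
print(asymptotic_directions(1, -2.5, 4))
```

With these hypothetical coefficients the quadratic has the roots t = 1 and t = 4, so the function returns the two directions (1, 1) and (4, 1).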

(Two ways can be considered, since the line is of parabolic type.)

2. Find out if the coordinate axes have asymptotic directions relative to the lines of the second order:

3. Write the general equation of a second order line for which

a) the abscissa axis has an asymptotic direction;

b) Both coordinate axes have asymptotic directions;

c) the coordinate axes have asymptotic directions and O is the center of the line.

4. Write the asymptote equations for the lines:

a) [formula lost in extraction];

5. Prove that if a second-order line has two non-parallel asymptotes, then their intersection point is the center of this line.

Note: Since there are two non-parallel asymptotes, there are two asymptotic directions, then , and, therefore, the line is central.

Write the asymptote equations in general form and the system for finding the center. Everything is then obvious.

6. (No. 920) Write the equation of a hyperbola passing through the point A(0, -5) and having asymptotes x - 1 = 0 and 2x - y + 1 = 0.

Hint. Use the statement of the previous problem.

Homework . , No. 915 (c, e, e), No. 916 (c, d, e), No. 920 (if you didn’t have time);

Cribs;

Silaev, Timoshenko. Practical tasks in geometry,

1 semester P.67, questions 1-8, p.70, questions 1-3 (oral).

DIAMETERS OF SECOND-ORDER LINES.

CONJUGATE DIAMETERS.

An affine coordinate system is given.

Definition. The diameter of a second-order line conjugate to a vector of non-asymptotic direction with respect to this line is the set of midpoints of all chords of the line parallel to this vector.

At the lecture, it was proved that the diameter is a straight line and its equation was obtained

Recommendations: Show (on an ellipse) how it is constructed (set a non-asymptotic direction; draw [two] straight lines of this direction intersecting the line; find the midpoints of the cut off chords; draw a straight line through the midpoints - this is the diameter).

Discuss:

1. Why is a vector of non-asymptotic direction taken in the definition of the diameter? If the students cannot answer, ask them to construct a diameter, for example, for a parabola.

2. Does any line of the second order have at least one diameter? Why?

3. At the lecture it was proved that the diameter is a straight line. The middle of which chord is the point M in the figure?


4. Look at the brackets in equation (7). What do they remind?

Conclusion: 1) each center belongs to each diameter;

2) if there is a straight line of centers, then there is a single diameter.

5. What is the direction of the parabolic line diameters? (Asymptotic)

Proof (probably in a lecture).

Let the diameter d given by equation (7`) be conjugate to a vector of non-asymptotic direction. Then its direction vector

(-(), ). Let us show that this vector has an asymptotic direction. Let us use the criterion of the asymptotic direction vector for a parabolic line (see (5)). We substitute and make sure (do not forget that .

6. How many diameters does a parabola have? Their relative position? How many diameters do the rest of the parabolic lines have? Why?

7. How can the common diameter of some pairs of second-order lines be constructed (see questions 30, 31 below)?

8. We fill in the table, be sure to make drawings.

1. . Write the equation for the set of midpoints of all chords parallel to the vector

2. Write an equation for the diameter d passing through the point K(1,-2) for the line.

Solution steps:

1st way.

1. Determine the type (to know how the diameters of this line behave).

In this case, the line is central, then all diameters pass through the center C.

2. We compose the equation of a straight line passing through two points K and C. This is the desired diameter.

2nd way.

1. We write the equation for the diameter d in the form (7`).

2. Substituting the coordinates of the point K into this equation, we find the relationship between the coordinates of the vector conjugate to the diameter d.

3. We set this vector, taking into account the found dependence, and compose the equation for the diameter d.

In this problem, it is easier to calculate in the second way.

3. . Write the equation for the diameter parallel to the x-axis.

4. Find the midpoint of the chord cut off by the line

on the straight line x + 3y - 12 = 0.

Hint for the solution: Of course, you can find the points of intersection of the given straight line with the second-order line and then the midpoint of the resulting segment. The desire to do so disappears if we take, for example, the straight line x + 3y - 2009 = 0.
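One way to avoid computing the intersection points is to use the fact that the midpoint of a chord lies on the diameter conjugate to the chord's direction. The sketch below intersects the straight line with that diameter. The equation of the second-order line in the problem did not survive, so the ellipse x^2 + 4y^2 - 100 = 0 is a hypothetical stand-in, and the coefficient layout and function name are assumptions of this example.

```python
from fractions import Fraction as F

# Hypothetical second-order line x^2 + 4y^2 - 100 = 0, stored as
# (a11, a12, a22, a13, a23); the free term is not needed for the midpoint.
ELLIPSE = (F(1), F(0), F(4), F(0), F(0))
# Straight line x + 3y - 12 = 0 as (A, B, C).
LINE = (F(1), F(3), F(-12))

def chord_midpoint(conic, line):
    """Intersect the straight line with the diameter conjugate to its direction."""
    a11, a12, a22, a13, a23 = conic
    a, b, c = line
    p1, p2 = -b, a  # direction vector of Ax + By + C = 0
    # Diameter conjugate to (p1, p2): d1*x + d2*y + d3 = 0 (assumed standard form).
    d1 = a11 * p1 + a12 * p2
    d2 = a12 * p1 + a22 * p2
    d3 = a13 * p1 + a23 * p2
    det = d1 * b - d2 * a
    if det == 0:
        # Happens exactly when the direction of the straight line is asymptotic.
        raise ValueError("asymptotic direction: no conjugate diameter")
    x = (d2 * c - d3 * b) / det
    y = (d3 * a - d1 * c) / det
    return (x, y)

midpoint = chord_midpoint(ELLIPSE, LINE)
```

For this hypothetical ellipse the result is (48/13, 36/13). The function returns the intersection of the straight line with the diameter; it does not check that the straight line actually meets the curve in two real points, which holds here.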

Exact Tests provides two additional methods for calculating significance levels for the statistics available through the Crosstabs and Nonparametric Tests procedures. These methods, the exact and Monte Carlo methods, provide a means for obtaining accurate results when your data fail to meet any of the underlying assumptions necessary for reliable results using the standard asymptotic method. They are available only if you have purchased the Exact Tests option.

Example. Asymptotic results obtained from small datasets or sparse or unbalanced tables can be misleading. Exact tests enable you to obtain an accurate significance level without relying on assumptions that might not be met by your data. For example, the results of an entrance exam for 20 fire fighters in a small township show that all five white applicants received a pass result, whereas the results for Black, Asian and Hispanic applicants are mixed. A Pearson chi-square testing the null hypothesis that results are independent of race produces an asymptotic significance level of 0.07. This result leads to the conclusion that exam results are independent of the race of the examinee. However, because the data contain only 20 cases and the cells have expected frequencies of less than 5, this result is not trustworthy. The exact significance of the Pearson chi-square is 0.04, which leads to the opposite conclusion. Based on the exact significance, you would conclude that exam results and race of the examinee are related. This demonstrates the importance of obtaining exact results when the assumptions of the asymptotic method cannot be met. The exact significance is always reliable, regardless of the size, distribution, sparseness, or balance of the data.

Statistics. Asymptotic significance, Monte Carlo approximation with confidence level, or exact significance.

  • Asymptotic. The significance level based on the asymptotic distribution of a test statistic. Typically, a value of less than 0.05 is considered significant. The asymptotic significance is based on the assumption that the data set is large. If the data set is small or poorly distributed, this may not be a good indication of significance.
  • Monte Carlo Estimate. An unbiased estimate of the exact significance level, calculated by repeatedly sampling from a reference set of tables with the same dimensions and row and column margins as the observed table. The Monte Carlo method allows you to estimate exact significance without relying on the assumptions required for the asymptotic method. This method is most useful when the data set is too large to compute exact significance, but the data do not meet the assumptions of the asymptotic method.
  • Exact. The probability of the observed outcome or an outcome more extreme is calculated exactly. Typically, a significance level less than 0.05 is considered significant, indicating that there is some relationship between the row and column variables.
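The Monte Carlo idea described above can be sketched in plain Python. This is not the SPSS implementation: it is a minimal illustration that samples tables with the observed row and column margins by shuffling one column of labels, uses the Pearson chi-square as the test statistic, and reports a normal-approximation 99% confidence interval. The function names, the default sample count, and the data are all hypothetical; the data are not the fire fighter results from the example.

```python
import random

def chi_square(rows, cols):
    """Pearson chi-square statistic for two parallel lists of category labels."""
    n = len(rows)
    row_tot, col_tot, cell = {}, {}, {}
    for r, c in zip(rows, cols):
        row_tot[r] = row_tot.get(r, 0) + 1
        col_tot[c] = col_tot.get(c, 0) + 1
        cell[(r, c)] = cell.get((r, c), 0) + 1
    stat = 0.0
    for r, rt in row_tot.items():
        for c, ct in col_tot.items():
            expected = rt * ct / n
            stat += (cell.get((r, c), 0) - expected) ** 2 / expected
    return stat

def monte_carlo_p(rows, cols, n_samples=10000, seed=0):
    """Estimate the exact significance by sampling tables with fixed margins."""
    rng = random.Random(seed)
    observed = chi_square(rows, cols)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_samples):
        # Shuffling one variable keeps both margins and breaks any association.
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if chi_square(rows, shuffled) >= observed - 1e-12:
            hits += 1
    p_hat = hits / n_samples
    half_width = 2.576 * (p_hat * (1 - p_hat) / n_samples) ** 0.5  # 99% level
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))

# Hypothetical data: 20 cases in four groups, three possible outcomes.
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5
results = (["pass"] * 5
           + ["pass", "pass", "fail", "fail", "none"]
           + ["pass", "fail", "fail", "none", "none"]
           + ["pass", "pass", "fail", "none", "none"])
p_hat, interval = monte_carlo_p(groups, results, n_samples=2000, seed=1)
```

With a fixed seed the estimate is reproducible. Increasing `n_samples` narrows the confidence interval around the exact significance level.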
