By Prem K. Goel

Incomplete-data problems arise naturally in many instances of statistical practice. One class of incomplete-data problems, which is relatively poorly understood by statisticians, is that of merging micro-data files. Many Federal agencies use the methodology of file-merging to create comprehensive files from multiple but incomplete sources of data. The main objective of this endeavor is to perform statistical analyses on the synthetic data set generated by file-merging. In general, these analyses cannot be performed by analyzing the incomplete data sets separately. The validity and the efficacy of the file-merging methodology can be assessed by means of statistical models underlying the mechanisms which may generate the incomplete files. However, a completely satisfactory and unified theory of file-merging has not yet been developed. This monograph is only a modest attempt to fill this gap by unifying known models. Here, we review the optimal properties of some known matching strategies and derive new results thereof. However, a great number of unsolved problems still need the attention of many researchers. One main problem still to be resolved is the development of appropriate inference methodology for merged files, if one insists on using file-merging methodology. If this monograph succeeds in attracting just a few more mathematical statisticians to work on this class of problems, then we will feel that our efforts have been successful.

6) possess a weaker version of the LPQD property. 1. 2..... 11) Then Tlk and 30 are PQD. Proof. Fix k ∈ {1, 2, ..., n-1}. Then. 1. Tlk and 30 are PQD iff Because 3i's are binary random variables. 13) if x ~ 1. 1). For 0 ≤ x2 < 1. (3n>xv;;;; (30=1). It therefore remains to be shown that. 1 ..... k. 52 2. 14) in a more useful form P(11 k ~ '15n = 0) ~ P(11 k ~ '15n = 1), ,= 0, 1, 2, ..., k. 16) by means of a combinatorial argument. Since the event (Sn n-l . 17) Ja = (11 k ~ , ,Rn=a), a = 1, 2, ..., n-1.

B = (bt. , is the matching strategy we started with. W n> . •n). Then. •Wan ) is the indicator function of the event or, equivalently, ) of Us. the indicator function of the event [Gn(Uaj - ε) ≤ φ(Raj)/n ≤ Gn(Uaj + ε)]. Since Gn⁻¹(k/n) = U(k), •n. •W an) is 1 iff |U(φ(Raj)) - Uaj| ≤ ε. Consequently. 8. 10a) Similarly. 8. 9). 10b). We shall now fix the parameters describing dependence in the population of W, and allow n to tend to infinity in order to study the behavior of N(,· ,ε). In view of the fact that federal files often consist of a large number of records.
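The indicator condition in this excerpt (a matched value counts when it lies within a tolerance of the true one) can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the monograph's procedure: the bivariate normal population, the correlation parameter `rho`, the sorted (rank-to-rank) matching rule, and all function names are hypothetical choices made for illustration.

```python
import random


def count_eps_correct(true_u, matched_u, eps):
    """Count positions where the matched value is within eps of the true one."""
    return sum(1 for u, m in zip(true_u, matched_u) if abs(m - u) <= eps)


def sorted_matching(t_values, u_values):
    """Pair the k-th smallest t with the k-th smallest u.

    Returns a list `matched` where matched[i] is the u value assigned
    to t_values[i]. This rank-to-rank rule is an assumed strategy.
    """
    order_t = sorted(range(len(t_values)), key=lambda i: t_values[i])
    sorted_u = sorted(u_values)
    matched = [0.0] * len(t_values)
    for rank, i in enumerate(order_t):
        matched[i] = sorted_u[rank]
    return matched


def simulate(n, rho, eps, seed=0):
    """Draw n correlated pairs, lose the pairing, re-match, and count hits.

    Assumption: (T, U) is standard bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    t, u = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        t.append(z1)
        u.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
    shuffled_u = u[:]
    rng.shuffle(shuffled_u)  # the true pairing is no longer available
    matched = sorted_matching(t, shuffled_u)
    return count_eps_correct(u, matched, eps)


if __name__ == "__main__":
    # Dependence is held fixed while the file size n grows.
    for n in (100, 1000, 10000):
        print(n, simulate(n, rho=0.8, eps=0.1))
```

Running the script with the dependence fixed and n increasing gives a rough empirical feel for how the count behaves in large files; the printed numbers depend on the seed and on the assumed population, so they carry no theoretical weight.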

An ). g(b1, ..., bn) is any point in R2n. 6;1 we get (g(T1, ..., Tn), g(U1, ..., Un)) =d (g(U1, ..., Un), g(T1, ... 9) Hence, (g(T), g(U)) is an exchangeable pair of random variables. Uj). 2..... d. vectors ~0i, ~1i, ~2i). 2..... n and a measurable function f such that (i) For each j, ~1j, ~2j are i.i.d. univariate random variables and the vector ~0j is independent of ~1j and ~2j. (ii) For each j. 10) Introducing the random variables. 6 An Optimality Property of the Matching Strategy φ* -~o * = (~12 .....

