I am new to Stata and hope that one of you can help me with this problem!
I have three data sets that I want to merge. The first two contain the same variables, so I already merged them in a first step. In the second step I want to merge in the third file. After doing so, I end up with more observations than before. Am I doing something wrong?
This is what the first two data sets look like before I merged them with the third one:
    Variable |      Obs         Mean    Std. Dev.     Min     Max
-------------+---------------------------------------------------
     mergeid |        0
      ch001_ |   46,398     2.215785     1.570519      -2      17
        wave |   67,576     1.549781     .4975194       1       2
      br001_ |   46,389     3.113993     1.998745      -2       5
        bmi2 |   66,916     2.694468      .939573      -3       4
       eurod |   65,568     2.322566     2.277759       0      12
       spheu |   30,284     2.304913      .940147      -2       5
isced1997y_r |   30,344      10.5346      8.95301      -2      97
This is what the third data set looks like before the merge:
    Variable |      Obs         Mean    Std. Dev.     Min     Max
-------------+---------------------------------------------------
     mergeid |        0
     country |   28,472      18.6361     5.445199      11      30
         yob |   28,472     1942.239     10.05177    1907    1984
      gender |   28,472      1.56034     .4963544       1       2
  sl_cs004d1 |   27,894     .9609235     .2006878      -2       1
  sl_cs004d2 |   27,894     .9078655     .2938936      -2       1
 sl_cs007dno |   28,391     .2837167     .4594766      -2       1
   sl_cs008_ |   28,392     2.046598     1.235409      -2       5
   sl_rp002_ |   28,418     1.236188     .9437631      -1       5
Now, after I have merged them m:1 using mergeid as the key, this is what I end up with:
    Variable |      Obs         Mean    Std. Dev.     Min     Max
-------------+---------------------------------------------------
     mergeid |        0
      ch001_ |   28,951     2.162792     1.445806      -2      14
        wave |   41,967     1.603188     .4892422       1       2
      br001_ |   27,201     3.108709     1.997605      -2       5
        bmi2 |   41,741     2.730553     .8908915      -3       4
       eurod |   41,193     2.233923     2.189107       0      12
       spheu |   16,616     2.216057     .8818345      -2       5
isced1997y_r |   16,638     10.51301     8.484587      -2      97
     country |   41,967     17.91288     4.886818      11      30
         yob |   41,967     1941.642     9.929027    1907    1978
      gender |   41,967     1.563919     .4959034       1       2
  sl_cs004d1 |   41,162     .9599145     .2029794      -2       1
  sl_cs004d2 |   41,162     .9060055       .29645      -2       1
 sl_cs007dno |   41,866     .2776955     .4567399      -2       1
   sl_cs008_ |   41,868     2.029426     1.233308      -2       5
   sl_rp002_ |   41,906     1.242113     .9549139      -1       5
All variables from the third data set now have more observations than they had before the merge (for example, country went from 28,472 to 41,967). Does anyone know how I can solve this problem?
This is a description of all variables, in case this is helpful.
                storage  display   value
variable name   type     format    label     variable label
---------------------------------------------------------------------------------------------
mergeid         str12    %12s                Person identifier (fix across modules and waves)
ch001_          byte     %10.0f    dkrf      Number of children
wave            float    %9.0g
br001_          byte     %10.0f    yesno     Ever smoked daily
bmi2            byte     %27.0g    bmi2      Bmi categories
eurod           byte     %14.0g    eurod     Depression scale EURO-D - high is depressed
spheu           byte     %10.0g    spheu     Self-perceived health - european version
isced1997y_r    float    %25.0g    iscedy    Respondent: years of education derived from ISCED-97
country         byte     %14.0g    country   Country identifier
yob             int      %10.0g    dkrf      Year of birth of respondent
gender          byte     %10.0g    gender    Gender of respondent
sl_cs004d1      byte     %12.0g    dummi     Lived in hh when ten: biological mother
sl_cs004d2      byte     %12.0g    dummi     Lived in hh when ten: biological father
sl_cs007dno     byte     %12.0g    dummi     Features of accommodation when ten: none of these
sl_cs008_       byte     %58.0g    cs008     Number of books when ten
sl_rp002_       byte     %10.0g    yesno     Ever been married
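
For reference, the merge step described above can be sketched roughly as follows, with a few checks added that might help show where the extra observations come from. The file names waves12.dta and third.dta are placeholders, not the real file names; only mergeid and the m:1 merge type are taken from the description above.

```stata
* Sketch only: waves12.dta and third.dta are placeholder file names.
* Load the combined first two data sets (the master data).
use "waves12.dta", clear

* Report how often each mergeid occurs in the master data.
duplicates report mergeid

* Many-to-one merge on mergeid, with the third data set as the using file.
merge m:1 mergeid using "third.dta"

* _merge records whether a row came from the master only, the using only, or both.
tabulate _merge

* Non-missing counts after the merge, for comparison with the tables above.
summarize country yob gender
```

In an m:1 merge, the values from the using file are copied to every master row that shares the same mergeid, so the output of duplicates report and tabulate _merge should show how many rows each person contributes after the merge.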