SHARE data versions & IDs
Stephanie Stuck, MEA
Antwerpen, February 2008
Data versions and ID-variables
sampid rules (old)
- Digits 1-2: country code (e.g. 23 for Belgium, French speaking)
- Digits 3-5: wave indicator (042 for wave 1 and 062 for wave 2 main survey)
- Digits 6-11: household ID
- Digits 12-13: longitudinal household split indicator; 00 by default. If a respondent moves out, it is set based on respid, e.g. if the ‘moving out’ respondent has respid 01, the indicator is changed from 00 to 01
Examples
- 1104200010000: Austria, starting in wave 1 (longitudinal sample)
- 2306214010300: Belgium (French), starting in wave 2 (refresher)
One needs to combine sampid with the respondent ID (respid) to identify and merge cases on the respondent level.
Merging problems arise especially for split households / ‘moving’ respondents across waves.
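The old sampid layout can be illustrated with a minimal Python sketch that slices the 13-digit string into its four parts. The names `SampId` and `parse_sampid` are hypothetical and chosen only for this illustration; the slide positions are 1-based, so they are shifted to 0-based slices here.

```python
from typing import NamedTuple


class SampId(NamedTuple):
    """Parts of an old-style sampid (hypothetical helper type)."""
    country: str    # digits 1-2
    wave: str       # digits 3-5
    household: str  # digits 6-11
    split: str      # digits 12-13


def parse_sampid(sampid: str) -> SampId:
    """Split a 13-digit sampid according to the old rules."""
    if len(sampid) != 13 or not (sampid.isascii() and sampid.isdigit()):
        raise ValueError(f"expected 13 digits, got {sampid!r}")
    return SampId(sampid[0:2], sampid[2:5], sampid[5:11], sampid[11:13])


# The two examples from the slide
print(parse_sampid("1104200010000"))
print(parse_sampid("2306214010300"))
```

Keeping sampid as a string rather than a number preserves leading zeros in the household part.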
We will change the system and divide sampid into different parts:
- household id (fixed part and split indicator if needed)
- new wave indicator variable ‘wi’ that indicates when a household first entered the sample
New household identifier hhid1 (internal) & hhid (public)
- Digits 1-2: country code in letters, e.g. AT for Austria, Bf for Belgium French speaking (internal)
- Digits 3-8: fixed household ID. This part will not change across waves, even if the household splits.
- Digit 9: one character added to the fixed household ID to identify whether it is an ‘additional’ household that resulted from a split:
  - A for all ‘original’ households (all in wave 1, refresher in wave 2)
  - B is used only if a household has split. A is then still used for the ‘first’ part of the household and B for the ‘splitting part’ (the one that is interviewed second, normally the one that moved out)
  - C is used for the very rare case of a further split-off household, for example when the original household in wave 1 consisted of 3 eligible sisters and split into 3 parts.
Examples for new household id
- AT100100A: Austria, ‘original’ household
- AT100100B: Austria, split off household
- Bf140103A: Belgium French speaking household (internal)
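As a rough sketch of how the new household id could be taken apart, the Python below checks the 2 + 6 + 1 layout and compares the fixed part of two ids. The function names and the regular expression are assumptions made for this illustration, not part of the SHARE tooling.

```python
import re
from typing import NamedTuple

# Two letters, six digits, one split letter (A, B or C).
# The pattern is an assumption derived from the rules on the slide.
HHID_PATTERN = re.compile(r"([A-Za-z]{2})(\d{6})([ABC])")


class HhId(NamedTuple):
    country: str
    fixed_household: str
    split_letter: str


def parse_hhid(hhid: str) -> HhId:
    """Split a new-style household id into its three parts."""
    match = HHID_PATTERN.fullmatch(hhid)
    if match is None:
        raise ValueError(f"not a valid household id: {hhid!r}")
    return HhId(*match.groups())


def same_original_household(a: str, b: str) -> bool:
    """True if both ids share country code and fixed household ID."""
    return parse_hhid(a)[:2] == parse_hhid(b)[:2]


print(parse_hhid("AT100100B"))
print(same_original_household("AT100100A", "AT100100B"))
```

Comparing only the first two parts ignores the split letter, which is what makes an ‘original’ household and its split-off part traceable to each other.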
New person identifier: person1
- Digits 1-2: country code (CC) in letters, e.g. AT for Austria, Bf for Belgium French speaking
- Digits 3-8: fixed household ID; this part will not change across waves.
- Digits 9-10: respondent id, e.g. if respid is 1 it will be 01
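A minimal sketch of how a person1 value could be assembled from the old ids is shown below. It assumes that the fixed household ID equals digits 6-11 of the old sampid (which the example pair 2306214010300 / Bf140103A suggests but the slides do not state), and it only maps the two country codes that appear in the examples. `COUNTRY_LETTERS` and `make_person1` are hypothetical names.

```python
# Only the two codes shown on the slides; a real mapping would need
# every participating country (assumption: not listed here).
COUNTRY_LETTERS = {"11": "AT", "23": "Bf"}


def make_person1(sampid: str, respid: int) -> str:
    """Build a 10-character person id: CC + fixed household ID + respid."""
    if len(sampid) != 13 or not (sampid.isascii() and sampid.isdigit()):
        raise ValueError(f"expected 13 digits, got {sampid!r}")
    if not 0 <= respid <= 99:
        raise ValueError(f"respid must fit in two digits, got {respid}")
    country = COUNTRY_LETTERS[sampid[0:2]]  # KeyError for unmapped codes
    fixed_household = sampid[5:11]          # assumed to carry over unchanged
    return f"{country}{fixed_household}{respid:02d}"


print(make_person1("2306214010300", 1))
```

Because the split indicator is left out, the identifier stays the same when a respondent moves to a split-off household.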
Old and new ids
In addition: A dataset will be generated that shows to which households a respondent belonged during her or his ‘SHARE history’, e.g.:
Data cleaning
- always use the unscrambled version that includes sampid for data cleaning
- use sampid and respid to identify respondents
- generate/compute sampid_original, respid_original and cvid_original before you change ids
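The last rule, keeping the original ids before changing anything, might look like the following Python sketch. It treats one case as a plain dictionary, which is an assumption for illustration only; `keep_original_ids` is a hypothetical helper name.

```python
def keep_original_ids(record: dict) -> dict:
    """Return a copy of the record with *_original backups of the ids.

    Existing backups are left untouched, so running this twice does not
    overwrite the first saved values.
    """
    result = dict(record)
    for name in ("sampid", "respid", "cvid"):
        result.setdefault(f"{name}_original", record[name])
    return result


# Hypothetical example case
case = {"sampid": "1104200010000", "respid": 1, "cvid": 1}
cleaned = keep_original_ids(case)
cleaned["respid"] = 2  # a correction made during cleaning
print(cleaned["respid"], cleaned["respid_original"])
```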