Share data versions & ids Stephanie Stuck

SHARE data versions & IDs

  • Stephanie Stuck

  • MEA

  • Antwerpen February 2008

Data versions and ID-variables

sampid rules (old)

  • Digits 1-2: country code (e.g. 23 for Belgium French speaking)

  • Digits 3-5: wave indicator (042 for wave 1 and 062 for wave 2 main survey)

  • Digits 6-11: household ID

  • Digits 12-13: longitudinal household split indicator; 00 by default. If a respondent moves out, it is set based on that respondent's respid, e.g. if the ‘moving out respondent’ has respid 01, it is changed to 01

  • Examples:

    • 1104200010000: Austria, starting in wave 1 (longitudinal sample)
    • 2306214010300: Belgium (French), starting in wave 2 (refresher)

  • One needs to combine sampid with the respondent ID (respid) to identify and merge cases on the respondent level

  • Merging problems esp. for split households / ‘moving’ respondents across waves

  • We will change the system and divide sampid into different parts:

    • household id (fixed part and split indicator if needed)
    • new wave indicator variable ‘wi’ indicates when a household first entered the sample
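
The positional layout of the old sampid described above can be split into its parts by slicing. The following is a minimal Python sketch, not SHARE's own tooling; the function name `split_sampid` and the keys of the returned dictionary are assumptions made for illustration.

```python
def split_sampid(sampid: str) -> dict:
    """Split a 13-digit old-style sampid into its positional parts."""
    if len(sampid) != 13 or not (sampid.isascii() and sampid.isdigit()):
        raise ValueError(f"expected 13 digits, got {sampid!r}")
    return {
        "country": sampid[0:2],     # digits 1-2: country code
        "wave": sampid[2:5],        # digits 3-5: wave indicator
        "household": sampid[5:11],  # digits 6-11: household ID
        "split": sampid[11:13],     # digits 12-13: split indicator
    }


# The two example ids from the slides
print(split_sampid("1104200010000"))
print(split_sampid("2306214010300"))
```

The id is handled as a string rather than a number so that leading zeros in the household part are preserved.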

New household identifier hhid1 (internal) & hhid (public)

  • Digits 1-2: country code in letters, e.g. AT for Austria, Bf for Belgium French speaking (internal)

  • Digits 3-8: fixed household ID. This part will not change across waves, even if a household splits off

  • Digit 9: one character added to the fixed household ID to identify whether it is an ‘additional’ household that resulted from a split:

    • A for all ‘original’ households (all in wave 1, refresher in wave 2)
    • B is used only if a household has split. A is then still used for the ‘first’ part of the household and B for the ‘splitting part’ (the one that is interviewed second, normally the one that moved out)
    • C is used for the very rare case of a second split off household, for example when the original household in wave 1 consisted of 3 eligible sisters and split into 3 parts.
  • Examples for new household id:

    • AT100100A: Austria, ‘original’ household
    • AT100100B: Austria, split off household
    • Bf140103A: Belgium French speaking household (internal)
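
The new household id can be checked and taken apart with a regular expression. This is a hedged Python sketch: the pattern (two letters, six digits, one of the letters A, B or C) is inferred from the rules and examples above, and the names `HHID_PATTERN`, `parse_hhid` and `same_origin` are hypothetical.

```python
import re

# Assumed layout: 2-letter country code, 6-digit fixed household ID,
# 1 split letter (A, B or C as described on the slide).
HHID_PATTERN = re.compile(r"([A-Za-z]{2})([0-9]{6})([ABC])")


def parse_hhid(hhid: str) -> dict:
    """Split a new-style household id into country, fixed ID and split letter."""
    match = HHID_PATTERN.fullmatch(hhid)
    if match is None:
        raise ValueError(f"not a valid household id: {hhid!r}")
    country, fixed_id, split_letter = match.groups()
    return {
        "country": country,
        "fixed_id": fixed_id,
        "split_letter": split_letter,
        "is_split_off": split_letter != "A",  # B and C mark split off households
    }


def same_origin(hhid_a: str, hhid_b: str) -> bool:
    """True if both ids share country and fixed part, i.e. stem from one household."""
    a, b = parse_hhid(hhid_a), parse_hhid(hhid_b)
    return (a["country"], a["fixed_id"]) == (b["country"], b["fixed_id"])


print(parse_hhid("AT100100B"))
print(same_origin("AT100100A", "AT100100B"))
```

Because the fixed part stays constant across waves, comparing the first eight characters is enough to link a split off household back to its original household.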

New person identifier: person1

  • Digits 1-2: country code (CC) in letters, e.g. AT for Austria, Bf for Belgium French speaking

  • Digits 3-8: fixed household ID. This part will not change across waves.

  • Digits 9-10: respondent id, e.g. if respid is 1 it will be 01
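
As an illustration of the person identifier rules above, the three parts can be concatenated with zero padding. This is only a sketch in Python; the function name `build_person1` and its parameters are assumptions, and the id in the usage line is constructed from the Austrian household example rather than taken from the data.

```python
def build_person1(country: str, fixed_household_id: int, respid: int) -> str:
    """Build a 10-character person id: country code + fixed household ID + respid."""
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"country code must be two letters, got {country!r}")
    if not 0 <= fixed_household_id <= 999999:
        raise ValueError("fixed household ID must fit into six digits")
    if not 0 <= respid <= 99:
        raise ValueError("respid must fit into two digits")
    # Zero-pad the household ID to six and the respid to two digits.
    return f"{country}{fixed_household_id:06d}{respid:02d}"


print(build_person1("AT", 100100, 1))
```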

Old and new ids

In addition:

  • A dataset will be generated that shows to which households a respondent belonged during her or his ‘SHARE history’, e.g.:

Data cleaning

  • always use the unscrambled version that includes sampid for data cleaning

  • use sampid and respid to identify respondents

  • generate/compute sampid_original, respid_original and cvid_original before you change ids
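
The last rule above, keeping a copy of the ids before changing them, could look like this in Python. It is a minimal sketch on plain dictionaries, assuming one dictionary per case; the function name `keep_original_ids` and the sample values are hypothetical.

```python
ID_VARIABLES = ("sampid", "respid", "cvid")


def keep_original_ids(records: list) -> list:
    """Copy each id variable to <name>_original unless that copy already exists."""
    for record in records:
        for name in ID_VARIABLES:
            backup = f"{name}_original"
            # Never overwrite an existing backup, so repeated runs are safe.
            if name in record and backup not in record:
                record[backup] = record[name]
    return records


# Illustrative case; the values are made up for this example.
cases = [{"sampid": "1104200010000", "respid": 1, "cvid": 1}]
keep_original_ids(cases)
cases[0]["respid"] = 2  # a later correction leaves respid_original untouched
print(cases[0])
```

Skipping existing `_original` fields means a second cleaning pass cannot replace the true original values with already corrected ones.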

