Description
Our load_pnadc function uses the internal function
build_pnadc_panel to identify households and individuals
across quarters. The basic identification method draws on
the paper by Rafael Perez Ribas and Sergei Suarez Dillon Soares (2008),
“Sobre o painel da Pesquisa Mensal de Emprego (PME) do IBGE”, with
updates implemented by the Data Zoom team to handle missing data
and typographical errors.
Usage:
Basic Panel:
Advanced Panel (Stages 1, 2, or 3):
# Stage 1: Exact matching with donated birth dates
panel_data <- build_pnadc_panel(dat = pnad_sample, panel = "advanced_1")
# Stage 2: Relaxed matching constraints
panel_data <- build_pnadc_panel(dat = pnad_sample, panel = "advanced_2")
# Stage 3: Fuzzy matching using Graph Theory (Recommended)
panel_data <- build_pnadc_panel(dat = pnad_sample, panel = "advanced_3")

The household identifier – stored as id_dom – combines
the variables:
- UPA – Primary Sampling Unit (PSU);
- V1008 – Household;
- V1014 – Panel Number;

in order to create a unique number for every combination of those variables.
The basic individual identifier – stored as id_ind –
combines the household id with:
- V2007 – Sex;
- Date of birth – [V20082 (year), V20081 (month), V2008 (day)];

in order to create a unique number for every combination of those variables.
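As a minimal sketch of the identifier construction described above, the base R code below assigns one integer to each distinct combination of the key variables. The toy values and the helper make_id are hypothetical and not part of the package; build_pnadc_panel may build its identifiers differently.

```r
# Toy data using the PNADC variable names listed above; all values are invented.
pnad_toy <- data.frame(
  UPA    = c("110000016", "110000016", "110000016", "110000017"),
  V1008  = c("01", "01", "02", "01"),
  V1014  = c("08", "08", "08", "08"),
  V2007  = c(1, 2, 1, 1),
  V20082 = c(1980, 1985, 1990, 1980),
  V20081 = c(5, 11, 1, 5),
  V2008  = c(12, 3, 27, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one integer per distinct combination of the key columns.
# Note that paste() turns NA into the string "NA", so rows with missing key
# values would need to be set aside before calling it on real data.
make_id <- function(data, cols) {
  key <- do.call(paste, c(data[cols], sep = "_"))
  match(key, unique(key))
}

# Household id: UPA + household number + panel number.
pnad_toy$id_dom <- make_id(pnad_toy, c("UPA", "V1008", "V1014"))

# Individual id: household id + sex + full date of birth.
pnad_toy$id_ind <- make_id(
  pnad_toy, c("id_dom", "V2007", "V20082", "V20081", "V2008")
)
```

In this toy example the first and last rows share sex and birth date but live in different households, so they receive different id_ind values.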
For individuals who were not matched across all interviews by the basic method, we apply a progressive multi-stage algorithm that increases matching power without compromising uniqueness.
First, we reproduce the birth date donation method described in
the IPEA technical note (Osório, 2019). It
estimates and imputes missing birth dates (day, month, and year) by
matching individuals with donors from different interviews within the
same household based on sex, acceptable household condition changes, and
estimated age. This process is executed by our internal function
donate_birth_dates.
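The donation idea can be illustrated with a simplified sketch. It is not the package's donate_birth_dates function: the column names (interview, sex, age), the function donate_sketch, and the max_age_gap threshold are assumptions made for this example, and the household-condition check is omitted. A missing birth date is filled only when all compatible donors in the same household agree on a single date.

```r
# Toy data: rows 2 and 7 have missing birth dates. All values are invented.
toy <- data.frame(
  id_dom      = c(1, 1, 1, 1, 2, 2, 2),
  interview   = c(1, 2, 3, 1, 1, 1, 2),
  sex         = c(2, 2, 2, 1, 2, 2, 2),
  age         = c(34, 34, 35, 40, 20, 21, 20),
  birth_day   = c(3, NA, 3, 12, 1, 9, NA),
  birth_month = c(11, NA, 11, 5, 1, 9, NA),
  birth_year  = c(1985, NA, 1985, 1980, 2000, 1999, NA)
)

# Hypothetical simplification of birth date donation.
donate_sketch <- function(data, max_age_gap = 1) {
  date_cols <- c("birth_day", "birth_month", "birth_year")
  needs <- which(is.na(data$birth_year))   # recipients
  has   <- which(!is.na(data$birth_year))  # potential donors (original data only)
  for (i in needs) {
    # Donors: same household, different interview, same sex, similar age.
    compatible <- data$id_dom[has] == data$id_dom[i] &
      data$interview[has] != data$interview[i] &
      data$sex[has] == data$sex[i] &
      abs(data$age[has] - data$age[i]) <= max_age_gap
    donors <- has[which(compatible)]
    dates <- unique(data[donors, date_cols])
    # Donate only when the compatible donors point to exactly one birth date.
    if (nrow(dates) == 1) {
      for (col in date_cols) data[[col]][i] <- dates[[col]]
    }
  }
  data
}

out <- donate_sketch(toy)
```

Here row 2 receives the date 3/11/1985 from the same person's other interviews, while row 7 stays missing because two donors with different birth dates are equally compatible.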
Stage 1 (advanced_1): Repeats the
basic identification logic, but utilizing the donated dates. The
identifier – stored as id_rs1 – combines:
- id_dom – Household ID
- V2007 – Sex
- birth_day – Donated day of birth
- birth_month – Donated month of birth
- birth_year – Donated year of birth

Stage 2 (advanced_2): For individuals
not completely matched in Stage 1, we relax the year of birth constraint
(assuming it is often misreported). The identifier – stored as
id_rs2 – combines:
- id_dom – Household ID
- birth_month – Donated month of birth
- birth_day – Donated day of birth
- V2003 – Order number in the household

Stage 3 (advanced_3): Targets
candidates with fragmented interviews (fewer than 5 matches in the
previous stages). It considers a match successful if there is a unique
individual in the same household, in a different quarter, that satisfies
the acceptable difference criteria established by Ribas and Soares (up
to 4 days difference in the day of birth, 2 months in the month of
birth, and a dynamically adjusted year-of-birth difference based on the
individual’s reported age). The final identifier is stored as
id_rs3.

The table below shows unconditional household retention, the complement of attrition: the percentage of household units observed in Wave 1 that were successfully re-interviewed and tracked in each subsequent wave.

| Interview (Wave) | Households Retained (%) |
|---|---|
| 1 | 100.00000 |
| 2 | 93.75667 |
| 3 | 91.84360 |
| 4 | 90.59256 |
| 5 | 89.60861 |
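
Rates of this kind can be computed as the share of Wave 1 units that appear again in each later wave, regardless of whether they were seen in the waves in between. The sketch below assumes a long-format data frame with one row per household per wave; the column name wave and the function retention_by_wave are hypothetical, and the toy values are invented.

```r
# Toy long-format data: one row per household per wave in which it was observed.
hh <- data.frame(
  id_dom = c(1, 2, 3, 4, 1, 2, 3, 1, 2),
  wave   = c(1, 1, 1, 1, 2, 2, 2, 3, 3)
)

# Hypothetical helper: percentage of Wave 1 ids observed in each wave.
retention_by_wave <- function(data, id_col, wave_col = "wave") {
  base_ids <- unique(data[[id_col]][data[[wave_col]] == 1])
  waves <- sort(unique(data[[wave_col]]))
  rate <- vapply(waves, function(w) {
    ids_w <- unique(data[[id_col]][data[[wave_col]] == w])
    100 * mean(base_ids %in% ids_w)  # share of Wave 1 ids present in wave w
  }, numeric(1))
  data.frame(wave = waves, rate = rate)
}

hh_rates <- retention_by_wave(hh, "id_dom")
```

The same helper applies to individual identifiers by passing a different id column.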

This table reports the percentage of raw PNADC individual observations (rows) in Wave 1 for which we successfully built a valid identifier. Observations are lost at this stage only when the identifier cannot be constructed (e.g., essential data are missing) or because of household grouping constraints.

| Interview (Wave) | Basic Rate (%) | Adv 1 Rate (%) | Adv 2 Rate (%) | Adv 3 Rate (%) |
|---|---|---|---|---|
| 1 | 93.82378 | 95.82954 | 96.40170 | 96.39606 |

This table shows the cumulative retention of tracked individuals over time. It uses the number of uniquely identified individuals in Wave 1 as the denominator (so every method starts at 100%) and shows how much tracking power the advanced algorithms add.

| Interview (Wave) | Basic Rate (%) | Adv 1 Rate (%) | Adv 2 Rate (%) | Adv 3 Rate (%) | Difference (Adv 3 - Basic) |
|---|---|---|---|---|---|
| 1 | 100.00000 | 100.00000 | 100.00000 | 100.00000 | 0.00000 p.p. |
| 2 | 87.01360 | 88.20828 | 88.50570 | 88.83374 | + 1.82014 p.p. |
| 3 | 80.55773 | 82.33729 | 82.85546 | 83.38268 | + 2.82495 p.p. |
| 4 | 75.81465 | 77.91677 | 78.57830 | 79.25447 | + 3.43982 p.p. |
| 5 | 72.01655 | 74.27815 | 75.01486 | 75.79868 | + 3.78213 p.p. |
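
The Adv 3 column reflects the Stage 3 tolerance criteria described earlier (up to 4 days and 2 months of difference, plus a year tolerance). As a rough illustration only, the sketch below applies those criteria as a plain pairwise check and accepts a match when exactly one candidate in the same household and a different quarter qualifies. It is not the graph-based procedure used by the package; find_unique_match, the quarter column, and the year_tol argument are hypothetical, and the age-dependent rule that sets the year tolerance is left to the caller because it is not specified here.

```r
# Candidate records from other interviews; all values are invented.
pool <- data.frame(
  id_dom      = c(1, 1, 2),
  quarter     = c(2, 2, 2),
  birth_day   = c(14, 30, 12),
  birth_month = c(6, 1, 5),
  birth_year  = c(1971, 1995, 1970)
)

# Record we are trying to link across quarters.
target <- list(id_dom = 1, quarter = 1,
               birth_day = 12, birth_month = 5, birth_year = 1970)

# Hypothetical helper: row index of the single acceptable candidate, else NA.
# Differences are plain absolute differences (no wrap-around across months).
find_unique_match <- function(target, pool, year_tol) {
  ok <- pool$id_dom == target$id_dom &
    pool$quarter != target$quarter &
    abs(pool$birth_day - target$birth_day) <= 4 &
    abs(pool$birth_month - target$birth_month) <= 2 &
    abs(pool$birth_year - target$birth_year) <= year_tol
  hits <- which(ok)
  if (length(hits) == 1) hits else NA_integer_
}

matched_row <- find_unique_match(target, pool, year_tol = 2)
strict_row  <- find_unique_match(target, pool, year_tol = 0)
```

With a year tolerance of 2 only the first candidate qualifies, so it is accepted; with a tolerance of 0 no candidate qualifies and the result is NA.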