The load_pnadc function is a wrapper for get_pnadc
from the package PNADcIBGE, with added identification
algorithms for panel construction. For details on the identification
algorithms, see vignette("BUILD_PNADC_PANEL").
Panel Structure:
The table below shows the first and last quarter
(ANOtrimestre, e.g. 20121 = 2012 Q1) covered
by each PNADC rotating panel:
| Panel | Start | End |
|---|---|---|
| 1 | 20121 | 20124 |
| 2 | 20121 | 20141 |
| 3 | 20132 | 20152 |
| 4 | 20143 | 20163 |
| 5 | 20154 | 20174 |
| 6 | 20171 | 20191 |
| 7 | 20182 | 20202 |
| 8 | 20193 | 20213 |
| 9 | 20204 | 20224 |
| 10 | 20221 | 20241 |
| 11 | 20232 | 20252 |
| 12 | 20243 | 20263 |
| 13 | 20254 | 20274 |
| 14 | 20271 | 20291 |
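The table follows a regular pattern. From panel 3 onward, a new panel starts every five quarters and each panel covers nine quarters. Panel 1 is cut short at 2012 Q1, the first quarter of PNADC data. The sketch below rebuilds the table from that pattern. `quarter_index()`, `quarter_code()` and `panel_span()` are hypothetical helpers written for this illustration. The pattern is read off the table above, not taken from the package code.

```r
# Assumption (inferred from the table, not from the package): panel 3 starts
# in 2013 Q2, a new panel starts every 5 quarters, each panel spans 9 quarters,
# and nothing is observed before 2012 Q1.

# Convert an ANOtrimestre code (e.g. 20132) to a running quarter index.
quarter_index <- function(code) {
  year <- code %/% 10
  quarter <- code %% 10
  4 * year + (quarter - 1)
}

# Convert a running quarter index back to an ANOtrimestre code.
quarter_code <- function(index) {
  (index %/% 4) * 10 + (index %% 4) + 1
}

# Hypothetical helper: first and last quarter covered by a single panel.
panel_span <- function(panel) {
  start <- quarter_index(20132) + 5 * (panel - 3)
  end <- start + 8
  # Panels that began before the survey are truncated at 2012 Q1.
  start <- max(start, quarter_index(20121))
  c(start = quarter_code(start), end = quarter_code(end))
}
```

For example, `t(sapply(1:14, panel_span))` returns a 14-row matrix matching the Start and End columns of the table.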
Options:
save_to: The directory in which the downloaded files are saved.
years: The years for which the data will be downloaded.
quarters: The quarters within those years to be
downloaded. Can be either a vector such as 1:4, to use the same
quarters in every year, or a list of vectors, if the quarters
differ by year (e.g. list(1:4, 1:2) for four
quarters in the first year and two in the second).
panel: Which panel algorithm to apply to this data. There are five options:
  - none: No panel is built. If raw_data = TRUE, returns the original data; otherwise, creates some extra treated variables. Quarterly files are saved depending on save_options.
  - basic: Performs basic identification steps using household IDs, sex, and exact dates of birth.
  - advanced_1: Performs Stage 1 advanced identification, imputing missing birth dates using within-household donors.
  - advanced_2: Performs Stage 2 advanced identification, relaxing the year-of-birth constraint.
  - advanced_3 (Recommended): Performs Stage 3 advanced identification, using graph theory for fuzzy matching of fragmented interviews to account for typographical errors.

raw_data: Defines whether the raw or the treated data is downloaded. There are two options:
  - TRUE: returns the PNADC variables as they come.
  - FALSE: returns the treated version of the PNADC variables.

deflator: A logical argument forwarded to PNADcIBGE::get_pnadc().
  - TRUE (default): downloads the deflator variables made available by PNADcIBGE.
  - FALSE: downloads the microdata without those deflator variables.

defyear: The deflator year forwarded to
PNADcIBGE::get_pnadc() for annual microdata. It is ignored
for quarterly downloads and used only when
deflator = TRUE.
defperiod: The deflator period forwarded to
PNADcIBGE::get_pnadc() for annual per-topic microdata. It
is ignored for quarterly downloads and used only when
deflator = TRUE.
save_options: A logical vector of length 2 controlling file-saving behaviour. The first element sets whether quarterly files are saved; the second sets the file format (TRUE for .rds, FALSE for .parquet):
  - c(TRUE, TRUE) (default): saves quarterly and panel files as .rds.
  - c(FALSE, TRUE): does not save quarterly files; saves panel files as .rds.
  - c(TRUE, FALSE): saves quarterly and panel files as .parquet datasets.
  - c(FALSE, FALSE): does not save quarterly files; saves panel files as a .parquet dataset.

vars: A character vector of additional variable
names to download, following the same convention as vars in
PNADcIBGE::get_pnadc(). Use NULL (the default)
to download all available microdata columns. Note that
PNADcIBGE::get_pnadc() always returns certain structural columns
(survey design weights and identifiers) regardless of this argument.
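The quarters and save_options conventions above can be made concrete with a small sketch. `expand_quarters()` and `describe_save_options()` are hypothetical helpers written for this illustration. They are not functions exported by the package. The decoding of save_options is read off the four cases listed above.

```r
# Hypothetical helper: expand years and quarters into ANOtrimestre codes.
expand_quarters <- function(years, quarters = 1:4) {
  if (is.list(quarters)) {
    # One vector of quarters per year, in the same order as `years`.
    stopifnot(length(quarters) == length(years))
  } else {
    # The same quarters are used for every year.
    quarters <- rep(list(quarters), length(years))
  }
  unlist(Map(function(y, q) y * 10 + q, years, quarters))
}

# Hypothetical helper: decode save_options. The first element says whether
# quarterly files are kept; the second picks .rds (TRUE) or .parquet (FALSE).
describe_save_options <- function(save_options = c(TRUE, TRUE)) {
  stopifnot(is.logical(save_options), length(save_options) == 2)
  list(
    save_quarters = save_options[1],
    format = if (save_options[2]) "rds" else "parquet"
  )
}
```

For instance, `expand_quarters(2022:2023, list(1:4, 1))` gives the five codes 20221 to 20224 and 20231. `describe_save_options(c(TRUE, FALSE))` reports that quarterly files are kept and that the format is Parquet.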
Details:
The function performs the following steps:
1. Calls PNADcIBGE::get_pnadc to download the data. All quarters are collected in memory and saved depending on save_options.
2. Separates the data by panel, based on the panel identifier V1014.
3. Applies the identification algorithm selected in panel, via build_pnadc_panel.
4. Saves the panel files as .rds or .parquet, depending on save_options.

The identification algorithm in build_pnadc_panel was
originally drawn from Ribas, Rafael Perez, and Sergei Suarez Dillon
Soares (2008): “Sobre o painel da Pesquisa Mensal de Emprego (PME) do
IBGE”, with extensive modernizations, missing-data imputation, and
graph-based fuzzy matching introduced by the Data Zoom team.

Usage:
Default:
```r
load_pnadc(
  save_to = getwd(),
  years,
  quarters = 1:4,
  panel = "advanced_3",
  raw_data = FALSE,
  deflator = TRUE,
  defyear = NULL,
  defperiod = NULL,
  save_options = c(TRUE, TRUE),
  vars = NULL
)
```
To download PNADC data for all quarters of 2022 and 2023 with advanced fuzzy identification (Stage 3), run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023,
  panel = "advanced_3"
)
```
To download PNADC data for all of 2022 but only the first quarter of 2023, run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023,
  quarters = list(1:4, 1)
)
```
To download PNADC data without any variable treatment or panel identification (e.g., for all quarters of 2021), run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2021,
  panel = "none",
  raw_data = TRUE
)
```
To download PNADC data without the deflator variables supplied by
PNADcIBGE, run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  deflator = FALSE
)
```
To download PNADC data, save the quarterly files on disk, and save the panels as Parquet, run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(TRUE, FALSE)
)
```
To download PNADC data and save the panels as RDS while discarding the quarterly files, run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(FALSE, TRUE)
)
```
To download only a specific subset of variables, for example age
(V2009) and habitual income (VD4019), alongside the
structural columns that PNADcIBGE always returns, run:
```r
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  vars = c("V2009", "VD4019")
)
```
Note: PNADcIBGE::get_pnadc() always returns certain structural columns regardless of the vars argument. These include survey design weights (V1027, V1028, V1028001-V1028200, posest, posest_sxi) and identifiers such as UF, Estrato, V1029, V1033, and ID_DOMICILIO. When deflator = TRUE, deflator variables (Habitual, Efetivo) are also included. The vars argument adds columns on top of those; it does not restrict them. Use vars = NULL (the default) to download all available microdata columns.
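The additive rule in the note above can be sketched as a set union. `structural_cols` lists only the columns named in the note. `requested_columns()` is a hypothetical helper; neither is part of PNADcIBGE or this package.

```r
# Columns named in the note as always returned by PNADcIBGE::get_pnadc().
structural_cols <- c(
  "UF", "Estrato", "V1027", "V1028", sprintf("V1028%03d", 1:200),
  "V1029", "V1033", "posest", "posest_sxi", "ID_DOMICILIO"
)

# Hypothetical helper: columns returned for a given `vars` request.
requested_columns <- function(vars = NULL, deflator = TRUE) {
  # NULL means "download all available microdata columns".
  if (is.null(vars)) return(NULL)
  base <- structural_cols
  if (deflator) base <- c(base, "Habitual", "Efetivo")
  # vars is added on top of the structural columns; it never removes them.
  union(base, vars)
}
```

Requesting `vars = c("V2009", "VD4019")` therefore yields the structural columns plus those two, not just the two alone.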
Deflation: Deflation support in
load_pnadc() is provided by PNADcIBGE. For the deflator methodology and the deflator files themselves, see PNADcIBGE::pnadc_deflator() and the corresponding documentation in that package.
If you specify vars and also request panel
identification, any of the columns needed for identification
(V2007, V20082, V20081, V2008, and V2003) that are missing
from vars are added automatically, and a warning tells
you which ones were added. For example, when using
panel = "advanced_3", the columns V2007,
V20082, V20081, V2008, and
V2003 must be present. If you omit them from
vars, the function adds them for you.
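The automatic completion of vars described above can be sketched as follows. `panel_required` and `add_panel_vars()` are hypothetical names. The sketch mirrors only the documented behaviour: add the missing identification columns and warn about them. The package's own implementation may differ.

```r
# Columns the documentation says panel identification needs.
panel_required <- c("V2007", "V20082", "V20081", "V2008", "V2003")

# Hypothetical helper: complete `vars` before panel identification.
add_panel_vars <- function(vars, panel = "advanced_3") {
  # No identification requested, or all columns requested: nothing to add.
  if (panel == "none" || is.null(vars)) return(vars)
  missing_cols <- setdiff(panel_required, vars)
  if (length(missing_cols) > 0) {
    warning("Adding columns required for panel identification: ",
            paste(missing_cols, collapse = ", "))
  }
  union(vars, panel_required)
}
```

For instance, `add_panel_vars(c("V2009", "VD4019"))` returns the two requested columns followed by the five identification columns. It also emits a warning naming the five columns it added.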