---
title: "LOAD_PNADC"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{LOAD_PNADC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
The `load_pnadc` function is a wrapper for *[`get_pnadc`](https://www.rdocumentation.org/packages/PNADcIBGE/versions/0.7.0/topics/get_pnadc)* from the package `PNADcIBGE`, with added identification algorithms for panel construction. For details on the identification algorithms, see `vignette("BUILD_PNADC_PANEL")`.
---
**Panel Structure:**
The table below shows the first and last quarter (`ANOtrimestre`, e.g. `20121` = 2012 Q1) covered by each PNADC rotating panel:
| Panel | Start | End |
| ----: | ----- | ----- |
| 1 | 20121 | 20124 |
| 2 | 20121 | 20141 |
| 3 | 20132 | 20152 |
| 4 | 20143 | 20163 |
| 5 | 20154 | 20174 |
| 6 | 20171 | 20191 |
| 7 | 20182 | 20202 |
| 8 | 20193 | 20213 |
| 9 | 20204 | 20224 |
| 10 | 20221 | 20241 |
| 11 | 20232 | 20252 |
| 12 | 20243 | 20263 |
| 13 | 20254 | 20274 |
| 14 | 20271 | 20291 |
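
The `ANOtrimestre` codes in the table can be split into a year and a quarter with integer arithmetic. The sketch below is illustrative only: `decode_anotrimestre()` is a hypothetical helper, not a function exported by the package.

```{r eval=FALSE}
# Hypothetical helper: split an ANOtrimestre code into year and quarter.
decode_anotrimestre <- function(code) {
  code <- as.integer(code)
  data.frame(
    year    = code %/% 10L, # leading four digits
    quarter = code %% 10L   # trailing digit (1 to 4)
  )
}

# First and last quarter of panel 2 in the table above
decode_anotrimestre(c(20121, 20141))
```
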
---
**Options:**
1. **save_to**: The directory in which to save the downloaded files.
2. **years**: The years for which the data will be downloaded.
3. **quarters**: The quarters within those years to be downloaded. Can be either a vector such as `1:4`, applied to every year, or a list of vectors with one element per year when the quarters differ across years (e.g. `list(1:4, 1:2)` for four quarters in the first year and two in the second).
4. **panel**: Which panel algorithm to apply to this data. There are five options:
    * `none`: No panel is built. If `raw_data = TRUE`, returns the original data. Otherwise, creates some extra treated variables. Quarterly files are saved depending on `save_options`.
    * `basic`: Performs basic identification steps using household IDs, sex, and exact dates of birth.
    * `advanced_1`: Performs Stage 1 advanced identification, imputing missing birth dates using within-household donors.
    * `advanced_2`: Performs Stage 2 advanced identification, relaxing the year of birth constraint.
    * `advanced_3` (Recommended): Performs Stage 3 advanced identification, using graph theory for fuzzy matching of fragmented interviews to account for typographical errors.
5. **raw_data**: A logical argument defining whether to download the raw or the treated data. There are two options:
    * `TRUE`: returns the PNADC variables as they come.
    * `FALSE`: returns the treated version of the PNADC variables.
6. **deflator**: A logical argument forwarded to `PNADcIBGE::get_pnadc()`.
    * `TRUE` (default): downloads the deflator variables made available by `PNADcIBGE`.
    * `FALSE`: downloads the microdata without those deflator variables.
7. **defyear**: The deflator year forwarded to `PNADcIBGE::get_pnadc()` for annual microdata. It is ignored for quarterly downloads and used only when `deflator = TRUE`.
8. **defperiod**: The deflator period forwarded to `PNADcIBGE::get_pnadc()` for annual per-topic microdata. It is ignored for quarterly downloads and used only when `deflator = TRUE`.
9. **save_options**: A logical vector of length 2 controlling file saving behaviour:
    * `c(TRUE, TRUE)` (default): saves quarterly and panel files as `.rds`.
    * `c(FALSE, TRUE)`: does not save quarterly files; saves panel files as `.rds`.
    * `c(TRUE, FALSE)`: saves quarterly and panel files as `.parquet` datasets.
    * `c(FALSE, FALSE)`: does not save quarterly files; saves panel files as a `.parquet` dataset.
10. **vars**: A character vector of additional variable names to download, following the same convention as `vars` in `PNADcIBGE::get_pnadc()`. Use `NULL` (the default) to download all available microdata columns. See the note under **Usage** regarding the structural columns that are always returned by `PNADcIBGE::get_pnadc()` regardless of this argument.
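
To make the two accepted shapes of `quarters` concrete, the sketch below expands a `years`/`quarters` pair into the year-quarter combinations it describes. `expand_periods()` is a hypothetical helper written for this illustration; how `load_pnadc()` handles the argument internally may differ.

```{r eval=FALSE}
# Hypothetical helper illustrating the two accepted shapes of `quarters`.
expand_periods <- function(years, quarters = 1:4) {
  # A plain vector means "the same quarters for every year".
  if (!is.list(quarters)) {
    quarters <- rep(list(quarters), length(years))
  }
  # A list must supply one vector of quarters per year.
  stopifnot(length(quarters) == length(years))
  data.frame(
    year    = rep(years, times = lengths(quarters)),
    quarter = unlist(quarters)
  )
}

# All of 2022, but only the first quarter of 2023
expand_periods(2022:2023, list(1:4, 1))
```
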
---
**Details:**
The function performs the following steps:
1. Loop over years and quarters using `PNADcIBGE::get_pnadc` to download the data. All quarters are collected in memory and saved depending on `save_options`.
2. Split the data into panels by the panel variable `V1014`.
3. Apply the identification algorithms defined in `build_pnadc_panel`.
4. Save the panel files as `.rds` or `.parquet`, depending on `save_options`.
* The base identification logic in `build_pnadc_panel` was originally drawn from Ribas, Rafael Perez, and Sergei Suarez Dillon Soares (2008): "Sobre o painel da Pesquisa Mensal de Emprego (PME) do IBGE", with extensive modernizations, missing-data imputation, and graph-based fuzzy matching introduced by the Data Zoom team.
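
Step 2 above can be pictured with base R's `split()`. The data frame below is invented toy data standing in for the downloaded microdata, and the sketch is not the package's actual implementation.

```{r eval=FALSE}
# Toy data standing in for downloaded quarterly microdata (values invented).
toy <- data.frame(
  V1014 = c(9, 9, 10, 10, 10),
  V2009 = c(34, 61, 27, 45, 19)
)

# One data frame per panel, named after the V1014 value.
panels <- split(toy, toy$V1014)

names(panels)
vapply(panels, nrow, integer(1))
```
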
---
**Usage:**
Default:
```{r eval=FALSE}
load_pnadc(
  save_to = getwd(),
  years,
  quarters = 1:4,
  panel = "advanced_3",
  raw_data = FALSE,
  deflator = TRUE,
  defyear = NULL,
  defperiod = NULL,
  save_options = c(TRUE, TRUE),
  vars = NULL
)
```
To download PNADC data for all quarters of 2022 and 2023, with advanced fuzzy identification (Stage 3), simply run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023,
  panel = "advanced_3"
)
```
To download PNADC data for all of 2022, but only the first quarter of 2023, run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022:2023,
  quarters = list(1:4, 1)
)
```
To download PNADC data without any variable treatment or panel identification (e.g., for all quarters of 2021), run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2021,
  panel = "none",
  raw_data = TRUE
)
```
To download PNADC data without the deflator variables supplied by `PNADcIBGE`, run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  deflator = FALSE
)
```
To download PNADC data, save quarters on disk, and save panels as Parquet, run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(TRUE, FALSE)
)
```
To download PNADC data and save panels as RDS but discard the quarterly files, run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  save_options = c(FALSE, TRUE)
)
```
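
Reading the four `save_options` combinations listed under **Options**, the first element appears to control whether quarterly files are kept and the second whether the output format is `.rds` (`TRUE`) or `.parquet` (`FALSE`). The sketch below encodes that reading; `describe_save_options()` is a hypothetical helper for illustration and is not part of the package.

```{r eval=FALSE}
# Hypothetical helper: spell out what a `save_options` value means,
# following the mapping listed under Options.
describe_save_options <- function(save_options = c(TRUE, TRUE)) {
  stopifnot(
    is.logical(save_options),
    length(save_options) == 2L,
    !anyNA(save_options)
  )
  list(
    save_quarters = save_options[1],
    format        = if (save_options[2]) "rds" else "parquet"
  )
}

describe_save_options(c(FALSE, TRUE))
```
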
To download only a specific subset of variables - for example, age (`V2009`) and habitual income (`VD4019`) - alongside the structural columns that `PNADcIBGE` always returns, run:
```{r eval=FALSE}
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  vars = c("V2009", "VD4019")
)
```
> **Note:** `PNADcIBGE::get_pnadc()` always returns a set of structural
> columns regardless of the `vars` argument. These include survey design weights
> (`V1027`, `V1028`, `V1028001`-`V1028200`, `posest`, `posest_sxi`) and
> identifiers such as `UF`, `Estrato`, `V1029`, `V1033`, and `ID_DOMICILIO`.
> When `deflator = TRUE`, deflator variables (`Habitual`, `Efetivo`) are also
> included. The `vars` argument adds columns **on top of** those; it does not
> restrict them. Use `vars = NULL` (the default) to download all available
> microdata columns.
>
> **Deflation:** Deflation support in `load_pnadc()` is provided by
> `PNADcIBGE`. For the deflator methodology and the deflator files themselves,
> see `PNADcIBGE::pnadc_deflator()` and the corresponding documentation in that
> package.
If you specify `vars` and also request panel identification, any columns
required by the identification algorithm but missing from `vars` are
added automatically and a warning will tell you which ones were added. For
example, when using `panel = "advanced_3"`, the columns `V2007`, `V20082`,
`V20081`, `V2008`, and `V2003` must be present. If you omit them from `vars`,
the function adds them for you:
```{r eval=FALSE}
# Only V2009 and VD4019 requested, but panel = "advanced_3" needs
# V2007, V20082, V20081, V2008 and V2003 - these are added automatically
# with a warning.
load_pnadc(
  save_to = "Directory/You/Would/like/to/save/the/files",
  years = 2022,
  panel = "advanced_3",
  vars = c("V2009", "VD4019")
)
```
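
The auto-completion behaviour described above can be approximated with `setdiff()`. This is a minimal sketch under stated assumptions: `complete_vars()` and `required_advanced_3` are hypothetical names, and the wording of the real warning issued by `load_pnadc()` may differ.

```{r eval=FALSE}
# Columns the text above lists as required for panel = "advanced_3".
required_advanced_3 <- c("V2007", "V20082", "V20081", "V2008", "V2003")

# Hypothetical helper: append any required columns missing from `vars`
# and warn about the ones that were added.
complete_vars <- function(vars, required) {
  missing_cols <- setdiff(required, vars)
  if (length(missing_cols) > 0) {
    warning("Added required columns: ", paste(missing_cols, collapse = ", "))
  }
  c(vars, missing_cols)
}

# Emits a warning and returns the requested plus the required columns
complete_vars(c("V2009", "VD4019"), required_advanced_3)
```
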