Load data
There are two different versions of the data—an SPSS file and an Excel file—and neither are perfectly identical. These columns have issues:
ID
: This columnn is full of NAs in SPSS; it exists in Excel.
ACT
, BIRTH
, SAT
, UGA_ID
: These columns are almost full of NAs in SPSS; they kind of exist in Excel—when values do exist, they’re separated by commas, like 2,9
, 1,3
, 1, 8, 9, 0
, etc.
BIRTH
: This column is almost full of NAs in SPSS; it kind of exists in Excel—when values do exist, they’re also separated by commas.
- Most other columns are fine, though some cells in Excel (like row 10 in
EDU_DAD
) have comma-separated values; in SPSS, the last value is used.
Data from SPSS
NA |
NA |
NA |
2 |
3 |
NA |
NA |
NA |
5 |
5 |
NA |
NA |
NA |
7 |
5 |
NA |
NA |
NA |
4 |
6 |
NA |
NA |
NA |
NA |
NA |
NA |
NA |
NA |
7 |
6 |
Data from Excel
A001 |
NA |
0,9 |
2 |
3 |
A002 |
2,9 |
3,9 |
5 |
5 |
A003 |
NA |
3,9 |
7 |
5 |
A005 |
2,2 |
2,9 |
4 |
6 |
A007 |
NA |
NA |
NA |
NA |
A008 |
NA |
3,9 |
7 |
6 |
Because of this weird discrepancy, I use all columnns from the SPSS data (since it takes care of the strange comma-separated values), but I add the ID column from Excel for identification and grouping purposes later. For now, I ignore ACT
, BIRTH
, and other columns that behave strangely.
Final raw data
A001 |
|
|
2 |
3 |
A002 |
|
|
5 |
5 |
A003 |
|
|
7 |
5 |
A005 |
|
|
4 |
6 |
A007 |
|
|
|
|
A008 |
|
|
7 |
6 |
Calculate index totals
Map questions to indexes
There is a simplified codebook that matches each of the questions in the survey to specific items in the four main PSM surveys:
psm_codebook_simple <- read_excel(file.path(here(), "data", "data_raw", "psm measures_1.xls"),
sheet = "Sheet1", col_names = c("variable", "details", "question")) %>%
mutate(index = ifelse(str_detect(variable, "psm_"), NA, variable)) %>%
fill(index) %>%
filter(!is.na(details)) %>%
mutate(index = recode(index, `Int'l Measure` = "International"),
variable = str_to_upper(variable),
variable = recode(variable, PSM_25 = "PSM_25_R", PSM_41 = "PSM_41_R"))
psm_codebook_simple %>% datatable()
Calculate totals
Using this cleaner data, I extract the questions that apply to each of the four indexes:
grant_vars <- psm_codebook_simple %>% filter(index == "Grant") %>% pull(variable)
intl_vars <- psm_codebook_simple %>% filter(index == "International") %>% pull(variable)
msp_vars <- psm_codebook_simple %>% filter(index == "MSPB5") %>% pull(variable)
perry_vars <- psm_codebook_simple %>% filter(index == "Perry") %>% pull(variable)
index_vars <- tribble(
~Index, ~Columns,
"Perry", perry_vars,
"MSPB5", msp_vars,
"Grant", grant_vars,
"International", intl_vars
) %>%
mutate(Variables = Columns %>% map_int(~ length(.)),
Columns = Columns %>% map_chr(~ paste0("`", ., "`", collapse = ", ")))
index_vars %>% pandoc.table(justify = "lll")
Perry |
PSM_25_R , PSM_06 , PSM_27 , PSM_28 , PSM_29 , PSM_30 , PSM_31 , PSM_32 , PSM_33 , PSM_34 , PSM_35 , PSM_36 , PSM_07 , PSM_38 , PSM_39 , PSM_46 , PSM_40 , PSM_41_R , PSM_42 , PSM_43 , PSM_44 , PSM_45 , PSM_21 , PSM_20 |
24 |
MSPB5 |
PSM_20 , PSM_06 , PSM_07 , PSM_46 , PSM_09 |
5 |
Grant |
PSM_16 , PSM_17 , PSM_18 , PSM_19 |
4 |
International |
PSM_10 , PSM_11 , PSM_20 , PSM_12 , PSM_13 , PSM_14 , PSM_15 , PSM_05 , PSM_01 , PSM_02 , PSM_03 , PSM_04 , PSM_08 , PSM_22 , PSM_23 , PSM_24 |
16 |
I add each of those columns up to create the index for each individual (and calculate the z-score):
psm_indexes <- psm_raw %>%
mutate(index_perry = rowSums(select(., one_of(perry_vars))),
index_msp = rowSums(select(., one_of(msp_vars))),
index_grant = rowSums(select(., one_of(grant_vars))),
index_intl = rowSums(select(., one_of(intl_vars)))) %>%
mutate_at(vars(starts_with("index")), funs(z = scale(.)))
psm_indexes %>%
select(ID, starts_with("index")) %>%
head() %>%
pandoc.table(split.table = Inf, missing = "")
A001 |
|
|
16 |
|
|
|
-0.4207 |
|
A002 |
72 |
19 |
17 |
66 |
-0.4323 |
0.4633 |
-0.04333 |
0.3885 |
A003 |
66 |
14 |
18 |
59 |
-0.9209 |
-1.138 |
0.334 |
-0.4546 |
A005 |
81 |
19 |
20 |
66 |
0.3005 |
0.4633 |
1.089 |
0.3885 |
A007 |
|
|
|
|
|
|
|
|
A008 |
|
16 |
19 |
68 |
|
-0.4975 |
0.7114 |
0.6294 |
Save data
Finally, I save a clean version of the data for use elsewhere.
LS0tCnRpdGxlOiAiRGF0YSBjbGVhbmluZyIKZGF0ZTogIjIwMTctMTAtMzEiCmVkaXRvcl9vcHRpb25zOiAKICBjaHVua19vdXRwdXRfdHlwZTogY29uc29sZQotLS0KCmBgYHtyIGxvYWQtbGlicmFyaWVzLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeShoYXZlbikKbGlicmFyeShyZWFkeGwpCmxpYnJhcnkocGFuZGVyKQpsaWJyYXJ5KHN0cmluZ3IpCmxpYnJhcnkoRFQpCmxpYnJhcnkoaGVyZSkKYGBgCgojIyBMb2FkIGRhdGEKClRoZXJlIGFyZSB0d28gZGlmZmVyZW50IHZlcnNpb25zIG9mIHRoZSBkYXRh4oCUYW4gU1BTUyBmaWxlIGFuZCBhbiBFeGNlbCBmaWxl4oCUYW5kIG5laXRoZXIgYXJlIHBlcmZlY3RseSBpZGVudGljYWwuIFRoZXNlIGNvbHVtbnMgaGF2ZSBpc3N1ZXM6CgotIGBJRGA6IFRoaXMgY29sdW1ubiBpcyBmdWxsIG9mIE5BcyBpbiBTUFNTOyBpdCBleGlzdHMgaW4gRXhjZWwuCi0gYEFDVGAsIGBCSVJUSGAsIGBTQVRgLCBgVUdBX0lEYDogVGhlc2UgY29sdW1ucyBhcmUgYWxtb3N0IGZ1bGwgb2YgTkFzIGluIFNQU1M7IHRoZXkga2luZCBvZiBleGlzdCBpbiBFeGNlbOKAlHdoZW4gdmFsdWVzIGRvIGV4aXN0LCB0aGV5J3JlIHNlcGFyYXRlZCBieSBjb21tYXMsIGxpa2UgYDIsOWAsIGAxLDNgLCBgMSwgOCwgOSwgMGAsIGV0Yy4KLSBgQklSVEhgOiBUaGlzIGNvbHVtbiBpcyBhbG1vc3QgZnVsbCBvZiBOQXMgaW4gU1BTUzsgaXQga2luZCBvZiBleGlzdHMgaW4gRXhjZWzigJR3aGVuIHZhbHVlcyBkbyBleGlzdCwgdGhleSdyZSBhbHNvIHNlcGFyYXRlZCBieSBjb21tYXMuCi0gTW9zdCBvdGhlciBjb2x1bW5zIGFyZSBmaW5lLCB0aG91Z2ggc29tZSBjZWxscyBpbiBFeGNlbCAobGlrZSByb3cgMTAgaW4gYEVEVV9EQURgKSBoYXZlIGNvbW1hLXNlcGFyYXRlZCB2YWx1ZXM7IGluIFNQU1MsIHRoZSBsYXN0IHZhbHVlIGlzIHVzZWQuCgpgYGB7ciBsb2FkLWRhdGF9CnBzbV9yYXdfc3BzcyA8LSByZWFkX3Nwc3MoZmlsZS5wYXRoKGhlcmUoKSwgImRhdGEiLCAiZGF0YV9yYXciLCAiU3R1ZGVudCBQU00gU3VydmV5IERhdGEuc2F2IikpICU+JSAKICB6YXBfZm9ybWF0cygpICU+JSB6YXBfd2lkdGhzKCkKCnBzbV9yYXdfZXhjZWwgPC0gcmVhZF9leGNlbChmaWxlLnBhdGgoaGVyZSgpLCAiZGF0YSIsICJkYXRhX3JhdyIsICJTdHVkZW50IFBTTSBTdXJ2ZXkgTWFzdGVyIERhdGEueGxzeCIpLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgc2hlZXQgPSAiTWFzdGVyIERhdGEiKQpgYGAKCiMjIyMgRGF0YSBmcm9tIFNQU1MKCmBgYHtyIHNob3ctc3BzcywgcmVzdWx0cz0iYXNpcyJ9CnBzbV9yYXdfc3BzcyAlPiUKICBzZWxlY3QoMTo1KSAlPiUgaGVhZCgpICU+JSAKICBwYW5kb2MudGFibGUoKQpgYGAKCiMjIyMgRGF0YSBmcm9tIEV4Y2VsCgpgYGB7ciBzaG93LWV4Y2VsLCByZXN1bHRzPSJhc2lzIn0KcHNtX3Jhd19leGNlbCAlPiUKICBzZWxlY3QoMTo1KSAlPiUgaGVhZCgpICU+JQogIHBhbmRvYy50YWJsZSgpCmBgYAoKQmVjYXVzZSBvZiB0aGlzIHdlaXJkIGRpc2NyZXBhbmN5LCBJIHVzZSBhbGwgY29sdW1ubnMgZnJvbSB0aGUgU1BTUyBkYXRhIChzaW5jZSBpdCB0YWtlcyBjYXJlIG9mIHRoZSBzdHJhbmdlIGNvbW1hLXNlcGFyYXRlZCB2YWx1ZXMpLCBidXQgSSBhZGQgdGhlIElEIGNvbHVtbiBmcm9tIEV4Y2VsIGZvciBpZGVudGlmaWNhdGlvbiBhbmQgZ3JvdXBpbmcgcHVycG9zZXMgbGF0ZXIuIEZvciBub3csIEkgaWdub3JlIGBBQ1RgLCBgQklSVEhgLCBhbmQgb3RoZXIgY29sdW1ucyB0aGF0IGJlaGF2ZSBzdHJhbmdlbHkuCgojIyMjIEZpbmFsIHJhdyBkYXRhCgpgYGB7ciBmaW5hbC1yYXctZGF0YSwgcmVzdWx0cz0iYXNpcyJ9CnBzbV9yYXcgPC0gYmluZF9jb2xzKAogIHNlbGVjdChwc21fcmF3X2V4Y2VsLCAxKSwKICBzZWxlY3QocHNtX3Jhd19zcHNzLCAtMSkKKQoKcHNtX3JhdyAlPiUgCiAgc2VsZWN0KDE6NSkgJT4lIGhlYWQoKSAlPiUKICBwYW5kb2MudGFibGUobWlzc2luZyA9ICIiKQpgYGAKCgojIyBDYWxjdWxhdGUgaW5kZXggdG90YWxzCgojIyMgTWFwIHF1ZXN0aW9ucyB0byBpbmRleGVzCgpUaGVyZSBpcyBhIHNpbXBsaWZpZWQgY29kZWJvb2sgdGhhdCBtYXRjaGVzIGVhY2ggb2YgdGhlIHF1ZXN0aW9ucyBpbiB0aGUgc3VydmV5IHRvIHNwZWNpZmljIGl0ZW1zIGluIHRoZSBmb3VyIG1haW4gUFNNIHN1cnZleXM6CgpgYGB7ciBsb2FkbC1jbGVhbi1pbmRleCwgcmVzdWx0cz0iYXNpcyJ9CnBzbV9jb2RlYm9va19zaW1wbGUgPC0gcmVhZF9leGNlbChmaWxlLnBhdGgoaGVyZSgpLCAiZGF0YSIsICJkYXRhX3JhdyIsICJwc20gbWVhc3VyZXNfMS54bHMiKSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHNoZWV0ID0gIlNoZWV0MSIsIGNvbF9uYW1lcyA9IGMoInZhcmlhYmxlIiwgImRldGFpbHMiLCAicXVlc3Rpb24iKSkgJT4lCiAgbXV0YXRlKGluZGV4ID0gaWZlbHNlKHN0cl9kZXRlY3QodmFyaWFibGUsICJwc21fIiksIE5BLCB2YXJpYWJsZSkpICU+JQogIGZpbGwoaW5kZXgpICU+JQogIGZpbHRlcighaXMubmEoZGV0YWlscykpICU+JQogIG11dGF0ZShpbmRleCA9IHJlY29kZShpbmRleCwgYEludCdsIE1lYXN1cmVgID0gIkludGVybmF0aW9uYWwiKSwKICAgICAgICAgdmFyaWFibGUgPSBzdHJfdG9fdXBwZXIodmFyaWFibGUpLAogICAgICAgICB2YXJpYWJsZSA9IHJlY29kZSh2YXJpYWJsZSwgUFNNXzI1ID0gIlBTTV8yNV9SIiwgUFNNXzQxID0gIlBTTV80MV9SIikpCgpwc21fY29kZWJvb2tfc2ltcGxlICU+JSBkYXRhdGFibGUoKQpgYGAKCiMjIyBDYWxjdWxhdGUgdG90YWxzCgpVc2luZyB0aGlzIGNsZWFuZXIgZGF0YSwgSSBleHRyYWN0IHRoZSBxdWVzdGlvbnMgdGhhdCBhcHBseSB0byBlYWNoIG9mIHRoZSBmb3VyIGluZGV4ZXM6CgpgYGB7ciBhZGQtaW5kZXgtdG90YWxzLCByZXN1bHRzPSJhc2lzIn0KZ3JhbnRfdmFycyA8LSBwc21fY29kZWJvb2tfc2ltcGxlICU+JSBmaWx0ZXIoaW5kZXggPT0gIkdyYW50IikgJT4lIHB1bGwodmFyaWFibGUpCmludGxfdmFycyA8LSBwc21fY29kZWJvb2tfc2ltcGxlICU+JSBmaWx0ZXIoaW5kZXggPT0gIkludGVybmF0aW9uYWwiKSAlPiUgcHVsbCh2YXJpYWJsZSkKbXNwX3ZhcnMgPC0gcHNtX2NvZGVib29rX3NpbXBsZSAlPiUgZmlsdGVyKGluZGV4ID09ICJNU1BCNSIpICU+JSBwdWxsKHZhcmlhYmxlKQpwZXJyeV92YXJzIDwtIHBzbV9jb2RlYm9va19zaW1wbGUgJT4lIGZpbHRlcihpbmRleCA9PSAiUGVycnkiKSAlPiUgcHVsbCh2YXJpYWJsZSkKCmluZGV4X3ZhcnMgPC0gdHJpYmJsZSgKICB+SW5kZXgsIH5Db2x1bW5zLAogICJQZXJyeSIsIHBlcnJ5X3ZhcnMsCiAgIk1TUEI1IiwgbXNwX3ZhcnMsCiAgIkdyYW50IiwgZ3JhbnRfdmFycywKICAiSW50ZXJuYXRpb25hbCIsIGludGxfdmFycwopICU+JQogIG11dGF0ZShWYXJpYWJsZXMgPSBDb2x1bW5zICU+JSBtYXBfaW50KH4gbGVuZ3RoKC4pKSwKICAgICAgICAgQ29sdW1ucyA9IENvbHVtbnMgJT4lIG1hcF9jaHIofiBwYXN0ZTAoImAiLCAuLCAiYCIsIGNvbGxhcHNlID0gIiwgIikpKQoKaW5kZXhfdmFycyAlPiUgcGFuZG9jLnRhYmxlKGp1c3RpZnkgPSAibGxsIikKYGBgCgpJIGFkZCBlYWNoIG9mIHRob3NlIGNvbHVtbnMgdXAgdG8gY3JlYXRlIHRoZSBpbmRleCBmb3IgZWFjaCBpbmRpdmlkdWFsIChhbmQgY2FsY3VsYXRlIHRoZSB6LXNjb3JlKToKCmBgYHtyIGNhbGN1bGF0ZS1pbmRleC10b3RhbHMsIHJlc3VsdHM9ImFzaXMifQpwc21faW5kZXhlcyA8LSBwc21fcmF3ICU+JQogIG11dGF0ZShpbmRleF9wZXJyeSA9IHJvd1N1bXMoc2VsZWN0KC4sIG9uZV9vZihwZXJyeV92YXJzKSkpLAogICAgICAgICBpbmRleF9tc3AgPSByb3dTdW1zKHNlbGVjdCguLCBvbmVfb2YobXNwX3ZhcnMpKSksCiAgICAgICAgIGluZGV4X2dyYW50ID0gcm93U3VtcyhzZWxlY3QoLiwgb25lX29mKGdyYW50X3ZhcnMpKSksCiAgICAgICAgIGluZGV4X2ludGwgPSByb3dTdW1zKHNlbGVjdCguLCBvbmVfb2YoaW50bF92YXJzKSkpKSAlPiUKICBtdXRhdGVfYXQodmFycyhzdGFydHNfd2l0aCgiaW5kZXgiKSksIGZ1bnMoeiA9IHNjYWxlKC4pKSkKCnBzbV9pbmRleGVzICU+JSAKICBzZWxlY3QoSUQsIHN0YXJ0c193aXRoKCJpbmRleCIpKSAlPiUKICBoZWFkKCkgJT4lCiAgcGFuZG9jLnRhYmxlKHNwbGl0LnRhYmxlID0gSW5mLCBtaXNzaW5nID0gIiIpCmBgYAoKIyMgU2F2ZSBkYXRhCgpGaW5hbGx5LCBJIHNhdmUgYSBjbGVhbiB2ZXJzaW9uIG9mIHRoZSBkYXRhIGZvciB1c2UgZWxzZXdoZXJlLgoKYGBge3Igc2F2ZS1jbGVhbi1kYXRhfQp3cml0ZV9jc3YocHNtX2luZGV4ZXMsIGZpbGUucGF0aChoZXJlKCksICJkYXRhIiwgImRhdGFfY2xlYW4iLCAicHNtX2NsZWFuLmNzdiIpKQpgYGAK