Load data

There are two different versions of the data—an SPSS file and an Excel file—and neither are perfectly identical. These columns have issues:

  • ID: This columnn is full of NAs in SPSS; it exists in Excel.
  • ACT, BIRTH, SAT, UGA_ID: These columns are almost full of NAs in SPSS; they kind of exist in Excel—when values do exist, they’re separated by commas, like 2,9, 1,3, 1, 8, 9, 0, etc.
  • BIRTH: This column is almost full of NAs in SPSS; it kind of exists in Excel—when values do exist, they’re also separated by commas.
  • Most other columns are fine, though some cells in Excel (like row 10 in EDU_DAD) have comma-separated values; in SPSS, the last value is used.

Data from SPSS

ID ACT BIRTH EDU_DAD EDU_MOM
NA NA NA 2 3
NA NA NA 5 5
NA NA NA 7 5
NA NA NA 4 6
NA NA NA NA NA
NA NA NA 7 6

Data from Excel

ID ACT BIRTH EDU_DAD EDU_MOM
A001 NA 0,9 2 3
A002 2,9 3,9 5 5
A003 NA 3,9 7 5
A005 2,2 2,9 4 6
A007 NA NA NA NA
A008 NA 3,9 7 6

Because of this weird discrepancy, I use all columnns from the SPSS data (since it takes care of the strange comma-separated values), but I add the ID column from Excel for identification and grouping purposes later. For now, I ignore ACT, BIRTH, and other columns that behave strangely.

Final raw data

ID ACT BIRTH EDU_DAD EDU_MOM
A001 2 3
A002 5 5
A003 7 5
A005 4 6
A007
A008 7 6

Calculate index totals

Map questions to indexes

There is a simplified codebook that matches each of the questions in the survey to specific items in the four main PSM surveys:

Calculate totals

Using this cleaner data, I extract the questions that apply to each of the four indexes:

Index Columns Variables
Perry PSM_25_R, PSM_06, PSM_27, PSM_28, PSM_29, PSM_30, PSM_31, PSM_32, PSM_33, PSM_34, PSM_35, PSM_36, PSM_07, PSM_38, PSM_39, PSM_46, PSM_40, PSM_41_R, PSM_42, PSM_43, PSM_44, PSM_45, PSM_21, PSM_20 24
MSPB5 PSM_20, PSM_06, PSM_07, PSM_46, PSM_09 5
Grant PSM_16, PSM_17, PSM_18, PSM_19 4
International PSM_10, PSM_11, PSM_20, PSM_12, PSM_13, PSM_14, PSM_15, PSM_05, PSM_01, PSM_02, PSM_03, PSM_04, PSM_08, PSM_22, PSM_23, PSM_24 16

I add each of those columns up to create the index for each individual (and calculate the z-score):

ID index_perry index_msp index_grant index_intl index_perry_z index_msp_z index_grant_z index_intl_z
A001 16 -0.4207
A002 72 19 17 66 -0.4323 0.4633 -0.04333 0.3885
A003 66 14 18 59 -0.9209 -1.138 0.334 -0.4546
A005 81 19 20 66 0.3005 0.4633 1.089 0.3885
A007
A008 16 19 68 -0.4975 0.7114 0.6294

Save data

Finally, I save a clean version of the data for use elsewhere.

LS0tCnRpdGxlOiAiRGF0YSBjbGVhbmluZyIKZGF0ZTogIjIwMTctMTAtMzEiCmVkaXRvcl9vcHRpb25zOiAKICBjaHVua19vdXRwdXRfdHlwZTogY29uc29sZQotLS0KCmBgYHtyIGxvYWQtbGlicmFyaWVzLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeShoYXZlbikKbGlicmFyeShyZWFkeGwpCmxpYnJhcnkocGFuZGVyKQpsaWJyYXJ5KHN0cmluZ3IpCmxpYnJhcnkoRFQpCmxpYnJhcnkoaGVyZSkKYGBgCgojIyBMb2FkIGRhdGEKClRoZXJlIGFyZSB0d28gZGlmZmVyZW50IHZlcnNpb25zIG9mIHRoZSBkYXRh4oCUYW4gU1BTUyBmaWxlIGFuZCBhbiBFeGNlbCBmaWxl4oCUYW5kIG5laXRoZXIgYXJlIHBlcmZlY3RseSBpZGVudGljYWwuIFRoZXNlIGNvbHVtbnMgaGF2ZSBpc3N1ZXM6CgotIGBJRGA6IFRoaXMgY29sdW1ubiBpcyBmdWxsIG9mIE5BcyBpbiBTUFNTOyBpdCBleGlzdHMgaW4gRXhjZWwuCi0gYEFDVGAsIGBCSVJUSGAsIGBTQVRgLCBgVUdBX0lEYDogVGhlc2UgY29sdW1ucyBhcmUgYWxtb3N0IGZ1bGwgb2YgTkFzIGluIFNQU1M7IHRoZXkga2luZCBvZiBleGlzdCBpbiBFeGNlbOKAlHdoZW4gdmFsdWVzIGRvIGV4aXN0LCB0aGV5J3JlIHNlcGFyYXRlZCBieSBjb21tYXMsIGxpa2UgYDIsOWAsIGAxLDNgLCBgMSwgOCwgOSwgMGAsIGV0Yy4KLSBgQklSVEhgOiBUaGlzIGNvbHVtbiBpcyBhbG1vc3QgZnVsbCBvZiBOQXMgaW4gU1BTUzsgaXQga2luZCBvZiBleGlzdHMgaW4gRXhjZWzigJR3aGVuIHZhbHVlcyBkbyBleGlzdCwgdGhleSdyZSBhbHNvIHNlcGFyYXRlZCBieSBjb21tYXMuCi0gTW9zdCBvdGhlciBjb2x1bW5zIGFyZSBmaW5lLCB0aG91Z2ggc29tZSBjZWxscyBpbiBFeGNlbCAobGlrZSByb3cgMTAgaW4gYEVEVV9EQURgKSBoYXZlIGNvbW1hLXNlcGFyYXRlZCB2YWx1ZXM7IGluIFNQU1MsIHRoZSBsYXN0IHZhbHVlIGlzIHVzZWQuCgpgYGB7ciBsb2FkLWRhdGF9CnBzbV9yYXdfc3BzcyA8LSByZWFkX3Nwc3MoZmlsZS5wYXRoKGhlcmUoKSwgImRhdGEiLCAiZGF0YV9yYXciLCAiU3R1ZGVudCBQU00gU3VydmV5IERhdGEuc2F2IikpICU+JSAKICB6YXBfZm9ybWF0cygpICU+JSB6YXBfd2lkdGhzKCkKCnBzbV9yYXdfZXhjZWwgPC0gcmVhZF9leGNlbChmaWxlLnBhdGgoaGVyZSgpLCAiZGF0YSIsICJkYXRhX3JhdyIsICJTdHVkZW50IFBTTSBTdXJ2ZXkgTWFzdGVyIERhdGEueGxzeCIpLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgc2hlZXQgPSAiTWFzdGVyIERhdGEiKQpgYGAKCiMjIyMgRGF0YSBmcm9tIFNQU1MKCmBgYHtyIHNob3ctc3BzcywgcmVzdWx0cz0iYXNpcyJ9CnBzbV9yYXdfc3BzcyAlPiUKICBzZWxlY3QoMTo1KSAlPiUgaGVhZCgpICU+JSAKICBwYW5kb2MudGFibGUoKQpgYGAKCiMjIyMgRGF0YSBmcm9tIEV4Y2VsCgpgYGB7ciBzaG93LWV4Y2VsLCByZXN1bHRzPSJhc2lzIn0KcHNtX3Jhd19leGNlbCAlPiUKICBzZWxlY3QoMTo1KSAlPiUgaGVhZCgpICU+JQogIHBhbmRvYy50YWJsZSgpCmBgYAoKQmVjYXVzZSBvZiB0aGlzIHdlaXJkIGRpc2NyZXBhbmN5LCBJIHVzZSBhbGwgY29sdW1ubnMgZnJvbSB0aGUgU1BTUyBkYXRhIChzaW5jZSBpdCB0YWtlcyBjYXJlIG9mIHRoZSBzdHJhbmdlIGNvbW1hLXNlcGFyYXRlZCB2YWx1ZXMpLCBidXQgSSBhZGQgdGhlIElEIGNvbHVtbiBmcm9tIEV4Y2VsIGZvciBpZGVudGlmaWNhdGlvbiBhbmQgZ3JvdXBpbmcgcHVycG9zZXMgbGF0ZXIuIEZvciBub3csIEkgaWdub3JlIGBBQ1RgLCBgQklSVEhgLCBhbmQgb3RoZXIgY29sdW1ucyB0aGF0IGJlaGF2ZSBzdHJhbmdlbHkuCgojIyMjIEZpbmFsIHJhdyBkYXRhCgpgYGB7ciBmaW5hbC1yYXctZGF0YSwgcmVzdWx0cz0iYXNpcyJ9CnBzbV9yYXcgPC0gYmluZF9jb2xzKAogIHNlbGVjdChwc21fcmF3X2V4Y2VsLCAxKSwKICBzZWxlY3QocHNtX3Jhd19zcHNzLCAtMSkKKQoKcHNtX3JhdyAlPiUgCiAgc2VsZWN0KDE6NSkgJT4lIGhlYWQoKSAlPiUKICBwYW5kb2MudGFibGUobWlzc2luZyA9ICIiKQpgYGAKCgojIyBDYWxjdWxhdGUgaW5kZXggdG90YWxzCgojIyMgTWFwIHF1ZXN0aW9ucyB0byBpbmRleGVzCgpUaGVyZSBpcyBhIHNpbXBsaWZpZWQgY29kZWJvb2sgdGhhdCBtYXRjaGVzIGVhY2ggb2YgdGhlIHF1ZXN0aW9ucyBpbiB0aGUgc3VydmV5IHRvIHNwZWNpZmljIGl0ZW1zIGluIHRoZSBmb3VyIG1haW4gUFNNIHN1cnZleXM6CgpgYGB7ciBsb2FkbC1jbGVhbi1pbmRleCwgcmVzdWx0cz0iYXNpcyJ9CnBzbV9jb2RlYm9va19zaW1wbGUgPC0gcmVhZF9leGNlbChmaWxlLnBhdGgoaGVyZSgpLCAiZGF0YSIsICJkYXRhX3JhdyIsICJwc20gbWVhc3VyZXNfMS54bHMiKSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHNoZWV0ID0gIlNoZWV0MSIsIGNvbF9uYW1lcyA9IGMoInZhcmlhYmxlIiwgImRldGFpbHMiLCAicXVlc3Rpb24iKSkgJT4lCiAgbXV0YXRlKGluZGV4ID0gaWZlbHNlKHN0cl9kZXRlY3QodmFyaWFibGUsICJwc21fIiksIE5BLCB2YXJpYWJsZSkpICU+JQogIGZpbGwoaW5kZXgpICU+JQogIGZpbHRlcighaXMubmEoZGV0YWlscykpICU+JQogIG11dGF0ZShpbmRleCA9IHJlY29kZShpbmRleCwgYEludCdsIE1lYXN1cmVgID0gIkludGVybmF0aW9uYWwiKSwKICAgICAgICAgdmFyaWFibGUgPSBzdHJfdG9fdXBwZXIodmFyaWFibGUpLAogICAgICAgICB2YXJpYWJsZSA9IHJlY29kZSh2YXJpYWJsZSwgUFNNXzI1ID0gIlBTTV8yNV9SIiwgUFNNXzQxID0gIlBTTV80MV9SIikpCgpwc21fY29kZWJvb2tfc2ltcGxlICU+JSBkYXRhdGFibGUoKQpgYGAKCiMjIyBDYWxjdWxhdGUgdG90YWxzCgpVc2luZyB0aGlzIGNsZWFuZXIgZGF0YSwgSSBleHRyYWN0IHRoZSBxdWVzdGlvbnMgdGhhdCBhcHBseSB0byBlYWNoIG9mIHRoZSBmb3VyIGluZGV4ZXM6CgpgYGB7ciBhZGQtaW5kZXgtdG90YWxzLCByZXN1bHRzPSJhc2lzIn0KZ3JhbnRfdmFycyA8LSBwc21fY29kZWJvb2tfc2ltcGxlICU+JSBmaWx0ZXIoaW5kZXggPT0gIkdyYW50IikgJT4lIHB1bGwodmFyaWFibGUpCmludGxfdmFycyA8LSBwc21fY29kZWJvb2tfc2ltcGxlICU+JSBmaWx0ZXIoaW5kZXggPT0gIkludGVybmF0aW9uYWwiKSAlPiUgcHVsbCh2YXJpYWJsZSkKbXNwX3ZhcnMgPC0gcHNtX2NvZGVib29rX3NpbXBsZSAlPiUgZmlsdGVyKGluZGV4ID09ICJNU1BCNSIpICU+JSBwdWxsKHZhcmlhYmxlKQpwZXJyeV92YXJzIDwtIHBzbV9jb2RlYm9va19zaW1wbGUgJT4lIGZpbHRlcihpbmRleCA9PSAiUGVycnkiKSAlPiUgcHVsbCh2YXJpYWJsZSkKCmluZGV4X3ZhcnMgPC0gdHJpYmJsZSgKICB+SW5kZXgsIH5Db2x1bW5zLAogICJQZXJyeSIsIHBlcnJ5X3ZhcnMsCiAgIk1TUEI1IiwgbXNwX3ZhcnMsCiAgIkdyYW50IiwgZ3JhbnRfdmFycywKICAiSW50ZXJuYXRpb25hbCIsIGludGxfdmFycwopICU+JQogIG11dGF0ZShWYXJpYWJsZXMgPSBDb2x1bW5zICU+JSBtYXBfaW50KH4gbGVuZ3RoKC4pKSwKICAgICAgICAgQ29sdW1ucyA9IENvbHVtbnMgJT4lIG1hcF9jaHIofiBwYXN0ZTAoImAiLCAuLCAiYCIsIGNvbGxhcHNlID0gIiwgIikpKQoKaW5kZXhfdmFycyAlPiUgcGFuZG9jLnRhYmxlKGp1c3RpZnkgPSAibGxsIikKYGBgCgpJIGFkZCBlYWNoIG9mIHRob3NlIGNvbHVtbnMgdXAgdG8gY3JlYXRlIHRoZSBpbmRleCBmb3IgZWFjaCBpbmRpdmlkdWFsIChhbmQgY2FsY3VsYXRlIHRoZSB6LXNjb3JlKToKCmBgYHtyIGNhbGN1bGF0ZS1pbmRleC10b3RhbHMsIHJlc3VsdHM9ImFzaXMifQpwc21faW5kZXhlcyA8LSBwc21fcmF3ICU+JQogIG11dGF0ZShpbmRleF9wZXJyeSA9IHJvd1N1bXMoc2VsZWN0KC4sIG9uZV9vZihwZXJyeV92YXJzKSkpLAogICAgICAgICBpbmRleF9tc3AgPSByb3dTdW1zKHNlbGVjdCguLCBvbmVfb2YobXNwX3ZhcnMpKSksCiAgICAgICAgIGluZGV4X2dyYW50ID0gcm93U3VtcyhzZWxlY3QoLiwgb25lX29mKGdyYW50X3ZhcnMpKSksCiAgICAgICAgIGluZGV4X2ludGwgPSByb3dTdW1zKHNlbGVjdCguLCBvbmVfb2YoaW50bF92YXJzKSkpKSAlPiUKICBtdXRhdGVfYXQodmFycyhzdGFydHNfd2l0aCgiaW5kZXgiKSksIGZ1bnMoeiA9IHNjYWxlKC4pKSkKCnBzbV9pbmRleGVzICU+JSAKICBzZWxlY3QoSUQsIHN0YXJ0c193aXRoKCJpbmRleCIpKSAlPiUKICBoZWFkKCkgJT4lCiAgcGFuZG9jLnRhYmxlKHNwbGl0LnRhYmxlID0gSW5mLCBtaXNzaW5nID0gIiIpCmBgYAoKIyMgU2F2ZSBkYXRhCgpGaW5hbGx5LCBJIHNhdmUgYSBjbGVhbiB2ZXJzaW9uIG9mIHRoZSBkYXRhIGZvciB1c2UgZWxzZXdoZXJlLgoKYGBge3Igc2F2ZS1jbGVhbi1kYXRhfQp3cml0ZV9jc3YocHNtX2luZGV4ZXMsIGZpbGUucGF0aChoZXJlKCksICJkYXRhIiwgImRhdGFfY2xlYW4iLCAicHNtX2NsZWFuLmNzdiIpKQpgYGAK