Author

Meng Ye

Last updated

March 28, 2023

Code
library(tidyverse)
library(targets)
library(here)
library(lubridate)
library(sf)
library(mapchina)

# Point to the _targets folder location since this qmd is in a subfolder
tar_config_set(store = here::here('_targets'),
               script = here::here('_targets.R'))

# Load targets
ongo <- tar_read(ongo)

Documentation of the pre-processing of scraped data

The inspected raw data is the version derived after inspecting the scraped csv file against the source website. The blank lines in the scraped csv file are from ROs1 that have been de-registered. The website keep their names in the general list but their profiles are all blank now.

A couple of scraping errors have also been corrected in the process. That is cases where the RO is active but shown as blank rows in the scraped. I manually added those rows back.

For those that already been de-registered, I added back their index, ro_name, and ro_id, and generated a dichotomous indicator that they are no more “active” in the still_active column.

For the still_active variable, the values are manually input for the blank lines. After checking against the official website, FALSE for those have indeed de-register and TRUE for those that turned out to be parsing errors. The rest NAs will be recoded to TRUE in data cleaning.

After the cross-checking process, the index number of ROs in our dataset is the same as the index in the official database as of 2022-10-04.

The following columns are also manually coded:

  • home: home county or region of the ONGO. Not using the name “country” to be flexible enough to denote, Hong Kong, Macau and Taiwan

  • psu_level: dummy variable indicating whether the official sponsoring unit (for registration and annual audit) is at the national or local level.

    It is clearly correlated with geo_scope_num, but should be a mediating factor for the relationship we are examining.

  • ro_count: There are cases where ONGOs are not granted to operate across China, so they register multiple ROs to cover enough number of provinces where they need to work. We surely want to include this variable because it is directly correlated with geo_scope_num

    A related multiple_index is an index number created to identify siblings, in the form of 100-1, 100-2…(starting from 100 to ensure all have three digits) NA for ONGOs only having one RO. Note there can be cases where one of the siblings has been de-registered, but only several.

  • local_aim: There are cases where the aim/mission of the ONGO is attached to a locality. So I created this dummy to indicate such situation. However, there seems to be some inconsistency in different ONGO’s choices whether to use a location name in their work field. Also, it might be the case that the location limitation is added as requested by the authority. So probably should not use this variable.

  • Index: Index number larger than 700 are ROs that were active at time of data scraping (Jan. 2022) but have been deregistered as of Oct. 2022.

  • work_field: original mission statement in Chinese. The name “field” is chosen for the moment, because it seems “area” is more easily to be confused with the geographical scope. It is not called “mission” to differentiate from the mission of the mother organization which is not subject to approval of the authority. Also, the statement of the work field tends to be more concrete than missions.

    work_field_code1 is the main work field code; work_field_code2 is an alternative code for the work field if the ONGO’s work touches on overlapping fields

    The field categories listed on Article 3 of the law.

    Overseas NGOs may, in accordance with the provisions of this Law, engage in undertakings of benefit to the public in the areas of the economy, education, science, culture, health, sports and environmental protection, as well as in the areas of poverty and disaster relief.

    Earlier fields used:

    • Commerce and Trade Promotion
    • Science and Technology
    • Education
    • Charity and Public Benefit
    • Environment
    • Industry Promotion and Self-regulation
    • Public Health
    • International Exchange
    • Multiple Fields (usually grant-making foundations)

Cross checking of ONGO work field coding

Google translation of the work field statements (disclosed in Chinese) are included. Researcher 1 is the main coder of the work field. When there is some ambiguity of the coding, Researcher 1 marks “TRUE” in the coding_need_check document and Researcher 2 cross check the coding of the work field.

Work Field coding notes

Decided by how the work fields of registered NGO ROs cluster

  • [Economy and trade]

  • [Environment] (including animal protection)

  • [Education] (including youth development and international cultural exchange)

  • [Charity and humanitarian] poverty alleviation, disaster relief, etc., including rural development and social work If rural development is researched at the macroeconomics level, coded as science and technology, eg. #375

  • [Science and technology]

  • [Industry association]

  • [Health] public health and health care system support (professional training)

  • [Arts and Culture] including sports (noted as “sports” in work_field_code2) and international culture exchange

  • [General] usually grant-making organizations, Or working on general international communication, including among governments, Or working on multiple fields not overlapping

(Chinese background) (ONGOs set up by Chinese communities abroad or by Chinese entities in foreign jurisdictions). Such ONGOs tend to have strong local connection with its origins. Not sure yet if should be kept as a separate variable. Now coded only in work_field_code2. (equality)

Only in work_field_code2, aim to right-based Charity and humanitarian and non-right-based Charity and humanitarian orgs

Coding rules used and notes

  1. Education vs. Charity

    If charitable activities are all education related, e.g. building schools, donating education facilities and books, coded as education

    If the aids provided include other community support, e.g. building bridges, roads, medical facilities, coded as charity

  2. Economy and trade vs. Industry association

    If the trade promoted is focused on a particular industry (poultry, grains etc.), coded as industry association

  3. Arts and culture vs. education

    Now includes sports organizations too.

  4. Industry association vs. health, science and technology, arts and culture

    There are professional association on health providers and certain techniques. Coded as health, science & technology at the moment because they also conducting activities of promoting certain techniques . And their nature of work can be quite different from industry associations on grains, wines, pecans etc. The are more of a professional association with specialized expertise.

    Same as arts and culture organizations: e.g., ballet dancer association

    Exception: specialized association coded as industry associations if only serving the member companies #91

  5. Health vs. Charity

    If only providing medical aids to underprivileged communities, not building schools etc., coded as health

  6. Charity vs. General

    If multiple work, e.g., poverty alleviation, education, health can be encompassed by Charity and humanitarian, then former, or latter (e.g.)

  7. International exchange category is dropped because it is assimilated by other fields, i.e., international exchange in education, arts and culture, science and technology

  8. Limitations: The sample/population only covers those INGOs that got successfully registered

  9. Chinese background, equality and sports are coded in work_field_code2, we can derive separate variables for them if needed

  10. Multiple ROs are coded based on the overall work area spans of the mother organization, not the specific RO, so ensure the work_area is consistent across ROs

Clean data

Province names

Code
province_list <- ongo |> 
  group_by(province_cn) |> 
  summarise(province_cn = first(province_cn)) |>
  drop_na() |> #de-registered ones
  pull(province_cn)
province_list
##  [1] "上海" "云南" "内蒙" "北京" "吉林" "四川" "天津" "宁夏" "安徽" "山东"
## [11] "广东" "广西" "江苏" "江西" "河北" "河南" "浙江" "海南" "湖北" "湖南"
## [21] "甘肃" "福建" "西藏" "贵州" "辽宁" "重庆" "陕西" "青海" "黑龙"

There are totally 29 provinces (including provincial-level cities e.g. Beijing and Shanghai) that have ONGO rep offices.

The maximum number of provinces that a RO can possible operate in is 32.

English organization names

Code
table(is.na(ongo$org_name_en))
## 
## FALSE  TRUE 
##   499    94

Footnotes

  1. RO/ro = representative office of ONGOs↩︎