Instructions

1 Data management

You have been given 2 datasets.

Dataset A includes the following variables:

Variable name   Variable label
B1004           Country-year code
B1006_NAM       Country name
B1008           Year
B2001           Age (number of years)
B2002           Gender (1=male; 2=female)
B2005           Union membership (1=is a member; 2=is not a member)
B2020           Income in household (1=low; 5=high)
B3014           It matters who people vote for (1=it doesn’t matter; 5=it matters a lot)

Dataset B includes the following variables:

Variable name   Variable label
cname           Country name
year            Year of measurement
gle_cgdpc       GDP per capita (in current prices)
p_polity2       Polity score (measure of democracy)
undp_hdi        UNDP’s Human Development Index

1.1 Read data

Read both datasets into R using the functions appropriate to their formats. Note that dataset A is a .csv (comma-separated values) file, while dataset B is an .xlsx (Excel) file.
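As a minimal sketch of this step, the snippet below reads a .csv file with base R's read.csv() and an .xlsx file with read_excel() from the readxl package. The file names and the object names df_a and df_b are assumptions made for illustration, and readxl is assumed to be installed; the calls are guarded so the snippet does nothing if the files are absent.

```r
# Hypothetical file names: replace them with the paths of the files you received.
path_a <- "dataset_a.csv"
path_b <- "dataset_b.xlsx"

# Dataset A: comma-separated values, read with base R.
if (file.exists(path_a)) {
  df_a <- read.csv(path_a, stringsAsFactors = FALSE)
  str(df_a)  # quick check of column names and types
}

# Dataset B: Excel workbook, read with the readxl package (assumed installed).
if (file.exists(path_b) && requireNamespace("readxl", quietly = TRUE)) {
  # read_excel() returns a tibble; convert it so both objects are plain data frames.
  df_b <- as.data.frame(readxl::read_excel(path_b))
  str(df_b)
}
```

Inspecting both objects with str() right after reading makes it easier to spot columns that were imported with an unexpected type, such as a year stored as text.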

1.2 Clean data

Compare the lists of unique countries present in datasets A and B. Five countries are named differently in the two datasets. Find these countries and rename them in dataset B so that they match the names used in dataset A.
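One possible way to approach the comparison is sketched below with setdiff() and a named lookup vector. The two small data frames and the name pair in recode_map are invented for illustration only; they are not the five mismatches in the real data, which you still need to identify yourself.

```r
# Toy stand-ins for the country columns (invented values, not the real data).
df_a <- data.frame(
  B1006_NAM = c("Germany", "United States of America", "Sweden"),
  stringsAsFactors = FALSE
)
df_b <- data.frame(
  cname = c("Germany", "United States", "Sweden", "Norway"),
  stringsAsFactors = FALSE
)

countries_a <- sort(unique(df_a$B1006_NAM))
countries_b <- sort(unique(df_b$cname))

# Names used in A that have no exact match in B: these need a counterpart in B.
unmatched_a <- setdiff(countries_a, countries_b)
# Names used in B that have no exact match in A: candidates for renaming.
unmatched_b <- setdiff(countries_b, countries_a)
print(unmatched_a)
print(unmatched_b)

# Lookup vector: names are the spellings in B, values are the spellings in A.
# This pair is hypothetical and only illustrates the mechanics.
recode_map <- c("United States" = "United States of America")

# Overwrite only the rows of B whose name appears in the lookup.
hits <- df_b$cname %in% names(recode_map)
df_b$cname[hits] <- unname(recode_map[df_b$cname[hits]])

# After renaming, every country in A should be found in B.
remaining <- setdiff(unique(df_a$B1006_NAM), unique(df_b$cname))
print(remaining)
```

Countries that appear only in dataset B (such as the invented extra row here) are harmless for the later merge, so only the names listed in unmatched_a need a match.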

1.3 Merge data

Merge dataset B into dataset A using country and year to match observations. Call the resulting dataset merged_df. Perform the merge so that the resulting dataset has the same number of rows as dataset A.
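The row-count requirement corresponds to a left join: every row of A is kept, and columns from B are attached where country and year match. The sketch below uses base R's merge() with all.x = TRUE on invented toy data; the object names df_a and df_b are assumptions. Note that merge() may reorder rows, and that duplicated country-year keys in B would inflate the row count, so both conditions are checked explicitly.

```r
# Toy individual-level data (A) and country-year data (B); values are invented.
df_a <- data.frame(
  B1006_NAM = c("Sweden", "Sweden", "Germany", "Germany"),
  B1008     = c(2001, 2001, 2002, 2002),
  B2005     = c(1, 2, 2, 1),
  stringsAsFactors = FALSE
)
df_b <- data.frame(
  cname     = c("Sweden", "Germany", "Germany"),
  year      = c(2001, 2002, 2003),
  gle_cgdpc = c(30000, 28000, 29000),
  stringsAsFactors = FALSE
)

# Each country-year must appear at most once in B,
# otherwise matching rows of A would be duplicated.
stopifnot(anyDuplicated(df_b[, c("cname", "year")]) == 0)

# Left join: all.x = TRUE keeps every row of A, matched or not.
merged_df <- merge(
  df_a, df_b,
  by.x  = c("B1006_NAM", "B1008"),
  by.y  = c("cname", "year"),
  all.x = TRUE
)

# The merged data should have exactly as many rows as A.
stopifnot(nrow(merged_df) == nrow(df_a))
print(merged_df)
```

Rows of A without a match in B are kept with NA in the columns coming from B, which is a quick way to detect country names that still differ between the two datasets.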

2 Descriptive statistics

From this point onward, please work only with the merged dataset merged_df. (We also provide this dataset here in case your merge did not succeed.)

2.1 Summary statistics

For each country-year pair in the merged dataset, please compute the percentage of respondents who report being members of a union, as well as the GDP per capita recorded in that country-year.

Store this resulting country-year data as a data frame in a new R object called summary_df.
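A minimal sketch of this aggregation in base R follows, using invented toy data. The column names country, union_pct and gdp_pc are hypothetical choices, not names required by the task. The sketch assumes that GDP per capita is constant within a country-year (so its group mean equals the recorded value) and that the percentage is taken among respondents with a non-missing answer on B2005.

```r
# Toy merged data (invented values): one row per respondent.
merged_df <- data.frame(
  B1006_NAM = c("Sweden", "Sweden", "Sweden", "Sweden", "Germany", "Germany"),
  B1008     = c(2001, 2001, 2001, 2001, 2002, 2002),
  B2005     = c(1, 1, 1, 2, 1, 2),  # 1 = is a member; 2 = is not a member
  gle_cgdpc = c(30000, 30000, 30000, 30000, 28000, 28000),
  stringsAsFactors = FALSE
)

# Logical indicator of membership; NA stays NA and is dropped by na.rm below.
merged_df$member <- merged_df$B2005 == 1

# Group means by country and year: the mean of a logical is the share of TRUE.
agg <- aggregate(
  merged_df[, c("member", "gle_cgdpc")],
  by    = list(country = merged_df$B1006_NAM, year = merged_df$B1008),
  FUN   = mean,
  na.rm = TRUE
)

# One row per country-year, with the share expressed as a percentage.
summary_df <- data.frame(
  country   = agg$country,
  year      = agg$year,
  union_pct = 100 * agg$member,
  gdp_pc    = agg$gle_cgdpc,
  stringsAsFactors = FALSE
)
print(summary_df)
```

If you prefer the percentage to be computed over all respondents, including those with a missing answer, the indicator would need to be defined differently; which denominator is intended is a judgment call that is worth stating in your answer.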

2.2 Display table

Display the first rows of summary_df as a table.
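As a short illustration, head() returns the first six rows of a data frame by default. The toy summary_df below, including its column names, is invented. If the knitr package is installed (an assumption), kable() can render the same rows as a formatted table in a rendered report.

```r
# Toy country-year summary (invented values and hypothetical column names).
summary_df <- data.frame(
  country   = c("Sweden", "Sweden", "Germany", "Germany"),
  year      = c(2000, 2001, 2002, 2003),
  union_pct = c(40, 37, 38, 33),
  gdp_pc    = c(29000, 30000, 28000, 29500),
  stringsAsFactors = FALSE
)

# First rows as a plain console table (six rows at most by default).
first_rows <- head(summary_df)
print(first_rows)

# Optional: a formatted table for a rendered document, if knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(first_rows, digits = 1))
}
```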

3 Analysis

The next questions use summary_df. If you did not create summary_df successfully, you can access it here.

3.1 Scatterplot

Please produce a scatterplot of the relationship between union membership and GDP per capita. Plot union membership on the X-axis, and GDP per capita on the Y-axis.
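The axis assignment can be sketched with base R's plot(), where the first argument goes on the X-axis and the second on the Y-axis. The toy summary_df and its column names union_pct and gdp_pc are invented for illustration.

```r
# Toy country-year summary (invented values and hypothetical column names).
summary_df <- data.frame(
  country   = c("Sweden", "Sweden", "Germany", "Germany"),
  year      = c(2000, 2001, 2002, 2003),
  union_pct = c(40, 37, 38, 33),
  gdp_pc    = c(29000, 30000, 28000, 29500),
  stringsAsFactors = FALSE
)

# Union membership on the X-axis, GDP per capita on the Y-axis.
plot(
  x    = summary_df$union_pct,
  y    = summary_df$gdp_pc,
  xlab = "Union members (% of respondents)",
  ylab = "GDP per capita (current prices)",
  main = "Union membership and GDP per capita by country-year",
  pch  = 19
)
```

Labelling both axes with their units helps the reader interpret the plot without referring back to the codebook.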

3.2 Regression

Run an OLS regression of union membership on year. Display your output.
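In the phrase "regression of Y on X", Y is the outcome and X the predictor, so union membership is the dependent variable here and year the independent variable. A minimal sketch with lm() follows, again on invented toy data with hypothetical column names.

```r
# Toy country-year summary (invented values and hypothetical column names).
summary_df <- data.frame(
  country   = c("Sweden", "Sweden", "Germany", "Germany"),
  year      = c(2000, 2001, 2002, 2003),
  union_pct = c(40, 37, 38, 33),
  stringsAsFactors = FALSE
)

# OLS: union membership (outcome) regressed on year (predictor).
fit <- lm(union_pct ~ year, data = summary_df)

# Coefficients, standard errors, and fit statistics.
print(summary(fit))
```

The coefficient on year is the estimated change in the percentage of union members associated with one additional year, pooled across all country-years in summary_df.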

4 Editing

Please read this text and suggest improvements.

“One really has to say that in the last twelve month’s (See Masters 2022) there have been no
less protests about the conflict then any time before, this has lead some to wonder what affect that might have on future stability (see eg Masterson 2022 and Peters 2021, 2022.)”