Lab 04: Visualizing spatial data

due Monday, September 20 at 11:59p

Goals

In this lab you will…

Meet your team!

Click here to see the team assignments for STA 199. This will be your team for labs and the final project.

Before you get started on the lab, your TA will walk you through the following:

✅ Icebreaker activity to get to know your teammates.

✅ Come up with a team name. You can’t use the same name as another team, so I encourage you to be creative! Your TA will get your team name by the end of lab.

✅ Fill out the team agreement. This will help you figure out a plan for communication and working together during labs and outside of lab times. You can find the team agreement in the GitHub repo team-agreement-[github_team_name].

Getting started

Workflow: Using git and GitHub as a team

Assign each person on your team a number 1 through 4. For teams of three, Team Member 1 can take on the role of Team Member 4.

The following exercises must be done in order. Only one person should type in the .Rmd file and push updates at a time. When it is not your turn to type, you should still share ideas and contribute to the team’s discussion.

Update YAML

Team Member 1: Change the author to your team name and include each team member’s name in the author field of the YAML in the following format. Team Name: Member 1, Member 2, Member 3, Member 4. Knit, commit, and push the changes to GitHub.

Team Members 2, 3, 4: Click the Pull button in the Git pane to get the updated document. You should see the updated name in the .Rmd file.

Packages

We will use the following packages:

library(tidyverse)
library(sf)

Data: North Carolina Congressional Districts and Redistricting.

In this lab you will use the sf package to visualize district-level data from the most recent congressional and presidential elections in North Carolina. Given the upcoming redistricting following the 2020 census, we will also consider which districts are overpopulated and thus may shrink as a result of redistricting.

The variables in nc_dists are as follows:

The variables in nc_newdata are as follows:

The presidential election data is from Daily Kos Elections and the House election data is from the CQ Voting and Elections Collection, accessed through Duke Libraries. The population data is from the 2020 census and has been compiled by Daily Kos Elections.

Exercises

Do the following exercises in order, following each step carefully.

Only one person at a time should type in the .Rmd file and push updates.

If you are working on any portion of the lab virtually, the person working should share their screen and the others should follow along.

🧶 ✅ ⬆️ Team Member 1: If you haven’t already, change the author to your team name and include each team member’s name in the author field of the YAML in the following format. Team Name: Team member 1, Team member 2, Team member 3, Team member 4.

Type the team’s response to Exercises 1 - 2.

  1. Join the nc_dists and nc_newdata data frames to create a new data frame called nc_data. Hint: Include the argument by = c("DISTRICT" = "District") in the join function to join the data frames based on their respective district variables. The district is stored as numeric data in nc_newdata. Use as.character() to make district a character data type before joining.
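The key-type mismatch described in the hint can be tried out on a small scale first. The sketch below uses two made-up tibbles (the names shapes and results and all of their values are hypothetical, not the lab data) and left_join(), which is only one of several join functions that could fit here.

```r
library(dplyr)

# Hypothetical stand-ins for the lab's data frames (made-up values).
shapes <- tibble(
  DISTRICT = c("1", "2", "3"),   # key stored as character
  area = c(10.5, 7.2, 12.1)
)

results <- tibble(
  District = c(1, 2, 3),         # key stored as numeric
  pct = c(45.3, 52.8, 61.0)
)

# Convert the numeric key to character so both keys share a type,
# then join on the two differently named key columns.
joined <- shapes %>%
  left_join(
    results %>% mutate(District = as.character(District)),
    by = c("DISTRICT" = "District")
  )

joined
```

When one of the data frames is an sf object, placing it on the left side of the join is the usual way to keep the geometry column and the sf class in the result.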

Click here for documentation on scale_fill_gradient2.

  2. Create a visualization with the congressional districts in North Carolina filled in based on trump_pct_2020, the percent of votes in that congressional district for Donald Trump. Use scale_fill_gradient2() to apply an informative color palette. In the function, specify the low and high colors and set the midpoint to 50 (representing 50% of the vote).

    The colors “#DE0100” and “#0015BC” may be good choices for Republicans and Democrats, respectively, but you are welcome to choose different colors. Give the plot an informative title, subtitle, and label fill.

    • Write 2 observations from the plot.
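To see how scale_fill_gradient2() behaves with a midpoint of 50, here is a minimal sketch on the North Carolina county shapefile that ships with the sf package (not the lab's district data). The variable share_1979 is a hypothetical percentage built from the BIR74 and BIR79 birth-count columns purely for illustration, and the colors are arbitrary.

```r
library(ggplot2)
library(sf)

# County shapefile bundled with sf (not the lab's district data).
counties <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Illustrative percentage: the BIR79 count as a share of BIR74 + BIR79.
counties$share_1979 <- 100 * counties$BIR79 /
  (counties$BIR74 + counties$BIR79)

# Diverging palette: values below 50 shade toward `low`,
# values above 50 shade toward `high`, and 50 itself maps to `mid`.
share_map <- ggplot(counties) +
  geom_sf(aes(fill = share_1979)) +
  scale_fill_gradient2(
    low = "darkorange3",
    mid = "white",
    high = "purple4",
    midpoint = 50
  ) +
  labs(
    title = "Share of births recorded in the later period",
    subtitle = "North Carolina counties, sf example data",
    fill = "Percent"
  )

share_map
```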

🧶 ✅ ⬆️ Team member 1: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 1 - 2.

Team Member 2: It’s your turn! Type the team’s response to exercises 3 - 4.

  3. Create a new variable called trump_change that measures the difference in the Republican vote share in 2020 and 2016. This variable should be calculated in a way that represents how much better (or worse) Trump did in a district in 2020 compared to 2016.

  4. Create a visualization with the congressional districts in North Carolina filled in based on trump_change. Similar to Exercise 2, use informative fill colors, and set the midpoint to 0 (representing no change from 2016 to 2020).

    • Use the plot to describe the change in votes for Trump in 2020 compared to 2016 across different congressional districts in North Carolina.
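The same pattern (compute a difference, then center the color scale at 0) can be sketched on the county shapefile bundled with sf. The variable birth_change is hypothetical and stands in for a change between two periods. Subtracting the earlier value from the later one makes positive numbers mean an increase.

```r
library(dplyr)
library(ggplot2)
library(sf)

# County shapefile bundled with sf (not the lab's district data).
counties <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  # Later period minus earlier period: positive means an increase.
  mutate(birth_change = BIR79 - BIR74)

# With midpoint = 0, the `mid` color marks "no change" and the two
# ends of the palette separate decreases from increases.
change_map <- ggplot(counties) +
  geom_sf(aes(fill = birth_change)) +
  scale_fill_gradient2(
    low = "darkorange3",
    mid = "white",
    high = "purple4",
    midpoint = 0
  ) +
  labs(
    title = "Change in recorded births between two periods",
    fill = "Change"
  )

change_map
```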

🧶 ✅ ⬆️ Team member 2: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 3 - 4.

Team Member 3: It’s your turn! Type the team’s response to exercise 5.

  5. Now let’s compare the Republican presidential performance in 2020 to the Republican congressional performance at the district level. The variable house_gop_pct_2020 represents the two-party Republican percentage of the U.S. House vote in 2020.

    • First, create a new variable gop_diff that measures the difference between the percentage of the vote received by the Republican U.S. House candidate and the percentage received by the Republican presidential candidate in 2020.
    • Then, create a plot with the congressional districts in North Carolina filled in based on gop_diff. Choose different colors and set the midpoint at 0 (thus representing no difference between the House Republican candidate and Trump’s vote percentage).
    • It may be helpful to add the district numbers to the plot. You can do so by adding geom_sf_text(aes(label = DISTRICT)) to the code used to make the plot.
    • Describe what you observe from this plot. Why was gop_diff in one congressional district so much different from the others? (You may need to do brief research on the 2020 North Carolina U.S. House elections to answer this question - any major news website may be useful.)
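As a rough illustration of how geom_sf_text() adds labels on top of a map, the sketch below labels a few counties from the shapefile bundled with sf. The subset largest is a hypothetical helper used only to keep the example readable; in the lab, the label would be DISTRICT as described above.

```r
library(ggplot2)
library(sf)

# County shapefile bundled with sf (not the lab's district data).
counties <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Label only the five counties with the largest AREA values so the
# text does not overlap; with 13 districts, labeling all is feasible.
largest <- counties[order(-counties$AREA)[1:5], ]

label_map <- ggplot(counties) +
  geom_sf() +
  geom_sf_text(data = largest, aes(label = NAME), size = 3)

label_map
```

geom_sf_text() places each label at a point computed from the shape's geometry, so no separate coordinate columns are needed.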

🧶 ✅ ⬆️ Team member 3: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercise 5.

Team Member 4: It’s your turn! Type the team’s response to exercise 6.

  6. In 2022, North Carolina will be required to draw new congressional districts, all of which must have equal population. The population is based on the 2020 U.S. Census. North Carolina will also have a 14th Congressional District for the first time in its history! The variable population_2020 measures the 2020 population of each current (2010 census) district.

    Make a map of each district’s population based on the 2020 U.S. Census. Choose a new color palette for the scale. You can use the scale_fill_gradient function and do not have to set a midpoint.

    • What do you observe?
    • Which area(s) of the state have the most overpopulated districts and thus the districts that will have to reduce in size in redistricting?
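One way to reason about "overpopulated" is to compare each district to an equal-population target, which is the total population divided by the number of new districts. The sketch below uses made-up numbers for four hypothetical districts being redrawn into five. None of these values are census figures, and the names toy, n_new_districts, and over_target are illustrative only.

```r
library(dplyr)

# Made-up populations for four hypothetical districts (not census data).
toy <- tibble(
  district = c("A", "B", "C", "D"),
  population = c(700000, 820000, 760000, 900000)
)

# Hypothetical number of districts after redistricting.
n_new_districts <- 5

# Equal-population target for each new district.
target <- sum(toy$population) / n_new_districts

# Positive values mean the current district is above the target.
toy <- toy %>%
  mutate(over_target = population - target)

toy
```

Because a district is being added in this toy setup, every existing district sits above the target, so the useful comparison is which districts exceed it by the most.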

🧶 ✅ ⬆️ Team member 4: Knit, commit and push your changes to GitHub with an informative commit message. Make sure to commit and push all changed files so that your Git pane is empty afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the team’s completed lab!

Wrapping up

Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g., no long lines of code).

Team Member 2: Make any edits as needed. Then knit, commit, and push the updated documents to GitHub if you made any changes.

All other team members can click Pull to get the finalized document.

Submission

There should only be one submission per team on Gradescope.

Grading

Component                Points
Team Agreement           4
Ex 1                     6
Ex 2                     8
Ex 3                     2
Ex 4                     8
Ex 5                     8
Ex 6                     8
Workflow & formatting    6