In this homework assignment you will…
Go to the sta199-fa21-003 organization on GitHub. Click on the repo with the prefix hw-04. It contains the starter documents you need to complete the assignment.
Clone the repo and start a new project in RStudio. See the Lab 01 instructions for details on cloning a repo and starting a new R project.
You will work with the following packages:
library(tidyverse)
library(tidymodels)
The dataset is adapted from Little et al. (2007), and contains voice measurements from individuals both with and without Parkinson’s Disease (PD), a progressive neurological disorder that affects the motor system. The aim of Little et al.’s study was to examine whether Parkinson’s Disease could be diagnosed by examining the spectral (sound-wave) properties of patients’ voices.
147 measurements were taken from patients with PD, and 48 measurements were taken from healthy patients who served as controls. For the purposes of this assignment, you may assume that measurements are representative of the underlying populations (PD vs. healthy).
The variables in the dataset are as follows:
clip: ID of the recording number
jitter: a measure of variation in fundamental frequency
shimmer: a measure of variation in amplitude
hnr: a ratio of total components vs. noise in the voice recording
status: PD vs. Healthy
avg.f.q: 1, 2, or 3, corresponding to average vocal fundamental frequency
The data are in parkinsons.csv, located in the data folder. Write code to load the data into your R Markdown file.
Is there enough evidence to suggest that the mean HNR in the voice recordings of adults with Parkinson’s Disease is significantly different from 20? State the null and alternative hypotheses in words and mathematical notation.
Describe precisely how you would set up the simulation to construct the null distribution for the test in Exercise 1. In your description, you can imagine using index cards to represent the data. Your description should also include specifics about the size of the sample drawn at each iteration and what statistic is calculated. You can assume the number of reps for the simulation is 10,000.
Construct the null distribution using set.seed(3).
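The index-card simulation for a test about a single mean can be sketched in base R as follows. This is only an illustration under stated assumptions: the vector toy_hnr holds made-up values rather than the values in parkinsons.csv, the names are hypothetical, and the course's tidymodels workflow may look different.

```r
# Toy data: 30 made-up HNR-like values, NOT taken from parkinsons.csv
set.seed(1)
toy_hnr <- rnorm(30, mean = 22, sd = 4)

null_mean <- 20            # value stated in the null hypothesis
n <- length(toy_hnr)       # each resample has the same size as the original sample
reps <- 10000

# Shift the sample so that its mean equals the null value
# (rewrite each index card as: value - sample mean + 20)
shifted <- toy_hnr - mean(toy_hnr) + null_mean

# Each rep: draw n cards with replacement and record the mean of the draw
set.seed(3)
null_means <- replicate(reps, mean(sample(shifted, size = n, replace = TRUE)))

# Two-sided p-value: share of simulated means at least as far from 20
# as the observed sample mean
obs_mean <- mean(toy_hnr)
p_value <- mean(abs(null_means - null_mean) >= abs(obs_mean - null_mean))
```

The shift is what makes this a null distribution: the resampled means are centered at 20 while keeping the spread of the observed data.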
Calculate a 95% bootstrap confidence interval for the mean HNR in the voice recordings of adults with Parkinson’s Disease. Use set.seed(4).
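A percentile bootstrap interval for a mean might be computed along these lines. The sketch uses base R and made-up values (toy_hnr is hypothetical, not the assignment data), and the percentile method is assumed here; other bootstrap interval methods exist.

```r
# Toy data: 30 made-up HNR-like values, NOT taken from parkinsons.csv
set.seed(1)
toy_hnr <- rnorm(30, mean = 22, sd = 4)

n <- length(toy_hnr)
reps <- 10000

# Each rep: resample n values with replacement from the original (unshifted)
# sample and record the mean
set.seed(4)
boot_means <- replicate(reps, mean(sample(toy_hnr, size = n, replace = TRUE)))

# 95% percentile interval: the middle 95% of the bootstrap means
ci_95 <- quantile(boot_means, probs = c(0.025, 0.975))
```

Unlike the null distribution, the bootstrap distribution is centered at the observed sample mean, because no hypothesized value is imposed on the data.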
Do the data provide evidence that a majority of healthy adults have a “high” average vocal fundamental frequency? State the null and alternative hypotheses in words and mathematical notation.
Describe precisely how you would set up the simulation to construct the null distribution for the test in Exercise 5. In your description, you can imagine using blue and red marbles to represent the data. Your description should also include specifics about the size of the sample drawn at each iteration and what statistic is calculated. You can assume the number of reps for the simulation is 10,000.
Construct the null distribution using set.seed(7).
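The marble simulation for a test about a single proportion can be sketched in base R. The sample size of 48 comes from the description of the healthy group above; the observed count of 30 is invented for illustration, and the one-sided p-value assumes the alternative is "more than half".

```r
n <- 48          # number of healthy recordings stated in the assignment
p_null <- 0.5    # "a majority" is compared against one half
reps <- 10000

# Each rep: draw n marbles with replacement from a bag that is half blue
# ("high") and half red (not "high"), then record the proportion of blue
set.seed(7)
null_props <- replicate(
  reps,
  mean(sample(c("blue", "red"), size = n, replace = TRUE,
              prob = c(p_null, 1 - p_null)) == "blue")
)

# Hypothetical observed proportion (30 of 48 is made up, not from the data)
obs_prop <- 30 / 48

# One-sided p-value: share of simulated proportions at least as large
p_value <- mean(null_props >= obs_prop)
```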
Calculate a 90% bootstrap confidence interval for the proportion of healthy adults who have a “high” average vocal fundamental frequency. Interpret the interval in the context of the data. Use set.seed(8).
Are a patient’s status and average vocal fundamental frequency independent? To answer this question, conduct a test of the following hypotheses:
\(H_0: p_{H} = p_{PD} \text{ vs. }H_a: p_{H} \neq p_{PD}\)
where \(p_{H}\) is the proportion of healthy adults who have “low” average vocal fundamental frequency and \(p_{PD}\) is the proportion of adults with Parkinson’s Disease who have “low” average vocal fundamental frequency.
Use set.seed(9).
Given your conclusion in Exercise 9, which type of error could you possibly have made? What would making such an error mean in the context of the research question?
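For the test of independence in Exercise 9, one common simulation approach is a permutation test, sketched here in base R. The group sizes (48 healthy, 147 PD) come from the data description above; the "low" indicators are randomly generated stand-ins, and the helper diff_props is a hypothetical name.

```r
# Group labels with the sizes given in the assignment
status <- rep(c("Healthy", "PD"), times = c(48, 147))

# Made-up indicators of "low" average vocal fundamental frequency
set.seed(2)
low <- rbinom(length(status), size = 1, prob = 0.4) == 1

# Difference in the proportion of "low" between the two groups
diff_props <- function(low, status) {
  mean(low[status == "Healthy"]) - mean(low[status == "PD"])
}
obs_diff <- diff_props(low, status)

# Each rep: shuffle the status labels, which breaks any link between
# status and frequency, then recompute the difference in proportions
set.seed(9)
null_diffs <- replicate(10000, diff_props(low, sample(status)))

# Two-sided p-value, matching the "not equal" alternative
p_value <- mean(abs(null_diffs) >= abs(obs_diff))
```

Shuffling the labels keeps the group sizes and the overall proportion of "low" fixed, so the simulated differences reflect what independence alone would produce.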
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Upload only your PDF document to Gradescope. Before you submit, mark the pages where the answer to each exercise appears. If an answer spans multiple pages, mark all of them. Associate the “Workflow & formatting” section with the first page.
Component | Points |
---|---|
Ex 1 | 3 |
Ex 2 | 4 |
Ex 3 | 5 |
Ex 4 | 6 |
Ex 5 | 3 |
Ex 6 | 4 |
Ex 7 | 5 |
Ex 8 | 6 |
Ex 9 | 8 |
Ex 10 | 3 |
Workflow & formatting | 3 |
Workflow and formatting includes having at least three meaningful commits, a neatly formatted PDF document with readable headers, updating the name and date, using the tidyverse syntax, and naming all code chunks.