In this lab you will…
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the sta199-fa21-003 course organization on GitHub.
You should see a repo with the **lab-07** prefix.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
The goal of today’s lab is to use simulation-based inference to assess what makes a good burrito.
Today’s dataset has been adapted from Scott Cole’s Burritos of San Diego project. The goal of the project was to identify the best and worst burritos in San Diego, characterize variance in burrito quality, and generate predictive models for what makes a burrito great.
As part of this project, 71 participants reviewed burritos from 79 different taco shops. Reviewers captured objective measures of the burrito (such as whether it contains certain ingredients) and reviewed it on a number of metrics (such as quality of the tortilla, the temperature, quality of meat, etc.). For the purposes of this lab, you may consider each of these observations to be an independent and representative sample of all burritos.
The subjective ratings in the dataset are as follows. Each variable is rated on a 0 to 5 point scale, with 0 being the worst and 5 being the best.
- tortilla: quality of the tortilla
- temp: temperature of the burrito
- meat: quality of the meat
- fillings: quality of non-meat fillings
- salsa: quality of the salsa
- mfr: meat-to-filling ratio
- uniformity: whether each bite contains a uniform slew of ingredients (e.g., a bite entirely composed of tortilla and sour cream would probably be terrible)
- synergy: how well it all comes together

In addition, the reviewers noted the presence of the following burrito components. Each of the following variables is a binary variable taking on values present or none:

- guac: guacamole
- cheese: cheese
- fries: fries (it’s a thing, look it up)
- sourcream: sour cream
- rice: rice
- beans: beans

The data are available in burritos.csv.
Sour cream on burritos: yay or nay? Explain.
Suppose you are worried that the presence of sour cream adversely affects the uniformity of the burrito. You decide to conduct a hypothesis test to evaluate whether the mean uniformity of burritos with sour cream is lower than that of burritos without sour cream.
Construct the null distribution for this test using set.seed(3).
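One way to build a null distribution for a difference in means is to shuffle the group labels and recompute the statistic many times. The following is a minimal base R sketch of that idea, not the lab's required workflow (the course may expect a different package or approach). The `toy` data frame, its values, and the helper `diff_in_means()` are made up for illustration and only mimic the `sourcream` and `uniformity` columns described above.

```r
# Toy data standing in for burritos.csv (values are invented for illustration).
set.seed(3)
toy <- data.frame(
  sourcream  = rep(c("present", "none"), times = c(20, 30)),
  uniformity = round(runif(50, min = 0, max = 5), 1)
)

# Hypothetical helper: mean(present) - mean(none).
diff_in_means <- function(y, g) {
  mean(y[g == "present"]) - mean(y[g == "none"])
}

# Observed difference in the (toy) sample.
obs_diff <- diff_in_means(toy$uniformity, toy$sourcream)

# Null distribution: permuting the labels breaks any association between
# sour cream and uniformity, which is what the null hypothesis asserts.
reps <- 1000
null_dist <- replicate(
  reps,
  diff_in_means(toy$uniformity, sample(toy$sourcream))
)

# One-sided p-value for H_a: mean(present) < mean(none).
p_value <- mean(null_dist <= obs_diff)
```

The permuted differences should be centered near 0, since under the null hypothesis the labels carry no information about uniformity.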
Calculate and interpret a 90% confidence interval for the difference you investigated in Exercises 2 and 3. Use set.seed(4).
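A percentile bootstrap interval for a difference in means can be sketched in base R as follows. This is an illustration under stated assumptions: the `toy` data are invented, rows are resampled from the whole data frame with replacement (resampling within each group is another common choice), and the interval uses the percentile method.

```r
# Toy data standing in for burritos.csv (values are invented for illustration).
set.seed(4)
toy <- data.frame(
  sourcream  = rep(c("present", "none"), times = c(20, 30)),
  uniformity = round(runif(50, min = 0, max = 5), 1)
)

reps <- 1000
boot_diffs <- replicate(reps, {
  # Draw a resample of the same size as the original data, with replacement.
  idx <- sample(nrow(toy), replace = TRUE)
  resample <- toy[idx, ]
  # Statistic: mean(present) - mean(none) in the resample.
  mean(resample$uniformity[resample$sourcream == "present"]) -
    mean(resample$uniformity[resample$sourcream == "none"])
})

# 90% percentile interval: cut off 5% in each tail.
ci_90 <- quantile(boot_diffs, probs = c(0.05, 0.95))
```

Changing `probs` changes the confidence level; for example, `c(0.025, 0.975)` gives a 95% percentile interval.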
Describe precisely how the simulation is set up to construct the bootstrap distribution used to calculate the confidence interval in Exercise 4. In your description, you can imagine using index cards to represent the data. Your description should also include specifics about the size of the sample drawn at each iteration and what statistic is calculated. You can assume the number of reps for the simulation is 10,000.
Your friend suggests that having sour cream and having guacamole on a burrito are dependent events. You decide to conduct a hypothesis test to assess your friend’s claim. The hypotheses for the test are \(H_0: p_{S} = p_{NS} \text{ vs. }H_a: p_{S} \neq p_{NS}\), where \(p_{S}\) is the proportion of burritos with sour cream that have guacamole and \(p_{NS}\) is the proportion of burritos with no sour cream that have guacamole. Use set.seed(6). Visualize the simulated null distribution and shade the area corresponding to the p-value.
Calculate and interpret a 95% confidence interval for the difference investigated in Exercise 6. Use set.seed(7).
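For the test of whether sour cream and guacamole are independent, the same label-shuffling idea applies to a difference in proportions. The base R sketch below uses invented `toy` data and a hypothetical helper `diff_in_props()`. The two-sided p-value is computed by comparing absolute values, which is one common convention and may differ slightly from what a given package reports.

```r
# Toy data standing in for burritos.csv (values are invented for illustration).
set.seed(6)
toy <- data.frame(
  sourcream = rep(c("present", "none"), times = c(25, 35)),
  guac      = sample(c("present", "none"), size = 60, replace = TRUE)
)

# Hypothetical helper: p_S - p_NS, the difference in the proportion of
# burritos with guacamole between the sour cream and no sour cream groups.
diff_in_props <- function(guac, sourcream) {
  mean(guac[sourcream == "present"] == "present") -
    mean(guac[sourcream == "none"] == "present")
}

obs_diff <- diff_in_props(toy$guac, toy$sourcream)

# Null distribution under independence: shuffle the guacamole labels.
reps <- 1000
null_dist <- replicate(
  reps,
  diff_in_props(sample(toy$guac), toy$sourcream)
)

# Two-sided p-value: share of shuffled differences at least as extreme
# (in absolute value) as the observed difference.
p_value <- mean(abs(null_dist) >= abs(obs_diff))
```

A histogram of `null_dist` with the region beyond `-abs(obs_diff)` and `abs(obs_diff)` highlighted would show the area that corresponds to this p-value.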
Create a new variable for overall burrito quality by taking the average scores for all ratings (tortilla quality, temperature, meat quality, etc.) in the dataset. Is there evidence that burritos with guacamole have a different average overall perceived quality score compared to burritos without guacamole? Evaluate this claim using a 99% confidence interval. Use set.seed(8).
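Creating the overall quality variable amounts to a row-wise mean across the eight rating columns. The sketch below shows one way to do this in base R on a tiny invented data frame. The column name `overall` is an assumption, and the toy rows contain no missing values; how to treat missing ratings in the real data is an analysis decision this sketch does not make.

```r
# Three invented burritos with the eight rating variables from the lab.
toy <- data.frame(
  tortilla   = c(3, 5, 2),
  temp       = c(4, 5, 2),
  meat       = c(3, 5, 2),
  fillings   = c(4, 5, 2),
  salsa      = c(3, 5, 2),
  mfr        = c(4, 5, 2),
  uniformity = c(3, 5, 2),
  synergy    = c(4, 5, 4)
)

rating_cols <- c("tortilla", "temp", "meat", "fillings",
                 "salsa", "mfr", "uniformity", "synergy")

# Row-wise average of the ratings; `overall` is a hypothetical column name.
toy$overall <- rowMeans(toy[, rating_cols])
```

For a 99% percentile interval, the cutoffs are the 0.005 and 0.995 quantiles of the bootstrap distribution.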
In the previous exercise, we treated the rating variables as numeric variables. Evaluate the merits of this approach: is it appropriate? Could it potentially be misleading? Briefly explain.
There should only be one submission per team on Gradescope.
| Component | Points |
|---|---|
| Ex 1 | 1 |
| Ex 2 | 8 |
| Ex 3 | 6 |
| Ex 4 | 5 |
| Ex 5 | 4 |
| Ex 6 | 5 |
| Ex 7 | 5 |
| Ex 8 | 8 |
| Ex 9 | 4 |
| Workflow & formatting | 4 |