Statistical hypothesis testing is the procedure that assesses evidence provided by the data in favor of or against some claim about the population (often about a population parameter or potential associations).
Ei incumbit probatio qui dicit, non qui negat ("The burden of proof lies with the one who asserts, not the one who denies")
Start with two hypotheses about the population: the null hypothesis and the alternative hypothesis.
Choose a (representative) sample, collect data, and analyze the data.
Figure out how likely it is to see data like what we observed, IF the null hypothesis were in fact true.
If our data would have been extremely unlikely if the null claim were true, then we reject it and deem the alternative claim worthy of further study. Otherwise, we cannot reject the null claim.
One consultant tried to attract patients by noting that the average complication rate for liver donor surgeries in the US is about 10%, but her clients have only had 3 complications in the 62 liver donor surgeries she has facilitated. She claims this is strong evidence that her work meaningfully contributes to reducing complications (and therefore she should be hired!).
Is this a reasonable claim to make?
Is there sufficient evidence to suggest that her complication rate is lower than the overall US rate?
The null hypothesis (often denoted H0) states that "nothing unusual is happening" or "there is no relationship," etc.
On the other hand, the alternative hypothesis (often denoted H1 or HA) states the opposite: that there is some sort of relationship.
In statistical hypothesis testing we always first assume that the null hypothesis is true and then evaluate the weight of proof we have against this claim.
The null and alternative hypotheses are defined for parameters, not statistics.
What will our null and alternative hypotheses be for this example?
Expressed in symbols:
H0: p = 0.10 vs. H1: p < 0.10
where p is the true proportion of transplants with complications among her patients.
With these two hypotheses, we now take our sample and summarize the data.
The choice of summary statistic calculated depends on the type of data. In our example, we use the sample proportion: p̂ = 3/62 ≈ 0.048.
Next, we calculate the probability of getting data like ours, or more extreme, if H0 were in fact true.
This is a conditional probability:
Given that H0 is true (i.e., if p were actually 0.10), what would be the probability of observing p̂ = 3/62 or lower?
This probability is known as the p-value.
Let's simulate a distribution for p̂ such that the probability of complication for each patient is 0.10 for 62 patients.
This null distribution for p̂ represents the distribution of the observed proportions we might expect, if the null hypothesis were true.
When sampling from the null distribution, what is the expected proportion of complications? What would the expected count be of patients experiencing complications?
Supposing that the true proportion of complications is 10%, if we were to take repeated samples of 62 liver transplants, about 11.5% of them would have 3 or fewer complications.
That is, the p-value is 0.115.
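The simulation described above can be sketched in base R. This is a minimal sketch, not the code behind the slides: the seed, the number of repetitions (`n_sims`), and all variable names are assumptions chosen for illustration, so the estimate will differ slightly from 0.115 depending on the seed.

```r
# Simulate the null distribution of the sample proportion under H0: p = 0.10.
set.seed(1)          # assumed seed, only for reproducibility
n      <- 62         # surgeries facilitated by the consultant
p_null <- 0.10       # complication rate claimed by H0
n_sims <- 10000      # assumed number of simulated samples

# Each draw is the number of complications in one simulated sample of 62.
sim_counts <- rbinom(n_sims, size = n, prob = p_null)
sim_props  <- sim_counts / n

# Under H0 the simulated proportions are centered near 0.10
# (an expected count of about 6.2 complications per 62 patients).
mean(sim_props)

# p-value: share of simulated samples with 3 or fewer complications.
p_value <- mean(sim_counts <= 3)
p_value
```

Comparing counts (`sim_counts <= 3`) rather than proportions avoids floating-point comparisons against 3/62.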
If it is very unlikely to observe our data (or more extreme) if H0 were actually true, then that might give us enough evidence to suggest that it is actually false (and that H1 is true).
What is "small enough"?
If the p-value is less than a pre-specified significance level α (commonly α = 0.05), we say the results are statistically significant. In such a case, we would make the decision to reject the null hypothesis.
If the p-value is α or greater, we say the results are not statistically significant and we fail to reject the null hypothesis.
Importantly, we never "accept" the null hypothesis -- we performed the analysis assuming that H0 was true to begin with and assessed the probability of seeing our observed data or more extreme under this assumption.
There is insufficient evidence at α=0.05 to suggest that the consultant's complication rate is less than the US average.
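As a cross-check on this conclusion, the same one-sided question can be answered with an exact binomial test instead of a simulation. Note that this swaps in a different technique from the simulation-based approach in the text; `binom.test()` is part of base R's `stats` package, and the variable names below are illustrative.

```r
# Exact one-sided binomial test of H0: p = 0.10 against H1: p < 0.10.
alpha <- 0.05   # significance level used in the text

exact <- binom.test(x = 3, n = 62, p = 0.10, alternative = "less")
exact$p.value   # roughly 0.12, in line with the simulated p-value

# Decision rule: reject H0 only when the p-value falls below alpha.
decision <- if (exact$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

Both approaches lead to the same decision here: the p-value is above α, so we fail to reject H0.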
Your friend claims that the mean price per guest per night for Airbnbs in Asheville, NC is $100. What do you make of this statement?
Remember, the null and alternative hypotheses are defined for parameters, not statistics.
What will our null and alternative hypotheses be for this example?
Expressed in symbols:
H0: μ = 100 vs. H1: μ ≠ 100
where μ is the true population mean price per guest per night among Airbnb listings in Asheville.
With these two hypotheses, we now take our sample and summarize the data. We have a representative sample of 50 Airbnb listings in the file asheville.csv.
The choice of summary statistic calculated depends on the type of data. In our example, we use the sample mean, x̄ = 76.6.
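library(tidyverse)  # provides read_csv(), %>%, and summarize()

# Read the sample of listings and compute the mean price per guest (ppg).
asheville <- read_csv("data/asheville.csv")
asheville %>%
  summarize(mean_price = mean(ppg))

## # A tibble: 1 × 1
##   mean_price
##        <dbl>
## 1       76.6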
We know that not every representative sample of 50 Airbnb listings in Asheville will have a sample mean of exactly $76.6.
How might we deal with this variability in the sampling distribution of the mean using only the data that we have from our original sample?
We can take bootstrap samples, formed by sampling with replacement from our original dataset, of the same sample size as our original dataset.
We've captured the variability in the sample mean among samples of size 50 from Asheville area Airbnbs, but remember that in the hypothesis testing paradigm, we must assess our observed evidence under the assumption that H0 is true.
boot_means %>% summarize(mean(stat))
## # A tibble: 1 × 1
##   `mean(stat)`
##          <dbl>
## 1         76.6
Where should the bootstrap distribution of means be centered under H0?
Supposing that the true mean price per guest were $100 a night, about 0.16% of bootstrap sample means were as extreme or even more so than our originally observed sample mean price per guest of $76.6.
That is, the p-value is 0.0016.
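The bootstrap-and-shift idea can be sketched in base R as follows. This is one possible implementation under stated assumptions, not the code behind the slides: the `ppg` vector below is HYPOTHETICAL simulated data standing in for the 50 prices in asheville.csv, and the seed, `n_boot`, and the distance-based two-sided p-value are illustrative choices. Because the data are made up, the resulting p-value will not match 0.0016.

```r
# Bootstrap test of H0: mu = 100 against H1: mu != 100.
set.seed(1)  # assumed seed, only for reproducibility

# HYPOTHETICAL data: a stand-in for the ppg column of asheville.csv.
ppg <- rgamma(50, shape = 4, rate = 4 / 76.6)

mu_null  <- 100      # mean price claimed by H0
n_boot   <- 10000    # assumed number of bootstrap samples
obs_mean <- mean(ppg)

# Resample with replacement, same size as the original sample.
boot_means <- replicate(
  n_boot,
  mean(sample(ppg, size = length(ppg), replace = TRUE))
)

# Shift the bootstrap distribution so that it is centered at the null value.
null_means <- boot_means - mean(boot_means) + mu_null

# Two-sided p-value: share of shifted means at least as far from 100
# as the observed sample mean is.
p_value <- mean(abs(null_means - mu_null) >= abs(obs_mean - mu_null))
p_value
```

Shifting keeps the spread of the bootstrap distribution, which reflects sampling variability, while moving its center to where it would sit if H0 were true.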
If it is very unlikely to observe our data (or more extreme) if H0 were actually true, then that might give us enough evidence to suggest that it is actually false (and that H1 is true).
There is sufficient evidence at α=0.05 to suggest that the mean price per guest per night of Airbnb rentals in Asheville is not $100.
"A p-value of 0.05 means the null hypothesis has a probability of only 5% of being true"
"A p-value of 0.05 means there is a 95% chance or greater that the null hypothesis is incorrect"
p-values do not provide information on the probability that the null hypothesis is true given our observed data.
Remember, a p-value is calculated assuming that H0 is true. It cannot be used to tell us how likely it is that this assumption is correct. When we fail to reject the null hypothesis, we are stating that there is insufficient evidence to assert that it is false. This could be because the null hypothesis is in fact true, or because it is false but our data did not provide enough evidence to reject it.
Worse, hypothesis testing does NOT give us the tools to determine which of these two scenarios occurred.
Suppose we test a certain null hypothesis, which can be either true or false (we never know for sure!). We make one of two decisions given our data: either reject or fail to reject H0.
We have the following four scenarios:
Decision | H0 is true | H0 is false |
---|---|---|
Fail to reject H0 | Correct decision | Type II Error |
Reject H0 | Type I Error | Correct decision |
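To make the "Reject H0 when H0 is true" cell concrete, the sketch below estimates how often a Type I error occurs in the liver-donor setting when the null hypothesis really is true. It is an illustration under assumptions: the seed, `n_sims`, and the use of exact lower-tail binomial p-values (via `pbinom()`) are choices made here, not taken from the slides.

```r
# Estimate the Type I error rate when H0: p = 0.10 is actually true.
set.seed(1)          # assumed seed, only for reproducibility
alpha  <- 0.05       # significance level
n      <- 62         # sample size per simulated study
p_true <- 0.10       # H0 is true by construction
n_sims <- 10000      # assumed number of simulated studies

# Number of complications in each simulated study.
counts <- rbinom(n_sims, size = n, prob = p_true)

# Exact lower-tail p-value for each study: P(X <= observed count | H0).
p_values <- pbinom(counts, size = n, prob = p_true)

# Every rejection here is a Type I error, because H0 is true.
type1_rate <- mean(p_values < alpha)
type1_rate
```

The estimated rate should land near α, and typically a little below it, because the number of complications is discrete and the p-value cannot hit 0.05 exactly.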