Confidence Intervals and Hypothesis Testing for a Proportion

Learning Objectives:

Create confidence intervals and conduct hypothesis tests in R
Conduct a simulation experiment to explore the relationship between confidence intervals and hypothesis tests
Explore Type 1 and Type 2 errors through a simulation experiment

The Data

In 2004, the state of North Carolina released a large data set containing information on births recorded in this state. This data set is useful to researchers studying the relation between habits and practices of expectant mothers and the birth of their children. We have access to a random sample of 1000 births. The sample can be loaded with the code below:

download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")

Two variables in this dataset that are of interest to policy makers are habit, which records whether the mother was a smoker or a nonsmoker, and lowbirthweight, which records whether the baby was born with a low or a not low birthweight. Birth weight is an important predictor of infant morbidity and mortality, as well as long-term health outcomes. There is a concern in public health that mothers who smoke are more likely to have underweight babies. We can see that this concern is supported by our sample.

Quiz Question: What proportion of babies in this sample had low birth weight? Report your answer to 3 decimal places.
Quiz Question: What proportion of babies born to habit=="smoker" mothers had low birth weight? Hint: if you are stuck, see exercise 10 on Lab 2. Report your answer to 3 decimal places.
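As a rough illustration of how proportions like these can be computed, the sketch below uses a tiny made-up data frame. The name toy_births and its eight rows are hypothetical and are not taken from the nc sample; only the column names mirror the lab's variables.

```r
# Hypothetical toy data, NOT the nc sample: 8 made-up births.
toy_births <- data.frame(
  habit = c("smoker", "nonsmoker", "nonsmoker", "smoker",
            "nonsmoker", "smoker", "nonsmoker", "smoker"),
  lowbirthweight = c("low", "not low", "not low", "not low",
                     "low", "low", "not low", "not low")
)

# Overall proportion: the mean of a logical vector is the share of TRUEs.
prop_low_all <- mean(toy_births$lowbirthweight == "low")

# Proportion within one group: keep only that group's rows, then take the mean.
toy_smokers <- toy_births[toy_births$habit == "smoker", ]
prop_low_smokers <- mean(toy_smokers$lowbirthweight == "low")

prop_low_all
prop_low_smokers
```

The same pattern should carry over to the real data once the rows of interest have been selected.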

It is known that in the state of North Carolina as a whole, 10% of all babies have low birth weight. But we do not know what this proportion is for the subpopulation of mothers who smoke. For this lab, our population of interest is babies born to smoking mothers and our parameter of interest is \(p\), the population proportion of babies born to smoking mothers who have low birth weight. We will treat the 126 babies born to smoking mothers as a random sample from this population. Our goal will be to test the hypothesis that \(p\) is equal to 0.1. First we will estimate \(p\) using a confidence interval. Afterwards, we will conduct a hypothesis test and explore what the hypothesis test actually means through simulation.

For this investigation, the random sample that we are interested in includes only the babies born to smoking mothers. Create a data frame called nc_smoker that stores only the rows of the dataset where habit=="smoker". Hint: use filter.

nc_smoker <- ### Fill in here

Estimating with Confidence

In exercise 2, you computed \(\hat{p}\), the sample proportion of babies born to smoking mothers with low birth weight. We can use this sample proportion to create a 95% confidence interval for the population proportion \(p\). We can do this by hand as follows. Be sure to fill in the appropriate values for phat and \(n\) (the sample size), and make sure that you understand each step.

phat <- ### Fill in here.
n <- ### Fill in here
SE <- ### Enter plug-in standard error formula (uses phat and n)
multiplier <- ### Enter what you should multiply the SE by to get 95% confidence
lower <- phat - multiplier*SE
upper <- phat + multiplier*SE
lower 
upper

Make a 95% confidence statement about the parameter in context.
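To make the mechanics concrete, here is a minimal sketch of a normal-approximation interval computed on made-up numbers. The counts (30 successes in 200 trials) and every name ending in _toy are hypothetical and are not taken from the nc data, so the output is not the answer to the exercise above.

```r
# Hypothetical counts, NOT the lab data: 30 "successes" in 200 trials.
x_toy <- 30
n_toy <- 200

phat_toy <- x_toy / n_toy                            # sample proportion
se_toy <- sqrt(phat_toy * (1 - phat_toy) / n_toy)    # plug-in standard error
z_star <- qnorm(0.975)                               # about 1.96 for 95% confidence

ci_toy <- c(lower = phat_toy - z_star * se_toy,
            upper = phat_toy + z_star * se_toy)
ci_toy
```

A confidence statement for these toy numbers would read: we are 95% confident that the population proportion lies between the two printed values.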

Testing a Hypothesis

We start with the assumption that smoking does not increase the risk of a low birth weight baby, that is, \(p = 0.1\). We can then calculate a p-value to see if our sample \(\hat{p}=0.14\) is abnormally large relative to this assumption.

Set up a two-sided hypothesis test to test whether or not \(p=0.1\) is plausible (write down the null and the alternative).

In a world where the null is true (\(p=0.10\)), what should the sampling distribution of \(\hat{p}\) look like? Fill in the blanks in the code below to plot the sampling distribution.

null_p <- ## Fill in
null_se <- ## Fill in
ggplot(data = NULL, aes(x = seq(0,1,by=0.01))) +
        geom_blank() +
        stat_function(fun = dnorm, args = list(mean = null_p, sd = null_se))

To see how our observed \(\hat{p}\) compares to the sampling distribution, we can add it to the plot above as a red vertical line. Make the same ggplot as the previous exercise, but now add +geom_vline(xintercept=phat, col = "red") to the end of the command (make sure phat is correctly defined).
Quiz Question: Compute a p-value for this test and explain how it relates to the plot from the previous exercise. Do you have convincing evidence to suggest that the proportion of smoking mothers who have low weight babies is not equal to 0.1? For the quiz, just report the p-value and round your answer to 4 decimal places.
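The sketch below shows one way a two-sided p-value for a single proportion can be computed, again on made-up numbers. The counts and all _toy names are hypothetical, and the sketch assumes the standard error is evaluated at the null value, which is one common convention.

```r
# Hypothetical counts, NOT the lab data.
x_toy <- 30
n_toy <- 200
p0_toy <- 0.1                                   # null value of the proportion

phat_toy <- x_toy / n_toy
# Assumption: standard error computed under the null hypothesis.
null_se_toy <- sqrt(p0_toy * (1 - p0_toy) / n_toy)

# How many null standard errors the observed proportion is from p0.
z_toy <- (phat_toy - p0_toy) / null_se_toy

# Two-sided p-value: area in both tails beyond |z| under the standard normal.
pval_toy <- 2 * pnorm(-abs(z_toy))
pval_toy
```

On a plot of the null sampling distribution, this p-value corresponds to the area under the curve at least as far from the null value as the observed proportion, counted on both sides.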

Exploring with Simulation

Confidence intervals and p-values can be hard to wrap our heads around. In this section, we will perform some simulations to make sure we understand what confidence intervals and hypothesis tests are telling us. We will consider two different scenarios.

Scenario 1: The Null Hypothesis is True

Let’s create a fake population of 100,000 smoking mothers where 10% of them have low birthweight babies. The code below builds a data frame whose single column holds 100,000 0s and 1s, where 90% are 0s (normal weight baby) and 10% are 1s (underweight baby).

popsize <- 100000
pop <- data.frame(lowbirthweight = c(rep(0, 0.9*popsize), rep(1, 0.1*popsize)))
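As a quick sanity check, the sketch below rebuilds the same population, confirms its size and proportion, and draws one sample of 126 with base R's sample(). The seed and the name one_sample are arbitrary choices made for this illustration and are not part of the lab.

```r
popsize <- 100000
pop <- data.frame(lowbirthweight = c(rep(0, 0.9 * popsize), rep(1, 0.1 * popsize)))

# The population should have 100,000 rows, with a share of 1s equal to 0.1.
pop_rows <- nrow(pop)
pop_prop <- mean(pop$lowbirthweight)

# One sample of 126 mothers drawn without replacement (arbitrary seed).
set.seed(2)
one_sample <- sample(pop$lowbirthweight, size = 126)
one_phat <- mean(one_sample)

pop_rows
pop_prop
one_phat
```

Rerunning the last step with different seeds should give sample proportions that scatter around 0.1, which is the variability the simulation explores.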

Now let’s take many different samples of 126 mothers from this population. In the following code, we take 10,000 samples. For each sample, we compute a 95% confidence interval and a p-value for the hypothesis test based on that sample. We will suppose that we are conducting hypothesis tests at the \(\alpha=0.05\) significance level. We will use tools from the infer package to help us. You may need to run install.packages("infer") in your console prior to loading it.

library(dplyr)  # summarise() and the %>% pipe
library(infer)  # rep_sample_n()
set.seed(111)
p0 <- 0.1
n <- 126
true_sd <- sqrt(p0*(1-p0)/n)
results <- pop %>%
        rep_sample_n(size = 126, reps = 10000) %>%
        summarise(phat = mean(lowbirthweight), 
                  se = sqrt(phat*(1-phat)/126),
                  me = 1.96 * se,
                  lower = phat - me,
                  upper = phat + me,
                  correct_CI = (lower <= p0 && upper >= p0),
                  zscore = (phat-p0)/se,
                  pval = 2*(1-pnorm(abs(zscore))),
                  reject_null = pval <= 0.05)

The code takes our population data frame and creates 10,000 samples of size 126. It then computes several summary statistics for each sample. Explain each piece of code in the summarise statement above in words: what does each column in results represent, and what formula or concept was used to compute it?
- phat
- se
- me
- lower
- upper
- correct_CI
- zscore
- pval
- reject_null
What was the smallest sample proportion you observed? Largest?
In this case, since the true population proportion is 0.1, theory tells us that 95% of our computed confidence intervals should cover 0.1. What percent of our intervals actually covered 0.1?

Note that an interval covers 0.1 whenever \(\hat{p}\) is within 1.96 (roughly 2) standard errors of \(p\). We know that this should happen around 95% of the time.
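The coverage idea can also be checked with a small base R sketch. It swaps in rbinom() for rep_sample_n(), treating each sample as 126 independent draws with success probability 0.1, which is only an approximation to sampling without replacement from the fake population. The seed and all sim_ names are arbitrary choices for this illustration, so the counts will not match the results data frame.

```r
set.seed(1)                 # arbitrary seed, used only for reproducibility
p_true <- 0.1
sim_n <- 126
sim_reps <- 10000

# Number of low-weight babies in each simulated sample (binomial approximation).
sim_x <- rbinom(sim_reps, size = sim_n, prob = p_true)
sim_phat <- sim_x / sim_n
sim_se <- sqrt(sim_phat * (1 - sim_phat) / sim_n)   # plug-in standard error
z_star <- qnorm(0.975)

# An interval covers p_true when p_true lies between its two endpoints.
sim_covers <- (sim_phat - z_star * sim_se <= p_true) &
  (sim_phat + z_star * sim_se >= p_true)

sim_coverage <- mean(sim_covers)
sim_coverage
```

With a sample size this small, the observed coverage may fall somewhat short of the nominal 95%, because the normal approximation is only approximate here.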

Quiz Question: In the simulation experiment above, we conducted the hypothesis test for 10,000 different samples. How many times did we make a type 1 error?
Quiz Question: How many times did we make a type 2 error?

Recall our original hypothesis test with our original, real North Carolina sample. How many of our 10,000 samples in results had phats with a zscore that was as large or larger (in magnitude) than the zscore for the phat in the original sample? Relate this number to the original p-value.

Summary

In this lab, you computed a confidence interval based on a sample proportion and you also completed a test of a null hypothesis. You then explored how your confidence intervals and hypothesis tests would change from sample to sample in a scenario where the null hypothesis is true, but also in a scenario where the null hypothesis is false.

You should have seen that our sample of 126 smoking mothers did not provide sufficient evidence at the \(\alpha=0.05\) level that the proportion of smoking mothers who have underweight babies (\(p\)) differs from 0.1. However, on the homework you will see that we certainly have not proved that \(p=0.1\). Observing a sample such as ours is not that rare if \(p=0.1\), but it is also not that rare under numerous other possible values of \(p\). Hopefully this simulation convinced you of the importance of interpreting and understanding p-values.

Acknowledgements

This lab is a modified version of an OpenIntro lab.

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was written for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.