Lecture 12, Chapter 8: Inference for Proportions
Lecture 12, Sections 8.1 & 8.2

Proportions
Many statistical studies produce counts rather than measurements.
Example: Did you vote in the last election? The response would be either a "Yes" or a "No". The variable is categorical; the response is the value the variable takes on for each unit/person. If I did a survey of this class, I could accumulate the count of "Yes" responses and describe this count as a proportion of the total.
Example: What academic year are you in at Purdue? The response would be either "Freshman", "Sophomore", "Junior", or "Senior". Again, I could accumulate the count of each and describe each as a proportion of the total.

Population and Sample Proportions:
In statistical sampling we often want to estimate the proportion, $p$, of "successes" in a population. "Success" is when the categorical variable takes on one particular value.
$$p = \frac{\text{count of successes in population}}{\text{size of population}} = \frac{X}{N}$$
We take a sample from our population; our estimator is the sample proportion of successes:
$$\hat{p} = \frac{\text{count of successes in sample}}{\text{size of sample}} = \frac{X}{n}$$
Example:
1. You flip a coin 20 times and record whether a head or a tail is tossed.
In this sample, a head is recorded 11 times. What is the sample proportion of heads?

Inference for a Single Proportion:
So far we have only looked at making statistical inference on population means, a measurement of some quantitative variable of interest. Now we will look at making statistical inference on a categorical variable using the proportion of some outcome/success in a population.
Examples:
How common is it for students at Purdue to fail a class? Out of a sample of 200 students, 50 of them (25%) have failed at least one class. Based on these data, what can we say about all students at Purdue?
What proportion of golfers in the USA have made at least one hole-in-one in their lifetime? From an SRS of 50 golfers, 25 of them had made a hole-in-one. What can we say about all golfers in the USA?
In both examples above we are interested in estimating the unknown proportion $p$ from a population. The estimate of that population parameter $p$ is the sample proportion $\hat{p}$, a statistic.

Sampling Distribution of a Sample Proportion:
Choose an SRS of size $n$ from a large population with population proportion $p$ having some characteristic of interest. We normally call whatever characteristic we are studying a "success." Let $X$ be the count of successes in the sample and $\hat{p}$ be the sample proportion of successes, $\hat{p} = X/n$.
Also:
The sampling distribution of $\hat{p}$ is approximately normal for an SRS from a large population, and is closer to a normal distribution when the sample size $n$ is large.
The mean of the sampling distribution is $p$.
The standard deviation of the sampling distribution is $\sqrt{p(1-p)/n}$.

Large-Sample Confidence Interval for a Population Proportion:
Choose an SRS of size $n$ from a large population with unknown proportion $p$ of successes. The sample proportion is:
$$\hat{p} = \frac{X}{n}$$
The standard error of $\hat{p}$ is:
$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
When $n$ is large, an approximate level C confidence interval for $p$ is:
$$\hat{p} \pm z^* \, SE_{\hat{p}}$$
where $z^*$ is the value for the standard normal density curve with area C between $-z^*$ and $z^*$. The margin of error is:
$$m = z^* \, SE_{\hat{p}}$$
Use this interval when the number of successes and the number of failures are both at least 15 and the confidence level is 90%, 95%, or 99%.
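
The large-sample interval above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `proportion_ci` is hypothetical, and the critical value $z^*$ is taken from the standard library's `statistics.NormalDist` (Python 3.8 or later) rather than from a printed table. The data are the Purdue example above (50 of 200 sampled students failed a class), which meets the "at least 15 successes and 15 failures" condition.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval for a population proportion.

    Hypothetical helper for illustration; returns (p_hat, se, (low, high)).
    """
    p_hat = successes / n                       # sample proportion X/n
    se = sqrt(p_hat * (1 - p_hat) / n)          # standard error of p_hat
    # z* leaves area `level` between -z* and z* under the standard normal curve
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * se                        # margin of error m
    return p_hat, se, (p_hat - margin, p_hat + margin)


# Purdue example from the notes: 50 of 200 sampled students failed a class
p_hat, se, (low, high) = proportion_ci(50, 200, level=0.95)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Passing `level=0.90` or `level=0.99` gives the other confidence levels mentioned above; the function does not check the success/failure counts, so that condition should be verified separately.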
Example:
1. When trying to hire managers and executives, companies sometimes verify the academic credentials described by the applicants. One company that performs these checks summarized its findings for a six-month period. Of the 84 applicants whose credentials were checked, 15 lied about having a degree.
a. Give the estimate of the proportion of applicants who lied about having a degree, and give the estimate of the standard error of $\hat{p}$.
b. Consider these data to be a random sample of credentials from a large collection of similar applicants. Give a 95% confidence interval for the true proportion of applicants who lie about having a degree.

Large-Sample Significance Test for a Population Proportion:
1. State the null and alternative hypotheses.
$H_0: p = p_0$ against $H_a: p > p_0$, or $H_a: p < p_0$, or $H_a: p \neq p_0$
2. Find the test statistic.
Draw an SRS of size $n$ from a large population with unknown proportion $p$ of successes. To test the hypothesis $H_0: p = p_0$, compute the z statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$
3. Calculate the P-value.
In terms of a standard normal random variable $Z$, the approximate P-value for a test of $H_0$ against
$H_a: p > p_0$ is $P(Z \geq z)$
$H_a: p < p_0$ is $P(Z \leq z)$
$H_a: p \neq p_0$ is $2P(Z \geq |z|)$
4. State the conclusions in terms of the problem.
Choose a significance level such as $\alpha = 0.05$, then compare the P-value to the $\alpha$ level.
If P-value $\leq \alpha$, then reject $H_0$.
If P-value $> \alpha$, then fail to reject $H_0$.
Use the Large-Sample Significance Test for a Population Proportion if the expected number of successes $np_0$ and the expected number of failures $n(1-p_0)$ are both greater than 10. If this is not met, or if the population is less than 10 times as large as the sample, other procedures should be used.

Example:
1. Shereka, a starting player for a major college basketball team, made 60% of her free throws in her last three seasons. During the summer she worked hard on developing a softer shot in the hope of improving her free-throw accuracy. In the first nine games of this season Shereka made 48 free throws in 67 attempts. Let $p$ be her probability of success on each free throw she shoots.
a. State the null hypothesis that Shereka's free-throw probability has remained the same as in the last three seasons and the alternative that her work in the summer resulted in a higher probability of success.
b. Calculate the z statistic for testing $H_0$ versus $H_a$.
c. Do you reject or fail to reject $H_0$ for $\alpha = 0.05$? Find the P-value.
d. Give a 90% confidence interval for Shereka's free-throw success probability for the new season. Are you convinced that she is now a better free-throw shooter?
e. What assumptions are needed for the validity of the test and confidence interval calculations that you performed?

Sample Size for Desired Margin of Error:
The level C confidence interval for a proportion $p$ will have a margin of error approximately equal to a specified value $m$ when the sample size satisfies
$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$
Here $z^*$ is the critical value for confidence C, and $p^*$ is an estimated or guessed value for the proportion of successes in the future sample. The estimated value can either be based on a previous pilot study or be assumed to be 0.5, the value of $p^*$ that generates the largest sample size. The margin of error will be less than or equal to $m$ if $p^*$ is chosen to be 0.5. The sample size required is then given by
$$n = \left(\frac{z^*}{2m}\right)^2$$
Example:
1. You want to estimate the proportion of students at your college or university who are employed for 10 or more hours per week while classes are in session. You plan to present your results by a 95% confidence interval. Using the estimated value $p^* =$ [value missing in the source], find the sample size required if the interval is to have an approximate margin of error of $m =$ [value missing in the source].

Comparing Two Proportions
Assumptions for Comparing Two Proportions:
The data consist of two independent SRSs.
The two SRSs are large.
Typically we want to compare two proportions by giving a confidence interval for the difference, $p_1 - p_2$, or by testing the hypothesis of no difference, $H_0: p_1 = p_2$.

Confidence Intervals for Comparing Two Proportions:
Choose an SRS of size $n_1$ from a large population having proportion $p_1$ of successes; choose an SRS of size $n_2$ from another large population having proportion $p_2$ of successes. An approximate level C confidence interval for $p_1 - p_2$ is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE_D$$
where $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$ are the estimates of the population proportions; the standard error of the difference is
$$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
and $z^*$ is the value for the standard normal density curve with area C between $-z^*$ and $z^*$. The margin of error is:
$$m = z^* \, SE_D$$
Use this method when the number of successes and the number of failures in both samples are at least 10 and the confidence level is 90%, 95%, or 99%.

Example:
1. Is lying about credentials by job applicants changing? From a previous example, one company that performs these checks summarized its findings for the first six-month period: of the 84 applicants whose credentials were checked, 15 lied about having a degree. The company performed the same checks for a second period; the results for both are shown below.

Period    n      X (lied)
1         84     15
2         106    35

a) Find the 95% confidence interval for the difference of the proportion of applicants who lied about having a degree for the two periods.
b) Based on this confidence interval, what can you say about the change over time of the proportion of applicants that lied on their application for the two periods?

Significance Tests for Comparing Two Proportions:
1. State the null and alternative hypotheses.
$H_0: p_1 = p_2$ against $H_a: p_1 > p_2$, or $H_a: p_1 < p_2$, or $H_a: p_1 \neq p_2$
2. Find the test statistic.
Draw an SRS of size $n_1$ from a large population with unknown proportion $p_1$ of successes, and an independent SRS of size $n_2$ from another large population with unknown proportion $p_2$ of successes. To test the hypothesis $H_0: p_1 = p_2$, compute the z statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$$
where $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$, and where the pooled standard error is
$$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
and where the pooled estimate of the common proportion is
$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
3. Calculate the P-value.
In terms of a standard normal random variable $Z$, the approximate P-value for a test of $H_0$ against
$H_a: p_1 > p_2$ is $P(Z \geq z)$
$H_a: p_1 < p_2$ is $P(Z \leq z)$
$H_a: p_1 \neq p_2$ is $2P(Z \geq |z|)$
4. State the conclusions in terms of the problem.
Choose a significance level such as $\alpha = 0.05$, then compare the P-value to the $\alpha$ level.
If P-value $\leq \alpha$, then reject $H_0$.
If P-value $> \alpha$, then fail to reject $H_0$.

Examples:
1. Data on the proportion of applicants who lied about having a degree in two consecutive six-month periods are given in the previous example as:

Period    n      X (lied)
1         84     15
2         106    35

a. Formulate appropriate null and alternative hypotheses that can be addressed with these data, carry out the significance test, and summarize the results.
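
As a rough check on hand calculations for the two-period data above, the two-sample interval and the pooled z test can be sketched in Python. This is a sketch under stated assumptions, not the notes' own solution: the function names `two_proportion_ci` and `two_proportion_z_test` are hypothetical, the test is run against the two-sided alternative $H_a: p_1 \neq p_2$ (the notes leave the choice of alternative to the reader), and normal probabilities come from the standard library's `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se_d = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z_star * se_d, diff + z_star * se_d


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 and its two-sided P-value.

    The two-sided alternative is an assumption made for this illustration.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)              # pooled estimate of the common p
    se_dp = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_dp
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Data from the table above: period 1 (n=84, X=15), period 2 (n=106, X=35)
ci_low, ci_high = two_proportion_ci(15, 84, 35, 106, level=0.95)
z, p_value = two_proportion_z_test(15, 84, 35, 106)
print(f"95% CI for p1 - p2: ({ci_low:.3f}, {ci_high:.3f})")
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

For a one-sided alternative, the P-value would instead be `NormalDist().cdf(z)` for $H_a: p_1 < p_2$ or `1 - NormalDist().cdf(z)` for $H_a: p_1 > p_2$. The interval uses the unpooled standard error $SE_D$ while the test uses the pooled $SE_{Dp}$, matching the two formulas given in the notes.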