Using one of the Data Tools and Apps from the United States Census Bureau, choose some interesting census data and perform a statistical analysis on it to answer a question of your own. Do not just copy and paste or summarize the data that you find. Use an appropriate statistical tool, and explain why you chose that tool as well as what your statistical analysis found. Use this week’s lecture to aid your analysis. Cite the data source that you chose in your post, and document it in APA style as outlined in the Ashford Writing Center.

Lecture Aid

**Week 5: Statistical analysis**

Most people view statistics as a way to present the cold, hard objective facts about data. For example, if a poll says that 60% of Americans approve of the President, that means the majority of Americans like him, right? Not necessarily! The most talented statisticians know how easy it is to lie with statistics. Our job as researchers is to apply statistics carefully to avoid as many lies as possible.

Those of you who like baseball know that baseball is a game of statistics. If St. Louis Cardinals fans say, “The Cardinals have won eight of their last nine games,” they make their favorite team look like a currently successful team. But, if Chicago Cubs fans say, “The Cardinals have won eight of their last nine games, but they’ve won each of those games by only one run,” then the Cardinals don’t sound as impressive anymore. Perhaps the Cubs fans are more truthful, but neither group of fans is technically wrong.

Statistical tests such as the ones described in this week’s textbook chapter require many things to be in place in order for them to be accurate. In most cases, the larger the dataset, the more accurate the test results will be. A chi-square test run on a sample of 50 will not be as accurate as one run on a sample of 5,000, and you run the risk of error: a small sample might indicate that there is no statistically significant difference between the two groups, when in fact there is a difference.
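To make the sample-size point concrete, here is a minimal sketch that computes a chi-square statistic by hand for two made-up tables. The counts are hypothetical (they are not from the lecture or from Census data), and `chi_square_2x2` is an illustrative helper, not a function from Excel or SPSS. Both tables have the same 40% versus 60% split; only the sample size differs.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    No continuity correction is applied; this is a teaching sketch.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the groups did not differ
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Cutoff for 1 degree of freedom at the 0.05 significance level
CRITICAL_VALUE = 3.841

# Hypothetical counts: rows are two groups, columns are "yes" / "no"
small_sample = [[10, 15], [15, 10]]          # n = 50
large_sample = [[1000, 1500], [1500, 1000]]  # n = 5,000, same percentages

for label, table in (("n = 50", small_sample), ("n = 5,000", large_sample)):
    stat = chi_square_2x2(table)
    verdict = "significant" if stat > CRITICAL_VALUE else "not significant"
    print(f"{label}: chi-square = {stat:.2f} ({verdict})")
```

With these made-up counts, the small sample falls short of the cutoff while the large sample clears it easily, even though the percentages in the two groups are identical.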

Many popular tests assume that your data is normally distributed, but some data is skewed. If your data is skewed, most common tests will not give you accurate results. Survey data is often skewed. For example, let’s assume you’re analyzing final exam scores. In a normal distribution, most students would earn a C, because it’s considered “average.” Fewer students would earn Bs and Ds, and even fewer students would earn As and Fs. However, in the exam scores you’re analyzing, most students earned As; this means your data is skewed. Perhaps the exam was too easy, or most of the students were really smart. This is an oversimplified example, and without getting into the math or the theories behind it, the point here is to understand that you need to use data that’s appropriate for the chosen test.
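One rough way to check for skew before choosing a test is to compute a skewness coefficient. The sketch below assumes grades are coded as grade points (F = 0 through A = 4) and uses two invented classes; the `skewness` helper is illustrative and implements the simple moment-based coefficient, which is only one of several definitions in use.

```python
def skewness(values):
    """Moment-based skewness: about 0 for symmetric data,
    negative when the long tail is on the low side."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((x - mean) ** 2 for x in values) / n
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical classes of 20 students, grades coded F=0, D=1, C=2, B=3, A=4
bell_shaped = [0] * 1 + [1] * 4 + [2] * 10 + [3] * 4 + [4] * 1
mostly_as = [0] * 1 + [1] * 1 + [2] * 2 + [3] * 4 + [4] * 12

print(f"bell-shaped class: skewness = {skewness(bell_shaped):.2f}")
print(f"mostly-A class:    skewness = {skewness(mostly_as):.2f}")
```

The bell-shaped class comes out at zero, while the class where most students earned As comes out clearly negative, which is a warning sign for tests that assume a normal distribution.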

Your sample can also skew results. Let’s say that you buy several types of candy: hard candies, Tootsie Rolls, Hershey’s Kisses, etc. Then you count the total number of pieces as well as the number of each type of candy. You put the candy into a bag, and pass the bag around a room of people, inviting them to each take a handful of candy. You ask them to write down the total number of pieces they took, as well as the number of each type, before they eat any of it.

After they eat their candy, you ask them what biased them when they took their sample: Are you on a diet? Do you like Kisses more than Tootsie Rolls? You could then compare the total numbers of candy (the population) to the candy each person chose (their sample) to see whether their sample turned out skewed. In other words, was their sample representative of the overall population, or was it skewed with too many or too few of certain candy types? You could perform a number of statistical tests on this candy data, and the room of people might be eager to help you, since they’d all be on a sugar high!
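One test that fits the candy example is a chi-square goodness-of-fit test, which compares one person's handful with what the bag's mix would predict. The sketch below uses invented counts and an illustrative helper named `goodness_of_fit`; it also treats the handful as a simple random draw, which is an approximation because the candy is taken without being put back.

```python
def goodness_of_fit(sample_counts, population_counts):
    """Chi-square goodness-of-fit statistic for one handful
    against the proportions in the whole bag."""
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    statistic = 0.0
    for kind, population_count in population_counts.items():
        # Pieces of this type we would expect in a representative handful
        expected = sample_total * population_count / population_total
        observed = sample_counts.get(kind, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Cutoff for 2 degrees of freedom (3 candy types - 1) at the 0.05 level
CRITICAL_VALUE = 5.991

# Hypothetical counts, not taken from the lecture
bag = {"hard candy": 40, "Tootsie Rolls": 30, "Kisses": 30}
handful = {"hard candy": 3, "Tootsie Rolls": 5, "Kisses": 12}

stat = goodness_of_fit(handful, bag)
verdict = "skewed" if stat > CRITICAL_VALUE else "consistent with the bag"
print(f"chi-square = {stat:.2f}, so this handful looks {verdict}")
```

In this invented case the handful has far more Kisses than the bag's mix predicts, so the statistic exceeds the cutoff. A handful of 20 pieces is small, so the earlier caution about sample size applies here too.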

Choosing the right test for the job is one of the biggest challenges for people new to statistics; we can use Excel, SPSS, or other statistical software to actually calculate the statistic, but if you’re not using the right test, the results will be meaningless. Given your data, should you use a chi-square or a t-test? It’s your job to decide that as the researcher.
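As a starting point for that decision, the sketch below encodes one common rule of thumb: a chi-square test suits counts of categories, and a t-test suits comparing the average of a numeric measurement between two groups. The function name `suggest_test` and its labels are hypothetical, and the rule is deliberately simplified; it does not check sample size or distribution shape.

```python
def suggest_test(outcome_kind, group_count):
    """Very rough rule of thumb covering only the two tests named above.

    outcome_kind: "categorical" (counts of categories) or "numeric" (measurements)
    group_count:  number of groups being compared
    """
    if outcome_kind == "categorical":
        return "chi-square"
    if outcome_kind == "numeric" and group_count == 2:
        return "t-test"
    return "neither; check the textbook chapter for other tests"


# Hypothetical research questions
print(suggest_test("categorical", 2))  # e.g., home ownership (yes/no) by region
print(suggest_test("numeric", 2))      # e.g., average commute time in two states
print(suggest_test("numeric", 5))      # more than two groups needs a different test
```

The helper only narrows the choice; you still need to confirm that your data meets the assumptions of whichever test it points to.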

**For viewing**

Foster, P. (n.d.). Tutorial videos for business statistics [Video playlist]. Retrieved from https://www.youtube.com/playlist?list=PL09B919EDBD83A95A