Statistics has two main branches that are related but distinct: descriptive statistics and inferential statistics.
Descriptive statistics describes the characteristics of a set of measurements. It consists of the procedures and methods used to organize and summarize raw data. To organize the raw data they collect, most statisticians rely on tables, charts, graphs, and standard measures such as averages, percentiles, and measures of variation.
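As a minimal sketch, the standard measures named above can be computed with Python's built-in `statistics` module. The data set (runs scored by one team in ten games) is invented purely for illustration.

```python
import statistics

# Hypothetical raw data: runs scored by one team in ten games (illustrative only).
runs = [3, 5, 2, 7, 4, 4, 6, 1, 5, 3]

summary = {
    "mean": statistics.mean(runs),        # the ordinary average
    "median": statistics.median(runs),    # middle value of the sorted data
    # quantiles(n=4) returns the three cut points that split the data into quartiles.
    "quartiles": statistics.quantiles(runs, n=4),
    "stdev": statistics.stdev(runs),      # sample standard deviation, a measure of variation
    "range": max(runs) - min(runs),       # simplest measure of variation
}

for name, value in summary.items():
    print(name, value)
```

Different tools use slightly different percentile conventions, so quartile values from a spreadsheet may not match these exactly.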
Descriptive statistics are used throughout a baseball season. Baseball statisticians spend a great deal of time and effort summarizing and categorizing raw data to produce factual statements about the season. For example, more than 600 games were played in the American League in 1948. To determine who had the best batting average that season, you would need to take the official score sheet for each game, list each batter, record the result of each of his times at bat, and total his hits and at-bats; dividing hits by at-bats gives the batting average. In 1948 the American League player with the highest batting average was Ted Williams. If you wanted to know who the top 25 hitters of the season were, however, the calculations would become considerably more involved.
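The batting-average procedure described above can be sketched in Python, assuming each score-sheet entry has already been reduced to a (batter, outcome) pair for one official at-bat. The player names and results are hypothetical, not 1948 data, and the helpers `batting_averages` and `top_n` are names invented for this example.

```python
from collections import defaultdict

# Hypothetical at-bat results pulled from score sheets: (batter, outcome).
# Names and outcomes are invented for illustration; they are not real 1948 data.
at_bats = [
    ("Player A", "hit"), ("Player A", "out"), ("Player A", "hit"),
    ("Player B", "out"), ("Player B", "hit"), ("Player B", "out"),
    ("Player C", "hit"), ("Player C", "hit"), ("Player C", "out"), ("Player C", "hit"),
]


def batting_averages(records):
    """Total hits and at-bats per batter, then divide hits by at-bats."""
    hits = defaultdict(int)
    attempts = defaultdict(int)
    for batter, outcome in records:
        attempts[batter] += 1
        if outcome == "hit":
            hits[batter] += 1
    return {batter: hits[batter] / attempts[batter] for batter in attempts}


def top_n(averages, n):
    """Rank batters by average, highest first, and keep the first n."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


averages = batting_averages(at_bats)
for batter, avg in top_n(averages, 2):
    print(f"{batter}: {avg:.3f}")
```

Extending the list from a single leader to a top 25 only means a larger `n`; in practice the harder work is collecting and checking the raw records. Real leaderboards also require a minimum number of plate appearances, which this sketch ignores.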
Statistical software, along with the many statistical functions built into spreadsheet programs such as Excel, means that ever more complicated and detailed information can be collected, formatted, and presented with only a few clicks of the mouse.
The imaginary games and sports events that computer software can generate are essentially built by collecting massive amounts of data and correlating it so that like activities can be compared.
Inferential statistics is the process of drawing conclusions about a group, and measuring how trustworthy those conclusions are, based on data obtained from a sample of that group. Political polling is an excellent example of inferential statistics. To determine who is likely to win a presidential election, pollsters typically ask a sample of a few thousand carefully chosen Americans how they will vote. From their answers, statisticians are able to predict, or infer, how the general population will vote.
The two keys to inferential statistics are choosing which members of the general population will be polled and deciding which questions to ask. In the example above, with a choice of two candidates, suppose the sample population is asked: “Will you vote for Candidate X in the next election?” Each answer will be “yes,” “no,” or “undecided.” If descriptive statistics show that 51% of the sample group plan to vote for Candidate X, inferential statistics lets you infer that Candidate X is likely to win the election.
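How much trust a 51% result deserves depends on the sample size. A common textbook tool is the normal-approximation (Wald) confidence interval for a proportion, sketched below assuming a simple random sample of 2,000 respondents with undecided answers set aside; both assumptions are hypothetical, and the helper name `proportion_interval` is illustrative.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a sample proportion.

    z=1.96 corresponds to roughly 95% confidence.
    Returns (lower bound, upper bound, margin of error).
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# 51% support among a hypothetical simple random sample of 2,000 respondents.
low, high, margin = proportion_interval(0.51, 2000)
print(f"margin of error: ±{margin:.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
print("interval excludes 50%:", low > 0.5)
```

Under these assumptions the margin of error is about ±2.2 percentage points, so the interval runs from roughly 48.8% to 53.2% and still includes 50%. A lead that narrow is why pollsters report a margin of error alongside the headline number.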
However, sampling techniques have sometimes produced incorrect inferences. A classic example is the 1948 presidential election. A poll by the Gallup Organization predicted that President Harry Truman would win only about 45% of the vote and lose to Republican challenger Thomas Dewey. In fact, Truman won more than 49% of the popular vote and, of course, the election. The miss led Gallup to change some of its sampling techniques, and its presidential forecasts have generally been more accurate since, although not every call has been correct.
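The effect of a poor sampling technique can be illustrated with a small simulation. The electorate, the two groups, and their support rates below are all invented, and the biased sample (drawn from only one group) is a deliberately exaggerated stand-in, not a model of what actually went wrong in 1948.

```python
import random

random.seed(1948)  # fixed seed so the run is repeatable

# Hypothetical electorate of 100,000 voters in two invented groups.
# Each voter is (group, supports_x); group_a leans toward Candidate X, group_b away.
population = (
    [("group_a", random.random() < 0.60) for _ in range(40_000)]
    + [("group_b", random.random() < 0.42) for _ in range(60_000)]
)


def share_for_x(voters):
    """Fraction of the given voters who support Candidate X."""
    return sum(1 for _, supports_x in voters if supports_x) / len(voters)


true_share = share_for_x(population)

# Simple random sample: every voter has the same chance of being chosen.
random_sample = random.sample(population, 2_000)

# Biased sample: drawn only from group_a, as if interviewers could reach
# only one kind of voter.
group_a = [voter for voter in population if voter[0] == "group_a"]
biased_sample = random.sample(group_a, 2_000)

print(f"true share:    {true_share:.3f}")
print(f"random sample: {share_for_x(random_sample):.3f}")
print(f"biased sample: {share_for_x(biased_sample):.3f}")
```

The random sample should land close to the true share (about 49% here), while the biased sample overstates support for Candidate X by roughly ten points. Making the biased sample larger does not fix this, because the error comes from who is sampled rather than how many.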