All Terrain Thinking

A Compendium of things I think are Important

"If you teach a man to think he is thinking, he will love you. If you teach a man to think, he will hate you." - Ed McArthur

Statistics: It's not just what's in your wallet

Descriptive Statistics

Descriptive statistics can best be viewed as a bag of tricks for presenting the essential information in a data set so that the reader can readily interpret it. The bag includes a number of pictures or graphs (histograms, stem plots, time plots, box plots), which give us a general picture of the data. It also includes a number of formulas or indices (mean, median, quartiles, standard deviation), each of which describes one facet of the data (center, spread, or shape) with a single number. We will end our coverage of descriptive statistics with an idealized description of an important, regular pattern, or distribution, found in real data: the normal distribution.

To see what we are talking about, let's use something you are interested in: grades. You probably remember wanting to know the results after each exam and what they would mean for your grade in the course. Now let's look at grades from the instructor's side. Below is the complete data set for the first test in a recent course. The 52 students in the class received the following scores on the exam.

Grades on Exam 1

Student  Grade     Student  Grade     Student  Grade     Student  Grade
1        91.983    14       97.531    27       96.8659   40       89.0001
2        91.7597   15       98.2979   28       76.2859   41       76.5876
3        87.9158   16       70.4242   29       99.6804   42       93.3512
4        77.0586   17       72.6251   30       87.6299   43       88.82
5        98.7479   18       86.9584   31       89.6395   44       77.4919
6        79.8029   19       95.2241   32       85.6969   45       94.7336
7        80.5968   20       91.9544   33       95.2098   46       75.4139
8        77.8953   21       80.2882   34       71.9719   47       86.5368
9        96.1051   22       77.2291   35       92.3448   48       93.7865
10       74.1581   23       93.1482   36       74.4269   49       73.8672
11       83.152    24       75.1727   37       82.7137   50       75.7028
12       91.7678   25       87.5282   38       77.8714   51       73.415
13       78.1368   26       80.501    39       71.828    52       74.7755

This complete data set may be useful for some purposes, but I suspect you were not interested in studying the results for the entire class. I also suspect that if you looked at the table for 30 seconds and were then asked to describe what you saw, you would have considerable difficulty capturing the essential features of the data set. This is why we have descriptive statistics: they let us summarize the data set with a few pictures and numbers. We generally cannot reproduce the exact data from the summary statistics, but they do give us insight into the underlying data.

As a start, I suggest that we transform the data from a table into a graph. To build the graph, we first round each grade to the nearest whole number. We then sort the rounded grades and count how many tests received each score, which gives us the following table.

Test Scores (grades rounded to the nearest whole number)

Score   # of Tests
70      1
71      0
72      2
73      2
74      3
75      3
76      2
77      4
78      3
79      0
80      2
81      2
82      0
83      2
84      0
85      0
86      1
87      2
88      3
89      2
90      1
91      0
92      5
93      2
94      1
95      3
96      1
97      1
98      2
99      1
100     1
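The rounding-and-counting step can be sketched in a few lines of Python. This is a minimal illustration that assumes the 52 grades are entered in student-number order. The names `GRADES` and `frequency_table` are hypothetical and do not come from the original. Scores that no student received are reported as 0, matching the table above.

```python
from collections import Counter

# Exam 1 grades, entered in student-number order from the "Grades on Exam 1" table.
GRADES = [
    91.983, 91.7597, 87.9158, 77.0586, 98.7479, 79.8029, 80.5968,
    77.8953, 96.1051, 74.1581, 83.152, 91.7678, 78.1368,           # students 1-13
    97.531, 98.2979, 70.4242, 72.6251, 86.9584, 95.2241, 91.9544,
    80.2882, 77.2291, 93.1482, 75.1727, 87.5282, 80.501,           # students 14-26
    96.8659, 76.2859, 99.6804, 87.6299, 89.6395, 85.6969, 95.2098,
    71.9719, 92.3448, 74.4269, 82.7137, 77.8714, 71.828,           # students 27-39
    89.0001, 76.5876, 93.3512, 88.82, 77.4919, 94.7336, 75.4139,
    86.5368, 93.7865, 73.8672, 75.7028, 73.415, 74.7755,           # students 40-52
]


def frequency_table(grades):
    """Round each grade to the nearest whole number and count each score.

    Returns a dict that maps every integer score from the lowest to the
    highest rounded grade to the number of tests with that score (0 if none).
    """
    # round() with one argument returns an int. Exact .5 ties would go to the
    # even integer, but none of these grades sit exactly on a .5 boundary.
    counts = Counter(round(g) for g in grades)
    low, high = min(counts), max(counts)
    return {score: counts.get(score, 0) for score in range(low, high + 1)}


if __name__ == "__main__":
    table = frequency_table(GRADES)
    print("Score  # of Tests")
    for score, n in table.items():
        print(f"{score:>5}  {n}")
```

Filling in the empty scores (71, 79, 82, and so on) keeps the horizontal axis evenly spaced when the table is later drawn as a graph.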

If we take these data and draw a column graph, with score on the horizontal axis and number of tests on the vertical axis, we get the following diagram, which could be called a histogram or a frequency distribution. We can see that four students received a 77 and five students received a 92.
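A column graph like this can be approximated in plain text. The following Python sketch rotates the graph so that scores run down the page, and each `*` stands for one test. The counts are copied from the Test Scores table. `FREQ` and `text_histogram` are hypothetical names chosen for illustration.

```python
# Number of tests at each rounded score, copied from the "Test Scores" table.
FREQ = {
    70: 1, 71: 0, 72: 2, 73: 2, 74: 3, 75: 3, 76: 2, 77: 4,
    78: 3, 79: 0, 80: 2, 81: 2, 82: 0, 83: 2, 84: 0, 85: 0,
    86: 1, 87: 2, 88: 3, 89: 2, 90: 1, 91: 0, 92: 5, 93: 2,
    94: 1, 95: 3, 96: 1, 97: 1, 98: 2, 99: 1, 100: 1,
}


def text_histogram(freq, mark="*"):
    """Return one line per score: the score, a bar, and one mark per test."""
    return [f"{score:>3} | {mark * count}" for score, count in sorted(freq.items())]


if __name__ == "__main__":
    print("\n".join(text_histogram(FREQ)))
```

The longest bar belongs to 92 (five marks), and the next longest belongs to 77 (four marks), the same observations made from the column graph.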

We could also create a second graph that looks exactly like the first, except that it plots the relative frequency (the fraction of all tests receiving each score) instead of the count. For example, a score of 92 was received by approximately 10 percent of the students (5 of 52, or about 9.6 percent).
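The relative frequency is each count divided by the total number of tests. The Python sketch below computes it from the Test Scores counts. The function name `relative_frequencies` is hypothetical.

```python
# Number of tests at each rounded score, copied from the "Test Scores" table.
FREQ = {
    70: 1, 71: 0, 72: 2, 73: 2, 74: 3, 75: 3, 76: 2, 77: 4,
    78: 3, 79: 0, 80: 2, 81: 2, 82: 0, 83: 2, 84: 0, 85: 0,
    86: 1, 87: 2, 88: 3, 89: 2, 90: 1, 91: 0, 92: 5, 93: 2,
    94: 1, 95: 3, 96: 1, 97: 1, 98: 2, 99: 1, 100: 1,
}


def relative_frequencies(freq):
    """Divide each count by the total number of tests, so the values sum to 1."""
    total = sum(freq.values())
    return {score: count / total for score, count in freq.items()}


if __name__ == "__main__":
    rel = relative_frequencies(FREQ)
    total = sum(FREQ.values())
    for score in (77, 92):
        # For 92 this prints "92: 5 of 52 = 9.6%".
        print(f"{score}: {FREQ[score]} of {total} = {rel[score]:.1%}")
```

Every bar is scaled by the same factor (1/52), so the relative-frequency graph has exactly the same shape as the count graph. Only the labels on the vertical axis change.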

Once you are comfortable looking at the data set graphically, we can examine specific features of the distribution of scores. Generally there are four features of a distribution we are concerned with: modality, symmetry, central tendency, and variability. Of the four, modality and symmetry are the easiest to visualize. Any value at which the frequency curve or relative frequency curve reaches a peak is called a mode. Most distributions in practice have one peak and are described as "unimodal". A distribution with two peaks is called "bimodal". The exam distribution above has several peaks (for example, at 77, 88, 92, and 95), so it is not easily characterized by its mode.
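"Reaches a peak" can be made concrete in more than one way. The Python sketch below assumes one simple rule: a score is a peak if its count is strictly greater than the counts at the scores immediately below and above it, with scores outside the table counted as 0. Under this rule, flat-topped peaks such as 74-75 (3 and 3) are not reported. The sketch also reports the single most frequent score. The names `global_modes` and `local_peaks` are hypothetical.

```python
# Number of tests at each rounded score, copied from the "Test Scores" table.
FREQ = {
    70: 1, 71: 0, 72: 2, 73: 2, 74: 3, 75: 3, 76: 2, 77: 4,
    78: 3, 79: 0, 80: 2, 81: 2, 82: 0, 83: 2, 84: 0, 85: 0,
    86: 1, 87: 2, 88: 3, 89: 2, 90: 1, 91: 0, 92: 5, 93: 2,
    94: 1, 95: 3, 96: 1, 97: 1, 98: 2, 99: 1, 100: 1,
}


def global_modes(freq):
    """Return the score(s) with the highest count."""
    top = max(freq.values())
    return [score for score, count in sorted(freq.items()) if count == top]


def local_peaks(freq):
    """Return scores whose count strictly exceeds both neighbouring counts.

    A neighbour outside the table (below the lowest or above the highest
    score) is treated as having a count of 0.
    """
    peaks = []
    for score in sorted(freq):
        count = freq[score]
        if count > freq.get(score - 1, 0) and count > freq.get(score + 1, 0):
            peaks.append(score)
    return peaks


if __name__ == "__main__":
    print("most frequent score(s):", global_modes(FREQ))
    print("local peaks:", local_peaks(FREQ))
```

The single most frequent score is 92, but the strict rule finds seven separate peaks, which is why the mode alone says little about this distribution. A different peak rule (for example, one that accepts plateaus) would report a somewhat different list.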

A distribution is said to be symmetric if the relative frequency (or probability) is the same at equal distances on either side of its center, m. Mathematically, the distribution of X - m is then the same as the distribution of m - X. The mean and median, concepts we will discuss in the next section, are equal in a symmetric distribution. An asymmetric frequency distribution is skewed to the left if the lower tail is longer than the upper tail, and skewed to the right if the upper tail is longer than the lower tail.
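The paragraph above gives two informal signs of symmetry: the mean equals the median, and the two tails have similar lengths. The Python sketch below checks both for the raw exam grades. It uses the standard `statistics` module. It measures each tail crudely, as the distance from the median to the minimum and to the maximum. That tail measure is an assumption made here for illustration, not a formal test of skewness. The name `symmetry_summary` is hypothetical.

```python
import statistics

# Exam 1 grades, entered in student-number order from the "Grades on Exam 1" table.
GRADES = [
    91.983, 91.7597, 87.9158, 77.0586, 98.7479, 79.8029, 80.5968,
    77.8953, 96.1051, 74.1581, 83.152, 91.7678, 78.1368,           # students 1-13
    97.531, 98.2979, 70.4242, 72.6251, 86.9584, 95.2241, 91.9544,
    80.2882, 77.2291, 93.1482, 75.1727, 87.5282, 80.501,           # students 14-26
    96.8659, 76.2859, 99.6804, 87.6299, 89.6395, 85.6969, 95.2098,
    71.9719, 92.3448, 74.4269, 82.7137, 77.8714, 71.828,           # students 27-39
    89.0001, 76.5876, 93.3512, 88.82, 77.4919, 94.7336, 75.4139,
    86.5368, 93.7865, 73.8672, 75.7028, 73.415, 74.7755,           # students 40-52
]


def symmetry_summary(data):
    """Compare mean with median and the lower tail length with the upper one."""
    center = statistics.median(data)
    return {
        "mean": statistics.mean(data),
        "median": center,
        # Crude tail lengths: distance from the median to each extreme value.
        "lower_tail": center - min(data),
        "upper_tail": max(data) - center,
    }


if __name__ == "__main__":
    for name, value in symmetry_summary(GRADES).items():
        print(f"{name:>10}: {value:.3f}")
```

For these grades the mean (about 84.45) and the median (about 84.42) are very close. The upper tail (about 15.3 points) is only slightly longer than the lower tail (about 14.0 points). By these rough measures the distribution is close to symmetric, with at most a slight lean to the right.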

Now it is time to turn our attention to the two features of data sets like this one that have attracted the most interest: central tendency and variability.

Copyright © 1998-2012 eMcArthur unless otherwise indicated
Unauthorized duplication or publication of any materials from this Site is expressly prohibited.