Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called...

27
39 PART II Σigma Freud and Descriptive Statistics 02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 39

Transcript of Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called...

Page 1: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

39

PART II

Σigma Freudand DescriptiveStatistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 39

Page 2: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

One of the things that Sigmund Freud, the founder ofpsychoanalysis, did quite well was to observe and describethe nature of his patients’ conditions. He was an astute

observer and used his skills to develop what was the first systematicand comprehensive theory of personality. Regardless of what youmay think about the validity of his ideas, he was a good scientist.

Back in the early 20th century, courses in statistics (like theone you are taking) were not offered as part of undergraduate orgraduate curricula. The field was relatively new, and the nature ofscientific explorations did not demand the precision that this setof tools brings to the scientific arena.

But things have changed. Now, in almost any endeavor, numberscount (as Francis Galton, the inventor of correlation and a firstcousin to Charles Darwin said as well). This section of Statistics forPeople Who (Think They) Hate Statistics . . . The Excel Edition isdevoted to how we can use Excel’s most basic statistical functionsto describe an outcome and better understand it.

Chapter 2 discusses measures of central tendency and how com-puting one of several different types of averages gives you the onebest data point that represents a set of scores. Chapter 3 completesthe coverage of tools we need to fully describe a set of data pointsin its discussion of variability, including the standard deviation andvariance. When you get to Chapter 4, you will be ready to learnhow distributions, or sets of scores, differ from one another andwhat this difference means. Chapter 5 deals with the nature ofrelationships between variables, namely, correlations.

When you finish Part II, you’ll be in excellent shape to startunderstanding the role that probability and inference play in thesocial and behavioral sciences.

40—— Statistics for People Who (Think They) Hate Statistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 40

Page 3: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

41

Computing andUnderstanding Averages

Means to an End

Difficulty Scale ☺☺☺☺ (moderately easy)

How much Excel? (a ton)

What you’ll learn about in this chapter

• Understanding measures of central tendency• Computing the mean for a set of scores using the AVERAGE

function• Computing the mode using the MODE function• Computing the median for a set of scores using the MEDIAN

function• Using the Data Analysis ToolPak to compute descriptive

statistics• Selecting a measure of central tendency

You’ve been very patient, and now it’s finally time to getstarted working with some real, live data. That’s exactlywhat you’ll do in this chapter. Once data are collected, a

usual first step is to organize the information using simple indexesto describe the data. The easiest way to do this is through comput-ing an average, of which there are several different types.

An average is the one value that best represents an entire groupof scores. It doesn’t matter whether the group of scores is thenumber correct on a spelling test for 30 fifth graders or the bat-ting percentage of each of the New York Yankees or the numberof people who registered as Democrats or Republicans in the mostrecent election. In all of these examples, groups of data can be

2

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 41

Page 4: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

summarized using an average. Averages, also called measures ofcentral tendency, come in three flavors: the mean, the median, andthe mode. Each provides you with a different type of informationabout a distribution of scores and is simple to compute and interpret.

COMPUTING THE MEAN

The mean is the most common type of average that is computed.It is simply the sum of all the values in a group, divided by thenumber of values in that group. So if you had the spelling scoresfor 30 fifth graders, you would simply add up all the scores and geta total and then divide by the number of students, which is 30.

The formula for computing the mean is shown in Formula 2.1.

(2.1)

where

• The letter X with a line above it (also called “X bar”) is themean value of the group of scores or the mean.

• The Σ, or the Greek letter sigma, is the summation sign,which tells you to add together whatever follows it.

• The X is each individual score in the group of scores.

• Finally, the n is the size of the sample from which you arecomputing the mean.

To compute the mean, follow these steps:

1. List the entire set of values in one or more columns. These are allthe Xs.

2. Compute the sum or total of all the values.

3. Divide the total or sum by the number of values.

For example, if you needed to compute the average number ofshoppers at three different locations, you would compute a meanfor that value.

X--- = �X

n

42—— Part II ♦ Σigma Freud and Descriptive Statistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 42

Page 5: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Location Number of Annual Customers

Lanham Park store 2,150Williamsburg store 1,534Downtown store 3,564

The mean or average number of shoppers in each store is 2,416.Formula 2.2 shows how it was computed using the formula yousaw in Formula 2.1:

(2.2)

See, we told you it was easy. No big deal.

And Now . . . Using Excel’s AVERAGE Function

To compute the mean of a set of numbers using Excel, followthese steps.

For some reason, the people who name functions would rather callthe one that computes the mean AVERAGE, rather than MEAN.Yikes—these same folks used the name MEDIAN to name thefunction that computes the median and they assigned the nameMODE to the function that computes the mode, so why not makeeveryone’s life easier and assign the name MEAN to the functionthat computes the average? If you find out, let us know.

1. Enter the individual scores into one column in a worksheet,such as you see in Figure 2.1.

X--- = �X

n= 2,150 + 1,534 + 3,564

3= 7,248

3= 2,416

Chapter 2 ♦ Computing and Understanding Averages 43

Figure 2.1 Data That Will Be Used to Compute an Average Score

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 43

Page 6: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

2. Select the cell (an intersection of a row and a column ina workbook) into which you want to enter the AVERAGEfunction. In this example, we are going to compute the meanin Cell A5.

3. Now, create a formula in any cell that would average the threevalues. The formula would look like this:

= (A1 +A2 + A3)/3

or

click on Cell A5 and type the Average function (which wedid) as follows:

= AVERAGE (A1:A3)

and press the Enter key

or

use the Insert Function menu option and the “Inserting aFunction” technique we talked about on page 26 in Chapter 1to enter the AVERAGE function in Cell A5.

Whether you type in a function or enter it using the InsertFunction menu option, it looks the same and no one will ever,ever, ever know how you did it. Once it’s there, whether typed orinserted, it does exactly the same thing.

As you can see in Figure 2.2, the mean was computed and thevalue returned to Cell A5. Notice that in the formula bar (whereyou can see the contents of a cell) in Figure 2.2, you can see theAVERAGE function fully expressed and the value computed as2,416, just as we did manually earlier in the chapter.

44—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.2 Using the AVERAGE Function to Compute the Mean of aSet of Numbers

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 44

Page 7: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

More Excel

You may also want to explore the geometric mean (the function isGEOMEAN). The geometric mean uses multiplication rather thanaddition to summarize data values. It’s used when one expectsthat changes occur in a relative fashion in the data.

THINGS TO REMEMBER

The mean is sometimes represented by the letter M and is alsocalled the typical, average, or most central score. If you are readinganother statistics book or a research report, and you see somethinglike M = 45.87, it probably means that the mean is equal to 45.87.

• In the formula, a small n represents the sample size for whichthe mean is being computed. A large N (like this) would rep-resent the population size. In some books and in some journalarticles, no distinction is made between the two.

• The sample mean is the measure of central tendency that mostaccurately reflects the population mean.

• The mean is like the fulcrum on a seesaw. It’s the centermostpoint where all the values on one side of the mean are equalin weight to all the values on the other side of the mean.

• Finally, for better or worse, the mean is very sensitive toextreme scores. An extreme score can pull the mean in onedirection or another and make it less representative of theset of scores and less useful as a measure of central tendency.This, of course, all depends on the values for which the meanis being computed. More about this later.

The mean that we just computed is also referred to as the arithmeticmean, and there are other types of means that you may read about, suchas the harmonic mean. Those are used in special circumstances but neednot concern you here. And if you want to be technical about it, the arith-metic mean (which is the one that we have discussed up to now) is alsodefined as the point at which the sum of the deviations is equal to zero(whew!). So, if you have scores like 3, 4, and 5 (where the mean is 4),the sum of the deviations about the mean (–1, 0, and +1) is 0.

Chapter 2 ♦ Computing and Understanding Averages 45

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 45

Page 8: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

More Excel

In addition to the arithmetic mean (one of many other kinds of aver-ages), here’s also the moving average, brought to you by the DataAnalysis ToolPak. Using the Moving Average tool, you can computethe average of a set of scores in chunks. For example, say you havethe numbers 1, 4, 5, and 10. The average of these is 5.0. But a mov-ing average taking two measures at a time (the average of 1 + 4, theaverage of 4 + 5, and the average of 5 + 10) equals 4.88. The mov-ing average, in some cases, is a bit more accurate because it takesinto account scores that are a bit extreme or unique in the set(which in this case is the 10). The Moving Average tool also revealsa chart that plots the actual data against the averages for each point.

Remember that the word average means only the one measurethat best represents a set of scores, and that there are many dif-ferent types of averages. Which type of average you use dependson the question that you are asking and the type of data you aretrying to summarize. More about this later.

Computing a Weighted Mean

You’ve just seen an example of how to compute a simplemean. But there may be situations where you have the occurrence ofmore than one value and you want to compute a weighted mean. Aweighted mean can be computed easily by multiplying the value bythe frequency of its occurrence, adding the total of all the products,and then dividing by the total number of occurrences.

To compute a weighted mean, follow these steps:

1. List all the values in the sample for which the mean is being com-puted, such as those shown in the column labeled Value (the valueof X) in the following table.

2. List the frequency with which each value occurs.

3. Multiply the value by the frequency, as shown in the third column.

4. Sum all the values in the Value × Frequency column.

5. Divide by the total frequency.

46—— Part II ♦ Σigma Freud and Descriptive Statistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 46

Page 9: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

3. Copy the formula down so that Cells C2 through C9 containthe multiplied values.

More Excel

Remember that a cell contains information. Sometimes, thatinformation is a formula, even though what you see is the resultsof that formula. In Figure 2.3, we used the Ctrl+` key combinationto show the formulas or functions in a cell, rather than the resultsof that formula or function.

Chapter 2 ♦ Computing and Understanding Averages 47

For example, here’s a table that organizes the values and frequenciesin a flying proficiency test for 100 airline pilots.

Value Frequency Value × Frequency

97 4 38894 11 1,03492 12 1,10491 21 1,91190 30 2,70089 12 1,06878 9 70260 (don’t fly with this guy) 1 60

Total 100 8,967

The weighted mean is 8,967/100, or 89.67. Computing the meanthis way is much easier than entering 100 different scores into yourcalculator or computer program.

How can we do this using Excel? Quite easily. Just follow thesesteps . . .

1. Enter the data you see in the above table in a new worksheet.

2. Create a formula like the one you see in Cell C2 to multiplethe value times the frequency. You can see the formula in theformula bar in Figure 2.3.

Figure 2.3 Using Excel to Compute a Weighted Mean

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 47

Page 10: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

In basic statistics, an important distinction needs to be made betweenthose values associated with samples (a part of a population) and thoseassociated with populations. To do this, statisticians use the followingconventions. For a sample statistic (such as the mean of a mean), Romanletters are used. For a population parameter (such as the mean of a pop-ulation), Greek letters are used. So, the mean for the spelling score for asample of 100 fifth graders is represented as X–5, whereas the mean for thespelling score for the entire population of fifth graders is represented asµ5, using the Greek letter mu, or µ.

COMPUTING THE MEDIAN

The median is also an average, but of a very different kind. Themedian is defined as the midpoint in a set of scores. It’s the pointat which one half, or 50%, of the scores fall above and one half, or

4. In Cells B10 and C10, use the SUM function to total thecolumns.

5. Now, placing the results in Cell C12, divide the total sum (inCell C10) by the total frequency (in Cell B10), as you see inFigure 2.4. Ta-da! You did it again with a weighted average of89.67, just as we showed above.

48—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.4 The Computation of a Weighted Mean

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 48

Page 11: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

50%, fall below. It’s got some special qualities that we will talkabout later in this section, but for now, let’s concentrate on howit is computed. There’s no standard formula (but there is an Excelfunction, as we will see later) for computing the median.

To compute the median, follow these steps:

1. List the values in order, either from highest to lowest or lowest tohighest.

2. Find the middle-most score. That’s the median.

For example, here are the incomes from five different households:

$135,456

$25,500

$32,456

$54,365

$37,668

Here is the list ordered from highest to lowest:

$135,456

$54,365

$37,668

$32,456

$25,500

There are five values. The middle-most value is $37,668, and that’sthe median.

Now, what if the number of values is even? Let’s add a value($34,500) to the list so there are six income levels. Here they are:

$135,456

$54,365

$37,668

$34,500

$32,456

$25,500

Chapter 2 ♦ Computing and Understanding Averages 49

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 49

Page 12: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

When there is an even number of values, the median is simplythe mean between the two middle values. In this case, the middletwo cases are $34,500 and $37,668. The mean of those two valuesis $36,084. That’s the median for that set of six values.

What if the two middle-most values are the same, such as in thefollowing set of data?

$45,678

$25,567

$25,567

$13,234

Then the median is same as both of those middle-most values. Inthis case, it’s $25,567.

And Now . . . Using Excel’s MEDIAN Function

To compute the median of a set of numbers using Excel, followthese steps.

1. Enter the individual scores into one column in a worksheet,such as you see in Figure 2.5.

50—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.5 Data for Computing the Median

2. Select the cell into which you want to enter the MEDIANfunction. In this example, we are going to compute the meanin Cell A9.

3. Click on Cell A9 and type the median function as follows:

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 50

Page 13: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

If you know about medians, you should know about percentile points.Percentile points are used to define the percentage of cases equal toand below a certain point in a distribution or set of scores. For exam-ple, if a score is “at the 75th percentile,” it means that the score is at orabove 75% of the other scores in the distribution. The median is alsoknown as the 50th percentile, because it’s the point below which 50%of the cases in the distribution fall. Other percentiles are useful as well,such as the 25th percentile, often called Q1, and the 75th percentile,referred to as Q3. So what’s Q2? The median, of course.

Here comes the answer to the question you’ve probably had in theback of your mind since we started talking about the median. Whyuse the median instead of the mean? For one very good reason. Themedian is insensitive to extreme scores, whereas the mean is not.

When you have a set of scores in which one or more scores areextreme, the median better represents the centermost value of thatset of scores than any other measure of central tendency. Yes, evenbetter than the mean.

= MEDIAN(A2:A7)

and press the Enter key

or

use the Insert Function menu option and the “Inserting aFunction” technique we talked about on page 25 in Chapter 1to enter the MEDIAN function.

You can see in Figure 2.6 the value of the median, and in theformula bar, you can see the MEDIAN function.

Chapter 2 ♦ Computing and Understanding Averages 51

Figure 2.6 Computing the Median Using the MEDIAN Function

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 51

Page 14: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

What do we mean by extreme? It’s probably easiest to thinkof an extreme score as one that is very different from the groupto which it belongs. For example, consider the list of five incomesthat we worked with earlier (shown again here):

$135,456

$54,365

$37,668

$32,456

$25,500

The value $135,456 is more different from the other five than anyother value in the set. We would consider that an extreme score.

The best way to illustrate how useful the median is as a measureof central tendency is to compute both the mean and the median fora set of data that contains one or more extreme scores and then com-pare them to see which one best represents the group. Here goes.

The average or mean of the set of five scores you see above isthe sum of the set of five divided by five, which turns out to be$57,089. On the other hand, the median for this set of five scoresis $37,668. Which is more representative of the group? The value$37,668, because it clearly lies more in the middle of the group,and we like to think about the average as being representative orassuming a central position. In fact, the mean value of $57,089 fallsabove the fourth highest value ($54,365) and is not very central orrepresentative of the distribution.

It’s for this reason that certain social and economic indicators(mostly involving income) are reported using a median as a measureof central tendency, such as “The median income of the averageAmerican family is . . . ,” rather than using the mean to summarizethe values. There are just too many extreme scores that would skew,or significantly distort, what is actually a central point in the set ordistribution of scores.

You learned earlier that sometimes the mean is represented by thecapital letter M instead of X–. Well, other symbols are used for themedian as well. We like the letter M, but some people confuse itwith the mean, so they use Med for median, or Mdn. Don’t let thatthrow you—just remember what the median is and what it repre-sents, and you’ll have no trouble adapting to different symbols.

52—— Part II ♦ Σigma Freud and Descriptive Statistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 52

Page 15: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

More Excel

You can use the QUARTILE function to compute the 25th, 50th,and 75th percentiles as well as other quartiles in a distribution.Remember the median? It’s the second quartile in Excel’s grand plan.

THINGS TO REMEMBER

Here are some interesting and important things to remember aboutthe median.

• The mean is the middle point of a set of values, and themedian is the middle point of a set of cases.

• Because the median cares about how many cases, and not thevalues of those cases, extreme scores (sometimes called out-liers) don’t count.

COMPUTING THE MODE

The third and last measure of central tendency that we’ll cover,the mode, is the most general and least precise measure of centraltendency, but it plays a very important part in understanding thecharacteristics of a special set of scores. The mode is the value thatoccurs most frequently. There is no formula for computing the mode.

To compute the mode, follow these steps:

1. List all the values in a distribution, but list each value only once.

2. Tally the number of times that each value occurs.

3. The value that occurs most often is the mode.

For example, an examination of the political party affiliation of300 people might result in the following distribution of scores.

Chapter 2 ♦ Computing and Understanding Averages 53

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 53

Page 16: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Party Affiliation Number or Frequency

Democrats 90Republicans 70Independents 140

The mode is the value that occurs most frequently, which in theabove example is Independents. That’s the mode for this distribution.

And Now . . . Using Excel’s MODE Function

To compute the mode of a set of numbers using Excel, follow thesesteps.

1. Enter the individual scores into one column in a worksheetas you see in Figure 2.7, where one column indicates partyaffiliation. Keep in mind that we have to enter a number (andnot text), and Excel counts the number of times the valueappears. We did include a code, though, in the same worksheetso you can keep straight what value stands for what party.

54—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.7 Data for Computing the Mode

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 54

Page 17: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

2. Select the cell into which you want to enter the MODEfunction. In this example, we are going to compute the meanin Cell B21.

3. Click on Cell A5 and type the MODE function as follows:

= MODE (A2:A20)

and press the Enter key

or

Use the Insert Function menu option and the “Inserting aFunction” technique we talked about on page 26 in Chapter 1to enter the MODE function in Cell B21. You see the mode andthe function in the formula bar in Figure 2.8.

Chapter 2 ♦ Computing and Understanding Averages 55

Figure 2.8 Using the MODE Function

You can use the COUNTIF function (in the Database set) to count thenumber of occurrences of text, which would be a really simple way tofind out the mode without having to use the MODE function. Simplycreate a list, and then use the function defining the range and the values.In our example here, it would be something like =COUNTIF(A2:A20,democrat) which would tally all the occurrences of the word Democratand return it to the cell of your choice. Then, you would just select thelargest value as the mode.

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 55

Page 18: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Want to know what the easiest and most commonly mademistake is when computing the mode? It’s selecting the number oftimes a category occurs, rather than the label of the category itself.Instead of the mode being Independents, it’s easy for someone toconclude the mode is 140. Why? Because they are looking atthe number of times the value occurred, and not the value thatoccurred most often! This is a simple mistake to make, so be onyour toes when you are asked about these things.

Apple Pie à la Bimodal

If every value in a distribution contains the same number of occur-rences, then there really isn’t a mode (and Excel will return an incor-rect value). But if more than one value appears with equal frequency,the distribution is multimodal. The set of scores can be bimodal (withtwo modes), as the following set of data using hair color illustrates.

Hair Color Number or Frequency

Red 7Blond 12Black 45Brown 45

In the above example, the distribution is bimodal because thefrequency of the values of black and brown hair occurs equally. Youcan even have a bimodal distribution when the modes are relativelyclose together, but not exactly the same, such as 45 people withblack hair and 44 with brown hair. The question becomes, Howmuch does one class of occurrences stand apart from another? Canyou have a trimodal distribution? Sure—where three values havethe same frequency. It’s unlikely, especially when you are dealingwith a large set of data points, but certainly possible.

USING THE AMAZING ANALYSIS TOOLPAKTO COMPUTE DESCRIPTIVE STATISTICS

Now it’s our first chance to use the amazing ToolPak that we intro-duced on page 35.

56—— Part II ♦ Σigma Freud and Descriptive Statistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 56

Page 19: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Chapter 2 ♦ Computing and Understanding Averages 57

Figure 2.9 Data for the Descriptive Statistics ToolPak Option

Figure 2.10 The Dialog Box That Gets Us Started With the AnalysisToolPak

This particular item in the ToolPak, named Descriptive Statistics,computes more values than we need, but because you can’t beselective in what the ToolPak computes (but can edit the results, asyou will shortly see), we’ll just show you all the results and dealonly with those that we cover in this chapter. We’ll follow the sameprocedure in later chapters.

To use the ToolPak to compute descriptive statistics, follow thesesteps. We’re using the data you see in Figure 2.9.

1. Click Tools Data Analysis and you will see the DataAnalysis dialog box shown in Figure 2.10.

2. Click Descriptive Statistics, and then click OK, and youwill see the Descriptive Statistics dialog box as shown inFigure 2.11.

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 57

Page 20: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

3. Enter the range of data you want Excel to use in the compu-tation of the descriptive statistics in the Input Range box, butalso include the column heading (so the heading shows up inthe analysis). In this example (as you can see in Figure 2.9),the data we want to analyze are in Cells A1 through A7.

4. Click the “Labels in First Row” check box.

5. Now click the Output Range button in the Output Optionssection of the dialog box and enter the location where youwant Excel to return the results of the analysis. In this exam-ple, we choose C1.

More Excel

Click or Drag?

This is covered in Appendix A, but it’s important enough toemphasize again here. You can enter cell addresses in a dialogbox in Excel in three ways.

You can just enter them using the keyboard (such as typing “A1”).You can click and drag the mouse over the cells you want to

select, and then release the mouse and the cell range (or cell) willappear in the range box.

You can use the Collapse box (the one that looks like this: .When you click this, it allows you to enter the range throughdragging as well.

Which one for you? Whichever you find works best, but youhave to know that typing cell ranges can get old very fast. Try toclick and drag or use the Collapse box.

58—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.11 The Descriptive Statistics Dialog Box

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 58

Page 21: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

6. Click the “Summary statistics” check box in the DescriptiveStatistics dialog box. The completed Descriptive Statisticsdialog box is shown in Figure 2.13.

Chapter 2 ♦ Computing and Understanding Averages 59

Figure 2.13 The Completed Descriptive Statistics Dialog Box

Figure 2.14 The Descriptive Statistics ToolPak Results

7. Click OK and you will see the results, as shown in Figure 2.14.

Now this is pretty darn amazing. You get all this informationwith relatively few clicks. You just have to make sure that you getyour ducks all lined up in a row (where that expression ever camefrom, who knows?), but you must be sure that all the cell referencesare entered accurately.

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 59

Page 22: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

You can see all kinds of useful information in Figure 2.14 rangingfrom the Mean (53324.16667) of the six values (See how Count = 6?)to the Median (36084), with a bunch of other stuff (some of it we willnot be dealing with—see Statistics 2 in your course catalog).

Make the Analysis ToolPak Output Pretty

Once you use any tool in the Analysis ToolPak and get someoutput like you see in Figure 2.14, you can (of course) leave it likeit is or use other Excel tools to format it to better fit your needs.This output is absolutely part of the worksheet you created in thefirst place, so anything you do to the entire sheet also has an impacton this new output. In Figure 2.15, you can see we made severalchanges using simple Excel tools.

60—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.15 The New and Improved Descriptive Statistics PrintoutFrom the ToolPak

• We formatted the entire worksheet in Arial 12.

• We formatted numbers so they made more sense. For example,we reformatted the mean as $53,324.17 from 53324.16667.

• We deleted the Mode cells because there is no mode and Excelgave us back a nonsense response as you see in Figure 2.14.

• We used the Format Column AutoFit option to adjustthe columns so that all the information fit on the worksheet.

• We could have added other things and used more of Excel’sbells and whistles (color, table formats, shading, etc.), butwhat you see does a fine job of showing the results of theanalysis. Fancy is nice, but there’s nothing wrong with simpleand straightforward—words to live by.

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 60

Page 23: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

WHEN TO USE WHAT

OK, we’ve defined three different measures of central tendencyand given you fairly clear examples of each. But the most impor-tant question remains unanswered. That is, “When do you usewhich measure?”

In general, which measure of central tendency you use dependson the type of data that you are describing. Unquestionably, ameasure of central tendency for qualitative, categorical, or nominaldata (such as racial group, eye color, income bracket, voting pref-erence, and neighborhood location) can be described using onlythe mode.

For example, you can’t be looking at the most central measurethat describes which political affiliation is most predominant in agroup and use the mean—what in the world could you conclude,that everyone is half-Republican? Rather, that out of 300 people,almost half (140) are Independent seems to be the best way ofdescribing the value of this variable. In general, the median andmean are best used with quantitative data, such as height, incomelevel in dollars (not categories), age, test score, reaction, and num-ber of hours completed for a degree.

It’s also fair to say that the mean is a more precise measurethan the median, and the median is a less precise measure thanthe mode. This means that all other things being equal, use themean, and indeed, the mean is the most often used measure ofcentral tendency. However, we do have occasions when the meanwould not be appropriate as a measure of central tendency—forexample, when we have categorical or nominal data, such as haircolor. Then we use the mode. So, here is a set of three guidelinesthat may be of some help. And remember, there can always beexceptions.

1. Use the mode when the data are categorical in nature andvalues can fit into only one class, such as hair color, politicalaffiliation, neighborhood location, and religion. When this isthe case, these categories are called mutually exclusive.

2. Use the median when you have extreme scores and you don’twant to distort the average, such as income.

3. Finally, use the mean when you have data that do not includeextreme scores and are not categorical, such as the numericalscore on a test or the number of seconds it takes to swim50 yards.

Chapter 2 ♦ Computing and Understanding Averages 61

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 61

Page 24: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Summary

No matter how fancy schmancy your statistical techniques are, you willstill almost always start by simply describing what’s there—hence theimportance of understanding the simple notion of central tendency.From here, we go to another important descriptive construct: variabil-ity, or how different scores are from one another.

Time to Practice

1. Compute the mean, median, and mode for the following threesets of scores saved as Chapter 2 Data Set 1. Do it by hand orusing Excel. Show your work, and if you using Excel, print out acopy of the output.

Score 1 Score 2 Score 3

3 34 1547 54 1675 17 1324 26 1455 34 1546 25 1457 14 1138 24 1566 25 1545 23 123

2. You are the manager of a fast food store. Part of your job is toreport to the boss at the end of each day what special is sellingbest. Use your vast knowledge of descriptive statistics and writeone paragraph to let the boss know what happened today. Hereare the data. Do this exercise by hand. Be sure to include a copyof your work.

Special Number Sold Cost

Huge Burger 20 $2.95Baby Burger 18 $1.49Chicken Littles 25 $3.50Porker Burger 19 $2.95Yummy Burger 17 $1.99Coney Dog 20 $1.99

Total specials sold 119

62—— Part II ♦ Σigma Freud and Descriptive Statistics

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 62

Page 25: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Answers to Practice Questions

3. Under what conditions would you use the median rather than themean as a measure of central tendency? Why? Provide an exampleof two situations where the median might be more useful than themean as a measure of central tendency.

4. You’re in business for yourself and you have been fortunateto buy and own the Web site titled havefun.com, where yousell every imaginable stupid toy and game (like potato guns) thateveryone needs. You’re reviewing your advertising budget for thethird quarter (from 7/1 through 10/1) and want the mean. Hereare the sales data in dollars. Use Excel to compute the averagesales by toy and by week.

July August September Average Toy Sales Sales Sales Sale

Slammer 12,345 14,453 15,435Radar Zinger 31,454 34,567 29,678LazerTags 3,253 3,121 5,131

Average Sale

5. As the head of public health, you do a weekly census across agegroups of the number of cases of flu reported. By hand, computethe mean and the median by week. Which do you think, given thisparticular data, is the most useful measure of central tendency?

12/1 12/8 12/16 through 12/7 through 12/15 through 12/23

0–4 years 12 14 155–9 years 15 12 14

10–14 years 12 24 2115–19 years 38 12 19MeanMedian

Chapter 2 ♦ Computing and Understanding Averages 63

1. By hand . . .

Score 1 Score 2 Score 3

Mean 5.6 27.6 144.3Median 5.5 25.0 149.5Mode 5 25, 34 154

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 63

Page 26: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

Figure 2.16 shows you what the Excel worksheet could look like. Keepin mind that there are two modes in the set of data named Score 2 (theyare 34 and 25).

64—— Part II ♦ Σigma Freud and Descriptive Statistics

Figure 2.16 Using Excel to Compute the Mean, Median, andMode for the Data From Question 1

2. Here’s what your one paragraph might look like.

As usual, the Chicken Littles [the mode] led the way in sales. Thetotal amount of food sold was $303, for an average of $2.55 for eachspecial.

3. You use the median when you have extreme scores, which woulddisproportionately bias the mean. One situation where the medianis preferable to the mean is where income is reported. Because itvaries so much, you want a measure of central tendency that isinsensitive to extreme scores. Another example is where you havean extreme score or an outlier, such as the speed with which agroup of adolescents can run 100 yards, where there are one ortwo exceptionally fast individuals.

4. Average for the quarter by month are shown in Figure 2.17. Weinserted the AVERAGE function in only one cell and then copiedit across rows and only one other cell and copied it acrosscolumns.

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 64

Page 27: Σigma Freud and Descriptive Statistics · summarized using an average. Averages, also called measures of central tendency,come in three flavors: the mean, the median, and the mode.

5.

12/1 12/8 12/16 through 12/7 through 12/15 through 12/23

Mean 19.25 15.5 17.25Median 13.5 13 17

At least for the week of 12/1 through 12/7, the median is a betterrepresentative measure of central tendency because there is oneextreme data point, 38, for the 15- to 19-year-old group. For all theother weeks, the mean seems appropriate.

Chapter 2 ♦ Computing and Understanding Averages 65

Figure 2.17 Average Sales

02-Salkind-45022.qxd 6/21/2006 6:02 PM Page 65