Behavioral Statistics

 

Homework #1 - Graphical Representation of Data

For Helpful Hints, Click Here

For each problem, download the data from the Problem Description (see below):

Problem Description 1

The final marks on a mathematics exam are stored in the file .

        a. Construct a stem and leaf display.
        b. Construct a histogram.
        c. Briefly describe what the histogram and stem and leaf display tell you about the data.
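
As a starting point for part (a), here is a minimal Python sketch of a stem and leaf display. It assumes the marks are non-negative whole numbers, uses the tens digit as the stem and the units digit as the leaf, and runs on a made-up list of marks (not the contents of the assignment's data file). The function name `stem_and_leaf` is only illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem and leaf display (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        groups[stem].append(leaf)
    lines = []
    # Walk every stem from lowest to highest so empty stems still show up.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical marks for illustration only.
marks = [52, 67, 71, 74, 78, 80, 83, 85, 85, 91]
for line in stem_and_leaf(marks):
    print(line)
```

Each printed row lists one stem followed by its leaves in ascending order, so the shape of the display can be compared directly with the histogram from part (b).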

Remember to right-click to save the file!

Problem Description 2

Perceptions of how well or poorly the economy will perform can sometimes result in self-fulfilling prophecies.  As a result, executives, economists, and government officials are interested in the public’s perceptions about the economy. Every year, 500 adults are surveyed in late December and asked “Compared with last year, do you think this coming year will be a year of economic prosperity, economic difficulty, or about the same as last year?” The responses are as follows:

                    1 = Prosperity
                    2 = Difficulty
                    3 = About the same
The responses (coded as 1, 2, and 3) for the years 1998, 1995, 1992, 1989, 1986, and 1983 are stored in columns 1 through 6 of the file . Use pie charts to summarize the data and briefly describe what the graphs tell you.
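
A pie chart shows each category's share of the total, so the first step is to turn one column of codes into percentages. The sketch below does this for a single year, using a made-up column of ten responses rather than the survey file. The names `LABELS` and `response_shares` are illustrative assumptions.

```python
from collections import Counter

# Coding scheme taken from the problem description.
LABELS = {1: "Prosperity", 2: "Difficulty", 3: "About the same"}

def response_shares(codes):
    """Map each response label to its percentage share of one column."""
    counts = Counter(codes)
    total = len(codes)
    return {label: 100 * counts[code] / total for code, label in LABELS.items()}

# Hypothetical column of ten coded responses (not the real survey data).
sample_year = [1, 2, 2, 3, 1, 2, 3, 3, 2, 2]
shares = response_shares(sample_year)
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

The resulting shares can be passed to any charting tool. Computing them for each of the six columns makes it easier to describe how perceptions changed from year to year.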

Problem Description 3

In 2004, a spate of small aircraft crashes made the safety of turboprop airplanes an issue. As part of an analysis of different types of accidents, Airjet Ltd. determined where accidents occurred for both turboprop airplanes and jets in the period 1994-2003. The data are stored in the file   using the following format.

When accidents happen   Code    When accidents happen   Code
Ground                  1       Cruise                  5
Takeoff                 2       Descent                 6
Initial Climb           3       Approach                7
Climb                   4       Landing                 8

The results for turboprops are stored in column 1, and the results for jets are stored in column 2.
        a. Identify the type of data stored in each column.
        b. Use two pie charts to summarize these data.
        c. Does it appear that turboprop airplanes and jets have similar accident patterns?

Problem Description 4

Women own about 40% of Canadian small businesses, but there are large variations in the types of businesses owned by men and women. Suppose that a survey of female-owned and male-owned small businesses was conducted and the type of business each operated was recorded in the following format.

Business                        Code    Business                        Code
Services                        1       Construction                    5
Retail/Wholesale/Trade          2       Manufacturing                   6
Finance/Insurance/Real Estate   3       Agriculture and primary         7
Transportation/Communication    4

The responses of women and men are stored in columns 1 and 2, respectively, in the file .
        a. Identify the type of data stored in each column.
        b. Use two bar graphs (one for men and one for women) to summarize and present these data.

Problem Description 5

A high school student named David Merrell did a fascinating study of the effects of listening to rock music on the performance of rats in a maze. He had three groups of rats: one raised in the presence of rock music (performed by the group Anthrax), one raised in the presence of music by Mozart, and one raised in the absence of music. These animals learned to navigate a maze before exposure to the music, and then performed over three additional weeks. The data for this study are found in the file .

The variables in the file are, in order, Subject, Group [1 = Control, 2 = Mozart, 3 = Anthrax], wk1r1, wk1r2, wk1r3, wk2r1 ... wk4r3 [4 weeks of 3 runs each], week1, week2, week3, week4 [weekly means], wt1, wt2, wt3, wt4 [weekly weights], median1 through median4 [weekly medians].
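
With this many columns, it can help to generate the variable names rather than type them by hand. The sketch below builds the 26 names in the order listed above and attaches them to one row. It assumes the run columns follow the wk<week>r<run> pattern and that the file is whitespace-delimited, which the description does not state. The row of numbers is invented for illustration.

```python
# Group coding taken from the problem description.
GROUPS = {1: "Control", 2: "Mozart", 3: "Anthrax"}

def maze_columns():
    """Build the variable names in file order: ids, runs, means, weights, medians."""
    runs = [f"wk{w}r{r}" for w in range(1, 5) for r in range(1, 4)]
    means = [f"week{w}" for w in range(1, 5)]
    weights = [f"wt{w}" for w in range(1, 5)]
    medians = [f"median{w}" for w in range(1, 5)]
    return ["Subject", "Group"] + runs + means + weights + medians

def parse_row(line):
    """Turn one whitespace-delimited line (an assumed layout) into a labeled record."""
    columns = maze_columns()
    values = [float(x) for x in line.split()]
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    record = dict(zip(columns, values))
    record["Subject"] = int(record["Subject"])
    record["Group"] = GROUPS[int(record["Group"])]
    return record

# Hypothetical row: subject 1, Mozart group, then 24 made-up measurements.
row = "1 2 " + " ".join(["30.5"] * 24)
record = parse_row(row)
print(record["Group"], record["wk4r3"])
```

The length check catches rows that do not match the expected 26 variables, which is a quick way to notice if the delimiter assumption is wrong.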

Problem Description 6

An example that we will look at several times in the future comes from a study by Mireault (1990) investigating the effects of the death of a parent on the emotional well-being of college students. Among other things, she asked three different groups of college students to rate their perceived vulnerability to loss, i.e., how vulnerable they felt about the loss of someone important to them. The three groups were (1) a group who had had a parent die before they started college, (2) a group whose parents had divorced, and (3) a group whose parents were both alive and still married to each other. Download these data from .

There are many variables here. They are, in order, ID, Group, Gender, YearColl, College, GPA, LostPGen, AgeAtLos, SomT, ObsessT, SensitT, DepressT, AnxT, HostT, PhobT, ParT, PsycT, GSIT, PVTotal, PVLoss, SuppTotl. We are interested in Group and PVLoss. The other variables will come up in other exercises.

Problem Description 7

Most of us have grown up to think of the geyser at Yellowstone named Old Faithful as just that: faithful and reliable. But actually it isn't very faithful at all, with times between eruptions varying between about 45 minutes and 90 minutes. (And it has gotten worse in the last few months, following recent earthquake activity.) Chatterjee et al. (Chatterjee, S., Handcock, M. S., & Simonoff, J. S. (1995). Casebook for a First Course in Statistics and Data Analysis. New York: Wiley) presented data on the timing of nearly 300 eruptions, as well as the length of each eruption. The data:

The authors currently have these (and other) data available at geyser2a.dat. The variables, in order, are length of previous eruption, interval between eruptions, and a dichotomized version of the first variable. Draw histograms for the length of the previous eruption and for the dichotomized version of this variable.

  • What does the distribution of Intervals tell us about the "faithfulness" or "reliability" of Old Faithful?
  • What kinds of things might you look at to explain the variability in Intervals?
  • The New York Times recently reported that the average Interval is getting longer. What might this mean in the context of what we already know?
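
Before drawing the histograms, it may help to see how the bar heights are obtained. The sketch below counts values into equal-width bins and prints a rough text histogram. The eruption lengths, the starting value of 1.5 minutes, and the bin width of 0.5 minutes are all made up for illustration and are not taken from geyser2a.dat.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the binned range are ignored
            counts[idx] += 1
    return counts

# Hypothetical eruption lengths in minutes (not the real data).
lengths = [1.8, 2.0, 1.9, 4.1, 4.5, 3.9, 4.3, 2.2, 4.7, 4.0]
counts = histogram_counts(lengths, start=1.5, width=0.5, n_bins=7)

for i, count in enumerate(counts):
    low = 1.5 + 0.5 * i
    print(f"{low:.1f}-{low + 0.5:.1f} | {'*' * count}")
```

With these invented values the bars cluster at the short and long ends with a gap in between. A two-cluster shape like this is worth looking for when answering the questions above about the real data.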

Comments from Samprit Chatterjee, Mark Handcock and Jeffrey Simonoff:

The Old Faithful geyser is a wonderful national icon, and is also a wonderful source of interesting data, due to its non-faithful faithful appearance. What do we mean by "non-faithful"? It is well-known that the time interval between eruptions of the geyser is not faithful at all (in the sense of being around one value consistently), as it has a bimodal distribution. About one-third of the time the time between eruptions is roughly 55 minutes, while about two-thirds of the time it is roughly 80 minutes. Unfortunately, the description in the article of average time intervals in different years gives the mistaken impression of a unimodal distribution.

We have used Old Faithful eruption data (circa 1978, 1979 and 1985) in our introductory classes for many years, and made a case based on these data the lead case in our book "A Casebook for a First Course in Statistics and Data Analysis" (Wiley, 1995), since students are often very surprised to find out just how faithful (or non-faithful) Old Faithful is. The specific characterization of the bimodal distribution mentioned in the previous paragraph comes from those data. The case also points out that a simple way to predict the time interval until the next eruption is to check whether the duration of the previous eruption was short (less than 3 minutes) or long (more than 3 minutes), and predict accordingly (55 minutes until the next eruption, or 80 minutes, respectively). This rule, derived using the 1978/1979 data, correctly predicts the 1985 values to within plus or minus 10 minutes about 90% of the time, right in line with what the Times article states.
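
The prediction rule described in the comments above can be written out as a short Python sketch. The 3-minute cutoff, the 55 and 80 minute predictions, and the plus or minus 10 minute tolerance come from the comments. Treating a duration of exactly 3 minutes as "long" is an assumption, since the comments do not say. The function names and the sample pairs are invented for illustration.

```python
def predict_interval(previous_duration, threshold=3.0):
    """Predict minutes until the next eruption from the previous eruption's length.

    Short eruptions (under the threshold) predict 55 minutes; otherwise 80.
    A duration exactly at the threshold is treated as long (an assumption).
    """
    return 55 if previous_duration < threshold else 80

def hit_rate(pairs, tolerance=10):
    """Fraction of (duration, observed interval) pairs predicted within the tolerance."""
    hits = sum(
        abs(predict_interval(duration) - interval) <= tolerance
        for duration, interval in pairs
    )
    return hits / len(pairs)

# Hypothetical (duration, interval) pairs in minutes, not the 1985 data.
pairs = [(1.8, 54), (4.2, 79), (2.0, 62), (4.5, 93), (3.9, 75)]
print(hit_rate(pairs))
```

Running the same check on the real duration and interval columns would show how close the rule comes to the roughly 90% figure quoted above.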

 
Time trouble for geyser: It's no longer Old Faithful.
The New York Times, 5 Feb. 1996, D1
James Brooke


Rick Hutchison, Yellowstone National Park's research geologist, reported that Old Faithful, the park's leading tourist attraction, has been slowing down. In 1950, the average time interval between eruptions was 62 minutes, in 1970 it was 66 minutes, and today it is 77 minutes. It is also apparently becoming more difficult to predict the time until the next eruption, with forecasts now being to within plus or minus ten minutes.

The changes of recent years seem to be produced by seismic activity. Scientists theorize that earthquakes can have two effects on geysers, either speeding up or slowing down the rate of supply of water. Quakes can either shake loose debris that clogs the rock channels feeding water to a geyser, resulting in more water and steam, or crack open new underground channels, redirecting water to other geysers or hot springs. It is speculated that the latter process is affecting Old Faithful.

 



© David M. Compton, Ph.D.
Last updated: September 4, 2008