Next: Summary Up: The Homerun Hitting of Previous: Simulating a consistent hitter

Was Schmidt consistent?

Using the simulation procedure described above, we have constructed a population of 2000 baseball hitters based on a constant-probability coin-tossing model. To see whether Mike Schmidt was truly a consistent hitter, we compare his homerun statistics with those of this "consistent population".

When we look at Schmidt's homerun data, some features may appear unusual. For example, consider his homerun totals for each year, given in Table 1. During the years 1974-1987, Schmidt hit 21 homeruns in his worst season and 48 homeruns in his best. In addition, he hit 31 homeruns in the strike-shortened 1981 season. Are these homerun totals unusual if Schmidt was a consistent hitter?

To answer this question, we look at our population of consistent hitters. For each of the 2000 simulated hitters, we compute the minimum season homerun total (excluding the strike season), the maximum season total, and the 1981 total. Figure 4 shows histograms of these three statistics for the simulated hitters; on the horizontal axis of each graph, Schmidt's value is marked with a large black dot. The worst-season totals for our consistent hitters ranged from 16 to 37, with a mean of about 28. Schmidt's worst total, 21, lies in the left tail of this distribution. The p-value, the probability that a consistent hitter has a worst-season total of 21 or smaller, is .03. Since this probability is small, Schmidt does appear to have had an unusually small homerun probability in that season. Many baseball writers, for example Westcott (1995), attribute this poor homerun year to injuries that Schmidt suffered during that season.
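The comparison above amounts to computing an empirical p-value: simulate the statistic for every consistent hitter and report the share of simulated values at least as extreme as Schmidt's. The Python sketch below assumes, for illustration only, that the coin-tossing model treats each at-bat as an independent toss with a fixed homerun probability `p`; the function names, `p`, and the at-bat counts are hypothetical inputs, not values or code from this study.

```python
import random


def simulate_season_totals(at_bats, p, rng):
    """Season homerun totals for one simulated consistent hitter.

    Assumption: each at-bat is an independent coin toss that results in a
    homerun with the same probability p (hypothetical parameter).
    """
    return [sum(rng.random() < p for _ in range(n)) for n in at_bats]


def lower_tail_p_value(simulated, observed):
    """Share of simulated values that are <= the observed value."""
    return sum(v <= observed for v in simulated) / len(simulated)


def upper_tail_p_value(simulated, observed):
    """Share of simulated values that are >= the observed value."""
    return sum(v >= observed for v in simulated) / len(simulated)


def worst_season_p_value(at_bats, p, observed_min, n_hitters=2000, seed=1):
    """Simulate n_hitters consistent hitters and compare each one's worst
    season with an observed minimum (for example, Schmidt's 21)."""
    rng = random.Random(seed)
    worst = [min(simulate_season_totals(at_bats, p, rng))
             for _ in range(n_hitters)]
    return lower_tail_p_value(worst, observed_min)
```

For example, `worst_season_p_value([550] * 13, 0.07, 21)` would model 13 non-strike seasons of 550 at-bats each; both numbers are placeholders, so the result is not expected to reproduce the .03 reported above. The upper-tail function plays the same role for the best season and the 1981 total.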

Schmidt's best homerun total and his 1981 total conform well to the simulated data. The consistent hitters had best years ranging from 38 to 65 homeruns; Schmidt's best year (48) lies near the center of this distribution. The average 1981 total for the simulated hitters is about 24. Schmidt's value (31) is relatively large, but since the upper-tail probability is .12, this total is not large enough to say that Schmidt was unusually hot that year.

By comparing Schmidt's value of any homerun statistic with the distribution of the same statistic in the population of consistent hitters, we can judge whether Schmidt's value is unusual. In particular, we focus on the following statistics, which were discussed in Section 2.

First, consider Schmidt's games with multiple homeruns. Table 2 summarizes the numbers of days with 2, 3, or 4 homeruns for the simulated hitters. The most common number of two-homerun days among the 2000 hitters was 42, and Schmidt's value, 43, is close to this typical value. The p-value, the probability that a simulated hitter would have 43 or more two-homerun days, is .50. For three-homerun days, the most common value for the simulated hitters was 2, the same as Schmidt's 2 such days. The only surprising result, from a statistical viewpoint, is Schmidt's single four-homerun day. Almost none of the simulated hitters had a four-homerun day; only 4 percent had one or more, so Schmidt's day can be considered rare.
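Counting multiple-homerun days is a simple tabulation. The sketch below assumes, hypothetically, that each hitter's record is stored as a list of daily homerun counts with one entry per playing day; that representation and the function names are illustrative, not taken from the study.

```python
from collections import Counter


def multi_homerun_days(daily_homeruns):
    """Number of days with exactly 2, 3 and 4 homeruns."""
    counts = Counter(daily_homeruns)
    return {k: counts[k] for k in (2, 3, 4)}


def most_common_value(values):
    """Mode of a statistic across simulated hitters (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def share_with_four_homerun_day(hitters):
    """Fraction of simulated hitters with at least one four-homerun day."""
    return sum(any(h == 4 for h in days) for days in hitters) / len(hitters)
```

Applied to 2000 simulated records, `most_common_value` gives the typical number of two-homerun days and `share_with_four_homerun_day` gives the proportion compared with Schmidt's single four-homerun day.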

Table 2: Summary statistics for the number of two-homerun, three-homerun, and four-homerun games for the consistent hitters.

Similarly, we can look at the largest moving average (with a width of 10 days) for our simulated data. Table 3 gives the probability distribution of this statistic. For our consistent hitters, the largest moving average ranged between .6 and 1.3, with .7 and .8 the most common values. Schmidt's largest moving average of 1.0 is somewhat extreme, since the p-value for 1.0 is .066. Table 4 summarizes the probability distribution of the number of spacings that exceed 20 days for the simulated hitters. The probability that a consistent hitter would have 13 or more "large" spacings is .068 + .002 = .070. Since the number of large spacings and the largest moving average are positively correlated, it is not surprising that Schmidt is extreme with respect to both measures.
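Both streakiness measures can be computed in a single pass over a season. The exact definitions from Section 2 are not repeated here, so this sketch assumes that the moving average is the mean daily homerun count over each window of 10 consecutive days, and that a spacing is the number of days between consecutive homerun days; both readings and the function names are assumptions.

```python
def max_moving_average(daily_homeruns, width=10):
    """Largest mean homerun count over any `width` consecutive days.

    Assumes len(daily_homeruns) >= width.
    """
    window = sum(daily_homeruns[:width])
    best = window
    for i in range(width, len(daily_homeruns)):
        # Slide the window one day: add the new day, drop the oldest day.
        window += daily_homeruns[i] - daily_homeruns[i - width]
        best = max(best, window)
    return best / width


def count_long_spacings(homerun_days, threshold=20):
    """Number of gaps between consecutive homerun days longer than `threshold`.

    `homerun_days` holds the sorted day numbers on which a homerun was hit.
    """
    return sum(b - a > threshold for a, b in zip(homerun_days, homerun_days[1:]))
```

An upper-tail probability read from a table such as Table 4 is then the sum of the entries at and above the observed value, as in the .068 + .002 calculation in the text.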

Table 3: Probability distribution of maximum moving average statistic for consistent hitters.

Table 4: Probability distribution of number of spacings larger than 20 days for consistent hitters.

Tables 3 and 4 give some evidence that Schmidt's homerun hitting was not consistent from week to week within individual seasons. However, the largest moving average and the number of large spacings each capture only one aspect of Schmidt's "inconsistent" homerun hitting. Perhaps more can be learned by looking at the distribution of homerun counts obtained by subdividing Schmidt's career into weeks instead of years.

Suppose that we partition each season of 182 days (26 weeks) into 13 two-week periods and record Schmidt's number of homeruns in each period. The relative frequency distribution of this two-week homerun count over all 14 seasons is displayed in the row labeled "Schmidt" in Table 5. We see that 10 percent of the time Schmidt hit no homeruns in a two-week period, 16 percent of the time he hit 1 homerun, and so on. To see if this distribution is unusual, we find the corresponding two-week homerun distribution over all seasons for each simulated player. The average of these distributions over the simulated sample is displayed in the "Mean" row of Table 5. This average distribution differs from Schmidt's: Schmidt appears to have more two-week periods with 0 and with "6 or more" homeruns than the average consistent hitter, and fewer periods with 4 homeruns.
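The two-week tabulation can be sketched as follows, assuming (hypothetically) that each season is stored as 182 daily homerun counts, so that consecutive 14-day slices give the 13 periods, and that the "Mean" row averages the simulated hitters' distributions category by category. Counts of 6 or more share the last category, as in Table 5; the function names are illustrative.

```python
from collections import Counter


def two_week_counts(season_daily, period=14):
    """Homeruns in each consecutive `period`-day block of a season.

    A 182-day season gives 13 two-week periods.
    """
    return [sum(season_daily[i:i + period])
            for i in range(0, len(season_daily), period)]


def relative_frequencies(counts, top=6):
    """Relative frequencies of 0, 1, ..., top-1 homeruns, plus a final
    "top or more" category."""
    capped = Counter(min(c, top) for c in counts)
    return [capped[k] / len(counts) for k in range(top + 1)]


def pooled_distribution(seasons, top=6):
    """Two-week distribution pooled over all seasons of one hitter."""
    counts = [c for season in seasons for c in two_week_counts(season)]
    return relative_frequencies(counts, top)


def mean_distribution(distributions):
    """Average many hitters' distributions, category by category."""
    return [sum(col) / len(distributions) for col in zip(*distributions)]
```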

Since Schmidt's two-week homerun distribution shows more spread than that of the typical consistent hitter, we can compare the two distributions through their standard deviations. Schmidt's standard deviation of 1.95 is larger than the standard deviation of 1.75 for the mean distribution of the consistent hitters. To see whether Schmidt's value is significantly larger, Figure 5 displays a histogram of the standard deviations of the two-week homerun distributions for the 2000 simulated hitters. Schmidt's value lies in the right tail of this distribution, and the corresponding p-value is .014. Since this p-value is small, we conclude that Schmidt's homerun hitting over short periods differs from that of a consistent homerun hitter with a constant probability of success.
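The significance check on the spread can be sketched as below. It assumes that the standard deviation is the population standard deviation of a hitter's pooled two-week counts (the divisor is not stated in the text) and that the p-value is the share of simulated hitters whose standard deviation is at least as large as Schmidt's; the function names are hypothetical.

```python
import statistics


def two_week_sd(two_week_counts):
    """Population standard deviation of a hitter's two-week homerun counts."""
    return statistics.pstdev(two_week_counts)


def sd_p_value(simulated_counts, observed_sd):
    """Upper-tail p-value for the spread of the two-week distribution.

    simulated_counts: one list of pooled two-week counts per simulated hitter.
    """
    sds = [two_week_sd(counts) for counts in simulated_counts]
    return sum(s >= observed_sd for s in sds) / len(sds)
```

With Schmidt's observed value of 1.95 and the counts of the 2000 simulated hitters, this is the calculation that the right-tail area in Figure 5 summarizes.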

Table 5: Relative frequency distribution of the number of homeruns in two-week periods for Schmidt, the mean distribution for the consistent hitters, and the associated standard deviations for the two distributions.

One explanation for this distinctive two-week distribution is that Schmidt's homerun rate did not stay constant over a season. There were periods in which he was relatively hot (high success rate) or cool (low success rate). This behavior would produce a large number of two-week periods with few homeruns, a small number of periods with an average number of homeruns, and a large number of periods with many homeruns.
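A short variance calculation shows why hot and cool periods widen the distribution. Suppose, as an illustrative assumption not taken from the study, that in a two-week period Schmidt has $n$ at-bats and that his homerun probability $P$ for that period is itself random with mean $\bar p$. By the law of total variance, the period homerun count $Y$ satisfies:

```latex
\begin{aligned}
Y \mid P &\sim \mathrm{Binomial}(n, P), \qquad E[P] = \bar p,\\
\operatorname{Var}(Y) &= E[\operatorname{Var}(Y \mid P)] + \operatorname{Var}(E[Y \mid P])
  = E[nP(1-P)] + \operatorname{Var}(nP)\\
&= n\bar p - n\left(\operatorname{Var}(P) + \bar p^{2}\right) + n^{2}\operatorname{Var}(P)\\
&= n\bar p\,(1-\bar p) + n(n-1)\operatorname{Var}(P).
\end{aligned}
```

The first term is the variance for a consistent hitter with the same average rate $\bar p$; the second term is positive whenever $n > 1$ and $P$ varies, so streaky hitting inflates the spread of the two-week counts.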

Since there is some evidence of a nonconstant homerun rate, it is interesting to check whether this pattern appears across all seasons of Schmidt's career. For each year from 1974 to 1987, we compute the mean of the two-week homerun counts and their standard deviation. The mean measures average homerun production during the season, and the standard deviation measures how consistent that production was across the season. Figure 6 plots these means and standard deviations against the year. The pattern in the mean graph is the same as if we plotted Schmidt's homerun rates (Table 1) against the year. Generally, Schmidt hit about three homeruns per two-week period in every year; the exceptions are the one poor year (1978) and the three good years in the middle of his career. The standard deviation plot shows a different pattern. The largest standard deviations generally occur in the first part of Schmidt's career, with a decreasing trend in later years. This indicates that his performance was less stable early in his career. There are two outliers (small values) in the early years. The small 1978 value for both the mean and the standard deviation simply reflects Schmidt's poor season, in which his homerun counts were low.
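The season-by-season summary behind Figure 6 can be sketched as below, again assuming hypothetically that each season is stored as a list of daily homerun counts keyed by year; the population standard deviation and the function name are assumptions, since the text does not specify them.

```python
import statistics


def seasonal_summary(seasons_by_year, period=14):
    """Map each year to (mean, standard deviation) of its two-week homerun counts."""
    summary = {}
    for year, days in sorted(seasons_by_year.items()):
        # Split the season into consecutive two-week periods and count homeruns.
        counts = [sum(days[i:i + period]) for i in range(0, len(days), period)]
        summary[year] = (statistics.mean(counts), statistics.pstdev(counts))
    return summary
```

Plotting the two components of each entry against the year gives the mean graph and the standard deviation graph described above.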



Jim Albert
Mon Mar 16 13:40:53 EST 1998