The dataset consists of the date (month, day, and year) of each of the 548 homeruns that Mike Schmidt hit during his major league career. His first homerun was hit against the pitcher Balor Moore of the Montreal Expos on September 16, 1972, and his final homerun was hit May 2, 1989 against Jim Deshaies of the Houston Astros. To perform some exploratory analysis on this data, it is convenient to reexpress these homerun dates in terms of the number of days into the baseball season. In the years that Schmidt played, the major league season typically started on the first Monday in April and continued for 26 weeks. We denote the first day of the season in April as ``day 1" and the final day of the season (in September or October) as ``day 182". With this reexpression, we can view Schmidt's homeruns during a given year as a sequence of day numbers, counted from the beginning of the season, on which each homerun was hit. For example, in 1973, Schmidt hit 18 homeruns on day numbers 21, 44, 76, 78, 79, 83, 87*, 93, 96, 119, 125, 131*, 148, 163, 170, 173, where an asterisk marks a day on which he hit two homeruns.
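The reexpression of a calendar date as a day number can be sketched in a few lines of Python. This is only an illustration of the convention described above: it assumes that every season opens on the first Monday in April, which the text says was typical but which did not hold in every year. The function names season_start and day_number are hypothetical.

```python
from datetime import date, timedelta


def season_start(year: int) -> date:
    """First Monday in April of the given year (assumed opening day)."""
    first = date(year, 4, 1)
    # weekday() returns 0 for Monday, so this is the number of days
    # from April 1 forward to the first Monday (0 if April 1 is a Monday).
    offset = (0 - first.weekday()) % 7
    return first + timedelta(days=offset)


def day_number(d: date) -> int:
    """Day number within the season, counting the assumed opening day as day 1."""
    return (d - season_start(d.year)).days + 1


# Day number of Schmidt's first homerun under the assumed convention.
first_homerun_day = day_number(date(1972, 9, 16))
print(season_start(1972), first_homerun_day)
```

A season whose actual opening day differed from the first Monday in April would need its own start date in place of season_start.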
Figure 1 plots the dates for all of Schmidt's homeruns during his career. The date of each homerun is plotted as a function of the year and the day number. A dot indicates that at least one homerun was hit on that particular date. Dots that are circled correspond to days on which Schmidt hit two homeruns. A + symbol is used to indicate Schmidt's two three-homerun days (July 7, 1979 and June 14, 1987) and a separate symbol indicates the day in April 1976 on which Schmidt hit four homeruns.
This graph provides an informative profile of the pattern of Schmidt's homerun hitting over his 18-year career. The most striking feature of this display is the generally consistent spread of homeruns across years and days. Starting with 1974, we see that Schmidt hit homeruns in a regular pattern from the beginning to the end of each season until 1987, his last successful season. The only notable gap in this fourteen-year span is in 1981, when no homeruns were hit in the middle of the season. This gap is due to a baseball strike that started on June 11 of that year and lasted for 50 days. We also note the large number of circled dots. On 43 days Schmidt hit two homeruns, on 2 days he hit three homeruns, and on 1 day he hit four homeruns.
If we take a closer look at Schmidt's homerun hitting profile for a given year, we note the presence of clustering and gaps in the day numbers of homeruns. For example, suppose we focus on the year 1974, which was Schmidt's first good year for hitting homeruns. Note that Schmidt's homeruns are approximately regularly spaced from day 1 to day 60. At this point, Schmidt appears to get ``hot" and hits 9 homeruns between days 60 and 80. Next, his homerun hitting gets quiet until day 120 and then there are two clusters of homeruns -- he hits six homers around day 120 and seven around day 140. If we look across all years, we notice some particularly large clusters and gaps in the data. In 1977, we see that Schmidt hit 17 homeruns between days 60 and 100, and in the early part of 1976, he hit 10 homeruns in a 10-day span. We also notice gaps where Schmidt hit no homeruns. In 1982, for instance, Schmidt has two large gaps, one between days 1 and 30 and another between days 40 and 60. In 1987, Schmidt hit no homeruns between days 40 and 65.
Let's investigate further the clustering in Schmidt's homerun hitting behavior. For each day during the season, we can compute the average number of homeruns that Schmidt hits in a time interval centered about that day. If we consider an interval of width w days, then the average number of homeruns hit during days k through k+w-1 is the number of homeruns hit during these days divided by the width w. If we do this for all intervals 1 to w, 2 to w+1, 3 to w+2, and so on, we obtain a set of moving averages. Figure 2 plots these moving averages against the midpoint of the interval for Schmidt's homerun data for his ``consistent" years 1974-1987 using a width of w = 10 days.
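The moving average computation described above can be sketched as follows. Since the full dataset is not reproduced in the text, the sketch uses the 1973 day numbers quoted earlier, with days 87 and 131 entered twice because two homeruns were hit on each. The function name moving_averages and the default season length of 182 days are illustrative assumptions, not the code used to produce Figure 2.

```python
def moving_averages(hr_days, w=10, season_length=182):
    """Return (midpoint, average) pairs for every window k..k+w-1 in the season."""
    # counts[d] is the number of homeruns hit on day d (index 0 is unused).
    counts = [0] * (season_length + 1)
    for d in hr_days:
        counts[d] += 1

    result = []
    # Windows start at k = 1 and the last one ends on the final day of the season.
    for k in range(1, season_length - w + 2):
        total = sum(counts[k:k + w])      # homeruns on days k through k+w-1
        midpoint = k + (w - 1) / 2        # center of the window
        result.append((midpoint, total / w))
    return result


# 1973 day numbers from the text; two-homerun days appear twice.
days_1973 = [21, 44, 76, 78, 79, 83, 87, 87, 93, 96,
             119, 125, 131, 131, 148, 163, 170, 173]

averages = moving_averages(days_1973, w=10)
peak_mid, peak_avg = max(averages, key=lambda pair: pair[1])
print(peak_mid, peak_avg)
```

Plotting the averages against the midpoints for each year would give one panel of a display like Figure 2.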
This moving average plot dramatically displays the instability in Schmidt's homerun hitting behavior that we began to see in Figure 1. For example, consider the year 1977, when Schmidt was a relatively young ballplayer. His homerun hitting was relatively weak early in the season, with two small peaks and two gaps when he didn't hit any homeruns. Then in the middle of the season, he caught fire. His moving average remained over 0.5 for a large number of days, indicating that he was hitting a homerun about once every two days during this period. Then he cooled down and hit homeruns at a consistent smaller rate (0.2 per day) for the remainder of the season. Schmidt appeared to display similar stretches of hot hitting at other times during his career. Indeed, many times Schmidt's moving average exceeded 0.5. His largest moving average of 1 occurred during the first month of the 1976 season -- this moving average of 10 homeruns in 10 days included Schmidt's four-homerun day.
It is interesting to note that there appears to be more instability in Schmidt's homerun hitting during the early part of his career. The patterns in the moving averages for the early years 1975, 1976 and 1977 appear more volatile than in the later years 1985, 1986 and 1987. These differences could possibly be explained by the change in Schmidt's batting style over the years. He began his career with a big swing and evolved into a contact type of hitter with a shorter swing. The big swing produced some inconsistent batting behavior for Schmidt during the early part of his career. It produced many homeruns but also many strikeouts. His shorter swing during the later part of his career produced more balls hit in play and fewer strikeouts. Perhaps this change in batting style is also reflected in his pattern of homerun hitting.
The moving average computation is useful in looking at the clustering in the data. Another way to look at the instability in Schmidt's homerun hitting is to focus on the spacings between the dates of successive homeruns. For example, in 1973 Schmidt hit his first six homeruns on the day numbers 21, 44, 76, 78, 79, 83. The spacings for this set are 21, 23, 32, 2, 1, 4 -- this indicates that it took 21 days for Schmidt to hit his first homerun, 23 more days to hit his second homerun, and so on. Figure 3 plots histograms of the homerun spacings for each of his consistent years from 1974 to 1987. Generally all of these spacing distributions are heavily right skewed. We note the large number of small spacings -- it appears that the largest number of small spacings occurs during the early years of Schmidt's career. We also notice the number of large spacings. In particular, we see that Schmidt had 13 spacings larger than 20 days. In other words, there were 13 instances during his career where Schmidt had gaps of more than 20 days between homeruns. (This includes the gap during the strike of 1981.) As in the moving average graph, one may think that Schmidt's large spacing values indicate that he had inconsistent homerun hitting behavior.
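The spacing computation can be sketched in the same way, again using the 1973 day numbers quoted earlier. The first spacing is measured from day 0, the day before the season opens, so that it equals the first day number as in the example above. Recording a second homerun on the same day as a spacing of 0 is an assumption of this sketch; the text does not say how such days were handled. The function name spacings is hypothetical.

```python
def spacings(hr_days):
    """Gaps in days between successive homeruns, the first measured from day 0."""
    gaps = []
    previous = 0  # day 0 is the day before opening day
    for d in sorted(hr_days):
        gaps.append(d - previous)
        previous = d
    return gaps


# First six homeruns of 1973, as in the worked example.
first_six = spacings([21, 44, 76, 78, 79, 83])
print(first_six)

# All 18 homeruns of 1973; days 87 and 131 appear twice (assumed spacing 0).
days_1973 = [21, 44, 76, 78, 79, 83, 87, 87, 93, 96,
             119, 125, 131, 131, 148, 163, 170, 173]
gaps_1973 = spacings(days_1973)

# Count the spacings larger than 20 days in this one season.
long_gaps = sum(1 for g in gaps_1973 if g > 20)
print(long_gaps)
```

Applying spacings to each season in turn and pooling the results gives the values that a histogram such as those in Figure 3 would summarize.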