Bayes factors for comparing models --------------------------------------- Example: Modeling Mike Schmidt's homerun data using Poisson and binomial distributions.) In 1980, Mike Schmidt hit 48 homeruns and the dates of each homerun are recorded. The baseball season can be divided into 26 weeks (week 1: April 7- April 13, etc) and the number of homeruns for each week recorded. The counts of homeruns for the 26 weeks, y1, ..., y26, is given below: 0, 2, 2, 1, 4, 0, 3, 5, 1, 2, 0, 1, 0, 1, 1, 3, 1, 0, 5, 3, 0, 1, 2, 3, 3, 4 Consider the following two models: M1: The yi are independent Poisson with mean lambda. lambda is assigned a Gamma(alpha, alpha/mu) prior with alpha=10, mu=1.5. (The mean of lambda is 1.5 -- you think that Schmidt will hit an average of 1.5 homeruns per week. M2: The yi are independent Binomial with sample size 21 and probability p. Schmidt has 21 at-bats each week and yi is the number of successes (homeruns). p is the probability of a homerun at a single at-bat p is assigned a beta(alpha, beta) prior with alpha=1.4, beta=18.6. (you think that p is in the neighborhood of .07 -- this roughly matches the prior information given in model M1) The Bayes factor is BF_12 = (y | M1)/P(y | M2) We'll illustrate computing this two ways. 1. (Simulation) Suppose that our data is the number of homeruns in a single week -- call this y. We simulate values of the prior predictive distribution for each model. We show this below using words and matlab commands. M1: (a) Simulate lambda_1, ..., lambda_1000 from Gamma(alpha, beta ) (b) Simulate y_1, ..., y_1000, where y_i is Poisson (lambda_i) On MATLAB, use commands: lambda=gamrnd(10, 1.5/10,1000,1); y1=poissrnd(lambda,1000,1); M2: (a) Simulate p_1, ..., p_1000 from Beta(alpha, beta) (b) Simulate y_1, ..., y_1000, where y_i is Binomial(21, p_i) On MATLAB, use commands: p=betarnd(1.4,18.6,1000,1); y2=binopdf(21,p,1000,1); Here are the results of the 2 simulations (frequences given): y 0 1 2 3 4 5 6 7 8 >=9 --------------------------------------------- p1(y) 232 343 219 116 61 19 7 2 1 0 p2(y) 368 280 168 100 37 15 16 7 3 6 If, for example, if Schmidt hit 1 homerun during this single week, the Bayes factor in support of M1 is BF_12 = 343/280 2. (Direct Integration) For the complete data for the 26 weeks, it is more efficient to use direct integration. Write the predictive density as p(y) = integral p(y|theta) p(theta) dtheta For each model, one can evaluate the integrals directly. MATLAB functions were defined to compute the predictive densities. For this data log p_1(y_1,...,y_100) = -47.3059 log P_2(y_1,...,y_100) = -48.5041 BF_12 = exp(-47.3059+48.5041) = 3.3141 So there is some evidence to prefer model M1