I have probably spent 20 hours a week during the offseason doing things related to the projection system, especially the pitchers. I know that no matter what I do, it's going to barely improve the projections overall. I may not have a chance to beat PECOTA, since I have not built a similarity score program or anything like that. It's an obsession, and at some point I need to call it quits, release some final numbers, and then just sit back and watch baseball for the year, and not worry about every little thing that might improve the projections until next offseason.
For the hitters I have:
A) Added a 4th year to the sample, though it is weighted much less than the other years.
B) Changed the weights on each year.
C) Added some modifications to the aging curve based on player type. This is still in the rudimentary stage.
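For anyone wondering what weighting the years looks like in practice, here is a minimal sketch. The weights and the stat lines are made up for illustration (they are not the actual CHONE weights), and it leaves out regression to the mean and the aging adjustment entirely.

```python
# Hypothetical weights, most recent season first. The 4th year counts
# much less than the others. These are NOT the real CHONE weights.
WEIGHTS = [5, 4, 3, 1]

def weighted_rate(seasons, weights=WEIGHTS):
    """Weighted rate over several seasons.

    seasons: list of (events, opportunities) tuples, most recent first,
             e.g. (home runs, at bats).
    """
    num = sum(w * events for (events, _), w in zip(seasons, weights))
    den = sum(w * opps for (_, opps), w in zip(seasons, weights))
    if den == 0:
        raise ValueError("no playing time in the sample")
    return num / den

# Made-up hitter: (HR, AB) for the last four seasons, most recent first.
seasons = [(30, 600), (25, 550), (20, 500), (10, 400)]
hr_rate = weighted_rate(seasons)
print(f"weighted HR/AB: {hr_rate:.4f}")
```

Weighting the opportunities as well as the events keeps a short season from counting as much as a full one.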
Using the same sample of players with 500+ AB that I ran the projection test with, the R for CHONE improves from .677 to .703. That needs to be taken with a grain of salt, since I already knew the 2006 outcomes before making these changes, but I promise I didn't "best fit" to my little 100-player sample. I could have easily beaten all other projections in hindsight by putting in a line something like this:
IF playername.last rhymes with "Why" then HR > 40.
But I didn't do that. I looked at every player from 1982-2005 and used 4 years of data to predict the 5th year. Anyway, we'll see how it turns out.
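For reference, here is a sketch of how an R like the one above can be computed, assuming it is a plain Pearson correlation between projected and actual values. The numbers in the example are invented, not taken from the test sample.

```python
from math import sqrt

def pearson_r(projected, actual):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(projected) != len(actual) or len(projected) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / sqrt(var_p * var_a)

# Invented projected vs. actual values for five hitters.
projected = [0.780, 0.820, 0.750, 0.900, 0.700]
actual = [0.760, 0.850, 0.770, 0.880, 0.690]
print(f"R = {pearson_r(projected, actual):.3f}")
```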
I've spent a lot more time on the pitchers.
A) Cleaned up a lot of formulas. I made calculations more rigorous in places where I was taking shortcuts.
B) Added batted ball data to model BABIP and HR allowed. I'm sure I'm not the only one to do this, but I don't think every forecasting system does this yet. It really helps on the home runs. Both Derek Lowe and Javier Vazquez gave up a bunch of homers in 2005. Batted ball data tells you which one stays high and which one is a fluke. I don't know how this will work on BABIP, but it's worth a shot.
C) Improved MLEs for pitchers.
I use the system to project innings, strikeouts, walks, hit batters, and homers. I do not have a calculation for doubles and triples; I just assumed an average distribution for each pitcher between doubles, triples, and singles.
Now that I've added batted ball data, this had to change. Flyballs are turned into hits at a lower rate than grounders, and this is reflected in the player's hit total, but flyballs are also more likely to become doubles and triples. This has been fixed. What shocked me, though, is how small the effect is. Pitchers in general give up, not counting homers, 1.27 bases per hit. For the 10 starters with the highest groundball rate, it's only a bit lower, 1.25. For the biggest flyball pitchers, it's only a bit higher, 1.29.
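To be clear about what "bases per hit, not counting homers" means, here is a small sketch of the arithmetic. The hit totals in the example are made up, not a real pitcher's line.

```python
def bases_per_non_hr_hit(singles, doubles, triples):
    """Total bases on non-homer hits divided by the number of those hits."""
    hits = singles + doubles + triples
    if hits == 0:
        raise ValueError("no non-homer hits allowed")
    total_bases = singles + 2 * doubles + 3 * triples
    return total_bases / hits

# Made-up pitcher season: 150 singles, 40 doubles, 4 triples allowed.
example = bases_per_non_hr_hit(150, 40, 4)
print(f"bases per non-HR hit: {example:.2f}")
```

An all-singles pitcher would come out at exactly 1.00, so the gap between 1.25 and 1.29 is a small shift in the share of hits going for extra bases.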
I looked at the before and after ERAs for Jered Weaver and Chien-Ming Wang. Weaver balloons from 3.38 to 3.40. Wang drops to 4.09 from 4.16. It's not much, but I'm now looking for every last bit to make these projections the best I can. Sometime soon I will finish CHONE v2.1.