Friday, January 19, 2007

PSP season over

Vlad fell just short, finishing with 72 homers as we completed our season sweep of the A's to finish 125-37. He actually did get a hold of one, but it ran about 5 feet foul, and then he grounded out. Vlad hoped for one more chance in the bottom 8th, but the Angels' inning ended as Dallas McPherson flied out to deep center with Vlad on deck.

He hit .411 with 72 HR and 169 RBI, and finished second in the batting race to rookie shortstop Brandon Wood, who hit .461-52-159. Soriano chipped in 57 homers, Kotchman 35, and McPherson 27.

For the pitching, Bart came back strong with a 30-2 record, 1.53 ERA, 276 innings, 182 hits, 21 walks and 263 strikeouts. John Lackey was 25-4 with a 2.82 ERA and 235 strikeouts, Escobar was 18-6, and Santana 13-6. K-Rod saved 40 games in 42 chances, and Shields pitched 94 innings with a 3.15 ERA and even saved 9 games.

Next stop: Winning our second straight World Series. Should be easy.

Thursday, January 18, 2007

Sammy Sosa

Just in case, I did a CHONE projection on Sammy Sosa. Playing in Arlington, I have him hitting .241/.313/.424 in 310 at bats. That's 13 2b, 1 3b, 14 hr, 30 bb, 2 hbp, and 77 whiffs.

That's probably an optimistic forecast, but it's really not good enough for a poor defensive corner OF/DH to play in the majors. Then again, it's probably a better forecast than anyone would have given Tim Salmon last year, and Tim did just fine.

It probably doesn't hurt Sammy to put an extra two years onto his waiting time for HOF eligibility either.

Wednesday, January 17, 2007

The 2008 Los Angeles Angels of the PSP

My current Angels season has so far been the most enjoyable. I'm playing as Brandon Wood, and earned a late season callup in 2007 as we won the World Series. This year is even better, as we've passed the 120 win mark with 3 to play.

I've hit 49 homers so far, but Vlad is still the big star. There is no Howie Kendrick in this game, nor is there a Napoli or Jered Weaver. Kennedy is still the 2B. Rivera was traded or released, but this team did add Soriano for left and Curtis Granderson for center. Kotchman and McPherson are living up to all their hype as well.

I was not supportive of the idea that we should have tried to offer Soriano 137 million or anything, but he is pretty fun to have on the team. He's got over 50 homers. There are 3 games to play against Oakland, who also have a loaded team. We've tried to keep them out of the playoff picture, sweeping the season series so far, but they have the wild card right now with a slim lead over the White Sox. They can thank less stingy computer ownership for keeping Barry Zito and bringing in Jason Bay.

In game 159, Escobar earns his 18th win, but the big story is Vlad. With 2 weeks to go he had about 66 homers after a hot streak and started thinking he could erase Bonds from the record book. I've had a little trouble since then, hitting the ball hard but not out often enough. With a 14-1 lead in the bottom 8th, the Angels needed 2 runners to reach in order to bring Vlad, now at 69 homers, up one more time.

After one out, Jeff Mathis singled, bringing up the top of the order. Granderson singled. Now Kennedy had merely to avoid the DP. His orders were to swing at nothing from Joaquin Benoit. He didn't, and wound up walking to load the bases.

Then Vlad guessed a fastball, and got it. Hit to center field, not a no-doubter, but a high fly that crept to the fence. And then just barely over it for a grand slam. Quite an enjoyable afternoon commute on the train.

Now he just needs 4 homers in 3 games against the A's, and Bartolo Colon, healthy once again, needs 1 more win to get 30.

Friday, January 12, 2007

A post DIPS world

Tangotiger links to a study detailing how Barry Zito keeps his batting average against so darn low. He's doing his best work vs righthanders. It makes sense. In all the years of watching Barry from the other side, when he's on his game it always seems to me that lefty-hitting Garret Anderson is the most likely to come through for the good guys.

I've got Barry at a .271 BABIP for 2007, well below average. One clear example of how big a difference CHONE sees in pitchers' ability in this regard is Johan Santana vs Ben Sheets. Prorating the injury prone Sheets to match Johan's 212 innings, they match up pretty well in the defense independent categories:

(Sheets, Santana)
walks: 37, 42
whiffs: 228, 232
homers: 22, 23

But in hits it's Sheets 190, Santana 159, which is pretty much a difference of 50 points in ERA. BABIP: .303 to .262. Some of that is a Minnesota defense expected to be much better than last year and well above average; Punto and Bartlett did a much better job than Castro and Batista. Some of that is the pitcher's own performance in terms of hits allowed, and some is due to their batted ball data.
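As a rough check on those BABIP figures, here is a minimal sketch that backs BABIP out of the two lines above. It assumes every out that isn't a strikeout comes on a ball in play, ignoring double plays and outs on the bases, so it overcounts outs and lands roughly ten points under the CHONE numbers. The function name and the shortcut are illustrative and not part of CHONE. The gap between the two pitchers still comes out near 40 points.

```python
def approx_babip(innings, hits, homers, strikeouts):
    """Rough BABIP from a pitching line.

    Assumption: outs on balls in play = 3 * innings - strikeouts,
    which ignores double plays and outs made on the bases.
    """
    hits_in_play = hits - homers
    outs_in_play = 3 * innings - strikeouts
    return hits_in_play / (hits_in_play + outs_in_play)


# Projected lines from the post, both at 212 innings.
sheets = approx_babip(innings=212, hits=190, homers=22, strikeouts=228)
santana = approx_babip(innings=212, hits=159, homers=23, strikeouts=232)

print(f"Sheets  {sheets:.3f}")
print(f"Santana {santana:.3f}")
print(f"gap     {sheets - santana:.3f}")
```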

Batted ball data helps me keep more of the signal through the noise. Without it I'd have to regress more to the mean, and my projections would have a smaller spread in BABIP.

Pitch by pitch data may be able to take us further by giving information about quality of strikes, a pitcher's ability to keep out of the middle of the plate. Smarter people than me are working on it.

Thursday, January 11, 2007


The St Louis Cardinals re-signed Mulder even though he's going to miss much of the 2007 season. The first number I read was 2 years, 18 million, but ESPN is reporting 2 years, 13 million guaranteed, and 3 years, 45 million if he makes 30 starts in 2007 and 2008.

Looks like the Cards are safe on 2007. Viva El Birdos does the research on pitchers coming off rotator cuff surgery, and it doesn't look good. If he comes back as the 2001-2003 Mulder, he'd be worth 45 million and more, but that seems like a pipe dream. The Cardinals might be lucky just to get the 2005 version of Mulder back by 2008. Seems like a waste of money to me, but I don't know how the incentives work if he misses half of 2007 but makes 30 starts with a 4.75 ERA in 2008. Would that make his deal worth 15 million for 2009?

Randy Johnson not dead yet

That was my first article for The Hardball Times. One question on BTF raised the possibility of selective sampling. Of the first group of 25 pitchers I looked at, 4 were dropped from the study for not throwing 50 innings the following season. I don't think it's a big deal; it's really something you can't get away from anytime you look at consecutive years in baseball for any kind of study. Players don't play forever. Sometimes they aren't there the next year. I thought at first having 19 of 23 (excluding Johnson and Buchholz) in the sample was pretty good.

But I'll look deeper: The 4 were Gene Brabender (a HOF name if there ever was one) 1970, Dick Drott 1958, Jim Abbott 1996, and Craig Anderson 1962.

Brabender: 1970 was the last year of his career. Up to that year, he allowed about as many runs as predicted.

Drott: Only 21 that year. The year before he allowed slightly fewer runs than expected. In 1959 he pitched only 27 innings. He pitched mostly in relief after that, and combining his 1960, 61, and 63 lines he pitched 251 innings and allowed 22 more runs than expected.

Abbott: Truly awful in 1996. He was wild, hittable, tateriffic, couldn't strike anyone out, and was a bit unlucky to top off the package of suckitude. That's how you get a 2-18 record. Up to that year, he had allowed 27 fewer runs than expected over his first 7 seasons. He didn't pitch in 1997 (mercifully), came back in 1998 and 1999, and allowed about as many runs as expected.

Anderson: His 1962 season was for the Mets, where he was 3-17. He had pitched 38 innings the year before, and 22 more in two seasons after. Being a 1962 Met makes you unlucky by definition.

Of the 4, Drott is the only one that would make this appear to be more of a persistent ability than the study showed - but his results aren't going to make a dent in the larger group of over 8400 pitchers.

Saturday, January 06, 2007


This is it, the last projections I'm going to do before the 2007 season ends. I really need to hold myself to that. I do this because I like it, and for every tiny improvement I can put into the projection, I probably think of 10 others I could do if I put about 50,000 more hours into it.

Once I assemble the team projections, I may add pitcher wins and losses in, but there should be no other changes. I really need to move on to other things, like have a life.

I've added runs over replacement to the pitcher projections. Replacement level is team specific, and I get this by feeding a generic replacement level line into the system. I haven't put too much effort into this. I'm using a starter replacement level 6% higher than that for relievers. I treat anyone with 120 or more projected innings as a starter and all others as relievers. I just wanted a quick measure to see where Barry Zito ranked, since there are a lot of questions about him: Is he an ace or just a mid-rotation innings eater? I have him ranked around #40, so ideally, he's not a #1 but a #2. The last projection change I made was converting Joel Pineiro to a reliever. He gets better, but he's not really any better than Donnelly. He could pitch OK, keep the job, and rack up 30-40 saves in a mediocre way like Ryan Dempster, but I'm hoping the Donkey outpitches him and takes the job.
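To make the runs over replacement idea concrete, here is a minimal sketch. The 120 inning cutoff and the 6% starter bump come from the description above. The reliever replacement level of 5.50 runs per nine, the function name, and reading "6% higher" as a 6% higher runs allowed rate are all assumptions for illustration, and the team-specific adjustment is left out.

```python
STARTER_INNINGS_CUTOFF = 120   # from the post: 120+ projected innings = starter
STARTER_BUMP = 1.06            # from the post: starter level is 6% higher
RELIEVER_REPL_RA9 = 5.50       # hypothetical replacement runs per 9 innings


def runs_over_replacement(proj_innings, proj_ra9,
                          reliever_repl_ra9=RELIEVER_REPL_RA9):
    """Runs saved compared with a replacement level pitcher over the same innings."""
    is_starter = proj_innings >= STARTER_INNINGS_CUTOFF
    repl_ra9 = reliever_repl_ra9 * (STARTER_BUMP if is_starter else 1.0)
    return (repl_ra9 - proj_ra9) * proj_innings / 9


# Made-up pitchers, not CHONE output.
print(round(runs_over_replacement(200, 4.20), 1))  # a starter
print(round(runs_over_replacement(70, 3.50), 1))   # a reliever
```

Ranking every pitcher by this number gives the kind of quick ordering described above.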

For hitters I've added weighted on base average for each hitter - a better measure than OPS. I'm guessing where each player will hit in their team's batting order and using that to project Runs and RBI. All non-starters have a default '8' for lineup position.
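For anyone unfamiliar with weighted on base average, a minimal sketch follows. The weights are the ones commonly quoted from The Book, with the reached-on-error term left out; whether the projections here use exactly these weights is an assumption, and the sample line is made up.

```python
# Commonly quoted linear weights for wOBA (assumed, not confirmed for CHONE).
WOBA_WEIGHTS = {
    "bb": 0.72,   # unintentional walks
    "hbp": 0.75,
    "1b": 0.90,
    "2b": 1.24,
    "3b": 1.56,
    "hr": 1.95,
}


def woba(line, plate_appearances):
    """Weighted on base average: weighted sum of events per plate appearance."""
    total = sum(WOBA_WEIGHTS[event] * line.get(event, 0) for event in WOBA_WEIGHTS)
    return total / plate_appearances


# Made-up hitter, 600 plate appearances.
sample = {"bb": 60, "hbp": 5, "1b": 100, "2b": 30, "3b": 3, "hr": 20}
print(f"{woba(sample, 600):.3f}")
```

Unlike OPS, each event is credited in proportion to its run value instead of counting on base and slugging equally.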

Download away here: CHONE

Friday, January 05, 2007

Newest Projections

I have probably spent 20 hours a week during the offseason doing things related to the projection system, especially the pitchers. I know that no matter what I do it's going to barely improve the projections overall. I may not have a chance to beat PECOTA, since I have not built a similarity score program or anything like that. It's an obsession, and at some point I need to call it quits, release some final numbers, and then just sit back and watch baseball for the year, and not worry about every little thing that might improve the projections until next offseason.

For the hitters I have:

A) Added a 4th year to the sample, though it is weighted much less than the other years.
B) Changed the weights on each year.
C) Added some modifications to the aging curve based on player type. This is still in the rudimentary stage.

Using the same sample of players with 500+ AB that I ran the projection test with, the R for CHONE improves from .677 to .703. That needs to be taken with a grain of salt, since I already knew the 2006 outcomes before making these changes, but I promise I didn't "best fit" to my little 100 player sample. I could have easily beaten all other projections in hindsight by putting in a line something like this:

IF playername.last rhymes with "Why" then HR > 40.

But I didn't do that. I looked at every player from 1982-2005 and used 4 years of data to predict the 5th year. Anyway, we'll see how it turns out.
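The general shape of a weighted multi-year projection like the one described above can be sketched as follows. The year weights, the league rate, and the regression amount are placeholders chosen for illustration, not the actual CHONE values, and the aging adjustment is left out.

```python
def project_rate(seasons, weights, league_rate, regression_pa):
    """Weighted rate projection with regression toward the league average.

    seasons: list of (successes, opportunities), most recent season first.
    weights: one weight per season, most recent first.
    regression_pa: opportunities of league-average play mixed in.
    """
    weighted_successes = sum(w * s for w, (s, _) in zip(weights, seasons))
    weighted_opps = sum(w * n for w, (_, n) in zip(weights, seasons))
    return ((weighted_successes + league_rate * regression_pa)
            / (weighted_opps + regression_pa))


# Hypothetical hitter: (home runs, plate appearances) for the last 4 years.
history = [(30, 600), (25, 580), (20, 550), (18, 500)]
weights = [5, 4, 3, 1]  # placeholder weights; the 4th year counts least

hr_rate = project_rate(history, weights, league_rate=0.03, regression_pa=1200)
print(f"projected HR per PA: {hr_rate:.4f}")
```

Testing a setup like this means running it over past seasons and correlating the projected rates with what actually happened, which is where an R figure comes from.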

I've spent a lot more time on the pitchers.

A) Cleaned up a lot of formulas. I made calculations more rigorous in places where I was taking shortcuts.
B) Added batted ball data to model BABIP and HR allowed. I'm sure I'm not the only one to do this, but I don't think every forecasting system does this yet. It really helps on the home runs. Both Derek Lowe and Javier Vazquez gave up a bunch of homers in 2005. Batted ball data tells you which one stays high and which one is a fluke. I don't know how this will work on BABIP, but it's worth a shot.
C) Improved MLEs for pitchers.
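One common way to use batted ball data for home runs, sketched here under assumptions, is to treat home runs as a roughly fixed share of fly balls, so a pitcher's expected total depends on how many fly balls he allows. The 11% rate and the two pitchers below are invented for illustration and are not CHONE's actual model.

```python
LEAGUE_HR_PER_FLY = 0.11  # hypothetical league-wide home runs per fly ball


def expected_homers(fly_balls, hr_per_fly=LEAGUE_HR_PER_FLY):
    """Home runs a pitcher would allow at the league rate per fly ball."""
    return fly_balls * hr_per_fly


# Two made-up pitchers who each allowed 30 homers last season.
groundballer = {"fly_balls": 150, "homers": 30}
flyballer = {"fly_balls": 270, "homers": 30}

for name, p in (("groundballer", groundballer), ("flyballer", flyballer)):
    exp = expected_homers(p["fly_balls"])
    print(f"{name}: allowed {p['homers']}, expected {exp:.1f}")
```

In this toy case the groundballer's 30 homers look like a fluke, while the flyballer's total is about what his fly ball count predicts.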

I use the system to project innings, strikeouts, walks, hit batters, and homers. I do not have a calculation for doubles and triples; I just assumed an average distribution for each pitcher between doubles, triples, and singles.

Now that I've added batted ball data, this had to change. Flyballs are turned into hits at a lower rate than grounders, and this is reflected in the player's hit total, but flyballs are also more likely to become doubles and triples. This has been fixed. What shocked me, though, is how small the effect is. Pitchers in general give up, not counting homers, 1.27 bases per hit. For the 10 starters with the highest groundball rate, it's only a bit lower, 1.25. For the biggest flyball pitchers, it's only a bit higher, 1.29.
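For reference, the bases per hit figure is just total bases over hits with homers left out. A minimal sketch with a made-up line (the counts below are illustrative, not from the projections):

```python
def bases_per_hit(singles, doubles, triples):
    """Average bases per hit, home runs excluded."""
    hits = singles + doubles + triples
    return (singles + 2 * doubles + 3 * triples) / hits


# Made-up line: 150 non-homer hits.
print(round(bases_per_hit(singles=113, doubles=34, triples=3), 2))  # 1.27
```

Moving only a couple of those singles to doubles or back is enough to shift the figure between 1.25 and 1.29, which shows how narrow that range is.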

I looked at the before and after ERAs for Jered Weaver and Chien-Ming Wang. Weaver balloons from 3.38 to 3.40. Wang drops to 4.09 from 4.16. It's not much, but I'm looking for every last bit to make these projections the best I can. Sometime soon I will finish CHONE v2.1.