Saturday, December 30, 2006

Good Casey, Bad Casey

My projection of Casey Kotchman for 2007 is .264/.333/.400 in 291 at bats. That really isn't good enough to play 1st base in the majors. Kotchman certainly has the potential to do better than that. I really have no idea how he'll do in 2007, but that projection is a midpoint for a player who has shown remarkable ability at times, but has also suffered through horrendous slumps and most recently missed an entire season to mono (well, he should have missed the entire season; it's not like he helped any when he was playing).

Good Casey hit a combined .370 between AA and AAA during the 2004 season, at age 21. In 2005, he finished the year with the big club as probably their second best hitter after Vladimir, hitting .278/.352/.484.

Bad Casey played in the majors in 2004 and hit only .224 with no power. Between his .368 and .372 minor league stints, he seemed like a deer in the headlights and followed the Gary DiSarcina approach to hitting: his only goal seemed to be not to strike out. Bad Casey returned when he went to AAA to start the 2005 season. He came on strong after a horrendous early season slump, but still hit only .289 in 363 at bats, and when you are in a hitter's park in a hitter's league, that isn't very good.

Combining minor league numbers (translated MLEs) for 2004 with major league numbers for 2005, I get good Casey. Combining minor league 2005 with major league 2004, I get bad Casey.

Good Casey hit, per 150 games, .292 with 36 doubles and 15 homers. He struck out 65 times with 38 walks. That's a fine young player, a Wally Joyner/Mark Grace/Sean Casey type. That's what everyone seems to think he should be. Comparisons to Rafael Palmeiro have been made in the past, but should be considered invalid unless Casey can get past MLB testing and use some illegal viagra or something.

Bad Casey, though, hits only .232 with 27 2b and 8 HR per 150 games, though his K/W ratio is remarkably similar at 63/41.

My projection, once again, is a midpoint. He's young enough to improve, and hopefully will finally be healthy. Here's hoping he's healthy enough to let good Casey play and we never see bad Casey again.
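To put a number on "midpoint", here is a minimal sketch that averages the two per-150-game lines quoted above. The plain 50/50 average is an assumption made for illustration; the actual projection (.264) comes out slightly different, so the real system presumably weights things in its own way. The `midpoint` helper is hypothetical.

```python
# Per-150-game lines quoted in the post for the two versions of Kotchman.
good_casey = {"avg": 0.292, "doubles": 36, "hr": 15, "so": 65, "bb": 38}
bad_casey = {"avg": 0.232, "doubles": 27, "hr": 8, "so": 63, "bb": 41}


def midpoint(line_a, line_b):
    """Average two stat lines stat by stat (a plain 50/50 blend, assumed here)."""
    return {stat: (line_a[stat] + line_b[stat]) / 2 for stat in line_a}


middle_casey = midpoint(good_casey, bad_casey)
for stat, value in middle_casey.items():
    # Batting average gets three decimals, counting stats get one.
    print(f"{stat}: {value:.3f}" if stat == "avg" else f"{stat}: {value:.1f}")
```

The simple average lands at .262 with 31.5 doubles and 11.5 homers per 150 games, right around the projection.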

Thursday, December 28, 2006

Prediction Results

Well before the 2006 season, I made some predictions of the final standings. If you want to check out the original posts, check my archives for around last January. How did I do? Not too bad really. Among the highlights:

-All 6 division winners were predicted to at least tie for their division leads. I predicted the Padres to tie the Dodgers, and they actually did, but were awarded the division on a technicality.

-The National League East order was predicted exactly.

-I predicted the Cardinals to be the best team in baseball, and they wound up winning the World Series.

Of course that's only part of the story. I had the Cards winning 93 games, but they only won 83, not playing like the team I expected until the playoffs. I missed the Tigers, having them as a .500 team. I also predicted the Pirates to be .500 and the Cubs to win 86. While I correctly listed the Mets, Phillies, and Braves in that order, I took a stand against the numbers and declared the Braves would win the division anyway. I didn't care if my system had the Mets at 91 and Braves at 88; after 15 years I expected the Braves to somehow do it anyway.

I am most proud of my predictions for these teams:

Nationals - 71 wins right on the nose.

Astros - I was off by 6 wins, but I made my prediction on the assumption Clemens would not pitch. Had he not, I probably would have nailed it. I thought the 2005 NL Champions were a team of has-beens, and they were.

Twins - I felt like a moron for picking the Twins early on, when they fell behind, but they came back and did me proud.

I am least proud of this one:

Angels - When I pick them to finish out of the playoffs, I don't like being right.

Here is a prediction review from Diamond Mind

Using that method to score standings, mine came out at 45, among the top 20% or so. Diamond Mind did beat me in standard error of wins, though, 60 to 67. It's a better measure than using standings, but impossible to use since many people just predict order of finish and don't give you wins and losses. With standings, the Cards finish 1st as predicted, so zero is added to my score. With standard error, I get docked for being 10 wins off their final record.
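The difference between the two ways of scoring can be sketched in a few lines. This is only an illustration under assumptions: the standings score is taken to be the sum of absolute differences in division finish, and "standard error" is read as a root mean squared error of wins. Neither is confirmed as Diamond Mind's exact method, and both function names are hypothetical. The win totals are the ones quoted above; the Nationals' last-place finish is filled in from the 2006 standings.

```python
import math


def rank_score(predicted_ranks, actual_ranks):
    """Sum of absolute differences in division finish (assumed scoring rule)."""
    return sum(abs(predicted_ranks[t] - actual_ranks[t]) for t in predicted_ranks)


def win_rmse(predicted_wins, actual_wins):
    """Root mean squared error of predicted wins (one reading of 'standard error')."""
    squared = [(predicted_wins[t] - actual_wins[t]) ** 2 for t in predicted_wins]
    return math.sqrt(sum(squared) / len(squared))


# Two teams only, to keep the example small.
predicted_ranks = {"Cardinals": 1, "Nationals": 5}
actual_ranks = {"Cardinals": 1, "Nationals": 5}
predicted_wins = {"Cardinals": 93, "Nationals": 71}
actual_wins = {"Cardinals": 83, "Nationals": 71}

print("standings score:", rank_score(predicted_ranks, actual_ranks))
print("win error:", round(win_rmse(predicted_wins, actual_wins), 2))
```

The standings score is zero for both teams, while the win error still picks up the 10-game miss on the Cardinals.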

It's almost January, so pretty soon it will be time to start predicting the 2007 season.

Sunday, December 17, 2006

The Limits of a Projections System

How accurate can a projection system be?

Tango Tiger asked this question earlier this year, and his answer was 0.73.

Pecota actually beat this for the hitters in my sample this year (.736), although that could be explained by random variation. We know that no projection, no matter how sophisticated, would be able to consistently beat a person or divine entity who knew, exactly, what a player's true ability was.

How can we go beyond the theoretical approach and test how accurate "perfect" projections can be? You'd have to be a Baseball God. It just so happens that to the players of the APBA Major and Superior Leagues, I am the Creator. In 1982, I said, "Let there be baseball." And there was baseball. And I saw it was good.

25 years later, my league has 30 teams. I tried the same test for my league that I put all the other projection systems through, minimum of 500 at bats, correlation of OPS. The result was r = .770.

I've only looked at one year, but if that holds up, that is our limit.
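The idea that even perfect knowledge has a ceiling can be illustrated with a small simulation. This is a sketch under loud assumptions, not the APBA test itself: each simulated player gets a known true on-base rate (used as a stand-in for OPS to keep the math simple), a season is a series of independent plate appearances, and the spread of talent (mean .330, standard deviation .030) is made up for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def perfect_projection_r(n_players, plate_appearances, rng):
    """Correlate known true rates with one simulated season of results."""
    # Assumed talent distribution; not taken from any real league.
    true_rates = [rng.gauss(0.330, 0.030) for _ in range(n_players)]
    observed = []
    for p in true_rates:
        # Each plate appearance is an independent success with probability p.
        successes = sum(1 for _ in range(plate_appearances) if rng.random() < p)
        observed.append(successes / plate_appearances)
    return pearson_r(true_rates, observed)


rng = random.Random(2006)  # fixed seed so the run is repeatable
r_full_season = perfect_projection_r(300, 600, rng)
r_part_season = perfect_projection_r(300, 150, rng)
print(f"600 PA: r = {r_full_season:.3f}")
print(f"150 PA: r = {r_part_season:.3f}")
```

Even with the true rates in hand, the correlation stays below 1 because a single season is a noisy sample, and it drops further as playing time shrinks.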

Batted ball charts

I feel the need to give a plug here for a completely free application that has provided me with a lot of fun in the past few days. Dan Fox, Baseball Prospectus writer and blogger, has created a chart that will give you a breakdown of batted balls (pops, liners, grounders, and flies) for every hitter and pitcher for 2003-2006, broken down by what side of the plate the batter hits from. If that's not enough, you can see these further broken down by what part of the field the ball is hit to. You can find the program here: Charts Program

Of course charts aren't enough for most statheads. What good is data if you can't bring it into a spreadsheet or database? The data behind the charts is easily accessible enough for that, and I've created a junk stat from that.

Line drive to popup ratio, minimum 250 balls in play, 2003 to 2006:

1. Larry Bigbie ?! 15.4
2. Bob Abreu 10.9
3. Cory Sullivan 10.6
4. Derek Jeter 10.6
5. Joe Mauer 10.2

Other notable players doing very well here are Michael Young, Julio Franco, Lyle Overbay, and Ryan Howard. Down at the bottom are Jose Bautista, Henry Blanco, Marcus Thames, Rod Barajas, and Jonny Gomes.

If it measures anything, this measures the ability to center the baseball, as a line drive is what good hitters are looking to hit, and a popup is the most obvious failure given you actually made contact with the ball. But as I said, it's a junk stat. I'm not claiming it's predictive of anything.
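For anyone who wants to build the same junk stat from their own export of the data, a minimal sketch follows. The hitters and counts are made up, the dictionary layout is an assumption about how the data might be stored, and "balls in play" is taken here to be the sum of the four batted ball types.

```python
def ld_to_popup_leaders(batters, min_balls_in_play=250):
    """Rank hitters by liners per popup, skipping small samples."""
    rows = []
    for name, counts in batters.items():
        balls_in_play = sum(counts.values())
        # Skip small samples and avoid dividing by zero popups.
        if balls_in_play < min_balls_in_play or counts["popups"] == 0:
            continue
        rows.append((name, counts["liners"] / counts["popups"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical batted ball counts, not real players.
batters = {
    "Hitter A": {"liners": 110, "popups": 10, "grounders": 200, "flies": 130},
    "Hitter B": {"liners": 80, "popups": 40, "grounders": 150, "flies": 180},
    "Hitter C": {"liners": 30, "popups": 2, "grounders": 60, "flies": 40},
}

leaders = ld_to_popup_leaders(batters)
for name, ratio in leaders:
    print(f"{name}: {ratio:.1f}")
```

Hitter C has the best raw ratio but only 132 balls in play, so the 250 minimum keeps him off the list.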

Ryan Howard is particularly impressive, as he both strikes out a lot and hits a ton of homers. I'd think more of his failures would be popups, like Gomes and Thames, but he's been able to avoid that so far. If we had these stats going back to the 80's, Wade Boggs would be off the charts. He rarely popped the ball up to the infield.

Tuesday, December 12, 2006

Myth Busting

I have read something to the effect that minor league pitchers who allow a very high batting average on balls in play should raise a big red warning sign. The theory is something like this: Unlike a high BABIP from an established major leaguer, which you might assume to be a product of bad luck, it means the pitcher does not have the stuff to translate into the big leagues. So even if he has a good strikeout rate, he will suffer a worse translation to his peripherals (HR, BB, SO) than a tough to hit prospect pitcher.

I don't remember who said this first, where I heard it, or if anything was published on it or not, but I thought I'd check it out.

I looked at all AAA pitchers in the last 5 years who also pitched in the Major Leagues, using the matched innings method. The minor league pitching stats were regressed to the league average. My sample size is pretty good here, 31,945 innings. The AAA to majors factors for 2002 to 2006 have been:

BB 1.28
SO 0.83
HR 1.40
BABIP 1.01

The ratios are walks/batters faced, strikeouts/(batters faced - walks), and HR/fair contact (BFP-K-BB).
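As a sketch of how these factors might be applied, the code below computes the rates for a made-up AAA line and multiplies each by its factor. The BABIP formula (non-homer hits over fair contact minus homers) is an assumption, since the post does not spell it out, and the matched innings weighting used to derive the factors is not reproduced here. The function names and the sample line are hypothetical.

```python
# AAA-to-majors factors for 2002 to 2006, as listed in the post.
AAA_TO_MLB = {"BB": 1.28, "SO": 0.83, "HR": 1.40, "BABIP": 1.01}


def pitcher_rates(bfp, bb, so, hr, hits):
    """Rates with the denominators described in the post."""
    fair_contact = bfp - so - bb
    return {
        "BB": bb / bfp,                  # walks per batter faced
        "SO": so / (bfp - bb),           # strikeouts per non-walk batter
        "HR": hr / fair_contact,         # homers per fair contact
        # Assumed BABIP definition; ignores HBP and sacrifices.
        "BABIP": (hits - hr) / (fair_contact - hr),
    }


def translate(aaa_rates, factors=AAA_TO_MLB):
    """Multiply each AAA rate by its factor to get a major league equivalent."""
    return {stat: rate * factors[stat] for stat, rate in aaa_rates.items()}


# Made-up AAA season for illustration.
aaa = pitcher_rates(bfp=600, bb=50, so=110, hr=12, hits=140)
mlb_equivalent = translate(aaa)
for stat in aaa:
    print(f"{stat}: AAA {aaa[stat]:.3f} -> MLB {mlb_equivalent[stat]:.3f}")
```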

Now, I look at only those pitchers who had a high (.325 or above) BABIP. My sample size is 7803 matched innings. My translation rates are:

BB 1.24
SO 0.80
HR 1.35
BABIP 0.942

Not a lot different. The peripherals are all close enough that we can assume it's just sampling error. The hittable pitchers do a little worse in strikeouts than pitchers overall, but actually do better in walks and homers. The BABIP is actually lower in the majors, since we have selectively sampled pitchers with high minor league BABIP. I think it's safe to say that they were a bit hit-unlucky in the minors.

In conclusion, it's a myth that a high BABIP in the minors dooms a prospect. It's always good to not allow hits (duh, that's why no-hitters are so cool), but if you've got two prospects, and both are equally excellent in walks, homers, and strikeouts, and one has a higher BABIP, take the one who gives up fewer hits. But don't give up on the other guy so quick. He can probably pitch too.

This is not what I expected. I thought there might be an effect, and was looking to measure the effect, redo the CHONEs, and knock Jason Windsor down a peg. He's a classic high hit, good peripherals pitcher in the Oakland system. But there's nothing here. His numbers are every bit as likely to translate into decent major league pitching as most good AAA pitchers. Oakland isn't going to fall apart without Zito. The Angels are going to have to step it up a notch to reclaim the West.

Monday, December 11, 2006

Pecota

I finally tested Pecota vs the others. For hitters with minimum of 500 AB, Pecota had an r = .736. Very impressive, beating Ron Shandler's second place of .702. Their method of finding comparable players might just be on to something.

For pitchers, they were at .451, better than several others, but in second place after ZIPS.

Sunday, December 10, 2006

Pitcher Projections

As a rule, they suck. Much, much harder than projecting hitters.

For pitchers with at least 100 innings last year, Chone came in at .424, way lower than the near .700 range for hitters. This represents the correlation coefficient between projected ERA and actual ERA. The other projection systems didn't do much better:

.459 ZIPs
.445 Baseball Info Solutions (Bill James has nothing to do with this although it's in his Handbook; he claims it can't be done)
.442 Marcel
.423 Shandler

Still waiting on the return of my 2006 Prospectus, so I can check PECOTA.

How about scouting info? I tried putting together some projections based solely on a regression formula from scouting reports, with things like average fastball velocity, types of other pitches, quality of pitches, etc. Didn't do so hot:

.265. Of course, I'm no great scout, and a real scout attempting this with better scouting data might be able to do better.

The projections are better than using prior year ERA (.29) or prior year Fielding Independent Pitching (.37), though not much better than the latter.
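For reference, Fielding Independent Pitching in its commonly cited form can be computed as below. This is a sketch, not necessarily the exact version used for the .37 figure: the league constant of 3.20 is an assumption (it is normally set each year so that league FIP matches league ERA), and the sample line is made up.

```python
def fip(hr, bb, so, ip, hbp=0, constant=3.20):
    """Fielding Independent Pitching from homers, walks, hit batters, strikeouts."""
    # The constant is an assumed placeholder for the league adjustment.
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Made-up season: 200 innings, 20 homers, 50 walks, 150 strikeouts.
sample_fip = fip(hr=20, bb=50, so=150, ip=200)
print(f"FIP: {sample_fip:.2f}")
```

Because it ignores hits on balls in play, last year's FIP tends to be a steadier guide than last year's ERA, which fits the gap between .37 and .29 above.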

I've heard some people claim that you shouldn't test a pitcher projection by looking at ERA anyway, you should look at component ERA or something. I don't buy that. I need ERA (or at least run average). Whether for fantasy baseball, where you need ERA to get results, or real baseball, where you need to keep real runs off the scoreboard, I don't give a damn if you can accurately predict component ERA. If you can't predict the real thing, then your system is useless.

Pitcher projections are a very inexact science. The only good thing I can say is that as bad as they are, it's better to have some flawed, not especially accurate idea of how a pitcher will do than to have no idea at all.

Tuesday, December 05, 2006

Projections 2.0

Revised 2007 projections. I've beefed up the formulas and if I'm lucky these might be 0.1% more accurate than the last run. I've left version 1.0 in the spreadsheet for comparisons. The minor leaguers have not been updated yet.

Runs and RBI have been added, as I've added players to the team they'll be playing for and where they should bat in the order for 2007. Default batting position is 8, that's what I have for all the backups.

CHONE


Next I'll work on pitchers.

Update 12-10-06: Pitchers added. Minor leaguers mixed in with the others, only the most current projections are here now.

Sunday, December 03, 2006

More Projection stuff

I've added in Marcel, Bill James, and Ron Shandler's projection OPS to the mix. I'm only looking at 114 players who had 500 or more AB, and I had to eliminate a few (Dan Uggla and Hanley Ramirez among others) because not all systems projected minor leaguers. Kenji Johjima had to go, as only ZIPs and CHONE even tried to project him (very well I might add). I don't have access to Baseball Prospectus 2006, so I'll look at Pecota in a week or so.

Here are the results:

Shandler .702
James .685
ZIPs .684
Chone .677
Marcel .664

So Chone is a little behind the more established systems, but at least it passes the monkey test. As I've read before, there's not a whole lot of room to improve. Shandler, the king of fantasy baseball, beats a first-year amateur by .025 in correlation.

But there's something real interesting here. These algorithms all seem to have different workings. They like different players better than others. For example, Shandler had Paul Konerko at .921 - and Konerko did better than that, totally destroying the other projections, in the .850 range. But Shandler missed the boat on Grady Sizemore, projecting only .774. Sizemore beat all of our projections, but the other systems at least had him in the .800's.

Take an average of the top 4 projection systems, and we can get an r of .714! (Adding Marcel subtracts from the combined r, all the others add). So the strategy should be to collect all the projection systems that pass the monkey test, combine them, and we have a projection that reaches uncharted waters in accuracy!
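Here is a minimal sketch of why averaging can help. The four players and both "systems" are made up, and the data is contrived so that the two systems miss in opposite directions and their errors cancel exactly. Real systems will not cancel this cleanly, so treat this as an illustration of the mechanism, not a replication of the .714 result.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_projections(*systems):
    """Average several systems' projections, player by player."""
    return [sum(values) / len(values) for values in zip(*systems)]


# Hypothetical OPS figures for four players.
actual_ops = [0.700, 0.800, 0.900, 1.000]
system_a = [0.750, 0.750, 0.850, 1.050]  # misses high, low, low, high
system_b = [0.650, 0.850, 0.950, 0.950]  # misses the opposite way each time

combined = average_projections(system_a, system_b)
r_a = pearson_r(system_a, actual_ops)
r_b = pearson_r(system_b, actual_ops)
r_combined = pearson_r(combined, actual_ops)
print(f"A: {r_a:.3f}  B: {r_b:.3f}  average: {r_combined:.3f}")
```

When systems have different workings, their misses are less likely to line up, which is the property the average takes advantage of.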

Give this a try, and dominate your fantasy league.

Projection system championships

How does the CHONE system compare to other systems?

I tried looking at 2006 CHONE projections, compared to ZIPS, published on Baseball Think Factory by Dan Szymborski. I've only looked at hitters, and the typical method is to compare correlation for some summary level stat, such as OPS, EQA, or RC/g. In this case I'm using OPS.

Other systems worth checking into are Marcel the monkey (no relation), PECOTA, Bill James Handbook, and Ron Shandler's Baseball Forecaster. In addition, seems like everyone who publishes a fantasy baseball guide has a system. Each system should beat Marcel, the simplest. If you've got a statistical process that can't spank the Marcel monkey, then all your formulas and algorithms amount to nothing more than mathematical masturbation. I think Marcel's creator, Tango Tiger, has said that the best correlation a projection system can hope for is 70%, Marcel gets you 60%, and the advanced systems are around 65%. Or it's 75/70/65, or something.

But I'm not sure what this really means, as you have to put a playing time cutoff somewhere. The higher your cutoff, the better your correlation should be.

So I looked at ZIPs and Chone to see how we did in the 2006 season.

Using 300 AB as a cutoff: ZIPS = .617, Chone = .615
400 AB: ZIPs = .648, Chone = .635
500 AB: ZIPs = .656, Chone = .661

Really close, but ZIPs gets the edge. I'll have to look at whether either of us can pass the monkey test. I should be able to find that on the internet. In addition, I can look at Shandler and James, though probably only for the 500 AB cutoff, as I have those only in print and it involves quite a bit of data entry.
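The cutoff comparison above can be sketched as a small function that filters hitters by at bats and then correlates projected OPS with actual OPS. The five hitters are made up and deliberately contrived so that the part-timers are the ones projected badly. Real data need not behave this neatly, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_at_cutoff(players, min_ab):
    """Return (sample size, r) for hitters with at least min_ab at bats."""
    kept = [p for p in players if p[1] >= min_ab]
    projected = [p[2] for p in kept]
    actual = [p[3] for p in kept]
    return len(kept), pearson_r(projected, actual)


# Hypothetical hitters: (name, at bats, projected OPS, actual OPS).
players = [
    ("A", 550, 0.800, 0.820),
    ("B", 520, 0.700, 0.720),
    ("C", 510, 0.900, 0.920),
    ("D", 410, 0.850, 0.700),
    ("E", 320, 0.750, 0.600),
]

for cutoff in (300, 400, 500):
    n, r = correlation_at_cutoff(players, cutoff)
    print(f"{cutoff} AB: n = {n}, r = {r:.3f}")
```

Raising the cutoff shrinks the sample but drops the hitters whose seasons were smaller and noisier, which is why the correlation tends to climb with it.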