Wednesday, November 13, 2013

Quantifying the Unquantifiable - Sabermetric Phish

There is an ongoing debate among baseball analysts and sportswriters over statistics. More precisely, the debate is over which statistics should be used to establish a player's value. For decades, the prevailing wisdom was that Batting Average, Home Runs, and Runs Batted In (for a hitter, at least) were the most important statistics for determining who the best players were and who deserved the largest contracts. Other counting statistics like Stolen Bases, Hits, and Walks, although important, were given less weight. Slowly, however, Bill James (among others) developed a different way to quantify a player's excellence and value. These sabermetricians (the term comes from SABR, the Society for American Baseball Research) found new ways of looking at existing data and combined the resulting statistics to view and compare players more holistically. They determined that RBIs depended heavily on teammates' ability to get on base and thus were not directly in a player's control. It followed that the most important thing a hitter can do is get on base (and therefore not make an out). So why shouldn't a walk count essentially the same as a hit? Batting average gave way to OBP (on-base percentage) as the gauge of choice. This is a great simplification of the actual practice, but the message to the new breed of analysts was clear: if all of this data is available, why not use it to do something meaningful?
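
To make the shift from batting average to OBP concrete, here is a minimal Python sketch using the standard definitions (BA = H / AB; OBP = (H + BB + HBP) / (AB + BB + HBP + SF)). The two hitters and their stat lines are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    """A hitter's season totals (example values below are made up)."""
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0


def batting_average(s: StatLine) -> float:
    # BA counts only hits per official at-bat; walks are ignored entirely.
    return s.hits / s.at_bats


def on_base_percentage(s: StatLine) -> float:
    # OBP credits every way of reaching base, not just hits.
    times_on_base = s.hits + s.walks + s.hit_by_pitch
    opportunities = s.at_bats + s.walks + s.hit_by_pitch + s.sacrifice_flies
    return times_on_base / opportunities


# Two hypothetical hitters with identical batting averages.
free_swinger = StatLine(at_bats=500, hits=140, walks=20)
patient_hitter = StatLine(at_bats=500, hits=140, walks=90)

for name, line in [("free swinger", free_swinger), ("patient hitter", patient_hitter)]:
    print(f"{name}: BA {batting_average(line):.3f}, OBP {on_base_percentage(line):.3f}")
```

Both hitters bat .280, but the patient hitter reaches base far more often (about .390 vs .308 OBP), which is exactly the difference batting average hides.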

So as time progressed, some baseball writers changed their measuring stick to provide a more complete viewpoint. One statistic that tries to incorporate every facet of a player's game is Wins Above Replacement (WAR). WAR quantifies the additional wins a team will have with the player in the lineup compared to a theoretical "replacement player": a catch-all term for a readily available player in the team's farm system. WAR summarizes a player's hitting, defense, base-running, slugging, and ability to get on base, among other basic stats (there is a pitcher's WAR as well, which uses different basic stats, but the concept is the same). And this is where the debate starts.

Some people like WAR and its ability to provide a quantifiable measure, but other sportswriters and fans think it takes the eye test out of the equation. In 2012, the American League MVP was clearly going to be either Miguel Cabrera or Mike Trout. To summarize: Cabrera won the first Triple Crown since 1967 (leading the league in HR, RBI, and BA), while Trout led in WAR by a wide margin, 10.7 to 6.9 (thanks to a massive edge in defense and base running). Cabrera also led his Detroit Tigers to the postseason, while Trout, a rookie, and his Angels did not make the playoffs. Quantifiably, Trout had the much better season (in fact, one of the best WAR seasons ever and by far the greatest for a 21-year-old), but Cabrera had the more traditionally great season and won the MVP award in a landslide.

And that (long-winded intro) brings us to Phish.

For a while I've been intrigued by the idea of determining the value of a given Phish concert. The question was: are there any quantifiable traits that make show "A" intrinsically better than show "B"? Now, before I get in too deep: even the worst Phish concert is better than most other things. The impetus for this exercise came from a series of concerts I saw in 2009. I was able to attend the 08-05, 08-13, and 11-28 shows that year. Using the Phish.net ratings, the 11-28 show is the highest rated of the year (4.48 on a scale of 5), and the 08-05 show is one of the lowest, with a rating of 2.93. I loved both of these shows! But clearly the public was a bit more split. So in the interest of nerd-dom everywhere, I decided to try to build a stat that would look at shows quantitatively.

I began by trying to determine which traits of a concert Phish fans value. This is not a comprehensive list (and not everyone may agree with it), but I came up with Song Length, Show Gap, Debuts, Segues, and Encore Length. After applying a weighting factor to each trait and combining them, I ended up with a range from 1.46 for the lowest show to 3.71 for the highest. I looked at all shows from 2009, and using the equation I developed, the average, or "replacement", show scored 2.01. This number assumes that every song played was exactly the average length for calendar year 2009 (i.e., if the average length of Golgi during 2009 was 4:45, the version of Golgi played during the show in question was 4:45 as well), an average show gap of 16, 0.6 debuts per show, an average number of segues, and an encore length of about 11 minutes. Shows with a VORS (Value Over Replacement Show) greater than 2.13 can be considered above average by this metric. The results are shown here:


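The post doesn't publish its equation, so the following is only a minimal Python sketch of one way a weighted score like VORS could be built from the five traits, assuming each trait is divided by its 2009 average and then multiplied by a weight. The weights, the `segues_per_show` baseline, and the names `vors_sketch`, `ShowFactors`, and `YearBaseline` are all hypothetical; only the 16-show gap, 0.6 debuts per show, and 11-minute encore averages come from the text above.

```python
from dataclasses import dataclass

# HYPOTHETICAL weights: the post does not publish its weighting factors,
# so these placeholders do NOT reproduce the 2009 scores quoted above.
WEIGHTS = {
    "song_length": 0.40,
    "show_gap": 0.20,
    "debuts": 0.15,
    "segues": 0.15,
    "encore_length": 0.10,
}


@dataclass
class YearBaseline:
    """Per-show averages for one calendar year (the "replacement" show)."""
    avg_show_gap: float
    debuts_per_show: float
    segues_per_show: float
    encore_minutes: float


@dataclass
class ShowFactors:
    """The five traits for a single show."""
    song_length_ratio: float  # mean of (version length / that song's yearly average)
    avg_show_gap: float       # mean number of shows since each song was last played
    debuts: float
    segues: float
    encore_minutes: float


def vors_sketch(show: ShowFactors, base: YearBaseline, weights=WEIGHTS) -> float:
    """Weighted sum of each trait relative to the yearly average.

    A show that is exactly average on every trait scores sum(weights).
    """
    relative = {
        "song_length": show.song_length_ratio,  # already relative to the average
        "show_gap": show.avg_show_gap / base.avg_show_gap,
        "debuts": show.debuts / base.debuts_per_show,
        "segues": show.segues / base.segues_per_show,
        "encore_length": show.encore_minutes / base.encore_minutes,
    }
    return sum(weights[name] * relative[name] for name in weights)


# Gap, debut, and encore averages come from the post; segues_per_show = 4.0
# is a made-up placeholder because the post does not state the 2009 figure.
baseline_2009 = YearBaseline(avg_show_gap=16, debuts_per_show=0.6,
                             segues_per_show=4.0, encore_minutes=11)
replacement = ShowFactors(song_length_ratio=1.0, avg_show_gap=16, debuts=0.6,
                          segues=4.0, encore_minutes=11)
long_encore = ShowFactors(song_length_ratio=1.0, avg_show_gap=16, debuts=0.6,
                          segues=4.0, encore_minutes=22)

print(f"replacement show: {vors_sketch(replacement, baseline_2009):.2f}")  # 1.00
print(f"double-length encore: {vors_sketch(long_encore, baseline_2009):.2f}")  # 1.10
```

Because every trait is expressed relative to the yearly average, a perfectly average show lands at the sum of the weights (1.00 with these placeholders, versus 2.01 for the real metric), and doubling the encore length adds exactly that trait's weight. That behavior is one way a single extremely long encore can push a show up the rankings.
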
To better illustrate the differences between the Phish.net ratings and the quantitative look provided by VORS, a simple chart shows the correlation between the two:




In general, the R-squared value shows that the two ratings are correlated, which makes sense. There should be a correlation between the shows traditionally rated as great and a statistic that tries to gauge intrinsic greatness without listening to any of the songs. There are some outliers, but with further refinement of VORS, the alignment should become tighter. The two shows rated highest by VORS were the Halloween show and November 1. Those scores are skewed slightly by the Exile set debuts and by the bust-outs during the acoustic set on 11/1. The only other shows that appear as slight outliers (on the high-value side) are March 8, June 14, and August 2. According to the metric, they derived most of their value from, respectively, an extremely long encore; the collaborations with Bruce Springsteen at Bonnaroo; and slightly longer-than-average songs plus a long encore. However, their Phish.net ratings aren't quite as high, for a myriad of possible reasons (probably because reviewers place less importance on encore length when rating a show).
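
The chart's R-squared value isn't reproduced in the text, but the calculation behind it is standard. Here is a minimal Python sketch, assuming two parallel lists (VORS scores and Phish.net ratings): it computes R² as the squared Pearson correlation and flags shows whose Phish.net rating sits far from a least-squares line, which is one simple way to pick out outliers like the ones above. The function names, the threshold, and the sample numbers are hypothetical placeholders, not the 2009 data.

```python
from statistics import fmean


def _sums(xs, ys):
    """Means plus centered cross- and self-products used by both functions below."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def r_squared(xs, ys):
    """Square of the Pearson correlation (equal to R^2 of a least-squares line)."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy * sxy / (sxx * syy)


def flag_outliers(labels, xs, ys, threshold):
    """Labels whose y sits more than `threshold` away from the fitted line."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [label for label, x, y in zip(labels, xs, ys)
            if abs(y - (intercept + slope * x)) > threshold]


# Placeholder values only, NOT the 2009 ratings from the chart.
shows = ["show A", "show B", "show C", "show D"]
vors = [1.0, 2.0, 3.0, 4.0]
net_rating = [1.0, 3.0, 2.0, 4.0]

print(f"R^2 = {r_squared(vors, net_rating):.2f}")  # 0.64
print(flag_outliers(shows, vors, net_rating, threshold=0.5))  # ['show B', 'show C']
```

The threshold is a judgment call; a cutoff based on the spread of the residuals (say, one or two standard deviations) would be more principled than a fixed number.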

The maligned August 5 show mentioned above had an above-average VORS of 2.15, based on a 20-minute DWD and the bust-out of Oh! Sweet Nuthin' (seriously, the second set is awesome)!

Conversely, Phish.net rates 8/7 and 12/30 significantly higher than VORS would predict. Having listened to both, I can say they are very well performed and highly recommended, but they illustrate that something is missing from the quantitative statistic.

One thing I wish aggregate sites required for show reviews is an indication of whether the reviewer attended the show in person or only listened to it. This would help account for subjective bias, because once a concert is over, it is rare to be able to experience it again, visually, from start to finish. Plus, given the importance of the Lot Scene and the general camaraderie of being at the concert itself, I think the bias cannot be fully avoided.

I think there is valuable information to be gleaned from this exercise. As fans, we know that time is sometimes short when choosing a show to listen to, and that reviews on websites can (and honestly should) easily skew toward the subjective. It's a cliché for a reason: my favorite concert I've seen was the last one I attended. Hopefully, as I continue this analysis, refine the metric (probably incorporating more objective measures and lessening the weight on encores), and apply it to other years, more hidden and overlooked gems can be found.