Measuring and Predicting Value

 

Big Theory Vocabulary Word: Value

            In political/moral economy, value is the idea that to an individual or society, something is worth more (or perhaps less) than its price.

        This was at the core of Marx’s theory: he separated use value (the utility of an item) from “value” (which Marx defined as socially necessary labor time, or how much labor is required to make the item), as well as from exchange value (how much labor time the item can fetch on the market; basically, price denominated in time instead of money).

       Oddly, aligned against Marx are postmodernists like Jean Baudrillard who argue that there is no “essence” of use value that exists without circulation, as well as market fundamentalists who believe price is the only efficient measure of value.

        Value is a word in the vernacular too, of course – people enjoy paying less for something than it is worth to them (or less than the normal “price”), or getting the most for their money. Also, some things are considered to have so much value that they are “priceless.”

 

 

Positivism: Measuring and Forecasting

           Remember: Positivism is the stance that theory comes after collecting the data, and that the data should be as objective as possible

           The easiest way to compare data  “objectively” is to assign them numeric value

       Size, weight, speed – things that can be easily counted are the most common measurements developed (often whether they are significant or not).

       Beyond that, subjectivity often begins to creep in (purity, color, etc…) both in how the data is tallied and how the measure is codified

      Grades are a good example of this: you can develop “universal” assessments like multiple choice, but you have to weight them against other assessments and recognize that some people are just better at writing their own answers, that some speak English as a second language, etc…

    But ultimately, grades are a way of ranking the skill of students, in terms of grade point average (certain levels of which are required to graduate, to be cum laude, magna cum laude, summa cum laude, etc…)
    Also, the thresholds from A to B to C to D to F are arbitrary

       Good positivism should question the numbers themselves, and work to develop more significant (and predictive) measures

 

 

 

With that in mind…

            No sport is so associated with the keeping of numbers as is baseball

        Thus, if one were to apply positivism to sports, the most obvious sport to start with is baseball.

            Note: I will throw out a lot of names of baseball statistics – unless they are on the study guide, you don’t have to know them.

 

 

Henry Chadwick – Inventor of Baseball Statistics

            Born in the UK in 1824 (half-brother was a famous advocate for urban sanitation; father was a writer)

        He moved to the US as a teenager; as a young adult, he started playing various bat-and-ball games

       He became one of the first reporters assigned to cover baseball

            Besides making rules and writing several important books about baseball, he is best known as a statistical innovator

        He is the inventor of the box score, a numerical/abstracted representation of what happened in a game

        Besides advocating for the counting of hits, strikeouts, and player runs scored, he also popularized two of the basic rate statistics, which normalize performance regardless of the number/length of appearances in games and thus allow comparison of players.

       ERA – Earned Run Average (Earned runs allowed*9/innings pitched)

       Batting Average – Hits/At Bats (walks and sacrifices do not count as at bats)
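
A minimal sketch of why rate statistics allow comparison across different workloads. The player lines below are hypothetical, and the formulas are the standard definitions of ERA (earned runs per nine innings) and batting average (hits per at bat):

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at bat (walks and sacrifices are not at bats)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


# Hypothetical players with very different workloads.
starter_era = earned_run_average(earned_runs=70, innings_pitched=210.0)
reliever_era = earned_run_average(earned_runs=20, innings_pitched=60.0)
regular_avg = batting_average(hits=150, at_bats=500)
bench_avg = batting_average(hits=30, at_bats=100)

print(f"ERA: starter {starter_era:.2f}, reliever {reliever_era:.2f}")
print(f"AVG: regular {regular_avg:.3f}, bench player {bench_avg:.3f}")
```

The raw counts differ by a factor of three or more, but the rates come out identical, which is the point of normalizing.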

 

Box Score

Other early statistics

            Walk (4 balls not in the strike zone, also not swung at) – Chadwick believed a walk should be counted as a pitcher error, not believing it was a batter skill.

        It is both a pitcher and a batter skill

            RBI (Runs Batted In) – Runs scored by players were counted since beginning of baseball stats; little credit was being given to the person who hit the runner in.

        But of course, to get RBIs, you had to bat in the right part of the lineup on a team with other good hitters.

      So once people decided RBIs were important, it devalued players with unfavorable lineup positioning.  

 

Other early statistics (cont.)

           Save – Invented by a Sporting News writer in 1960, as a way to try to recognize strong performance by relief pitchers (because they rarely won games and pitched fewer innings)

       But it is a very specific statistic, defined as (from Wikipedia):

1. He is the finishing pitcher in a game won by his team

2. He is not the winning  pitcher

3. He is credited with at least ⅓ of an inning pitched;

4. He satisfies one of the following conditions:

    He enters the game with a lead of no more than three runs and pitches for at least one inning
    He enters the game, regardless of the count, with the potential tying run either on base, at bat or on deck
    He pitches for at least three innings.
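
The four conditions above can be sketched as a small function, which makes it easy to see how narrow the statistic is. The `ReliefAppearance` record and its field names are hypothetical, invented for this illustration; innings are tracked as outs (3 outs = 1 inning) to avoid fractions:

```python
from dataclasses import dataclass


@dataclass
class ReliefAppearance:
    finished_game: bool        # he was his team's last pitcher
    team_won: bool
    is_winning_pitcher: bool
    outs_recorded: int         # 3 outs = 1 inning
    lead_on_entry: int         # runs his team led by when he entered
    tying_run_on_base_at_bat_or_on_deck: bool


def earns_save(app: ReliefAppearance) -> bool:
    # Conditions 1-3 must all hold.
    if not (app.finished_game and app.team_won):
        return False
    if app.is_winning_pitcher:
        return False
    if app.outs_recorded < 1:  # at least 1/3 of an inning
        return False
    # Condition 4: any one of the three situations is enough.
    small_lead = 1 <= app.lead_on_entry <= 3 and app.outs_recorded >= 3
    tying_run_close = app.tying_run_on_base_at_bat_or_on_deck
    long_outing = app.outs_recorded >= 9
    return small_lead or tying_run_close or long_outing


# Classic closer: pitches the ninth inning with a three-run lead.
ninth_inning = ReliefAppearance(True, True, False, 3, 3, False)
# Same inning with a four-run lead and nobody threatening: no save.
four_run_lead = ReliefAppearance(True, True, False, 3, 4, False)
# Four-run lead, but he pitches the last three innings.
long_relief = ReliefAppearance(True, True, False, 9, 4, False)

print(earns_save(ninth_inning), earns_save(four_run_lead), earns_save(long_relief))
```

Note that nothing in the rule asks how dangerous the situation actually was, which is the root of the complaints about it.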

       The problem is that, in order to pad that stat, teams stopped using their best reliever in non-save situations; they also stopped letting their best reliever pitch more than one inning

      According to one reporter, the save is “the only example in sports of a statistic creating a job.”

    In this case, that job is the “closer”, who gets paid more than all other relief pitchers
    Some pitchers say that it helps them mentally prepare to know when they come in
    Now, many statisticians say it is a poor use of the best relief pitcher, who should be used in the highest-leverage situation (which might not be the ninth inning)

 

Bill James

          Raised by a widowed father who was a janitor, he nearly graduated from the University of Kansas with dual degrees in English and Economics; he joined the army a few credits short of graduation in 1971 (and was later awarded the needed credits)

          After leaving the army, he began writing statistically driven baseball articles in the late 1970s, while working as a security guard and at several other jobs

      He had a hard time getting magazines and newspapers to publish his non-narrative-driven articles, so he began self-publishing a gigantic baseball book, Bill James Baseball Abstract

      He began to find success in 1981 after he was profiled in Sports Illustrated by Daniel Okrent, who invented “Rotisserie” Fantasy Baseball (and thus was interested in better ways to predict player performance)

      His prominence rose to the point that he was a consultant on the Red Sox World Series teams.

      He was also involved in what has become the leading “analytics” company, STATS

 

Bill James (cont.)

           As you could tell from the reading, he was frustrated that, despite baseball being saturated with numbers, most baseball statistical analysis was really bad

       Lots of “thresholds” where 20 of something was great, even though it is not statistically that different from 15 of something

       Believed that “luck” (or basic  random probability) was a factor in baseball (since it is a factor in all other known data sets)

       Also believed that the ballpark the game was played in greatly impacted outcomes

      Everyone knows this now that Coors Field in Denver exists, where batting stats are crazy, stupid high thanks to the combination of altitude and low humidity.

       Thought the major categories (Runs, Hits, RBIs, HRs) overshadowed other aspects of play, like defense, extra base hits, and walks.

      Wanted to come up with a way to compare the complete value of players in different positions, but also in different eras

           While most of his stats are not currently famous (“Game Score” for pitchers has been getting a lot of play recently), he inspired the next generation of baseball stats people

       IMHO – only a few of his stats are truly complex, most are simple rate stats (but on items that no one was paying attention to).

      Which shows his method is more about thinking differently than fancy math.
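
James's points about luck and arbitrary thresholds can be illustrated with very simple math. This is a rough sketch, assuming each at bat is an independent trial with a fixed chance of a hit (a simplification, not James's own method); the .270 hitter and 500 at bats are hypothetical:

```python
import math


def batting_average_spread(true_average: float, at_bats: int) -> float:
    """Standard deviation of the observed average under a binomial model."""
    return math.sqrt(true_average * (1 - true_average) / at_bats)


true_average = 0.270
spread = batting_average_spread(true_average, at_bats=500)
low = true_average - 2 * spread
high = true_average + 2 * spread

print(f"one standard deviation: {spread:.3f}")
print(f"typical range over a season: {low:.3f} to {high:.3f}")
```

Under these assumptions, the same .270 hitter can plausibly finish a season anywhere from about .230 to about .310 on chance alone, so a single season on either side of a ".300 hitter" threshold says less than it seems to.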

 

Baseball Statistics from Below

           Using the project-team-based model that drove much of the 1990s tech revolution, and making use of the publicly accessible platform provided by the internet, those who were inspired by James started websites.

       They developed a new group of statistics, called sabermetrics (after SABR, the Society for American Baseball Research)

       The first major fan statistical analysis site was Baseball Prospectus; the field now also includes Baseball Reference and FanGraphs

           Much of the audience for this was fantasy baseball players, who wanted to be able to project future value (or reveal the true value behind statistics).  Some of these new stats included:

       OPS (On-Base % Plus Slugging %) – Basically adds how frequently hitters reach base by hit or walk, plus the rate of bases gained per at bat (i.e., a double gains 2, a home run 4)

       BABIP (Batting average on balls in play)  -- While it has a number of purposes for batters (shows good contact, speed to first, trouble hitting into shift), it also is a good measure of luck (thus proving James right).

      For example, if an unusually high or low BABIP occurs for a batter vs. career norms, either something changed or they are super lucky/unlucky

      Can use it for pitchers, or the alternative measure xFIP (expected Fielding Independent Pitching), which, unlike ERA, controls for fielding quality (which the pitcher does not control)
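
A short sketch of how these stats are computed, using the commonly published formulas. The player lines are hypothetical, and the FIP constant (about 3.10 here) is an assumption: it is recalculated each season to put FIP on the same scale as league ERA. FIP is shown because xFIP builds on it, swapping actual home runs for an expected number based on fly balls:

```python
def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_pct(singles, doubles, triples, home_runs, at_bats):
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    # Only balls the defense had a chance to field count.
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)


def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    # Uses only outcomes the fielders never touch.
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings + constant


# Hypothetical batter: 500 AB, 150 H (95 1B, 30 2B, 5 3B, 20 HR), 50 BB, 5 HBP, 5 SF, 100 K
obp = on_base_pct(hits=150, walks=50, hit_by_pitch=5, at_bats=500, sac_flies=5)
slg = slugging_pct(singles=95, doubles=30, triples=5, home_runs=20, at_bats=500)
ops = obp + slg
batter_babip = babip(hits=150, home_runs=20, at_bats=500, strikeouts=100, sac_flies=5)

# Hypothetical pitcher: 200 IP, 20 HR, 50 BB, 5 HBP, 200 K
pitcher_fip = fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=200, innings=200)

print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
print(f"BABIP {batter_babip:.3f}  FIP {pitcher_fip:.2f}")
```

None of these requires more than addition and division, which fits the idea that the innovation was in what got counted, not in the math.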

 

Statistics from Below (cont.)

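           WAR (Wins Above Replacement) – A value-above-replacement statistic, which takes into account batting, pitching, and defense.  Basically tries to quantify how many wins a player created above (or below) a replacement-level player (the kind of freely available minor leaguer or bench player a team could call up at minimal cost), rather than a league-average one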

           There are also various projection systems that take into account age, previous year’s statistics (including minor league and foreign league performance), regression to the mean, position, place in the batting order and sometimes comparable players, which try to guess how a player will do next year.

       Some of these include: Nate Silver/Baseball Prospectus’s PECOTA (Player Empirical Comparison and Optimization Test Algorithm) and, on FanGraphs, ZiPS, Steamer (developed by two high schoolers and their stats teacher), and Fans (which lets anyone fill out a survey, relying on the “wisdom of crowds”)

      Crowds tend to be optimistic on stats, but have a good sense of playing  time and unusual career path players (like batters converted to pitchers)

    Thus, as Bill James says, stats are not everything, and some  things can be sensed more than measured.
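
To make “regression to the mean” concrete, here is a deliberately simple projection sketch. It is not how PECOTA, ZiPS, or Steamer actually work: the 5/4/3 season weights and the 1,200 plate appearances of league-average padding are illustrative assumptions, the player's numbers are hypothetical, and age, position, and comparable players are ignored entirely:

```python
def project_rate(past_seasons, league_rate, regression_pa=1200, weights=(5, 4, 3)):
    """Project next year's rate from (successes, plate_appearances) pairs.

    past_seasons is ordered most recent first. Recent seasons get more
    weight, and the result is pulled toward the league rate by mixing in
    regression_pa plate appearances of league-average performance.
    """
    weighted_successes = 0.0
    weighted_pa = 0.0
    for (successes, plate_appearances), weight in zip(past_seasons, weights):
        weighted_successes += weight * successes
        weighted_pa += weight * plate_appearances
    return (weighted_successes + regression_pa * league_rate) / (weighted_pa + regression_pa)


# Hypothetical hitter: times on base over his last three 600-PA seasons.
seasons = [(210, 600), (192, 600), (180, 600)]  # .350, .320, .300
league_obp = 0.320

raw_weighted = (5 * 210 + 4 * 192 + 3 * 180) / (12 * 600)
projection = project_rate(seasons, league_obp)

print(f"weighted past rate: {raw_weighted:.3f}")
print(f"projected rate:     {projection:.3f}")
```

The projection lands between the player's weighted history and the league rate. The less playing time a player has, the harder the padding pulls him toward average, which is one way a system can encode that small samples are mostly luck.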
 

Statistics from Below (cont.)

            Responding to the success of early technologies (like PITCHf/x, which shows the exact trajectory/velocity of pitches) among the sabermetric community and fantasy players, as well as the demands of teams (more on this in a minute), baseball introduced STATCAST

        Basically, it will make some improvement on batted ball data, but will collect massive amounts of defensive data for the first time

        All baseball statistics compiled before 2006 could probably be stored on one thumb drive; STATCAST will require a server warehouse.

 

Moneyball

          Despite Bill James’s writing being well known since the early 1980s (and the raw numbers behind his statistics actually being collected by baseball), none of his ideas were implemented until Billy Beane, general manager of the low-payroll Oakland Athletics, began doing so in the late 1990s with the help of Paul DePodesta.

      In the language of economics, Beane thought baseball was an inefficient market.

      He had to figure out how to generate more value for less money

    Basically, his approach was to value attributes that contributed to winning games but that others did not pay for (like walks), or to embrace players with red flags that did not matter (like body type).
    As the review said, his team won more than 100 games in back-to-back seasons, amazing considering the A’s lost three of their best players after the first season (although they still had great pitchers).
    The second season actually featured what was then the longest winning streak in American League history (20 games).

 

Moneyball (cont.)

           Although the A’s never won the World Series, their success made most other teams take seriously the idea of using advanced statistics to spend money more wisely.

       Most teams now have a stats department (including many of the people who started Baseball Prospectus), so it is harder to create the big advantages Beane enjoyed in the late 1990s/early 2000s

      Every once in a while, he finds one: for example, a few years ago he saw that most successful pitchers threw low.  So all his pitchers were lowball pitchers, and his hitters were particularly strong lowball hitters (to neutralize the other teams’ best pitchers). That bought him a year of relevance.

    He also makes trade gambles that don’t pay off, so he is not superhuman.

      The Tampa Bay Rays were also good at this (they have among the lowest attendance in the major leagues, and thus need to maximize every dollar of salary)

       In fact, when everyone uses statistics (and you can assume most teams have mostly the same assumptions as you), human scouting becomes crucial to capture the small differences that statistics are not catching

           What this represented is a shift in power, from the old baseball elite (those who played the game and received knowledge from those before them) to those who maybe never played, but were skilled in very different ways from the players

       In some ways, it is the long struggle over who knows the game best: the media or those playing the game.  Outsider vs. insider.

 

Some changes to baseball…

           Defensive Shift

       Some players (even previously good hitters) had a tendency to almost always hit the ball to the same side of the ball park

      So you put more players on that side

           Awards based on statistics

       In the 2014 AL Cy Young (best pitcher) race, Cleveland’s Corey Kluber beat Felix Hernandez because his FIP was better than Felix’s

      In other words, Kluber’s slightly higher ERA was considered more impressive, because he did it in front of Cleveland’s historically bad fielders.

       In 2012, AL MVP went to Triple Crown winner Miguel Cabrera (who led the league in the traditional hitting stats of batting average, home runs, and RBIs) over Mike Trout (who led in WAR)

      In 2014, Pitcher Clayton Kershaw won the NL MVP over Stanton and McCutchen because he led in WAR

 

Moneyball (cont.)

            This has spread to other sports too, most of which have kept far, far fewer statistics than baseball (including more fluid sports like football and ice hockey)

        The same debate over the nature of value and authority has occurred there as well, with predictable results.

 

Conclusions

            In baseball, there was a very weak kind of positivism, which was very persistent but was recently swept away by an improved positivism

        But sabermetrics’ biggest gains were over bad positivism; it is harder to find advantages now

       With STATCAST, they are seeking ever more diverse terrains upon which to run statistical analysis and improve player discipline/performance

            It shows that although institutions  and entrenched interests are powerful, they are not universally so

        Baseball was basically rethought by its own audience (or at least fantasy players); furthermore largely by fans who were never good at baseball

       This challenged the dominant image of masculinity at the center of sports culture (while perhaps replacing it with another masculinity  (cocky brainiac?) that will come to dominate)

        But powerful, well-capitalized institutions have amazing absorptive capacity; although slow to admit stats work, baseball did pivot very quickly and swallowed all those who once challenged it

       Such is capitalism, where it is hard to be truly rebellious, because there is money to be made off of rebellion, too.