Measuring and Predicting Value
Big Theory Vocabulary Word: Value
•
In political/moral economy, value is the idea that to an
individual or society, something is worth more (or perhaps less) than its
price.
–
This was at the core of Marx’s theory: he separated use value
(the utility of an item) from “value” (which Marx defined as socially
necessary labor time, or how much labor was required to make it), as well
as from exchange value (how much labor time the item can fetch on the
market; basically, price denominated in time instead of money).
•
Oddly, aligned against Marx are postmodernists like Jean
Baudrillard who argue that there is no “essence” of use value that exists
without circulation, as well as market fundamentalists who believe price is the
only efficient measure of value.
–
Value is a word in the vernacular too, of course – people enjoy
paying less for something than it is worth to them (or than the normal “price”),
or getting the most for their money. Also, some things are considered to have
so much value that they are “priceless”.
Positivism: Measuring and
Forecasting
•
Remember: Positivism is the stance that theory comes after
collecting the data, and that the data should be as objective as possible
•
The easiest way to compare data “objectively” is to assign them
numeric value
–
Size, weight, speed – things that can be easily counted are the
most common measurements developed (often regardless of whether they are significant).
–
Beyond that, subjectivity often begins to creep in (purity, color,
etc…) both in how the data is tallied and how the measure is codified
•
Grades are a good example of this: you can develop “universal”
assessments like multiple choice, but you have to weight them vs. other
assessments, and recognize that some people are just better at writing their own
answers, speak English as a second language, etc…
– But
ultimately, grades are a way of ranking the skill of students, in terms of
grade point average (certain levels of which are required to graduate, to be cum
laude, magna cum laude, summa cum laude, etc…)
» Also,
arbitrary thresholds from A to B to C to D to F
–
Good positivism should question the numbers themselves, and work
to develop more significant (and predictive) measures
With that in mind…
•
No sport is so associated with the keeping of numbers as is
baseball
–
Thus, if one were to apply positivism to sports, the most
obvious sport to start with is baseball.
•
Note: I will throw out a lot of names of baseball statistics –
unless they are on the study guide, you don’t have to know them.
Henry Chadwick – Inventor of
Baseball Statistics
•
Born in the UK in 1824 (half-brother was a famous advocate for
urban sanitation; father was a writer)
–
He moved to the US as a teenager; as a young adult he started playing various
bat and ball games
•
Became one of the first reporters assigned to cover baseball
•
Besides making rules and writing several important books about
baseball, he is best known as a statistical innovator
–
He is the inventor of the box score, a numerical/abstracted
representation of what happened in a game
–
Besides advocating for the counting of hits, strikeouts, and runs
scored by each player, he also popularized two of the basic rate statistics,
which normalize performance regardless of the number/length of appearances in
games and thus allow comparison of players.
•
ERA – Earned Run Average (Earned runs allowed*9/innings pitched)
•
Batting Average – Hits/At Bats
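A quick sketch of why these two rate stats allow comparison, using the standard definitions (earned runs per nine innings; hits per official at bat). The function names and the stat lines are invented for illustration.

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return earned_runs * 9 / innings_pitched


def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at bat (walks and sacrifices do not count as at bats)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


# Hypothetical stat lines: a starter and a reliever with very different workloads.
starter_era = earned_run_average(earned_runs=70, innings_pitched=210.0)
reliever_era = earned_run_average(earned_runs=20, innings_pitched=60.0)
print(f"{starter_era:.2f} {reliever_era:.2f}")  # 3.00 3.00

print(f"{batting_average(hits=150, at_bats=500):.3f}")  # 0.300
```

The raw counts (70 vs. 20 earned runs) look nothing alike, but the rate puts both pitchers on the same scale.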
Box Score
Other early statistics
•
Walk (four pitches outside the strike zone that the batter does not swing at) –
Chadwick believed a walk should be counted as a pitcher error, not as a
batter skill.
–
It is both a pitcher and a batter skill
•
RBI (Runs Batted In) – Runs scored by players had been counted since the
beginning of baseball stats, but little credit was being given to the person who
hit the runner in.
–
But of course, to get RBIs, you had to bat in the right part of
the lineup on a team with other good hitters.
•
So once people decided RBIs were important, the stat devalued players
with unfavorable lineup positioning.
Other early statistics (cont.)
•
Save – Invented by a Sporting News writer in 1960, as a way to try
to recognize strong performance by relief pitchers (because they rarely won
games and pitched fewer innings)
–
But it is a very specific statistic, defined as (from Wikipedia):
1. He is the finishing pitcher in a game won by his team
2. He is not the winning pitcher
3. He is credited with at least ⅓ of an inning pitched;
4. He satisfies one of the following conditions:
– He enters
the game with a lead of no more than three runs and pitches for at
least one inning
– He enters
the game, regardless of the count, with the potential tying run
either on base, at bat or on deck
– He pitches
for at least three innings.
–
The problem is that, in order to pad that stat, teams stopped using their best
reliever in any non-save situation; they also stopped letting their best
reliever pitch more than one inning
•
According to one reporter, the save is “the only example in sports
of a statistic creating a job.”
– In this
case, that job is the “closer”, who gets paid more than all other relief
pitchers
– Some
pitchers say that it helps them mentally prepare to know when they come in
» Now, many
statisticians say it is a poor use of the best relief pitcher, who should be
used in the highest leverage situation (which might not be the ninth inning)
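To see how narrow the save definition quoted above is, here is a minimal sketch of it as a yes/no check. The class and field names are hypothetical, innings are counted in outs (one out = ⅓ inning), and treating "a lead" as at least one run is my reading of the rule.

```python
from dataclasses import dataclass


@dataclass
class ReliefAppearance:
    """One relief outing; field names are invented for this sketch."""
    finished_game: bool       # he was the last pitcher used by his team
    team_won: bool
    is_winning_pitcher: bool
    outs_recorded: int        # 1 out = 1/3 of an inning
    lead_on_entry: int        # runs his team led by when he entered
    tying_run_close: bool     # tying run on base, at bat, or on deck on entry


def earns_save(app: ReliefAppearance) -> bool:
    # Conditions 1-3: all of them must hold.
    if not (app.finished_game and app.team_won):
        return False
    if app.is_winning_pitcher or app.outs_recorded < 1:
        return False
    # Condition 4: any one of the three situations is enough.
    short_lead = 1 <= app.lead_on_entry <= 3 and app.outs_recorded >= 3
    long_outing = app.outs_recorded >= 9
    return short_lead or app.tying_run_close or long_outing


# A closer pitching the ninth with a three-run lead gets the save...
closer = ReliefAppearance(True, True, False, 3, 3, False)
# ...a reliever who escapes a seventh-inning jam but does not finish gets nothing.
fireman = ReliefAppearance(False, True, False, 2, 1, True)
print(earns_save(closer), earns_save(fireman))  # True False
```

The second case is the statisticians' complaint: the higher-leverage outing earns no save, so managers chasing the stat hold their best reliever back for the ninth.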
Bill James
•
Raised by a widowed father who was a janitor; nearly graduated from the
University of Kansas with dual degrees in English and Economics; joined the army a
few credits short of graduation in 1971 (later awarded the needed credits)
•
After leaving the army, began writing statistically driven
baseball articles in the late 1970s, while working as a security guard and
several other jobs
–
He had a hard time getting magazines and newspapers to publish his
non-narrative-driven articles, so he began self-publishing a gigantic baseball
book, The Bill James Baseball Abstract
•
He began to find success in 1981 after he was profiled in Sports
Illustrated by Daniel Okrent, who invented “Rotisserie” Fantasy Baseball (and
thus was interested in better ways to predict player performance)
–
His prominence rose to the point that he was a consultant on the
Red Sox World Series teams.
•
Also was involved in what has become the leading “analytics”
company, STATS
Bill James (cont.)
•
As you could tell by the reading, he was frustrated that despite
baseball being saturated with numbers, most baseball statistical analysis was
really bad
–
Lots of “thresholds” where 20 of something was great, even though
it is not statistically that different from 15 of something
–
Believed that “luck” (or basic random probability) was a factor
in baseball (since it is a factor in all other known data sets)
–
Also believed that the ballpark the game was played in greatly
impacted outcomes
•
Everyone knows this now that Coors Field in Denver exists, where
batting stats are crazy, stupid high thanks to the combination of altitude and low
humidity.
–
Thought the major categories (Runs, Hits, RBIs, HRs) overshadowed
other aspects of play, like defense, extra base hits, and walks.
•
Wanted to come up with a way to compare the complete value of
players in different positions, but also in different eras
•
While most of his stats are not currently famous (“Game Score” for
pitchers has been getting a lot of play recently), he inspired the next
generation of baseball stats people
–
IMHO – only a few of his stats are truly complex; most are simple
rate stats (but on items that no one was paying attention to).
•
Which shows his method is more about thinking differently than
fancy math.
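James's points about luck and thresholds can be illustrated with basic probability. This is only a sketch: it assumes every at bat is an independent trial with a fixed chance of success, which real baseball does not guarantee, and the rates below are invented.

```python
import math


def average_spread(true_rate: float, opportunities: int) -> float:
    """Standard deviation of an observed rate under independent trials."""
    return math.sqrt(true_rate * (1 - true_rate) / opportunities)


def count_spread(true_rate: float, opportunities: int) -> float:
    """Standard deviation of an observed count under independent trials."""
    return math.sqrt(opportunities * true_rate * (1 - true_rate))


# A hypothetical "true" .270 hitter over 500 at bats.
print(f"{average_spread(0.270, 500):.3f}")  # 0.020

# A hypothetical hitter who homers in 3.5% of at bats (17.5 expected in 500).
print(f"{count_spread(0.035, 500):.1f}")  # 4.1
```

Under these assumptions, the same hitter can post .250 one year and .290 the next, or 15 home runs one year and 20 the next, without anything about him changing.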
Baseball Statistics from Below
•
Using the project-team-based model that drove much of the
1990s tech revolution, and the publicly accessible platform
provided by the internet, those who were inspired by James started websites.
–
They developed a new group of statistics, called sabermetrics
(after SABR, the Society for American Baseball Research)
–
The first major fan statistical analysis site was Baseball
Prospectus; the group now also includes Baseball Reference and FanGraphs
•
Much of the audience for this was fantasy baseball players, who
wanted to be able to project future value (or reveal the true value behind
statistics). Some of these new stats included:
–
OPS (On-Base % Plus Slugging %) – Basically adds how frequently
hitters reach base by a hit or walk to the rate of bases gained per at bat
(e.g. a double gains 2, a home run 4)
–
BABIP (Batting average on balls in play) – While it has a number
of purposes for batters (shows good contact, speed to first, trouble hitting
into the shift), it also is a good measure of luck (thus supporting James).
•
For example, if an unusually high or low BABIP occurs for a batter
vs. career norms, either something changed or they are super lucky/unlucky
•
It can also be used for pitchers, as can the alternative measure xFIP, which,
unlike ERA, controls for fielding quality (which the pitcher does not control)
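OPS and BABIP described above are both simple arithmetic on the box-score counts. A minimal sketch using the commonly published formulas; the class name and the stat line are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats for one hitter (hypothetical container)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int
    strikeouts: int

    @property
    def total_bases(self) -> int:
        singles = self.hits - self.doubles - self.triples - self.home_runs
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.home_runs


def on_base_pct(b: BattingLine) -> float:
    reached = b.hits + b.walks + b.hit_by_pitch
    chances = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return reached / chances


def slugging_pct(b: BattingLine) -> float:
    return b.total_bases / b.at_bats


def ops(b: BattingLine) -> float:
    return on_base_pct(b) + slugging_pct(b)


def babip(b: BattingLine) -> float:
    # Only balls put in play count: home runs and strikeouts are removed.
    in_play = b.at_bats - b.strikeouts - b.home_runs + b.sacrifice_flies
    return (b.hits - b.home_runs) / in_play


line = BattingLine(at_bats=500, hits=150, doubles=30, triples=5, home_runs=20,
                   walks=50, hit_by_pitch=5, sacrifice_flies=5, strikeouts=100)
print(f"OPS {ops(line):.3f}  BABIP {babip(line):.3f}")  # OPS 0.866  BABIP 0.338
```

Comparing a season's BABIP against the same hitter's career figure is the "luck check" described above.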
Statistics from Below (cont.)
•
WAR (Wins Above Replacement) – A value-above-replacement statistic,
which takes into account batting, pitching, and defense. Basically tries to
quantify how many wins a player created above (or below) a replacement-level
player (the kind of freely available player a team could call up from the minors)
•
There are also various projection systems that take into account
age, previous year’s statistics (including minor league and foreign league
performance), regression to the mean, position, place in the batting order and
sometimes comparable players, which try to guess how a player will do next
year.
–
Some of these include: Nate Silver/Baseball Prospectus’s PECOTA
(Player Empirical Comparison and Optimization Test Algorithm) and FanGraphs’
ZiPS, Steamer (developed by two high-schoolers and their stats teacher), and
Fans (which lets anyone fill out a survey, and relies on the “wisdom of
crowds”)
•
Crowds tend to be optimistic on stats, but have a good sense of
playing time and unusual career path players (like batters converted to
pitchers)
– Thus, as
Bill James says, stats are not everything, and some things can be sensed more
than measured.
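The core of such a projection can be sketched as a weighted average of recent seasons pulled toward the league mean. This is a toy model, not PECOTA, ZiPS or Steamer: the 5/4/3 weights, the 1,200-opportunity regression amount, and the numbers in the example are all assumptions chosen for illustration, and age, position, and comparable players are left out.

```python
from typing import Sequence


def project_rate(past_rates: Sequence[float],
                 past_opportunities: Sequence[int],
                 league_rate: float,
                 weights: Sequence[int] = (5, 4, 3),
                 regression_amount: int = 1200) -> float:
    """Project next year's rate; index 0 is the most recent season."""
    weighted_successes = 0.0
    weighted_opportunities = 0.0
    for rate, opps, weight in zip(past_rates, past_opportunities, weights):
        weighted_successes += weight * rate * opps
        weighted_opportunities += weight * opps
    # Regression to the mean: blend in a fixed amount of league-average play.
    numerator = weighted_successes + regression_amount * league_rate
    return numerator / (weighted_opportunities + regression_amount)


# Hypothetical hitter coming off a career-best .400 on-base percentage.
projection = project_rate(past_rates=[0.400, 0.350, 0.340],
                          past_opportunities=[600, 600, 600],
                          league_rate=0.320)
print(f"{projection:.3f}")  # 0.361
```

The projection lands between last year's peak and the league average, which is what regression to the mean amounts to: the less playing time a player has, the harder the pull toward average.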
Statistics from Below (cont.)
•
Responding to the success of early technologies (like PITCHf/x,
which shows the exact trajectory/velocity of pitches) among the sabermetric community and
fantasy players, as well as the demands of teams (more on this in a minute), baseball
introduced STATCAST
–
Basically, it will make some improvement on batted ball data, but
will collect massive amounts of defensive data for the first time
–
All baseball statistics compiled before 2006 could probably be
stored on one thumb drive; STATCAST will require a server warehouse.
Moneyball
•
Despite Bill James’s writing being well known since the early
1980s (and the raw numbers behind his statistics already being
collected by baseball), none of his ideas were implemented until Billy Beane,
general manager of the low-payroll Oakland Athletics, began doing so in the late 1990s
with the help of Paul DePodesta.
–
In the language of economics, Beane thought baseball was an
inefficient market.
•
He had to figure out how to generate more value for less money
– Basically,
his choice was to value attributes that contributed to winning games
but that others did not pay for (like walks), or to embrace players with red
flags that did not matter (like body type).
– As the
review said, his team won 100 games in back to back seasons, amazing
considering the A’s lost their three best batters after the first season
(although they still had great pitchers).
» The second
season actually featured a 20-game winning streak, at the time the longest in
American League history.
Moneyball (cont.)
•
Although the A’s never won the World Series, their success basically made
most other teams take seriously the idea of using advanced statistics to spend
money more wisely.
–
Most teams now have a stats department (including many of the
people who started Baseball Prospectus), so it is harder to create the big
advantages Beane enjoyed in the late 1990s/early 2000s
•
Every once in a while, he finds one: a few years ago, for example, he saw
that most successful pitchers threw low. So all his pitchers were lowball
pitchers, and his hitters were particularly strong lowball hitters (to
neutralize the other teams’ best pitchers). That bought him a year of
relevance.
– He also
makes trade gambles that don’t pay off, so he is not superhuman.
•
The Tampa Bay Rays were also good at this (they have the lowest major league
attendance, and thus need to maximize every dollar of salary)
–
In fact, when everyone uses statistics (and you can assume most
teams have mostly the same assumptions as you), human scouting becomes crucial to
capture the small differences that statistics are not catching
•
What this represented is a shift in power, from the old baseball elite
(those who played the game and received knowledge from those before them) to
those who maybe never played, but were skilled in very different ways from the
players
–
In some ways, it is the long struggle about who knows the game
the best, the media or those playing the game. Outsider vs. insider.
Some changes to baseball…
•
Defensive Shift
–
Some players (even previously good hitters) had a tendency to
almost always hit the ball to the same side of the ball park
•
So you put more players on that side
•
Awards based on statistics
–
In the 2014 AL Cy Young (best pitcher) race, Cleveland’s Corey
Kluber beat Felix Hernandez because his FIP was better than Felix’s
•
In other words, Kluber’s slightly higher ERA was considered more
impressive, because he did it in front of Cleveland’s historically bad fielders.
–
In 2012, AL MVP went to Triple Crown Winner Miguel Cabrera (who
led in the traditional hitting stats: average, home runs, and RBIs) over Mike
Trout (who led in WAR)
•
In 2014, Pitcher Clayton Kershaw won the NL MVP over Stanton and
McCutchen because he led in WAR
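FIP (Fielding Independent Pitching), the stat behind the Cy Young example above, is not defined in these notes. A sketch of the commonly published form follows: it uses only outcomes the fielders never touch. The league constant is reset every season so that league FIP matches league ERA; the 3.10 here is an assumed placeholder, and the stat line is invented.

```python
def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, league_constant: float = 3.10) -> float:
    """Fielding Independent Pitching, scaled to look like an ERA."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    # Home runs and free passes hurt the pitcher; strikeouts help him.
    raw = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return raw / innings_pitched + league_constant


# Hypothetical ace: few home runs and walks, lots of strikeouts.
print(f"{fip(home_runs=15, walks=50, hit_by_pitch=5, strikeouts=250, innings_pitched=230.0):.2f}")  # 2.49
```

Because nothing in the formula depends on what happens to balls in play, two pitchers with this same line get the same FIP no matter how good or bad the fielders behind them are, while their ERAs can differ.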
Moneyball (cont.)
•
This has spread to other sports too, most of which have
kept far, far fewer statistics than baseball (including more fluid sports like
football and ice hockey)
–
The same debate over the nature of value and of authority has occurred
there as well, with predictable results.
Conclusions
•
In baseball, there was a weak but very persistent positivism,
which was recently swept away by an improved positivism
–
But sabermetrics’ biggest gains were over bad positivism; it is
harder to find advantages now
•
With STATCAST, they are seeking ever more diverse terrains upon
which to run statistical analysis and improve player discipline/performance
•
It shows that although institutions and entrenched interests are
powerful, they are not universally so
–
Baseball was basically rethought by its own audience (or at least
fantasy players), and largely by fans who were never good at baseball
•
This challenged the dominant image of masculinity at the center of
sports culture (while perhaps replacing it with another masculinity (cocky
brainiac?) that will come to dominate)
–
But powerful, well-capitalized institutions have amazing
absorptive capacity; although slow to admit stats work, baseball did pivot very
quickly and swallowed all those who once challenged it
•
Such is capitalism, where it is hard to be truly rebellious,
because there is money to be made off of rebellion, too.