In 2019, we developed a model by building a Z statistic from Batting Average, Earned Run Average, and Fielding Percentage. The power rankings for the 2019 NCAA tournament teams are presented below.
The model performed well in two respects:
1) The top 2 teams (Oklahoma and UCLA) did make the Finals.
2) All teams that made the WCWS finished in the top half of the Power Rankings
The model struggled to capture the relative strength of the two finalists. It did one of the following:
1) Underestimated the strength of UCLA
2) Overestimated the strength of Oklahoma
3) Both
So while the model seems to have done well at a macro scale, individual game outcomes appear to be a weakness. This is OK, as we are developing a game model specifically for this reason. Moreover, the assumptions that went into Model 1.0 were never intended to determine game-to-game success.
There is still a good amount of information that can be captured in a single statistic, and for gauging the relative completeness of a team it seems quite appropriate. We will most likely improve upon this power ranking and provide it in our dashboard. So, for the purpose of learning in public, the following is a blog post about how Model 1.0 (colloquially, the power rankings model) was developed.
It serves two purposes:
1) Continuity for readers on how our models evolved.
2) The foundation for a future power rankings model that is more sophisticated than the first.
How were the power rankings developed?
The data used in the model consisted of the final statistics from the entire D1 field of teams. The 3 statistics used were Batting Average, Earned Run Average, and Fielding Percentage. We did not choose these 3 statistics through exploratory data analysis (EDA). Instead, the choice came from our own experience with the sport and its content creators. Whether it is the best model is open to debate, but to its credit, the model does include a measure for offense, defense, and pitching, the 3 phases of the game. These are also the most interpretable statistics, which makes it easier to drive a storyline intended for a broad audience. Below is a description of how we turned the data into the power rankings.
The first assumption we made about the data is that the relative strength of a team can be well represented by the 3 statistics mentioned above. The next assumption is that the three statistics are weighted equally. That is, the team normalized average is:
Team Normalized Average = a*BA + a*ERA + a*Fielding
where a = 1/3 and BA, ERA, and Fielding are the normalized values of each statistic.
Step 1: Normalize each statistic by the range
In order to combine these statistics, which have different units, we normalize them. Take, for example, the made-up but instructive data set below.
In the table on the left, we have the base statistics for each measure. On the right, we have the normalized values for those statistics. The normalization is performed according to the equations that sit below those tables, where the range represents the span of all values across that statistic. Notice that ERA is treated differently from BA and Fielding: a lower ERA is better, and the best possible ERA is 0. For that reason, we subtract the team ERA from the max ERA and divide by the range. This puts every statistic on a common normalized scale where a higher value is better.
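The normalization step can be sketched in Python as follows. This is a minimal illustration, not the code behind the original rankings: the team names and numbers are invented, the function name `normalize_by_range` is hypothetical, and the form used for BA and Fielding, (value - min) / range, is an assumption, since the text above only spells out the ERA case.

```python
# Hypothetical season statistics, invented for illustration only.
# These are NOT the numbers from the table in the post.
teams = {
    "Team A": {"ba": 0.340, "era": 1.50, "fielding": 0.980},
    "Team B": {"ba": 0.300, "era": 2.50, "fielding": 0.970},
    "Team C": {"ba": 0.260, "era": 3.50, "fielding": 0.960},
}


def normalize_by_range(values, lower_is_better=False):
    """Scale values onto [0, 1] using the range, so 1 is always the best team."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        raise ValueError("all teams share the same value; the range is zero")
    if lower_is_better:
        # ERA: subtract the team value from the max, then divide by the range.
        return [(high - v) / span for v in values]
    # BA and Fielding: assumed here to be (value - min) / range.
    return [(v - low) / span for v in values]


names = list(teams)
normalized = {name: {} for name in names}
for stat in ("ba", "era", "fielding"):
    scaled = normalize_by_range(
        [teams[name][stat] for name in names],
        lower_is_better=(stat == "era"),
    )
    for name, value in zip(names, scaled):
        normalized[name][stat] = value

for name in names:
    print(name, {stat: round(v, 3) for stat, v in normalized[name].items()})
```

In this toy field, Team A leads every category and so scores 1 on each statistic, while Team C trails every category and scores 0. Flipping the ERA calculation is what lets all three columns be read the same way.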
Step 2: Find the average of the 3 normalized values
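Once the normalized values have been computed, we find their average for each team:
Team Normalized Average = (Normalized BA + Normalized ERA + Normalized Fielding) / 3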
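As a small sketch of this step, the equal-weight average can be written in Python like this. The function name `team_normalized_average` and the normalized values are hypothetical, chosen only to show the arithmetic.

```python
def team_normalized_average(norm_ba, norm_era, norm_fielding):
    """Equal-weight combination of the three normalized statistics."""
    weight = 1 / 3  # the same weight for every statistic, per the model's assumption
    return weight * norm_ba + weight * norm_era + weight * norm_fielding


# Hypothetical normalized values (BA, ERA, Fielding), invented for illustration.
normalized = {
    "Team A": (1.0, 0.8, 0.9),
    "Team B": (0.5, 0.6, 0.4),
    "Team C": (0.0, 0.2, 0.1),
}

averages = {
    team: team_normalized_average(*values) for team, values in normalized.items()
}

for team, avg in averages.items():
    print(f"{team}: {avg:.3f}")
```

Keeping the weight in one place makes it easy to experiment later with unequal weights, although the model described here uses 1/3 throughout.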
Step 3: Find the average team normalized average and standard deviation
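Find the average of the normalized averages? It sounds confusing, but what we are doing is creating a Z statistic that we call the Power Ranking:
Power Ranking = (Team Normalized Average - Mean of all Team Normalized Averages) / Standard Deviation of all Team Normalized Averages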
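A minimal Python sketch of the Z statistic follows. The input values and the function name `power_rankings` are hypothetical. The post does not say whether the population or the sample standard deviation was used, so the choice of `statistics.pstdev` (population) here is an assumption; `statistics.stdev` would be the sample alternative.

```python
from statistics import mean, pstdev


def power_rankings(team_averages):
    """Convert team normalized averages into Z statistics (power rankings)."""
    values = list(team_averages.values())
    mu = mean(values)
    # Assumption: population standard deviation across the whole field.
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all teams have the same average; Z is undefined")
    return {team: (avg - mu) / sigma for team, avg in team_averages.items()}


# Hypothetical team normalized averages, invented for illustration.
team_averages = {"Team A": 0.9, "Team B": 0.5, "Team C": 0.1}

rankings = power_rankings(team_averages)
for team, z in sorted(rankings.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {z:+.2f}")
```

A positive value means the team sits above the field's mean and a negative value means it sits below, with the size measured in standard deviations.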
Interpreting the model
The power rankings for our made-up data table above are as follows:
You'll have to forgive the Oklahoma bias (Alma Mater). The model produces both positive and negative values. A positive value indicates a team that is statistically above the average team in the field, whereas a negative value indicates the opposite. Oklahoma is nearly 2 standard deviations better than the average team in this field, whereas Michigan is almost a standard deviation worse.
So going back to the graph of D1 Power Rankings for the field of 64 during the 2019 softball NCAA Tournament, Oklahoma was a full standard deviation better than UCLA, Northwestern falls near the average, and St Francis (PA) was the statistically weakest team in the field.
On the Horizon: Improvements to the Model
Some suggested improvements we have been contemplating include:
1) Using Slugging Percentage instead of BA
2) Using Strikeout-to-Walk Ratio over ERA
3) Changing the normalization process (e.g. using the median instead of the max/min)
Items 1 and 2 follow directly from the feature engineering results for Model 2.0, where we discovered that Slugging Percentage and Strikeout-to-Walk Ratio have a higher correlation with RPI rankings than Batting Average and ERA do. For item 3, we are considering the median as a way to desensitize the model to statistical outliers (Oklahoma) and possibly improve the matchup comparison.
More to come soon!