Day 2 of NCAA Tournament Regionals: Predictions

kevinhaghi
May 21, 2022

The first day of our model predictions is in the books! How exciting was that?! And if you were lucky enough to stay up and watch Arizona State's Kristiana Watson go yard for a go-ahead GRAND SLAM, the softball gods shone down upon you.

Evaluation of Day 1 model results

Now, what kind of data scientists would we be if we didn't critically evaluate our previous model run? Here is an overview of our performance:

Explanation:

Correct/Incorrect: each game scores 1 in Correct if the model picked the winner, or 1 in Incorrect if it did not.
Team 1/Team 2 Actual: the actual final score for each team.
Team Score 1/Team Score 2 Diff: the difference between the predicted and actual score (predicted - actual).
Margin Diff: how far off the model's predicted margin of victory was (Team 1 diff - Team 2 diff, which equals predicted margin - actual margin).
The mean, median, and standard deviation are used to evaluate the bias and spread.
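The columns above can be computed with a few lines of code. This is a minimal sketch, assuming each game is stored as predicted and actual runs for both teams; the `games` list and the `evaluate` and `summarize` helpers are hypothetical names, and the numbers are made up for illustration rather than taken from the Day 1 table. It shows that Margin Diff reduces to predicted margin minus actual margin, and that accuracy is simply the share of Correct games (for Day 1, 24 / 32 = 75%).

```python
from statistics import mean, median, stdev

# Hypothetical games: (predicted team 1, predicted team 2, actual team 1, actual team 2).
# Made-up numbers for illustration only, NOT the Day 1 results.
games = [
    (5.0, 2.0, 4, 1),
    (3.0, 4.0, 6, 2),
    (6.0, 1.0, 8, 0),
    (2.0, 3.0, 1, 5),
]


def evaluate(pred1, pred2, act1, act2):
    """Return (team 1 diff, team 2 diff, margin diff, correct) for one game."""
    team1_diff = pred1 - act1               # predicted - actual, team 1
    team2_diff = pred2 - act2               # predicted - actual, team 2
    margin_diff = team1_diff - team2_diff   # = predicted margin - actual margin
    # Correct when the predicted winner matches the actual winner
    # (assumes no predicted ties; softball games do not end tied)
    correct = (pred1 > pred2) == (act1 > act2)
    return team1_diff, team2_diff, margin_diff, correct


def summarize(values):
    """Mean and median gauge bias; standard deviation gauges spread."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


rows = [evaluate(*g) for g in games]
n_correct = sum(correct for *_, correct in rows)
accuracy = n_correct / len(rows)
margin_summary = summarize([margin for _, _, margin, _ in rows])
print(f"accuracy: {accuracy:.0%}")
print(margin_summary)
```

The same `summarize` call can be applied to the Team Score diffs to check whether the model over- or under-predicts runs for one side.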

Some first impressions:

24-8 isn't so bad (75% of games picked correctly)! There is definitely room for improvement, but for a first attempt I'm excited to see what insights we will learn about the model.
The teams that the eye test told me should win (Kentucky, Stanford, Texas) did win. It's worth looking into why their opponents were so overvalued. As mentioned before, the model seems to have a hard time judging quality of play from results alone; I'll talk about this more in future posts.
Upsets are real and they happen. Notre Dame getting spanked by McNeese was a surprise! I bet if these two match up again the result will be different. Can't wait to watch.
The bias is relatively low for this model: the mean and median errors are close enough to 0 that it's statistically difficult to say a bias exists, and the errors appear roughly normally distributed.
The spread is a little high, which may mean the features we are using can't discriminate precisely enough to pin down the score. Also, score is a very challenging outcome to get right game to game.
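The post doesn't say which test, if any, backs the "no detectable bias" call. One rough way to check it is to divide the mean error by its standard error and ask how surprising that ratio would be if the true bias were 0. The sketch below uses a normal approximation from Python's standard library; the `bias_check` helper and both error lists are hypothetical and made up for illustration. With only a few dozen games, a t-test such as `scipy.stats.ttest_1samp` is the more appropriate choice.

```python
import math
from statistics import NormalDist, mean, stdev


def bias_check(errors):
    """Rough two-sided test of H0: mean error == 0.

    Uses a normal approximation to the sampling distribution of the mean;
    with small samples a t-test (e.g. scipy.stats.ttest_1samp) is better.
    """
    n = len(errors)
    m = mean(errors)
    se = stdev(errors) / math.sqrt(n)   # standard error of the mean
    z = m / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return m, se, z, p_value


# Hypothetical margin errors (predicted - actual), made up for illustration
centered = [0.0, -5.0, -3.0, 3.0]      # mean near 0 relative to its spread
shifted = [2.0, 3.0, 2.5, 3.5, 2.0]    # consistently over-predicts

for name, errs in [("centered", centered), ("shifted", shifted)]:
    m, se, z, p = bias_check(errs)
    print(f"{name}: mean={m:.2f} se={se:.2f} z={z:.2f} p={p:.3f}")
```

A large p-value (as for `centered`) means the data are consistent with zero bias; it does not prove the bias is zero, which matters with a sample this small.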

Predictions for Day 2 of Regionals

We are going to look at a couple of things moving forward. First, we will look at the new predictions given yesterday's outcomes. Later in the week, we will revisit the model's predictions as it runs the bracket from start to finish without correcting its wrong picks. For now, here are the predictions for Day 2:

Winners Bracket Game 1:

Note: Orange boxes indicate predictions the model got wrong. I want to track how far those errors propagate down the bracket or whether they get eliminated.

First Impressions:

The model predicts that every team it got wrong on Day 1 (the teams that won despite being picked to lose) will lose this next game. It could be interesting if this happens. I predict something like half of these teams will continue to surprise and move on.
Oklahoma has a 100% chance to win?! That can't be right, and in reality it isn't: no team is unbeatable. But Oklahoma is such a stat-producing machine that, to the model, it appears unable to lose this game. This is something we will definitely look to correct in the future.
Oregon-Arkansas is, I think, a more interesting matchup than the stats suggest. Melyssa Lombardi has the postseason experience and pedigree to have her team ready to upset the SEC champions.
Games to watch: Washington-Texas, Florida-Georgia Tech, Arizona-Missouri, Alabama-Stanford, Auburn-Clemson... wow, I could go on. There are so many good matchups on Day 2; try to watch them all. In each of the ones I mentioned, the favored team has a 60% or lower chance of winning. Could be lots of upsets.
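The post doesn't describe how the model turns predicted scores into win probabilities, so the following is only an assumption for illustration: treat the actual margin as normally distributed around the predicted margin, with a standard deviation equal to the spread of the margin errors. Under that assumption, a predicted margin that is large relative to the spread pushes the probability so close to 1 that it displays as 100%, which is one way an Oklahoma-style "lock" can appear. The `win_probability` helper, its `floor` parameter (a crude guard so no team is ever reported as certain), and the spread value are all hypothetical.

```python
from statistics import NormalDist


def win_probability(pred_margin, sigma, floor=0.0):
    """P(actual margin > 0) assuming actual margin ~ Normal(pred_margin, sigma).

    `floor` (hypothetical) keeps the result inside [floor, 1 - floor]
    so no team is ever reported as a certain winner or loser.
    """
    p = 1 - NormalDist(mu=pred_margin, sigma=sigma).cdf(0)
    return min(max(p, floor), 1 - floor)


sigma = 3.5  # made-up spread of margin errors, in runs
for margin in (0.5, 1.0, 10.0, 15.0):
    print(f"predicted margin {margin:>4}: {win_probability(margin, sigma):.0%}")
print(f"with floor 0.02: {win_probability(15.0, sigma, floor=0.02):.0%}")
```

Under the same assumption, a 60% favorite corresponds to a predicted margin of only about a quarter of the spread (the 60th percentile of a standard normal is roughly 0.25), so those games are close to coin flips.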

Losers Bracket Game 1:

Note: Orange boxes indicate predictions the model got wrong. I want to track how far those errors propagate down the bracket or whether they get eliminated.

First Impressions:

In every game the model got wrong, it is (unsurprisingly) still bullish on the team it originally picked, now playing an elimination game. I tend to agree on a lot of those matchups, so it will be interesting to see how they pan out. Based on my eye test, I expect all of them to win (South Dakota St.-Villanova could be very interesting).
Nothing too surprising here. All of this aligns with what I would expect the outcomes to be as well.
The predicted scores are a lot closer.
Watch out for Campbell-Ohio St. and Canisius-Wisconsin. These could be interesting matchups, since my eye test says they should be closer than the model thinks, or even go the other way.

Losers Bracket Game 2:

First Impressions:

Lots of rematches of Day 1 matchups (South Dakota St.-Michigan, Wichita St.-Oregon, South Fla.-Miss. St., ...).
South Dakota St. is still a darling of this model. After losing a close one to Michigan in the first round, this model still thinks they should've won.

Summary

I am pretty happy with the results so far, given that we are only using team stats to train and predict the game outcomes.
There is still relatively little bias, but let's be careful given the small sample size.
The predictions look pretty good to the eye, except for some teams the model is overly bullish about. Once this season concludes, I'll look at how the model can be improved to account for the relative strength behind each team's stats. More on this in the future.

Alright, go watch the games. Go! So much wonderful softball to be had.