Predicting the winner of the Big Dance, the annual NCAA basketball tournament, is something I attempt every year and usually fail at miserably. That is because most of my predictions are based on hunches, seedings and the longing for a Cinderella story. I'm a casual fan who enjoys the sport, but not so enthralled that I ever get more than an average score on ESPN's Tournament Challenge.

This year, I'm going to look at it purely by the numbers. While this might seem a fairly obvious thing to do, I think it may throw up some surprises, and I'm keen to see how it compares to my usual 'hunch'-based picks.

The stats

Choosing which stats to use in a sport that is full of them is potentially a minefield, and I could get bogged down in hundreds of tiny nuances that often decide winners and losers. Instead, I'm going to use the four stats that Dean Oliver's book 'Basketball on Paper' identifies as key to winning games, often called the 'Four Factors': shooting, turnovers, rebounding and free throws.

Using the stats and formulas outlined in this article, calculating these roughly requires:

  • Field goals attempted and made
  • Turnovers
  • Free throws attempted and made
  • Offensive and defensive rebounds

Each stat is needed both offensively and defensively.
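As a rough sketch of what those raw totals feed into, here are Oliver's commonly cited offensive definitions written in Python. The article's own definitions of the shooting, turnover and free throw scores may differ, so treat these as assumptions. The `TeamStats` class and its field names are hypothetical, and the effective field goal formula also needs three-pointers made, which the list above doesn't include. Rebounding is left out because its formula needs the opponent's numbers.

```python
from dataclasses import dataclass


@dataclass
class TeamStats:
    """Season totals for one team (hypothetical field names)."""
    fga: int   # field goals attempted
    fgm: int   # field goals made
    fg3m: int  # three-pointers made (assumption: needed for eFG%)
    tov: int   # turnovers
    fta: int   # free throws attempted
    ftm: int   # free throws made


def shooting_score(t: TeamStats) -> float:
    # Effective field goal %: a made three counts as 1.5 made field goals.
    return (t.fgm + 0.5 * t.fg3m) / t.fga


def turnover_score(t: TeamStats) -> float:
    # Turnovers per possession; 0.44 * FTA estimates possessions ended by free throws.
    # In Oliver's framework a lower value is better.
    return t.tov / (t.fga + 0.44 * t.fta + t.tov)


def ft_score(t: TeamStats) -> float:
    # Free throw rate: free throws made per field goal attempt.
    return t.ftm / t.fga


team = TeamStats(fga=1000, fgm=450, fg3m=100, tov=200, fta=400, ftm=280)
print(shooting_score(team), ft_score(team))  # 0.5 0.28
print(round(turnover_score(team), 3))        # 0.145
```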

The method

Grabbing these stats from the NCAA's official stats pages is fairly straightforward for the offensive numbers. However, the number of free throws a team's opponents attempted is not available, so I've roughly approximated it as the number of fouls the team committed.

Rebounds are also hard to get defensively, i.e. the number of defensive and offensive rebounds a particular team has faced is not available.

When determining a matchup between two teams, A and B, the following formula is used. The score for team A is defined as:

adjusted_fg = (shooting_score(A) + shooting_score_defense(B)) / 2
where the shooting score is defined as in the article mentioned above.

adjusted_turnovers = (turnover_score(A) + turnover_score(B)) / 2
where the turnover score is defined as in the article above.

rebounds = O rebounds (A) / (O rebounds(A) + D rebounds(B))

adjusted_free_throws = (ft_score(A) + ft_score_defense(B)) / 2
where ft_score is defined in the article, except that the defensive free throw score (ft_score_defense) estimates opponents' free throws made as fouls committed multiplied by 0.69 (the average rate at which free throws are made in the NCAA).
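A minimal sketch of the matchup components in Python, following the formulas above. The dictionary keys (`shooting`, `shooting_defense`, `turnover`, `ft`, `orb`, `drb`, `fouls`, `opp_fga`) are hypothetical names for precomputed season numbers, and dividing the estimated opponent free throws made by opponent field goal attempts is an assumption that mirrors the offensive free throw rate.

```python
FT_MADE_RATE = 0.69  # NCAA average quoted in the post


def average(a: float, b: float) -> float:
    # Blend two numbers with equal weight.
    return (a + b) / 2


def rebound_share(orb_a: int, drb_b: int) -> float:
    # A's offensive rebounds as a share of A's offensive plus B's defensive rebounds.
    return orb_a / (orb_a + drb_b)


def ft_score_defense(fouls: int, opp_fga: int) -> float:
    # Opponent FTs made are not published, so estimate them as fouls * 0.69
    # (assumption: normalised by opponent field goal attempts).
    return fouls * FT_MADE_RATE / opp_fga


def matchup_components(a: dict, b: dict) -> dict:
    """Adjusted inputs for team A playing team B (hypothetical keys)."""
    return {
        "adjusted_fg": average(a["shooting"], b["shooting_defense"]),
        # As written above: A's and B's turnover scores are averaged.
        "adjusted_turnovers": average(a["turnover"], b["turnover"]),
        "rebounds": rebound_share(a["orb"], b["drb"]),
        "adjusted_free_throws": average(
            a["ft"], ft_score_defense(b["fouls"], b["opp_fga"])
        ),
    }


team_a = {"shooting": 0.50, "turnover": 0.15, "ft": 0.30, "orb": 300}
team_b = {"shooting_defense": 0.46, "turnover": 0.17, "drb": 700,
          "fouls": 500, "opp_fga": 1000}
print(matchup_components(team_a, team_b))
```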

The components are then weighted as follows:

(adjusted_fg * 40) + (adjusted_turnovers * 25) + (rebounds * 20) + (adjusted_free_throws * 15)
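The weighting itself is a plain weighted sum. A minimal Python sketch, where the `WEIGHTS` mapping and the component key names are hypothetical; because the sum adds the turnover term, it assumes the turnover score is oriented so that a higher value is better for team A.

```python
WEIGHTS = {
    "adjusted_fg": 40,
    "adjusted_turnovers": 25,
    "rebounds": 20,
    "adjusted_free_throws": 15,
}


def weighted_score(components: dict) -> float:
    """Weighted sum of the four adjusted components for one team."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


example = {
    "adjusted_fg": 0.48,
    "adjusted_turnovers": 0.16,
    "rebounds": 0.30,
    "adjusted_free_throws": 0.3225,
}
print(round(weighted_score(example), 4))  # 34.0375
```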

The result

The first pass was interesting: it had Michigan State winning the whole thing, but it also had UC Irvine making the Final Four and Michigan (a 2 seed) and Virginia (a 1 seed) going out in the first round. I played with the weightings a little, but nothing helped, so I added a weighting for strength of schedule. This made a tonne of sense, because the college system means teams can post really good stats without ever playing a decent opponent. The new formula is now:

((adjusted_fg * 40) + (adjusted_turnovers * 25) + (rebounds * 20) + (adjusted_free_throws * 15)) * sos
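Adding strength of schedule turns the sum into a multiplier. The sketch below is an illustration under assumptions: `sos` is taken to be a positive rating where a higher value means a tougher schedule (a rating centred on zero, as some SOS measures are, would need rescaling before multiplying), and `predict_winner` is a hypothetical helper that simply picks the higher adjusted score.

```python
WEIGHTS = {
    "adjusted_fg": 40,
    "adjusted_turnovers": 25,
    "rebounds": 20,
    "adjusted_free_throws": 15,
}


def sos_adjusted_score(components: dict, sos: float) -> float:
    # Four-factor weighted sum, scaled by strength of schedule (assumed positive).
    base = sum(weight * components[name] for name, weight in WEIGHTS.items())
    return base * sos


def predict_winner(name_a: str, comps_a: dict, sos_a: float,
                   name_b: str, comps_b: dict, sos_b: float) -> str:
    # Higher adjusted score advances; ties go to team A in this sketch.
    score_a = sos_adjusted_score(comps_a, sos_a)
    score_b = sos_adjusted_score(comps_b, sos_b)
    return name_a if score_a >= score_b else name_b


same_numbers = {"adjusted_fg": 0.48, "adjusted_turnovers": 0.16,
                "rebounds": 0.30, "adjusted_free_throws": 0.3225}
# Identical stats, but team B faced the tougher schedule.
print(predict_winner("Team A", same_numbers, 0.9, "Team B", same_numbers, 1.1))  # Team B
```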

This seems much better, so the final prediction is Duke (unsurprisingly):