
Adjusting the projections with a Bayesian prior

H-scoring as described by the papers relies entirely on a single set of projections. If a drafter makes an early pick of a player the algorithm projects to be a poor performer, the algorithm will not "doubt itself" and consider the possibility that its projections for that player are too low. It will assume the pick was a poor choice and that the drafter who made it will have a bad team.

This inability to doubt itself makes the algorithm overconfident: it believes its own team is very strong, even though its own projections are not necessarily better than those implicitly used by other drafters. In practice, this can lead the algorithm to conclude that its team is so strong that the only way to improve is to "un-punt" categories it has already given up on, which is probably a bad idea.

The papers assume that player projections are all known and agreed upon by all the drafters, so they don't address this issue. However, it is so important in practice that I've added a module specifically to handle it.

The adjustment

An adjustment is made to the algorithm's assessment of its team's strength for any pick after the first.

Say that \(w\) is the vector of the algorithm's naive guesses at how likely it is to win each category, computed before performing gradient descent to optimize a future strategy. Corrected versions are calculated as

\[ w^* = \left[ I_{n \times n} + \frac{\beth}{ n^2}\mathbf{1}_{n \times n} \right]^{-1} \left[ w + \frac{\beth}{2n} \mathbf{1}_n \right ] \]

Where \(n\) is the number of categories and \(\beth\) is a parameter. What this expression is doing is not immediately obvious, but some intuition can be gleaned from the justification in the following section.
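
To make the formula concrete, here is a minimal numpy sketch of the adjustment. The function name, the example inputs, and the choice of \(\beth\) are all hypothetical; only the matrix expression itself comes from above.

```python
import numpy as np

def adjust_win_rates(w, beth):
    """Shrink naive category win rates toward an average of 50%.

    Computes w* = [I + (beth / n^2) * 1_{n x n}]^{-1} [w + (beth / 2n) * 1_n].
    """
    w = np.asarray(w, dtype=float)
    n = len(w)
    lhs = np.eye(n) + (beth / n**2) * np.ones((n, n))
    rhs = w + beth / (2 * n)
    return np.linalg.solve(lhs, rhs)

# Hypothetical example: nine categories, every naive win rate at 70%.
# With beth = 9, each adjusted rate comes out to exactly 0.6, pulled toward 50%.
print(adjust_win_rates(np.full(9, 0.7), beth=9.0))
```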

These corrected win rates are then used to reverse engineer an adjusted expectation of the team's current strength, like so:

\[ x^* = \text{CDF}^{-1} \left( w^* \right) \]

This way, as the punting strategy changes, the algorithm's opinion of its own team does not change. Re-adjusting the win rates for every iteration of the algorithm based on the current expected win rates would implicitly change the algorithm's opinion of its pre-existing team based on its strategy for the future, which does not make much sense.
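
For completeness, here is a sketch of the reverse-engineering step. The standard Normal inverse CDF is purely a stand-in assumption; whatever CDF the model actually uses to translate team strength into win probability belongs in its place, and the \(w^*\) values shown are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Adjusted win rates produced by the previous step (illustrative values only).
w_star = np.array([0.55, 0.62, 0.48, 0.58, 0.51, 0.60, 0.45, 0.66, 0.40])

# x* = CDF^{-1}(w*); a standard Normal is assumed here as a placeholder for
# the model's actual CDF mapping team strength to category win probability.
x_star = norm.ppf(w_star)
```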

Justification

Say that we have prior expectations that

  • Our average win rate across all categories is approximately 50%, with Normally distributed error.
  • Our guesses for how often we will win a category are unbiased, but have some Normally distributed error.

This information provides a Bayesian framework for re-calculating adjusted category-level win rates.

By Bayes' rule, the probability of a certain set of category win rates being correct is proportional to its likelihood times the prior. In this case, the likelihood is

\[ \prod_c \phi \left( \frac{w^*_c - w_c}{\epsilon_a} \right) \]

And the prior probability is

\[ \phi \left( \frac{\frac{ \sum_c \left( w^*_c - \frac{1}{2} \right)}{n}}{\epsilon_b} \right) = \phi \left( \frac{ \sum_c \left( w^*_c - \frac{1}{2} \right)}{\epsilon_b n} \right) \]

Multiplying them together yields

\[ \left[ \prod_c \phi \left(\frac{w^*_c - w_c}{\epsilon_a} \right) \right] \left[ \phi \left(\frac{ \sum_c \left( w^*_c - \frac{1}{2} \right)}{\epsilon_b n } \right) \right] \]

We are only interested in which win rates maximize this expression, not in its value. So it is fine to take the logarithm; since the Gaussian log-densities are, up to constants, \(-\frac{1}{2}\) times squared terms, maximizing the posterior is equivalent to minimizing

\[ \left[ \sum_c \left(\frac{w^*_c - w_c}{\epsilon_a} \right)^2 \right] + \left(\frac{ \sum_c \left( w^*_c - \frac{1}{2} \right)}{\epsilon_b n} \right)^2 \]
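
This objective can also be written directly in code; a small sketch, where eps_a and eps_b stand in for hypothetical values of \(\epsilon_a\) and \(\epsilon_b\) and the function name is my own.

```python
import numpy as np

def neg_log_posterior(w_star, w, eps_a, eps_b):
    """Penalized sum of squares from the expression above.

    Minimizing this over w_star maximizes the posterior, since the Gaussian
    log-densities differ from these squared terms only by constants and a
    factor of -1/2.
    """
    w_star = np.asarray(w_star, dtype=float)
    n = len(w)
    likelihood_term = np.sum(((w_star - w) / eps_a) ** 2)
    prior_term = (np.sum(w_star - 0.5) / (eps_b * n)) ** 2
    return likelihood_term + prior_term
```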

To minimize this, we set the derivative to zero. Applying the chain rule for category \(d\):

\[ 0 = 2 \left(\frac{w^*_d - w_d}{\epsilon_a} \right) \frac{1}{\epsilon_a} + 2 \left(\frac{ \sum_c \left( w^*_c - \frac{1}{2} \right)}{\epsilon_b n} \right) \frac{1}{\epsilon_b n} \]

Isolating \(w^*_d\):

\[ 2 \left(\frac{w_d}{\epsilon_a^2} \right) - 2 \left(\frac{ \sum_{c \neq d} \left( w^*_c \right) - \frac{n}{2}}{\epsilon_b n} \right) \frac{1}{\epsilon_b n}= 2 \left(\frac{w^*_d}{\epsilon_a} \right) \frac{1}{\epsilon_a} + 2 \frac{w^*_d}{\epsilon_b^2 n^2} \]
\[ 2 \left(\frac{w_d}{\epsilon_a^2} \right) - 2 \left(\frac{ \sum_{c \neq d} \left( w^*_c \right) - \frac{n}{2}}{\epsilon_b n} \right) \frac{1}{\epsilon_b n}= w^*_d \left( 2 \frac{1}{\epsilon_a^2} + 2 \frac{1}{\epsilon_b^2 n^2} \right) \]

So

\[ w^*_d = \frac{\frac{w_d}{\epsilon_a^2} - \left(\frac{ \sum_{c \neq d} \left( w^*_c \right) - \frac{n}{2}}{\epsilon_b^2 n^2} \right) }{\frac{1}{\epsilon_a^2} + \frac{1}{\epsilon_b^2 n^2}} \]

With \(\beth = \frac{\epsilon_a^2}{\epsilon_b^2}\), this is

\[ w^*_d = \frac{w_d - \beth \left(\frac{ \sum_{c \neq d} \left( w^*_c \right) - \frac{n}{2}}{ n^2} \right) }{1 + \frac{\beth}{ n^2}} \]

This expression is the best one for gleaning intuition behind the adjustment. When the average win rate is high, a larger quantity is subtracted out from all the win rates. If the win rates are all 50%, the numerator becomes \(\frac{1}{2} + \frac{\beth}{2n^2}\), which cancels with the denominator \(1 + \frac{\beth}{n^2}\) and keeps the win rates at 50%. Higher values of \(\beth\) increase the importance of the distortion term and decrease the importance of the original win rate.

While relatively interpretable, this expression unfortunately cannot be used directly, because all of the \(w^*_c\) values are unknowns. Some linear algebra with the vector forms of \(w\) and \(w^*\) is required.

With \(J\) as the matrix with \(0\) on the diagonal and \(1\) on all off-diagonal entries, the equation can be written

\[ w^* = \frac{w - \frac{\beth J_{n \times n} w^*}{n^2} + \frac{\beth}{2n}\mathbf{1}_n }{\left( 1 + \frac{\beth}{ n^2} \right)} \]

Or

\[ \left( 1 + \frac{\beth}{ n^2}\right) I_{n \times n} w^* = w - \frac{\beth J_{n \times n} w^*}{n^2} + \frac{\beth}{2n} \mathbf{1}_n \]

Isolating \(w^*\) yields

\[ \left[ \left( 1 + \frac{\beth}{ n^2}\right) I_{n \times n} + \frac{\beth}{ n^2} J_{n \times n} \right] w^* = w + \frac{\beth}{2n} \mathbf{1}_n \]

Since \(I_{n \times n} + J_{n \times n} = \mathbf{1}_{n \times n}\), the \(J\) can be simplified out

\[ \left[ I_{n \times n} + \frac{\beth}{ n^2}\mathbf{1}_{n \times n} \right] w^* = w + \frac{\beth}{2n} \mathbf{1}_n \]

Finally, the matrix can be inverted to yield an expression for \(w^*\)

\[ w^* = \left[ I_{n \times n} + \frac{\beth}{ n^2}\mathbf{1}_{n \times n} \right]^{-1} \left[ w + \frac{\beth}{2n} \mathbf{1}_n \right ] \]
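
As a sanity check, the sketch below verifies numerically that this closed form satisfies the per-category fixed-point equation derived earlier. The values of \(n\), \(\beth\), and \(w\) are arbitrary.

```python
import numpy as np

n, beth = 9, 4.0
w = np.array([0.55, 0.70, 0.40, 0.65, 0.50, 0.60, 0.45, 0.80, 0.35])

# Closed form: w* = [I + (beth / n^2) * 1_{n x n}]^{-1} [w + (beth / 2n) * 1_n]
w_star = np.linalg.solve(np.eye(n) + (beth / n**2) * np.ones((n, n)),
                         w + beth / (2 * n))

# Each component should satisfy
#   w*_d = (w_d - beth * (sum_{c != d} w*_c - n/2) / n^2) / (1 + beth / n^2)
for d in range(n):
    denom = 1 + beth / n**2
    expected = (w[d] - beth * (w_star.sum() - w_star[d] - n / 2) / n**2) / denom
    assert np.isclose(w_star[d], expected)
```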