Usage
Installation
To use pairwise-ranking, first install it using pip:
$ pip install pairwise-ranking
Loading data
Match data may be imported from a variety of formats:
.gml files, adjacency matrices, and lists of matches.
The function ranking.read_match_list() attempts to detect which of these formats a file uses and imports the data accordingly.
Example data sets are provided in the data folder and are cited in Data sets.
- ranking.read_match_list()
Read list of matches from a file, attempting to detect the appropriate file format among gml, match_list, or adjacency_matrix formats.
For a specific file format, the more specific functions can be used:
- ranking.read_match_list_from_match_list()
Read list of matches from a file where each line represents a match in the format “winner loser”.
- Parameters:
filename (str) – Input filename
- Raises:
AssertionError – If a line is not in “winner loser” format. Labels should not themselves include spaces.
- ranking.read_match_list_from_gml()
Read list of matches as the edges in a gml network file.
- ranking.read_match_list_from_adj_matrix()
Read list of matches from an adjacency matrix which counts the number of times each player beats another. Because the matrix carries no labels, players are assigned labels of the form player_i.
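To illustrate the two simplest input formats, here is a minimal, library-independent sketch of how “winner loser” lines and an adjacency matrix could each be turned into a list of match dicts. It is not the package's implementation. The helper names, the dict keys "winner" and "loser", the zero-based player_i labels, and the row-beats-column orientation of the matrix are all assumptions made for this example.

```python
def parse_match_lines(lines):
    """Parse 'winner loser' lines into a list of match dicts (hypothetical helper)."""
    matches = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        # Labels may not contain spaces, so each line must split into exactly two fields.
        assert len(fields) == 2, (
            f"line {line_number} is not in 'winner loser' format: {line!r}"
        )
        winner, loser = fields
        matches.append({"winner": winner, "loser": loser})
    return matches


def matches_from_adjacency(matrix):
    """Expand a matrix of win counts into one match dict per win (hypothetical helper).

    Assumption: matrix[i][j] is the number of times player i beat player j,
    and players are labeled player_0, player_1, ...
    """
    matches = []
    for i, row in enumerate(matrix):
        for j, wins in enumerate(row):
            for _ in range(wins):
                matches.append({"winner": f"player_{i}", "loser": f"player_{j}"})
    return matches


text_matches = parse_match_lines(["alice bob", "bob carol", "alice carol"])
matrix_matches = matches_from_adjacency([[0, 2], [1, 0]])
```

Both helpers produce the same shape of data, a list with one dict per match, which is the form the inference functions are described as taking.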
Inference
For the models implemented in this package, described in models, point estimates of the strength scores can be found with the function ranking.scores().
- ranking.scores()
Get fitted scores of players in a given model.
- Parameters:
match_list (list) – List of matches, each represented by a dict of the winner and loser.
model_name (str, optional) – Model used for fitting. Defaults to ‘depth_and_luck’. Options: {‘depth_and_luck’, ‘depth_only’, ‘luck_only’, ‘logistic_prior’}.
num_samples (int, optional) – Number of samples used per chain for MCMC sampling, defaults to 5000
num_chains (int, optional) – Number of chains used for MCMC sampling, defaults to 4
force_mode (str or None, optional) – Force the point estimate mode to be either ‘average’ or ‘MAP’. If not set, ‘MAP’ is used when more than 1000 players are present.
- Returns:
pandas DataFrame with columns of the labels, inferred scores, and score errors for each player.
- Return type:
DataFrame
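As a rough usage sketch, the call below wraps ranking.scores() with the documented keyword arguments and a lighter MCMC run for quick experiments. The import name ranking is inferred from the function names on this page, the dict keys "winner" and "loser" are an assumption, and fit_scores and its quick flag are hypothetical names introduced here. The import sits inside the function so that the snippet can be read and loaded without the package installed.

```python
def fit_scores(match_list, quick=True):
    """Fit player scores, optionally with fewer samples and chains (hypothetical wrapper)."""
    import ranking  # assumed import name for the pairwise-ranking package

    kwargs = {"model_name": "depth_and_luck"}
    if quick:
        # Smaller than the documented defaults (5000 samples, 4 chains);
        # faster, at the cost of noisier estimates.
        kwargs.update(num_samples=1000, num_chains=2)
    return ranking.scores(match_list, **kwargs)


# Toy data; the key names are an assumption about the expected dict layout.
match_list = [
    {"winner": "alice", "loser": "bob"},
    {"winner": "alice", "loser": "carol"},
    {"winner": "bob", "loser": "carol"},
]

# With the package installed, the fit would be run as:
# scores = fit_scores(match_list)
```

Passing quick=False leaves num_samples and num_chains at their documented defaults.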
The rankings implied by a match_list, listed in decreasing order of the score estimates, can be found with ranking.ranks():
- ranking.ranks()
Find the rankings of the players according to a specified model.
- Parameters:
match_list (list) – List of matches, each represented by a dict of the winner and loser.
model_name (str, optional) – Model used for fitting. Defaults to ‘depth_and_luck’. Options: {‘depth_and_luck’, ‘depth_only’, ‘luck_only’, ‘logistic_prior’}.
num_samples (int, optional) – Number of samples used per chain for MCMC sampling, defaults to 5000
num_chains (int, optional) – Number of chains used for MCMC sampling, defaults to 4
force_mode (str or None, optional) – Force the point estimate mode to be either ‘average’ or ‘MAP’. If not set, ‘MAP’ is used when more than 1000 players are present.
- Returns:
Ranked list of the players in descending order of strength.
- Return type:
We can also infer the probability that one player will beat another:
- ranking.probability()
Inferred probability that one player will beat another, according to match_list data.
- Parameters:
match_list (list) – List of matches, each represented by a dict of the winner and loser.
winner_label (str) – Label of the desired winner
loser_label (str) – Label of the desired loser
model_name (str, optional) – Model used for fitting. Defaults to ‘depth_and_luck’. Options: {‘depth_and_luck’, ‘depth_only’, ‘luck_only’, ‘logistic_prior’}.
num_samples (int, optional) – Number of samples used per chain for MCMC sampling, defaults to 5000
num_chains (int, optional) – Number of chains used for MCMC sampling, defaults to 4
force_mode (str or None, optional) – Force the point estimate mode to be either ‘average’ or ‘MAP’. If not set, ‘MAP’ is used when more than 1000 players are present.
- Raises:
AssertionError – If winner_label or loser_label is not present in match_list.
- Returns:
Tuple of the inferred probability and the error in the estimation.
- Return type:
Sampling
We implement a wrapper for Hamiltonian Monte Carlo (HMC) sampling via pystan for the models considered in this package:
- ranking.samples()
Get MCMC samples from the model fit to a match_list.
- Parameters:
match_list (list) – List of matches, each represented by a dict of the winner and loser.
model_name (str, optional) – Model used for fitting. Defaults to ‘depth_and_luck’. Options: {‘depth_and_luck’, ‘depth_only’, ‘luck_only’, ‘logistic_prior’}.
num_samples (int, optional) – Number of samples used per chain for MCMC sampling, defaults to 10000
num_chains (int, optional) – Number of chains used for MCMC sampling, defaults to 4
- Returns:
pandas DataFrame containing sampled draws of scores and relevant parameters.
- Return type:
DataFrame
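One possible way to work with the returned draws is to reduce each column to a mean and a standard deviation. The sketch below does this with the standard library only. summarize_draws and sample_and_summarize are hypothetical helpers, the column names in the toy data are invented, and it is assumed that every column of the samples DataFrame is numeric. The conversion uses pandas' DataFrame.to_dict(orient="list").

```python
from statistics import mean, stdev


def summarize_draws(draws):
    """Return {column: (mean, sample standard deviation)} for a mapping of draws."""
    return {name: (mean(values), stdev(values)) for name, values in draws.items()}


def sample_and_summarize(match_list):
    """Draw samples with ranking.samples() and summarize each column (hypothetical wrapper)."""
    import ranking  # assumed import name for the pairwise-ranking package

    samples = ranking.samples(
        match_list, model_name="depth_and_luck", num_samples=2000, num_chains=4
    )
    # Assumption: all columns hold numeric draws.
    return summarize_draws(samples.to_dict(orient="list"))


# Toy draws standing in for sampled columns; the names are illustrative only.
toy_draws = {"alice": [1.0, 2.0, 3.0], "bob": [0.0, 0.0, 3.0]}
summary = summarize_draws(toy_draws)
```

The import of ranking is deferred to the wrapper so that summarize_draws can be tried on any mapping of draws without running the sampler.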
These samples may also be used to visualize the posterior distribution of the depth and luck in the full model using matplotlib.pyplot:
- ranking.draw_depth_and_luck_posterior()
Draw the posterior distribution of the luck and depth parameters from sampled values of the depth_and_luck model.