In a world where everything from placement in a Google search result to World Cup eligibility depends on ranking and numerical ratings of some kind, it is becoming increasingly important to analyze the algorithms and techniques that underlie such ranking methods in order to ensure fairness, eliminate bias, and tailor them to specific applications.
In a paper published this month in the SIAM Journal on Scientific Computing, authors Timothy Chartier, Erich Kreutzer, Amy Langville, and Kathryn Pedings mathematically analyze three commonly-used ranking methods. “We studied the sensitivity and stability of three popular ranking methods: PageRank, which is the method Google has used to rank web pages, and the Colley and Massey methods, which have been used by the Bowl Championship Series to rank U.S. college football teams,” explains Langville.
All three methods analyzed – the Colley and the Massey ranking techniques and the Markov web page rankings—which is a generalized version of PageRank—are linear algebra-based with simple elegant formulations. Here, the authors apply a modified version of PageRank to a sports season.
“Both web page authors and teams sometimes try to game, or spam, ranking systems to achieve a higher ranking. For instance, web page authors try to modify their incoming and outgoing links while teams try to run up the score against weak opponents,” says Langville, pointing out the significance of studying such methods. “Mathematically, such spamming can be viewed as changes to the input data required by the ranking method.”
Most methods, including the aforementioned three, produce “ratings” of numerical scores for each team, which represents their playing ability. When sorted, these ratings produce ranks with integer values for each team, simply representing a numerical listing of the teams based on their rating.
In the first step of their analysis, the authors assume a simple rating scheme with constant difference of 1 in scores and apply it to a perfect sports season. In a perfect season, each team plays every other team only once and there are no upset victories or losses. In such an ideal scenario, a highly-ranked team would always beat a lower-ranked team. Thus, in a system with teams numbered 1 through 4 for their ranks, team 1 would beat all other teams; team 2 would beat teams 3 and 4, and lose to 1; team 3 would beat team 4, losing to teams 1 and 2; and team 4 would lose to all other teams. They then compute the output rating for each of the three methods and compare them to the input rating.
The three methods are applied to this ideal data, and all three methods recover the input ranking. However, while the Colley and Massey methods produce ratings that are uniformly spaced as would be desirable in a rating system, the Markov method, produces non-uniformly spaced ratings.
The authors analyze the sensitivity of the methods to small perturbations and determine how much the rating and ranking is affected by these changes. If, for instance, small changes in input data cause large changes in the output ratings, the method is considered sensitive. Similar discrepancies in the input and output ranking data would show instability of the ranking method.
The authors conclude that while the Colley and Massey methods are insensitive to small changes, the Markov method (or Page Rank method) is highly sensitive to such changes, often resulting in anomalies in rankings. For instance, there are cases of a single upset in a perfect season resulting in rearrangements of rankings for all teams because of the Markov method’s high sensitivity. In these cases, the Colley and Massey methods would have an isolated response, resulting in changes to the rankings of only the two teams in question.
In addition, the sensitivity of the PageRank or Markov method gets more pronounced further down in the rankings. “The PageRank vector is quite sensitive to small changes in the input data. Further, this sensitivity increases as the rank position increases,” Langville explains. “In other words, values in the tail (low-ranked positions) of the PageRank vector are extremely sensitive, which calls into question PageRank’s use to produce a full ranking, as opposed to a simply top-k ranking. It also partially explains PageRank’s susceptibility to spam. On the other hand, the Colley and Massey methods are stable throughout the entire ranking.”
PageRank has recently evolved from being used exclusively for web pages to rank various entities, from species to social networks, reinforcing the ubiquity of these ranking systems.
But the stability displayed by the Colley and Massey methods in this study shows that these two methods would perhaps be effective even in ranking other entities, such as web pages and movies, though originally conceived for sports rankings.
“As future work, we are exploring the use of the Colley and Massey methods in other settings beyond sports. For example, we have found that these two methods are more appropriate than PageRank for ranking in social networks such as Twitter,” says Langville.
While ranking methods can be applied to a wide range of areas, modifications are often required in order to translate a particular method to suit a specific application, making analyses of sensitivity and stability that much more important.
Timothy P. Chartier, Erich Kreutzer, Amy N. Langville, and Kathryn E. Pedings
SIAM Journal on Scientific Computing, 33 (2011) pp 1077-1102
Online publish date: May 5, 2011