The Math Behind Your Rankings: How Tchoozit's Elo System Works
A ranking system doesn't just count wins. Behind every duel, there are probabilities, adaptive coefficients, and a dual personal and community ranking. Here's how it actually works, with the real formulas.
You click a game in a duel. The winner goes up, the loser goes down. Simple, right?
On the surface, yes. But under the hood, every click triggers a calculation that factors in the skill gap between the two games, their respective experience, the criterion you're comparing them on, and even the maturity of your own pool. The system that does all of this is called Elo, and it has a fascinating history.
Where does the Elo system come from?
In 1960, a Hungarian-born physicist named Arpad Elo proposed a rating system for chess players. His idea started from a simple observation: existing rankings counted wins and losses equally, without considering who you'd beaten. Beating the world champion and beating a beginner counted the same.
Elo proposed something smarter: a system where every player has a numerical score, and the value of a win depends on the score gap between the two players. Beating someone much stronger than you earns a lot. Beating someone much weaker earns almost nothing. And losing to someone weaker costs you dearly.
The International Chess Federation (FIDE) adopted the system in 1970, and it has since become the standard for competitive rankings. You'll find it today in chess obviously, but also in football, table tennis, and competitive video games like League of Legends, and Elo-style ratings also show up in matchmaking and recommendation systems across the web.
The core formula
The heart of the system fits in two formulas. The first calculates the expected score, which is the probability that game A beats game B:
E(A) = 1 / (1 + 10^((Elo_B - Elo_A) / 400))
It's a logistic curve. If A and B have the same Elo, the expected score is 0.5 (50/50). If A has 200 more points than B, its expected score is roughly 0.76 (76% chance of winning). At a 400-point gap, it's 0.91. The relationship is logistic, not linear: every additional 400 points of gap multiplies the odds of winning by ten.
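As a quick sanity check, the expected-score formula translates directly into code (a minimal sketch; the function name is ours):

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Probability that game A beats game B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))
```

Plugging in the gaps from above: a 200-point lead gives about 0.76, a 400-point lead about 0.91, and the two expected scores of any pair always sum to 1.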
The second formula updates the Elo after the duel:
Elo_new = Elo_old + K × (result - expected)
The result is 1 for a win, 0 for a loss. The K-factor controls how much the rating moves. A high K means each duel has a big impact. A low K means the ranking is stable and takes many duels to shift.
It's elegant because everything is symmetric. If an upset happens (a weak game beats a strong one), the movement is naturally amplified by the difference between the result (1) and the expected score (low). The system self-corrects.
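Putting the two formulas together, a full duel update fits in a few lines (a sketch with an illustrative fixed K; Tchoozit's actual K is adaptive, as explained further down):

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def play_duel(elo_winner: float, elo_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the updated (winner, loser) ratings after one duel."""
    e_win = expected_score(elo_winner, elo_loser)  # winner's expected score
    gain = k * (1.0 - e_win)                       # large when the win was unlikely
    # Symmetry: what the winner gains, the loser loses
    return elo_winner + gain, elo_loser - gain
```

With K=32, an upset (a 1200 game beating an 1800 game) moves both ratings by about 31 points, while the expected outcome (1800 beating 1200) moves them by barely 1.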
How Tchoozit adapts the system
The Elo system was designed to rank chess players on a single axis: their playing strength. Tchoozit does something different. We don't rank players, we rank games, and we do it across 11 emotional criteria.
That fundamentally changes the problem. The same game can be first in narrative and tenth in gameplay. The Witcher 3 might dominate in immersion, but Mario Kart crushes it in couch co-op fun. Each criterion has its own independent Elo universe, completely separate from the others.
Dual ranking: personal and community
Every duel updates two rankings in parallel:
- Your personal ranking: how you rank your games, based solely on your own votes. It's your opinion, quantified.
- The community ranking: the synthesis of all votes from all users. It's the collective consensus.
Both use the same Elo formula, but independently. Your personal vote updates your personal Elo, and also contributes to the community ranking pool. You can then compare the two to see where you diverge from the crowd.
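In code, the dual update is simply the same formula applied to two independent rating tables (a sketch; the names and the fixed K are our assumptions):

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def record_vote(personal: dict[str, float], community: dict[str, float],
                winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one vote to both rankings; each table evolves independently."""
    for table in (personal, community):
        e = expected_score(table[winner], table[loser])
        delta = k * (1.0 - e)
        table[winner] += delta
        table[loser] -= delta
```

Note that the same vote can move the two tables by different amounts: if the community already rates the winner well above the loser, the community delta is small even when your personal ratings were tied.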
Adaptive K-factor (11 levels)
In chess, FIDE uses three K-factor levels: K=40 for new players, K=20 for players below 2400, and K=10 for players above 2400.
At Tchoozit, we use 11 levels, calibrated on the number of duels a game has played on a given criterion:
| Duels played | K-factor | Behavior |
|---|---|---|
| 0 to 1 | 96 | Discovery: each duel has a massive impact |
| 2 to 3 | 80 | The game starts finding its zone |
| 4 to 6 | 64 | Rapid adjustment |
| 7 to 10 | 52 | Convergence |
| 11 to 15 | 44 | The ranking stabilizes |
| 16 to 25 | 36 | Moderate corrections |
| 26 to 40 | 28 | Position established |
| 41 to 60 | 22 | Fine-grained movements |
| 61 to 100 | 16 | Nearly stable |
| 101 to 200 | 12 | Very stable |
| 200+ | 10 | Anchored |
Why 11 levels instead of 3? Because the transition from "new" to "established" isn't binary. A game with 5 duels isn't in the same situation as a game with 50. The granularity allows for progressive convergence: the first duels matter enormously (K=96, nearly 10 times the final K), then the impact gradually decreases until the ranking stabilizes.
This is crucial for user experience. When you add a new game to your pool, you want it to find its place quickly. Not after 200 duels, but after 10 or 15. The high K-factor at the start makes that possible.
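The lookup itself is a simple threshold scan (a sketch; we read the "101 to 200" row as inclusive of 200, which the table leaves slightly ambiguous):

```python
# (upper bound of duel count, K-factor) pairs, following the table above
K_LEVELS = [
    (1, 96), (3, 80), (6, 64), (10, 52), (15, 44),
    (25, 36), (40, 28), (60, 22), (100, 16), (200, 12),
]

def k_factor(duels_played: int) -> int:
    """Map a game's duel count on a criterion to its K-factor."""
    for upper_bound, k in K_LEVELS:
        if duels_played <= upper_bound:
            return k
    return 10  # 200+ duels: anchored
```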
The "young pool" multiplier
There's one last adjustment that only exists in the personal ranking: the young pool multiplier.
The idea: when a user has just started and only has a few votes, each vote should count more to accelerate ranking convergence. The formula is straightforward:
multiplier = 1 + max(0, 1 - ratio / 5) × (max_multiplier - 1)

Where ratio = number of votes / pool size and max_multiplier = 2. When the ratio is zero (you just started), the multiplier is 2x. Once you reach an average of 5 votes per game, the multiplier drops back to 1x and has no further effect.
This boost doesn't apply to the community ranking, only to the personal one. It lets your personal ranking build up quickly without skewing the global ranking.
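Here is a sketch of the multiplier, with the maximum boost (2x) exposed as a parameter:

```python
def young_pool_multiplier(votes: int, pool_size: int, max_multiplier: float = 2.0) -> float:
    """Boost applied to personal-Elo updates while the pool is young.

    Decays linearly from max_multiplier (no votes yet) to 1.0
    (an average of 5 votes per game), then stays at 1.0.
    """
    ratio = votes / pool_size
    return 1.0 + max(0.0, 1.0 - ratio / 5.0) * (max_multiplier - 1.0)
```

For a pool of 10 games: 0 votes gives 2.0x, 25 votes (ratio 2.5) gives 1.5x, and 50 votes or more gives exactly 1.0x.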
Matchmaking: not just any duels
A good Elo system isn't enough if the duels are poorly chosen. Systematically pitting the top-ranked game against the bottom one doesn't produce interesting duels: the outcome is a foregone conclusion, and the Elo movement is negligible in both directions.
This is exactly the problem that Elo-based matchmaking solves in competitive games: you try to match opponents of comparable level, because that's where duels are most informative.
Tchoozit uses a similar principle. The system favors pairs of games whose Elo scores are close on the criterion of the current duel. A duel between two games at 1550 and 1580 is far more useful than a duel between a game at 1800 and one at 1200, because the former has a real chance of shifting the ranking either way.
This doesn't mean unbalanced duels never happen. The system allows them, with a lower probability. This ensures that even games at the bottom of the ranking eventually face those at the top, which matters for long-term accuracy.
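One way to implement matchmaking like this is to weight every candidate pair by how close the two Elo scores are (a sketch; the Gaussian weighting and the spread value are our assumptions, not Tchoozit's actual parameters):

```python
import math
import random

def pick_duel(elos: dict[str, float], spread: float = 200.0) -> tuple[str, str]:
    """Sample a pair of games, strongly favoring close Elo scores.

    Each pair is weighted by exp(-(gap / spread)^2): close matchups
    are most likely, but lopsided duels still occur with low probability.
    """
    games = list(elos)
    pairs = [(a, b) for i, a in enumerate(games) for b in games[i + 1:]]
    weights = [math.exp(-((elos[a] - elos[b]) / spread) ** 2) for a, b in pairs]
    return random.choices(pairs, weights=weights, k=1)[0]
```

With three games at 1550, 1580, and 1200, the close pair comes up the vast majority of the time, but the two unbalanced pairs are never fully excluded, which matches the behavior described above.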
Reliability: when the ranking becomes meaningful
An Elo ranking based on 3 duels isn't reliable. A ranking based on 50 duels per criterion starts telling a real story.
Tchoozit uses a reliability index on a scale of 0 to 9, based on the average number of duels per criterion. This determines, for instance, whether a game is eligible for comparison pages, or whether its ranking is displayed with a confidence indicator.
Reliability isn't a magic formula. It's just an honest reminder: a ranking is only as good as the data feeding it.
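The exact thresholds behind the index aren't stated here, but the general shape of such an index is easy to sketch (everything below, including every threshold value, is a hypothetical illustration, not Tchoozit's actual scale):

```python
def reliability_index(avg_duels_per_criterion: float) -> int:
    """Illustrative 0-9 reliability index; the thresholds are made up.

    The index counts how many duel-count milestones the game has passed.
    """
    thresholds = [1, 3, 6, 10, 15, 25, 40, 60, 100]
    return sum(avg_duels_per_criterion >= t for t in thresholds)
```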
Why it works
The Elo system works because it encodes a fundamental truth about human preferences: they are relative, not absolute. You don't know whether Outer Wilds is a 9.2 or an 8.7. But you know you prefer it over Firewatch for exploration, and that it's the other way around for narrative.
By accumulating these micro-decisions across 11 different axes, the system builds a nuanced portrait of your taste. Not a score, not a tier, but a profile. And that's exactly what traditional ranking systems can't do.
Curious to see the math in action? Your first duels are waiting on Tchoozit. Every click feeds the system, and your ranking builds in real time.