Why Tier Lists Are Broken (And What to Use Instead)
Tier lists are fun, but they're fundamentally broken as ranking tools. Here's why categories fail, what science says works better, and how pairwise comparison could change how we rank games.
There's a moment (and if you've ever spent more than fifteen minutes on TierMaker, you know exactly which one I'm talking about) where you're staring at your screen, three games already slotted into S tier, and you suddenly realize you have absolutely no idea what you're doing anymore. Is Hollow Knight S tier because it's genuinely one of the greatest games ever made, or because you played it during a particularly emotional summer and now your brain can't separate the game from the feeling? And more importantly: does the distinction even matter?
I've been thinking about this for a while now. Not in a "shower thought" kind of way, no. More like in the obsessive, slightly unhinged way where you start questioning the fundamental architecture of how we rank the things we love. Because here's the thing about tier lists: they're everywhere, they're fun, they generate absolutely incredible arguments in comment sections, and they are, at their core, a beautifully broken tool that we've all collectively agreed to pretend we understand.
Let me explain.
Where tier lists come from (and why it matters)
Tier lists weren't born as a meme format. They emerged from the fighting game community, specifically from the competitive Tekken and Street Fighter scenes in the early 2000s, as a genuinely useful analytical tool. The idea was simple: rank every character in a fighting game based on their competitive viability, assuming equally skilled players. S tier meant dominant. D tier meant you were making your life unnecessarily hard.
And in that specific context? Tier lists made total sense. A fighting game has a finite cast, measurable frame data, documented matchups, and thousands of tournament results to draw from. When a community of top-level players converges on a tier list after months of competition, that list actually means something. It reflects real, observable performance data.
But then something happened. Around 2018-2019, TierMaker turned the concept into a universal template. Suddenly, everyone was ranking everything: cereal brands, dog breeds, countries, every Pokémon ever made. YouTube exploded with tier list content; according to YouTube's own blog, videos with "tier list" in the title generated over a billion views in just the first half of 2025. The format had officially transcended its origins. And in doing so, it had quietly abandoned everything that made it work in the first place.
As the FGC itself would tell you, modern tier lists are often just "feelycraft bullshit" (to quote a ResetEra thread) compared to the carefully studied matchup-based rankings they evolved from.
The five ways tier lists fail you
1. Categories destroy information
The most fundamental problem with tier lists is that they replace a continuous spectrum with a handful of buckets. When you put both The Witcher 3 and Red Dead Redemption 2 in S tier, you're saying they're equivalent. But are they really? Maybe one of them is your desert-island game and the other is simply excellent. A tier list can't express that difference. It literally doesn't have the resolution for it.
This is what mathematicians call an "information loss problem." You're taking a rich, nuanced opinion and compressing it into five or six categories. It's a bit like describing a sunset using only the words "good," "meh," and "bad." Technically possible. Deeply unsatisfying.
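To make the information loss concrete, here's a tiny sketch of what bucketing does. The scores and cutoffs are made up purely for illustration: two games you feel very differently about collapse into the same label, while a smaller gap gets exaggerated into a tier boundary.

```python
# Hypothetical 0-100 "how much I love it" scores, for illustration only.
scores = {
    "The Witcher 3": 97,           # desert-island game
    "Red Dead Redemption 2": 90,   # merely excellent
    "Celeste": 88,
}

def to_tier(score):
    """Quantize a continuous score into one of five letter tiers."""
    for cutoff, tier in [(90, "S"), (75, "A"), (60, "B"), (40, "C")]:
        if score >= cutoff:
            return tier
    return "D"

tiers = {game: to_tier(s) for game, s in scores.items()}
# The Witcher 3 and RDR2 both land in S: the 7-point gap vanishes.
# Meanwhile Celeste, only 2 points below RDR2, drops a whole tier to A.
```

That's the resolution problem in two lines of output: a 7-point difference reads as "equal," and a 2-point difference reads as a category change.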
2. You can't rank 200 things in your head
Try this exercise: rank your top 50 games of all time in order. Not into tiers, but in order, from 1 to 50. You'll notice something interesting around position 15: your brain starts to genuinely struggle. Is Celeste better than Outer Wilds? What does "better" even mean when you're comparing two radically different experiences?
This isn't a personal failing. It's a well-documented cognitive limitation. Research in psychometrics has consistently shown that humans are remarkably bad at making absolute judgments across large sets. We can say "this is amazing" or "this is garbage," but the middle of the ranking turns into an agonizing soup of "uh... B tier? Or maybe A? No, wait..."
As ranking researchers Langville and Meyer put it quite directly: humans struggle to rank any set of more than about five items. Yet tier lists routinely ask us to rank dozens or even hundreds of items at once, which is basically asking your brain to do something it was never designed to do.
3. The S-tier inflation problem
Every tier list ever made has an S tier that's too big. It's a universal law, right up there with gravity and the fact that someone will always pick Oddjob in GoldenEye.
The reason is psychological: we don't want to rank things we love as "merely" A tier, because A tier implies they're not the best. So everything meaningful gets shoved into S, which completely defeats the purpose of having tiers in the first place. Some people respond by adding SS tier, or SSS tier, which is basically just admitting the system doesn't work and trying to patch it with more letters. The Smash community even had to invent a whole "God Tier" above S to handle characters like Meta Knight in Brawl.
4. One dimension can't capture a multidimensional experience
This might be the most important one, and it's the reason I started questioning the whole system in the first place.
When you tier-list your games, what are you actually ranking them by? "Overall quality"? That's not a single dimension, it's a tangled mess of gameplay feel, narrative impact, art direction, emotional resonance, soundtrack, difficulty satisfaction, social experiences, nostalgia, and probably fifteen other things your brain is processing simultaneously without telling you.
A game like Journey might be C tier for gameplay complexity but S tier for emotional impact. Dark Souls might be S tier for challenge satisfaction but D tier for accessibility. When you force both of these assessments through a single tier, you're not ranking: you're averaging. And averages, as any statistician will tell you, can be spectacularly misleading.
Kotaku's Maddy Myers wrote about this brilliantly through the lens of her experience playing Blanka, a Street Fighter character widely considered low tier, who gave her a massive competitive edge precisely because no one bothered to learn the matchup. The tier list said Blanka was bad. Reality disagreed.
5. They're frozen in time
You made a tier list in 2023. Obviously your opinions have changed: you've played new games, revisited old ones, grown as a person. But that tier list? It's still floating around the internet somewhere, still generating comments, still representing an opinion you might not even hold anymore.
Real preferences are dynamic. They shift and evolve. A static tier list is a snapshot pretending to be a portrait.
So what actually works?
This is where it gets interesting. Because the problem isn't that ranking is impossible: it's that we've been using the wrong tool.
Think about it: if I asked you "Zelda: Tears of the Kingdom or Baldur's Gate 3, which is better?", you'd probably have an immediate gut reaction. Maybe you'd think about it for a few seconds, weigh some factors, and land on an answer. That comparison, one thing against one other thing, is something your brain is incredibly good at. It's the simplest possible ranking decision: this or that.
This isn't just intuition. There's real science behind it. A 2018 study published in PLOS ONE found that people made pairwise comparison decisions faster than when using traditional rating scales, and the results were actually more reliable. The researchers noted that pairwise comparisons carry a lower cognitive load, are less susceptible to systematic bias, and require fewer participants to produce stable results.
The chess world figured this out decades ago. The Elo rating system, created by physicist Arpad Elo, doesn't ask "how good is this player on a scale of 1 to 10?" It generates rankings entirely from head-to-head matchups. Every game is a simple comparison: who played better today? Over thousands of these pairwise comparisons, a remarkably accurate and self-correcting ranking emerges. No categories. No arbitrary buckets. Just the accumulated wisdom of many small, easy decisions.
And here's the beautiful part: you don't even need to compare every possible pair. The math (specifically the Elo algorithm) can infer positions from partial information. If you know A beats B and B beats C, the system already has a pretty good idea of what A vs C would look like, even if that comparison never happened.
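Here's a minimal sketch of that idea in Python, using the standard Elo formulas. The starting rating of 1000 and K-factor of 32 are arbitrary conventional choices, not anything from a specific implementation:

```python
def expected(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings, winner, loser, k=32):
    """Shift both ratings toward the observed result of one duel."""
    e_win = expected(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e_win)
    ratings[loser] -= k * (1 - e_win)

ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0}

# A beats B, then B beats C -- A and C never meet directly.
update(ratings, "A", "B")
update(ratings, "B", "C")

# Yet the ratings already rank A above C, and the model would
# favor A in a hypothetical A-vs-C duel:
a_beats_c = expected(ratings["A"], ratings["C"])  # > 0.5
```

Note what the two `update` calls did: no category was ever assigned, yet a full ordering (A above B and C) fell out of two tiny decisions, including a prediction about a matchup that never happened.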
What this looks like for games
Now imagine applying this to your game collection. Instead of staring at a grid of 200 games trying to sort them into arbitrary letters, you're simply answering questions: "Which game had the better gameplay: this one or that one?" "Which one moved you more emotionally?" "Which one was more fun?" One comparison at a time. One criterion at a time. No more agonizing over whether something is A or S tier, because the system doesn't think in tiers.
What you'd get isn't a frozen snapshot: it's a living ranking that evolves with every new comparison. A ranking that can tell you not just that you loved Elden Ring, but why (maybe it ranks first for challenge, third for atmosphere, and fifteenth for story). That kind of multidimensional picture is something no tier list will ever give you.
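Mechanically, a multidimensional ranking like that is just one Elo table per criterion. Here's a hedged sketch of the idea; the criteria names and games are illustrative, not any product's actual list:

```python
from collections import defaultdict

K = 32  # conventional Elo K-factor, an arbitrary choice here

def expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# One independent rating table per emotional criterion,
# every game starting at 1000.
ratings = defaultdict(lambda: defaultdict(lambda: 1000.0))

def duel(criterion, winner, loser):
    """Record one 'which game wins on this criterion?' answer."""
    table = ratings[criterion]
    e = expected(table[winner], table[loser])
    table[winner] += K * (1 - e)
    table[loser] -= K * (1 - e)

# The same game can win on one axis and lose on another.
duel("challenge", "Dark Souls", "Journey")
duel("emotional impact", "Journey", "Dark Souls")

# Per-criterion leaderboards, highest rating first.
boards = {
    criterion: sorted(table, key=table.get, reverse=True)
    for criterion, table in ratings.items()
}
```

Because each criterion keeps its own table, Dark Souls can sit at the top for challenge while Journey tops emotional impact: nothing forces the two judgments through a single average.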
This is exactly what we built Tchoozit on. The idea was born from this frustration: yet another tier list in front of me, yet again that feeling that it captured absolutely nothing about why I actually loved these games. So we went the other way: instead of asking you to sort 200 games into five buckets, Tchoozit shows you two games and one emotional criterion, and asks you to pick. That's it. One duel at a time. The Elo algorithm does the rest, and over time, a rich multidimensional ranking emerges: one that tells you why you love what you love, not just that you love it.
It's still early, and the platform still has a lot to build. But the core idea, that pairwise emotional duels produce more honest rankings than any tier list, is something we're pretty confident about. The science backs it up. And most importantly, it just feels right when you use it.
The real question
I'm not saying tier lists should disappear. As a conversation starter, a meme format, or a way to generate delightful arguments about whether pineapple pizza deserves to exist (answer: no), they're great. They're fun precisely because they're reductive and controversial.
But if you actually care about understanding your own taste? If you want to know not just what you like, but the precise emotional fingerprint of why you like it? Then maybe, just maybe, it's time to stop dragging icons into colored rows and start ranking differently.
Your brain already knows how to do it. It just needs a system that speaks its language.
What's your experience with tier lists? Have you ever spent an unreasonable amount of time agonizing over a placement, wondering if the whole system was just broken? I'd love to talk about it.