The five-star review system is broken, and we’ve known it for years. Exhibit 472,304 in the ongoing case against unreasonable rating scales comes from Terry Godier, the creator of a new RSS reader called Current. Godier recently highlighted an unavoidable and unsolvable problem: in the five-star review system, anything below five is a disaster, so what are we even doing here?
Godier’s observation is painfully astute. He noticed that many of his app’s reviews are four stars, yet the accompanying comments are effusive with praise. Users write things like, “This is my favorite app!” or “Gamechanger!” but they give only four stars. The intent is clearly positive, but the result is that these reviews actively harm the app’s average rating. Current is hovering above 4.0 in the App Store, but those four-star reviews are dragging it down numerically, even though the sentiment is overwhelmingly favorable.
The psychology of review inflation
This phenomenon isn’t unique to Current or even to RSS readers. It’s a widespread issue across all platforms that rely on the five-star system: Amazon, Yelp, Google Play, the App Store, and countless others. The problem stems from a simple psychological bias: many people rate products they love highly but are reluctant to give a perfect score. Four stars feels like a strong endorsement. It says, “I like this very much, but nothing is perfect.” Yet in a system where five stars is the only truly positive rating, anything less is effectively a demerit.
Research in behavioral economics has shown that the five-star scale is not an interval scale; it’s a compressed ordinal scale. Users interpret the stars differently depending on context. For example, a three-star rating on Amazon often means “average” or “mediocre,” but on Uber, a three-star rating is a clear sign of dissatisfaction. This inconsistency makes the system unreliable for both consumers and creators.
The cost of a four-star review
Let’s do the math. Suppose an app has 100 reviews, all of them five stars. The average is 5.0. Now one user leaves a four-star review. The average drops to 4.99, which is negligible. But the fewer reviews an app has, the more a single four-star counts, because each review makes up a larger share of the total. For an app with 10 reviews, a single four-star among them brings the average to 4.9. For an app with only five reviews, the same four-star drops the average to 4.8.
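The arithmetic is easy to check. Here is a minimal Python sketch, assuming every other review is five stars; the function name `average_with_four_stars` is made up for illustration and does not reflect how any app store computes its displayed rating.

```python
def average_with_four_stars(five_star_count: int, four_star_count: int = 1) -> float:
    """Plain mean rating for a mix of five-star and four-star reviews."""
    total_reviews = five_star_count + four_star_count
    total_stars = 5 * five_star_count + 4 * four_star_count
    return total_stars / total_reviews


# One four-star review added to 100, 9, and 4 five-star reviews.
for five_stars in (100, 9, 4):
    total = five_stars + 1
    average = average_with_four_stars(five_stars)
    print(f"{total} reviews: {average:.2f}")
```

The smaller the pile of reviews, the further one four-star moves the number: 4.99 with 101 reviews, 4.90 with 10, and 4.80 with five.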
But the real trouble begins when users themselves feel compelled to rate in a way that does not match their sentiment. The four-star review from a happy user is a paradox: they want to recommend the app, but they withhold the final star because of some minor quibble, or because they simply never give anything five stars. They may think they’re being honest, but they’re actually hurting the app’s discoverability and perceived quality. App stores often use average ratings to determine rankings, visibility, and even whether an app is featured. A 4.2 average might be the difference between being a top search result and being buried on page five.
A brief history of the five-star system
Star ratings long predate the web, in hotel guides and film criticism, but the crowd-sourced five-star review took hold in the early days of e-commerce. Amazon popularized it in the 1990s as a simple way for customers to evaluate products. The intent was to replace the traditional “good” vs. “bad” binary with a more nuanced scale. At the time, it was revolutionary. But as the internet grew, so did the awareness of its flaws. By the mid-2000s, researchers were already documenting the J-shaped distribution of reviews: an overwhelming number of five-star ratings, a smaller number of one-star ratings, and very few in between. The middle of the scale was effectively abandoned.
In recent years, platforms have tried various solutions. Some, like Amazon, have experimented with “verified purchase” labels and weighting reviews by helpfulness. Others, like Netflix, abandoned the five-star system entirely in favor of a simple thumbs-up/thumbs-down. Netflix’s then-CEO Reed Hastings explained that users found the five-star scale too cognitively demanding and that a binary choice led to more consistent feedback. Meanwhile, Uber and Lyft use a mutual five-star rating system between drivers and passengers, but even there, the system is rigged: a driver who consistently receives less-than-perfect ratings may be deactivated, creating a culture of inflated scores.
The Current case: why it matters
Terry Godier’s RSS reader is a microcosm of the larger issue. Current is a well-designed, user-friendly app that fills a gap in the market for people who still rely on RSS feeds to stay informed. Godier launched it recently, and early adopters seem genuinely thrilled. Yet those same enthusiastic users are inadvertently damaging the app’s reputation by giving four-star reviews. Godier is powerless to change the system; he can’t request that users change their reviews, nor can he override the algorithm. He’s stuck watching the average decline as more people discover the app.
The situation is particularly frustrating because the four-star reviews themselves contain glowing praise. One user wrote, “This is the best RSS reader I’ve ever used, and I’ve tried them all. The design is clean, the sync is fast, and the developer is responsive. I only give four stars because I want to see more customization options in the future.” That’s not a criticism; it’s a wish list. But on a five-star scale, it’s a demerit.
Possible solutions and why they fail
Proposed solutions to the five-star problem include switching to a ten-point scale, adopting a binary system, or using a “like” button similar to social media. Each has its own drawbacks. A ten-point scale adds more granularity but introduces even more confusion about what each number represents. A binary system (thumbs up/down) simplifies feedback but loses nuance—a product that is “good” but not “great” might get a thumbs down, which is unfair. The “like” button creates a positivity bias, as only satisfied users bother to click, but it at least avoids the problem of punitive four-star ratings.
Another solution is to surface the actual review text more prominently than the numeric score. Some platforms already do this, but algorithms still prioritize average ratings. As long as apps and products are ranked by aggregate stars, the four-star problem will persist.
The broader impact on creators
For independent developers like Godier, a few four-star reviews can have a disproportionate impact. Apps with fewer than 50 reviews are especially vulnerable. With only one other review on the page, a single four-star drops the average from 5.0 to 4.5, and a steady stream of them pulls it toward 4.0, making the app look mediocre. This can discourage new users from downloading, reduce word-of-mouth referrals, and even affect monetization through in-app purchases or subscription conversions. It’s a vicious cycle: fewer downloads mean fewer reviews, so the existing reviews have even more weight.
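To see how hard it is to climb back, here is a rough Python sketch that counts the five-star reviews needed to reach a target average when every other review is four stars. It assumes a plain arithmetic mean; the function name `five_stars_needed` is hypothetical, and real stores may weight or window ratings differently.

```python
from fractions import Fraction
from math import ceil


def five_stars_needed(four_star_count: int, target: str) -> int:
    """Smallest number of five-star reviews that lifts a plain mean to `target`
    when the only other reviews are `four_star_count` four-star ones."""
    t = Fraction(target)  # exact decimal arithmetic, e.g. "4.8" -> 24/5
    if not Fraction(4) <= t < Fraction(5):
        raise ValueError("target must be at least 4 and below 5")
    # (5k + 4n) / (k + n) >= t  rearranges to  k >= n * (t - 4) / (5 - t)
    return ceil(four_star_count * (t - 4) / (5 - t))


print(five_stars_needed(1, "4.8"))   # 4
print(five_stars_needed(10, "4.9"))  # 90
```

Under these assumptions, one four-star review takes four five-star reviews to get back to 4.8, and ten of them take ninety to reach 4.9.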
Moreover, the psychological effect on creators can be demoralizing. Spending months building a product that users love, only to see the rating dragged down by positive but imperfect reviews, is disheartening. It incentivizes developers to beg for five-star reviews, which then pollutes the review ecosystem even further.
What needs to change
The five-star rating system is not going away overnight. Too many platforms have built their infrastructure around it. But incremental improvements are possible. First, platforms could retrain users by changing the labels associated with each star. For example, on a scale where 1 star means “terrible,” 2 means “poor,” 3 means “okay,” 4 means “good,” and 5 means “excellent,” the distribution might shift. But that would require universal adoption, which is unlikely.
Second, platforms could use natural language processing to analyze review text and adjust the weight of the numerical rating accordingly. If a four-star review contains phrases like “my favorite” or “gamechanger,” the system could interpret that as a five-star sentiment and either adjust the average or flag the review for verification. Third, platforms could simply abandon the five-star system in favor of something more nuanced, like a “recommend to a friend” percentage or a sliding scale from 0 to 100. The challenge is that every system has its own biases.
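The second idea, reading sentiment from the review text, might look something like the Python sketch below. It is deliberately crude: the phrase list `STRONG_PRAISE`, the `Review` type, and both function names are invented for illustration, and a real platform would presumably use a trained sentiment model rather than substring matching.

```python
from dataclasses import dataclass

# Hypothetical phrase list, standing in for a real sentiment model.
STRONG_PRAISE = ("my favorite", "gamechanger")


@dataclass
class Review:
    stars: int
    text: str


def effective_stars(review: Review) -> int:
    """Count a four-star review with strong praise as five-star sentiment."""
    text = review.text.lower()
    if review.stars == 4 and any(phrase in text for phrase in STRONG_PRAISE):
        return 5
    return review.stars


def adjusted_average(reviews: list[Review]) -> float:
    """Mean of the sentiment-adjusted ratings."""
    return sum(effective_stars(r) for r in reviews) / len(reviews)


reviews = [
    Review(5, "Love it."),
    Review(4, "This is my favorite app!"),
    Review(4, "Gamechanger!"),
    Review(4, "Good, but sync is sometimes slow."),
]
raw_average = sum(r.stars for r in reviews) / len(reviews)
print(raw_average, adjusted_average(reviews))  # 4.25 4.75
```

Even this toy version shows the trade-off: the two effusive four-star reviews get bumped, while the one with a genuine complaint stays at four, but any such rule substitutes the platform’s guess for the number the user actually chose.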
In the meantime, creators like Terry Godier are left to navigate a broken system. His RSS reader Current is excellent, and users who try it love it. But if you look at its average rating, you might think it’s merely good. That’s the tragedy of the five-star review system: it conflates honest praise with polite mediocrity, and no one seems able to fix it.
Source: The Verge News