
7 Signs Today’s Automatic Content Moderation is Broken

Vadim Berman

4 February 2025


Almost any time content is posted online, it is moderated by humans or machines. As the amount of user-generated content grows, so, too, does the demand for moderation. Problem is, today’s systems are riddled with flaws that unfairly silence innocent users yet gloss over content posted by bad actors.

What Is Automatic Content Moderation?

The internet is flooded with new content every second of every day. That content is published on digital platforms — social media, forums, and so on. There’s no one universal agency policing user content.

Instead, it’s the responsibility of each platform to determine its own rules while complying with legal requirements. This means that it comes down to Reddit, Facebook, and other communities big and small to examine every piece of text and image hosted on their platforms. Because of the sheer volume of new data generated by users, it is prohibitively expensive to rely on human moderators alone. It also puts content moderators at risk of being exposed to disturbing or even illegal material.

Automatic content moderation is an alternative. It uses algorithms and machine learning systems to monitor and remove content online. This might sound like an intelligent solution. It’s fast enough to keep pace with the influx of new content, and it protects employees from text and images they’d rather not see — all while saving the company money. But it’s far from perfect.
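To make the idea concrete, here is a minimal sketch of such a pipeline in Python. Everything in it is a hypothetical stand-in: score_text fakes a classifier, and the category names and thresholds are invented for illustration. It is not how any specific platform, or Tisane, actually works.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str   # "allow", "review", or "remove"
    label: str    # category that triggered the action
    score: float  # classifier confidence

def score_text(text: str) -> dict[str, float]:
    """Stand-in for an ML classifier; returns per-category scores in [0, 1]."""
    lowered = text.lower()
    return {
        "insult": 0.9 if "idiot" in lowered else 0.05,
        "spam": 0.85 if "free crypto" in lowered else 0.02,
    }

def moderate(text: str, remove_at: float = 0.8, review_at: float = 0.5) -> Decision:
    """Map the highest-scoring category to an action via fixed thresholds."""
    scores = score_text(text)
    label, score = max(scores.items(), key=lambda kv: kv[1])
    if score >= remove_at:
        return Decision("remove", label, score)
    if score >= review_at:
        return Decision("review", label, score)
    return Decision("allow", label, score)

print(moderate("you are an idiot"))      # Decision(action='remove', label='insult', score=0.9)
print(moderate("lovely weather today"))  # Decision(action='allow', ...)
```

Real deployments layer several such models with keyword lists and human review queues, but the basic shape is the same: score, threshold, act.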

Here are seven signs today’s automatic content moderation approach isn’t working.


Sign 1: Moderation Pleases Exactly No One

Content moderation is not something that’s celebrated by most users.

On the one side, you have critics who see it as an infringement on their right to free speech or a form of censorship. In some cases, it could be perceived as a milder online analogue of police brutality.

On the other, you have those harmed by the toxic content they’ve been exposed to. They complain that the moderators — human or otherwise — didn’t do enough.

The very rare applause is reserved for human moderators praised for their “hard work.” But if their work is still this hard, isn’t the automation supposed to shoulder that burden? This only highlights the limitations of current automation tools, which leave both moderators and users dissatisfied.

Sign 2: Moderation Is a Political Scapegoat

Content moderation is often conflated with censorship in political discourse. Both sides of the political spectrum accuse platforms of bias — they delete content relating to one extreme while allowing content relating to the other. In reality, the algorithms are mostly apolitical: they filter insults, hate speech, and scam attempts but usually do not touch anything political.

Still, moderation becomes a convenient scapegoat, largely because it is notoriously opaque. In reality, these systems don’t “make political decisions.” They aren’t designed to handle political hot potatoes at all. But that is often lost in public debate.

Sign 3: The “Computer Says No” Problem

Automatic content moderation systems are decision-makers, and what makes them particularly infuriating is their lack of transparency about why a decision was made. When content is flagged or removed, users are often left in the dark. “Your post violates community guidelines” — no further details provided, very much like in the famous Little Britain sketch.

At best, platforms can use methods like Shapley values to point to the specific flagged words or phrases that triggered removal. This might satisfy some rudimentary legal requirements, but no platform shares these insights with users, and even if they did, a list of trigger words explains little and would only fuel frustration rather than alleviate it.
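For illustration, here is a rough sketch of what that kind of word-level attribution looks like. It uses a leave-one-out approximation rather than true Shapley values, and toxicity_score is a hypothetical stand-in for a real classifier. Note what the output gives you: the trigger word, and nothing resembling an explanation.

```python
# Leave-one-out attribution: re-score the text with each token removed and treat
# the drop in score as that token's contribution. Real explainers (e.g. SHAP)
# average over many token subsets; this is a deliberately crude sketch.

def toxicity_score(text: str) -> float:
    """Hypothetical classifier returning a single toxicity score in [0, 1]."""
    return 0.9 if "idiot" in text.lower() else 0.1

def attribute_tokens(text: str) -> list[tuple[str, float]]:
    tokens = text.split()
    base = toxicity_score(text)
    contributions = [
        (token, base - toxicity_score(" ".join(tokens[:i] + tokens[i + 1:])))
        for i, token in enumerate(tokens)
    ]
    return sorted(contributions, key=lambda kv: kv[1], reverse=True)

print(attribute_tokens("you are a total idiot"))
# top of the list: ('idiot', ~0.8) -- the flagged word, but hardly an explanation
```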

(Tooting our own horn, Tisane seems unique in being able to provide a human-readable explanation of the decision. LLM-based moderation engines have not yet been shown to provide consistent, deterministic explanations at scale.)

Sign 4: Platforms Still Use Blacklists

You might expect cutting-edge AI to power moderation systems, but the truth is far less glamorous. Many platforms, even the big ones, still rely on a messy mix of rudimentary keyword blacklisting augmented by machine learning classifiers with vaguely defined categories. These overly broad categories produce hard-to-interpret results, so the stone-age blacklist approach remains dominant.

While this approach may work for simple tasks like detecting spam and overt swearwords or slurs, it struggles with just about everything else.
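As a sketch of why, consider a toy blacklist filter; the word list and patterns below are illustrative, not drawn from any real platform. Naive substring matching flags innocent words (the classic “Scunthorpe problem”), while stricter word-boundary matching misses anything not spelled exactly as listed.

```python
import re

BLACKLIST = {"ass", "kill"}

def naive_match(text: str) -> list[str]:
    """Substring matching: fast, but flags innocent words."""
    lowered = text.lower()
    return [word for word in BLACKLIST if word in lowered]

def boundary_match(text: str) -> list[str]:
    """Word-boundary matching: fewer false positives, but trivially evaded."""
    return [word for word in BLACKLIST
            if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)]

print(naive_match("a classic assessment"))  # ['ass']  -- false positive
print(boundary_match("I will k*ll you"))    # []       -- false negative
```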


Sign 5: Proliferation of Algospeak

Have you noticed how online users increasingly use obfuscated language? Words like “kill” become “k*ll,” and “Lolita” morphs into “L-a.” This phenomenon, called “algospeak,” is a direct response to automated moderation’s flaws. Users try to slip past the algorithms with all-too-simple tricks, and they often succeed.

Sometimes, users leverage algospeak for malicious reasons (meaning, the automatic moderation is not very sophisticated). But in most cases, posters are afraid of unfair penalties (meaning, the automatic moderation is inaccurate). It’s a lose-lose.
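The arms race this creates for keyword-based systems is easy to sketch. The substitution table below is illustrative only: every new obfuscation demands another normalization rule, and euphemisms evade character-level tricks altogether.

```python
import re

# Illustrative substitution table; real algospeak mutates far faster than any such table.
LEET_MAP = str.maketrans({"*": "i", "1": "i", "!": "i", "0": "o", "3": "e", "$": "s"})

def normalize(text: str) -> str:
    """Undo a few common character substitutions before keyword matching."""
    text = text.lower().translate(LEET_MAP)
    return re.sub(r"(.)\1{2,}", r"\1", text)  # collapse "kiiill" -> "kill"

def matches_blacklist(text: str, blacklist: set[str]) -> list[str]:
    normalized = normalize(text)
    return [word for word in blacklist if word in normalized]

BLACKLIST = {"kill"}
print(matches_blacklist("I will k*ll you", BLACKLIST))     # ['kill'] -- caught
print(matches_blacklist("I will k i l l you", BLACKLIST))  # []       -- spaced out, missed
print(matches_blacklist("he got unalived", BLACKLIST))     # []       -- euphemism, missed
```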

Sign 6: No One Knows What Counts as “Good” Moderation

One of the biggest challenges in content moderation is defining what “good” even means. Is it about free speech absolutism? Is it about protecting users from harm? Compliance with EU legislation?

Many platforms mix vague claims of free speech with moderation policies potentially driven by personal grievances or cultural biases. The result is wildly inconsistent enforcement of subjective rules: something that’s acceptable on one platform is outright banned on another, similar one. Users are left confused and frustrated, with no benchmark or universally accepted standard to point to.

Sign 7: Big Problems Remain Unsolved

Technology has advanced, and moderation has evolved alongside it. Despite this, the fundamental issues plaguing content moderation remain unsolved. Good users are still unfairly penalized, while bad actors continue to slip through the cracks.

Take an example fresh in everyone’s mind: Meta’s sweeping changes to its moderation policies. Over nearly a decade, Meta was pushed to invest millions in its automated systems by advertisers concerned about brand safety and by the threat of fines and penalties around the world. Despite reports of superior performance, it was never enough for everyone; by Zuckerberg’s own recent admission, these systems still fall short.

Meanwhile, savvier trolls and criminals bypass these guardrails to wreak havoc within online communities; horror stories continue to emerge.

The promise of automation was to alleviate these problems, but instead, it often exacerbates them. And with reliance on automation growing, these issues will only become more pronounced unless we address them right now.

The Quest for the Better Moderation Mousetrap

Automatic moderation alone is not a panacea, nor is it meant to be. It is, however, an essential and versatile tool. While today’s state of affairs may make it look as though progress has stalled, there are always new directions to explore.

We at Tisane Labs attacked the problem from a different angle. Instead of picking the components and technologies in vogue at the moment, we analyzed the headaches from the perspective of both online communities and their users. We built a system that is transparent and deterministic by design, one that looks at the big picture before judging. If we were to express in one word what Tisane brings to the content moderation conversation, that word would be clarity.

We encourage our peers in the industry to challenge existing paradigms and try new ideas. We look forward to a future in which online communities become places where it pays off to be a normal, sane, decent person.

