What is 2-factor moderation?


Similar in principle to 2-factor authentication, the 2-factor moderation paradigm combines two independent low-trust inputs, machine classification and the judgment of the targeted user, to yield an actionable result. 2-factor moderation is used in high-traffic one-to-one communication, such as in-game real-time chats.

Real-time chats are often toxic enough to drive customers and players away, but because of the high traffic, it is financially challenging to maintain a dedicated human moderation team. Moderating real-time chats is also a notoriously unpopular task among human moderators.

2-factor moderation solves the issue by:

  1. Identifying attacks on another chat user (targeted harassment).
  2. Letting that user carry out a punitive action against the offender, by granting the user temporary moderator privileges scoped to this particular incident.

Problematic content that does not target another user can be forwarded to human moderators or muted. Personal attacks usually constitute over 90% of all instances of abuse, so with the personal attacks out of the way, only a small fraction of the potential violations require human intervention. As a side effect, would-be trolls become wary of insulting others, knowing that their victim may at any moment become their executioner.
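To make the flow concrete, here is a minimal routing sketch in Python. The `grant_temporary_moderation` and `enqueue_for_human_review` helpers are hypothetical placeholders, the message and history shapes are assumptions, and `candidate_targets` is sketched after the paragraph on branched vs. flat chats below.

```python
def grant_temporary_moderation(grantee: str, offender: str) -> None:
    """Hypothetical placeholder: give `grantee` incident-scoped power over `offender`."""
    print(f"{grantee} may now moderate {offender} for this incident")

def enqueue_for_human_review(message: dict) -> None:
    """Hypothetical placeholder: forward the message to the human moderation queue."""
    print(f"queued for review: {message['id']}")

def route(message: dict, abuse: list[dict], history: list[dict]) -> None:
    """Route a message using the two factors: classifier verdict + target's judgment."""
    types = {entry["type"] for entry in abuse}
    if "personal_attack" in types:
        # Factor 1: the classifier flags a targeted attack.
        # Factor 2: each plausible target decides whether to act on it.
        for target in candidate_targets(message, history):
            grant_temporary_moderation(grantee=target, offender=message["author"])
    elif types:
        # Non-targeted violations (e.g. bigotry) follow the normal process:
        # mute the message and/or escalate to human moderators.
        enqueue_for_human_review(message)
```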

Tisane makes this possible by classifying the type of the violation with a very high precision (positive predictive value, PPV).
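For reference, a classification call might look like the sketch below. The endpoint, header, and field names follow Tisane's published /parse API at the time of writing, but should be verified against the current documentation.

```python
import requests

TISANE_ENDPOINT = "https://api.tisane.ai/parse"  # verify against current Tisane docs
API_KEY = "your-subscription-key"                # placeholder

def classify_violation(text: str, language: str = "en") -> list[dict]:
    """Send a chat message to Tisane and return the detected abuse entries."""
    response = requests.post(
        TISANE_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": API_KEY},
        json={"language": language, "content": text, "settings": {}},
        timeout=5,
    )
    response.raise_for_status()
    # Each abuse entry carries a "type" (e.g. "personal_attack", "bigotry")
    # and a "severity"; an absent or empty "abuse" list means no violation.
    return response.json().get("abuse", [])
```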

If the chat is branched, the would-be target is the poster of the ancestor message in the thread. If the chat is flat (no branches), it is one of the most recent posters (the last 3 are more than enough).
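A possible implementation of the target-selection rule, continuing the sketch above; the `parent_author` and `author` fields are illustrative assumptions about the message schema, not a fixed format.

```python
def candidate_targets(message: dict, recent: list[dict], n: int = 3) -> list[str]:
    """Return the users who may have been attacked by `message`.

    `message["parent_author"]` is set when the chat is branched (threaded);
    `recent` is the flat chat history, newest message last.
    """
    if message.get("parent_author"):
        # Branched chat: the target is the poster of the ancestor message.
        return [message["parent_author"]]
    # Flat chat: take the last n distinct posters, excluding the author.
    targets: list[str] = []
    for prior in reversed(recent):
        author = prior["author"]
        if author != message["author"] and author not in targets:
            targets.append(author)
        if len(targets) == n:
            break
    return targets
```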

See below for a flowchart of the 2-factor moderation process and a brief explanation of the outcomes.

2-factor moderation flowchart

Let’s explore different scenarios and outcomes:

Scenario 1

User 1 is abusive toward User 2. Tisane identifies the post as a personal attack. User 2 is granted temporary moderation privileges and bans User 1.
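One way to keep the privileges scoped to the incident, as in this scenario, is a short-lived grant object. The sketch below is an illustrative design, not part of the Tisane API; the 10-minute expiry and the action list are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class ModerationGrant:
    """A temporary, incident-scoped moderation privilege.

    The grantee may act only against the offender, only with the listed
    actions, and only until the grant expires.
    """
    incident_id: str
    grantee: str                      # the attacked user (User 2 above)
    offender: str                     # the flagged poster (User 1 above)
    allowed_actions: tuple = ("ban", "mute", "ignore")
    expires_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc) + timedelta(minutes=10)
    )

    def authorize(self, actor: str, action: str, target: str) -> bool:
        """Check that a requested action falls within this grant."""
        return (
            actor == self.grantee
            and target == self.offender
            and action in self.allowed_actions
            and datetime.now(timezone.utc) < self.expires_at
        )
```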

Scenario 2

User 1 publishes a post that Tisane mistakenly identifies as a personal attack. User 2 is granted temporary moderation privileges, but since User 1 did not actually attack them, User 2 takes no action.

Scenario 3

User 1 publishes a post that is abusive toward a protected class or otherwise targets a broad group. Tisane classifies it as bigotry (hate speech). The normal moderation process is followed.