Project Tetris: Discord's AI analytics
Discord's data packages contain AI-estimated age & gender demographics. Is your data being run through AI?

We all know about Clyde, many of us know about summaries, and a few of us will remember Discord's AI-powered channel emotes. But what if I told you there was one more AI feature Discord was using all the way back in 2022?
What's all this about?
Recently, the prolific Discord-meta Twitter account Discord Previews posted about age & gender demographic estimates showing up in Discord data packages:
Discord has been using machine learning models to determine the gender and age group of some of its users since at least August 2022.
— Discord Previews (@DiscordPreviews) May 13, 2024
The data can be found in the "activity/analytics/events-[...].json" file of some Discord data packages, though the exact requirements are unknown. pic.twitter.com/6Y8yFRJ3pD
This post alone generated a large & emotional response from a lot of Discord users, many taking issue with the invasiveness of generating this information. But what actually is this information and how is it captured?
Discord's own Gender Genie
This data capture started in 2022, meaning the model(s) used likely only ingested data available on users at that time. As pronouns weren't added to profiles until 2023, alternative data sources must have been used to profile gender demographics. This isn't anything new: algorithmic gender profiling of writing styles has existed since 2003, so it's hardly a foreign concept for an AI to be used to the same ends.
As for age demographic guessing? Writing/reading age have been solid concepts for a very long time, regardless of AI. Again, it's not a stretch to imagine an AI getting similar results to these long-known algorithmic approaches. So, we know this model likely ingested Discord messages, maybe your bio, maybe your server list. But wait - Discord is putting your messages through AI?

It's not just one more AI feature
Discord does process messages you send through AI. This is an absolute truth: Discord themselves acknowledge their use of AI for both platform moderation and more superficial reasons. Discord's current head of Trust & Safety, formerly a co-founder of the Discord-acquired Sentropy, published this article covering Discord's use of in-house models for platform moderation.

This contrasts with Discord's more superficial uses of AI, like Discord's AI summaries. This feature pipes every message you send in a summaries-enabled channel through an LLM. While this is now an in-house model, it reportedly used to be powered by OpenAI.
So, Discord is - through at least 3 distinct methods - processing your messages with AI. Which ones do you object to?
What's method 3?
We know Discord uses AI for moderation, we know they use it for summaries, but what about these estimated demographics?
It's likely these data points were generated using in-house models (hinted at by the model_version parameter). It's also likely they were given messages as a dataset. However, many users have shown this appearing in their data package at wildly different times. Clearly there must be some trigger for Discord to collect this information, but to understand what that is, we need to know what this data is for.
Project Tetris
An experiment sitting in Discord's developer portal.

This codename covers partner-gated expanded analytics for servers, specifically adding 3 new pages. Announcement reach is basically Announcement channels+ and doesn't have any interesting information. The other two, though...

Audience Insights
This page is the missing piece, explaining why this data was captured and what triggers its capture: partnered servers' analytics. If a user isn't in any partnered servers, there's no reason for Discord to generate this data. If anything is a trigger, this is surely it. Mystery solved!

What about Trending Conversations?
Summaries. It's just summaries. I wish there was more to write about here, but as summaries never saw a larger rollout there's no way I can show what this page looks like with actual data.

If you're not sure what summaries are, you can read about them here, and read about a security vulnerability found in them here.
Is the model accurate?
No AI model is going to have 100% accuracy at guessing these demographics, even algorithmic approaches aren't reliably accurate (though algorithms generally have an edge). Both suffer from the same biases, sample bias being the easiest to approach.
There are a lot of people on the internet, but there are more males than females (or other identities) and more people in their teens or 20s. So that means, algorithm or AI, there's going to be an inherent bias towards profiling people into those demographics simply because they're a majority.
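To put a number on that majority effect, here's a minimal sketch. The group counts are invented purely for illustration and are not Discord's real demographics. It shows that a "model" which ignores its input entirely and always guesses the largest group still looks fairly accurate overall, while never being right for anyone outside that group.

```python
def majority_baseline(group_counts: dict[str, int]) -> tuple[str, float]:
    """Return the largest group and the accuracy of always guessing it."""
    label = max(group_counts, key=group_counts.get)
    accuracy = group_counts[label] / sum(group_counts.values())
    return label, accuracy


# Hypothetical counts, made up for this example only.
counts = {"male": 600, "female": 350, "other": 50}

label, accuracy = majority_baseline(counts)
print(f"Always guessing {label!r} is right {accuracy:.0%} of the time")

# Per-group view: everyone outside the majority is misclassified.
for group in counts:
    hit_rate = 1.0 if group == label else 0.0
    print(f"{group}: {hit_rate:.0%} guessed correctly")
```

A trained model that leans on the same majority can score well on paper for the same reason, which is why a headline accuracy figure says little about how it treats smaller groups.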
What's happening with this data?
For the foreseeable future, nothing. Project Tetris seems to have been spearheaded by Discord's Community Team for the Partner Program. Both the Community Team & Partner Program were axed as part of a 17% layoff the company faced. This means very few people can see this data, it isn't being regenerated, and Discord doesn't seem to have plans to revive this project.
Project Tetris aimed to give valuable insights to Discord Partners, but with the accuracy considerations, was it really worth the perceptually invasive processing of users' messages?
I want to see my Discord Assigned Gender™️
All you need to do is follow this handy flowchart from Twitter user big nutty:
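If you'd rather search your package with a script, here's a rough sketch. It makes a few assumptions that aren't confirmed anywhere: that you've extracted your data package to a folder, that the analytics files hold one JSON object per line, and that the demographic estimates are the events carrying a model_version field (the only field name mentioned above). The function name find_model_events is my own invention.

```python
import json
from pathlib import Path
from typing import Iterator


def find_model_events(package_dir: str) -> Iterator[dict]:
    """Yield every analytics event that has a model_version field.

    Assumes newline-delimited JSON in activity/analytics/events-*.json.
    """
    analytics_dir = Path(package_dir) / "activity" / "analytics"
    for events_file in sorted(analytics_dir.glob("events-*.json")):
        with events_file.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip lines that aren't standalone JSON
                # Assumption: model output events carry a model_version key.
                if isinstance(event, dict) and "model_version" in event:
                    yield event
```

To use it, loop over `find_model_events("path/to/your/package")` and print each event. If nothing comes back, either your package doesn't contain these estimates or one of the assumptions above doesn't hold for your files.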

Why doesn't Discord ask your age/gender?
They do! Discord asks your age on sign-up and retains your age data only while you're <18 to gate you from NSFW channels. However, they can only store data for exactly that use case, not for passing on to partners for analytics.
However, this raises a question: why not just use the pronouns feature & add a birthday feature for tracking your age? Sites have used the birthday trick for decades to legally keep your DOB on file, in return showing a funny balloon animation or giving you a discount on your birthday.
Sometimes the best solution is the simplest solution, and Discord's Project Tetris was a very complicated solution for a very niche feature that ultimately never quite made it.