Thread
Ah yes, let's refactor the Twitter complexity away: Remove global search! Remove following! Remove retweets! Remove tweets! Leave just DMs!

Congrats, you can now run Twitter on 95% fewer engineers, just like WhatsApp!

🤦‍♂️

It's so painfully obvious that these Elon stan comments are all from people who have *no* clue what the hard scalability problems are in a global platform like Twitter.

As if user count means everything, completely ignoring that scaling isn't a linear problem.
I once had a scalability problem with a 2000-user service. It was basically an IRC channel. The server running it was well designed using async processing, but it was single thread/core, and eventually we hit the limits of the physical server.
Why? Because scaling a big chat room like that is an O(n²) problem. Every time a user sends a message, it has to be delivered to *all* other users, and that's one system call per user to send the packet. We were hitting upwards of 10000 system calls per second.
So with 2000 users, you have *4* times as much load as with 1000 users, and *16* times as much load as with 500 users.
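
To put rough numbers on that, here's a minimal sketch. It assumes every user sends messages at the same rate and that each delivery costs one send call. The `RATE` value is made up for illustration, not a measurement from that service.

```python
def sends_per_second(users: int, msgs_per_user_per_sec: float) -> float:
    """Send calls per second for a single broadcast room.

    Each message goes to every other user (users - 1 sends), and the
    number of messages per second itself grows with the user count,
    so the total grows roughly with users squared.
    """
    messages_per_sec = users * msgs_per_user_per_sec
    return messages_per_sec * (users - 1)


# Hypothetical rate: one message per user every 400 seconds.
RATE = 1 / 400

for n in (500, 1000, 2000):
    print(f"{n} users: {sends_per_second(n, RATE):.0f} sends/sec")
```

Doubling the user count roughly quadruples the send calls, which is why a room that was fine at 1000 users can fall over at 2000.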

I knew we didn't have to scale much bigger than that, so I designed a proxy that sat in front, and patched the server to send broadcasts to the proxy only once.
Basically it was just a TCP reverse proxy that multiplexed all connections over a single one, and did the final fan-out at the proxy side. This let us spread the critical part of the load (broadcast fan-out) over the proxy instances, and we could run a bunch.
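
The proxy idea can be sketched as a toy model that only counts send calls. The `Server` and `Proxy` classes here are hypothetical stand-ins, not the real server or proxy code, and there's no actual networking involved.

```python
class Proxy:
    """Hypothetical proxy instance holding a share of the client connections."""

    def __init__(self):
        self.clients = []
        self.sends = 0  # send calls performed by this proxy

    def broadcast(self, message, sender):
        # The final fan-out happens here: one send per connected client.
        for client in self.clients:
            if client != sender:
                self.sends += 1


class Server:
    """Hypothetical patched server: one send per proxy, not one per user."""

    def __init__(self, proxies):
        self.proxies = proxies
        self.sends = 0  # send calls performed by the server itself

    def broadcast(self, message, sender):
        for proxy in self.proxies:
            self.sends += 1  # single multiplexed connection to each proxy
            proxy.broadcast(message, sender)


# 2000 users spread round-robin over 4 proxies (both numbers are illustrative).
proxies = [Proxy() for _ in range(4)]
for user_id in range(2000):
    proxies[user_id % len(proxies)].clients.append(user_id)

server = Server(proxies)
server.broadcast("hello", sender=0)

print("server sends:", server.sends)
print("proxy sends:", [p.sends for p in proxies])
```

For one broadcast, the single-threaded server does 4 sends instead of 1999, and the 1999 per-user sends are spread over the proxy instances, which can run on separate cores or machines.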
Now consider that, in some ways, Twitter is like a 400 million user chatroom.

Thankfully the fan-out is sparser than a chatroom, but there's no way to neatly shard out Twitter for linear scalability.
Stuff like global Twitter search is a *monumental* task and achievement. Do you realize just how complicated it is to have a single, global, searchable, continuously updated index of every tweet ever, that reacts in real time to things like deletions?
Obviously they cheat (they have to). I'm pretty sure tweets aren't removed from the index the moment they're deleted. Instead, they just check whether each tweet still exists when showing search results and hide anything that's gone.
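
Here's a minimal sketch of that kind of cheat, under the assumption that deletions are recorded separately and filtered out at query time. This is a guess at the general technique, not Twitter's actual implementation. `LazySearchIndex` and its methods are hypothetical names, and the `compact` batch step is an added assumption.

```python
from collections import defaultdict


class LazySearchIndex:
    """Toy inverted index that hides deleted tweets at query time."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of tweets containing it
        self.deleted = set()              # ids deleted since the last compaction

    def add(self, tweet_id, text):
        for word in text.lower().split():
            self.postings[word].add(tweet_id)

    def delete(self, tweet_id):
        # Cheap: the index itself is not touched, only a marker is recorded.
        self.deleted.add(tweet_id)

    def search(self, word):
        hits = self.postings.get(word.lower(), set())
        # Stale entries may still be in the index, so filter them out here.
        return sorted(t for t in hits if t not in self.deleted)

    def compact(self):
        # Assumed batch job: actually purge deleted tweets from the index.
        for ids in self.postings.values():
            ids.difference_update(self.deleted)
        self.deleted.clear()


index = LazySearchIndex()
index.add(1, "scaling is hard")
index.add(2, "search is hard too")
index.delete(1)
print(index.search("hard"))
```

The delete path stays fast no matter how big the index is, at the cost of a bit of extra filtering on every search until the batch cleanup catches up.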
They probably have several tiers of search indexes, one for the most recent tweets, and progressively colder ones for older stuff. They also search accounts and bios (and I know those aren't updated in real time), so there have to be batch processes handling those.
All this is stuff you can just ignore for a chat system like WhatsApp, because there, most interactions are one-to-one, there is no global search, and group chats are relatively limited in scope. Large group chats are the exception, not the rule.