How We Handle 10 Million Concurrent User Connections and 5 Million Messages Per Minute Throughput

Scalability has always been a priority in Qiscus. Our Research and Development (R&D) team has completed an extensive research and benchmarking exercise to identify the best way to handle high concurrent user connections. After much resources poured into this mission, we found the potent formula.

In this article, we will share a gist of what we did to bring our chat solution to a whole new scale, with the ability to handle 10 million concurrent user connections and 5 million messages per minute throughput.

Concurrent User Connections

When we talk about chat systems or real-time applications, in general, concurrent user connections is one of the most important metrics to measure. Unsurprisingly, it is also one of the most difficult issue to scale as well. User connections are necessary to ensure that users have a real-time update for every event that happened in the backend. For example: receiving new messages, receiving typing status, receiving online/offline status, receiving a delivery receipt, and so on.

In our context, user connections will be established when our users open their apps from the web or mobile. Once they opened the app and are connected to Qiscus Chat SDK successfully, they will then be considered as connected users. So if there are 1 million users opening an app at the same time, means there are 1 million concurrent user connections taking place. In another scenario, if only 100,000 users open the app at the same time and then closed it, after which another 100,000 users opened the app at the same time, this means there are only 100,000 concurrent user connections at peak.

Based on our observation, typically, only less than 5% of monthly active connected users will open the app and chat at the same time.

So, to get 1 million users opening an app at the same time, the app must probably have about 20 million users or more.

Qiscus is using MQTT as protocol to handle real-time communication. It means that all user connections in this real-time event are being handled by our MQTT servers. We have been actively looking for and trying out multiple MQTT brokers specific for our use case. After much researching and experimenting, we found out that Erlang based MQTT is suitable for us and is able to handle our load. We then made use of a distributed system that Erlang natively support.

We did a load testing on the MQTT broker servers based on various conditions and configurations. We found some specific configurations that involve OS tuning, instance type, number of CPUs, RAM and also load balancing mechanism. By using about 300-500 instance servers as testing the server and about 10-20 broker servers, we able to scale our system to handle 10 million concurrent user connections!


Throughput Messages

Other than concurrent user connections, another very important metric is throughput messages; when users post their messages, we calculate how many messages are able to be sent per minute.

Our R&D team has been tried out several approaches to maximize this number. We tried out several tools and engines to handle this. After much studying, we found several tools that are suitable for handling high levels of throughput; MQTT as a real-time server connection to receive messages, and then Kafka to queue the message and let it be consumed by other backend services.

In our research and stress test, by putting MQTT in front as subscriber, we are able to receive millions of messages from many Client MQTT Publisher and then immediately stream them to Kafka. Once the messages are already in Kafka, multiple services like ElasticSearch, Database, or Push Notifications Service can be used. Using this configuration, the number of throughput messages possible was almost 5 million!

We have been doing a lot of fun stuff in our R&D and engineering team. Feel free to contact us at [email protected] to know more about us!

You May Also Like