
How crypto exchanges' websocket endpoints perform during (mild) high-activity periods

2023-08-28

Following my previous exploration into the feasibility of a real-time price aggregate, I continued gathering data. More data: more symbols, more connectors, and order books in addition to trades! But with more data comes more headaches. We'll dig into that.

I was lucky enough to capture an episode of high market activity that happened on Friday, Aug 18th. In this post, I'll explore how the aggregate pipeline handled the event. Spoiler: pretty badly. However, the data still shows differences in how exchanges perform during a mild high-activity period.

I initially planned to include the continuation of the true mid price exploration in this post. However, there was so much value in the latency data points that I decided to split it off into a second post, to come later. I hope you enjoy this latency-focused post!

The notebook and the data for this experiment are much larger than the previous ones. They'll soon be available when I open-source this pipeline project. In the meantime, reach out to me on Discord @dataroc or on LinkedIn if you wish to access either of them!


Data collection changes

(1) Now indexing trades and order books

If you recall, my data gathering code was using CCXT's watch_ticker websocket endpoint to gather the last price of symbols across exchanges. This endpoint is essentially the candles feed of exchanges. I decided to stop using this endpoint and instead use a combination of watch_trades and watch_order_book.

This allows me to have a real-time aggregate based on both trades (last price) and current bid/ask spreads (mid price).
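For reference, here is a minimal sketch of what consuming those two endpoints looks like with CCXT Pro's async API - not the actual pipeline code; the exchange and symbol are hard-coded for illustration, and the real pipeline fans this out over all connectors and persists every message:

```python
import asyncio

import ccxt.pro as ccxtpro  # CCXT's websocket (Pro) API


async def watch_trades_loop(exchange, symbol):
    # Resolves every time the exchange pushes new trades (last price).
    while True:
        trades = await exchange.watch_trades(symbol)
        print(symbol, "last price:", trades[-1]["price"])


async def watch_book_loop(exchange, symbol):
    # Resolves on every order book update (best bid/ask -> mid price).
    while True:
        book = await exchange.watch_order_book(symbol)
        mid = (book["bids"][0][0] + book["asks"][0][0]) / 2
        print(symbol, "mid price:", mid)


async def main():
    exchange = ccxtpro.binance()
    try:
        await asyncio.gather(
            watch_trades_loop(exchange, "BTC/USDT"),
            watch_book_loop(exchange, "BTC/USDT"),
        )
    finally:
        await exchange.close()


asyncio.run(main())
```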

(2) Focus on specific exchanges

Instead of trying as many exchanges as possible and hoping for the best, I instead made sure specific exchanges were included. Binance calculates the futures Mark price using a specific list of "good" exchanges: KuCoin, Huobi, OKX, HitBTC, Gate.io, Ascendex, MXC, Bitfinex, Coinbase, Bitstamp, Kraken, and Bybit. Include Binance itself and this gets you 12 exchanges to index.

(3) NTP time synchronization

Some readers asked about time synchronization on my previous post. Although it wasn't one of my concerns initially, I realized the server that runs the data collection was (and still is) using the Ubuntu NTP server by default. So, without my being aware of it, the server was already keeping its clock synchronized to an NTP server with reasonable precision. At the time of writing, timedatectl timesync-status was reporting a mean time offset of +2.7ms.

We can therefore assume the timestamps produced by my server are accurate to within about 5ms, which means any latency measurement shown in this post should be considered accurate to about 5ms as well.
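If you want to check this on your own machine, here's a quick sketch that reads the offset out of timedatectl (assumes Ubuntu's default systemd-timesyncd setup; the parsing is purely illustrative):

```python
import re
import subprocess

# Query systemd-timesyncd for the current NTP offset (assumes Ubuntu's default
# time sync setup, like the collection server uses).
output = subprocess.run(
    ["timedatectl", "timesync-status"],
    capture_output=True, text=True, check=True,
).stdout

# timesync-status prints a line such as "Offset: +2.7ms".
match = re.search(r"Offset:\s*([+-]?[\d.]+\s*(?:us|ms|s))", output)
print("NTP offset:", match.group(1) if match else "not reported")
```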

Data overview

I decided to index the data of 6 symbols: BTC/USD, BTC/USDT, ETH/USD, ETH/USDT, RPL/USD and RPL/USDT. 3 assets, each with both their USD and USDT traded pairs. Figure 1 shows the active connectors from Aug 18 to Aug 22.

Crypto exchanges generate a lot of data

The bad news: trades and order books from 12 exchanges for 6 symbols (including some of the biggest markets) kinda generate a lot of data to process. Over 4 days, the data pipeline processed 13,057,702 trades and 7,333,657 order book updates. During the highest activity peak, between 300 and 500 messages needed to be processed every second. Figure 2 shows the count of processed messages during the experiment.

Unfortunately, the single-process data pipeline did not keep up at that moment :-) I hit the limit of what a single-threaded asyncio pipeline, written in unoptimized Python and running on 8-year-old hardware, can achieve.

Database latency

The database latency is the time delta between the reception of a given message (e.g. a market trade of BTC/USDT on Binance) and the database insertion timestamp obtained with an SQL function such as NOW(). This delta has an effective error margin of 0 since both timestamps are provided by the same machine.
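Concretely, measuring it can be as simple as the sketch below, assuming a PostgreSQL-style table where the pipeline supplies received_at and the database stamps inserted_at with NOW(). The table, column names, driver and DSN here are illustrative, not the actual pipeline schema:

```python
import asyncio
import datetime as dt

import asyncpg  # assumed PostgreSQL driver; the actual pipeline may use another one

# Hypothetical schema: the pipeline supplies received_at, the database stamps
# inserted_at with NOW(), and the insertion latency is the difference of the two.
INSERT_SQL = """
    INSERT INTO trades (exchange, symbol, price, received_at, inserted_at)
    VALUES ($1, $2, $3, $4, NOW())
"""

MEDIAN_LATENCY_SQL = """
    SELECT percentile_cont(0.5) WITHIN GROUP (
        ORDER BY EXTRACT(EPOCH FROM inserted_at - received_at)
    ) AS median_latency_seconds
    FROM trades
"""


async def main():
    conn = await asyncpg.connect("postgresql://localhost/markets")  # illustrative DSN
    await conn.execute(INSERT_SQL, "binance", "BTC/USDT", 26000.0,
                       dt.datetime.now(dt.timezone.utc))
    print("median insertion latency (s):", await conn.fetchval(MEDIAN_LATENCY_SQL))
    await conn.close()


asyncio.run(main())
```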

Database insertion latency was unacceptably high

The first clear indicator that the pipeline did not keep up is the database insertion latency. The pipeline and the database both run on the same machine and share the same CPU, memory and disk resources. Therefore, any sustained database insertion latency is a sign of the system running at the limit of its capacity.

Figure 3 shows the database insertion latency over the duration of the experiment. Notice how certain periods have very high sustained latency, such as the one starting at 13:00 on Aug 18. I actually heard my server's fan running at full speed at that moment. Sure enough, the server was running at 75%+ CPU when I looked. I found the "culprit" soon enough: the crypto market was in a high-volatility event on the major pairs.

This is a clear indicator that the pipeline needs to be distributed over multiple machines - which is what I'm working on right now. Although you could argue having a literal sound indicator of market volatility is actually a good thing...

Database insertion latency was directly correlated with market activity

If you superpose the trades count and the median database latency, as in Figure 4, you can clearly see the two are strongly correlated. High trading activity equals high insertion latency.

One interesting property of the database latency is that it only captures the programmatic pipeline delay - not any network delay. It is the time between the pipeline receiving a message and the local database inserting the record. Therefore, we can conclude with high confidence that our pipeline, in its current state, is unable to support this amount of trading activity.
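For the curious, the Figure 4 comparison boils down to something like this in pandas - a sketch assuming illustrative column names and data source rather than the pipeline's real schema:

```python
import pandas as pd

# Assumes a dataframe of processed trade messages with a 'received_at' timestamp
# and the per-message database insertion latency in seconds (illustrative names).
df = pd.read_parquet("trades.parquet").set_index("received_at").sort_index()

buckets = pd.DataFrame({
    "trade_count": df["price"].resample("30min").count(),
    "median_db_latency_s": df["db_latency_s"].resample("30min").median(),
})

# A Pearson correlation close to 1 backs the "more trades -> more latency" reading.
print(buckets["trade_count"].corr(buckets["median_db_latency_s"]))
```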

Connector latency

The connector latency is the time delta between the server-provided timestamp of when a trade happened and the timestamp at which we actually receive the message.

The data here is a bit harder to interpret because the connector latency is affected by the increased load on the server running the pipeline. Since it's the receiving pipeline's job to stamp the time at which a message is received, any delay in processing artificially increases the perceived latency of the connector.
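Computed per message, the metric itself is a simple subtraction; here's a sketch with illustrative column names, with the caveat above baked into the local timestamp:

```python
import pandas as pd

# exchange_ts: server-provided trade timestamp; received_at: local receive time.
# Column names are illustrative; both timestamps are assumed timezone-aware.
df = pd.read_parquet("trades.parquet")
df["connector_latency_s"] = (df["received_at"] - df["exchange_ts"]).dt.total_seconds()

# Caveat from above: a backlog in the pipeline delays received_at, so this number
# includes local processing delay, not just exchange + network delay.
print(df.groupby("exchange")["connector_latency_s"].median())
```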

Figure 5 shows the unfiltered, median connector latency throughout the experiment.

Connector latency stability varied by exchange

Nonetheless, it's interesting to see that some connectors performed relatively well throughout the duration of the experiment, even though the machine was under heavy load. Considering all connectors were subjected to the same heavy load conditions, differences in performance between connectors can reasonably be attributed to each exchange's ability to perform during high market activity events.

If you look at Figure 6, you can see that Binance seems to perform without much impact around Aug 18, 13:00 - the high market activity event. Compare that to KuCoin, which shows a noticeable increase in connector latency.

Even though we conclude here that the discrepancies between exchanges are not due to the pipeline, some properties could shift the blame back onto the pipeline itself:

  • There is a possibility that certain connectors' asyncio coroutines were scheduled more frequently than others, thereby reducing their median latency.
  • It's possible the ratio of the number of trades in a high-activity market versus a low-activity market is much higher on smaller exchanges than on bigger ones. E.g. maybe Binance BTC/USDT has 10,000 trades in a "normal" hour and 30,000 trades in a "volatile" hour (a ratio of 3), while KuCoin has respectively 1,000 and 10,000 trades (a ratio of 10) in those same hours.
  • Because we're only looking at 30-minute buckets of median latencies, short high-activity bursts could be "hidden" within a bucket - even more so for exchanges with a higher high/low activity ratio. A quick way to check this is sketched below the list.
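Here's what that check could look like - same illustrative column names as above - using 1-minute buckets and a high percentile instead of 30-minute medians:

```python
import pandas as pd

# Same illustrative columns as before: received_at index, per-message latency.
df = pd.read_parquet("trades.parquet").set_index("received_at").sort_index()

# 1-minute buckets and the 95th percentile are far less forgiving than 30-minute
# medians: short latency bursts stop being averaged away.
p95_1min = (
    df.groupby("exchange")["connector_latency_s"]
      .resample("1min")
      .quantile(0.95)
)
print(p95_1min.groupby(level="exchange").max())
```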

The more I think about this, the more I believe these connector latencies are in fact affected by the pipeline itself. I'll have to dig deeper into this in the next installment of this series.

Conclusion

Although it is unfortunate that the pipeline hit a hardware limit during the experiment, it's interesting to see that some exchange performance insights could be inferred nonetheless.

I will soon share the code used to run these experiments. I'm currently refactoring the pipeline to support distributed execution. This includes adding Redis as an in-memory, high-speed database for inter-node communication and to support an eventual real-time API.

I'm happy to take comments and suggestions! Reach out either on Discord @dataroc or on LinkedIn.

