Working with real-time market data can feel like drinking from a firehose. When I started streaming Hong Kong equity trades over WebSockets, I quickly noticed that not all prints are equal. Some represent actual investor orders, while others are system-generated auto-matches or odd-lot transactions. In this post, I’ll share a straightforward approach to classify them on the fly.
The Problem
If you feed raw trades directly into a strategy or a volume indicator, odd lots and auto-matches will distort your metrics. For instance, a burst of auto-matches inflates the trade count without any real price movement, which can trigger activity-based breakout signals that have no real order flow behind them. Filtering by hand is impossible at real-time speeds, so automatic classification isn't just nice to have; it's essential.
Message Anatomy
Typical WebSocket trade data includes:
| Field | Meaning |
|---|---|
| time | Trade time |
| price | Trade price |
| volume | Number of shares |
| trade_type | Often unreliable category |
| match_id | Matching identifier |
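
Before classifying anything, it can help to normalize each raw message into a typed record. The sketch below is a minimal example, not AllTick's documented schema: the `Trade` dataclass and the `parse_trade` helper are hypothetical names, and it assumes that `time` is an epoch timestamp in milliseconds and that numeric fields may arrive as strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trade:
    time_ms: int                 # assumed: epoch milliseconds
    price: float
    volume: int                  # number of shares
    trade_type: Optional[str]    # kept for reference, not trusted
    match_id: Optional[str]


def parse_trade(msg: dict) -> Optional[Trade]:
    """Return a Trade, or None if a required field is missing or malformed."""
    try:
        return Trade(
            time_ms=int(msg["time"]),
            price=float(msg["price"]),
            volume=int(float(msg["volume"])),
            trade_type=msg.get("trade_type"),
            match_id=msg.get("match_id"),
        )
    except (KeyError, TypeError, ValueError):
        return None
```

Returning `None` instead of raising lets the receive loop skip heartbeats and subscription acknowledgements without special-casing them.
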
Since trade_type rarely helps, I rely on three practical heuristics:
- Volume check: every HK-listed stock trades in a board lot set by the issuer (100 shares for 00700.HK, but often 500, 1,000, or 2,000 for other names). Any trade whose volume is not a whole multiple of the stock's board lot is tagged as an odd lot.
- Time clustering: Auto-matched trades occur in dense bursts, with multiple fills within milliseconds. Odd lots don't show this pattern.
- Counterparty inspection: If the feed exposes counterparty fields and both buyer and seller are system accounts (like "SYS"), it's an auto-match.
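
The three heuristics above can be combined into one small classifier. This is a sketch under stated assumptions rather than a definitive implementation: `BurstDetector` and `classify` are hypothetical names, timestamps are assumed to be epoch milliseconds, `buyer` and `seller` are assumed to be present in the message, and the 5 ms window and three-trade threshold are illustrative values that would need tuning per symbol.

```python
from collections import deque

BOARD_LOT = 100   # board lot for 00700.HK; differs per stock
WINDOW_MS = 5     # illustrative burst window, not an exchange-defined value
MIN_BURST = 3     # illustrative: trades inside the window that count as a burst


class BurstDetector:
    """Flags a trade once at least min_burst trades fall inside window_ms."""

    def __init__(self, window_ms=WINDOW_MS, min_burst=MIN_BURST):
        self.window_ms = window_ms
        self.min_burst = min_burst
        self._recent = deque()

    def in_burst(self, time_ms):
        self._recent.append(time_ms)
        # Drop timestamps that have fallen out of the window
        while time_ms - self._recent[0] > self.window_ms:
            self._recent.popleft()
        return len(self._recent) >= self.min_burst


def classify(tick, detector, board_lot=BOARD_LOT):
    """Return 'odd_lot', 'auto_match', or 'normal' for one trade dict."""
    # Odd lots are tagged first and do not feed the burst detector
    if tick["volume"] % board_lot != 0:
        return "odd_lot"
    clustered = detector.in_burst(tick["time"])
    system_pair = tick.get("buyer") == "SYS" and tick.get("seller") == "SYS"
    if system_pair or clustered:
        return "auto_match"
    return "normal"
```

Because this runs on a live stream, the first trades of a burst are seen before the burst is recognizable and are not re-tagged afterwards. A large aggressive order sweeping several price levels also produces fills within milliseconds, so the clustering thresholds are worth validating against recorded data before trusting the tag.
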
Implementation
I used the AllTick API to get a WebSocket connection for HK stocks. The Python snippet below subscribes to a symbol and tags every incoming trade using the volume and counterparty checks:
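import json

from websocket import create_connection  # pip install websocket-client

# Insert your AllTick API token here
API_TOKEN = 'your_api_token'
# Board lot for 00700.HK; board lots differ per stock, so set this per symbol
BOARD_LOT = 100

ws_url = f"wss://ws.alltick.co/stock?token={API_TOKEN}"
ws = create_connection(ws_url)

# Subscribe to real-time trades for HK stock 00700.HK
subscribe_msg = {
    "action": "subscribe",
    "symbol": "00700.HK",
    "type": "transaction"
}
ws.send(json.dumps(subscribe_msg))


def check_auto_match(tick):
    # Assume system auto-match counterparties are both "SYS"
    return tick.get('buyer') == 'SYS' and tick.get('seller') == 'SYS'


try:
    while True:
        data = ws.recv()
        tick = json.loads(data)
        if 'volume' not in tick:
            continue  # skip acks or heartbeats that carry no trade
        # Cast defensively in case the feed sends numbers as strings
        volume = int(float(tick['volume']))
        if volume % BOARD_LOT != 0:
            # Not a whole multiple of the board lot
            tick['tag'] = 'odd_lot'
        elif check_auto_match(tick):
            tick['tag'] = 'auto_match'
        else:
            tick['tag'] = 'normal'
        print(tick.get('time'), tick.get('price'), volume, tick['tag'])
finally:
    # Release the socket on Ctrl+C or on a connection error
    ws.close()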
Impact on My Work
Since adopting this classification layer, my downstream applications only consume “normal” trades, resulting in cleaner analytics and more trustworthy signals. The auto-match and odd-lot streams are still stored, allowing me to analyze market microstructure separately. It’s a simple yet powerful pattern that I recommend to anyone dealing with HK real-time data.
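
The split between what downstream code consumes and what gets stored can be sketched as a small router keyed on the tag. `TradeRouter` is a hypothetical name, and the in-memory dictionary is only a stand-in for whatever storage the real pipeline uses.

```python
from collections import defaultdict


class TradeRouter:
    """Sends 'normal' trades to a callback and archives every other tag."""

    def __init__(self, on_normal):
        self.on_normal = on_normal
        # tag -> list of trades; stand-in for real storage
        self.archive = defaultdict(list)

    def route(self, tick):
        # Untagged messages are archived rather than treated as normal
        tag = tick.get("tag", "untagged")
        if tag == "normal":
            self.on_normal(tick)
        else:
            self.archive[tag].append(tick)


# Example: a downstream consumer that sums volume from normal trades only
total = {"volume": 0}


def add_volume(tick):
    total["volume"] += tick["volume"]


router = TradeRouter(add_volume)
for t in [
    {"volume": 200, "tag": "normal"},
    {"volume": 50, "tag": "odd_lot"},
    {"volume": 300, "tag": "auto_match"},
    {"volume": 100, "tag": "normal"},
]:
    router.route(t)

print(total["volume"], {k: len(v) for k, v in router.archive.items()})
# → 300 {'odd_lot': 1, 'auto_match': 1}
```

Keeping the archive keyed by tag means the odd-lot and auto-match streams stay separately queryable for microstructure work without touching the consumer path.
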






