Crypto Exchange APIs: Architecture, Rate Limits, and

Exchange APIs are the primary interface between your trading infrastructure and liquidity venues. They expose REST endpoints for account queries and order placement, WebSocket streams for market data, and authentication schemes that determine throughput, latency, and permitted operations. This article examines API design patterns across centralized exchanges, practical rate limit models, and common integration failures that degrade fill quality or trigger throttling.

REST vs WebSocket Design Patterns

Most exchanges expose two parallel interfaces. REST endpoints handle state-changing operations and queries: placing orders, canceling orders, querying balances, and retrieving historical trade data. These are synchronous request/response calls authenticated via API key signatures. WebSocket connections deliver streaming market data (order book snapshots, incremental updates, recent trades) and private account events (fills, balance changes, liquidations). The WebSocket model reduces polling overhead and latency but requires persistent connection management and resynchronization logic for dropped messages.

Order placement typically happens over REST because each request receives an explicit acknowledgment and can carry an idempotency key. You send a signed POST request with order parameters, receive a server-assigned order ID, and the exchange confirms acceptance or rejection. Market data consumption favors WebSocket because top-of-book prices and depth updates can arrive many times per second on liquid pairs. Polling a REST ticker endpoint every second leaves you acting on stale prices and wastes rate limit quota.

Some exchanges offer order placement over WebSocket to save round trip time. This pattern requires careful sequence number handling. If the WebSocket disconnects mid-flight, you must reconcile open orders via REST to avoid duplicate placements.
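A minimal sketch of the gap detection that sequence number handling implies. It assumes the venue stamps every stream message with a contiguous integer sequence number; many venues use a different scheme (for example first/last update ID ranges), so treat the class and method names here as hypothetical.

```python
from typing import Optional


class SequenceTracker:
    """Tracks stream sequence numbers and flags when a resync is needed."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None  # None means no valid baseline yet

    def on_snapshot(self, seq: int) -> None:
        # A fresh snapshot (for example fetched via REST) resets the baseline.
        self.last_seq = seq

    def on_update(self, seq: int) -> str:
        if self.last_seq is None:
            return "need_snapshot"  # cannot apply deltas without a baseline
        if seq <= self.last_seq:
            return "stale"          # already covered by the snapshot; drop it
        if seq != self.last_seq + 1:
            self.last_seq = None    # a message was missed; local state is untrusted
            return "gap"
        self.last_seq = seq
        return "apply"
```

On "gap" or "need_snapshot" the caller would fetch a new snapshot and, for order entry streams, reconcile open orders via REST before resuming.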

Rate Limit Models and Quota Accounting

Exchanges enforce rate limits to prevent infrastructure overload and ensure fair access. Limits are typically expressed as requests per unit time, but the accounting method varies. Token bucket models allow bursts up to a cap, then refill at a steady rate. Fixed window models reset the counter to zero every N seconds, which lets a client burst at the end of one window and again at the start of the next, briefly doubling the effective rate around rollover. Sliding window implementations smooth this by tracking request timestamps over a rolling interval.
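A client-side token bucket is one way to mirror the first of these models locally. The sketch below is illustrative only: the capacity and refill rate are placeholders, and the optional `now` argument exists purely so the behavior can be checked without sleeping.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows bursts up to `capacity`, then refills at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def try_acquire(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the burst cap.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should wait or drop the request
```

With `capacity=10` and `refill_rate=5`, ten immediate requests succeed, the eleventh is refused, and one second later five more are available.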

Rate limit scopes differ by endpoint and API tier. Public market data endpoints often share a common per-IP limit, while authenticated endpoints are metered per API key. Weight-based systems assign higher costs to resource-intensive operations. For example, fetching a full order book snapshot might consume 10 units, placing a limit order 1 unit, and a simple ping 0. Your quota depletes based on cumulative weight, not raw request count.
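To make weight accounting concrete, here is a rough sketch of a sliding window that tracks cumulative weight rather than request count. The weights are the example figures from the paragraph above, not any exchange's real schedule, and the endpoint names are hypothetical.

```python
from collections import deque

# Example weights from the text; real values come from the exchange's documentation.
ENDPOINT_WEIGHTS = {"order_book_snapshot": 10, "place_order": 1, "ping": 0}


class WeightedSlidingWindow:
    """Tracks weight spent over a rolling interval of `window_seconds`."""

    def __init__(self, max_weight: int, window_seconds: float) -> None:
        self.max_weight = max_weight
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, weight) pairs, oldest first
        self.used = 0

    def _expire(self, now: float) -> None:
        # Drop entries that have aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, weight = self.events.popleft()
            self.used -= weight

    def try_spend(self, endpoint: str, now: float) -> bool:
        self._expire(now)
        weight = ENDPOINT_WEIGHTS[endpoint]
        if self.used + weight > self.max_weight:
            return False
        self.events.append((now, weight))
        self.used += weight
        return True
```

With a budget of 25 units per second, two snapshots fit, a third is refused, and a limit order placed in the same window still succeeds because it only costs 1 unit.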

Exceeding limits typically triggers HTTP 429 responses, often with a Retry-After header, and repeated violations can lead to temporary IP bans. Backoff strategy matters: naive retry loops amplify the problem. Implement exponential backoff with jitter and track your remaining quota via response headers. Many exchanges return headers such as X-RateLimit-Remaining and X-RateLimit-Reset on every response, though exact names and semantics vary by venue. Parse these to throttle proactively before hitting the ceiling.
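The sketch below shows one way to combine these ideas: honor Retry-After when it is present, fall back to full-jitter exponential backoff otherwise, and throttle early when the advertised remaining quota runs low. The header names follow the ones mentioned above and the thresholds are arbitrary; check your venue's documentation for the actual names and units.

```python
import random
from typing import Callable, Mapping


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> float:
    """Seconds to wait before retrying; 0.0 when the response was not throttled."""
    if status != 429:
        return 0.0
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        # Retry-After in delta-seconds form; the HTTP-date form falls through.
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return backoff_delay(attempt)


def should_throttle(headers: Mapping[str, str], min_remaining: int = 5) -> bool:
    """True when the advertised remaining quota is at or below a safety margin."""
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    return remaining is not None and remaining.isdigit() and int(remaining) <= min_remaining
```

Jitter spreads retries from many clients (or many threads in one client) over time, so they do not all hit the exchange again at the same instant.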

Authentication and Signature Schemes

API keys consist of a public identifier and a private secret. The secret is never transmitted over the wire. Instead, you construct a signature by computing a keyed hash over the request payload (timestamp, endpoint path, query parameters, request body) using HMAC-SHA256 or a similar algorithm with the secret as the key. The exchange recomputes the signature server-side and rejects mismatches.
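As an illustration, the following sketch signs a request with HMAC-SHA256 using Python's standard library. The canonical string layout (timestamp, method, path, query, body concatenated in that order) and the `/v1/orders` path in the usage note are assumptions; every exchange defines its own field order, encoding, and header names.

```python
import hashlib
import hmac


def build_payload(timestamp_ms: int, method: str, path: str,
                  query: str = "", body: str = "") -> str:
    # Hypothetical canonical form: timestamp + METHOD + path[?query] + body.
    query_part = f"?{query}" if query else ""
    return f"{timestamp_ms}{method.upper()}{path}{query_part}{body}"


def sign(secret: str, payload: str) -> str:
    # The secret is only used as the HMAC key; it is never sent with the request.
    return hmac.new(secret.encode("utf-8"), payload.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def verify(secret: str, payload: str, signature: str) -> bool:
    # What the server side does: recompute and compare in constant time.
    return hmac.compare_digest(sign(secret, payload), signature)
```

For example, `sign(secret, build_payload(1700000000000, "POST", "/v1/orders", body='{"qty":"1"}'))` yields a 64-character hex digest that would be sent alongside the public key identifier and timestamp. Changing any byte of the payload invalidates the signature.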

Timestamp tolerances limit replay attacks. If your request timestamp diverges from server time by more than the allowed window (commonly a few seconds, for example 5), the exchange rejects it. Clock drift on your server causes intermittent authentication failures. Monitor clock skew against the exchange's server time and keep NTP synchronization running.
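A rough way to monitor skew is to compare the exchange's reported server time with the midpoint of your request's send and receive times. The sketch below assumes the venue exposes a server time value in milliseconds and that network delay is roughly symmetric; the function names and the 5000 ms default are illustrative.

```python
def clock_offset_ms(server_time_ms: float, sent_ms: float, received_ms: float) -> float:
    """Estimated (server clock - local clock), assuming symmetric network delay."""
    midpoint = (sent_ms + received_ms) / 2
    return server_time_ms - midpoint


def within_tolerance(request_ts_ms: float, server_ts_ms: float,
                     window_ms: float = 5000) -> bool:
    """Mirrors the server-side check: reject if the timestamps diverge too far."""
    return abs(server_ts_ms - request_ts_ms) <= window_ms
```

If the estimated offset approaches a meaningful fraction of the tolerance window, that is a signal to alert or to apply the offset when stamping requests.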

Some exchanges use separate API key permissions for read, trade, and withdraw operations. A compromised read-only key leaks positions but cannot execute trades. Withdraw permissions should live on separate, rarely used keys stored in hardware security modules or encrypted vaults, not on hot trading servers.

Order Lifecycle and Execution Reports

When you place an order via REST, the exchange returns an acknowledgment containing the assigned order ID and initial status (NEW, PENDING_NEW). The order enters the matching engine queue. Subsequent state transitions (PARTIALLY_FILLED, FILLED, CANCELED, REJECTED, EXPIRED) arrive asynchronously via WebSocket execution reports or must be polled via REST.
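One way to keep local order state consistent with these reports is a small transition table. The table below is illustrative, built only from the statuses named above; real venues may permit other transitions, and because reports can arrive out of order, a production handler might log and re-query via REST rather than raise.

```python
TERMINAL = {"FILLED", "CANCELED", "REJECTED", "EXPIRED"}

# Illustrative transition table; not taken from any specific exchange.
ALLOWED = {
    "PENDING_NEW": {"NEW", "REJECTED"},
    "NEW": {"PARTIALLY_FILLED", "FILLED", "CANCELED", "EXPIRED"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCELED", "EXPIRED"},
}


def next_status(current: str, reported: str) -> str:
    """Return the new local status given an incoming execution report."""
    if current in TERMINAL:
        return current  # ignore late or duplicate reports for finished orders
    if reported in ALLOWED.get(current, set()):
        return reported
    raise ValueError(f"unexpected transition {current} -> {reported}")
```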

Idempotency keys prevent duplicate orders during retries. If your TCP connection drops after sending an order but before receiving the response, you don’t know whether the exchange accepted it. Resending the same request risks double placement. Client order IDs (clOrdID) solve this where the exchange supports them. Generate a unique identifier client-side, include it in every order request, and the exchange deduplicates on it. If you retry with the same clOrdID, the exchange returns the existing order state (or rejects the request as a duplicate) instead of creating a second order.
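The retry pattern can be sketched against an in-memory stand-in for the exchange. `FakeExchange`, `place_with_retry`, and the field names are hypothetical; the stub implements the "return existing order state" behavior described above, which not every venue follows.

```python
import uuid


def new_cl_ord_id() -> str:
    return uuid.uuid4().hex  # unique per logical order, reused across retries


class FakeExchange:
    """In-memory stub that deduplicates on client order ID."""

    def __init__(self) -> None:
        self.orders = {}
        self.next_id = 1

    def place_order(self, cl_ord_id: str, side: str, qty: str, price: str) -> dict:
        if cl_ord_id in self.orders:
            return self.orders[cl_ord_id]  # duplicate: return existing state
        order = {"orderId": self.next_id, "clOrdID": cl_ord_id, "side": side,
                 "qty": qty, "price": price, "status": "NEW"}
        self.next_id += 1
        self.orders[cl_ord_id] = order
        return order


def place_with_retry(send, cl_ord_id: str, max_attempts: int = 3, **params) -> dict:
    """Retry with the SAME client order ID so a lost response cannot double-place."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(cl_ord_id=cl_ord_id, **params)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("all attempts failed") from last_error
```

The key design point is that the identifier is generated once, before the first attempt, and never regenerated inside the retry loop.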

Handling partial fills requires tracking cumulative filled quantity. An order for 10 BTC might fill in three increments: 3 BTC, 5 BTC, 2 BTC. Each execution report includes cumQty and leavesQty. Your logic must reconcile these updates to avoid position sizing errors.
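A small tracker makes this reconciliation explicit. The sketch assumes each report carries cumQty and leavesQty as decimal strings and that the order quantity is not amended or canceled mid-life; the class name is hypothetical.

```python
from decimal import Decimal


class FillTracker:
    """Derives position deltas from cumulative fill reports."""

    def __init__(self, order_qty: str) -> None:
        self.order_qty = Decimal(order_qty)
        self.cum_qty = Decimal("0")

    def apply(self, cum_qty: str, leaves_qty: str) -> Decimal:
        cum, leaves = Decimal(cum_qty), Decimal(leaves_qty)
        if cum + leaves != self.order_qty:
            raise ValueError("cumQty + leavesQty does not match order quantity")
        if cum <= self.cum_qty:
            return Decimal("0")  # duplicate or out-of-order report; no new fill
        delta = cum - self.cum_qty  # newly filled quantity since the last report
        self.cum_qty = cum
        return delta
```

For the 10 BTC example, reports with cumQty 3, 8, and 10 produce deltas of 3, 5, and 2. A replayed report with cumQty 8 produces a delta of 0, so the position is not double counted.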

Worked Example: Market Maker Order Refresh

A market maker quotes both sides of BTC/USDT at 5 basis points either side of mid (a 10 basis point total spread). Current mid price is 40,000 USDT, so 5 basis points is 20 USDT. The bot places:

Bid: 10 BTC at 39,980 USDT
Ask: 10 BTC at 40,020 USDT

Every 200 milliseconds, the bot receives a WebSocket order book update. If mid price moves to 40,050, the quotes are now off market. The bot must:

Send two REST DELETE requests to cancel existing orders (2 requests, 2 rate limit units)
Wait for cancel confirmations via WebSocket (CANCELED execution reports)
Send two REST POST requests to place new orders at 40,030 bid, 40,070 ask (2 requests, 2 rate limit units)
Parse acknowledgments and store new order IDs

At 5 requotes per second, the worst case in which every update moves the quotes, this consumes 20 rate limit units per second (4 requests per cycle × 5 cycles, at 1 unit per request). If the exchange allows 100 units per second, this strategy fits comfortably. But adding more pairs or tighter requote intervals quickly exhausts quota. Optimizations include amend order endpoints (change the price in place instead of canceling and re-placing) or batching multiple cancels into a single request where supported.
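The budget arithmetic above can be written out as a short calculation. The figures are the ones from this example; the assumption that an amend endpoint costs one unit per order is hypothetical and should be checked against the venue's weight table.

```python
def quota_usage(requote_interval_ms: float, requests_per_cycle: int = 4,
                weight_per_request: int = 1) -> float:
    """Rate limit units consumed per second for one trading pair."""
    cycles_per_second = 1000 / requote_interval_ms
    return requests_per_cycle * weight_per_request * cycles_per_second


def max_pairs(limit_per_second: float, usage_per_pair: float) -> int:
    """How many pairs fit in the quota if each behaves identically."""
    return int(limit_per_second // usage_per_pair)


cancel_replace = quota_usage(200)                   # 2 cancels + 2 places per cycle
amend = quota_usage(200, requests_per_cycle=2)      # 2 amends per cycle (assumed cost)
print(cancel_replace, max_pairs(100, cancel_replace))  # 20.0 5
print(amend, max_pairs(100, amend))                    # 10.0 10
```

Under these assumptions the cancel/replace flow supports 5 pairs within a 100 unit per second limit, and an amend-based flow doubles that to 10.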

Common Mistakes and Misconfigurations

Ignoring nonce/timestamp drift. Your authentication fails randomly when server clock and client clock diverge beyond the tolerance window. Log rejected requests and check timestamp deltas.

Polling REST endpoints for market data. Fetching ticker or order book snapshots every 100ms wastes rate limits and introduces latency. Use WebSocket subscriptions instead.

Failing to handle WebSocket reconnections. Connections drop due to network hiccups or server restarts. If your process doesn’t detect disconnection and replay missed messages, you trade on stale data.

Not tracking rate limit headers. Blindly sending requests until you hit 429 errors causes unnecessary failures. Parse X-RateLimit headers and implement client side throttling.

Relying only on exchange-assigned order IDs instead of client order IDs. If a response is lost and you retry, you have no identifier to deduplicate on and risk creating duplicate orders. Always generate and track client-side identifiers for idempotency.

Assuming order acknowledgment means filled. The exchange confirms order acceptance, not execution. You must wait for fill reports or query order status before assuming the position changed.

What to Verify Before You Rely on This

Current rate limit quotas and weight assignments per endpoint. Exchanges adjust these based on load and tier.
Supported authentication schemes and timestamp tolerance windows. Some exchanges migrated from older HMAC variants.
WebSocket message format versioning. Order book update schemas change. Ensure your parser matches the current specification.
Idempotency semantics. Not all exchanges support client order IDs, and behavior on duplicate clOrdID submission varies.
Cancel on disconnect settings. Some exchanges auto cancel open orders when your WebSocket drops. Others leave them resting.
Order type support and constraints. Not all venues support GTD (good till date), iceberg orders, or post only flags.
Minimum notional and lot size rules per trading pair. These change and vary by market.
API key IP whitelisting requirements. Production keys may require pre-registered IPs.
Testnet vs mainnet endpoint URLs and whether testnet supports all mainnet features.
Status page or incident history for API reliability. Frequent outages affect uptime assumptions.

Next Steps

Implement rate limit tracking by parsing response headers and maintaining a local counter that mirrors exchange quota state.
Set up WebSocket reconnection logic with exponential backoff and snapshot reconciliation to handle connection drops without data loss.
Test order placement with client order IDs in a staging environment, then deliberately trigger retries to confirm the exchange deduplicates correctly.
