Best Practices
This page describes recommended patterns for getting the most out of the Python API Client. See also the official efficiency guide for guidance that applies to the Coin Metrics API more broadly.
Use Catalog and Reference Data First
Most workflows are faster, cheaper, and more reliable when you discover the universe of valid assets, markets, or metrics with reference_data_* and catalog_*_v2 before hitting the historical-data endpoints. The catalog also tells you which markets actually have recent data, which lets you drop obsolete entries before paying for a full timeseries pull.
For example, follow the flow indicated in light blue.
Step 1. List all spot markets:
spot_markets = client.reference_data_markets(type='spot').to_dataframe()
Step 2. Use the catalog to keep only markets that have minute candles available within the last two days:
from datetime import datetime, timedelta, timezone
cat = client.catalog_market_candles_v2(
    markets=list(spot_markets.market),
).to_dataframe()
cat = cat.loc[
    (cat.frequency == '1m') &
    (cat.max_time > datetime.now(timezone.utc) - timedelta(days=2))
].reset_index(drop=True)
Step 3. Pass the filtered list to a historical-data method (and, for large pulls, layer parallel() on top; see the next section). The same pattern applies to asset metrics: instead of calling get_asset_metrics directly with *, derive the asset and metric lists from reference_data_assets and catalog_asset_metrics_v2 and pass them in explicitly.
Driving CoinMetricsClient methods from a pre-filtered list of assets / markets / metrics is consistently more performant than relying on wildcards, and it pairs naturally with .parallel() because each parallel worker gets a single, well-scoped query.
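Step 3 might look like the sketch below. It assumes client is an authenticated CoinMetricsClient and markets is the filtered list from Step 2. The helper name fetch_minute_candles and the date arguments are illustrative and not part of the library.

```python
def fetch_minute_candles(client, markets, start_time, end_time):
    """Pull 1m candles for an explicit, pre-filtered list of markets."""
    # Drop duplicates while preserving order so each market is requested once.
    unique_markets = list(dict.fromkeys(markets))
    return client.get_market_candles(
        markets=unique_markets,
        frequency="1m",
        start_time=start_time,
        end_time=end_time,
    ).to_dataframe()
```

A call could then be fetch_minute_candles(client, list(cat.market), "2024-01-01", "2024-01-02"), assuming the catalog frame exposes a market column like the reference-data frame does.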
Parallel Execution
For large historical exports the most effective lever is to split your request into many smaller requests and run them in parallel. The client supports this directly via .parallel():
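A minimal sketch of such a call, modelled on the get_market_trades example in the Guidelines below. The helper name export_trades_by_market is hypothetical, and client is assumed to be an authenticated CoinMetricsClient.

```python
def export_trades_by_market(client, markets, start_time, end_time):
    """Export trades with the request split on the markets parameter."""
    return (
        client.get_market_trades(
            markets=markets,
            start_time=start_time,
            end_time=end_time,
        )
        .parallel("markets")     # one sub-request per market
        .export_to_json_files()  # each worker writes its own JSON file
    )
```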
.parallel() either writes one file per worker (export_to_csv_files(), export_to_json_files(), export_to_parquet_files()) or merges every worker's output into a single result (to_list(), to_dataframe(), export_to_csv(), export_to_json()). Internally it uses Python's concurrent.futures, so it consumes more resources than a single-threaded request and may approach the Coin Metrics rate limits.
In rough order of resource usage and speed (most performant first):
.export_to_parquet_files()
.export_to_json_files()
.export_to_csv_files()
.to_list()
.export_to_json()
.to_dataframe()
Splitting Across Time
Use time_increment to split a single query into many parallel requests along the time axis. This example pulls a year of minute-frequency reference rates for three assets in parallel:
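A sketch of what such a pull could look like. The asset tickers, the ReferenceRateUSD metric name, the date range, and the helper name pull_reference_rates are illustrative assumptions. client is assumed to be an authenticated CoinMetricsClient.

```python
def pull_reference_rates(client, time_increment):
    """Pull one year of 1m reference rates, split along the time axis."""
    return (
        client.get_asset_metrics(
            assets=["btc", "eth", "sol"],
            metrics="ReferenceRateUSD",
            frequency="1m",
            start_time="2023-01-01",
            end_time="2024-01-01",
            end_inclusive=False,  # keep the window to exactly one year
        )
        .parallel(time_increment=time_increment)  # one sub-request per window
        .to_dataframe()                           # merge all workers' output
    )
```

With from dateutil.relativedelta import relativedelta, the monthly split is requested as pull_reference_rates(client, relativedelta(months=1)).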
time_increment=relativedelta(months=1) runs 36 workers in total: 12 monthly windows for each of the 3 assets. Compared with a single sequential request, the wall-clock difference is dramatic.
Use datetime.timedelta for sub-month windows and dateutil.relativedelta.relativedelta for month- or year-sized windows.
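The standard-library sketch below illustrates why the two types are not interchangeable. It only reproduces the window arithmetic and is not the client's internal splitting logic. The helper names, asset list, and dates are hypothetical.

```python
from datetime import date, timedelta


def fixed_windows(start, end, step):
    """Start dates of fixed-length windows covering [start, end)."""
    starts = []
    current = start
    while current < end:
        starts.append(current)
        current += step
    return starts


def month_windows(start, end):
    """Start dates of calendar-month windows covering [start, end).

    Assumes `start` falls on the first day of a month.
    """
    starts = []
    current = start
    while current < end:
        starts.append(current)
        # Step to the first day of the next month, rolling the year over in December.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return starts


start, end = date(2023, 1, 1), date(2024, 1, 1)
assets = ["btc", "eth", "sol"]

monthly = month_windows(start, end)
thirty_day = fixed_windows(start, end, timedelta(days=30))

print(len(monthly), len(assets) * len(monthly))  # 12 36
print(len(thirty_day), thirty_day[1])            # 13 2023-01-31
```

A fixed 30-day timedelta yields 13 windows whose boundaries drift away from month starts, whereas calendar-month stepping yields exactly 12.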
Guidelines
.parallel() is best when you can split a request across many list-type parameters (assets, markets, metrics, ...) or along the time axis. Single-market or single-asset requests will not see a meaningful speedup.
The *_files() exports are the safest and most performant choice: every worker writes its own file, so the client never needs to merge results in memory. The merging variants (to_dataframe(), to_list(), export_to_csv()) can use a lot of memory for high-volume endpoints like market-trades or market-orderbooks and may fail outright on very large windows.
By default *_files() writes to /{endpoint}/{parallelize_on}/.... For example, client.get_market_trades("coinbase-eth-btc-spot,coinbase-eth-usdc-spot").parallel("markets").export_to_json_files() produces ./market-trades/coinbase-eth-btc-spot.json and ./market-trades/coinbase-eth-usdc-spot.json. Adding time_increment=timedelta(days=1) further nests the output under start_time=... directories.
.parallel() is highly configurable: max_workers, a custom executor (e.g. ProcessPoolExecutor), and progress_bar are all available. Multithreaded code is harder to debug than single-threaded code, so this tool is best suited for historical exports rather than for real-time production systems.
If you see BrokenProcessPool, you are probably missing an if __name__ == '__main__': guard.
Pass verbose=True or debug=True to CoinMetricsClient(...) if a parallel run is taking longer than expected.
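As a sketch of the configuration options mentioned above, the helper below caps concurrency and enables the progress bar. It assumes max_workers and progress_bar are accepted as keyword arguments to .parallel(). The helper name export_trades_capped and the default of 4 workers are hypothetical.

```python
def export_trades_capped(client, markets, start_time, end_time, max_workers=4):
    """Export trades per market with a bounded number of parallel workers."""
    return (
        client.get_market_trades(
            markets=markets,
            start_time=start_time,
            end_time=end_time,
        )
        .parallel(
            "markets",
            max_workers=max_workers,  # fewer workers means less rate-limit pressure
            progress_bar=True,        # show progress during long exports
        )
        .export_to_parquet_files()    # one file per worker, no in-memory merge
    )
```

When a process-based executor is used, the call to this helper belongs under an if __name__ == '__main__': guard.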
Lazy Execution
Lazy execution lets you describe transformations on a DataCollection without materializing the result, which is useful when you want to filter on a column the API does not expose as a parameter. Convert a DataCollection into a polars LazyFrame with to_lazyframe() and chain transformations onto it. See the polars guide for the full lazy API.
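A minimal sketch of that pattern. The helper name collect_filtered is hypothetical. It takes any DataCollection and a polars expression, and relies only on to_lazyframe() plus the standard LazyFrame.filter() and collect() methods.

```python
def collect_filtered(collection, predicate):
    """Filter a DataCollection lazily and materialize only the matching rows."""
    lazy = collection.to_lazyframe()  # polars LazyFrame: transformations are only recorded
    query = lazy.filter(predicate)    # adds the filter to the query plan
    return query.collect()            # executes the plan and returns a DataFrame
```

With polars imported as pl, a call might be collect_filtered(client.get_market_trades(markets="coinbase-btc-usd-spot", start_time="2024-01-01", end_time="2024-01-02"), pl.col("side") == "buy"), assuming the trades frame has a side column.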
Wildcards
Wildcards (*) let you query several entities — assets, exchanges, markets — with a single parameter:
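For instance, a single market pattern can stand in for every spot market on one exchange. The sketch assumes the coinbase-*-spot pattern form and a daily candle frequency. The helper name coinbase_spot_daily_candles is hypothetical.

```python
def coinbase_spot_daily_candles(client, start_time, end_time):
    """Request daily candles for all markets matching one wildcard pattern."""
    return client.get_market_candles(
        markets="coinbase-*-spot",  # wildcard: resolved by the API, not locally
        frequency="1d",
        start_time=start_time,
        end_time=end_time,
    ).to_dataframe()
```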
Wildcards are convenient, but for historical exports prefer the catalog-driven flow described in Use Catalog and Reference Data First: resolving the concrete list of assets / markets first and feeding it to .parallel() is consistently faster, gives you a stable input you can reproduce, and lets you drop obsolete entries before they cost you a request.
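One way to get a concrete, reproducible list from a pattern is to expand it locally against names you already fetched from reference data. The sketch uses the standard-library fnmatch module, whose glob semantics may not match the API's server-side wildcard handling exactly. The market names and the helper expand_pattern are illustrative.

```python
from fnmatch import fnmatchcase


def expand_pattern(pattern, known_markets):
    """Return the sorted subset of known_markets matching a glob pattern."""
    return sorted(m for m in known_markets if fnmatchcase(m, pattern))


# Hypothetical names standing in for a reference-data market list.
known = [
    "coinbase-eth-usd-spot",
    "binance-BTCUSDT-future",
    "coinbase-btc-usd-spot",
    "kraken-btc-usd-spot",
]

resolved = expand_pattern("coinbase-*-spot", known)
print(resolved)  # ['coinbase-btc-usd-spot', 'coinbase-eth-usd-spot']
```

The resolved list can be stored alongside the export so that a later rerun requests exactly the same markets.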