Best Practices

This page describes recommended patterns for getting the most out of the Python API Client. See also the official efficiency guide for guidance that applies to the Coin Metrics API more broadly.

Use Catalog and Reference Data First

Most workflows are faster, cheaper, and more reliable when you discover the universe of valid assets, markets, or metrics with reference_data_* and catalog_*_v2 before hitting the historical-data endpoints. The catalog also tells you which markets actually have recent data, which lets you drop obsolete entries before paying for a full timeseries pull.

For example, the three steps below build a list of spot markets with recent minute candles before any historical data is requested.

Step 1. List all spot markets:

spot_markets = client.reference_data_markets(type='spot').to_dataframe()

Step 2. Use the catalog to keep only markets that have minute candles available within the last two days:

from datetime import datetime, timedelta, timezone

cat = client.catalog_market_candles_v2(
    markets=list(spot_markets.market),
).to_dataframe()
cat = cat.loc[
    (cat.frequency == '1m') &
    (cat.max_time > datetime.now(timezone.utc) - timedelta(days=2))
].reset_index(drop=True)

Step 3. Pass the filtered list to a historical-data method (and, for large pulls, layer parallel() on top — see the next section). The same pattern applies to asset metrics: instead of calling get_asset_metrics directly with *, derive the asset and metric lists from reference_data_assets and catalog_asset_metrics_v2 and pass them in explicitly.
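Step 3 might look like the sketch below. It assumes `client` is a CoinMetricsClient and `markets` is the filtered list from Step 2 (for example `list(cat.market)`). The function name `export_recent_candles` is hypothetical, and the keyword names passed to get_market_candles are assumptions that should be checked against your installed client version.

```python
def export_recent_candles(client, markets, start_time, end_time):
    """Export 1m candles for an explicit, pre-filtered list of markets.

    `client` is expected to be a CoinMetricsClient; `markets` is the list
    produced by the catalog filter in Step 2.
    """
    if not markets:
        raise ValueError("markets is empty; nothing to export")
    return (
        client.get_market_candles(
            markets=markets,          # explicit list instead of a wildcard
            frequency="1m",
            start_time=start_time,
            end_time=end_time,
        )
        .parallel("markets")          # one well-scoped query per market
        .export_to_csv_files()        # one file per worker, no in-memory merge
    )
```

Guarding against an empty list is a design choice: if the catalog filter drops every market, the function fails early rather than issuing a request with no markets.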

Driving CoinMetricsClient methods from a pre-filtered list of assets / markets / metrics is consistently more performant than relying on wildcards, and it pairs naturally with .parallel() because each parallel worker gets a single, well-scoped query.

Parallel Execution

For large historical exports the most effective lever is to split your request into many smaller requests and run them in parallel. The client supports this directly via .parallel().

.parallel() either writes one file per worker (export_to_csv_files(), export_to_json_files(), export_to_parquet_files()) or merges every worker's output into a single result (to_list(), to_dataframe(), export_to_csv(), export_to_json()). Internally it uses Python's concurrent.futures, so it consumes more resources than a single-threaded request and may approach the Coin Metrics rate limits.
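To make the fan-out-and-merge idea concrete, here is a minimal standard-library sketch. It is not the client's implementation: `fetch_one` is a hypothetical stand-in for one small API request, and `fetch_all_merged` is an illustrative name for the merging variant.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(market):
    """Hypothetical stand-in for one small API request for a single market."""
    return [{"market": market, "row": i} for i in range(2)]


def fetch_all_merged(markets, max_workers=4):
    """Fan out one request per market, then merge results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the order of `markets` even if workers finish out of order
        per_market = list(pool.map(fetch_one, markets))
    # Merging holds every worker's rows in memory at the same time
    return [row for rows in per_market for row in rows]


rows = fetch_all_merged(["coinbase-btc-usd-spot", "coinbase-eth-usd-spot"])
print(len(rows))
```

The final list comprehension is where the memory cost of a merging export comes from. A per-file export would write each worker's rows out instead of collecting them.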

In rough order of resource usage and speed (most performant first):

  • .export_to_parquet_files()

  • .export_to_json_files()

  • .export_to_csv_files()

  • .to_list()

  • .export_to_json()

  • .to_dataframe()

Splitting Across Time

Use time_increment to split a single query into many parallel requests along the time axis. For example, a year of minute-frequency reference rates for three assets can be split into monthly windows and pulled in parallel.
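A call of that shape might look like the sketch below. The asset list, date range, and the function name `export_reference_rates` are illustrative assumptions, as are the keyword names passed to get_asset_metrics. `client` is assumed to be a CoinMetricsClient.

```python
def export_reference_rates(client, time_increment):
    """Pull a year of 1m reference rates, split into time windows.

    `time_increment` is passed straight through to .parallel(), so the
    caller chooses the window size.
    """
    return (
        client.get_asset_metrics(
            assets=["btc", "eth", "sol"],   # illustrative assets
            metrics="ReferenceRateUSD",
            frequency="1m",
            start_time="2024-01-01",
            end_time="2025-01-01",
        )
        .parallel(time_increment=time_increment)
        .export_to_csv_files()              # one file per worker
    )
```

With dateutil installed, a monthly split would be requested as `export_reference_rates(client, relativedelta(months=1))`.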

time_increment=relativedelta(months=1) runs 36 workers in total: 12 monthly windows for each of the 3 assets. Compared with a single sequential request, the wall-clock time drops substantially.
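The worker count can be checked by hand. The sketch below is standard-library only and does not reproduce how the client builds its windows. The asset list is illustrative, and `month_starts` is a hypothetical helper that assumes the start date falls on the first of a month.

```python
from datetime import date
from itertools import product


def month_starts(start, end):
    """Yield the first day of each month in [start, end)."""
    current = start
    while current < end:
        yield current
        # Advance to the first day of the next month
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)


assets = ["btc", "eth", "sol"]  # illustrative assets
windows = list(month_starts(date(2024, 1, 1), date(2025, 1, 1)))
work_items = list(product(assets, windows))  # one (asset, window) pair per worker
print(len(windows), len(work_items))  # 12 36
```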

Use datetime.timedelta for sub-month windows and dateutil.relativedelta.relativedelta for month- or year-sized windows.

Guidelines

  • .parallel() is best when you can split a request across many list-type parameters (assets, markets, metrics, ...) or along the time axis. Single-market or single-asset requests will not see a meaningful speedup.

  • The *_files() exports are the safest and most performant choice — every worker writes its own file, so the client never needs to merge results in memory. The merging variants (to_dataframe(), to_list(), export_to_csv()) can use a lot of memory for high-volume endpoints like market-trades or market-orderbooks and may fail outright on very large windows.

  • By default *_files() writes to /{endpoint}/{parallelize_on}/.... For example, client.get_market_trades("coinbase-eth-btc-spot,coinbase-eth-usdc-spot").parallel("markets").export_to_json_files() produces ./market-trades/coinbase-eth-btc-spot.json and ./market-trades/coinbase-eth-usdc-spot.json. Adding time_increment=timedelta(days=1) further nests the output under start_time=... directories.

  • .parallel() is highly configurable — max_workers, a custom executor (e.g. ProcessPoolExecutor), and progress_bar are all available. Multithreaded code is harder to debug than single-threaded code, so this tool is best suited for historical exports rather than for real-time production systems.

  • Pass verbose=True or debug=True to CoinMetricsClient(...) if a parallel run is taking longer than expected.
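Several of these guidelines can be combined in one call. The sketch below assumes `client` is a CoinMetricsClient, and that `time_increment`, `max_workers`, and `progress_bar` are accepted as keyword arguments by .parallel() under those names. The function name `export_trades_by_day` and the default worker count are illustrative.

```python
from datetime import timedelta


def export_trades_by_day(client, markets, start_time, end_time, max_workers=4):
    """Export trades to one JSON file per market per day."""
    return (
        client.get_market_trades(
            markets=markets,
            start_time=start_time,
            end_time=end_time,
        )
        .parallel(
            "markets",                         # split across the markets list
            time_increment=timedelta(days=1),  # and again into daily windows
            max_workers=max_workers,           # cap concurrency to respect rate limits
            progress_bar=False,
        )
        .export_to_json_files()                # per-worker files, no in-memory merge
    )
```

Keeping max_workers small is a conservative starting point. It can be raised once a run is confirmed to stay within the rate limits.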

Lazy Execution

Lazy execution lets you describe transformations on a DataCollection without materializing the result, which is useful when you want to filter on a column the API does not expose as a parameter. Convert a DataCollection into a polars LazyFrame with to_lazyframe() and chain transformations onto it. See the polars guide for the full lazy API.

Wildcards

Wildcards (*) let you query several entities (assets, exchanges, or markets) with a single parameter.

Wildcards are convenient, but for historical exports prefer the catalog-driven flow described in Use Catalog and Reference Data First: resolving the concrete list of assets / markets first and feeding it to .parallel() is consistently faster, gives you a stable input you can reproduce, and lets you drop obsolete entries before they cost you a request.
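One way to get a reproducible input is to expand a pattern locally against a list of known markets and pass the result explicitly. The sketch below uses the standard library's shell-style matching as a local stand-in. It is not guaranteed to match the API's own wildcard semantics, and `known_markets` is a hypothetical list that would normally come from reference data.

```python
from fnmatch import fnmatchcase


def expand_pattern(pattern, known_markets):
    """Return the known markets matching a shell-style pattern, sorted."""
    return sorted(m for m in known_markets if fnmatchcase(m, pattern))


# Hypothetical market names; in practice take them from reference_data_markets
known_markets = [
    "coinbase-eth-usd-spot",
    "binance-btc-usdt-spot",
    "coinbase-btc-usd-spot",
    "binance-BTCUSDT-future",
]

coinbase_spot = expand_pattern("coinbase-*-spot", known_markets)
print(coinbase_spot)
```

Sorting the result keeps the list stable between runs, which makes an export easier to reproduce and to diff.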
