
Working with Parquet

Tips for working with Radar Parquet files.

Why Parquet?

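  • Columnar storage, so analytical queries read only the columns they need
  • Typically far smaller than the equivalent CSV (around 10x is common, depending on the data and compression)
  • Schema (column names and types) embedded in the file
  • Supported by the major data tools, including pandas, polars, and DuckDB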

Loading Data

pandas

python
import pandas as pd
# Requires a Parquet engine such as pyarrow or fastparquet to be installed
df = pd.read_parquet('events.parquet')

polars

python
import polars as pl
df = pl.read_parquet('events.parquet')

DuckDB

sql
SELECT * FROM 'events.parquet' LIMIT 10;

Common Queries

Top Purchases by Value

python
import polars as pl

df = (
    pl.scan_parquet('events.parquet')             # lazy: nothing is read yet
    .filter(pl.col('transaction_code') == 'P')    # keep purchases only
    .sort('total_value', descending=True)         # largest first
    .head(20)
    .collect()                                    # run the query
)

Insider Activity by Company

sql
-- DuckDB: trade count and total value per issuer, for transaction codes P and S
SELECT
    issuer,
    COUNT(*) as trades,
    SUM(total_value) as volume
FROM 'events.parquet'
WHERE transaction_code IN ('P', 'S')
GROUP BY issuer
ORDER BY volume DESC
LIMIT 20;

Total Value by Month

python
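import polars as pl

# Sum total_value per month of the filing date.
# Assumes filed_at is a Date or Datetime column. dt.month() returns 1-12,
# so the same month in different years is combined; use
# dt.truncate('1mo') instead to keep each year's months separate.
monthly = (
    pl.scan_parquet('events.parquet')
    .with_columns(pl.col('filed_at').dt.month().alias('month'))
    .group_by('month')
    .agg(pl.col('total_value').sum())
    .sort('month')
    .collect()
)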

Performance Tips

  1. Use lazy evaluation (polars scan_parquet, DuckDB)
  2. Select only needed columns
  3. Filter early in the pipeline
  4. Use the index file for lookups
