Working with Parquet
Tips for working with Radar Parquet files.
Why Parquet?
- Columnar storage for fast analytics
- Typically much smaller than CSV (often around 10x, depending on the data and compression codec)
- Schema embedded in file
- Supported by most major data tools, including pandas, polars, and DuckDB
Loading Data
pandas
python
import pandas as pd
df = pd.read_parquet('events.parquet')
polars
python
import polars as pl
df = pl.read_parquet('events.parquet')
DuckDB
sql
SELECT * FROM 'events.parquet' LIMIT 10;
Common Queries
Top Purchases by Value
python
import polars as pl
df = (
    pl.scan_parquet('events.parquet')
    .filter(pl.col('transaction_code') == 'P')
    .sort('total_value', descending=True)
    .head(20)
    .collect()
)
Insider Activity by Company
sql
-- DuckDB
SELECT
    issuer,
    COUNT(*) AS trades,
    SUM(total_value) AS volume
FROM 'events.parquet'
WHERE transaction_code IN ('P', 'S')
GROUP BY issuer
ORDER BY volume DESC
LIMIT 20;
Monthly Trends
python
import polars as pl
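monthly = (
    pl.scan_parquet('events.parquet')
    # Truncate to the first day of each month so the same month in
    # different years is not pooled (dt.month() alone returns 1-12).
    # Assumes filed_at is stored as a date or datetime column.
    .with_columns(pl.col('filed_at').dt.truncate('1mo').alias('month'))
    .group_by('month')
    .agg(pl.col('total_value').sum())
    .sort('month')
    .collect()
)
Performance Tips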
- Use lazy evaluation (polars scan_parquet, DuckDB)
- Select only needed columns
- Filter early in the pipeline
- Use the index file for lookups