binary-ensemble
Compress, store, and stream massive ensembles of districting plans.
Redistricting samplers like GerryChain’s ReCom, ForestReCom, and Sequential Monte Carlo routinely emit millions of plans. Stored as JSONL, a single ensemble can run to tens of gigabytes — most of it redundant, because consecutive plans barely differ. BEN (Binary-Ensemble) is a compression format and toolkit built for exactly this data: it turns those JSONL mountains into compact binary files you can store, share, and stream sample-by-sample without unpacking the whole thing.
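To make the redundancy concrete, here is a toy illustration of how little changes from one plan to the next. It is a sketch only: the ten-node chain and the `changed_positions` helper are hypothetical, are not part of binary_ensemble, and do not describe how BEN encodes data internally.

```python
from typing import List, Sequence


def changed_positions(prev: Sequence[int], curr: Sequence[int]) -> List[int]:
    """Return the indices where two equal-length assignments differ."""
    if len(prev) != len(curr):
        raise ValueError("assignments must have the same length")
    return [i for i, (a, b) in enumerate(zip(prev, curr)) if a != b]


# Hypothetical toy chain: 10 nodes, 2 districts, one node moves per step.
plans = [
    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 1],
]

# Compare each plan with its predecessor.
diffs = [changed_positions(a, b) for a, b in zip(plans, plans[1:])]
total_entries = sum(len(p) for p in plans[1:])
total_changes = sum(len(d) for d in diffs)

print(diffs)
print(f"{total_changes} of {total_entries} entries changed")
```

JSONL writes out every entry of every plan in full, even though only a small fraction differs from the previous plan. A real ReCom step redraws two districts at once, so more nodes move per step than in this toy chain, but the rest of the map stays the same.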
binary-ensemble is the Python interface to the binary-ensemble Rust crate.
How much smaller?
A real 50k-plan ensemble on Colorado’s ~140k census blocks is 13.5 GB as JSONL.
Reordered by GEOID20, it compresses to a ~280 MB BEN stream, and then to a
5.6 MB XBEN file: a reduction of more than 2,400×, fully lossless.
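The reduction factors follow directly from the sizes quoted above. The short calculation below assumes decimal units (1 GB = 1000 MB) and treats the approximate 280 MB figure as exact, so the intermediate ratios are rough.

```python
# Sizes quoted in the text, converted to megabytes (decimal units assumed).
jsonl_mb = 13.5 * 1000  # 13.5 GB of JSONL
ben_mb = 280.0          # "~280 MB" BEN stream (approximate)
xben_mb = 5.6           # 5.6 MB XBEN file

jsonl_to_ben = jsonl_mb / ben_mb    # JSONL -> BEN
ben_to_xben = ben_mb / xben_mb      # BEN -> XBEN
jsonl_to_xben = jsonl_mb / xben_mb  # JSONL -> XBEN, end to end

print(f"JSONL -> BEN:  {jsonl_to_ben:,.0f}x")
print(f"BEN -> XBEN:   {ben_to_xben:,.0f}x")
print(f"JSONL -> XBEN: {jsonl_to_xben:,.0f}x")
```

Under these assumptions, each stage contributes a factor of roughly 50, and the end-to-end ratio comes to about 2,411×. With binary units (1 GB = 1024 MB) the overall ratio would be slightly higher, so "over 2,400×" holds either way.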
Install
pip install binary-ensemble
A first taste
Write an ensemble into one self-describing .bendl file, then read it back:
from binary_ensemble import BendlEncoder, BendlDecoder

plans = [[1, 1, 2, 2], [1, 2, 2, 2], [1, 1, 1, 2]]

# The stream context finalizes the bundle when it closes.
encoder = BendlEncoder("ensemble.bendl", overwrite=True)
with encoder.ben_stream() as ensemble:
    for assignment in plans:
        ensemble.write(assignment)

# Iterate the assignments straight back out, one at a time.
for assignment in BendlDecoder("ensemble.bendl"):
    print(assignment)
Where to next
Install the package and compress your first ensemble in a few lines.
Dual graphs, assignments, the BEN/XBEN/BENDL formats, and the compression levers — the mental model, data contract, performance model, and compatibility story behind the API.
Task-focused recipes: compress a GerryChain run, subsample, convert formats, shrink a bundle for sharing, diagnose errors, and copy cookbook patterns.
Every public class and function in binary_ensemble, organized by module.
Executable notebooks with rendered outputs. CI runs them end to end against the live API.