binary-ensemble


Compress, store, and stream massive ensembles of districting plans.


Redistricting samplers such as ReCom (GerryChain), ForestReCom, and Sequential Monte Carlo routinely emit millions of plans. Stored as JSONL, a single ensemble can run to tens of gigabytes, most of it redundant because consecutive plans barely differ. BEN (Binary-Ensemble) is a compression format and toolkit built for exactly this data: it turns those JSONL mountains into compact binary files that you can store, share, and stream sample by sample without unpacking the whole ensemble.
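To make that redundancy concrete, the sketch below builds a toy chain in which each plan reassigns at most one unit of the previous plan. It then writes the chain as JSONL records shaped like `{"assignment": [...]}` and compresses the text with Python's standard-library `lzma` module. This illustrates why such data compresses so well. It is not BEN's encoding, and the helper `make_toy_chain` and its sizes and seed are hypothetical.

```python
import json
import lzma
import random


def make_toy_chain(n_units=1000, n_districts=4, n_plans=200, seed=0):
    """Hypothetical toy ensemble: each plan reassigns one unit of the previous plan."""
    rng = random.Random(seed)
    # Start from contiguous blocks of units per district (labels 1..n_districts).
    plan = [1 + i * n_districts // n_units for i in range(n_units)]
    plans = [plan[:]]
    for _ in range(n_plans - 1):
        unit = rng.randrange(n_units)
        plan[unit] = rng.randint(1, n_districts)  # may redraw the same label
        plans.append(plan[:])
    return plans


def changed_units(prev, curr):
    """Count units whose district label differs between two plans."""
    return sum(a != b for a, b in zip(prev, curr))


plans = make_toy_chain()
raw = "".join(json.dumps({"assignment": p}) + "\n" for p in plans).encode("utf-8")
packed = lzma.compress(raw)

max_change = max(changed_units(a, b) for a, b in zip(plans, plans[1:]))
print(f"max units changed between consecutive plans: {max_change}")
print(f"JSONL: {len(raw):,} bytes -> lzma: {len(packed):,} bytes")
```

A general-purpose compressor can replace each near-duplicate line with a back-reference to the previous one, which is where most of the savings come from.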

binary-ensemble is the Python interface to the binary-ensemble Rust crate.

How much smaller?

A real 50k-plan ensemble on Colorado’s ~140k census blocks is 13.5 GB as JSONL. With the blocks reordered by GEOID20, it compresses to a ~280 MB BEN stream, and then to a 5.6 MB XBEN file: more than a 2,400× reduction, fully lossless.
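Why does node order matter? An assignment vector lists one district label per unit in a fixed node order. Census GEOIDs are hierarchical (state, county, tract, block), so sorting by them tends to place nearby blocks, which usually share a district, next to each other, producing long runs of identical labels. The sketch below counts those runs with plain run-length encoding under two orderings of the same toy plan. It is a simplified illustration with made-up unit IDs and districts. It does not reproduce BEN's actual byte layout.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def run_length_encode(assignment: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal labels into (label, run_length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(assignment)]


def run_length_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand (label, run_length) pairs back into the full assignment."""
    return [label for label, length in runs for _ in range(length)]


# Made-up unit IDs and district labels; units are listed in an arbitrary order.
units = ["08005", "08001", "08007", "08003", "08002", "08008", "08004", "08006"]
district = {"08001": 1, "08002": 1, "08003": 1, "08004": 1,
            "08005": 2, "08006": 2, "08007": 2, "08008": 2}

arbitrary = [district[u] for u in units]
sorted_by_id = [district[u] for u in sorted(units)]

print(run_length_encode(arbitrary))     # [(2, 1), (1, 1), (2, 1), (1, 2), (2, 1), (1, 1), (2, 1)]
print(run_length_encode(sorted_by_id))  # [(1, 4), (2, 4)]
```

The same plan needs seven runs in the arbitrary order and two after sorting, and the decoder recovers the original vector exactly either way.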

Install

pip install binary-ensemble

A first taste

Write an ensemble into one self-describing .bendl file, then read it back:

from binary_ensemble import BendlEncoder, BendlDecoder

plans = [[1, 1, 2, 2], [1, 2, 2, 2], [1, 1, 1, 2]]

# The stream context finalizes the bundle when it closes.
encoder = BendlEncoder("ensemble.bendl", overwrite=True)
with encoder.ben_stream() as ensemble:
    for assignment in plans:
        ensemble.write(assignment)

# Iterate the assignments straight back out, one at a time.
for assignment in BendlDecoder("ensemble.bendl"):
    print(assignment)
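If your sampler has already written JSONL, you can stream it into the encoder rather than holding every plan in memory. The sketch below assumes one JSON object per line with the plan stored under an `"assignment"` key. That schema and the helper name `read_jsonl_assignments` are assumptions, so adjust `key` to match your files.

```python
import json
from typing import Iterator, List


def read_jsonl_assignments(path: str, key: str = "assignment") -> Iterator[List[int]]:
    """Yield one assignment vector per non-empty JSONL line.

    Assumes (hypothetically) that each line is a JSON object whose `key`
    field holds the plan as a list of district labels.
    """
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)[key]
```

Because the helper is a generator, only one plan is in memory at a time, and each yielded list can go straight into `ensemble.write(...)` inside the `ben_stream()` context from the first example.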

Where to next

Getting started

Install the package and compress your first ensemble in a few lines.

Concepts

Dual graphs, assignments, the BEN/XBEN/BENDL formats, and the compression levers: the mental model, data contract, performance model, and compatibility story behind the API.

How-to guides

Task-focused recipes: compress a GerryChain run, subsample, convert formats, shrink a bundle for sharing, diagnose errors, and copy cookbook patterns.

API reference

Every public class and function in binary_ensemble, organized by module.

Tutorial notebooks

Executable notebooks with rendered outputs. CI runs them end to end against the live API.

Working with .bendl files