Quickstart

This page takes you from zero to a compressed, self-describing ensemble in a few minutes. If a term is unfamiliar, the Concepts section explains the model behind the API.

The one thing to know

A districting plan is represented as an assignment: a flat list of integers, one per node of a dual graph, giving the district id of each node.

assignment = [1, 1, 2, 2]   # nodes 0 and 1 are in district 1; nodes 2 and 3 in district 2

An ensemble is just a sequence of these. binary-ensemble compresses that sequence.
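To make the representation concrete, the sketch below groups node indices by district id. It uses only the standard library; `nodes_by_district` is a hypothetical helper written for illustration and is not part of binary_ensemble.

```python
from collections import defaultdict


def nodes_by_district(assignment):
    """Group node indices by district id.

    Hypothetical helper for illustration; not part of binary_ensemble.
    """
    groups = defaultdict(list)
    for node, district in enumerate(assignment):
        groups[district].append(node)
    return dict(groups)


assignment = [1, 1, 2, 2]
print(nodes_by_district(assignment))  # {1: [0, 1], 2: [2, 3]}
```

The position in the list is the node, and the value at that position is the district, so the assignment only has meaning alongside the node order of the dual graph it was produced on.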

Write an ensemble

The recommended container is a .bendl file — a single self-describing file. Open a BendlEncoder, attach any metadata, then write assignments through a ben_stream() context that finalizes the bundle when it closes:

from binary_ensemble import BendlEncoder

plans = [[1, 1, 2, 2], [1, 2, 2, 2], [1, 1, 1, 2]]

encoder = BendlEncoder("ensemble.bendl", overwrite=True)
encoder.add_metadata({"sampler": "demo", "seed": 1234})
with encoder.ben_stream() as ensemble:
    for assignment in plans:
        ensemble.write(assignment)
# bundle is finalized here

Read it back

Open a BendlDecoder and iterate. The bundle knows how many samples it holds and what it carries:

from binary_ensemble import BendlDecoder

decoder = BendlDecoder("ensemble.bendl")

print(len(decoder))             # 3
print(decoder.asset_names())    # ['metadata.json']
print(decoder.read_metadata())  # {'sampler': 'demo', 'seed': 1234}

for assignment in decoder:
    print(assignment)

Make it self-describing

The real value of a bundle comes from embedding the dual graph, so a collaborator can open the file without hunting down the matching graph JSON. add_graph accepts either a graph in NetworkX adjacency form (a dict) or a path to a graph JSON file:

import networkx as nx
from binary_ensemble import BendlEncoder, BendlDecoder

graph = nx.grid_2d_graph(2, 2)
graph = nx.convert_node_labels_to_integers(graph)
adjacency = nx.adjacency_data(graph)          # the dict shape add_graph expects

encoder = BendlEncoder("ensemble.bendl", overwrite=True)
encoder.add_graph(adjacency, sort=None)       # store as-is; see below for reordering
with encoder.ben_stream() as ensemble:
    for assignment in [[1, 1, 2, 2], [1, 2, 2, 2]]:
        ensemble.write(assignment)

decoder = BendlDecoder("ensemble.bendl")
graph = decoder.read_graph()                  # back as a live NetworkX graph
print(graph.number_of_nodes(), "nodes")

Tip

Passing sort="rcm" or sort="mlc" instead of sort=None reorders the graph’s nodes for much better compression and records a reversible permutation map. See Why reordering shrinks files.
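One way to build intuition for the tip above, assuming the compression benefits from long runs of identical district ids (the linked page is the authority on the actual mechanism): the same plan can produce very different run counts depending on node order. The sketch below is plain Python, does not call binary_ensemble, and uses a hand-written permutation rather than the library's "rcm" or "mlc" orderings.

```python
from itertools import groupby


def count_runs(assignment):
    """Count maximal runs of consecutive equal district ids."""
    return sum(1 for _ in groupby(assignment))


# A 4x4 grid split into a left half (district 1) and a right half (district 2),
# with nodes numbered row by row.
row_major = [1, 1, 2, 2] * 4

# Illustrative permutation: visit the same nodes column by column instead.
perm = [row * 4 + col for col in range(4) for row in range(4)]
column_major = [row_major[node] for node in perm]

print(count_runs(row_major))     # 8
print(count_runs(column_major))  # 2

# The permutation is reversible, so the original order can be restored.
restored = [None] * len(perm)
for position, node in enumerate(perm):
    restored[node] = column_major[position]
print(restored == row_major)     # True
```

Fewer, longer runs leave less to store, which is why an ordering that keeps neighbouring nodes close together in the list tends to help, and why the permutation map has to be recorded alongside the reordered data.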

Already have JSONL files?

If your sampler already wrote a JSONL ensemble, the codec helpers convert whole files in one call — no iteration required:

from binary_ensemble import encode_jsonl_to_ben, encode_ben_to_xben, decode_ben_to_jsonl

encode_jsonl_to_ben("plans.jsonl", "plans.ben")        # JSONL -> BEN (fast, working format)
encode_ben_to_xben("plans.ben", "plans.xben")          # BEN -> XBEN (smallest, for storage)
decode_ben_to_jsonl("plans.ben", "plans_again.jsonl")  # round-trip back to JSONL
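If no JSONL file is at hand for trying the helpers above, a small one can be produced with the standard library. This is a sketch under an assumption: it writes one JSON object per line with an "assignment" list and a 1-based "sample" counter. That schema is not stated on this page, so check it against the format reference before relying on it. `write_jsonl` is a hypothetical helper, not part of binary_ensemble.

```python
import json


def write_jsonl(plans, path):
    """Write one JSON object per plan, one per line.

    Assumed schema (not confirmed on this page):
    {"assignment": [...], "sample": n} with n starting at 1.
    """
    with open(path, "w", encoding="utf-8") as handle:
        for sample, assignment in enumerate(plans, start=1):
            record = {"assignment": assignment, "sample": sample}
            handle.write(json.dumps(record) + "\n")


plans = [[1, 1, 2, 2], [1, 2, 2, 2], [1, 1, 1, 2]]
write_jsonl(plans, "plans.jsonl")
```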

Next steps

Compress a GerryChain run — the most common workflow.
Subsample a large ensemble without decoding the whole thing.
Concepts — formats, encoding variants, and how the compression works.