Compress a GerryChain run
The most common workflow: run a GerryChain ReCom chain
and stream every plan straight into a single self-describing .bendl file, so you never
materialize a giant JSONL file.
Note
This recipe needs GerryChain installed: pip install gerrychain. binary-ensemble itself
only ever sees plain lists of integers, so the same pattern works with any sampler.
Reorder the graph before building the chain
The biggest compression gains come from node order. BendlEncoder.add_graph(..., sort="mlc")
embeds an MLC-reordered graph and returns that reordered graph as a live NetworkX graph. Build
the GerryChain run on that returned graph so the sampler and the bundle agree on node order.
from functools import partial
from gerrychain import Partition, Graph, MarkovChain, updaters, accept
from gerrychain.proposals import recom
from gerrychain.constraints import contiguous
from binary_ensemble import BendlEncoder
encoder = BendlEncoder("ensemble.bendl", overwrite=True)
# Explicitly show the default: MLC reorders the graph for better run-length compression.
mlc_graph = encoder.add_graph("gerrymandria.json", sort="mlc")
# Hand the reordered graph back into GerryChain. This is the load-bearing step:
# the chain now runs in the same node order the bundle stores.
graph = Graph.from_networkx(mlc_graph)
node_order = list(graph.nodes)
initial_partition = Partition(
    graph,
    assignment="district",
    updaters={"population": updaters.Tally("TOTPOP")},
)
ideal_population = sum(initial_partition["population"].values()) / len(initial_partition)
proposal = partial(
    recom, pop_col="TOTPOP", pop_target=ideal_population, epsilon=0.01, node_repeats=2
)
chain = MarkovChain(
    proposal=proposal,
    constraints=[contiguous],
    accept=accept.always_accept,
    initial_state=initial_partition,
    total_steps=1000,
)
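To see why node order matters for run-length compression, here is a toy sketch using only the standard library. It is not the BEN format and not the MLC algorithm; the 4x4 grid, the two orderings, and the `run_lengths` helper are all assumptions made for illustration. It only shows that the same plan produces far fewer runs when neighbouring nodes in the list tend to share a district.

```python
from itertools import groupby


def run_lengths(assignment):
    """Collapse a list of district labels into (label, run_length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(assignment)]


# Toy 4x4 grid split into a left district (0) and a right district (1).
nodes = [(row, col) for row in range(4) for col in range(4)]
district = {node: 0 if node[1] < 2 else 1 for node in nodes}

# Two hypothetical node orders for the same graph and the same plan.
row_major = sorted(nodes)                                    # walks across both districts
column_major = sorted(nodes, key=lambda n: (n[1], n[0]))     # stays inside one district

row_major_runs = run_lengths([district[n] for n in row_major])
column_major_runs = run_lengths([district[n] for n in column_major])

print(len(row_major_runs), "runs in row-major order")
print(len(column_major_runs), "runs in column-major order")
```

In this toy case the row-major order yields 8 runs and the column-major order yields 2, which is the kind of gap a locality-aware reordering aims to exploit.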
Stream the chain into a bundle
The one thing to get right is still node order. Since the chain was built on
Graph.from_networkx(mlc_graph), each plan should be written in node_order, the node order
from that same GerryChain graph.
encoder.add_metadata(
    {
        "sampler": "ReCom",
        "epsilon": 0.01,
        "steps": 1000,
        "node_order": "mlc",
    }
)
with encoder.ben_stream(variant="twodelta") as ensemble:  # twodelta suits ReCom chains
    for partition in chain:
        series = partition.assignment.to_series()
        assignment = series.loc[node_order].astype(int).tolist()
        ensemble.write(assignment)
# the bundle is finalized when the stream context closes
That’s it — ensemble.bendl now holds all 1,000 plans plus the graph and metadata in one
file. To read it back, see Read and iterate an ensemble.
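The reindexing step in the loop can be pictured without pandas. The sketch below is a stdlib-only stand-in for `series.loc[node_order].astype(int).tolist()`; the helper name `assignment_in_order` and the sample data are hypothetical. It also fails loudly if a node is missing, which is one way a node-order mismatch can be caught early.

```python
def assignment_in_order(assignment, node_order):
    """Return district labels as a plain list of ints, one per node in node_order."""
    missing = [node for node in node_order if node not in assignment]
    if missing:
        raise KeyError(f"nodes missing from assignment: {missing}")
    return [int(assignment[node]) for node in node_order]


# Hypothetical node -> district mapping, keyed in arbitrary order.
assignment = {"a": 1, "b": 2, "c": 1, "d": 2}

# The order the graph (and therefore the bundle) uses.
node_order = ["c", "a", "d", "b"]

plan = assignment_in_order(assignment, node_order)
print(plan)
```

Whatever sampler you use, the list handed to the writer should line up position by position with the node order of the embedded graph.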
Why this is better than reordering later
You can write a raw-order .bendl file with a BEN stream and later call
relabel_bundle() to reorder the graph and rewrite the stream. But when you control the
sampling code, it is cleaner to reorder first:
add_graph(..., sort="mlc") stores the reordered graph and permutation map.
Graph.from_networkx(mlc_graph) makes GerryChain run on that exact graph.
series.loc[node_order] writes assignments in that exact order.
That means the working BEN file is already locality-friendly, so every downstream step starts from the compression-friendly order.
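As a rough picture of what reordering later involves, the sketch below applies a permutation to every stored plan. The names `relabel_plan` and `new_to_old` are hypothetical, and this is not how relabel_bundle() is implemented; it only illustrates that each plan has to be rewritten once the node order changes, which is the extra pass that reordering first avoids.

```python
def relabel_plan(plan, new_to_old):
    """Rewrite a plan so position i holds the label of old position new_to_old[i]."""
    if sorted(new_to_old) != list(range(len(plan))):
        raise ValueError("new_to_old must be a permutation of the plan's positions")
    return [plan[old] for old in new_to_old]


# Plans written in the raw node order (toy data).
raw_plans = [[1, 2, 1, 2], [1, 2, 2, 1]]

# Hypothetical permutation: new position -> old position.
new_to_old = [0, 2, 1, 3]

reordered_plans = [relabel_plan(plan, new_to_old) for plan in raw_plans]
print(reordered_plans)
```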
Archive the result
After the run, recompress the embedded BEN stream to XBEN for sharing:
from binary_ensemble import compress_stream
compress_stream("ensemble.bendl", out_file="ensemble-archive.bendl")
For more on final archival workflows, see Shrink a bundle for sharing.
Tip
The twodelta encoding (the default) delta-compresses pairwise ReCom moves. If you log a full
MCMC chain including rejections, variant="mkv_chain" collapses the repeated plans
instead. See Encoding variants.
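The two ideas in this tip can be illustrated with a small stdlib-only sketch. It does not reproduce the actual twodelta or mkv_chain encodings; the helpers `changed_districts` and `collapse_repeats` and the sample plans are assumptions for illustration. The first helper shows that a step of the kind the tip describes touches only two districts, and the second shows how consecutive duplicate plans, as produced by rejections, can be stored once with a count.

```python
from itertools import groupby


def changed_districts(prev, curr):
    """Districts involved in any node whose label differs between two plans."""
    touched = set()
    for old, new in zip(prev, curr):
        if old != new:
            touched.update((old, new))
    return touched


def collapse_repeats(plans):
    """Collapse consecutive identical plans into (plan, repeat_count) pairs."""
    return [(plan, len(list(group))) for plan, group in groupby(plans)]


p0 = [1, 1, 2, 2, 3, 3]
p1 = [1, 2, 2, 1, 3, 3]  # districts 1 and 2 were merged and re-split
p2 = [1, 2, 3, 1, 2, 3]  # districts 2 and 3 were merged and re-split

# A chain that logs rejections repeats the current plan.
logged = [p0, p1, p1, p1, p2]

touched_first = changed_districts(p0, p1)
collapsed = collapse_repeats(logged)
print(touched_first, [count for _, count in collapsed])
```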