End-to-end workflow
This tutorial follows the recommended lifecycle:
prepare a graph,
write a .bendl file with a BEN stream while producing assignments,
inspect and analyze the bundle,
add provenance,
relabel and recompress for sharing.
The code uses a tiny NetworkX grid so it runs anywhere. The same structure applies to a GerryChain run.
Prepare the graph
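import networkx as nx
SIDE = 4
# Build a SIDE x SIDE grid and relabel its (row, col) nodes as 0, 1, 2, ...
dual_graph = nx.convert_node_labels_to_integers(nx.grid_2d_graph(SIDE, SIDE))
for node in dual_graph.nodes:
    row, col = divmod(node, SIDE)
    dual_graph.nodes[node]["TOTPOP"] = 1
    # Zero-padded row and column give each node a unique, sortable label.
    dual_graph.nodes[node]["GEOID20"] = f"{row:02d}{col:02d}"
# Serialize the graph to NetworkX's adjacency JSON structure.
adjacency = nx.adjacency_data(dual_graph)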
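A quick way to see what the loop above produces: the sketch below recomputes the GEOID20 labels in plain Python (no NetworkX needed) and checks two properties that matter whenever nodes are ordered by that attribute. The labels are unique, and because row and column are zero-padded, sorting them as strings reproduces the integer node order. The name `geoid_for` is a helper introduced here for illustration only.

```python
SIDE = 4


def geoid_for(node: int, side: int = SIDE) -> str:
    """Mirror the tutorial's label: zero-padded row followed by zero-padded column."""
    row, col = divmod(node, side)
    return f"{row:02d}{col:02d}"


labels = {node: geoid_for(node) for node in range(SIDE * SIDE)}

# Every node gets a distinct label.
all_unique = len(set(labels.values())) == SIDE * SIDE

# Sorting nodes by label (as strings) gives back 0, 1, 2, ... in order.
nodes_by_label = sorted(labels, key=labels.get)

print(all_unique)
print(nodes_by_label == list(range(SIDE * SIDE)))
print(labels[0], labels[5], labels[15])
```

If the padding were dropped, a larger grid could produce colliding or mis-sorted labels (for example row 1, column 11 versus row 11, column 1), which is why the toy example pads both fields.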
Write the working bundle
add_graph() returns the graph in the order assignments should use. In this toy example the
assignment generator already uses integer node positions, so we only need the node count.
from binary_ensemble import BendlEncoder
encoder = BendlEncoder("workflow.bendl", overwrite=True)
# Embed the graph; the returned graph carries the node order assignments must follow.
ordered_graph = encoder.add_graph(adjacency, sort="key", key="GEOID20")
encoder.add_metadata({"sampler": "toy-grid", "seed": 2026, "node_order": "GEOID20"})
node_count = ordered_graph.number_of_nodes()
with encoder.ben_stream(variant="twodelta") as ensemble:
    for step in range(20):
        # Toy sampler: one 1-based district label (1 to 4) per node position.
        assignment = [(node + step) % 4 + 1 for node in range(node_count)]
        ensemble.write(assignment)
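To make the shape of each written sample concrete, here is the toy generator on its own, with no encoder involved. It assumes the 16-node grid from the previous section, and `toy_assignment` is a name introduced here for illustration. Each sample is a list with one 1-based district label per node position, and moving from one step to the next shifts every label by one (wrapping from 4 back to 1).

```python
from collections import Counter

NODE_COUNT = 16  # 4 x 4 grid from the tutorial
DISTRICTS = 4


def toy_assignment(step: int, node_count: int = NODE_COUNT) -> list[int]:
    """Same expression as the tutorial loop: position i holds node i's district."""
    return [(node + step) % DISTRICTS + 1 for node in range(node_count)]


first = toy_assignment(0)
second = toy_assignment(1)

print(first[:8])       # labels cycle through 1, 2, 3, 4 along the node order
print(second[:8])      # the same cycle, advanced by one label
print(Counter(first))  # every district holds the same number of nodes
```

Real samplers produce far less regular lists, but the contract is the same: one integer per node, in a fixed order.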
Inspect the result
from binary_ensemble import BendlDecoder
decoder = BendlDecoder("workflow.bendl")
print(decoder.count_samples())
print(decoder.assignment_format())
print(decoder.asset_names())
assert decoder.read_graph().number_of_nodes() == node_count
assert decoder.read_metadata()["sampler"] == "toy-grid"
Analyze a subset
from binary_ensemble import BendlDecoder
decoder = BendlDecoder("workflow.bendl")
district_one_sizes = []
for assignment in decoder.subsample_every(5):
    # Count the nodes assigned to district 1 in this sample.
    district_one_sizes.append(sum(1 for district in assignment if district == 1))
print(district_one_sizes)
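For intuition about the numbers this prints, the sketch below emulates a stride of 5 over the same 20 toy assignments using `itertools.islice`. This is not the decoder's implementation, and which sample the stride starts from is an assumption here (the sketch keeps the 5th, 10th, 15th, and 20th). For this particular toy ensemble the choice does not change the statistic, since every sample places exactly four nodes in district 1.

```python
from itertools import islice

NODE_COUNT = 16
STEPS = 20
STRIDE = 5


def toy_ensemble():
    """Regenerate the tutorial's 20 assignments lazily, one list per step."""
    for step in range(STEPS):
        yield [(node + step) % 4 + 1 for node in range(NODE_COUNT)]


# Assumption: keep samples 5, 10, 15, 20 (1-indexed). The real decoder may
# start its stride elsewhere.
subsample = islice(toy_ensemble(), STRIDE - 1, None, STRIDE)
district_one_sizes = [assignment.count(1) for assignment in subsample]
print(district_one_sizes)
```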
Attach post-run provenance
from binary_ensemble import BendlEncoder
encoder = BendlEncoder.append("workflow.bendl")
encoder.add_asset("analysis-notes.txt", "Checked with the end-to-end tutorial.", content_type="text")
encoder.close()
Adapting this to GerryChain
The only GerryChain-specific step is extracting assignments in the same node order as the graph stored in the bundle.
# Node order of the graph stored in the bundle.
write_order = list(ordered_graph.nodes)
with encoder.ben_stream(variant="twodelta") as ensemble:
    for partition in chain:
        series = partition.assignment.to_series()
        # Reindex to the stored node order before writing.
        ensemble.write(series.loc[write_order].astype(int).tolist())
The invariant is the same for every sampler: the list you pass to ensemble.write() must be in
the embedded graph’s node order.
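The invariant can be illustrated without GerryChain or pandas. The sketch below uses a plain dict in place of `partition.assignment` and a made-up `write_order`; `ordered_assignment` is a helper name introduced here, not part of any library. It performs the same kind of reindexing as `series.loc[write_order]` and fails loudly if the two node sets disagree, rather than silently producing a misaligned list.

```python
def ordered_assignment(assignment: dict, write_order: list) -> list[int]:
    """Return district labels arranged in write_order, one per node."""
    known = set(write_order)
    missing = [node for node in write_order if node not in assignment]
    extra = [node for node in assignment if node not in known]
    if missing:
        raise KeyError(f"assignment has no district for nodes: {missing}")
    if extra:
        raise ValueError(f"assignment has nodes absent from the graph: {extra}")
    return [int(assignment[node]) for node in write_order]


# Hypothetical example: the sampler reports nodes in its own order...
sampler_assignment = {"c": 2, "a": 1, "d": 2, "b": 1}
# ...while the stored graph lists them in this order.
write_order = ["a", "b", "c", "d"]

print(ordered_assignment(sampler_assignment, write_order))
```

A check like this is cheap to run on the first partition of a chain, where an ordering mistake is easiest to catch.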