binary_ensemble.bundle¶
The bundle module is the recommended high-level API. It writes and reads .bendl files:
single-file containers that hold an assignment stream plus graph, metadata, permutation
maps, and custom assets.
When to use it¶
Use this module when you want the file to be self-describing. That is the normal case for redistricting ensembles because an assignment is only meaningful with the graph node order it was written against.
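To see why node order matters, here is a small stdlib-only illustration. The node names and the plan_by_node helper are hypothetical and not part of binary_ensemble. The same assignment list describes two different plans when it is paired with two different node orders.

```python
def plan_by_node(assignment: list[int], node_order: list[str]) -> dict[str, int]:
    """Pair each district id with the node at the same position."""
    if len(assignment) != len(node_order):
        raise ValueError("assignment length must equal the number of nodes")
    return dict(zip(node_order, assignment))


assignment = [1, 1, 2, 2]
print(plan_by_node(assignment, ["A", "B", "C", "D"]))  # A and B in district 1
print(plan_by_node(assignment, ["C", "A", "D", "B"]))  # now A and C in district 1
```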
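| Task | API |
|---|---|
| Create a new bundle | BendlEncoder |
| Attach a dual graph | BendlEncoder.add_graph() |
| Stream assignments while sampling | BendlEncoder.ben_stream() |
| Read assignments and assets | BendlDecoder |
| Reorder/relabel an existing bundle | relabel_bundle() |
| Recompress a bundle to XBEN | compress_stream() |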
from binary_ensemble import BendlDecoder
decoder = BendlDecoder("ensemble.bendl")
assert len(decoder) == decoder.count_samples()
assert decoder.assignment_format() in {"ben", "xben"}
The .bendl bundle format: the recommended single-file container.
A bundle wraps a BEN/XBEN assignment stream together with front-loaded assets: a dual
graph.json, a node_permutation_map.json, a metadata.json, and arbitrary custom blobs.
BendlEncoder writes one; BendlDecoder reads and iterates one.
Typical write:
from binary_ensemble import BendlEncoder
with BendlEncoder(path, overwrite=True) as enc:
    graph = enc.add_graph(graph, sort="rcm")  # sort=None => store raw
    enc.add_metadata({"seed": 1234})
    with enc.ben_stream() as ensemble:
        # `chain` must run on the returned graph's node order.
        for assignment in chain:
            ensemble.write(assignment)
Typical read:
from binary_ensemble import BendlDecoder
dec = BendlDecoder(path)
graph = dec.read_graph()
for assignment in dec:
    ...
Encoder¶
BendlEncoder has two modes:
| Mode | Open with | Stream writes | Asset writes |
|---|---|---|---|
| Create | BendlEncoder(path) | one stream | before or after the stream |
| Append | BendlEncoder.append(path) | unavailable | immediate appends to a finalized bundle |
The stream context finalizes the bundle when it closes cleanly. You only need to use the encoder itself as a context manager for assets-only bundles or if that style is clearer in your code.
from binary_ensemble import BendlEncoder
encoder = BendlEncoder("api-demo.bendl", overwrite=True)
encoder.add_metadata({"sampler": "demo"})
with encoder.ben_stream() as ensemble:
    ensemble.write([1, 1, 2, 2])
    ensemble.write([1, 2, 2, 2])
Graph handling¶
add_graph() accepts a live networkx.Graph, NetworkX adjacency JSON (parsed or as raw bytes), a path to that JSON, or a readable file-like object. By default it reorders with sort="mlc" for better compression and returns the reordered NetworkX graph. Write assignments in the returned graph’s node order.
| sort | Meaning | Needs key? | Stores permutation map? |
|---|---|---|---|
| "mlc" | Multi-level clustering; topology-based default | no | yes |
| "rcm" | Reverse Cuthill-McKee topology ordering | no | yes |
| "key" | Sort nodes by a node attribute | yes | yes |
| None | Store the graph as-is | no | no |
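For intuition about the "rcm" row, the sketch below is a textbook reverse Cuthill-McKee ordering in plain Python over a hypothetical dict-of-sets adjacency. It is not binary_ensemble’s implementation (which may break ties differently), and it does not attempt "mlc"; NetworkX ships its own version as networkx.utils.reverse_cuthill_mckee_ordering.

```python
from collections import deque


def rcm_order(adj: dict[int, set[int]]) -> list[int]:
    """Reverse Cuthill-McKee: BFS from a low-degree node, visiting neighbors
    in increasing degree, then reverse the visit order."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}

    def by_degree(node: int) -> tuple[int, int]:
        return (degree[node], node)  # tie-break on node id

    visited: set[int] = set()
    order: list[int] = []
    # Start each connected component at its lowest-degree unvisited node.
    for start in sorted(adj, key=by_degree):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in sorted(adj[node] - visited, key=by_degree):
                visited.add(nbr)
                queue.append(nbr)
    return order[::-1]


path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(rcm_order(path_graph))  # [3, 2, 1, 0]
```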
import networkx as nx
from binary_ensemble import BendlEncoder
graph = nx.convert_node_labels_to_integers(nx.path_graph(4))
for node in graph.nodes:
    graph.nodes[node]["GEOID20"] = f"{node:04d}"
encoder = BendlEncoder("api-graph.bendl", overwrite=True)
ordered_graph = encoder.add_graph(nx.adjacency_data(graph), sort="key", key="GEOID20")
with encoder.ben_stream() as ensemble:
    ensemble.write([1, 1, 2, 2])
assert ordered_graph.number_of_nodes() == 4
- class BendlEncoder(file_path, overwrite=False)¶
Bases: object
Writer for a .bendl bundle (create mode) or an asset appender (append mode).
In create mode (the constructor), assets may be added before or after a single-use ben_stream(). You do not need to use BendlEncoder itself as a context manager: closing the ben_stream() context finalizes the bundle, so the common pattern is:
enc = BendlEncoder(path, overwrite=True)
graph = enc.add_graph(my_graph)  # MLC-reordered by default
with enc.ben_stream() as ensemble:  # only the stream needs ``with``
    for assignment in chain:
        ensemble.write(assignment)
# bundle is finalized here
The encoder is still usable as a context manager if you prefer, and that is the easy way to finalize an assets-only bundle (one written with no ben_stream()): either with BendlEncoder(...) as enc: ... or an explicit close(). In append mode (append()), an existing finalized bundle is grown with new assets and ben_stream() is unavailable.
- Parameters:
file_path (StrPath) – Output path for the new bundle (str or os.PathLike, e.g. pathlib.Path). Must not exist unless overwrite=True.
overwrite (bool, optional) – Replace an existing file at file_path. Default is False. Unlike the one-shot transforms, this truncates the existing file when the encoder opens it, so an interrupted write leaves a truncated, unfinalized bundle (recoverable with allow_unfinalized) rather than the original file. Write to a fresh path and rename if the existing file is precious.
- Raises:
OSError – If file_path exists and overwrite is False, or it cannot be created.
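The overwrite caveat above suggests writing to a fresh path and renaming. A minimal sketch of that pattern with the standard library, assuming a hypothetical write_fn callback that builds the complete bundle at the temporary path it receives:

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import Callable


def write_then_rename(dest: str | os.PathLike, write_fn: Callable[[Path], None]) -> None:
    """Write a complete replacement next to `dest`, then swap it into place.

    `write_fn` (hypothetical) must write the whole file at the path it is
    given. mkstemp has already created that file, so an encoder opened on it
    needs overwrite=True.
    """
    dest = Path(dest)
    # Same directory as dest, so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        write_fn(tmp)
        os.replace(tmp, dest)  # atomic rename; the old file is intact until now
    except BaseException:
        tmp.unlink(missing_ok=True)  # discard the partial temp file
        raise
```

For example, write_fn could open BendlEncoder(tmp, overwrite=True), add assets, and stream assignments. If it raises, the original bundle is untouched and the temporary file is removed.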
- classmethod append(file_path)¶
Open an existing finalized bundle to append new assets.
ben_stream() is unavailable in append mode; each add_* call commits immediately.
- Parameters:
file_path (StrPath) – Path to an existing, finalized .bendl bundle (str or os.PathLike).
- Returns:
BendlEncoder – An encoder in append mode.
- Raises:
Exception – If the file is missing, is not a bundle, or is not finalized.
- Return type:
BendlEncoder
- add_graph(graph, sort='mlc', key=None, *, compress=None, compression_level=None)¶
Embed the dual graph.json and return the (possibly reordered) graph.
When reordering, both graph.json and node_permutation_map.json are stored, and the reordered graph is returned so the chain runs on that ordering. Reordering is pre-stream only; a raw graph (sort=None) may also be attached post-stream or in append mode.
- Parameters:
graph (GraphInput) – The dual graph (GraphInput): a live networkx.Graph (subclasses such as gerrychain.Graph count; its node iteration order is preserved), or adjacency-format JSON as a parsed dict or list, raw bytes, a file-like object with .read(), or a str/os.PathLike path to a JSON file. A plain str is a path here.
sort (SortMethod | None, optional) – How to order the nodes (SortMethod or None): "mlc" (multi-level clustering that reorders the graph for better compression), "rcm" (reverse Cuthill-McKee), "key" (sort by the node attribute named in key), or None to store the graph as-is with no permutation map. Default is "mlc".
key (str | None, optional) – Node attribute to sort by, e.g. key="GEOID"; key="id" sorts by the NetworkX node id. Required with (and only valid with) sort="key". Default is None.
compress (bool | None)
compression_level (int | None)
- Returns:
networkx.Graph – The stored graph after any reordering (matching BendlDecoder.read_graph()). Its node iteration order is the order the chain must write assignments in.
- Raises:
ValueError – If sort/key is invalid.
Exception – If a reordering graph is added after the stream has started.
- Return type:
nx.Graph
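- add_metadata(metadata, *, compress=None, compression_level=None)¶
Embed the canonical metadata.json asset (run provenance).
- Parameters:
metadata – The run provenance to store as metadata.json (e.g. {"seed": 1234}); it must be convertible to JSON bytes.
compress (bool | None)
compression_level (int | None)
- Raises:
Exception – If the payload cannot be converted to JSON bytes, or the encoder is in an invalid state.
- Return type:
None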
- add_asset(name: str, payload: dict[str, Any] | list[Any] | bytes | bytearray | memoryview | str | SupportsRead | PathLike[str], content_type: Literal['json'], *, compress: bool | None = None, compression_level: int | None = None) None¶
- add_asset(name: str, payload: bytes | bytearray | memoryview | str | SupportsRead | PathLike[str], content_type: Literal['text'], *, compress: bool | None = None, compression_level: int | None = None) None
- add_asset(name: str, payload: bytes | bytearray | memoryview | str | SupportsRead | PathLike[str], content_type: Literal['binary'], *, compress: bool | None = None, compression_level: int | None = None) None
- add_asset(name: str, payload: str | PathLike[str], content_type: Literal['file'], *, compress: bool | None = None, compression_level: int | None = None) None
Embed a custom asset under name.
Every asset carries a CRC32C integrity checksum, and payloads of 1 KiB or more are xz-compressed on disk by default (both are transparent on read).
- Parameters:
name (str) – Asset name, the key used to read it back (e.g. "params.json").
payload (JsonAssetPayload | TextAssetPayload | BinaryAssetPayload | StrPath) – The asset content; the accepted shapes depend on content_type:
for "json" (JsonAssetPayload): a dict/list (serialized via json.dumps), a JSON str, bytes-like JSON, a file-like object with .read(), or an os.PathLike whose file is read. Must yield valid UTF-8 JSON; the decoder will auto-parse it.
for "text" (TextAssetPayload): the same shapes, minus dict/list; must yield valid UTF-8.
for "binary" (BinaryAssetPayload): the same shapes as "text"; stored verbatim (e.g. a zipped shapefile or a GeoPackage).
for "file" (StrPath): a str or os.PathLike naming a file whose contents are read and stored as binary.
Outside content_type="file", a plain str is always content, never a path; pass a pathlib.Path to read from disk (e.g. a Path with content_type="json" stores a JSON file the decoder will auto-parse).
content_type (AssetContentType) – One of "json", "text", "binary", or "file" (AssetContentType).
compress (bool | None, optional) – True requests xz storage compression, False stores the payload raw, and None (default) follows the size policy. Even when requested, compression is kept only if it makes the stored form smaller, and very large payloads are probed on a prefix first so an already-compressed blob skips the full pass.
compression_level (int | None, optional) – xz preset 0–9 for the compression pass. Default is the writer’s preset (6). Assets are write-once and read-many, so the level only trades one-time write CPU against permanent file size.
- Raises:
ValueError – If the payload does not satisfy content_type (e.g. malformed JSON, non-UTF-8 text, or an unknown content type).
TypeError – If the payload shape is not accepted (e.g. a dict with content_type="text", or a non-path with content_type="file").
- Return type:
None
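The compress policy described above can be modeled with Python’s lzma module. This is a simplified sketch: the 1 KiB threshold and preset 6 come from the text, the choose_storage name is hypothetical, and the prefix probe the library runs on very large payloads is omitted.

```python
from __future__ import annotations

import lzma

XZ_MIN_SIZE = 1024  # "payloads of 1 KiB or more" are compressed by default


def choose_storage(payload: bytes, compress: bool | None = None, level: int = 6) -> tuple[bytes, bool]:
    """Return (stored_bytes, stored_as_xz) under the documented size policy."""
    if compress is False:
        return payload, False  # explicit opt-out: store raw
    if compress is None and len(payload) < XZ_MIN_SIZE:
        return payload, False  # default policy: small payloads stay raw
    packed = lzma.compress(payload, format=lzma.FORMAT_XZ, preset=level)
    # Keep the xz form only if it actually shrinks the stored bytes.
    if len(packed) < len(payload):
        return packed, True
    return payload, False


print(choose_storage(b'{"seed": 1234}'))  # small: kept raw
print(choose_storage(b"0" * 10_000)[1])   # True: highly repetitive payload
```

Already-compressed inputs (a zipped shapefile, say) usually come out larger under xz, so the "keep only if smaller" check leaves them raw even with compress=True.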
- remove_asset(name)¶
Remove a named asset from a finalized bundle, reclaiming its bytes.
Available wherever add_asset() commits immediately: append mode, or create mode after the stream has closed. The directory drop and the compaction commit as one operation, so the asset’s payload bytes are actually gone from the file (not just unreferenced), and on any error the bundle is left untouched, with the asset still present for a retry. The name (and any singleton-type claim, e.g. metadata.json) becomes free again, so remove-then-add is the way to replace an asset’s payload. For the canonical assets, re-add through the typed methods (add_metadata(), add_graph()); a generic add_asset() under a standardized name is refused, because the result would be invisible to the type-keyed readers.
Removing appended (post-stream) assets is cheap at any scale: the compaction rebuilds only the small post-stream tail and never touches the assignment stream, even when the stream is tens of gigabytes. Removing a pre-stream asset (the graph, or metadata added before streaming) costs one whole-file rewrite instead.
Note that each immediate-commit add_asset() (append mode, or create mode after the stream) leaves the superseded directory behind as a few dead bytes; the compaction here reclaims those too. For a bundle that arrives with dead space from other tooling, raw_core.compact_bundle_in_place reclaims it directly, and raw_core.BendlEncoder.remove_asset drops only the directory entry if you specifically need that form.
- ben_stream(*, variant='twodelta')¶
Open the single-use assignment stream context manager.
The embedded stream is always written in the BEN wire format; produce an XBEN bundle with compress_stream() after writing (XBEN is a whole-stream LZMA2 wrap, so it cannot be written live sample-by-sample).
- Parameters:
variant (Variant, optional) – BEN encoding variant (Variant): "standard", "mkv_chain", or "twodelta". Default is "twodelta".
- Returns:
BendlStreamSession – A single-use context manager. Call write() for each assignment inside the with block; a clean close finalizes the bundle, and an exception leaves it unfinalized.
- Raises:
ValueError – If variant is invalid.
Exception – If a stream was already written, append mode is active, or the encoder is closed.
- Return type:
BendlStreamSession
The stream session¶
BendlEncoder.ben_stream() returns a BendlStreamSession. It is intentionally small: write
assignments, then close. A bundle can have only one assignment stream.
from binary_ensemble import BendlEncoder
encoder = BendlEncoder("api-session.bendl", overwrite=True)
with encoder.ben_stream(variant="twodelta") as ensemble:
    for assignment in [[1, 1, 2, 2], [1, 2, 2, 2]]:
        ensemble.write(assignment)
- class BendlStreamSession¶
Bases: object
Single-use context manager over a bundle’s assignment stream.
Obtained from binary_ensemble.bundle.BendlEncoder.ben_stream(); you don’t construct it directly. Write assignments with write() inside a with block. Closing the context cleanly finalizes the bundle; if the block exits via an exception, the bundle is left unfinalized (recoverable, rather than stamped complete over a truncated stream).
- close()¶
Finalize the bundle and close the stream. Idempotent after a clean close.
You usually do not call this directly; leaving the stream’s with block cleanly calls it. If the finalize fails (e.g. the disk fills while the directory is written), the encoder is poisoned: this and every later call keeps reporting the failure instead of claiming success, and the bundle on disk stays unfinalized.
- write(assignment)¶
Encode a single assignment into the bundle’s stream.
- Parameters:
assignment (Sequence[int]) – The plan as a sequence of district ids (e.g. a list[int]), one per node in dual-graph node order.
- Returns:
None.
- Raises:
ValueError – If the bundle carries a pre-stream graph and the assignment length does not equal the graph’s node count.
OSError – If the session is already closed, or the write fails.
Example
>>> ensemble.write([1, 1, 2, 2])
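The session rules above (length check against a pre-stream graph, idempotent close, a poisoned close after a failed finalize, no finalize on an exception) can be summarized in a small stand-in class. StreamSessionSketch is hypothetical and only keeps assignments in memory; finalize is a placeholder callable standing in for writing the bundle directory, and the exception types follow the documented ones.

```python
from __future__ import annotations


class StreamSessionSketch:
    """Hypothetical model of the documented write/close rules (not the real class)."""

    def __init__(self, n_nodes: int | None, finalize) -> None:
        self._n_nodes = n_nodes    # node count of a pre-stream graph, or None
        self._finalize = finalize  # stand-in for writing the directory to disk
        self._written: list[list[int]] = []
        self._closed = False
        self._error: Exception | None = None

    def write(self, assignment) -> None:
        if self._closed or self._error is not None:
            raise OSError("stream session is closed")
        if self._n_nodes is not None and len(assignment) != self._n_nodes:
            raise ValueError(f"expected {self._n_nodes} district ids, got {len(assignment)}")
        self._written.append(list(assignment))

    def close(self) -> None:
        if self._error is not None:
            # Poisoned: keep reporting the original failure, never claim success.
            raise OSError("finalize previously failed") from self._error
        if self._closed:
            return  # idempotent after a clean close
        try:
            self._finalize(self._written)
        except Exception as exc:
            self._error = exc
            raise OSError("finalize failed") from exc
        self._closed = True

    def __enter__(self) -> StreamSessionSketch:
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.close()  # a clean exit finalizes
        # On an exception, skip finalizing (leave it "unfinalized") and re-raise.
        return False
```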
Decoder¶
BendlDecoder iterates the embedded stream and exposes bundle inspection methods.
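| Method | Use |
|---|---|
| count_samples() / len(decoder) | Expanded number of samples |
| assignment_format() | "ben" or "xben" |
| version(), is_complete() | Bundle header inspection |
| asset_names(), list_assets() | Asset directory inspection |
| verify() | Check every asset and stream checksum; raises on corruption |
| read_graph() | networkx.Graph, or None |
| read_metadata() | Parsed metadata.json, or None |
| read_node_permutation_map() | Parsed permutation map, or None |
| read_json_asset() | Parse a JSON asset |
| read_asset_bytes() | Raw bytes for any asset |
| extract_stream() | Copy the embedded stream out as .ben/.xben |
| subsample_range(), subsample_every(), subsample_indices() | Iterate only selected samples |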
from binary_ensemble import BendlDecoder
decoder = BendlDecoder("ensemble.bendl")
print(decoder.asset_names())
print(decoder.read_metadata())
for assignment in decoder.subsample_range(1, 3):
    print(assignment[:4])
Iteration rewinds on a fresh for loop. Do not drive two simultaneous loops from the same
decoder object; open a second decoder if you need independent cursors.
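Here is a toy model of why two simultaneous loops interfere. It assumes, without confirmation from this page, that the decoder rewinds in __iter__ and returns itself, so every loop shares one cursor. RewindingIterable is hypothetical:

```python
class RewindingIterable:
    """Toy model: each new `for` loop rewinds a single shared cursor."""

    def __init__(self, samples):
        self._samples = list(samples)
        self._pos = 0

    def __iter__(self):
        self._pos = 0  # a fresh loop rewinds...
        return self    # ...but every loop shares this one object

    def __next__(self):
        if self._pos >= len(self._samples):
            raise StopIteration
        item = self._samples[self._pos]
        self._pos += 1
        return item


dec = RewindingIterable(["s1", "s2", "s3"])
print(list(dec))  # ['s1', 's2', 's3']
print(list(dec))  # rewinds: ['s1', 's2', 's3'] again
# The inner loop rewinds and exhausts the shared cursor, so the outer
# loop stops after one pass: only 3 pairs instead of 9.
print([(a, b) for a in dec for b in dec])
```

With two separate objects the loops are independent, which is the same reason to open a second BendlDecoder.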
- class BendlDecoder(file_path)¶
Bases: object
Reader and iterator for a .bendl bundle.
Iterate the decoder to yield the embedded assignment stream one plan at a time (each a list[int] of district ids), and use len() for the sample count. Alongside the stream, a bundle carries assets (the dual graph, metadata, a node permutation map, and any custom blobs) exposed through the canonical getters (read_graph(), read_metadata(), read_node_permutation_map()) and the generic read_asset_bytes() / read_json_asset(). Inspect the directory with asset_names(), list_assets(), version(), and is_complete().
A decoder is a snapshot of the file it opened: if the bundle changes on disk afterwards (an in-place transform swaps in a rewritten file, or an append rewrites the directory), every data-reading call refuses with a clear error rather than mixing old and new bytes; open a fresh decoder to read the current file.
This decoder is bundle-only: opening it on a plain .ben/.xben stream raises and points the caller at BenDecoder. A finalized assets-only bundle (one written with no assignment stream) iterates to nothing, with len() == 0.
- Parameters:
file_path (StrPath) – Path to the input .bendl file (str or os.PathLike). Whether the embedded stream is BEN or XBEN is read from the bundle header; an XBEN stream warns about a one-time decompression startup cost.
- Raises:
Exception – If file_path is not a bundle (use BenDecoder for plain streams), or its header cannot be parsed.
OSError – If the file cannot be opened.
Example
>>> from binary_ensemble import BendlDecoder
>>> dec = BendlDecoder("ensemble.bendl")
>>> graph = dec.read_graph()
>>> for assignment in dec:
...     ...
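One way a reader can detect that its file was swapped or rewritten is to compare stat fields captured at open time. The sketch below is a hypothetical SnapshotGuard, not necessarily how BendlDecoder does it; a heuristic on device, inode, size, and modification time can miss a same-size rewrite that lands within the filesystem’s timestamp resolution.

```python
from __future__ import annotations

import os


def file_identity(path: str | os.PathLike) -> tuple[int, int, int, int]:
    """Stat fields that change when a file is rewritten or swapped for another."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


class SnapshotGuard:
    """Hypothetical helper: remember a file's identity and refuse reads after it changes."""

    def __init__(self, path: str | os.PathLike) -> None:
        self.path = path
        self._identity = file_identity(path)

    def check(self) -> None:
        # Call before every data read; a mismatch means "open a fresh reader".
        if file_identity(self.path) != self._identity:
            raise RuntimeError(f"{self.path} changed on disk; open a fresh reader")
```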
- asset_names()¶
Names of every entry in the bundle’s directory, in directory order.
- Returns:
list[str] – Asset names such as "graph.json" and "metadata.json".
- asset_size(name, /)¶
Return the on-disk byte length of a named asset’s stored payload.
Read straight from the bundle directory; no decoding or copying. For assets stored xz-compressed (the "xz" flag in list_assets()), this is the compressed size; the decoded payload can be larger, so use len(read_asset_bytes(name)) for that.
- Parameters:
name (str) – The asset’s name, as listed by asset_names().
int – Stored byte length of the asset’s payload region.
- Raises:
KeyError – If no asset with that name exists in the bundle.
- assignment_format()¶
Return the container format of the embedded assignment stream.
- Returns:
str – "ben" or "xben".
- count_samples()¶
Count the samples in the embedded stream.
The result is the expanded sample count (a frame repeating five identical samples contributes five). On a finalized bundle the count is read from the bundle header, so it never requires scanning the stream; it is cached either way, so repeated calls and len() are cheap.
- Returns:
int – The number of samples in the bundle’s stream.
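This sketch shows what "expanded" means using a hypothetical (assignment, repeat_count) frame model rather than the actual BEN frame layout. expanded_count and iter_samples are illustrative names:

```python
# Hypothetical frames: (assignment, repeat_count) pairs. Only the counting
# rule is modeled here, not the real BEN byte layout.
frames = [([1, 1, 2, 2], 5), ([1, 2, 2, 2], 1), ([2, 2, 1, 1], 3)]


def expanded_count(frames) -> int:
    """A frame that repeats a plan k times contributes k samples."""
    return sum(count for _, count in frames)


def iter_samples(frames):
    """Expand frames into one assignment per sample (a fresh list each time)."""
    for assignment, count in frames:
        for _ in range(count):
            yield list(assignment)


print(len(frames))                           # 3 frames
print(expanded_count(frames))                # 9 samples
print(sum(1 for _ in iter_samples(frames)))  # 9, matching the expanded count
```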
- extract_stream(out_path, overwrite=False, allow_unfinalized=False)¶
Copy the embedded assignment stream out to a standalone .ben/.xben file.
The bytes are copied verbatim, so the result can be opened directly with BenDecoder(out_path, mode=dec.assignment_format()).
- Parameters:
out_path (StrPath) – Path to write the extracted stream to (str or os.PathLike).
overwrite (bool, optional) – Replace out_path if it already exists. Default is False.
allow_unfinalized (bool, optional) – Permit extraction from a bundle that was never finalized (recovering a partial stream). Default is False.
- Raises:
OSError – If out_path exists and overwrite is False, or the copy fails.
- is_complete()¶
Whether the bundle was successfully finalized.
- Returns:
bool – True for a complete bundle, False for a recoverable partial bundle.
- list_assets()¶
Return the full bundle directory.
- Returns:
list[dict] – Each dict has name, type, offset, len, and flags. flags is a list of string tags such as "json", "xz", and "checksum".
- read_asset_bytes(name, /)¶
Read the (decoded) bytes of a named asset as a Python bytes object.
- Parameters:
name (str) – The asset’s name, as listed by asset_names().
- Returns:
bytes – The asset’s decoded payload.
- Raises:
KeyError – If no asset with that name exists in the bundle.
- read_graph()¶
Read the bundle’s graph.json asset as a NetworkX graph, or None if absent.
The stored adjacency-format JSON is rebuilt into a live graph via networkx.readwrite.json_graph.adjacency_graph, so its node order matches the order assignments were written in and it can be handed straight to consumers like GerryChain’s Partition. The result is a networkx.Graph, or a networkx.MultiGraph if the stored adjacency declares itself a multigraph. The raw JSON is still available through read_json_asset("graph.json").
- read_json_asset(name, /)¶
Parse a JSON asset into a Python object (dict, list, …).
- Parameters:
name (str) – The asset’s name, as listed by asset_names().
- Returns:
The parsed JSON value.
- Raises:
- read_metadata()¶
Read the bundle’s metadata.json asset as parsed JSON, or None if absent.
- read_node_permutation_map()¶
Read the bundle’s node_permutation_map.json asset as parsed JSON, or None if absent.
- stream_size()¶
Return the on-disk byte length of the embedded assignment stream.
Read straight from the bundle header’s stream_len field; no decoding or copying. This is the size of the stream region as stored (BEN bytes, or compressed XBEN bytes), the same bytes extract_stream() would copy out. For an unfinalized bundle the stream is taken to extend to the directory (or EOF), matching recovery extraction.
- Returns:
int – Byte length of the embedded stream region; 0 for an assets-only bundle.
Example
>>> BendlDecoder("ensemble.bendl").stream_size()
40110
- subsample_every(step, offset=1)¶
Restrict iteration to every step-th sample.
- subsample_indices(indices, /)¶
Restrict iteration to the samples at the given 1-indexed positions.
Skipped samples are never materialized as Python lists, and where the encoding variant allows it (standard, mkv_chain) whole frames are skipped without being unpacked.
- Parameters:
indices (Sequence[int]) – The 1-indexed sample numbers to keep. Duplicates are dropped; an unsorted list is sorted, with a UserWarning.
- Returns:
BendlDecoder – self, so the call can be chained into a for loop.
- Raises:
Exception – If indices is empty, contains 0 (indices are 1-based), or contains an index greater than the number of samples in the stream.
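The documented index rules, and the idea of skipping whole frames, can be sketched with a hypothetical (assignment, repeat_count) frame model. normalize_indices and select_from_frames are illustrative names. The sketch raises ValueError where the library documents a generic Exception, and it treats an already-sorted list with duplicates as sorted (the docs do not say whether that case warns).

```python
import warnings

# Hypothetical frames: (assignment, repeat_count); samples are numbered from 1.
frames = [([1, 1, 2, 2], 5), ([1, 2, 2, 2], 1), ([2, 2, 1, 1], 3)]


def normalize_indices(indices, n_samples: int) -> list[int]:
    """Apply the documented rules: non-empty, 1-based, in range, deduplicated, sorted."""
    if not indices:
        raise ValueError("indices must not be empty")
    if min(indices) < 1:
        raise ValueError("indices are 1-based; 0 is not a sample number")
    if max(indices) > n_samples:
        raise ValueError(f"index {max(indices)} exceeds the {n_samples} samples")
    unique = list(dict.fromkeys(indices))  # drop duplicates, keep first occurrence
    if unique != sorted(unique):
        warnings.warn("indices were not sorted; sorting them", UserWarning)
        unique.sort()
    return unique


def select_from_frames(frames, wanted: list[int]):
    """Yield (index, assignment) for wanted samples; other frames are never expanded."""
    targets = iter(wanted)
    target = next(targets, None)
    first = 1  # sample number of the current frame's first sample
    for assignment, count in frames:
        last = first + count - 1
        while target is not None and target <= last:
            yield target, list(assignment)
            target = next(targets, None)
        if target is None:
            return  # nothing left to find; stop reading frames
        first = last + 1


wanted = normalize_indices([9, 2, 6, 2], n_samples=9)  # warns, returns [2, 6, 9]
for index, assignment in select_from_frames(frames, wanted):
    print(index, assignment)
```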
- subsample_range(start, end, /)¶
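Restrict iteration to a contiguous, 1-indexed inclusive range of samples.
- Parameters:
start (int) – First sample number to keep (1-indexed, inclusive).
end (int) – Last sample number to keep (1-indexed, inclusive).
- Returns:
BendlDecoder – self, for chaining into a for loop.
- Raises:
Exception – If start is 0, end is less than start, or end is greater than the number of samples in the stream.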
Example
>>> list(BendlDecoder("ensemble.bendl").subsample_range(10, 15)) # samples 10, 11, 12, 13, 14, and 15
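Because the range is inclusive on both ends, the equivalent Python range needs end + 1. Here is a minimal sketch of the documented validation. range_to_indices is a hypothetical helper; it rejects any start below 1, which covers the documented start == 0 case, and raises ValueError in place of the generic Exception.

```python
def range_to_indices(start: int, end: int, n_samples: int) -> range:
    """Inclusive, 1-based range as documented for subsample_range()."""
    if start < 1:
        raise ValueError("start is 1-based; 0 is not a sample number")
    if end < start:
        raise ValueError("end must not be less than start")
    if end > n_samples:
        raise ValueError(f"end {end} exceeds the {n_samples} samples in the stream")
    return range(start, end + 1)  # + 1 because Python's range excludes its stop


print(list(range_to_indices(10, 15, n_samples=100)))  # [10, 11, 12, 13, 14, 15]
```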
- verify()¶
Verify the bundle’s integrity: asset and stream checksums, plus the header sample count.
Scans the raw on-disk bytes of every asset and of the assignment stream and compares them against the CRC32C checksums recorded when the bundle was written, then walks the stream’s frame boundaries to confirm the decoded sample count matches the (unchecksummed) header sample_count. Iterating or subsampling a decoder reads the stream without checking the checksums (partial reads cannot prove a whole-stream checksum) and trusts the header count for finalized bundles, so call this when integrity matters, e.g. after downloading a bundle or before an important run.
- Raises:
Exception – If any asset checksum or the stream checksum does not match the on-disk bytes, if the header sample_count disagrees with the decoded stream, or if the bundle is unfinalized (an unfinalized bundle’s stream checksum and sample count are not authoritative).
Example
>>> dec = BendlDecoder("ensemble.bendl")
>>> dec.verify()  # raises on any corruption
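The checksum is CRC-32C (Castagnoli), which differs from the CRC-32 computed by zlib.crc32. Below is a bitwise pure-Python sketch that is useful for understanding the check or spot-checking a payload. It does not model where the bundle stores its checksums, and verify_payload is a hypothetical helper.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; pass a previous result as `crc` to continue a running checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def verify_payload(stored: bytes, expected: int) -> None:
    """Raise if the stored bytes no longer match their recorded checksum."""
    actual = crc32c(stored)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected:#010x}, got {actual:#010x}")


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```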
- version()¶
Return the bundle’s format version as a (major, minor) tuple.
- Returns:
tuple[int, int] – Bundle format version.
Whole-bundle transforms¶
These functions preserve bundle assets while rewriting the embedded stream.
from binary_ensemble import compress_stream, relabel_bundle
relabel_bundle("ensemble.bendl", out_file="api-sorted.bendl", sort="mlc")
compress_stream("api-sorted.bendl", out_file="api-archive.bendl")
Both transforms take an optional out_file: pass one to create a new file (overwrite=True
replaces an existing one), or leave it off to atomically replace the input in place.
- compress_stream(path, out_file=None, overwrite=False)¶
Recompress a bundle’s embedded BEN stream to XBEN, preserving every asset.
All assets (graph, metadata, node_permutation_map, custom blobs) are preserved by decoded payload, name, type, and JSON flag; storage compression is normalized to the writer’s default policy. An assets-only bundle (empty stream) recompresses to an empty XBEN bundle.
- Parameters:
path (StrPath) – Path to the source .bendl bundle (str or os.PathLike).
out_file (StrPath | None, optional) – Destination path for the recompressed bundle (str or os.PathLike), leaving path untouched. Default is None, which recompresses in place: the result is written to a temp file and atomically swapped over path.
overwrite (bool, optional) – Replace out_file if it already exists. Irrelevant in place, which always replaces path. Default is False.
- Raises:
OSError – If out_file exists and overwrite is False.
- Return type:
None
- relabel_bundle(path, out_file=None, sort='mlc', key=None, overwrite=False)¶
Reorder a BEN bundle’s graph and relabel its stream to match.
Reorders the embedded graph.json, rewrites every assignment into the new node order, and writes a fresh bundle storing the reordered graph and a node_permutation_map.json (so the reordering is reversible). Metadata and custom assets are preserved. This is the bundle-level form of the CLI’s ben relabel ordering flow, typically run to shrink a bundle before an XBEN recompress.
Only BEN bundles are supported (relabel before compressing to XBEN); the source must carry a graph.
- Parameters:
path (StrPath) – Path to the source .bendl bundle (str or os.PathLike). Must hold a BEN (not XBEN) stream and a graph.json.
out_file (StrPath | None, optional) – Destination path for the relabeled bundle (str or os.PathLike), leaving path untouched. Default is None, which relabels in place: the result is written to a temp file and atomically swapped over path.
sort (SortMethod, optional) – The ordering (SortMethod): "mlc" (multi-level clustering), "rcm" (reverse Cuthill-McKee), or "key" (sort by the node attribute named in key). Default is "mlc".
key (str | None, optional) – Node attribute to sort by, e.g. key="GEOID". Required with (and only valid with) sort="key". Default is None.
overwrite (bool, optional) – Replace out_file if it already exists. Irrelevant in place, which always replaces path. Default is False.
- Raises:
ValueError – If sort/key is invalid, or if the bundle has no graph or a non-BEN stream.
OSError – If out_file exists and overwrite is False.
- Return type:
None
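Because a node_permutation_map.json is stored, the relabeling can be undone. This page does not describe the JSON layout of that map, so the sketch below assumes a hypothetical list where perm[new_position] = old_position. It shows relabeling an assignment and then inverting the relabeling:

```python
def apply_order(assignment: list[int], perm: list[int]) -> list[int]:
    """Relabel into the new order: new position i takes the value of old node perm[i]."""
    return [assignment[old] for old in perm]


def invert(perm: list[int]) -> list[int]:
    """Return the inverse map (old position -> new position)."""
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return inverse


perm = [2, 0, 3, 1]  # hypothetical map: new position -> old position
old = [1, 1, 2, 2]
new = apply_order(old, perm)
print(new)                             # [2, 1, 2, 1]
print(apply_order(new, invert(perm)))  # [1, 1, 2, 2], the original order
```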