binary_ensemble.codec
The codec module contains whole-file transforms. These functions do not expose an iterator: they read one file and write another.
Use them for conversion jobs. Use binary_ensemble.stream for sample-by-sample access
to plain streams, and binary_ensemble.bundle when graph and metadata should stay with
the assignments.
Inputs and outputs
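| Function family | Input | Output | Carries assets? |
|---|---|---|---|
| encode_jsonl_to_ben, encode_jsonl_to_xben | JSON Lines with an assignment field | BEN or XBEN stream | no |
| encode_ben_to_xben | BEN stream | XBEN stream | no |
| decode_ben_to_jsonl, decode_xben_to_jsonl | BEN or XBEN stream | JSON Lines | no |
| decode_xben_to_ben | XBEN stream | BEN stream | no |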
The expected JSONL shape is:
{"assignment": [1, 1, 2, 2], "sample": 1}
{"assignment": [1, 2, 2, 2], "sample": 2}
Only the assignment values are encoded into the stream. Store graph data, sampler
settings, scores, and provenance in a .bendl file if they need to travel with the file.
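A minimal sketch of producing that shape with the standard library, assuming the assignments are already available as Python lists. The helper name `write_jsonl` is hypothetical and not part of binary_ensemble; sample numbers are written starting at 1 to match the example above.

```python
import json
from pathlib import Path


def write_jsonl(assignments, out_path):
    """Write one {"assignment": [...], "sample": n} object per line.

    Hypothetical helper (not part of binary_ensemble). Sample numbers
    start at 1, matching the shape shown above.
    """
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as fh:
        for sample, assignment in enumerate(assignments, start=1):
            record = {"assignment": list(assignment), "sample": sample}
            fh.write(json.dumps(record) + "\n")
    return out_path
```

The resulting file could then be passed as in_file to one of the JSONL encoders listed in the table.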
Whole-file stream/JSONL transforms.
These helpers convert entire files in one call, without an iterator: JSONL ↔ BEN ↔ XBEN. For
streaming sample-by-sample access use binary_ensemble.stream.BenDecoder; for the
single-file bundle format use binary_ensemble.bundle.
Encoders
from binary_ensemble import encode_ben_to_xben, encode_jsonl_to_ben, encode_jsonl_to_xben
encode_jsonl_to_ben("plans.jsonl", "api-plans.ben", overwrite=True)
encode_ben_to_xben("api-plans.ben", "api-plans.xben", overwrite=True)
encode_jsonl_to_xben(
    "plans.jsonl",
    "api-direct.xben",
    overwrite=True,
    variant="twodelta",
    compression_level=9,
)
variant= is only used when creating BEN frames from assignments. XBEN-specific knobs
(n_threads, compression_level, and xz_block_size) tune the LZMA2 stage.
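The split between the BEN option and the XBEN options can be sketched as a small helper that checks arguments before a long conversion starts. `xben_options` is a hypothetical name; the variant names and the 0 to 9 level range come from the parameter descriptions on this page, and nothing here assumes the library performs the same checks itself.

```python
VARIANTS = ("standard", "mkv_chain", "twodelta")


def xben_options(variant="twodelta", n_threads=None,
                 compression_level=None, xz_block_size=None):
    """Build keyword arguments for encode_jsonl_to_xben (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if compression_level is not None and not 0 <= compression_level <= 9:
        raise ValueError("compression_level must be between 0 and 9")
    # variant shapes the BEN frames; the other three tune the LZMA2 stage.
    return {
        "variant": variant,
        "n_threads": n_threads,
        "compression_level": compression_level,
        "xz_block_size": xz_block_size,
    }


# Intended use (requires binary_ensemble to be installed):
# encode_jsonl_to_xben("plans.jsonl", "api-direct.xben", overwrite=True,
#                      **xben_options(compression_level=6))
```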
- encode_jsonl_to_ben(in_file, out_file, overwrite=False, variant='twodelta')
Encode a canonicalized JSONL ensemble into a BEN stream.
Expects one {"assignment": [...], "sample": n} object per line. BEN is the fast working format; encode further to XBEN with encode_ben_to_xben() for storage.
- Parameters:
  - in_file (StrPath) – Path to the input .jsonl file (str or os.PathLike).
  - out_file (StrPath) – Path to write the .ben output (str or os.PathLike).
  - overwrite (bool, optional) – Replace out_file if it already exists. Default is False.
  - variant (Variant, optional) – BEN encoding variant: "standard", "mkv_chain", or "twodelta". Default is "twodelta".
- Raises:
  - OSError – If out_file exists and overwrite is False, or the conversion fails.
  - ValueError – If variant is not a recognized variant name, or in_file and out_file are the same path.
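The two documented exception types can be told apart by the caller. The following is a sketch only: `run_conversion` is a hypothetical wrapper, and the conversion function is passed in as an argument so the example does not depend on binary_ensemble being importable.

```python
def run_conversion(convert, in_file, out_file, **kwargs):
    """Call a whole-file codec function and classify the documented errors.

    `convert` is any function from this page, for example
    encode_jsonl_to_ben. Returns "ok", "bad-arguments", or "io-error".
    """
    try:
        convert(in_file, out_file, **kwargs)
    except ValueError:
        # Unknown variant, or in_file and out_file are the same path.
        return "bad-arguments"
    except OSError:
        # out_file exists with overwrite=False, or the conversion failed.
        return "io-error"
    return "ok"
```

Returning a status string keeps a batch job running past one bad file; re-raising would suit a single interactive conversion better.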
- encode_jsonl_to_xben(in_file, out_file, overwrite=False, variant='twodelta', n_threads=None, compression_level=None, xz_block_size=None)
Encode a canonicalized JSONL ensemble directly into an XBEN file.
A one-step shortcut for encode_jsonl_to_ben() followed by encode_ben_to_xben(). Expects one {"assignment": [...], "sample": n} object per line. Compression can be slow for large block-level ensembles.
- Parameters:
  - in_file (StrPath) – Path to the input .jsonl file (str or os.PathLike).
  - out_file (StrPath) – Path to write the .xben output (str or os.PathLike).
  - overwrite (bool, optional) – Replace out_file if it already exists. Default is False.
  - variant (Variant, optional) – BEN encoding variant: "standard", "mkv_chain", or "twodelta". Default is "twodelta".
  - n_threads (int | None, optional) – Number of worker threads. Default is None, which uses all available cores.
  - compression_level (int | None, optional) – LZMA2 level from 0 (fastest) to 9 (smallest). Default is None, which uses level 9.
  - xz_block_size (int | None, optional) – Override the xz block size in bytes. Default is None, which uses the xz default.
- Raises:
  - OSError – If out_file exists and overwrite is False, or the conversion fails.
  - ValueError – If variant is not a recognized variant name, or in_file and out_file are the same path.
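For comparison, the two-step path that this function shortcuts might look like the sketch below. The names `jsonl_to_xben_two_step`, `to_ben`, and `to_xben` are hypothetical; the last two stand in for encode_jsonl_to_ben and encode_ben_to_xben and are passed in as arguments. Whether both routes produce byte-identical output is not assumed here.

```python
import os
import tempfile


def jsonl_to_xben_two_step(to_ben, to_xben, in_file, out_file, **xben_kwargs):
    """Run JSONL -> BEN -> XBEN through a temporary .ben file.

    to_ben and to_xben stand in for encode_jsonl_to_ben and
    encode_ben_to_xben; xben_kwargs is forwarded to the second step.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ben_path = os.path.join(tmp, "intermediate.ben")
        to_ben(in_file, ben_path, overwrite=True)
        to_xben(ben_path, out_file, **xben_kwargs)
    # The intermediate file is removed when the temporary directory closes.
    return out_file
```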
- encode_ben_to_xben(in_file, out_file, overwrite=False, n_threads=None, compression_level=None, xz_block_size=None)
Compress a BEN stream into an XBEN file with LZMA2.
XBEN is the smallest format and is meant for storage and transfer. Compression can be slow for large block-level ensembles; relabel and reorder first (see relabel_bundle()) for the best ratios.
- Parameters:
  - in_file (StrPath) – Path to the input .ben file (str or os.PathLike).
  - out_file (StrPath) – Path to write the .xben output (str or os.PathLike).
  - overwrite (bool, optional) – Replace out_file if it already exists. Default is False.
  - n_threads (int | None, optional) – Number of worker threads. Default is None, which uses all available cores.
  - compression_level (int | None, optional) – LZMA2 level from 0 (fastest) to 9 (smallest). Default is None, which uses level 9.
  - xz_block_size (int | None, optional) – Override the xz block size in bytes. Default is None, which uses the xz default.
- Raises:
  - OSError – If out_file exists and overwrite is False, or the conversion fails.
  - ValueError – If in_file and out_file are the same path.
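To see what a given compression level buys on a particular ensemble, the file sizes before and after can be compared. This is a standard-library sketch; `size_ratio` is a hypothetical helper and makes no claim about typical ratios.

```python
import os


def size_ratio(original_path, compressed_path):
    """Return compressed size divided by original size (smaller is better)."""
    original = os.path.getsize(original_path)
    if original == 0:
        raise ValueError("original file is empty")
    return os.path.getsize(compressed_path) / original
```

Calling it with the .ben input and the .xben output of one conversion gives a single number that can be tracked across levels or relabeling strategies.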
Decoders
from binary_ensemble import decode_ben_to_jsonl, decode_xben_to_ben, decode_xben_to_jsonl
decode_ben_to_jsonl("chain.ben", "api-chain.jsonl", overwrite=True)
decode_xben_to_ben("chain.xben", "api-chain.ben", overwrite=True)
decode_xben_to_jsonl("chain.xben", "api-chain-from-xben.jsonl", overwrite=True)
Decoding auto-detects the stream variant from the file; you never pass variant= when
reading.
- decode_ben_to_jsonl(in_file, out_file, overwrite=False)
Decode a BEN stream back to canonicalized JSONL.
Produces one {"assignment": [...], "sample": n} object per line, with sample numbers starting at 1. This is the inverse of encode_jsonl_to_ben().
- Parameters:
  - in_file (StrPath) – Path to the input .ben file (str or os.PathLike).
  - out_file (StrPath) – Path to write the .jsonl output (str or os.PathLike).
  - overwrite (bool, optional) – Replace out_file if it already exists. Default is False.
- Raises:
  - OSError – If out_file exists and overwrite is False, or the conversion fails.
  - ValueError – If in_file and out_file are the same path.
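Because decoding is described as the inverse of encoding, a round trip can be checked by comparing the assignments in the original and decoded JSONL files. The sketch below uses only the standard library and compares parsed values rather than raw text, since byte-for-byte equality of the two files is not assumed. `read_assignments` and `same_assignments` are hypothetical helper names.

```python
import json
from itertools import zip_longest

_MISSING = object()  # marks the shorter file when lengths differ


def read_assignments(path):
    """Yield the assignment list from each non-empty JSONL line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)["assignment"]


def same_assignments(path_a, path_b):
    """True if both files hold the same assignments in the same order."""
    pairs = zip_longest(read_assignments(path_a), read_assignments(path_b),
                        fillvalue=_MISSING)
    return all(a == b for a, b in pairs)
```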
- decode_xben_to_jsonl(in_file, out_file, overwrite=False)
Decode an XBEN file back to canonicalized JSONL.
Produces one {"assignment": [...], "sample": n} object per line, with sample numbers starting at 1.
- Parameters:
  - in_file (StrPath) – Path to the input .xben file (str or os.PathLike).
  - out_file (StrPath) – Path to write the .jsonl output (str or os.PathLike).
  - overwrite (bool, optional) – Replace out_file if it already exists. Default is False.
- Raises:
  - OSError – If out_file exists and overwrite is False, or the conversion fails.
  - ValueError – If in_file and out_file are the same path.
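A decoded file can be sanity-checked against the numbering described above. This sketch assumes the sample numbers are consecutive as well as starting at 1; the page states only the starting value, so the consecutive part is an assumption. `samples_numbered_from_one` is a hypothetical helper.

```python
import json


def samples_numbered_from_one(path):
    """Check that the "sample" fields read 1, 2, 3, ... in file order.

    Starting at 1 is documented; consecutive numbering is an
    assumption of this sketch.
    """
    expected = 1
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            if json.loads(line)["sample"] != expected:
                return False
            expected += 1
    return True
```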
- decode_xben_to_ben(in_file, out_file, overwrite=False)
Decompress an XBEN file into a plain BEN stream.
XBEN decompression is fast; converting to BEN gives you a stream you can read, replay, and subsample. The encoding variant is preserved and detected automatically on the next read.
- Parameters:
  - in_file (StrPath) – Path to the input .xben file (str or os.PathLike).
  - out_file (StrPath) – Path to write the .ben output (str or os.PathLike).
  - overwrite (bool, optional) – Replace out_file if it already exists. Default is False.
- Raises:
  - OSError – If out_file exists and overwrite is False, or the conversion fails.
  - ValueError – If in_file and out_file are the same path.
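When an archive holds many XBEN files, the input and output paths for a batch decompression can be planned up front. This is a sketch with a hypothetical helper, `plan_ben_outputs`; it only computes paths and leaves the actual decode_xben_to_ben calls to the caller.

```python
from pathlib import Path


def plan_ben_outputs(src_dir, dst_dir):
    """Pair every .xben file in src_dir with a .ben path in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    return [
        (src, dst_dir / src.with_suffix(".ben").name)
        for src in sorted(src_dir.glob("*.xben"))
    ]


# Intended use (requires binary_ensemble to be installed):
# for src, dst in plan_ben_outputs("archive", "work"):
#     decode_xben_to_ben(src, dst, overwrite=False)
```

Since the parameters accept os.PathLike values, the Path objects can be passed through without converting them to strings.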