binary_ensemble.codec

The codec module contains whole-file transforms. These functions do not expose an iterator: they read one file and write another.

Use them for conversion jobs. Use binary_ensemble.stream for sample-by-sample access to plain streams, and binary_ensemble.bundle when graph and metadata should stay with the assignments.

Inputs and outputs

Function family     Input                                Output              Carries assets?
encode_jsonl_to_*   JSON Lines with an assignment field  BEN or XBEN stream  no
encode_ben_to_xben  BEN stream                           XBEN stream         no
decode_*_to_jsonl   BEN or XBEN stream                   JSON Lines          no
decode_xben_to_ben  XBEN stream                          BEN stream          no
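The table can be read as a lookup from a pair of formats to one function. A minimal sketch of that lookup follows, assuming files carry the conventional .jsonl, .ben, and .xben suffixes. The TRANSFORMS table and the transform_name helper are hypothetical and not part of binary_ensemble; only the six function names come from this page.

```python
from pathlib import Path

# (input suffix, output suffix) -> function name documented on this page.
# Choosing by suffix is an assumption made for this sketch.
TRANSFORMS = {
    (".jsonl", ".ben"): "encode_jsonl_to_ben",
    (".jsonl", ".xben"): "encode_jsonl_to_xben",
    (".ben", ".xben"): "encode_ben_to_xben",
    (".ben", ".jsonl"): "decode_ben_to_jsonl",
    (".xben", ".jsonl"): "decode_xben_to_jsonl",
    (".xben", ".ben"): "decode_xben_to_ben",
}


def transform_name(in_file, out_file):
    """Return the name of the whole-file transform for a pair of paths."""
    key = (Path(in_file).suffix.lower(), Path(out_file).suffix.lower())
    try:
        return TRANSFORMS[key]
    except KeyError:
        raise ValueError(
            f"no whole-file transform for {key[0]!r} -> {key[1]!r}"
        ) from None
```

With the library installed, the returned name could be resolved with getattr(binary_ensemble, name), since the examples on this page import all six functions from the top-level package.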

The expected JSONL shape is:

{"assignment": [1, 1, 2, 2], "sample": 1}
{"assignment": [1, 2, 2, 2], "sample": 2}

Only the assignment values are encoded into the stream. Store graph data, sampler settings, scores, and provenance in a .bendl file if they need to travel with the file.
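If your assignments live in memory rather than in a file, a small writer can produce the expected shape. This is a sketch using only the standard library; write_plans_jsonl is a hypothetical helper name, not part of binary_ensemble.

```python
import json
from pathlib import Path


def write_plans_jsonl(assignments, path):
    """Write one {"assignment": [...], "sample": n} object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        # Sample numbers start at 1, matching the shape shown above.
        for sample, assignment in enumerate(assignments, start=1):
            record = {"assignment": list(assignment), "sample": sample}
            handle.write(json.dumps(record) + "\n")
    return path
```

Calling write_plans_jsonl([[1, 1, 2, 2], [1, 2, 2, 2]], "plans.jsonl") writes the two lines shown in the example above.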

Whole-file stream/JSONL transforms.

These helpers convert entire files in one call, without an iterator: JSONL ↔ BEN ↔ XBEN. For streaming sample-by-sample access use binary_ensemble.stream.BenDecoder; for the single-file bundle format use binary_ensemble.bundle.

Encoders

from binary_ensemble import encode_ben_to_xben, encode_jsonl_to_ben, encode_jsonl_to_xben

encode_jsonl_to_ben("plans.jsonl", "api-plans.ben", overwrite=True)
encode_ben_to_xben("api-plans.ben", "api-plans.xben", overwrite=True)

encode_jsonl_to_xben(
    "plans.jsonl",
    "api-direct.xben",
    overwrite=True,
    variant="twodelta",
    compression_level=9,
)

variant= is only used when creating BEN frames from assignments. XBEN-specific knobs (n_threads, compression_level, and xz_block_size) tune the LZMA2 stage.

encode_jsonl_to_ben(in_file, out_file, overwrite=False, variant='twodelta')

Encode a canonicalized JSONL ensemble into a BEN stream.

Expects one {"assignment": [...], "sample": n} object per line. BEN is the fast working format; encode further to XBEN with encode_ben_to_xben() for storage.

Parameters:
  • in_file (StrPath) – Path to the input .jsonl file (str or os.PathLike).

  • out_file (StrPath) – Path to write the .ben output (str or os.PathLike).

  • overwrite (bool, optional) – Replace out_file if it already exists. Default is False.

  • variant (Variant, optional) – BEN encoding variant: "standard", "mkv_chain", or "twodelta". Default is "twodelta".

Raises:
  • OSError – If out_file exists and overwrite is False, or the conversion fails.

  • ValueError – If variant is not a recognized variant name, or in_file and out_file are the same path.
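The error contract above can be approximated with a pre-flight check that runs before a long conversion starts. This is a sketch only: check_encode_args is a hypothetical helper, and the order of the checks and the way paths are compared are assumptions, not the library's documented behavior.

```python
from pathlib import Path

# Variant names as listed in the parameter documentation above.
VARIANTS = ("standard", "mkv_chain", "twodelta")


def check_encode_args(in_file, out_file, overwrite=False, variant="twodelta"):
    """Raise the documented exception types before any work is done."""
    if variant not in VARIANTS:
        raise ValueError(f"unrecognized variant: {variant!r}")
    # Assumption: two paths are "the same" when they resolve identically.
    if Path(in_file).resolve() == Path(out_file).resolve():
        raise ValueError("in_file and out_file are the same path")
    if Path(out_file).exists() and not overwrite:
        # FileExistsError is a subclass of OSError.
        raise FileExistsError(f"{out_file} exists and overwrite is False")
```

Callers that catch OSError and ValueError around this check can reuse the same handlers around encode_jsonl_to_ben() itself.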

encode_jsonl_to_xben(in_file, out_file, overwrite=False, variant='twodelta', n_threads=None, compression_level=None, xz_block_size=None)

Encode a canonicalized JSONL ensemble directly into an XBEN file.

A one-step shortcut for encode_jsonl_to_ben() followed by encode_ben_to_xben(). Expects one {"assignment": [...], "sample": n} object per line. Compression can be slow for large block-level ensembles.

Parameters:
  • in_file (StrPath) – Path to the input .jsonl file (str or os.PathLike).

  • out_file (StrPath) – Path to write the .xben output (str or os.PathLike).

  • overwrite (bool, optional) – Replace out_file if it already exists. Default is False.

  • variant (Variant, optional) – BEN encoding variant: "standard", "mkv_chain", or "twodelta". Default is "twodelta".

  • n_threads (int | None, optional) – Number of worker threads. Default is None, which uses all available cores.

  • compression_level (int | None, optional) – LZMA2 level from 0 (fastest) to 9 (smallest). Default is None, which uses level 9.

  • xz_block_size (int | None, optional) – Override the xz block size in bytes. Default is None, which uses the xz default.

Raises:
  • OSError – If out_file exists and overwrite is False, or the conversion fails.

  • ValueError – If variant is not a recognized variant name, or in_file and out_file are the same path.

encode_ben_to_xben(in_file, out_file, overwrite=False, n_threads=None, compression_level=None, xz_block_size=None)

Compress a BEN stream into an XBEN file with LZMA2.

XBEN is the smallest format and is meant for storage and transfer. Compression can be slow for large block-level ensembles; relabel and reorder first (see relabel_bundle()) for the best ratios.

Parameters:
  • in_file (StrPath) – Path to the input .ben file (str or os.PathLike).

  • out_file (StrPath) – Path to write the .xben output (str or os.PathLike).

  • overwrite (bool, optional) – Replace out_file if it already exists. Default is False.

  • n_threads (int | None, optional) – Number of worker threads. Default is None, which uses all available cores.

  • compression_level (int | None, optional) – LZMA2 level from 0 (fastest) to 9 (smallest). Default is None, which uses level 9.

  • xz_block_size (int | None, optional) – Override the xz block size in bytes. Default is None, which uses the xz default.

Raises:
  • OSError – If out_file exists and overwrite is False, or the conversion fails.

  • ValueError – If in_file and out_file are the same path.
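To get a feel for the compression_level trade-off before committing to a long run, you can compare LZMA2 presets on a sample payload with Python's standard lzma module. This is an analogy rather than the XBEN encoder itself: the payload is a made-up stand-in for a BEN stream, xz_sizes is a hypothetical helper, and the real ratios depend on your ensemble.

```python
import lzma


def xz_sizes(payload, levels=(0, 6, 9)):
    """Return {level: compressed size in bytes} for an in-memory payload."""
    sizes = {}
    for level in levels:
        # format defaults to FORMAT_XZ, whose default filter chain is LZMA2.
        sizes[level] = len(lzma.compress(payload, preset=level))
    return sizes


# Hypothetical stand-in for a BEN stream: repetitive assignment-like bytes.
payload = bytes([1, 1, 2, 2, 1, 2, 2, 2]) * 50_000
for level, size in xz_sizes(payload).items():
    print(f"level {level}: {size} bytes from {len(payload)}")
```

Timing the same loop on a slice of real data gives a rough idea of how much a lower level would save in wall-clock time.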

Decoders

from binary_ensemble import decode_ben_to_jsonl, decode_xben_to_ben, decode_xben_to_jsonl

decode_ben_to_jsonl("chain.ben", "api-chain.jsonl", overwrite=True)
decode_xben_to_ben("chain.xben", "api-chain.ben", overwrite=True)
decode_xben_to_jsonl("chain.xben", "api-chain-from-xben.jsonl", overwrite=True)

Decoding auto-detects the stream variant from the file; you never pass variant= when reading.

decode_ben_to_jsonl(in_file, out_file, overwrite=False)

Decode a BEN stream back to canonicalized JSONL.

Produces one {"assignment": [...], "sample": n} object per line, with sample numbers starting at 1. This is the inverse of encode_jsonl_to_ben().

Parameters:
  • in_file (StrPath) – Path to the input .ben file (str or os.PathLike).

  • out_file (StrPath) – Path to write the .jsonl output (str or os.PathLike).

  • overwrite (bool, optional) – Replace out_file if it already exists. Default is False.

Raises:
  • OSError – If out_file exists and overwrite is False, or the conversion fails.

  • ValueError – If in_file and out_file are the same path.
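Because decoding is described as the inverse of encoding, a round trip can be checked by comparing assignments line by line. The sketch below uses only the standard library; read_assignments and same_assignments are hypothetical helpers, and it assumes the input was already canonicalized so that decoding reproduces the same labels.

```python
import json
from itertools import zip_longest

_MISSING = object()  # marks the shorter file when line counts differ


def read_assignments(path):
    """Yield the assignment list from each non-empty JSONL line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)["assignment"]


def same_assignments(original_path, decoded_path):
    """True when both files hold the same assignments in the same order."""
    pairs = zip_longest(
        read_assignments(original_path),
        read_assignments(decoded_path),
        fillvalue=_MISSING,
    )
    # Streams both files, so large ensembles are never fully loaded.
    return all(left == right for left, right in pairs)
```

A typical use would be to run encode_jsonl_to_ben("plans.jsonl", "plans.ben", overwrite=True), then decode_ben_to_jsonl("plans.ben", "roundtrip.jsonl", overwrite=True), and finally call same_assignments("plans.jsonl", "roundtrip.jsonl").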

decode_xben_to_jsonl(in_file, out_file, overwrite=False)

Decode an XBEN file back to canonicalized JSONL.

Produces one {"assignment": [...], "sample": n} object per line, with sample numbers starting at 1.

Parameters:
  • in_file (StrPath) – Path to the input .xben file (str or os.PathLike).

  • out_file (StrPath) – Path to write the .jsonl output (str or os.PathLike).

  • overwrite (bool, optional) – Replace out_file if it already exists. Default is False.

Raises:
  • OSError – If out_file exists and overwrite is False, or the conversion fails.

  • ValueError – If in_file and out_file are the same path.

decode_xben_to_ben(in_file, out_file, overwrite=False)

Decompress an XBEN file into a plain BEN stream.

XBEN decompression is fast; converting to BEN gives you a stream you can read, replay, and subsample. The encoding variant is preserved and detected automatically on the next read.

Parameters:
  • in_file (StrPath) – Path to the input .xben file (str or os.PathLike).

  • out_file (StrPath) – Path to write the .ben output (str or os.PathLike).

  • overwrite (bool, optional) – Replace out_file if it already exists. Default is False.

Raises:
  • OSError – If out_file exists and overwrite is False, or the conversion fails.

  • ValueError – If in_file and out_file are the same path.