batchie package

Submodules

batchie.common module

batchie.common.copy_array_with_control_treatments_set_to_zero(arr: numpy.ndarray, treatment_array: numpy.ndarray)
batchie.common.select_unique_zipped_numpy_arrays(arrs)

Returns a boolean array that selects unique combinations of several same length numpy arrays.

Parameters:

arrs – Arrays of the same length.

Returns:

Boolean array indicating unique combinations.
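The selection semantics can be illustrated with a minimal pure-Python sketch. The helper name select_unique_combinations is hypothetical, it works on plain lists rather than numpy arrays, and it assumes that the first occurrence of each combination is the one that gets selected, which the description above does not state.

```python
def select_unique_combinations(*columns):
    """Return a list of booleans that is True at the first occurrence of each row.

    Hypothetical stand-in for illustration; the real function takes numpy arrays.
    """
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError("all input sequences must have the same length")

    seen = set()
    mask = []
    for row in zip(*columns):
        # A row is selected only the first time its combination appears.
        mask.append(row not in seen)
        seen.add(row)
    return mask


samples = ["a", "a", "b", "a"]
doses = [1.0, 1.0, 1.0, 2.0]
mask = select_unique_combinations(samples, doses)
print(mask)
```

Here the second row repeats the combination ("a", 1.0), so it is the only row that is not selected.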

batchie.core module

class batchie.core.BayesianModel(experiment_space: ExperimentSpace)

Bases: ABC

This class represents a Bayesian model.

A Bayesian model has internal state. Each batchie.core.BayesianModel should have companion batchie.core.Theta and batchie.core.ThetaHolder classes which represent the model's internal state in a serializable way.

The internal state of the model can be set explicitly via batchie.core.BayesianModel.set_model_state(), or it can be advanced via batchie.core.BayesianModel.step().

A batchie.core.BayesianModel can have data added to it via batchie.core.BayesianModel.add_observations(). If data is present, the model should use that data when batchie.core.BayesianModel.step() is called. batchie.core.BayesianModel.n_obs() should report the number of datapoints that have been added to the model.

A batchie.core.BayesianModel can be used to predict the outcome of an Experiment via batchie.core.BayesianModel.predict().

A batchie.core.BayesianModel must report its variance via batchie.core.BayesianModel.variance().
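The contract described above can be sketched with a toy class. Everything in this example is hypothetical: ToyBayesianModel and NoisyMeanModel do not subclass anything in batchie, the method signatures are simplified (plain lists instead of screens, an integer instead of an experiment for predict), and the "inference" is only a noisy running mean.

```python
import random
from abc import ABC, abstractmethod


class ToyBayesianModel(ABC):
    """Hypothetical stand-in that mirrors the contract described in the text."""

    def __init__(self):
        self._observations = []

    def add_observations(self, data):
        self._observations.extend(data)

    def n_obs(self):
        return len(self._observations)

    @abstractmethod
    def set_model_state(self, state): ...

    @abstractmethod
    def step(self): ...

    @abstractmethod
    def predict(self, n): ...

    @abstractmethod
    def variance(self): ...


class NoisyMeanModel(ToyBayesianModel):
    """Toy model whose entire internal state is one float."""

    def __init__(self, rng):
        super().__init__()
        self._rng = rng
        self._mean = 0.0

    def set_model_state(self, state):
        self._mean = float(state)

    def step(self):
        # If data is present, use it when advancing the state.
        if self._observations:
            target = sum(self._observations) / len(self._observations)
        else:
            target = 0.0
        self._mean = target + self._rng.gauss(0.0, 0.1)

    def predict(self, n):
        return [self._mean] * n

    def variance(self):
        # Crude placeholder: uncertainty shrinks as observations accumulate.
        return 1.0 / (1 + self.n_obs())


model = NoisyMeanModel(random.Random(0))
model.add_observations([1.0, 2.0, 3.0])
model.step()
print(model.n_obs(), model.predict(2), model.variance())
```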

add_observations(data: ScreenBase)

Add observations to the model.

Parameters:

data – The data to add.

abstract n_obs() int

Return the number of observations that have been added to the model.

Returns:

Integer number of observations

abstract reset_model()

Reset the internal state of the model to its initial state.

abstract property rng: numpy.random.Generator

Return the PRNG for this model instance.

Returns:

The PRNG for this model instance.

abstract set_rng(rng: numpy.random.Generator)

Set the PRNG for this model instance.

Parameters:

rng – The PRNG to use.

class batchie.core.DistanceMatrix

Bases: ABC

abstract add_value(i, j, value)

Add a value to the distance matrix.

Parameters:
  • i – The row index.

  • j – The column index.

  • value – The value to add.

abstract classmethod load(filename)

Load a distance matrix from a file.

Parameters:

filename – The filename to load from.

abstract save(filename)

Save the distance matrix to a file.

Parameters:

filename – The filename to save to.

abstract to_dense()

Return a dense representation of the distance matrix.

Returns:

A dense representation of the distance matrix.

class batchie.core.DistanceMetric

Bases: object

This class represents a symmetric distance metric between two arrays of model predictions.

distance(a: numpy.ndarray, b: numpy.ndarray) float

Calculate the distance between two arrays of model predictions.

Parameters:
  • a – The first array of model predictions.

  • b – The second array of model predictions.

Returns:

The distance between the two arrays.
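As an illustration, a symmetric metric could look like the sketch below. MeanSquaredDistance is a hypothetical class that works on plain sequences; it is not claimed to be one of the metrics shipped with batchie.

```python
class MeanSquaredDistance:
    """Hypothetical symmetric distance between two equal-length prediction vectors."""

    def distance(self, a, b):
        if len(a) != len(b):
            raise ValueError("prediction vectors must have the same length")
        if len(a) == 0:
            raise ValueError("prediction vectors must not be empty")
        # Squaring the difference makes the result independent of argument order.
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


metric = MeanSquaredDistance()
d_ab = metric.distance([0.0, 0.0], [1.0, 3.0])
d_ba = metric.distance([1.0, 3.0], [0.0, 0.0])
print(d_ab, d_ba)
```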

class batchie.core.InitialRetrospectivePlateGenerator

Bases: ABC

When running a retrospective active learning simulation, results are sensitive to the initial plate which is revealed. For this reason users might want to implement a special routine for revealing the initial plate separate from the subsequent plates.

generate_and_unmask_initial_plate(screen: Screen, rng: numpy.random.BitGenerator) Screen

Generate and unmask the initial plate.

Parameters:
Returns:

The same batchie.data.Screen with the initial plate observed, and all other plates masked.

class batchie.core.MCMCModel

Bases: object

This class subclasses BayesianModel and implements batchie.core.MCMCModel.step()

abstract get_model_state() Theta

Get the internal state of the model.

abstract step()

Advance the internal state of the model by one step.

In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.

class batchie.core.Metric(model: BayesianModel)

Bases: object

evaluate(sample: Theta) float

Evaluate the metric on a single parameter set.

Parameters:

sample – The parameter set to evaluate.

Returns:

The value of the metric.

evaluate_all(results_holder: ThetaHolder) numpy.ndarray

Evaluate the metric on all parameter sets in the results_holder.

Parameters:

results_holder – The parameter sets to evaluate.

Returns:

An array of metric values.

class batchie.core.PlatePolicy

Bases: object

Given a batchie.data.Screen, which is a set of potential batchie.data.Plate instances, implementations of this class will determine which set of batchie.data.Plate instances is eligible for the next round.

filter_eligible_plates(batch_plates: list[Plate], unobserved_plates: list[Plate], rng: numpy.random.Generator) list[Plate]
class batchie.core.RetrospectivePlateGenerator

Bases: ABC

When running a retrospective active learning simulation, the user might want to reorganize the dataset into different plates than were originally run. This class will generate these plate groupings from the individual observations in the retrospective dataset.

generate_plates(screen: Screen, rng: numpy.random.BitGenerator) Screen

Generate plates from the remaining unobserved experiments in the input screen.

Parameters:
class batchie.core.RetrospectivePlateSmoother

Bases: ABC

After plates have been generated for a retrospective simulation using a batchie.core.RetrospectivePlateGenerator, those plates may be of very uneven sizes, which is not desirable. Implementations of this class should aim to merge plates together and/or drop experiments until plate sizes are more even. We call this process “plate smoothing”.

smooth_plates(screen: Screen, rng: numpy.random.BitGenerator) Screen

Smooth the plates in the screen.

Parameters:
class batchie.core.Scorer

Bases: object

This class represents a scoring function for batchie.data.Plate instances.

The score should represent how desirable it is to observe the given plate, with a lower score being more desirable.

score(plates: dict[int, ScreenSubset], distance_matrix: DistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]
class batchie.core.ScoresHolder

Bases: ABC

This class represents a set of scores for a set of plates.

add_score(plate_id: int, score: float)

Add a score for a given plate.

Parameters:
  • plate_id – The plate id to add the score for.

  • score – The score to add.

get_score(plate_id: int) float

Get the score for a given plate.

Parameters:

plate_id – The plate id to get the score for.

Returns:

The score for the given plate.

plate_id_with_minimum_score(eligible_plate_ids: list[int] = None) int

Get the plate id with the minimum score.

Parameters:

eligible_plate_ids – The set of plates to consider.

Returns:

The plate id with the minimum score.
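A dictionary-backed sketch of this interface is shown below. DictScoresHolder is a hypothetical class written for illustration; it assumes that passing eligible_plate_ids=None means "consider every scored plate", which matches the default in the signature but is not spelled out in the text.

```python
class DictScoresHolder:
    """Hypothetical in-memory scores container keyed by plate id."""

    def __init__(self):
        self._scores = {}

    def add_score(self, plate_id, score):
        self._scores[plate_id] = score

    def get_score(self, plate_id):
        return self._scores[plate_id]

    def plate_id_with_minimum_score(self, eligible_plate_ids=None):
        # Assumption: None means that all scored plates are eligible.
        if eligible_plate_ids is None:
            candidates = list(self._scores)
        else:
            candidates = list(eligible_plate_ids)
        return min(candidates, key=self.get_score)


scores = DictScoresHolder()
scores.add_score(0, 0.5)
scores.add_score(1, 0.2)
scores.add_score(2, 0.9)
best_overall = scores.plate_id_with_minimum_score()
best_eligible = scores.plate_id_with_minimum_score([0, 2])
print(best_overall, best_eligible)
```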

class batchie.core.SimulationTracker(plate_ids_selected: list[list[int]], losses: list[float], seed: int)

Bases: object

This class tracks the state of a retrospective active learning simulation. It will record the plates that were revealed at each step and the total loss of the predictor trained on the plates revealed up until that point.

classmethod load(fn)

Load this instance from a JSON file.

Parameters:

fn – The filename to load from.

save(fn)

Save this instance to a JSON file.

Parameters:

fn – The filename to save to.
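The save and load round trip can be sketched with a dataclass and the standard json module. ToySimulationTracker is hypothetical; it only reuses the three constructor fields listed above, and the JSON layout it writes is an assumption rather than the format batchie uses on disk.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ToySimulationTracker:
    """Hypothetical tracker with the same three fields as the documented class."""

    plate_ids_selected: list
    losses: list
    seed: int

    def save(self, fn):
        with open(fn, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, fn):
        with open(fn) as f:
            return cls(**json.load(f))


tracker = ToySimulationTracker(
    plate_ids_selected=[[0], [3, 4]], losses=[1.2, 0.8], seed=42
)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tracker.json")
    tracker.save(path)
    restored = ToySimulationTracker.load(path)
print(restored)
```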

class batchie.core.Theta

Bases: object

This class represents the set of parameters for a BayesianModel. Should be implemented by a dataclass or similarly serializable class.

equals(other)
abstract classmethod from_dicts(private_params: dict, shared_params: dict)

Instantiate a batchie.core.Theta from dictionaries of private and shared parameters.

Returns:

a dictionary mapping class variables to arrays/numerical values.

abstract predict_conditional_mean(data: ScreenBase) numpy.ndarray

Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.

Returns:

An array of means for each item in the Experiment.

abstract predict_conditional_variance(data: ScreenBase) numpy.ndarray

Predict the conditional variance of a batchie.data.ExperimentBase.

Returns:

An array of variances for each item in the Experiment.

abstract predict_viability(data: ScreenBase) numpy.ndarray

Predict the conditional mean of a batchie.data.ExperimentBase in viability space.

Returns:

An array of means for each item in the Experiment.

abstract private_parameters_dict() dict[str, numpy.ndarray]

The private parameters of a batchie.core.Theta.

Returns:

a dictionary mapping class variables to arrays/numerical values.

shared_parameters_dict() dict[str, numpy.ndarray]

The shared parameters of a batchie.core.Theta.

Returns:

a dictionary mapping class variables to arrays.

class batchie.core.ThetaHolder(n_thetas: int, *args, **kwargs)

Bases: ABC

This class represents a container for multiple parameter sets for a BayesianModel. This class provides methods to save these parameter sets to an H5 file.

add_theta(theta: Theta)

Add a new parameter set to the container.

Parameters:

theta – The parameter set to add.

combine(other)

Combine these parameter sets with another container of parameter sets.

Parameters:

other – Another ThetaHolder instance.

classmethod concat(instances: list)

Combine multiple instances of ThetaHolder into one.

Parameters:

instances – A list of ThetaHolder instances.

get_theta(step_index: int) Theta

Returns the parameter set at the given index.

Parameters:

step_index – The index of the parameter set to return.

property is_complete
Returns:

True if the container is full, False otherwise.

static load_h5(path: str)

Load a ThetaHolder from an H5 file.

Parameters:

path – The path to the H5 file.

property n_thetas
Returns:

The number of parameter sets in the container.

save_h5(fn: str)

Save the parameter sets to an H5 file.

Parameters:

fn – The filename to save to.

class batchie.core.VIModel

Bases: object

This class subclasses BayesianModel and implements batchie.core.VIModel.sample()

abstract sample(num_samples: int) list[Theta]

Returns a list of Theta samples. Length of the list should be num_samples.

batchie.data module

class batchie.data.ExperimentSpace(treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None, sample_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None, control_treatment_name: str = '')

Bases: object

This class represents the universe of possible experimental conditions.

Models can use this object to define the size of their embeddings etc. without having to look at the full dataset.

doses_for_treatment(treatment_name: str) numpy.ndarray
classmethod from_screen(screen: Screen)
classmethod load_h5(path: str)
property n_unique_doses
property n_unique_samples
property n_unique_treatment_types
property n_unique_treatments
sample_id_from_sample_name(sample_name: str)
sample_name_from_sample_id(sample_id: int)
save_h5(path: str)
treatment_ids_from_treatment_name(treatment_name: str)
class batchie.data.Plate(screen: Screen, selection_vector: numpy.ndarray)

Bases: ScreenSubset

A subset of a batchie.data.Screen defined by a boolean selection vector.

This class is not meant to be instantiated directly, but rather is returned by the batchie.data.Screen.get_plate() method.

The difference between a batchie.data.Plate and a batchie.data.ScreenSubset is that a batchie.data.Plate is guaranteed to contain only one unique plate id.

merge(other)

Merge this plate with another plate, mutating the parent batchie.data.Screen in place.

Parameters:

other – batchie.data.Plate

property plate_id

Return the plate id of this plate.

Returns:

int, plate id

property plate_name

Return the original plate name of this plate.

Returns:

str, plate name

class batchie.data.Screen(treatment_names: numpy.ndarray, treatment_doses: numpy.ndarray, sample_names: numpy.ndarray, plate_names: numpy.ndarray, observations: numpy.ndarray | None = None, observation_mask: numpy.ndarray | None = None, control_treatment_name='', treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None = None, sample_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None = None)

Bases: ScreenBase

The principal data structure in batchie.

A batchie.data.Screen is a collection of experiments. Some of the experiments may be observed and some may not be observed. Anything not enumerated as an experimental condition in this top level class will be “invisible” to batchie.

A batchie.data.Screen can be subset into batchie.data.Plate instances or a batchie.data.ScreenSubset of multiple plates. batchie.data.Screen is the only data class that can be subdivided.

combine(other)

Union this screen with another screen.

Warning: treatment, sample, and plate ids are not guaranteed to be the same in the resulting new screen instance.

Parameters:

other – batchie.data.Screen

Returns:

Unioned batchie.data.Screen

classmethod concat(screens: list[Screen])

Concatenate a list of batchie.data.Screen instances into a single batchie.data.Screen.

Parameters:

screens – list of batchie.data.Screen

Returns:

Unioned batchie.data.Screen

get_plate(plate_id: int) Plate

Return a batchie.data.Plate defined by a plate id.

Parameters:

plate_id – int, plate id

Returns:

A batchie.data.Plate

static load_h5(path)

Load screen from h5 archive.

Parameters:

path – str, path to h5 archive

property observation_mask

Return the boolean observation mask for the screen. A True entry means the condition is observed; a False entry means it is unobserved.

Returns:

1d array of observation masks

property observations

Return the array of observations in the screen.

We do not use any NaN values in our arrays, so the observation value for a condition where batchie.data.Screen.observation_mask is False is undefined. It's up to the user to decide how to handle this.

Returns:

1d array of observations
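Because unobserved entries hold undefined placeholder values rather than NaN, consumers need to filter observations through the mask before using them. The sketch below shows the idea on plain lists; observed_values is a hypothetical helper, and the 0.0 placeholders are an arbitrary choice for the example.

```python
def observed_values(observations, observation_mask):
    """Keep only the observations whose mask entry is True (hypothetical helper)."""
    if len(observations) != len(observation_mask):
        raise ValueError("observations and mask must have the same length")
    return [value for value, seen in zip(observations, observation_mask) if seen]


# The 0.0 entries are placeholders for unobserved conditions, not measurements.
observations = [0.91, 0.0, 0.37, 0.0]
observation_mask = [True, False, True, False]
values = observed_values(observations, observation_mask)
print(values)
```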

property plate_ids

Return the array of plate ids in the screen.

Plate ids are always 0 indexed integers from 0 to batchie.data.ScreenBase.n_unique_plates - 1 with no gaps.

Returns:

1d array of plate ids

property plate_mapping: Tuple[numpy.ndarray, numpy.ndarray]
Returns:

a tuple of two 1d arrays that map plate name to id.

property plates

Return a list of all batchie.data.Plate instances in the screen.

Returns:

list of batchie.data.Plate instances

property sample_ids

Return the array of sample ids in the screen.

Sample ids are always 0 indexed integers from 0 to batchie.data.ScreenBase.n_unique_samples - 1 with no gaps.

Returns:

1d array of sample ids

property sample_mapping: Tuple[numpy.ndarray, numpy.ndarray]
Returns:

a tuple of two 1d arrays that map sample name to id.

property sample_names

Return the array of sample names (provided string names)

Returns:

1d array of sample names

save_h5(fn)

Save screen to h5 archive.

Parameters:

fn – str, path to h5 archive

set_observed(selection_mask: numpy.ndarray, observations: numpy.ndarray)
property single_treatment_effects: numpy.ndarray | None

Return the array of single treatment effects in the screen.

Returns:

2d array of single treatment effects

subset(selection_vector: numpy.ndarray) ScreenSubset

Return a batchie.data.ScreenSubset defined by a boolean selection vector.

Parameters:

selection_vector – 1d array of bools

Returns:

batchie.data.ScreenSubset

subset_observed() ScreenSubset | None

Return a batchie.data.ScreenSubset containing all conditions that are observed. Returns None if all conditions are unobserved.

Returns:

batchie.data.ScreenSubset

subset_unobserved() ScreenSubset | None

Return a batchie.data.ScreenSubset containing all conditions that are not observed. Returns None if batchie.data.Screen.is_observed is True.

Returns:

batchie.data.ScreenSubset

property treatment_doses

Return the array of treatment doses (floating point drug concentrations)

Returns:

N-dimension array of treatment doses

property treatment_ids

Return the array of treatment ids in the screen.

Treatment ids are always 0 indexed integers from 0 to batchie.data.ScreenBase.n_unique_treatments - 1 with no gaps.

Returns:

2d array of treatment ids

property treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Returns:

a tuple of three 1d arrays that map tuples of (name, dose) to id.

property treatment_names

Return the array of treatment names (provided drug names)

Returns:

N-dimension array of treatment names

class batchie.data.ScreenBase

Bases: ABC

Base class for the principal data structure in batchie.

A batchie.data.Screen is a collection of experimental conditions and, optionally, observations for some of those conditions. The conditions are defined by a set of treatment names and doses, and a set of sample names. Observations are scalar floating point numbers, with one scalar per condition.

The batchie.data.Screen class also defines the concept of a plate, which is a grouping of experimental conditions. The term “plate” comes from the world of high throughput biological screening, where plastic plates with 96, 384, or 1536 individual wells are used to hold distinct biochemical reactions. In batchie, a plate is abstracted to the discrete unit of experimental conditions that can be observed at one time. A plate also does not have to be the same fixed size each time.

abstract combine(other)
property is_observed: bool

Return True if all observations are available, False otherwise

Returns:

bool

property n_plates

Return the number of plates in the screen.

Returns:

int, number of plates

property n_unique_samples

Return the number of unique samples in the screen.

Returns:

int, number of unique samples

property n_unique_treatments

Return the number of unique treatments in the screen.

Returns:

int, number of unique treatments

abstract property observation_mask
abstract property observations
abstract property plate_ids
abstract property plate_mapping
abstract property sample_ids
abstract property sample_mapping
abstract property sample_names
property sample_space_size

Return the size of the universe of possible samples.

Returns:

int

abstract property single_treatment_effects: numpy.ndarray | None
property size

Return the number of experimental conditions contained in the experiment.

Returns:

int, number of experimental conditions

property treatment_arity

Return the number of treatments per experiment.

Returns:

int, number of treatments per experiment

abstract property treatment_doses
abstract property treatment_ids
abstract property treatment_mapping
abstract property treatment_names
property treatment_space_size

Return the size of the universe of possible treatments.

Returns:

int

property unique_plate_ids

Return the unique plate ids in the screen.

Returns:

1d array of unique plate ids

property unique_sample_ids

Return the unique sample ids in the screen.

Returns:

1d array of unique sample ids

property unique_treatments

Return the unique treatments in the screen (excludes “control” treatments).

Returns:

2d array of unique treatments

class batchie.data.ScreenSubset(screen: Screen, selection_vector: numpy.ndarray)

Bases: ScreenBase

A subset of a batchie.data.Screen defined by a boolean selection vector.

This class is not meant to be instantiated directly, but rather is returned by the batchie.data.Screen.subset() method.

combine(other)

Union this subset with another subset of the same screen.

Parameters:

other – batchie.data.ScreenSubset

Returns:

Unioned batchie.data.ScreenSubset

classmethod concat(screen_subsets: list)

Concatenate a list of batchie.data.ScreenSubset instances into a single batchie.data.ScreenSubset.

Parameters:

screen_subsets – list of batchie.data.ScreenSubset

Returns:

Unioned batchie.data.ScreenSubset

property control_treatment_name
invert()

Return the inverse of this subset, i.e. the subset of the screen that is not contained in this subset.

Returns:

batchie.data.ScreenSubset

property observation_mask
property observations
property plate_ids
property plate_mapping
property sample_ids
property sample_mapping
property sample_names
property single_treatment_effects: numpy.ndarray | None
subset(selection_vector)

Return a new batchie.data.ScreenSubset defined by a boolean selection vector.

Parameters:

selection_vector – 1d array of bools

Returns:

batchie.data.ScreenSubset

to_screen()

Promote this subset to a batchie.data.Screen.

Returns:

batchie.data.Screen

property treatment_doses
property treatment_ids
property treatment_mapping
property treatment_names
batchie.data.create_single_treatment_effect_array(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray)

Create an n_observation x n_treatment array where each entry is the single treatment effect for the corresponding sample and treatment ids in the input arrays.

Parameters:
  • sample_ids – 1d array of sample ids

  • treatment_ids – 2d array of treatment ids

  • observation – 1d array of observations

batchie.data.create_single_treatment_effect_map(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray)

Create a map from (sample_id, treatment_id) to single observation (a scalar).

Parameters:
  • sample_ids – 1d array of sample ids

  • treatment_ids – 1d array of treatment ids

  • observation – 1d array of observations

batchie.data.encode_1d_array_to_0_indexed_ids(arr: numpy.ndarray, existing_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None = None)

Encode a 1d array of strings to 0-indexed integers.

Parameters:
  • arr – 1d array of strings

  • existing_mapping – Prior mapping

Returns:

Integer array containing only values between 0 and n-1, where n is the number of unique values in arr.
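The encoding can be sketched in a few lines of pure Python. encode_to_0_indexed_ids is a hypothetical helper; it assumes ids are assigned in sorted order of the unique values and it ignores the existing_mapping argument, neither of which is confirmed by the description above.

```python
def encode_to_0_indexed_ids(values):
    """Map each distinct string to an integer id in 0..n-1 (hypothetical helper).

    Returns the encoded ids and the list of names, where names[i] has id i.
    """
    # Assumption: ids follow the sorted order of the unique values.
    unique_values = sorted(set(values))
    id_of = {value: index for index, value in enumerate(unique_values)}
    return [id_of[value] for value in values], unique_values


ids, names = encode_to_0_indexed_ids(["hela", "a549", "hela"])
print(ids, names)
```

The second return value plays the role of a name-to-id mapping, so the original names can be recovered with names[i].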

batchie.data.encode_treatment_arrays_to_0_indexed_ids(treatment_name_arr: numpy.ndarray, treatment_dose_arr: numpy.ndarray, control_treatment_name: str = '', existing_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None = None)

Encode treatment names and doses (which are arrays of strings) to 0-indexed integers, where the control treatment is always mapped to batchie.common.CONTROL_SENTINEL_VALUE.

Parameters:
  • treatment_name_arr – array of treatment names

  • treatment_dose_arr – array of treatment doses

  • control_treatment_name – The string value of the control treatment

  • existing_mapping – Prior mapping

batchie.data.filter_dataset_to_treatments_that_appear_in_at_least_one_combo(screen: Screen) Screen

Utility function to filter down a batchie.data.Screen to only the treatments that appear in at least one combo.

Parameters:

screen – a batchie.data.Screen

Returns:

A filtered batchie.data.Screen

batchie.data.filter_dataset_to_unique_treatments(screen: Screen | ScreenSubset)

Ensure that the dataset only has one experiment per treatment and sample condition by arbitrarily dropping duplicates.

Parameters:

screen – a batchie.data.ScreenSubset

Returns:

A batchie.data.ScreenSubset with the same or smaller number of experiments compared to the input.

batchie.data.numpy_array_is_0_indexed_integers(arr: numpy.ndarray)

Test numpy array arr contains only integers between 0 and n-1 with no gaps, where n is the number of unique values in arr.

If the array contains batchie.common.CONTROL_SENTINEL_VALUE, then we test that the array contains only integers between 0 and n-2, and the sentinel value.

Parameters:

arr – numpy array

Returns:

bool
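The check described above can be sketched as follows. is_0_indexed_integers is a hypothetical pure-Python helper, and the sentinel value -1 is an assumption made for the example; the actual value is whatever batchie.common.CONTROL_SENTINEL_VALUE holds.

```python
# Assumption: -1 stands in for batchie.common.CONTROL_SENTINEL_VALUE.
CONTROL_SENTINEL = -1


def is_0_indexed_integers(values, sentinel=CONTROL_SENTINEL):
    """Return True if the non-sentinel values are exactly 0..k-1 with no gaps."""
    unique_values = set(values)
    if not all(isinstance(value, int) for value in unique_values):
        return False
    # With the sentinel removed, n - 1 unique values remain, so they must
    # cover 0..n-2; without a sentinel, n values must cover 0..n-1.
    unique_values.discard(sentinel)
    return unique_values == set(range(len(unique_values)))


print(is_0_indexed_integers([0, 1, 2, 1]))
print(is_0_indexed_integers([0, 2]))
print(is_0_indexed_integers([-1, 0, 1]))
```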

batchie.distance_calculation module

class batchie.distance_calculation.ChunkedDistanceMatrix(size, n_chunks=1, chunk_index=0, chunk_size=None)

Bases: DistanceMatrix

Class which can represent part or a whole pairwise distance matrix.

The distance matrix is stored in a sparse format, but can be converted to a dense format if all values are present.

Several partial ChunkedDistanceMatrix classes can be combined. This is useful for parallelization of the distance matrix computation.

add_value(i, j, value)

Add a value to the distance matrix.

Parameters:
  • i – The row index.

  • j – The column index.

  • value – The value to add.

combine(other)
classmethod concat(matrices: list)
is_complete()
classmethod load(filename)

Load a distance matrix from a file.

Parameters:

filename – The filename to load from.

save(filename)

Save the distance matrix to a file.

Parameters:

filename – The filename to save to.

to_dense()

Return a dense representation of the distance matrix.

Returns:

A dense representation of the distance matrix.

batchie.distance_calculation.calculate_pairwise_distance_matrix_on_predictions(thetas: ThetaHolder, distance_metric: DistanceMetric, data: Screen, chunk_index: int, n_chunks: int, progress: bool = False) ChunkedDistanceMatrix

Calculate the pairwise distance matrix between predictions in viability space.

For all pairs of thetas in the given ThetaHolder, predictions will be made on the unobserved conditions in the given Experiment and the distance between the predictions produced by the two theta values will be calculated and populated into a ChunkedDistanceMatrix instance.

If n_chunks > 1, then the distance matrix is split into n_chunks roughly equal chunks, and only the chunk with index chunk_index is calculated. This is useful for parallelization.

Parameters:
  • thetas – The set of model parameters to use for prediction

  • distance_metric – The distance metric to use

  • data – The data to predict

  • chunk_index – The index of the chunk to calculate

  • n_chunks – The number of chunks to split the distance matrix into

  • progress – Whether to show a progress bar

Returns:

A ChunkedDistanceMatrix containing the pairwise distances

batchie.distance_calculation.consume(iterator, n)

Advance the iterator n steps ahead. If n is None, consume it entirely.

Parameters:
  • iterator – The iterator to consume

  • n – The number of steps to advance the iterator
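This description matches the well-known consume recipe from the Python itertools documentation. The sketch below reproduces that recipe; whether batchie implements it in exactly this way is an assumption.

```python
import collections
from itertools import islice


def consume(iterator, n):
    """Advance the iterator n steps ahead. If n is None, consume it entirely."""
    if n is None:
        # Feed the whole iterator into a zero-length deque, discarding items.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice that starts at position n.
        next(islice(iterator, n, n), None)


numbers = iter(range(10))
consume(numbers, 3)
first_after_skip = next(numbers)
consume(numbers, None)
remaining = list(numbers)
print(first_after_skip, remaining)
```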

batchie.distance_calculation.get_lower_triangular_indices_chunk(n: int, chunk_index: int, n_chunks: int)

Assuming we want to split the number of lower triangular indices of a square matrix with dimension n into roughly equal chunks, return the indices for the chunk with index chunk_index

Parameters:
  • n – The dimension of the square matrix

  • chunk_index – The index of the chunk to return

  • n_chunks – The number of chunks to split the indices into

Returns:

A list of indices

batchie.distance_calculation.get_number_of_lower_triangular_indices(n: int)

Get the number of lower triangular indices of a square matrix with dimension n

Parameters:

n – The dimension of the square matrix

Returns:

The number of lower triangular indices

batchie.distance_calculation.lower_triangular_indices(n: int)

Iterate all the lower triangular indices of a square matrix with dimension n

Parameters:

n – The dimension of the square matrix

Returns:

A generator which yields the indices
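The three index helpers above fit together as in the sketch below. The function names carry a _sketch suffix to mark them as hypothetical, and two things are assumptions: that "lower triangular" excludes the diagonal (so there are n(n-1)/2 index pairs), and that chunk boundaries are computed by integer division of the total.

```python
from itertools import islice


def lower_triangular_indices_sketch(n):
    """Yield (i, j) with i > j for an n x n matrix (diagonal excluded by assumption)."""
    for i in range(n):
        for j in range(i):
            yield i, j


def number_of_lower_triangular_indices_sketch(n):
    return n * (n - 1) // 2


def lower_triangular_indices_chunk_sketch(n, chunk_index, n_chunks):
    """Return the slice of index pairs that belongs to one chunk."""
    total = number_of_lower_triangular_indices_sketch(n)
    # Integer boundaries keep chunks contiguous, non-overlapping, and near-equal.
    start = chunk_index * total // n_chunks
    stop = (chunk_index + 1) * total // n_chunks
    return list(islice(lower_triangular_indices_sketch(n), start, stop))


all_indices = list(lower_triangular_indices_sketch(4))
chunks = [lower_triangular_indices_chunk_sketch(4, k, 4) for k in range(4)]
print(all_indices)
print(chunks)
```

Each worker in a parallel run would compute only its own chunk, and the union of all chunks covers every pair exactly once.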

batchie.fast_mvn module

Methods for sampling from multivariate normal distributions.

batchie.fast_mvn.sample_mvn_from_precision(Q, mu=None, mu_part=None, chol_factor=False, rng=None)

Fast sampling from a multivariate normal with precision parameterization.

Supports sparse arrays.

Parameters:
  • Q – The precision matrix

  • mu – If provided, assumes the model is N(mu, Q^-1)

  • mu_part – If provided, assumes the model is N(Q^-1 mu_part, Q^-1)

  • chol_factor – If true, assumes Q is a (lower triangular) Cholesky decomposition of the precision matrix

  • rng – The PRNG to use.
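For reference, one standard way to sample from a precision-parameterized Gaussian uses a Cholesky factor and triangular solves. The derivation below is a sketch of that general technique; that this function follows it step by step is an assumption. Here L denotes the lower triangular Cholesky factor of Q and z is a standard normal draw.

```latex
\begin{aligned}
Q &= L L^{\top}, \qquad z \sim \mathcal{N}(0, I), \\
L^{\top} x &= z
  \;\Longrightarrow\;
  \operatorname{Cov}(x) = L^{-\top} L^{-1} = (L L^{\top})^{-1} = Q^{-1}, \\
\mu &= Q^{-1} \mu_{\text{part}}
  \quad\text{via}\quad L y = \mu_{\text{part}}, \; L^{\top} \mu = y, \\
\mu + x &\sim \mathcal{N}(\mu, Q^{-1}).
\end{aligned}
```

The covariance Q^-1 is never formed explicitly, which is what makes this approach attractive for large or sparse precision matrices.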

batchie.introspection module

batchie.introspection.create_instance(package_name: str, class_name: str, base_class: type, kwargs: dict)

Create an instance of a class from a package by name.

Parameters:
  • package_name – The name of the package to search.

  • class_name – The name of the class to search for.

  • base_class – The base class that the class should inherit from.

  • kwargs – Keyword arguments to pass to the class constructor.

batchie.introspection.get_class(package_name: str, class_name: str, base_class: type) type

Get a class from a package by name.

Parameters:
  • package_name – The name of the package to search.

  • class_name – The name of the class to search for.

  • base_class – The base class that the class should inherit from.
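Name-based lookup of this kind can be sketched with importlib. The helpers load_class and build_instance are hypothetical, and they assume the class is an attribute of the named module; the documented functions may search a package more broadly.

```python
import importlib


def load_class(module_name, class_name, base_class):
    """Look up a class by name and check that it derives from base_class."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name, None)
    if not isinstance(cls, type) or not issubclass(cls, base_class):
        raise ValueError(
            f"{module_name}.{class_name} is not a subclass of {base_class.__name__}"
        )
    return cls


def build_instance(module_name, class_name, base_class, kwargs):
    """Instantiate the looked-up class with the given keyword arguments."""
    return load_class(module_name, class_name, base_class)(**kwargs)


# Standard-library classes are used here so the example runs anywhere.
cls = load_class("collections", "OrderedDict", dict)
counter = build_instance("collections", "Counter", dict, {"a": 2})
print(cls.__name__, counter)
```

Checking the base class at lookup time means a misconfigured class name fails early with a clear error instead of failing later inside the pipeline.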

batchie.introspection.get_required_init_args_with_annotations(cls) Dict[str, Any]

Get a dictionary of required __init__ arguments and their type annotations for a given class.

Parameters:

cls – The class to inspect.

Returns:

A dictionary with argument names as keys and their type annotations as values.
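A minimal sketch of this kind of inspection, using the standard inspect module, is shown below. required_init_args and ExampleSmoother are hypothetical names, and treating "required" as "has no default value" is an assumption.

```python
import inspect


def required_init_args(cls):
    """Return {argument name: annotation} for __init__ arguments without defaults."""
    required = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        # *args and **kwargs can never be "required", so skip them.
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is inspect.Parameter.empty:
            required[name] = param.annotation
    return required


class ExampleSmoother:
    """Hypothetical class used only to exercise the helper."""

    def __init__(self, plate_size: int, label: str = "default", **extra):
        self.plate_size = plate_size
        self.label = label


args = required_init_args(ExampleSmoother)
print(args)
```

Arguments without a type annotation would map to inspect.Parameter.empty in this sketch.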

batchie.log_config module

batchie.log_config.add_logging_args(parser)
batchie.log_config.configure_logging(args)

Configure logging based on the given arguments.

Parameters:

args – Parsed command line arguments.

batchie.retrospective module

class batchie.retrospective.BatchieEnsemblePlateSmoother(min_size: int, n_iterations: int, min_n_cell_line_plates: int)

Bases: RetrospectivePlateSmoother

Apply the following smoothers in sequence to the input batchie.data.Screen:

  • MergeMinPlateSmoother

  • MergeTopBottomPlateSmoother

  • OptimalSizeSmoother

  • NPlatePerCellLineSmoother

class batchie.retrospective.FixedSizeSmoother(plate_size: int)

Bases: RetrospectivePlateSmoother

Filter all plates smaller than the given size and randomly truncate all plates larger than a fixed size to the given size.
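The filtering and truncation rule can be sketched on a plain dictionary that maps plate ids to lists of experiments. fixed_size_smooth is a hypothetical helper that uses the standard random module instead of a numpy generator; it is not the class's actual implementation.

```python
import random


def fixed_size_smooth(plates, plate_size, rng):
    """Drop plates below plate_size and randomly truncate the rest to plate_size."""
    smoothed = {}
    for plate_id, experiments in plates.items():
        if len(experiments) < plate_size:
            continue  # plate too small: filtered out entirely
        # Sampling without replacement keeps exactly plate_size experiments.
        smoothed[plate_id] = rng.sample(experiments, plate_size)
    return smoothed


plates = {0: [1, 2], 1: [3, 4, 5], 2: [6, 7, 8, 9, 10]}
smoothed = fixed_size_smooth(plates, plate_size=3, rng=random.Random(0))
print(smoothed)
```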

class batchie.retrospective.MergeMinPlateSmoother(min_size: int)

Bases: RetrospectivePlateSmoother

Iteratively combine the smallest two plates for each sample until all plates are above a user specified size.

class batchie.retrospective.MergeTopBottomPlateSmoother(n_iterations: int)

Bases: RetrospectivePlateSmoother

Iteratively combine the largest and smallest plates for each sample. Runs for a user specified number of iterations.

class batchie.retrospective.NPlatePerCellLineSmoother(min_n_cell_line_plates: int)

Bases: RetrospectivePlateSmoother

Remove all experiments involving cell lines that have fewer than the user-specified min_n_cell_line_plates plates.

class batchie.retrospective.OptimalSizeSmoother

Bases: RetrospectivePlateSmoother

This smoother picks a single target plate size by minimizing a cost function with two terms. The first term is the number of experiments that must be thrown out completely because they are in plates below the threshold. The second term is the number of experiments that must be trimmed out of plates that are over the threshold. After optimizing this cost function, the smoother drops all plates smaller than the optimal size and sub-samples all plates larger than the optimal size, so that all remaining plates are the same size.
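The two-term cost can be written out directly. In the sketch below, plate_size_cost and optimal_plate_size are hypothetical helpers; restricting the candidate sizes to the plate sizes that actually occur is an assumption about how the search is done.

```python
def plate_size_cost(plate_sizes, candidate):
    """Number of experiments lost if every kept plate must have exactly `candidate`."""
    # Plates below the threshold are thrown out completely.
    dropped = sum(size for size in plate_sizes if size < candidate)
    # Plates above the threshold lose their excess experiments.
    trimmed = sum(size - candidate for size in plate_sizes if size > candidate)
    return dropped + trimmed


def optimal_plate_size(plate_sizes):
    # Assumption: only sizes that occur in the data are considered.
    candidates = sorted(set(plate_sizes))
    return min(candidates, key=lambda c: plate_size_cost(plate_sizes, c))


sizes = [2, 10, 10, 12]
costs = {c: plate_size_cost(sizes, c) for c in sorted(set(sizes))}
best = optimal_plate_size(sizes)
print(costs, best)
```

With these sizes, a target of 10 loses only the small plate (2 experiments) and the excess of the large one (2 experiments), which is far cheaper than either extreme.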

class batchie.retrospective.PairwisePlateGenerator(subset_size: int, anchor_size: int)

Bases: RetrospectivePlateGenerator

class batchie.retrospective.PlatePermutationPlateGenerator(force_include_plate_names: list[str] | None = None)

Bases: RetrospectivePlateGenerator

This generator will create new plates by permuting the plate labels.

Plates can be excluded from permutation with the force_include_plate_names argument

class batchie.retrospective.SampleSegregatingPermutationPlateGenerator(max_plate_size: int)

Bases: RetrospectivePlateGenerator

This generator will generate plates that only contain experiments for a single sample. If there are more than max_plate_size experiments for a single sample then the experiments will be split across multiple equal sized plates.

class batchie.retrospective.SparseCoverPlateGenerator(reveal_single_treatment_experiments: bool)

Bases: InitialRetrospectivePlateGenerator

batchie.retrospective.calculate_mse(observed_screen: Screen, thetas: ThetaHolder) float

Calculate the mean squared error between the masked observations and the unmasked observations

Parameters:
  • observed_screen – A Screen that is fully observed

  • thetas – The set of model parameters to use for prediction

Returns:

The average mean squared error between predicted and observed values

batchie.retrospective.create_plate_balanced_holdout_set_among_masked_plates(screen: Screen, fraction: float, rng: numpy.random.BitGenerator) tuple[Screen, Screen]

Create a holdout set from a retrospective screen (where all data is observed but some plates are artificially masked) by sampling a fraction of each unobserved plate.

Parameters:
  • screen – The screen to create a holdout set for

  • fraction – The fraction of each unobserved plate to hold out

  • rng – The random bit generator to use for sampling

Returns:

A tuple of (training_screen, holdout_screen)
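The plate-balanced sampling idea can be sketched with plain lists. The representation (one plate id per experiment plus a set of masked plate ids), the rounding rule, and the name plate_balanced_holdout are assumptions made for illustration; the real function operates on batchie.data.Screen objects.

```python
import random


def plate_balanced_holdout(plate_ids, masked_plates, fraction, seed=0):
    """Hold out `fraction` of the experiments of each masked plate.

    Returns (training_indices, holdout_indices).
    """
    rng = random.Random(seed)
    # Group experiment indices by plate, considering masked plates only.
    by_plate = {}
    for index, plate_id in enumerate(plate_ids):
        if plate_id in masked_plates:
            by_plate.setdefault(plate_id, []).append(index)
    holdout = set()
    for indices in by_plate.values():
        n_holdout = int(round(fraction * len(indices)))
        holdout.update(rng.sample(indices, n_holdout))
    training = [i for i in range(len(plate_ids)) if i not in holdout]
    return training, sorted(holdout)


plate_ids = [0] * 4 + [1] * 4 + [2] * 4
training, holdout = plate_balanced_holdout(plate_ids, {1, 2}, 0.5)
print(len(training), len(holdout))  # 8 4
```

Sampling within each plate, rather than across the whole screen, keeps the holdout proportion the same for every masked plate.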

batchie.retrospective.create_random_holdout(screen: Screen, fraction: float, rng: numpy.random.BitGenerator) tuple[Screen, Screen]

Create a holdout set by randomly sampling the given fraction of the original screen.

Parameters:
  • screen – The screen to create a holdout set for

  • fraction – The fraction of the screen to hold out

  • rng – The random bit generator to use for sampling

Returns:

A tuple of (training_screen, holdout_screen)

batchie.retrospective.mask_screen(screen: Screen) Screen
batchie.retrospective.reveal_plates(screen: Screen, plate_ids: list[int]) Screen

Utility function to reveal observations in the masked screen from the observed screen.

Parameters:
  • screen – A batchie.data.Screen that is partially masked, but with real observations present in the internal observation array

  • plate_ids – The plate ids to reveal

batchie.retrospective.unmask_screen(screen: Screen) Screen

batchie.sampling module

batchie.sampling.sample(model, results: ThetaHolder, seed: int, n_chains: int = None, chain_index: int = None, n_burnin: int = None, thin: int = None, progress_bar=False) ThetaHolder

Sample from the model posterior using the given parameters.

Parameters:
  • model – The model which will be sampled from.

  • results – The object which will store the results

  • seed – The seed to use for the random number generator

  • n_chains – The number of parallel chains to run

  • chain_index – The index of the current chain

  • n_burnin – The number of burnin steps to run

  • thin – The thinning factor

  • progress_bar – Whether to display a progress bar

Returns:

a ThetaHolder containing the sampled parameters
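The roles of n_burnin and thin can be illustrated with a generic chain loop. This is a sketch of the usual burn-in and thinning pattern, not the body of batchie.sampling.sample; run_chain, the Counter toy model, and the n_samples argument are all hypothetical.

```python
def run_chain(step, n_samples, n_burnin=0, thin=1):
    """Collect n_samples states from a chain advanced by step().

    The first n_burnin states are discarded, then every thin-th state
    is kept.
    """
    if thin < 1:
        raise ValueError("thin must be at least 1")
    for _ in range(n_burnin):
        step()
    kept = []
    for _ in range(n_samples):
        for _ in range(thin):
            state = step()
        kept.append(state)
    return kept


class Counter:
    """Toy stand-in for a model: each step returns the step count."""

    def __init__(self):
        self.value = 0

    def step(self):
        self.value += 1
        return self.value


print(run_chain(Counter().step, n_samples=4, n_burnin=3, thin=2))  # [5, 7, 9, 11]
```

Steps 1 to 3 are burn-in, and of the remaining steps only every second one is stored.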

batchie.synergy module

batchie.synergy.calculate_synergy(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray, strict: bool = False)

Calculate synergy for a given set of observations, sample ids, and treatment ids.

If single treatment observations for all of the treatments in a multi-treatment observation are not present, the observation is skipped. If strict is True, an error is raised instead.

batchie.distance.mse module

class batchie.distance.mse.MSEDistance(sigmoid: bool = True)

Bases: DistanceMetric

Mean squared error distance metric

distance(a: numpy.ndarray, b: numpy.ndarray)

Calculate the distance between two arrays of model predictions.

Parameters:
  • a – The first array of model predictions.

  • b – The second array of model predictions.

Returns:

The distance between the two arrays.
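A minimal sketch of such a metric follows. It assumes that the sigmoid flag applies a logistic transform to both prediction arrays before the squared differences are averaged; that interpretation, and the function name mse_distance, are assumptions rather than documented behavior.

```python
import math


def mse_distance(a, b, sigmoid=True):
    """Mean squared difference between two equally long prediction lists."""
    if sigmoid:
        # Assumed behavior: map predictions into (0, 1) first.
        a = [1.0 / (1.0 + math.exp(-x)) for x in a]
        b = [1.0 / (1.0 + math.exp(-x)) for x in b]
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


print(mse_distance([1.0, 3.0], [2.0, 5.0], sigmoid=False))  # 2.5
```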

batchie.models.sparse_combo module

class batchie.models.sparse_combo.LegacySparseDrugComboImpl(n_dims: int, n_drugdoses: int, n_clines: int, intercept: bool = True, fake_intercept: bool = True, individual_eff: bool = True, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0, **kwargs)

Bases: object

Original implementation of the Bayesian tensor factorization model for predicting combination drug response. Preserved here without changes to ensure reproducibility of results.

bliss(cline: numpy.ndarray, dd1: numpy.ndarray, dd2: numpy.ndarray)
encode_obs()
ess_pars()
get(attr, ix)
mcmc_step() None
n_obs()
predict(cline: numpy.ndarray, dd1: numpy.ndarray, dd2: numpy.ndarray)
predict_single_drug(cline: numpy.ndarray, dd1: numpy.ndarray)
reset_model()
class batchie.models.sparse_combo.SparseDrugCombo(experiment_space: ExperimentSpace, n_embedding_dimensions: int, fake_intercept: bool = True, individual_eff: bool = True, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0, rng: numpy.random.Generator | None = None, predict_interactions: bool = False, interaction_log_transform: bool = True, intercept: bool = True)

Bases: BayesianModel, MCMCModel

get_model_state() SparseDrugComboMCMCSample

Get the internal state of the model.

n_obs() int

Return the number of observations that have been added to the model.

Returns:

Integer number of observations

reset_model()

Reset the internal state of the model to its initial state.

property rng: numpy.random.Generator

Return the PRNG for this model instance.

Returns:

The PRNG for this model instance.

set_rng(rng: numpy.random.Generator)

Set the PRNG for this model instance.

Parameters:

rng – The PRNG to use.

step()

Advance the internal state of the model by one step.

In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.

class batchie.models.sparse_combo.SparseDrugComboMCMCSample(W: numpy.ndarray, W0: numpy.ndarray, V2: numpy.ndarray, V1: numpy.ndarray, V0: numpy.ndarray, alpha: float, precision: float)

Bases: Theta

A single sample from the MCMC chain for the sparse drug combo model

V0: numpy.ndarray
V1: numpy.ndarray
V2: numpy.ndarray
W: numpy.ndarray
W0: numpy.ndarray
alpha: float
classmethod from_dicts(private_params, shared_params)

Instantiate batchie.core.Theta from dictionary

Returns:

A batchie.core.Theta instance constructed from the given dictionaries.

precision: float
predict_conditional_mean(data: ScreenBase) numpy.ndarray

Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.

Returns:

An array of means for each item in the Experiment.

predict_conditional_variance(data: ScreenBase) numpy.ndarray

Predict the conditional variance of a batchie.data.ExperimentBase.

Returns:

An array of variances for each item in the Experiment.

predict_viability(data: ScreenBase) numpy.ndarray

Predict the conditional mean of a batchie.data.ExperimentBase in viability space.

Returns:

An array of means for each item in the Experiment.

private_parameters_dict() dict[str, numpy.ndarray]

The private parameters of a batchie.core.Theta.

Returns:

a dictionary mapping class variables to arrays/numerical values.

batchie.models.sparse_combo.interactions_to_logits(interaction: numpy.ndarray, single_effects: numpy.ndarray, log_transform: bool)
batchie.models.sparse_combo.predict(mcmc_sample: SparseDrugComboMCMCSample, data: ScreenBase, viability: bool)
batchie.models.sparse_combo.predict_single_drug(mcmc_sample: SparseDrugComboMCMCSample, data: ScreenBase, viability: bool)

batchie.models.sparse_combo_interaction module

class batchie.models.sparse_combo_interaction.LegacySparseDrugComboInteractionImpl(n_dims: int, n_drugdoses: int, n_clines: int, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0)

Bases: object

This is the original implementation of the sparse drug combo interaction model. Preserved here without changes to ensure reproducibility of results.

encode_obs()
mcmc_step() None
n_obs()
reset_model()
class batchie.models.sparse_combo_interaction.SparseDrugComboInteraction(experiment_space: ExperimentSpace, n_embedding_dimensions: int, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0)

Bases: BayesianModel, MCMCModel

get_model_state() SparseDrugComboInteractionMCMCSample

Get the internal state of the model.

n_obs() int

Return the number of observations that have been added to the model.

Returns:

Integer number of observations

reset_model()

Reset the internal state of the model to its initial state.

property rng: numpy.random.Generator

Return the PRNG for this model instance.

Returns:

The PRNG for this model instance.

set_rng(rng: numpy.random.Generator)

Set the PRNG for this model instance.

Parameters:

rng – The PRNG to use.

step()

Advance the internal state of the model by one step.

In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.

class batchie.models.sparse_combo_interaction.SparseDrugComboInteractionMCMCSample(W: numpy.ndarray, V2: numpy.ndarray, precision: float, single_effect_lookup: dict)

Bases: Theta

A single sample from the MCMC chain for the sparse drug combo interaction model

V2: numpy.ndarray
W: numpy.ndarray
classmethod from_dicts(private_params: dict, shared_params: dict)

Instantiate batchie.core.Theta from dictionary

Returns:

A batchie.core.Theta instance constructed from the given dictionaries.

precision: float
predict_conditional_mean(data: ScreenBase) numpy.ndarray

Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.

Returns:

An array of means for each item in the Experiment.

predict_conditional_variance(data: ScreenBase) numpy.ndarray

Predict the conditional variance of a batchie.data.ExperimentBase.

Returns:

An array of variances for each item in the Experiment.

predict_viability(data: ScreenBase) numpy.ndarray

Predict the conditional mean of a batchie.data.ExperimentBase in viability space.

Returns:

An array of means for each item in the Experiment.

private_parameters_dict() dict[str, numpy.ndarray]

The private parameters of a batchie.core.Theta.

Returns:

a dictionary mapping class variables to arrays/numerical values.

shared_parameters_dict() dict[str, numpy.ndarray]

The shared parameters of a batchie.core.Theta.

Returns:

a dictionary mapping class variables to arrays.

single_effect_lookup: dict

batchie.policies.k_per_sample module

class batchie.policies.k_per_sample.KPerSamplePlatePolicy(k: int)

Bases: PlatePolicy

filter_eligible_plates(batch_plates: list[Plate], unobserved_plates: list[Plate], rng: numpy.random.Generator) list[Plate]

batchie.scoring.gaussian_dbal module

class batchie.scoring.gaussian_dbal.GaussianDBALScorer(max_chunk=50, max_triples=5000, **kwargs)

Bases: Scorer

score(plates: dict[int, ScreenSubset], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]
batchie.scoring.gaussian_dbal.dbal_fast_gauss_scoring_vectorized(predictions: numpy.ndarray, variances: numpy.ndarray, distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)

Compute the Monte Carlo approximation of the DBAL ideal score \(\widehat{s}_n(P)\) in a vectorized way for each of the given plates.

\[\widehat{s}_n(P) = \frac{1}{{m \choose 3}} \sum_{i < j < k} d(\theta_i, \theta_j) L_{\theta_i}(\theta_j, \theta_k ; P ) e^{2H_{\theta_i}(P)}\]

Parameters:
  • predictions – model predictions over all plates of shape (n_plates, n_thetas, n_experiments)

  • variances – an array of variances for model predictions over all plates, of shape (n_plates, n_thetas, max_n_experiments). For plates smaller than the maximum size, the variances should be padded with NaNs up to the maximum size.

  • distance_matrix – a square array of shape (n_thetas, n_thetas) of distances between model parameterizations

  • rng – PRNG

  • max_combos – the maximum number of theta triplets to sample

  • distance_factor – a multiplicative factor for the distance matrix

Returns:

an array of shape (n_plates,) of approximated scores for each plate in predictions

batchie.scoring.gaussian_dbal.dbal_fast_gaussian_scoring_heteroscedastic(per_plate_predictions: list[numpy.ndarray], variances: list[numpy.ndarray], distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)
batchie.scoring.gaussian_dbal.dbal_fast_gaussian_scoring_homoscedastic(per_plate_predictions: list[numpy.ndarray], variances: numpy.ndarray, distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)

Parameters:
  • per_plate_predictions – Ragged array of model predictions of length n_plates, each list element is an array of shape (n_thetas, n_plate_experiments)

  • variances – an array of variances for model predictions over all plates, of shape (n_plates, n_thetas).

  • distance_matrix – a square array of shape (n_thetas, n_thetas) of distances between model parameterizations

  • rng – PRNG

  • max_combos – the maximum number of theta triplets to sample

  • distance_factor – a multiplicative factor for the distance matrix

Returns:

an array of shape (n_plates,) of approximated scores for each plate in per_plate_predictions

batchie.scoring.gaussian_dbal.generate_combination_at_sorted_index(index, n, k)

Generate all range(n) choose k combinations.

Represent each combination as a descending sorted tuple.

Sort all the tuples in ascending order, and return the tuple that would be found at index.

Do this without materializing the actual list of combinations.

Parameters:
  • index – The index of the combination to return

  • n – The number of items to choose from

  • k – The number of items to choose

Returns:

A tuple of length k representing the combination
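The ordering described above (descending tuples sorted in ascending order) matches the combinatorial number system, in which the rank of a tuple (c_k > ... > c_1) is the sum of comb(c_j, j). The sketch below inverts that rank greedily without listing the combinations. It is an independent illustration of the technique, not the batchie source; the names combination_at_sorted_index and brute_force_sorted_combinations are hypothetical.

```python
from itertools import combinations
from math import comb


def combination_at_sorted_index(index, n, k):
    """Return the index-th k-combination of range(n) as a descending tuple."""
    if not 0 <= index < comb(n, k):
        raise IndexError("index out of range")
    result = []
    remaining = index
    upper = n  # exclusive upper bound for the next element
    for j in range(k, 0, -1):
        # Find the largest c below `upper` with comb(c, j) <= remaining.
        c = j - 1
        while c + 1 < upper and comb(c + 1, j) <= remaining:
            c += 1
        result.append(c)
        remaining -= comb(c, j)
        upper = c
    return tuple(result)


def brute_force_sorted_combinations(n, k):
    """Materialize the full sorted list, for comparison only."""
    return sorted(tuple(sorted(c, reverse=True)) for c in combinations(range(n), k))


print(combination_at_sorted_index(3, 4, 2))  # (3, 0)
print(brute_force_sorted_combinations(4, 2)[3])  # (3, 0)
```

For n = 4 and k = 2 the sorted list is (1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), so index 3 yields (3, 0).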

batchie.scoring.gaussian_dbal.get_combination_at_sorted_index(index, n, k)
batchie.scoring.gaussian_dbal.pad_ragged_arrays_to_dense_array(arrays: list[numpy.ndarray], pad_value: float = 0.0)

Given a list of N-dimensional arrays of differing sizes, return a dense array of N + 1 dimensions, of shape (len(arrays), maximum_of_dimension_0, …, maximum_of_dimension_N), in which every array is padded to the maximum size. The padding value defaults to 0.0.

Parameters:
  • arrays – A list of arrays

  • pad_value – A floating point number (default is 0)

Returns:

A dense array of the arrays
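The padding scheme can be shown on a small case. The sketch below handles only two-dimensional inputs expressed as nested Python lists, as a pure-Python stand-in for the NumPy arrays the real function accepts; pad_ragged_2d is a hypothetical name.

```python
def pad_ragged_2d(arrays, pad_value=0.0):
    """Pad a list of 2-D nested lists to a common (rows, cols) shape.

    Returns a nested list of shape (len(arrays), max_rows, max_cols).
    """
    max_rows = max(len(a) for a in arrays)
    max_cols = max((len(row) for a in arrays for row in a), default=0)
    dense = []
    for a in arrays:
        # Pad each existing row on the right, then append full padding rows.
        padded = [list(row) + [pad_value] * (max_cols - len(row)) for row in a]
        padded += [[pad_value] * max_cols for _ in range(max_rows - len(a))]
        dense.append(padded)
    return dense


ragged = [[[1.0, 2.0]], [[3.0], [4.0]]]
print(pad_ragged_2d(ragged))
# [[[1.0, 2.0], [0.0, 0.0]], [[3.0, 0.0], [4.0, 0.0]]]
```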

batchie.scoring.main module

class batchie.scoring.main.ChunkedScoresHolder(size: int)

Bases: ScoresHolder

add_score(plate_id: int, score: float)

Add a score for a given plate.

Parameters:
  • plate_id – The plate id to add the score for.

  • score – The score to add.

combine(other: ScoresHolder)
classmethod concat(scores_list: list[ScoresHolder])
get_score(plate_id: int) float

Get the score for a given plate.

Parameters:

plate_id – The plate id to get the score for.

Returns:

The score for the given plate.

classmethod load_h5(fn)
plate_id_with_minimum_score(eligible_plate_ids: list[int] = None) int

Get the plate id with the minimum score.

Parameters:

eligible_plate_ids – The set of plates to consider.

Returns:

The plate id with the minimum score.
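The selection rule can be sketched over a plain dictionary of scores. The dictionary representation and the error raised when no candidate remains are assumptions of this sketch; the real method works on the holder's internal storage.

```python
def plate_id_with_minimum_score(scores, eligible_plate_ids=None):
    """Return the plate id with the lowest score.

    scores: dict mapping plate id to score.
    eligible_plate_ids: optional iterable restricting the candidates.
    """
    if eligible_plate_ids is None:
        candidates = scores
    else:
        candidates = {pid: scores[pid] for pid in eligible_plate_ids if pid in scores}
    if not candidates:
        raise ValueError("no eligible plates with scores")
    return min(candidates, key=candidates.get)


scores = {1: 0.5, 2: 0.1, 3: 0.3}
print(plate_id_with_minimum_score(scores))          # 2
print(plate_id_with_minimum_score(scores, [1, 3]))  # 3
```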

save_h5(fn)
batchie.scoring.main.score_chunk(scorer: Scorer, thetas: ThetaHolder, screen: Screen, distance_matrix: ChunkedDistanceMatrix, rng: numpy.random.Generator | None = None, progress_bar: bool = False, n_chunks: int = 1, chunk_index: int = 0, batch_plate_ids: list[int] | None = None) ChunkedScoresHolder

Score a subset of all unobserved plates in a screen.

Parameters:
  • scorer – The scorer to use for scoring

  • thetas – The samples to use for scoring

  • screen – The screen to score

  • distance_matrix – The distance matrix to use for scoring

  • rng – PRNG to use for sampling

  • progress_bar – Whether to show a progress bar

  • n_chunks – The number of chunks to split the unobserved plates into

  • chunk_index – The index of the chunk to score

  • batch_plate_ids – A list of plate ids that have already been selected in the batch

Returns:

ChunkedScoresHolder containing the scores for each plate in the current chunk
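The n_chunks and chunk_index parameters suggest that each worker scores one slice of the unobserved plates. The sketch below shows one way such a partition could be formed, using a strided split; the actual partitioning used by score_chunk is not documented here, and chunk_plate_ids is a hypothetical helper.

```python
def chunk_plate_ids(plate_ids, n_chunks, chunk_index):
    """Return the plate ids assigned to one chunk (strided partition)."""
    if not 0 <= chunk_index < n_chunks:
        raise ValueError("chunk_index must be in [0, n_chunks)")
    return plate_ids[chunk_index::n_chunks]


ids = list(range(10))
print([chunk_plate_ids(ids, 3, i) for i in range(3)])
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Every plate id lands in exactly one chunk, so the per-chunk results can be combined afterwards without overlap.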

batchie.scoring.main.select_next_plate(scores: ScoresHolder, screen: Screen, policy: PlatePolicy | None, batch_plate_ids: list[int] | None = None, rng: numpy.random.Generator | None = None) Plate | None

Select the next batchie.data.Plate to observe

Parameters:
  • scores – The scores for each plate

  • screen – The screen which defines the set of plates to choose from

  • policy – The policy to use for plate selection

  • batch_plate_ids – The plates currently selected in the batch

  • rng – PRNG to use for sampling

Returns:

The next plate to observe, or None if no plate is selected

batchie.scoring.size module

class batchie.scoring.size.SizeScorer

Bases: Scorer

A scorer that returns the number of conditions in the Plate as the score.

score(plates: dict[int, Plate], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]

batchie.scoring.rand module

class batchie.scoring.rand.RandomScorer

Bases: Scorer

A scorer that returns a random score for each plate, used for baseline comparison

score(plates: dict[int, Plate], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]