batchie package¶

Submodules¶

batchie.common module¶

batchie.common.copy_array_with_control_treatments_set_to_zero(arr: numpy.ndarray, treatment_array: numpy.ndarray)¶

batchie.common.select_unique_zipped_numpy_arrays(arrs)¶

Returns a boolean array that selects unique combinations of several same length numpy arrays.

Parameters:: arrs – Arrays of the same length.
Returns:: Boolean array indicating unique combinations.
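
The selection described above can be sketched as follows. This is a minimal illustration, not the batchie implementation: the function name is hypothetical, and it assumes that "unique" means keeping the first occurrence of each combination.

```python
import numpy as np


def select_first_occurrences(arrs):
    """Return a boolean mask that is True at the first occurrence of each
    combination of values across several same-length arrays."""
    seen = set()
    mask = np.zeros(len(arrs[0]), dtype=bool)
    # zip(*arrs) yields one tuple per position, e.g. ("a", 1)
    for idx, combo in enumerate(zip(*arrs)):
        if combo not in seen:
            seen.add(combo)
            mask[idx] = True
    return mask


names = np.array(["a", "b", "a", "a"])
doses = np.array([1, 1, 1, 2])
mask = select_first_occurrences([names, doses])
```

Indexing any of the input arrays with the mask, for example names[mask], then yields one row per distinct combination.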

batchie.core module¶

class batchie.core.BayesianModel(experiment_space: ExperimentSpace)¶

Bases: ABC

This class represents a Bayesian model.

A Bayesian model has internal state. Each batchie.core.BayesianModel should have a companion batchie.core.Theta and batchie.core.ThetaHolder class which represents the model's internal state in a serializable way.

The internal state of the model can be set explicitly via batchie.core.BayesianModel.set_model_state(), or it can be advanced via batchie.core.BayesianModel.step().

A batchie.core.BayesianModel can have data added to it via batchie.core.BayesianModel.add_observations(). If data is present, the model should use that data when batchie.core.BayesianModel.step() is called. batchie.core.BayesianModel.n_obs() should report the number of datapoints that have been added to the model.

A batchie.core.BayesianModel can be used to predict the outcome of an Experiment via batchie.core.BayesianModel.predict().

A batchie.core.BayesianModel must report its variance via batchie.core.BayesianModel.variance().

add_observations(data: ScreenBase)¶

Add observations to the model.

Parameters:: data – The data to add.

abstract n_obs() → int¶

Return the number of observations that have been added to the model.

Returns:: Integer number of observations

abstract reset_model()¶

Reset the internal state of the model to its initial state.

abstract property rng: numpy.random.Generator¶

Return the PRNG for this model instance.

Returns:: The PRNG for this model instance.

abstract set_rng(rng: numpy.random.Generator)¶

Set the PRNG for this model instance.

Parameters:: rng – The PRNG to use.

class batchie.core.DistanceMatrix¶

Bases: ABC

abstract add_value(i, j, value)¶

Add a value to the distance matrix.

Parameters:

i – The row index.
j – The column index.
value – The value to add.

abstract classmethod load(filename)¶

Load a distance matrix from a file.

Parameters:: filename – The filename to load from.

abstract save(filename)¶

Save the distance matrix to a file.

Parameters:: filename – The filename to save to.

abstract to_dense()¶

Return a dense representation of the distance matrix.

Returns:: A dense representation of the distance matrix.

class batchie.core.DistanceMetric¶

Bases: object

This class represents a symmetric distance metric between two arrays of model predictions.

distance(a: numpy.ndarray, b: numpy.ndarray) → float¶

Calculate the distance between two arrays of model predictions.

Parameters:

a – The first array of model predictions.
b – The second array of model predictions.

Returns:

The distance between the two arrays.

class batchie.core.InitialRetrospectivePlateGenerator¶

Bases: ABC

When running a retrospective active learning simulation, results are sensitive to the initial plate which is revealed. For this reason users might want to implement a special routine for revealing the initial plate separate from the subsequent plates.

generate_and_unmask_initial_plate(screen: Screen, rng: numpy.random.BitGenerator) → Screen¶

Generate and unmask the initial plate.

Parameters:

screen – A fully observed batchie.data.Screen
rng – The PRNG to use.

Returns:

The same batchie.data.Screen with the initial plate observed, and all other plates masked.

class batchie.core.MCMCModel¶

Bases: object

This class subclasses BayesianModel and implements batchie.core.MCMCModel.step()

abstract get_model_state() → Theta¶

Get the internal state of the model.

abstract step()¶

Advance the internal state of the model by one step.

In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.

class batchie.core.Metric(model: BayesianModel)¶

Bases: object

evaluate(sample: Theta) → float¶

Evaluate the metric on a single parameter set.

Parameters:: sample – The parameter set to evaluate.
Returns:: The value of the metric.

evaluate_all(results_holder: ThetaHolder) → numpy.ndarray¶

Evaluate the metric on all parameter sets in the results_holder.

Parameters:: results_holder – The parameter sets to evaluate.
Returns:: An array of metric values.

class batchie.core.PlatePolicy¶

Bases: object

Given a batchie.data.Screen, which is a set of potential batchie.data.Plate instances, implementations of this class will determine which set of batchie.data.Plate instances is eligible for the next round.

filter_eligible_plates(batch_plates: list[Plate], unobserved_plates: list[Plate], rng: numpy.random.Generator) → list[Plate]¶

class batchie.core.RetrospectivePlateGenerator¶

Bases: ABC

When running a retrospective active learning simulation, the user might want to reorganize the dataset into different plates than were originally run. This class will generate these plate groupings from the individual observations in the retrospective dataset.

generate_plates(screen: Screen, rng: numpy.random.BitGenerator) → Screen¶

Generate plates from the remaining unobserved experiments in the input screen.

Parameters:

screen – A partially observed batchie.data.Screen
rng – The PRNG to use.

class batchie.core.RetrospectivePlateSmoother¶

Bases: ABC

After plates have been generated for a retrospective simulation using a batchie.core.RetrospectivePlateGenerator, those plates may be of very uneven sizes, which is not desirable. Implementations of this class should aim to merge plates together and/or drop experiments until plate sizes are more even. We call this process “plate smoothing”.

smooth_plates(screen: Screen, rng: numpy.random.BitGenerator) → Screen¶

Smooth the plates in the screen.

Parameters:

screen – A partially observed batchie.data.Screen
rng – The PRNG to use.

class batchie.core.Scorer¶

Bases: object

This class represents a scoring function for batchie.data.Plate instances.

The score should represent how desirable it is to observe the given plate, with a lower score being more desirable.

score(plates: dict[int, ScreenSubset], distance_matrix: DistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) → dict[int, float]¶

class batchie.core.ScoresHolder¶

Bases: ABC

This class represents a set of scores for a set of plates.

add_score(plate_id: int, score: float)¶

Add a score for a given plate.

Parameters:

plate_id – The plate id to add the score for.
score – The score to add.

get_score(plate_id: int) → float¶

Get the score for a given plate.

Parameters:: plate_id – The plate id to get the score for.
Returns:: The score for the given plate.

plate_id_with_minimum_score(eligible_plate_ids: list[int] = None) → int¶

Get the plate id with the minimum score.

Parameters:: eligible_plate_ids – The set of plates to consider.
Returns:: The plate id with the minimum score.
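
As a rough sketch of how such a holder could behave, the class below stores scores in a plain dictionary. The class name is hypothetical and this is not the batchie implementation; it only illustrates that a lower score is preferred and that the search can be restricted to eligible plates.

```python
class DictScoresHolder:
    """Hypothetical in-memory score holder keyed by plate id."""

    def __init__(self):
        self._scores = {}

    def add_score(self, plate_id, score):
        self._scores[plate_id] = score

    def get_score(self, plate_id):
        return self._scores[plate_id]

    def plate_id_with_minimum_score(self, eligible_plate_ids=None):
        # With no restriction, consider every plate that has a score.
        if eligible_plate_ids is None:
            candidates = list(self._scores)
        else:
            candidates = list(eligible_plate_ids)
        return min(candidates, key=self.get_score)


holder = DictScoresHolder()
holder.add_score(0, 0.7)
holder.add_score(1, 0.2)
holder.add_score(2, 0.5)
```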

class batchie.core.SimulationTracker(plate_ids_selected: list[list[int]], losses: list[float], seed: int)¶

Bases: object

This class tracks the state of a retrospective active learning simulation. It will record the plates that were revealed at each step and the total loss of the predictor trained on the plates revealed up until that point.

classmethod load(fn)¶

Load this instance from a JSON file.

Parameters:: fn – The filename to load from.

save(fn)¶

Save this instance to a JSON file.

Parameters:: fn – The filename to save to.
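
The JSON round trip can be sketched with a small dataclass. The class name TrackerSketch is hypothetical and the on-disk layout used here (one JSON object with the three constructor fields) is an assumption, not the format batchie writes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrackerSketch:
    """Hypothetical stand-in with the same three constructor fields."""

    plate_ids_selected: list  # one list of plate ids per simulation step
    losses: list              # one loss value per simulation step
    seed: int

    def save(self, fn):
        # Write all fields as a single JSON object.
        with open(fn, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, fn):
        # Rebuild the instance from the JSON object written by save().
        with open(fn) as f:
            return cls(**json.load(f))
```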

class batchie.core.Theta¶

Bases: object

This class represents the set of parameters for a BayesianModel. Should be implemented by a dataclass or similarly serializable class.

equals(other)¶

abstract classmethod from_dicts(private_params: dict, shared_params: dict)¶

Instantiate a batchie.core.Theta from dictionaries of private and shared parameters.

Returns:: A batchie.core.Theta instance.

abstract predict_conditional_mean(data: ScreenBase) → numpy.ndarray¶

Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.

Returns:: An array of means for each item in the Experiment.

abstract predict_conditional_variance(data: ScreenBase) → numpy.ndarray¶

Predict the conditional variance of a batchie.data.ExperimentBase.

Returns:: An array of variances for each item in the Experiment.

abstract predict_viability(data: ScreenBase) → numpy.ndarray¶

Predict the conditional mean of a batchie.data.ExperimentBase in viability space.

Returns:: An array of means for each item in the Experiment.

abstract private_parameters_dict() → dict[str, numpy.ndarray]¶

The private parameters of a batchie.core.Theta.

Returns:: a dictionary mapping class variables to arrays/numerical values.

shared_parameters_dict() → dict[str, numpy.ndarray]¶

The shared parameters of a batchie.core.Theta.

Returns:: a dictionary mapping class variables to arrays.

class batchie.core.ThetaHolder(n_thetas: int, *args, **kwargs)¶

Bases: ABC

This class represents a container for multiple parameter sets for a BayesianModel. This class provides methods to save these parameter sets to an H5 file.

add_theta(theta: Theta)¶

Add a new parameter set to the container.

Parameters:: theta – The parameter set to add.

combine(other)¶

Combine these parameter sets with another container of parameter sets.

Parameters:: other – Another ThetaHolder instance.

classmethod concat(instances: list)¶

Combine multiple instances of ThetaHolder into one.

Parameters:: instances – A list of ThetaHolder instances.

get_theta(step_index: int) → Theta¶

Returns the parameter set at the given index.

Parameters:: step_index – The index of the parameter set to return.

property is_complete¶

Returns:: True if the container is full, False otherwise.

static load_h5(path: str)¶

Load a ThetaHolder from an H5 file.

Parameters:: path – The path to the H5 file.

property n_thetas¶

Returns:: The number of parameter sets in the container.

save_h5(fn: str)¶

Save the parameter sets to an H5 file.

Parameters:: fn – The filename to save to.

class batchie.core.VIModel¶

Bases: object

This class subclasses BayesianModel and implements batchie.core.VIModel.sample()

abstract sample(num_samples: int) → list[Theta]¶

Returns a list of Theta samples. Length of the list should be num_samples.

batchie.data module¶

class batchie.data.ExperimentSpace(treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None, sample_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None, control_treatment_name: str = '')¶

Bases: object

This class represents the universe of possible experimental conditions.

Models can use this object to define the size of their embeddings etc. without having to look at the full dataset.

doses_for_treatment(treatment_name: str) → numpy.ndarray¶

classmethod from_screen(screen: Screen)¶

classmethod load_h5(path: str)¶

property n_unique_doses¶

property n_unique_samples¶

property n_unique_treatment_types¶

property n_unique_treatments¶

sample_id_from_sample_name(sample_name: str)¶

sample_name_from_sample_id(sample_id: int)¶

save_h5(path: str)¶

treatment_ids_from_treatment_name(treatment_name: str)¶

class batchie.data.Plate(screen: Screen, selection_vector: numpy.ndarray)¶

Bases: ScreenSubset

A subset of a batchie.data.Screen defined by a boolean selection vector.

This class is not meant to be instantiated directly, but rather is returned by the batchie.data.Screen.get_plate() method.

The difference between a batchie.data.Plate and a batchie.data.ScreenSubset is that a batchie.data.Plate is guaranteed to contain only one unique plate id.

merge(other)¶

Merge this plate with another plate, mutate the parent batchie.data.Screen in place.

Parameters:: other – batchie.data.Plate

property plate_id¶

Return the plate id of this plate.

Returns:: int, plate id

property plate_name¶

Return the original plate name of this plate.

Returns:: str, plate name

class batchie.data.Screen(treatment_names: numpy.ndarray, treatment_doses: numpy.ndarray, sample_names: numpy.ndarray, plate_names: numpy.ndarray, observations: numpy.ndarray | None = None, observation_mask: numpy.ndarray | None = None, control_treatment_name='', treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None = None, sample_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None = None)¶

Bases: ScreenBase

The principal data structure in batchie.

A batchie.data.Screen is a collection of experiments. Some of the experiments may be observed and some may not be observed. Anything not enumerated as an experimental condition in this top level class will be “invisible” to batchie.

A batchie.data.Screen can be subset into batchie.data.Plate instances or a batchie.data.ScreenSubset of multiple plates. batchie.data.Screen is the only data class that can be subdivided.

combine(other)¶

Union this screen with another screen.

Warning: treatment, sample, and plate ids are not guaranteed to be the same in the resulting new screen instance.

Parameters:: other – batchie.data.Screen
Returns:: Unioned batchie.data.Screen

classmethod concat(screens: list[Screen])¶

Concatenate a list of batchie.data.Screen instances into a single batchie.data.Screen.

Parameters:: screens – list of batchie.data.Screen
Returns:: Unioned batchie.data.Screen

get_plate(plate_id: int) → Plate¶

Return a batchie.data.Plate defined by a plate id.

Parameters:: plate_id – int, plate id
Returns:: A batchie.data.Plate

static load_h5(path)¶

Load screen from h5 archive.

Parameters:: path – str, path to h5 archive

property observation_mask¶

Return the array of observation masks in the screen. Where an entry is True, the condition is observed; where it is False, the condition is unobserved.

Returns:: 1d array of observation masks

property observations¶

Return the array of observations in the screen.

We do not use any NaN values in our arrays, so the observation value for a condition where batchie.data.Screen.observation_mask is False is undefined. It is up to the user to decide how to handle this.

Returns:: 1d array of observations

property plate_ids¶

Return the array of plate ids in the screen.

Plate ids are always 0 indexed integers from 0 to batchie.data.ScreenBase.n_unique_plates - 1 with no gaps.

Returns:: 1d array of plate ids

property plate_mapping: Tuple[numpy.ndarray, numpy.ndarray]¶

Returns:: a tuple of two 1d arrays that map plate name to id.

property plates¶

Return a list of all batchie.data.Plate instances in the screen.

Returns:: list of batchie.data.Plate instances

property sample_ids¶

Return the array of sample ids in the screen.

Sample ids are always 0 indexed integers from 0 to batchie.data.ScreenBase.n_unique_samples - 1 with no gaps.

Returns:: 1d array of sample ids

property sample_mapping: Tuple[numpy.ndarray, numpy.ndarray]¶

Returns:: a tuple of two 1d arrays that map sample name to id.

property sample_names¶

Return the array of sample names (provided string names)

Returns:: 1d array of sample names

save_h5(fn)¶

Save screen to h5 archive.

Parameters:: fn – str, path to h5 archive

set_observed(selection_mask: numpy.ndarray, observations: numpy.ndarray)¶

property single_treatment_effects: numpy.ndarray | None¶

Return the array of single treatment effects in the screen.

Returns:: 2d array of single treatment effects

subset(selection_vector: numpy.ndarray) → ScreenSubset¶

Return a batchie.data.ScreenSubset defined by a boolean selection vector.

Parameters:: selection_vector – 1d array of bools
Returns:: batchie.data.ScreenSubset

subset_observed() → ScreenSubset | None¶

Return a batchie.data.ScreenSubset containing all conditions that are observed. Returns None if all conditions are unobserved.

Returns:: batchie.data.ScreenSubset

subset_unobserved() → ScreenSubset | None¶

Return a batchie.data.ScreenSubset containing all conditions that are not observed. Returns None if batchie.data.Screen.is_observed is True.

Returns:: batchie.data.ScreenSubset

property treatment_doses¶

Return the array of treatment doses (floating point drug concentrations)

Returns:: N-dimension array of treatment doses

property treatment_ids¶

Return the array of treatment ids in the screen.

Treatment ids are always 0 indexed integers from 0 to batchie.data.ScreenBase.n_unique_treatments - 1 with no gaps.

Returns:: 2d array of treatment ids

property treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]¶

Returns:: a tuple of three 1d arrays that map tuples of (name, dose) to id.

property treatment_names¶

Return the array of treatment names (provided drug names)

Returns:: N-dimension array of treatment names

class batchie.data.ScreenBase¶

Bases: ABC

Base class for the principal data structure in batchie.

A batchie.data.Screen is a collection of experimental conditions and, optionally, observations for some of those conditions. The conditions are defined by a set of treatment names and doses, and a set of sample names. Observations are scalar floating point numbers, with one scalar per condition.

batchie.data.Screen class also defines the concept of a plate, which is a grouping of experimental conditions. The terminology plate comes from the world of high throughput biological screening, where plastic plates with 96, 384, or 1536 individual wells are used to hold distinct biochemical reactions. In batchie, this concept is abstracted to the concept of a plate being the discrete unit of experimental conditions that can be observed at one time. We also abstract away the concept of the plate having to be a fixed size each time.

abstract combine(other)¶

property is_observed: bool¶

Return True if all observations are available, False otherwise

Returns:: bool

property n_plates¶

Return the number of plates in the screen.

Returns:: int, number of plates

property n_unique_samples¶

Return the number of unique samples in the screen.

Returns:: int, number of unique samples

property n_unique_treatments¶

Return the number of unique treatments in the screen.

Returns:: int, number of unique treatments

abstract property observation_mask¶

abstract property observations¶

abstract property plate_ids¶

abstract property plate_mapping¶

abstract property sample_ids¶

abstract property sample_mapping¶

abstract property sample_names¶

property sample_space_size¶

Return the size of the universe of possible samples.

Returns:: int

abstract property single_treatment_effects: numpy.ndarray | None¶

property size¶

Return the number of experimental conditions contained in the experiment.

Returns:: int, number of experimental conditions

property treatment_arity¶

Return the number of treatments per experiment.

Returns:: int, number of treatments per experiment

abstract property treatment_doses¶

abstract property treatment_ids¶

abstract property treatment_mapping¶

abstract property treatment_names¶

property treatment_space_size¶

Return the size of the universe of possible treatments.

Returns:: int

property unique_plate_ids¶

Return the unique plate ids in the screen.

Returns:: 1d array of unique plate ids

property unique_sample_ids¶

Return the unique sample ids in the screen.

Returns:: 1d array of unique sample ids

property unique_treatments¶

Return the unique treatments in the screen (excludes “control” treatments).

Returns:: 2d array of unique treatments

class batchie.data.ScreenSubset(screen: Screen, selection_vector: numpy.ndarray)¶

Bases: ScreenBase

A subset of an batchie.data.Screen defined by a boolean selection vector.

This class is not meant to be instantiated directly, but rather is returned by the batchie.data.Screen.subset() method.

combine(other)¶

Union this subset with another subset of the same screen.

Parameters:: other – batchie.data.ScreenSubset
Returns:: Unioned batchie.data.ScreenSubset

classmethod concat(screen_subsets: list)¶

Concatenate a list of batchie.data.ScreenSubset instances into a single batchie.data.ScreenSubset.

Parameters:: screen_subsets – list of batchie.data.ScreenSubset
Returns:: Unioned batchie.data.ScreenSubset

property control_treatment_name¶

invert()¶

Return the inverse of this subset, i.e. the subset of the screen that is not contained in this subset.

Returns:: batchie.data.ScreenSubset

property observation_mask¶

property observations¶

property plate_ids¶

property plate_mapping¶

property sample_ids¶

property sample_mapping¶

property sample_names¶

property single_treatment_effects: numpy.ndarray | None¶

subset(selection_vector)¶

Return a new batchie.data.ScreenSubset defined by a boolean selection vector.

Parameters:: selection_vector – 1d array of bools
Returns:: batchie.data.ScreenSubset

to_screen()¶

Promote this subset to a batchie.data.Screen.

Returns:: batchie.data.Screen

property treatment_doses¶

property treatment_ids¶

property treatment_mapping¶

property treatment_names¶

batchie.data.create_single_treatment_effect_array(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray)¶

Create an n_observation x n_treatment array where each entry is the single treatment effect for the corresponding sample and treatment ids in the input arrays.

Parameters:

sample_ids – 1d array of sample ids
treatment_ids – 2d array of treatment ids
observation – 1d array of observations

batchie.data.create_single_treatment_effect_map(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray)¶

Create a map from (sample_id, treatment_id) to single observation (a scalar).

Parameters:

sample_ids – 1d array of sample ids
treatment_ids – 1d array of treatment ids
observation – 1d array of observations

batchie.data.encode_1d_array_to_0_indexed_ids(arr: numpy.ndarray, existing_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None = None)¶

Encode a 1d array of strings to 0-indexed integers.

Parameters:

arr – 1d array of strings
existing_mapping – Prior mapping

Returns:

integer array containing only values between 0 and n-1, where n is the number of unique values in arr
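
The encoding can be sketched in plain Python as below. This is an illustration under stated assumptions, not the batchie code: the function name is hypothetical, the mapping is assumed to be a pair of parallel (names, ids) sequences, new values are assumed to receive ids in sorted order, and any existing mapping is assumed to already use ids 0..k-1.

```python
def encode_to_0_indexed_ids(values, existing_mapping=None):
    """Map each value to a 0-indexed integer id, reusing a prior mapping if given."""
    name_to_id = {}
    if existing_mapping is not None:
        names, ids = existing_mapping
        name_to_id = dict(zip(names, ids))
    # Assign the next free id to every value not covered by the prior mapping.
    for value in sorted(set(values)):
        if value not in name_to_id:
            name_to_id[value] = len(name_to_id)
    encoded = [name_to_id[v] for v in values]
    names = list(name_to_id)
    ids = [name_to_id[n] for n in names]
    return encoded, (names, ids)


encoded, mapping = encode_to_0_indexed_ids(["b", "a", "b", "c"])
```

Passing the returned mapping back in as existing_mapping keeps ids stable when further data are encoded later.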

batchie.data.encode_treatment_arrays_to_0_indexed_ids(treatment_name_arr: numpy.ndarray, treatment_dose_arr: numpy.ndarray, control_treatment_name: str = '', existing_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None = None)¶

Encode treatment names and doses (which are arrays of strings) to 0-indexed integers, where the control treatment is always mapped to batchie.common.CONTROL_SENTINEL_VALUE.

Parameters:

treatment_name_arr – array of treatment names
treatment_dose_arr – array of treatment doses
control_treatment_name – The string value of the control treatment
existing_mapping – Prior mapping

batchie.data.filter_dataset_to_treatments_that_appear_in_at_least_one_combo(screen: Screen) → Screen¶

Utility function to filter down a batchie.data.Screen to only the treatments that appear in at least one combo.

Parameters:: screen – a batchie.data.Screen
Returns:: A filtered batchie.data.Screen

batchie.data.filter_dataset_to_unique_treatments(screen: Screen | ScreenSubset)¶

Ensure that the dataset only has one experiment per treatment and sample condition by arbitrarily dropping duplicates.

Parameters:: screen – a batchie.data.ScreenSubset
Returns:: A batchie.data.ScreenSubset with the same or smaller number of experiments compared to the input.

batchie.data.numpy_array_is_0_indexed_integers(arr: numpy.ndarray)¶

Test numpy array arr contains only integers between 0 and n-1 with no gaps, where n is the number of unique values in arr.

If the array contains batchie.common.CONTROL_SENTINEL_VALUE, then we test that the array contains only integers between 0 and n-2, and the sentinel value.

Parameters:: arr – numpy array
Returns:: bool
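
The check described above can be sketched without NumPy as follows. The function name is hypothetical, and the sentinel value of -1 is purely an assumption for illustration; the real value of batchie.common.CONTROL_SENTINEL_VALUE is not stated here.

```python
ASSUMED_CONTROL_SENTINEL = -1  # hypothetical value, for illustration only


def is_0_indexed_integers(values, sentinel=ASSUMED_CONTROL_SENTINEL):
    """True if the non-sentinel values are exactly the integers 0..k-1."""
    unique = set(values)
    unique.discard(sentinel)  # the sentinel is allowed but not counted
    if not all(isinstance(v, int) for v in unique):
        return False
    # No gaps: the unique values must equal {0, 1, ..., k-1}.
    return unique == set(range(len(unique)))
```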

batchie.distance_calculation module¶

class batchie.distance_calculation.ChunkedDistanceMatrix(size, n_chunks=1, chunk_index=0, chunk_size=None)¶

Bases: DistanceMatrix

Class which can represent part or a whole pairwise distance matrix.

The distance matrix is stored in a sparse format, but can be converted to a dense format if all values are present.

Several partial ChunkedDistanceMatrix classes can be combined. This is useful for parallelization of the distance matrix computation.

add_value(i, j, value)¶

Add a value to the distance matrix.

Parameters:

i – The row index.
j – The column index.
value – The value to add.

combine(other)¶

classmethod concat(matrices: list)¶

is_complete()¶

classmethod load(filename)¶

Load a distance matrix from a file.

Parameters:: filename – The filename to load from.

save(filename)¶

Save the distance matrix to a file.

Parameters:: filename – The filename to save to.

to_dense()¶

Return a dense representation of the distance matrix.

Returns:: A dense representation of the distance matrix.
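
A minimal sketch of the partial-matrix idea: pairwise values are kept in a dictionary, partial results are merged, and a dense matrix is produced only once every pair is present. The class name is hypothetical and this is not the batchie implementation; it assumes only the strictly lower triangle is stored, that the matrix is symmetric, and that the diagonal is zero.

```python
class SparsePairwiseDistances:
    """Hypothetical sparse store for a symmetric pairwise distance matrix."""

    def __init__(self, size):
        self.size = size
        self.values = {}  # (row, col) with row > col  ->  distance

    def add_value(self, i, j, value):
        if i == j:
            raise ValueError("diagonal entries are not stored in this sketch")
        self.values[(max(i, j), min(i, j))] = value

    def combine(self, other):
        if other.size != self.size:
            raise ValueError("sizes differ")
        merged = SparsePairwiseDistances(self.size)
        merged.values = {**self.values, **other.values}
        return merged

    def is_complete(self):
        return len(self.values) == self.size * (self.size - 1) // 2

    def to_dense(self):
        if not self.is_complete():
            raise ValueError("some pairwise distances are missing")
        dense = [[0.0] * self.size for _ in range(self.size)]
        for (i, j), value in self.values.items():
            dense[i][j] = value
            dense[j][i] = value
        return dense
```

Two workers can each fill their own instance and the results can be merged with combine() afterwards.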

batchie.distance_calculation.calculate_pairwise_distance_matrix_on_predictions(thetas: ThetaHolder, distance_metric: DistanceMetric, data: Screen, chunk_index: int, n_chunks: int, progress: bool = False) → ChunkedDistanceMatrix¶

Calculate the pairwise distance matrix between predictions in viability space.

For all pairs of thetas in the given ThetaHolder, predictions will be made on the unobserved conditions in the given Experiment and the distance between the predictions produced by the two theta values will be calculated and populated into a ChunkedDistanceMatrix instance.

If n_chunks > 1, then the distance matrix is split into n_chunks roughly equal chunks, and only the chunk with index chunk_index is calculated. This is useful for parallelization.

Parameters:

thetas – The set of model parameters to use for prediction
distance_metric – The distance metric to use
data – The data to predict
chunk_index – The index of the chunk to calculate
n_chunks – The number of chunks to split the distance matrix into
progress – Whether to show a progress bar

Returns:

A ChunkedDistanceMatrix containing the pairwise distances

batchie.distance_calculation.consume(iterator, n)¶

Advance the iterator n steps ahead. If n is None, consume entirely.

Parameters:

iterator – The iterator to consume
n – The number of steps to advance the iterator
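
This behavior matches the well-known "consume" recipe from the Python itertools documentation, reproduced here as a sketch. Whether batchie uses exactly this code is an assumption.

```python
import collections
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        # Feed the whole iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)
```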

batchie.distance_calculation.get_lower_triangular_indices_chunk(n: int, chunk_index: int, n_chunks: int)¶

Assuming we want to split the number of lower triangular indices of a square matrix with dimension n into roughly equal chunks, return the indices for the chunk with index chunk_index

Parameters:

n – The dimension of the square matrix
chunk_index – The index of the chunk to return
n_chunks – The number of chunks to split the indices into

Returns:

A list of indices

batchie.distance_calculation.get_number_of_lower_triangular_indices(n: int)¶

Get the number of lower triangular indices of a square matrix with dimension n

Parameters:: n – The dimension of the square matrix
Returns:: The number of lower triangular indices

batchie.distance_calculation.lower_triangular_indices(n: int)¶

Iterate all the lower triangular indices of a square matrix with dimension n

Parameters:: n – The dimension of the square matrix
Returns:: A generator which yields the indices
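
The three helpers above can be sketched together. This is an illustration only: the function names are shortened, the diagonal is assumed to be excluded (so there are n(n-1)/2 indices), and the row-major ordering and chunk boundaries are assumptions rather than batchie's actual choices.

```python
from itertools import islice


def lower_triangular_indices(n):
    """Yield (i, j) for every pair with j < i, in row-major order."""
    for i in range(n):
        for j in range(i):
            yield i, j


def number_of_lower_triangular_indices(n):
    return n * (n - 1) // 2


def lower_triangular_indices_chunk(n, chunk_index, n_chunks):
    """Return the indices belonging to one of n_chunks roughly equal chunks."""
    total = number_of_lower_triangular_indices(n)
    start = chunk_index * total // n_chunks
    stop = (chunk_index + 1) * total // n_chunks
    return list(islice(lower_triangular_indices(n), start, stop))
```

Because the chunk boundaries are computed from the total count, concatenating all chunks in order reproduces the full index sequence with no overlap.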

batchie.fast_mvn module¶

Methods for sampling from multivariate normal distributions.

batchie.fast_mvn.sample_mvn_from_precision(Q, mu=None, mu_part=None, chol_factor=False, rng=None)¶

Fast sampling from a multivariate normal with precision parameterization.

Supports sparse arrays.

Parameters:

Q – The precision matrix
mu – If provided, assumes the model is N(mu, Q^-1)
mu_part – If provided, assumes the model is N(Q^-1 mu_part, Q^-1)
chol_factor – If True, assumes Q is a (lower triangular) Cholesky decomposition of the precision matrix.
rng – The PRNG to use.
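
The idea behind precision-parameterized sampling can be sketched with dense NumPy arrays. If Q = L Lᵀ is the Cholesky factorization and z is standard normal, then solving Lᵀ x = z gives x with covariance Q⁻¹. This is not the batchie implementation: the function name and the n_samples argument are hypothetical, Q is assumed to be symmetric positive definite, and sparse input and the chol_factor option are omitted.

```python
import numpy as np


def sample_from_precision(Q, mu=None, mu_part=None, rng=None, n_samples=1):
    """Draw samples from N(mean, Q^-1) using the Cholesky factor of Q."""
    rng = np.random.default_rng() if rng is None else rng
    Q = np.asarray(Q, dtype=float)
    dim = Q.shape[0]
    L = np.linalg.cholesky(Q)              # Q = L @ L.T, L lower triangular
    z = rng.standard_normal((dim, n_samples))
    x = np.linalg.solve(L.T, z)            # cov(x) = (L @ L.T)^-1 = Q^-1
    if mu_part is not None:
        mean = np.linalg.solve(Q, np.asarray(mu_part, dtype=float))
    elif mu is not None:
        mean = np.asarray(mu, dtype=float)
    else:
        mean = np.zeros(dim)
    return (x + mean[:, None]).T           # shape (n_samples, dim)


Q_example = np.array([[2.0, 0.5], [0.5, 1.0]])
draws = sample_from_precision(
    Q_example, mu_part=[1.0, 2.0], rng=np.random.default_rng(0), n_samples=50000
)
```

Working from the precision matrix avoids forming Q⁻¹ explicitly, which is the main reason this parameterization is convenient when Q is sparse or structured.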

batchie.introspection module¶

batchie.introspection.create_instance(package_name: str, class_name: str, base_class: type, kwargs: dict)¶

Create an instance of a class from a package by name.

Parameters:

package_name – The name of the package to search.
class_name – The name of the class to search for.
base_class – The base class that the class should inherit from.
kwargs – Keyword arguments to pass to the class constructor.

batchie.introspection.get_class(package_name: str, class_name: str, base_class: type) → type¶

Get a class from a package by name.

Parameters:

package_name – The name of the package to search.
class_name – The name of the class to search for.
base_class – The base class that the class should inherit from.
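
Name-based lookup of this kind can be sketched with importlib. The function names below are hypothetical, and the sketch only inspects the attributes of a single module; whether batchie also searches submodules of the package is not stated here.

```python
import importlib


def get_class_sketch(module_name, class_name, base_class):
    """Import a module and return the named class if it subclasses base_class."""
    module = importlib.import_module(module_name)
    candidate = getattr(module, class_name, None)
    if not isinstance(candidate, type) or not issubclass(candidate, base_class):
        raise ValueError(
            f"{class_name!r} in {module_name!r} is not a subclass of {base_class.__name__}"
        )
    return candidate


def create_instance_sketch(module_name, class_name, base_class, kwargs):
    """Look up the class by name and call its constructor with kwargs."""
    cls = get_class_sketch(module_name, class_name, base_class)
    return cls(**kwargs)
```

Checking the base class before instantiating gives an early, readable error when a user names a class of the wrong kind on the command line or in a config file.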

batchie.introspection.get_required_init_args_with_annotations(cls) → Dict[str, Any]¶

Get a dictionary of required __init__ arguments and their type annotations for a given class.

Parameters:: cls – The class to inspect.
Returns:: A dictionary with argument names as keys and their type annotations as values.
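
One way to collect required constructor arguments is with inspect.signature, as sketched below. The function and the ExampleSmoother class are hypothetical; the rule used here (a parameter is required when it has no default and is not *args or **kwargs) is an assumption about what "required" means.

```python
import inspect
from typing import Any, Dict


def required_init_args_with_annotations(cls) -> Dict[str, Any]:
    """Map each required __init__ argument name to its type annotation."""
    required = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is inspect.Parameter.empty:
            # Unannotated parameters map to inspect.Parameter.empty.
            required[name] = param.annotation
    return required


class ExampleSmoother:
    """Hypothetical class used only to demonstrate the helper."""

    def __init__(self, min_size: int, label: str = "x", **kwargs):
        self.min_size = min_size
        self.label = label
```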

batchie.log_config module¶

batchie.log_config.add_logging_args(parser)¶

batchie.log_config.configure_logging(args)¶

Configure logging based on the given arguments.

Parameters:: args – Parsed command line arguments.

batchie.retrospective module¶

class batchie.retrospective.BatchieEnsemblePlateSmoother(min_size: int, n_iterations: int, min_n_cell_line_plates: int)¶

Bases: RetrospectivePlateSmoother

Apply the following smoothers in sequence to the input batchie.data.Screen:

MergeMinPlateSmoother
MergeTopBottomPlateSmoother
OptimalSizeSmoother
NPlatePerCellLineSmoother

class batchie.retrospective.FixedSizeSmoother(plate_size: int)¶

Bases: RetrospectivePlateSmoother

Filter all plates smaller than the given size and randomly truncate all plates larger than a fixed size to the given size.

class batchie.retrospective.MergeMinPlateSmoother(min_size: int)¶

Bases: RetrospectivePlateSmoother

Iteratively combine the smallest two plates for each sample until all plates are above a user specified size.
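
The merging loop can be sketched on plate sizes alone, for the plates of a single sample. The function name is hypothetical, and treating "above" as "at least min_size" is an assumption; the real smoother operates on a batchie.data.Screen rather than a list of integers.

```python
import heapq


def merge_smallest_until_min_size(plate_sizes, min_size):
    """Repeatedly merge the two smallest plates until none is below min_size."""
    heap = list(plate_sizes)
    heapq.heapify(heap)
    # Stop when the smallest plate is large enough or only one plate is left.
    while len(heap) > 1 and heap[0] < min_size:
        smallest = heapq.heappop(heap)
        second_smallest = heapq.heappop(heap)
        heapq.heappush(heap, smallest + second_smallest)
    return sorted(heap)
```

For example, sizes [1, 2, 3, 10] with min_size 5 merge as 1+2, then 3+3, leaving plates of size 6 and 10.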

class batchie.retrospective.MergeTopBottomPlateSmoother(n_iterations: int)¶

Bases: RetrospectivePlateSmoother

Iteratively combine the largest and smallest plates for each sample. Runs for a user specified number of iterations.

class batchie.retrospective.NPlatePerCellLineSmoother(min_n_cell_line_plates: int)¶

Bases: RetrospectivePlateSmoother

Remove all experiments involving cell lines that appear on fewer plates than the user-specified min_n_cell_line_plates.

class batchie.retrospective.OptimalSizeSmoother¶

Bases: RetrospectivePlateSmoother

The cost function for any particular plate size is the sum of two terms, the first term is the number of experiments you have to completely throw out because they are in plates below the threshold, the second term is the number of experiments that need to be trimmed out of plates that are over the threshold. This smoother optimizes this cost function and then drops all plates smaller than the optimal size and sub-samples all plates larger than the optimal size until all plates are the same size.
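
The cost function described above can be written down directly. The sketch below works on a list of plate sizes, uses hypothetical function names, and assumes the candidate sizes are the plate sizes that actually occur; how batchie searches for the optimum is not stated here.

```python
def plate_size_cost(plate_sizes, target):
    """Experiments lost if every plate is forced to exactly `target` size."""
    # Plates below the target are thrown out entirely.
    dropped = sum(size for size in plate_sizes if size < target)
    # Plates above the target are trimmed down to it.
    trimmed = sum(size - target for size in plate_sizes if size > target)
    return dropped + trimmed


def optimal_plate_size(plate_sizes):
    """Pick the observed plate size with the lowest cost."""
    candidates = sorted(set(plate_sizes))
    return min(candidates, key=lambda target: plate_size_cost(plate_sizes, target))
```

With plate sizes [2, 10, 10, 12], a target of 10 loses 4 experiments (2 dropped, 2 trimmed), whereas targets of 2 and 12 lose 26 and 22 respectively.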

class batchie.retrospective.PairwisePlateGenerator(subset_size: int, anchor_size: int)¶

Bases: RetrospectivePlateGenerator

class batchie.retrospective.PlatePermutationPlateGenerator(force_include_plate_names: list[str] | None = None)¶

Bases: RetrospectivePlateGenerator

This generator will create new plates by permuting the plate labels.

Plates can be excluded from permutation with the force_include_plate_names argument

class batchie.retrospective.SampleSegregatingPermutationPlateGenerator(max_plate_size: int)¶

Bases: RetrospectivePlateGenerator

This generator will generate plates that only contain experiments for a single sample. If there are more than max_plate_size experiments for a single sample then the experiments will be split across multiple equal sized plates.

class batchie.retrospective.SparseCoverPlateGenerator(reveal_single_treatment_experiments: bool)¶

Bases: InitialRetrospectivePlateGenerator

batchie.retrospective.calculate_mse(observed_screen: Screen, thetas: ThetaHolder) → float¶

Calculate the mean squared error between the masked observations and the unmasked observations

Parameters:

observed_screen – A Screen that is fully observed
thetas – The set of model parameters to use for prediction

Returns:

The mean squared error between predicted and observed values
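
For reference, the quantity itself can be sketched in a few lines. This example takes already aggregated predictions as plain lists; how predictions are combined across the entries of a ThetaHolder is not specified here, so that step is left out.

```python
def mean_squared_error(predicted, observed):
    """Mean of squared differences between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared_errors) / len(squared_errors)


mse = mean_squared_error([0.5, 0.25, 1.0], [0.5, 0.75, 0.0])
print(mse)  # (0 + 0.25 + 1.0) / 3
```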

batchie.retrospective.create_plate_balanced_holdout_set_among_masked_plates(screen: Screen, fraction: float, rng: numpy.random.BitGenerator) → (Screen, Screen)¶

Create a holdout set from a retrospective screen (where all data is observed but some plates are artificially masked) by sampling a fraction of each unobserved plate.

Parameters:

screen – The screen to create a holdout set for
fraction – The fraction of each unobserved plate to hold out

Returns:

A tuple of (training_screen, holdout_screen)
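
The per-plate sampling can be sketched with plain dictionaries. The plate-to-experiment mapping, the helper name, and the use of Python's random module in place of a NumPy generator are all assumptions for illustration; the input is taken to contain only the masked plates.

```python
import random


def plate_balanced_holdout(plate_to_experiments, fraction, seed=0):
    """Hold out the same fraction of experiments from every plate (sketch)."""
    rng = random.Random(seed)
    training, holdout = {}, {}
    for plate_id, experiment_ids in plate_to_experiments.items():
        n_holdout = int(round(fraction * len(experiment_ids)))
        held = set(rng.sample(experiment_ids, n_holdout))
        holdout[plate_id] = [e for e in experiment_ids if e in held]
        training[plate_id] = [e for e in experiment_ids if e not in held]
    return training, holdout


plates = {0: list(range(10)), 1: list(range(10, 30))}
training, holdout = plate_balanced_holdout(plates, fraction=0.2)
print({p: len(ids) for p, ids in holdout.items()})  # {0: 2, 1: 4}
```

Sampling within each plate keeps the holdout proportional to plate size, which a single draw over the whole screen would not guarantee.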

batchie.retrospective.create_random_holdout(screen: Screen, fraction: float, rng: numpy.random.BitGenerator) → (Screen, Screen)¶

Create a random holdout set containing the given fraction of the original screen.

Parameters:

screen – The screen to create a holdout set for
fraction – The fraction of the screen to hold out

Returns:

A tuple of (training_screen, holdout_screen)

batchie.retrospective.mask_screen(screen: Screen) → Screen¶

batchie.retrospective.reveal_plates(screen: Screen, plate_ids: list[int]) → Screen¶

Utility function to reveal observations in the masked screen from the observed screen.

Parameters:

screen – A batchie.data.Screen that is partially masked, but with real observations present in the internal observation array
plate_ids – The plate ids to reveal

batchie.retrospective.unmask_screen(screen: Screen) → Screen¶

batchie.sampling module¶

batchie.sampling.sample(model, results: ThetaHolder, seed: int, n_chains: int = None, chain_index: int = None, n_burnin: int = None, thin: int = None, progress_bar=False) → ThetaHolder¶

Sample from the model posterior using the given parameters.

Parameters:

model – The model which will be sampled from.
results – The object which will store the results
seed – The seed to use for the random number generator
n_chains – The number of parallel chains to run
chain_index – The index of the current chain
n_burnin – The number of burn-in steps to run
thin – The thinning factor
progress_bar – Whether to display a progress bar

Returns:

A ThetaHolder containing the sampled parameters
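
The roles of n_burnin and thin can be illustrated with a toy chain. ToyModel, sample_chain, and the n_samples argument are hypothetical stand-ins; the random-walk update only mimics the shape of an MCMC step and does not target any posterior.

```python
import random


class ToyModel:
    """Stand-in for a model exposing step() and get_model_state()."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0.0
        self.n_steps = 0

    def step(self):
        # Random-walk update standing in for one MCMC step.
        self.state += self.rng.gauss(0.0, 1.0)
        self.n_steps += 1

    def get_model_state(self):
        return self.state


def sample_chain(model, n_samples, n_burnin, thin):
    """Discard n_burnin steps, then keep every `thin`-th state."""
    for _ in range(n_burnin):
        model.step()
    samples = []
    while len(samples) < n_samples:
        for _ in range(thin):
            model.step()
        samples.append(model.get_model_state())
    return samples


model = ToyModel(seed=0)
draws = sample_chain(model, n_samples=5, n_burnin=10, thin=3)
print(len(draws), model.n_steps)  # 5 25
```

Burn-in discards early states that still depend on the starting point, and thinning reduces correlation between the states that are kept.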

batchie.synergy module¶

batchie.synergy.calculate_synergy(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray, strict: bool = False)¶

Calculate synergy for a given set of observations, sample ids, and treatment ids.

If single treatment observations for all of the treatments in a multi-treatment observation are not present, the observation is skipped. If strict is True, an error is raised instead.
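
The skip-or-raise behaviour can be sketched as follows. The synergy formula itself is not stated above, so this example assumes Bliss independence (expected combination viability is the product of the single-treatment viabilities) and a sign convention where positive values mean lower viability than expected. The dictionary and tuple inputs and the helper name are also assumptions.

```python
def bliss_synergy(single_viability, combo_observations, strict=False):
    """Compute expected-minus-observed viability for each combination.

    single_viability: {(sample, treatment): viability}      (assumed format)
    combo_observations: [(sample, treatment_a, treatment_b, viability)]
    """
    results = []
    for sample, a, b, observed in combo_observations:
        key_a, key_b = (sample, a), (sample, b)
        if key_a not in single_viability or key_b not in single_viability:
            if strict:
                raise ValueError(
                    f"missing single-treatment data for {sample}: {a}, {b}"
                )
            continue  # non-strict mode: skip this observation
        expected = single_viability[key_a] * single_viability[key_b]
        results.append((sample, a, b, expected - observed))
    return results


singles = {("s1", "drugA"): 0.5, ("s1", "drugB"): 0.5}
combos = [("s1", "drugA", "drugB", 0.125), ("s1", "drugA", "drugC", 0.25)]
synergy = bliss_synergy(singles, combos)
print(synergy)  # [('s1', 'drugA', 'drugB', 0.125)]
```

The second combination is skipped because no single-treatment observation exists for drugC; with strict=True the same input raises an error.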

batchie.distance.mse module¶

class batchie.distance.mse.MSEDistance(sigmoid: bool = True)¶

Bases: DistanceMetric

Mean squared error distance metric

distance(a: numpy.ndarray, b: numpy.ndarray)¶

Calculate the distance between two arrays of model predictions.

Parameters:

a – The first array of model predictions.
b – The second array of model predictions.

Returns:

The distance between the two arrays.
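
A minimal sketch of such a distance on plain lists. How the sigmoid flag is applied is not documented above; this example assumes it passes each prediction through the logistic function before the squared differences are averaged.

```python
import math


def mse_distance(a, b, sigmoid=True):
    """Mean squared difference between two prediction vectors (sketch)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if sigmoid:
        # Assumed behaviour: squash predictions into (0, 1) first.
        a = [1.0 / (1.0 + math.exp(-x)) for x in a]
        b = [1.0 / (1.0 + math.exp(-x)) for x in b]
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


d_raw = mse_distance([0.0, 2.0], [0.0, 0.0], sigmoid=False)
d_sig = mse_distance([0.0, 2.0], [0.0, 0.0], sigmoid=True)
print(d_raw)  # 2.0
```

Under this assumption the sigmoid bounds the contribution of any single prediction, so d_sig is smaller than d_raw for the same inputs.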

batchie.models.sparse_combo module¶

class batchie.models.sparse_combo.LegacySparseDrugComboImpl(n_dims: int, n_drugdoses: int, n_clines: int, intercept: bool = True, fake_intercept: bool = True, individual_eff: bool = True, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0, **kwargs)¶

Bases: object

Original implementation of Bayesian tensor factorization model for predicting combination drug response. Preserved here without changes to ensure reproducibility of results.

bliss(cline: numpy.ndarray, dd1: numpy.ndarray, dd2: numpy.ndarray)¶

encode_obs()¶

ess_pars()¶

get(attr, ix)¶

mcmc_step() → None¶

n_obs()¶

predict(cline: numpy.ndarray, dd1: numpy.ndarray, dd2: numpy.ndarray)¶

predict_single_drug(cline: numpy.ndarray, dd1: numpy.ndarray)¶

reset_model()¶

class batchie.models.sparse_combo.SparseDrugCombo(experiment_space: ExperimentSpace, n_embedding_dimensions: int, fake_intercept: bool = True, individual_eff: bool = True, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0, rng: numpy.random.Generator | None = None, predict_interactions: bool = False, interaction_log_transform: bool = True, intercept: bool = True)¶

Bases: BayesianModel, MCMCModel

get_model_state() → SparseDrugComboMCMCSample¶: Get the internal state of the model.

n_obs() → int¶

Return the number of observations that have been added to the model.

Returns:: Integer number of observations

reset_model()¶: Reset the internal state of the model to its initial state.

property rng: numpy.random.Generator¶

Return the PRNG for this model instance.

Returns:: The PRNG for this model instance.

set_rng(rng: numpy.random.Generator)¶

Set the PRNG for this model instance.

Parameters:: rng – The PRNG to use.

step()¶

Advance the internal state of the model by one step.

In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.

class batchie.models.sparse_combo.SparseDrugComboMCMCSample(W: numpy.ndarray, W0: numpy.ndarray, V2: numpy.ndarray, V1: numpy.ndarray, V0: numpy.ndarray, alpha: float, precision: float)¶

Bases: Theta

A single sample from the MCMC chain for the sparse drug combo model

V0: numpy.ndarray¶

V1: numpy.ndarray¶

V2: numpy.ndarray¶

W: numpy.ndarray¶

W0: numpy.ndarray¶

alpha: float¶

classmethod from_dicts(private_params, shared_params)¶

Instantiate batchie.core.Theta from dictionary

Returns:: An instance of the class built from the given parameter dictionaries.

precision: float¶

predict_conditional_mean(data: ScreenBase) → numpy.ndarray¶

Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.

Returns:: An array of means for each item in the Experiment.

predict_conditional_variance(data: ScreenBase) → numpy.ndarray¶

Predict the conditional variance of a batchie.data.ExperimentBase.

Returns:: An array of variances for each item in the Experiment.

predict_viability(data: ScreenBase) → numpy.ndarray¶

Predict the conditional mean of a batchie.data.ExperimentBase in viability space.

Returns:: An array of means for each item in the Experiment.

private_parameters_dict() → dict[str, numpy.ndarray]¶

The private parameters of a batchie.core.Theta.

Returns:: a dictionary mapping class variables to arrays/numerical values.

batchie.models.sparse_combo.interactions_to_logits(interaction: numpy.ndarray, single_effects: numpy.ndarray, log_transform: bool)¶

batchie.models.sparse_combo.predict(mcmc_sample: SparseDrugComboMCMCSample, data: ScreenBase, viability: bool)¶

batchie.models.sparse_combo.predict_single_drug(mcmc_sample: SparseDrugComboMCMCSample, data: ScreenBase, viability: bool)¶

batchie.models.sparse_combo_interaction module¶

class batchie.models.sparse_combo_interaction.LegacySparseDrugComboInteractionImpl(n_dims: int, n_drugdoses: int, n_clines: int, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0)¶

Bases: object

This is the original implementation of the sparse drug combo interaction model. Preserved here without changes to ensure reproducibility of results.

encode_obs()¶

mcmc_step() → None¶

n_obs()¶

reset_model()¶

class batchie.models.sparse_combo_interaction.SparseDrugComboInteraction(experiment_space: ExperimentSpace, n_embedding_dimensions: int, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0)¶

Bases: BayesianModel, MCMCModel

get_model_state() → SparseDrugComboInteractionMCMCSample¶: Get the internal state of the model.

n_obs() → int¶

Return the number of observations that have been added to the model.

Returns:: Integer number of observations

reset_model()¶: Reset the internal state of the model to its initial state.

property rng: numpy.random.Generator¶

Return the PRNG for this model instance.

Returns:: The PRNG for this model instance.

set_rng(rng: numpy.random.Generator)¶

Set the PRNG for this model instance.

Parameters:: rng – The PRNG to use.

step()¶

Advance the internal state of the model by one step.

In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.

class batchie.models.sparse_combo_interaction.SparseDrugComboInteractionMCMCSample(W: numpy.ndarray, V2: numpy.ndarray, precision: float, single_effect_lookup: dict)¶

Bases: Theta

A single sample from the MCMC chain for the sparse drug combo interaction model

V2: numpy.ndarray¶

W: numpy.ndarray¶

classmethod from_dicts(private_params: dict, shared_params: dict)¶

Instantiate batchie.core.Theta from dictionary

Returns:: An instance of the class built from the given parameter dictionaries.

precision: float¶

predict_conditional_mean(data: ScreenBase) → numpy.ndarray¶

Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.

Returns:: An array of means for each item in the Experiment.

predict_conditional_variance(data: ScreenBase) → numpy.ndarray¶

Predict the conditional variance of a batchie.data.ExperimentBase.

Returns:: An array of variances for each item in the Experiment.

predict_viability(data: ScreenBase) → numpy.ndarray¶

Predict the conditional mean of a batchie.data.ExperimentBase in viability space.

Returns:: An array of means for each item in the Experiment.

private_parameters_dict() → dict[str, numpy.ndarray]¶

The private parameters of a batchie.core.Theta.

Returns:: a dictionary mapping class variables to arrays/numerical values.

shared_parameters_dict() → dict[str, numpy.ndarray]¶

The shared parameters of a batchie.core.Theta.

Returns:: a dictionary mapping class variables to arrays.

single_effect_lookup: dict¶

batchie.policies.k_per_sample module¶

class batchie.policies.k_per_sample.KPerSamplePlatePolicy(k: int)¶

Bases: PlatePolicy

filter_eligible_plates(batch_plates: list[Plate], unobserved_plates: list[Plate], rng: numpy.random.Generator) → list[Plate]¶

batchie.scoring.gaussian_dbal module¶

class batchie.scoring.gaussian_dbal.GaussianDBALScorer(max_chunk=50, max_triples=5000, **kwargs)¶

Bases: Scorer

score(plates: dict[int, ScreenSubset], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) → dict[int, float]¶

batchie.scoring.gaussian_dbal.dbal_fast_gauss_scoring_vectorized(predictions: numpy.ndarray, variances: numpy.ndarray, distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)¶

Compute the Monte Carlo approximation of the DBAL ideal score \(\widehat{s}_n(P)\) in a vectorized way for each of the given plates.

\[\widehat{s}_n(P) = \frac{1}{{m \choose 3}} \sum_{i < j < k} d(\theta_i, \theta_j) L_{\theta_i}(\theta_j, \theta_k ; P ) e^{2H_{\theta_i}(P)}\]

Parameters:

predictions – model predictions over all plates of shape (n_plates, n_thetas, n_experiments)
variances – an array of variances for model predictions over all plates, of shape (n_plates, n_thetas, max_n_experiments). For plates smaller than the maximum size, the variances should be padded with NaNs up to the maximum size.
distance_matrix – a square array of shape (n_thetas, n_thetas) of distances between model parameterizations
rng – PRNG
max_combos – the maximum number of theta triplets to sample
distance_factor – a multiplicative factor for the distance matrix

Returns:

an array of shape (n_plates,) of approximated scores for each plate in predictions

batchie.scoring.gaussian_dbal.dbal_fast_gaussian_scoring_heteroscedastic(per_plate_predictions: list[numpy.ndarray], variances: list[numpy.ndarray], distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)¶

batchie.scoring.gaussian_dbal.dbal_fast_gaussian_scoring_homoscedastic(per_plate_predictions: list[numpy.ndarray], variances: numpy.ndarray, distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)¶

Parameters:

per_plate_predictions – Ragged array of model predictions of length n_plates, each list element is an array of shape (n_thetas, n_plate_experiments)
variances – an array of variances for model predictions over all plates, of shape (n_plates, n_thetas).
distance_matrix – a square array of shape (n_thetas, n_thetas) of distances between model parameterizations
rng – PRNG
max_combos – the maximum number of theta triplets to sample
distance_factor – a multiplicative factor for the distance matrix

Returns:

an array of shape (n_plates,) of approximated scores for each plate in per_plate_predictions

batchie.scoring.gaussian_dbal.generate_combination_at_sorted_index(index, n, k)¶

Generate all range(n) choose k combinations.

Represent each combination as a descending sorted tuple.

Sort all the tuples in ascending order, and return the tuple that would be found at index.

Do this without materializing the actual list of combinations.

Parameters:

index – The index of the combination to return
n – The number of items to choose from
k – The number of items to choose

Returns:

A tuple of length k representing the combination
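
The description above matches the combinatorial number system, where the combination at a given index can be found greedily. The sketch below is an independent illustration of that technique under that reading (the function name is hypothetical), checked against a brute-force listing built with itertools.

```python
from itertools import combinations
from math import comb


def combination_at_sorted_index(index, n, k):
    """Return the index-th k-subset of range(n) as a descending tuple."""
    if not 0 <= index < comb(n, k):
        raise IndexError("index out of range")
    result = []
    remaining = index
    upper = n - 1
    for size in range(k, 0, -1):
        # Largest c with comb(c, size) <= remaining.
        c = upper
        while comb(c, size) > remaining:
            c -= 1
        result.append(c)
        remaining -= comb(c, size)
        upper = c - 1
    return tuple(result)


# Brute-force reference: materialize and sort every combination.
reference = sorted(
    tuple(sorted(c, reverse=True)) for c in combinations(range(5), 3)
)
assert all(
    combination_at_sorted_index(i, 5, 3) == reference[i]
    for i in range(len(reference))
)
print(combination_at_sorted_index(2, 4, 2))  # (2, 1)
```

Because only binomial coefficients are evaluated, a random triple of thetas can be drawn by sampling an integer index instead of enumerating all combinations.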

batchie.scoring.gaussian_dbal.get_combination_at_sorted_index(index, n, k)¶

batchie.scoring.gaussian_dbal.pad_ragged_arrays_to_dense_array(arrays: list[numpy.ndarray], pad_value: float = 0.0)¶

Given a list of arrays that each have N dimensions but may differ in size, return a dense array with N + 1 dimensions and shape (len(arrays), maximum_of_dimension_0, … maximum_of_dimension_N), in which every array is padded to the maximum size. The padding value defaults to 0.0.

Parameters:

arrays – A list of arrays
pad_value – A floating point number (default is 0)

Returns:

A dense array of the arrays
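
For the one-dimensional case the padding can be sketched with plain lists. The helper name is hypothetical, and the real function operates on NumPy arrays of arbitrary dimension.

```python
def pad_ragged_rows(arrays, pad_value=0.0):
    """Pad a list of 1-D lists to a dense list of equal-length rows."""
    max_len = max((len(a) for a in arrays), default=0)
    return [list(a) + [pad_value] * (max_len - len(a)) for a in arrays]


dense = pad_ragged_rows([[1.0, 2.0, 3.0], [4.0], []])
print(dense)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```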

batchie.scoring.main module¶

class batchie.scoring.main.ChunkedScoresHolder(size: int)¶

Bases: ScoresHolder

add_score(plate_id: int, score: float)¶

Add a score for a given plate.

Parameters:

plate_id – The plate id to add the score for.
score – The score to add.

combine(other: ScoresHolder)¶

classmethod concat(scores_list: list[ScoresHolder])¶

get_score(plate_id: int) → float¶

Get the score for a given plate.

Parameters:: plate_id – The plate id to get the score for.
Returns:: The score for the given plate.

classmethod load_h5(fn)¶

plate_id_with_minimum_score(eligible_plate_ids: list[int] = None) → int¶

Get the plate id with the minimum score.

Parameters:: eligible_plate_ids – The set of plates to consider.
Returns:: The plate id with the minimum score.
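
The selection rule can be sketched with a dictionary of scores. The helper name argmin_plate and the choice to ignore eligible ids that have no score are assumptions made for this example.

```python
def argmin_plate(scores, eligible_plate_ids=None):
    """Return the plate id with the lowest score among eligible plates."""
    if eligible_plate_ids is None:
        candidates = dict(scores)
    else:
        # Assumed behaviour: ids without a score are ignored.
        candidates = {p: scores[p] for p in eligible_plate_ids if p in scores}
    if not candidates:
        raise ValueError("no eligible plates have scores")
    return min(candidates, key=candidates.get)


scores = {3: 0.7, 5: 0.2, 8: 0.4}
print(argmin_plate(scores))  # 5
print(argmin_plate(scores, eligible_plate_ids=[3, 8]))  # 8
```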

save_h5(fn)¶

batchie.scoring.main.score_chunk(scorer: Scorer, thetas: ThetaHolder, screen: Screen, distance_matrix: ChunkedDistanceMatrix, rng: numpy.random.Generator | None = None, progress_bar: bool = False, n_chunks: int = 1, chunk_index: int = 0, batch_plate_ids: list[int] | None = None) → ChunkedScoresHolder¶

Score a subset of all unobserved plates in a screen.

Parameters:

scorer – The scorer to use for scoring
thetas – The samples to use for scoring
screen – The screen to score
distance_matrix – The distance matrix to use for scoring
rng – PRNG to use for sampling
progress_bar – Whether to show a progress bar
n_chunks – The number of chunks to split the unobserved plates into
chunk_index – The index of the chunk to score
batch_plate_ids – A list of plate ids that have already been selected in the batch

Returns:

ChunkedScoresHolder containing the scores for each plate in the current chunk
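
The n_chunks and chunk_index arguments suggest that each worker scores one contiguous slice of the unobserved plates. A sketch of one such partitioning is shown below. The helper name and the exact splitting rule are assumptions, since the library's own rule is not described here.

```python
def chunk_of(plate_ids, n_chunks, chunk_index):
    """Return the chunk_index-th of n_chunks near-equal slices (sketch)."""
    if not 0 <= chunk_index < n_chunks:
        raise ValueError("chunk_index must be in [0, n_chunks)")
    base, remainder = divmod(len(plate_ids), n_chunks)
    # The first `remainder` chunks each take one extra plate.
    start = chunk_index * base + min(chunk_index, remainder)
    stop = start + base + (1 if chunk_index < remainder else 0)
    return plate_ids[start:stop]


plate_ids = list(range(10))
chunks = [chunk_of(plate_ids, 3, i) for i in range(3)]
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Every plate lands in exactly one chunk, so independently computed chunk scores can be merged afterwards without overlap.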

batchie.scoring.main.select_next_plate(scores: ScoresHolder, screen: Screen, policy: PlatePolicy | None, batch_plate_ids: list[int] | None = None, rng: numpy.random.Generator | None = None) → Plate | None¶

Select the next batchie.data.Plate to observe

Parameters:

scores – The scores for each plate
screen – The screen which defines the set of plates to choose from
policy – The policy to use for plate selection
batch_plate_ids – The plates currently selected in the batch
rng – PRNG to use for sampling

Returns:

The next plate to observe, or None if no plate was selected

batchie.scoring.size module¶

class batchie.scoring.size.SizeScorer¶

Bases: Scorer

A scorer that returns the number of conditions in the Plate as the score.

score(plates: dict[int, Plate], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) → dict[int, float]¶

batchie.scoring.rand module¶

class batchie.scoring.rand.RandomScorer¶

Bases: Scorer

A scorer that returns a random score for each plate, used for baseline comparison

score(plates: dict[int, Plate], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) → dict[int, float]¶
