batchie package¶
Submodules¶
batchie.common module¶
- batchie.common.copy_array_with_control_treatments_set_to_zero(arr: numpy.ndarray, treatment_array: numpy.ndarray)¶
- batchie.common.select_unique_zipped_numpy_arrays(arrs)¶
Returns a boolean array that selects unique combinations of several same length numpy arrays.
- Parameters:
arrs – Arrays of the same length.
- Returns:
Boolean array indicating unique combinations.
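The selection described above can be sketched in plain Python. This is only an illustration: it uses lists in place of numpy arrays, the function name is hypothetical, and it assumes that the first occurrence of each combination is the one marked True, which the docstring does not state.

```python
def select_unique_zipped(columns):
    """Return a list of bools marking the first occurrence of each zipped row."""
    seen = set()
    mask = []
    for row in zip(*columns):
        # A row is "new" if this exact combination has not been seen before.
        is_new = row not in seen
        seen.add(row)
        mask.append(is_new)
    return mask


names = ["a", "a", "b", "a"]
doses = [1.0, 1.0, 1.0, 2.0]
mask = select_unique_zipped([names, doses])
```

Here only the second entry repeats an earlier (name, dose) pair, so it is the only position not selected.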
batchie.core module¶
- class batchie.core.BayesianModel(experiment_space: ExperimentSpace)¶
Bases:
ABC
This class represents a Bayesian model.
A Bayesian model has internal state. Each batchie.core.BayesianModel should have a companion batchie.core.Theta and batchie.core.ThetaHolder class, which represent the model's internal state in a serializable way.
The internal state of the model can be set explicitly via batchie.core.BayesianModel.set_model_state(), or it can be advanced via batchie.core.BayesianModel.step().
A batchie.core.BayesianModel can have data added to it via batchie.core.BayesianModel.add_observations(). If data is present, the model should use that data when batchie.core.BayesianModel.step() is called. batchie.core.BayesianModel.n_obs() should report the number of data points that have been added to the model.
A batchie.core.BayesianModel can be used to predict the outcome of an Experiment via batchie.core.BayesianModel.predict().
A batchie.core.BayesianModel must report its variance via batchie.core.BayesianModel.variance().
- add_observations(data: ScreenBase)¶
Add observations to the model.
- Parameters:
data – The data to add.
- abstract n_obs() int¶
Return the number of observations that have been added to the model.
- Returns:
Integer number of observations
- abstract reset_model()¶
Reset the internal state of the model to its initial state.
- abstract property rng: numpy.random.Generator¶
Return the PRNG for this model instance.
- Returns:
The PRNG for this model instance.
- abstract set_rng(rng: numpy.random.Generator)¶
Set the PRNG for this model instance.
- Parameters:
rng – The PRNG to use.
- class batchie.core.DistanceMatrix¶
Bases:
ABC
- abstract add_value(i, j, value)¶
Add a value to the distance matrix.
- Parameters:
i – The row index.
j – The column index.
value – The value to add.
- abstract classmethod load(filename)¶
Load a distance matrix from a file.
- Parameters:
filename – The filename to load from.
- abstract save(filename)¶
Save the distance matrix to a file.
- Parameters:
filename – The filename to save to.
- abstract to_dense()¶
Return a dense representation of the distance matrix.
- Returns:
A dense representation of the distance matrix.
- class batchie.core.DistanceMetric¶
Bases:
object
This class represents a symmetric distance metric between two arrays of model predictions.
- distance(a: numpy.ndarray, b: numpy.ndarray) float¶
Calculate the distance between two arrays of model predictions.
- Parameters:
a – The first array of model predictions.
b – The second array of model predictions.
- Returns:
The distance between the two arrays.
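As an illustration of the distance(a, b) contract, here is a minimal sketch of a mean squared error metric. The class name is hypothetical and it does not subclass the real batchie.core.DistanceMetric; it only mirrors the method shape and the symmetry requirement, and it accepts any equal-length numeric sequences.

```python
class MeanSquaredDistance:
    """Hypothetical metric with the same distance(a, b) shape as DistanceMetric."""

    def distance(self, a, b) -> float:
        if len(a) != len(b):
            raise ValueError("prediction arrays must have the same length")
        # Mean of squared element-wise differences; swapping a and b gives the same value.
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


metric = MeanSquaredDistance()
d_ab = metric.distance([0.0, 1.0, 2.0], [0.0, 3.0, 2.0])
d_ba = metric.distance([0.0, 3.0, 2.0], [0.0, 1.0, 2.0])
```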
- class batchie.core.InitialRetrospectivePlateGenerator¶
Bases:
ABC
When running a retrospective active learning simulation, results are sensitive to which initial plate is revealed. For this reason, users might want to implement a special routine for revealing the initial plate, separate from the subsequent plates.
- generate_and_unmask_initial_plate(screen: Screen, rng: numpy.random.BitGenerator) Screen¶
Generate and unmask the initial plate.
- Parameters:
screen – A fully observed batchie.data.Screen
rng – The PRNG to use.
- Returns:
The same batchie.data.Screen with the initial plate observed, and all other plates masked.
- class batchie.core.MCMCModel¶
Bases:
object
This class subclasses BayesianModel and implements batchie.core.MCMCModel.step()
- abstract step()¶
Advance the internal state of the model by one step.
In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.
- class batchie.core.Metric(model: BayesianModel)¶
Bases:
object
- evaluate(sample: Theta) float¶
Evaluate the metric on a single parameter set.
- Parameters:
sample – The parameter set to evaluate.
- Returns:
The value of the metric.
- evaluate_all(results_holder: ThetaHolder) numpy.ndarray¶
Evaluate the metric on all parameter sets in the results_holder.
- Parameters:
results_holder – The parameter sets to evaluate.
- Returns:
An array of metric values.
- class batchie.core.PlatePolicy¶
Bases:
object
Given a batchie.data.Screen, which is a set of potential batchie.data.Plate instances, implementations of this class will determine which set of batchie.data.Plate instances is eligible for the next round.
- class batchie.core.RetrospectivePlateGenerator¶
Bases:
ABC
When running a retrospective active learning simulation, the user might want to reorganize the dataset into different plates than were originally run. This class will generate these plate groupings from the individual observations in the retrospective dataset.
- generate_plates(screen: Screen, rng: numpy.random.BitGenerator) Screen¶
Generate plates from the remaining unobserved experiments in the input screen.
- Parameters:
screen – A partially observed batchie.data.Screen
rng – The PRNG to use.
- class batchie.core.RetrospectivePlateSmoother¶
Bases:
ABC
After plates have been generated for a retrospective simulation using a batchie.core.RetrospectivePlateGenerator, those plates may be of very uneven sizes, which is not desirable. Implementations of this class should aim to merge plates together and/or drop experiments until plate sizes are more even. We call this process “plate smoothing”.
- smooth_plates(screen: Screen, rng: numpy.random.BitGenerator) Screen¶
Smooth the plates in the screen.
- Parameters:
screen – A partially observed batchie.data.Screen
rng – The PRNG to use.
- class batchie.core.Scorer¶
Bases:
object
This class represents a scoring function for batchie.data.Plate instances.
The score should represent how desirable it is to observe the given plate, with a lower score being more desirable.
- score(plates: dict[int, ScreenSubset], distance_matrix: DistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]¶
- class batchie.core.ScoresHolder¶
Bases:
ABC
This class represents a set of scores for a set of plates.
- add_score(plate_id: int, score: float)¶
Add a score for a given plate.
- Parameters:
plate_id – The plate id to add the score for.
score – The score to add.
- get_score(plate_id: int) float¶
Get the score for a given plate.
- Parameters:
plate_id – The plate id to get the score for.
- Returns:
The score for the given plate.
- plate_id_with_minimum_score(eligible_plate_ids: list[int] = None) int¶
Get the plate id with the minimum score.
- Parameters:
eligible_plate_ids – The set of plates to consider.
- Returns:
The plate id with the minimum score.
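The three methods above suggest a simple mapping from plate id to score. The following is a minimal dict-backed sketch of that contract; the class name is hypothetical and the real batchie.core.ScoresHolder implementations may store scores differently.

```python
class DictScoresHolder:
    """Hypothetical in-memory holder mirroring the ScoresHolder method names."""

    def __init__(self):
        self._scores = {}

    def add_score(self, plate_id: int, score: float):
        self._scores[plate_id] = score

    def get_score(self, plate_id: int) -> float:
        return self._scores[plate_id]

    def plate_id_with_minimum_score(self, eligible_plate_ids=None) -> int:
        # With no restriction, every plate that has a score is a candidate.
        candidates = self._scores if eligible_plate_ids is None else eligible_plate_ids
        return min(candidates, key=self.get_score)


holder = DictScoresHolder()
holder.add_score(0, 0.5)
holder.add_score(1, 0.1)
holder.add_score(2, 0.3)
best_overall = holder.plate_id_with_minimum_score()
best_eligible = holder.plate_id_with_minimum_score([0, 2])
```

Because a lower score is more desirable, restricting the eligible ids changes which plate is picked next.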
- class batchie.core.SimulationTracker(plate_ids_selected: list[list[int]], losses: list[float], seed: int)¶
Bases:
object
This class tracks the state of a retrospective active learning simulation. It will record the plates that were revealed at each step and the total loss of the predictor trained on the plates revealed up until that point.
- classmethod load(fn)¶
Load this instance from a JSON file.
- Parameters:
fn – The filename to load from.
- save(fn)¶
Save this instance to a JSON file.
- Parameters:
fn – The filename to save to.
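A JSON round trip for a tracker with these three constructor fields could look like the sketch below. The class name and the on-disk layout (one JSON object keyed by field name) are assumptions made for illustration; the actual file format used by batchie.core.SimulationTracker is not described here.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class TrackerSketch:
    """Hypothetical stand-in with the same fields as SimulationTracker."""

    plate_ids_selected: list
    losses: list
    seed: int

    def save(self, fn):
        # Assumed layout: a single JSON object with one key per field.
        with open(fn, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, fn):
        with open(fn) as f:
            return cls(**json.load(f))


tracker = TrackerSketch(plate_ids_selected=[[0], [3, 5]], losses=[0.9, 0.4], seed=7)
path = os.path.join(tempfile.mkdtemp(), "tracker.json")
tracker.save(path)
restored = TrackerSketch.load(path)
```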
- class batchie.core.Theta¶
Bases:
object
This class represents the set of parameters for a BayesianModel. Should be implemented by a dataclass or similarly serializable class.
- equals(other)¶
- abstract classmethod from_dicts(private_params: dict, shared_params: dict)¶
Instantiate batchie.core.Theta from dictionaries of private and shared parameters.
- Returns:
a dictionary mapping class variables to arrays/numerical values.
- abstract predict_conditional_mean(data: ScreenBase) numpy.ndarray¶
Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.
- Returns:
An array of means for each item in the Experiment.
- abstract predict_conditional_variance(data: ScreenBase) numpy.ndarray¶
Predict the conditional variance of a batchie.data.ExperimentBase.
- Returns:
An array of variances for each item in the Experiment.
- abstract predict_viability(data: ScreenBase) numpy.ndarray¶
Predict the conditional mean of a batchie.data.ExperimentBase in viability space.
- Returns:
An array of means for each item in the Experiment.
- abstract private_parameters_dict() dict[str, numpy.ndarray]¶
The private parameters of a batchie.core.Theta.
- Returns:
a dictionary mapping class variables to arrays/numerical values.
The shared parameters of a batchie.core.Theta.
- Returns:
a dictionary mapping class variables to arrays.
- class batchie.core.ThetaHolder(n_thetas: int, *args, **kwargs)¶
Bases:
ABC
This class represents a container for multiple parameter sets for a BayesianModel. This class provides methods to save these parameter sets to an H5 file.
- add_theta(theta: Theta)¶
Add a new parameter set to the container.
- Parameters:
theta – The parameter set to add.
- combine(other)¶
Combine these parameters sets with another container of parameter sets.
- Parameters:
other – Another ThetaHolder instance.
- classmethod concat(instances: list)¶
Combine multiple instances of ThetaHolder into one.
- Parameters:
instances – A list of ThetaHolder instances.
- get_theta(step_index: int) Theta¶
Returns the parameter set at the given index.
- Parameters:
step_index – The index of the parameter set to return.
- property is_complete¶
- Returns:
True if the container is full, False otherwise.
- static load_h5(path: str)¶
Load a ThetaHolder from an H5 file.
- Parameters:
path – The path to the H5 file.
- property n_thetas¶
- Returns:
The number of parameter sets in the container.
- save_h5(fn: str)¶
Save the parameter sets to an H5 file.
- Parameters:
fn – The filename to save to.
- class batchie.core.VIModel¶
Bases:
object
This class subclasses BayesianModel and implements batchie.core.VIModel.sample()
batchie.data module¶
- class batchie.data.ExperimentSpace(treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None, sample_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None, control_treatment_name: str = '')¶
Bases:
object
This class represents the universe of possible experimental conditions.
Models can use this object to define the size of their embeddings etc. without having to look at the full dataset.
- doses_for_treatment(treatment_name: str) numpy.ndarray¶
- classmethod load_h5(path: str)¶
- property n_unique_doses¶
- property n_unique_samples¶
- property n_unique_treatment_types¶
- property n_unique_treatments¶
- sample_id_from_sample_name(sample_name: str)¶
- sample_name_from_sample_id(sample_id: int)¶
- save_h5(path: str)¶
- treatment_ids_from_treatment_name(treatment_name: str)¶
- class batchie.data.Plate(screen: Screen, selection_vector: numpy.ndarray)¶
Bases:
ScreenSubset
A subset of a batchie.data.Screen defined by a boolean selection vector.
This class is not meant to be instantiated directly, but rather is returned by the batchie.data.Screen.get_plate method.
The difference between a batchie.data.Plate and a batchie.data.ScreenSubset is that a batchie.data.Plate is guaranteed to contain only one unique plate id.
- merge(other)¶
Merge this plate with another plate, mutating the parent batchie.data.Screen in place.
- Parameters:
other – batchie.data.Plate
- property plate_id¶
Return the plate id of this plate.
- Returns:
int, plate id
- property plate_name¶
Return the original plate name of this plate.
- Returns:
str, plate name
- class batchie.data.Screen(treatment_names: numpy.ndarray, treatment_doses: numpy.ndarray, sample_names: numpy.ndarray, plate_names: numpy.ndarray, observations: numpy.ndarray | None = None, observation_mask: numpy.ndarray | None = None, control_treatment_name='', treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None = None, sample_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None = None)¶
Bases:
ScreenBase
The principal data structure in batchie.
A batchie.data.Screen is a collection of experiments. Some of the experiments may be observed and some may not be observed. Anything not enumerated as an experimental condition in this top level class will be “invisible” to batchie.
A batchie.data.Screen can be subset into batchie.data.Plate instances or a batchie.data.ScreenSubset of multiple plates. batchie.data.Screen is the only data class that can be subdivided.
- combine(other)¶
Union this screen with another screen.
Warning: treatment, sample, and plate ids are not guaranteed to be the same in the resulting new screen instance.
- Parameters:
other – batchie.data.Screen
- Returns:
Unioned batchie.data.Screen
- classmethod concat(screens: list[Screen])¶
Concatenate a list of batchie.data.Screen instances into a single batchie.data.Screen.
- Parameters:
screens – list of batchie.data.Screen
- Returns:
Unioned batchie.data.Screen
- get_plate(plate_id: int) Plate¶
Return a batchie.data.Plate defined by a plate id.
- Parameters:
plate_id – int, plate id
- Returns:
- static load_h5(path)¶
Load screen from h5 archive.
- Parameters:
path – str, path to h5 archive
- property observation_mask¶
Return the boolean observation mask for the screen. Where the mask is True, the condition is observed; where it is False, the condition is unobserved.
- Returns:
1d array of observation masks
- property observations¶
Return the array of observations in the screen.
We do not use any NaN values in our arrays; the observation value for a condition where batchie.data.Screen.observation_mask is False is undefined. It is up to the user to decide how to handle this.
- Returns:
1d array of observations
- property plate_ids¶
Return the array of plate ids in the screen.
Plate ids are always 0-indexed integers from 0 to batchie.data.ScreenBase.n_unique_plates - 1 with no gaps.
- Returns:
1d array of plate ids
- property plate_mapping: Tuple[numpy.ndarray, numpy.ndarray]¶
- Returns:
a tuple of two 1d arrays that map plate name to id.
- property plates¶
Return a list of all batchie.data.Plate instances in the screen.
- Returns:
list of batchie.data.Plate instances
- property sample_ids¶
Return the array of sample ids in the screen.
Sample ids are always 0-indexed integers from 0 to batchie.data.ScreenBase.n_unique_samples - 1 with no gaps.
- Returns:
1d array of sample ids
- property sample_mapping: Tuple[numpy.ndarray, numpy.ndarray]¶
- Returns:
a tuple of two 1d arrays that map sample name to id.
- property sample_names¶
Return the array of sample names (provided string names)
- Returns:
1d array of sample names
- save_h5(fn)¶
Save screen to h5 archive.
- Parameters:
fn – str, path to h5 archive
- set_observed(selection_mask: numpy.ndarray, observations: numpy.ndarray)¶
- property single_treatment_effects: numpy.ndarray | None¶
Return the array of single treatment effects in the screen.
- Returns:
2d array of single treatment effects
- subset(selection_vector: numpy.ndarray) ScreenSubset¶
Return a batchie.data.ScreenSubset defined by a boolean selection vector.
- Parameters:
selection_vector – 1d array of bools
- Returns:
- subset_observed() ScreenSubset | None¶
Return a batchie.data.ScreenSubset containing all conditions that are observed. Returns None if all conditions are unobserved.
- Returns:
- subset_unobserved() ScreenSubset | None¶
Return a batchie.data.ScreenSubset containing all conditions that are not observed. Returns None if batchie.data.Screen.is_observed is True.
- Returns:
- property treatment_doses¶
Return the array of treatment doses (floating point drug concentrations)
- Returns:
N-dimension array of treatment doses
- property treatment_ids¶
Return the array of treatment ids in the screen.
Treatment ids are always 0-indexed integers from 0 to batchie.data.ScreenBase.n_unique_treatments - 1 with no gaps.
- Returns:
2d array of treatment ids
- property treatment_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]¶
- Returns:
a tuple of three 1d arrays that map tuples of (name, dose) to id.
- property treatment_names¶
Return the array of treatment names (provided drug names)
- Returns:
N-dimension array of treatment names
- class batchie.data.ScreenBase¶
Bases:
ABC
Base class for the principal data structure in batchie.
A batchie.data.Screen is a collection of experimental conditions, and optionally observations for some of those conditions. The conditions are defined by a set of treatment names and doses, and a set of sample names. Observations are scalar floating point numbers, with one scalar per condition.
The batchie.data.Screen class also defines the concept of a plate, which is a grouping of experimental conditions. The term “plate” comes from the world of high-throughput biological screening, where plastic plates with 96, 384, or 1536 individual wells are used to hold distinct biochemical reactions. In batchie, this is abstracted so that a plate is the discrete unit of experimental conditions that can be observed at one time. We also abstract away the requirement that a plate be a fixed size each time.
- abstract combine(other)¶
- property is_observed: bool¶
Return True if all observations are available, False otherwise
- Returns:
bool
- property n_plates¶
Return the number of plates in the screen.
- Returns:
int, number of plates
- property n_unique_samples¶
Return the number of unique samples in the screen.
- Returns:
int, number of unique samples
- property n_unique_treatments¶
Return the number of unique treatments in the screen.
- Returns:
int, number of unique treatments
- abstract property observation_mask¶
- abstract property observations¶
- abstract property plate_ids¶
- abstract property plate_mapping¶
- abstract property sample_ids¶
- abstract property sample_mapping¶
- abstract property sample_names¶
- property sample_space_size¶
Return the size of the universe of possible samples.
- Returns:
int
- abstract property single_treatment_effects: numpy.ndarray | None¶
- property size¶
Return the number of experimental conditions contained in the experiment.
- Returns:
int, number of experimental conditions
- property treatment_arity¶
Return the number of treatments per experiment.
- Returns:
int, number of treatments per experiment
- abstract property treatment_doses¶
- abstract property treatment_ids¶
- abstract property treatment_mapping¶
- abstract property treatment_names¶
- property treatment_space_size¶
Return the size of the universe of possible treatments.
- Returns:
int
- property unique_plate_ids¶
Return the unique plate ids in the screen.
- Returns:
1d array of unique plate ids
- property unique_sample_ids¶
Return the unique sample ids in the screen.
- Returns:
1d array of unique sample ids
- property unique_treatments¶
Return the unique treatments in the screen (excludes “control” treatments).
- Returns:
2d array of unique treatments
- class batchie.data.ScreenSubset(screen: Screen, selection_vector: numpy.ndarray)¶
Bases:
ScreenBase
A subset of a batchie.data.Screen defined by a boolean selection vector.
This class is not meant to be instantiated directly, but rather is returned by the batchie.data.Screen.subset() method.
- combine(other)¶
Union this subset with another subset of the same screen.
- Parameters:
other – batchie.data.ScreenSubset
- Returns:
Unioned batchie.data.ScreenSubset
- classmethod concat(screen_subsets: list)¶
Concatenate a list of batchie.data.ScreenSubset instances into a single batchie.data.ScreenSubset.
- Parameters:
screen_subsets – list of batchie.data.ScreenSubset
- Returns:
Unioned batchie.data.ScreenSubset
- property control_treatment_name¶
- invert()¶
Return the inverse of this subset, i.e. the subset of the screen that is not contained in this subset.
- Returns:
- property observation_mask¶
- property observations¶
- property plate_ids¶
- property plate_mapping¶
- property sample_ids¶
- property sample_mapping¶
- property sample_names¶
- property single_treatment_effects: numpy.ndarray | None¶
- subset(selection_vector)¶
Return a new batchie.data.ScreenSubset defined by a boolean selection vector.
- Parameters:
selection_vector – 1d array of bools
- Returns:
- to_screen()¶
Promote this subset to a batchie.data.Screen.
- Returns:
- property treatment_doses¶
- property treatment_ids¶
- property treatment_mapping¶
- property treatment_names¶
- batchie.data.create_single_treatment_effect_array(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray)¶
Create a n_observation x n_treatment array where each entry is the single treatment effect for the corresponding sample and treatment ids in the input arrays.
- Parameters:
sample_ids – 1d array of sample ids
treatment_ids – 2d array of treatment ids
observation – 1d array of observations
- batchie.data.create_single_treatment_effect_map(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray)¶
Create a map from (sample_id, treatment_id) to single observation (a scalar).
- Parameters:
sample_ids – 1d array of sample ids
treatment_ids – 1d array of treatment ids
observation – 1d array of observations
- batchie.data.encode_1d_array_to_0_indexed_ids(arr: numpy.ndarray, existing_mapping: Tuple[numpy.ndarray, numpy.ndarray] | None = None)¶
Encode a 1d array of strings to 0-indexed integers.
- Parameters:
arr – 1d array of strings
existing_mapping – Prior mapping
- Returns:
integer array containing only values between 0 and n-1,
where n is the number of unique values in arr
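The encoding can be sketched as follows. This is a pure-Python illustration with a hypothetical function name: it assumes new values receive ids in sorted order, that existing_mapping is a (names, ids) pair of equal-length sequences, and that the existing ids are already contiguous from 0. None of these details are confirmed by the docstring.

```python
def encode_to_ids(values, existing_mapping=None):
    """Map each distinct value to an integer id in 0..n-1 and return (ids, mapping)."""
    # Assumed shape of existing_mapping: (names, ids), two equal-length sequences.
    mapping = dict(zip(*existing_mapping)) if existing_mapping else {}
    for value in sorted(set(values)):
        if value not in mapping:
            # Next free id, assuming the existing ids are 0..len(mapping)-1.
            mapping[value] = len(mapping)
    return [mapping[v] for v in values], mapping


ids_fresh, _ = encode_to_ids(["b", "a", "b", "c"])
ids_prior, _ = encode_to_ids(["b", "a", "b", "c"], existing_mapping=(["b"], [0]))
```

Passing a prior mapping keeps previously assigned ids stable, which is why "b" stays at 0 in the second call.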
- batchie.data.encode_treatment_arrays_to_0_indexed_ids(treatment_name_arr: numpy.ndarray, treatment_dose_arr: numpy.ndarray, control_treatment_name: str = '', existing_mapping: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] | None = None)¶
Encode treatment names and doses (which are arrays of strings) to 0-indexed integers, where the control treatment is always mapped to batchie.common.CONTROL_SENTINEL_VALUE
- Parameters:
treatment_name_arr – array of treatment names
treatment_dose_arr – array of treatment doses
control_treatment_name – The string value of the control treatment
existing_mapping – Prior mapping
- batchie.data.filter_dataset_to_treatments_that_appear_in_at_least_one_combo(screen: Screen) Screen¶
Utility function to filter down a batchie.data.Screen to only the treatments that appear in at least one combo.
- Parameters:
screen – a batchie.data.Screen
- Returns:
A filtered batchie.data.Screen
- batchie.data.filter_dataset_to_unique_treatments(screen: Screen | ScreenSubset)¶
Ensure that the dataset only has one experiment per treatment and sample condition by arbitrarily dropping duplicates.
- Parameters:
screen – a batchie.data.ScreenSubset
- Returns:
A batchie.data.ScreenSubset with the same or smaller number of experiments compared to the input.
- batchie.data.numpy_array_is_0_indexed_integers(arr: numpy.ndarray)¶
Test whether the numpy array arr contains only integers between 0 and n-1 with no gaps, where n is the number of unique values in arr.
If the array contains batchie.common.CONTROL_SENTINEL_VALUE, then we test that the array contains only integers between 0 and n-2, and the sentinel value.
- Parameters:
arr – numpy array
- Returns:
bool
batchie.distance_calculation module¶
- class batchie.distance_calculation.ChunkedDistanceMatrix(size, n_chunks=1, chunk_index=0, chunk_size=None)¶
Bases:
DistanceMatrix
Class which can represent part of, or a whole, pairwise distance matrix.
The distance matrix is stored in a sparse format, but can be converted to a dense format if all values are present.
Several partial ChunkedDistanceMatrix classes can be combined. This is useful for parallelization of the distance matrix computation.
- add_value(i, j, value)¶
Add a value to the distance matrix.
- Parameters:
i – The row index.
j – The column index.
value – The value to add.
- combine(other)¶
- classmethod concat(matrices: list)¶
- is_complete()¶
- classmethod load(filename)¶
Load a distance matrix from a file.
- Parameters:
filename – The filename to load from.
- save(filename)¶
Save the distance matrix to a file.
- Parameters:
filename – The filename to save to.
- to_dense()¶
Return a dense representation of the distance matrix.
- Returns:
A dense representation of the distance matrix.
- batchie.distance_calculation.calculate_pairwise_distance_matrix_on_predictions(thetas: ThetaHolder, distance_metric: DistanceMetric, data: Screen, chunk_index: int, n_chunks: int, progress: bool = False) ChunkedDistanceMatrix¶
Calculate the pairwise distance matrix between predictions in viability space.
For all pairs of thetas in the given ThetaHolder, predictions will be made on the unobserved conditions in the given Experiment and the distance between the predictions produced by the two theta values will be calculated and populated into a ChunkedDistanceMatrix instance.
If n_chunks > 1, then the distance matrix is split into n_chunks roughly equal chunks, and only the chunk with index chunk_index is calculated. This is useful for parallelization.
- Parameters:
thetas – The set of model parameters to use for prediction
distance_metric – The distance metric to use
data – The data to predict
chunk_index – The index of the chunk to calculate
n_chunks – The number of chunks to split the distance matrix into
progress – Whether to show a progress bar
- Returns:
A ChunkedDistanceMatrix containing the pairwise distances
- batchie.distance_calculation.consume(iterator, n)¶
Advance the iterator n steps ahead. If n is None, consume it entirely.
- Parameters:
iterator – The iterator to consume
n – The number of steps to advance the iterator
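The behaviour described matches the well-known consume recipe from the Python itertools documentation. The sketch below reproduces that recipe; whether batchie's implementation is identical is an assumption.

```python
import collections
import itertools


def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume it entirely."""
    if n is None:
        # A zero-length deque discards every item as it is fed in.
        collections.deque(iterator, maxlen=0)
    else:
        # Slicing from n to n yields nothing but advances the iterator by n items.
        next(itertools.islice(iterator, n, n), None)


it = iter(range(10))
consume(it, 3)
first_after_skip = next(it)
consume(it)
remaining = list(it)
```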
- batchie.distance_calculation.get_lower_triangular_indices_chunk(n: int, chunk_index: int, n_chunks: int)¶
Assuming we want to split the number of lower triangular indices of a square matrix with dimension n into roughly equal chunks, return the indices for the chunk with index chunk_index
- Parameters:
n – The dimension of the square matrix
chunk_index – The index of the chunk to return
n_chunks – The number of chunks to split the indices into
- Returns:
A list of indices
- batchie.distance_calculation.get_number_of_lower_triangular_indices(n: int)¶
Get the number of lower triangular indices of a square matrix with dimension n
- Parameters:
n – The dimension of the square matrix
- Returns:
The number of lower triangular indices
- batchie.distance_calculation.lower_triangular_indices(n: int)¶
Iterate all the lower triangular indices of a square matrix with dimension n
- Parameters:
n – The dimension of the square matrix
- Returns:
A generator which yields the indices
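The three helpers above (counting, iterating, and chunking lower triangular indices) can be sketched together. The function names here are hypothetical, and the sketch assumes the strictly lower triangle (i > j, diagonal excluded) and contiguous chunks whose sizes differ by at most one; batchie may use a different convention.

```python
import itertools


def n_lower_triangular(n):
    # Number of (i, j) pairs with i > j in an n x n matrix.
    return n * (n - 1) // 2


def lower_triangular_indices(n):
    # Yield pairs row by row: (1, 0), (2, 0), (2, 1), ...
    for i in range(n):
        for j in range(i):
            yield i, j


def lower_triangular_chunk(n, chunk_index, n_chunks):
    total = n_lower_triangular(n)
    # Integer boundaries split the index stream into roughly equal slices.
    start = chunk_index * total // n_chunks
    stop = (chunk_index + 1) * total // n_chunks
    return list(itertools.islice(lower_triangular_indices(n), start, stop))


chunks = [lower_triangular_chunk(4, k, 4) for k in range(4)]
```

Each worker can compute its own chunk independently, and concatenating the chunks in order recovers every pair exactly once.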
batchie.fast_mvn module¶
Methods for sampling from multivariate normal distributions.
- batchie.fast_mvn.sample_mvn_from_precision(Q, mu=None, mu_part=None, chol_factor=False, rng=None)¶
Fast sampling from a multivariate normal with precision parameterization.
Supports sparse arrays.
- Parameters:
Q – The precision matrix
mu – If provided, assumes the model is N(mu, Q^-1)
mu_part – If provided, assumes the model is N(Q^-1 mu_part, Q^-1)
chol_factor – If true, assumes Q is a (lower triangular) Cholesky decomposition of the precision matrix
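For context, the standard way to sample from a precision-parameterized normal uses a Cholesky factor of Q rather than an explicit inverse. The derivation below is the textbook construction; that batchie.fast_mvn follows exactly these steps is an assumption. Here L denotes the lower triangular Cholesky factor of Q and z a vector of independent standard normal draws.

```latex
\begin{aligned}
Q &= L L^{\top}, \qquad z \sim \mathcal{N}(0, I), \qquad L^{\top} y = z \\
\operatorname{Cov}(y) &= L^{-\top} L^{-1} = (L L^{\top})^{-1} = Q^{-1} \\
x &= \mu + y \sim \mathcal{N}(\mu, Q^{-1}) \\
\text{with } \texttt{mu\_part}:\quad L L^{\top} m &= \texttt{mu\_part}, \qquad x = m + y \sim \mathcal{N}(Q^{-1}\,\texttt{mu\_part},\, Q^{-1})
\end{aligned}
```

Both cases need only triangular solves against L, which is what makes the approach fast and compatible with sparse precision matrices.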
batchie.introspection module¶
- batchie.introspection.create_instance(package_name: str, class_name: str, base_class: type, kwargs: dict)¶
Create an instance of a class from a package by name.
- Parameters:
package_name – The name of the package to search.
class_name – The name of the class to search for.
base_class – The base class that the class should inherit from.
kwargs – Keyword arguments to pass to the class constructor.
- batchie.introspection.get_class(package_name: str, class_name: str, base_class: type) type¶
Get a class from a package by name.
- Parameters:
package_name – The name of the package to search.
class_name – The name of the class to search for.
base_class – The base class that the class should inherit from.
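A lookup of this kind can be sketched with importlib. The function names below are hypothetical, and the sketch only inspects top-level attributes of the named module; the real functions may search submodules of the package as well.

```python
import importlib
import inspect


def get_class_sketch(package_name, class_name, base_class):
    """Import package_name and return class_name if it subclasses base_class."""
    module = importlib.import_module(package_name)
    candidate = getattr(module, class_name, None)
    if not inspect.isclass(candidate) or not issubclass(candidate, base_class):
        raise ValueError(
            f"{class_name} in {package_name} is not a subclass of {base_class.__name__}"
        )
    return candidate


def create_instance_sketch(package_name, class_name, base_class, kwargs):
    # Resolve the class by name, then forward the keyword arguments to its constructor.
    return get_class_sketch(package_name, class_name, base_class)(**kwargs)


cls = get_class_sketch("collections", "OrderedDict", dict)
instance = create_instance_sketch("collections", "Counter", dict, {"a": 2})
```

The base class check is what lets a command line tool accept a class name as a string while still rejecting classes of the wrong kind.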
- batchie.introspection.get_required_init_args_with_annotations(cls) Dict[str, Any]¶
Get a dictionary of required __init__ arguments and their type annotations for a given class.
- Parameters:
cls – The class to inspect.
- Returns:
A dictionary with argument names as keys and their type annotations as values.
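One way to collect required constructor arguments is inspect.signature. The sketch below is an assumption about the approach, not the library's code: the function name is hypothetical, and it treats a parameter as required when it has no default and is not *args or **kwargs.

```python
import inspect


def required_init_args(cls):
    """Return {name: annotation} for __init__ parameters that have no default."""
    params = inspect.signature(cls.__init__).parameters
    return {
        name: p.annotation
        for name, p in params.items()
        if name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    }


class Example:
    def __init__(self, size: int, name: str = "x", *args, **kwargs):
        pass


required = required_init_args(Example)
```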
batchie.log_config module¶
- batchie.log_config.add_logging_args(parser)¶
- batchie.log_config.configure_logging(args)¶
Configure logging based on the given arguments.
- Parameters:
args – Parsed command line arguments.
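The two functions pair naturally: one registers options on an argparse parser and the other applies them. The sketch below shows that pattern with hypothetical function names; the --log-level flag and its choices are assumptions, since the actual options added by batchie.log_config are not listed here.

```python
import argparse
import logging


def add_logging_args_sketch(parser):
    # Hypothetical flag; the real option names are not documented above.
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )


def configure_logging_sketch(args):
    # force=True replaces any handlers installed earlier (Python 3.8+).
    logging.basicConfig(level=getattr(logging, args.log_level), force=True)


parser = argparse.ArgumentParser()
add_logging_args_sketch(parser)
args = parser.parse_args(["--log-level", "DEBUG"])
configure_logging_sketch(args)
```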
batchie.retrospective module¶
- class batchie.retrospective.BatchieEnsemblePlateSmoother(min_size: int, n_iterations: int, min_n_cell_line_plates: int)¶
Bases:
RetrospectivePlateSmoother
Apply the following smoothers in sequence to the input batchie.data.Screen:
- MergeMinPlateSmoother
- MergeTopBottomPlateSmoother
- OptimalSizeSmoother
- NPlatePerCellLineSmoother
- class batchie.retrospective.FixedSizeSmoother(plate_size: int)¶
Bases:
RetrospectivePlateSmoother
Filter out all plates smaller than the given size and randomly truncate all plates larger than the given size down to that size.
- class batchie.retrospective.MergeMinPlateSmoother(min_size: int)¶
Bases:
RetrospectivePlateSmoother
Iteratively combine the smallest two plates for each sample until all plates are above a user-specified size.
- class batchie.retrospective.MergeTopBottomPlateSmoother(n_iterations: int)¶
Bases:
RetrospectivePlateSmoother
Iteratively combine the largest and smallest plates for each sample. Runs for a user-specified number of iterations.
- class batchie.retrospective.NPlatePerCellLineSmoother(min_n_cell_line_plates: int)¶
Bases:
RetrospectivePlateSmoother
Remove all experiments involving cell lines which have fewer than the user-specified min_n_cell_line_plates plates.
- class batchie.retrospective.OptimalSizeSmoother¶
Bases:
RetrospectivePlateSmoother
The cost function for any particular plate size is the sum of two terms: the first is the number of experiments that must be thrown out entirely because they are in plates below the threshold; the second is the number of experiments that need to be trimmed from plates that are over the threshold. This smoother optimizes this cost function, then drops all plates smaller than the optimal size and sub-samples all plates larger than the optimal size until all plates are the same size.
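The cost function described above can be written out directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, plates are represented only by their sizes, and candidate thresholds are restricted to the observed plate sizes (between two consecutive sizes the dropped term is constant and the trimmed term only shrinks, so a minimum always lies on an observed size).

```python
def smoothing_cost(plate_sizes, threshold):
    # Experiments lost because their whole plate is below the threshold.
    dropped = sum(s for s in plate_sizes if s < threshold)
    # Experiments trimmed from plates that exceed the threshold.
    trimmed = sum(s - threshold for s in plate_sizes if s > threshold)
    return dropped + trimmed


def optimal_plate_size(plate_sizes):
    candidates = sorted(set(plate_sizes))
    return min(candidates, key=lambda t: smoothing_cost(plate_sizes, t))


sizes = [2, 10, 10, 12, 30]
best = optimal_plate_size(sizes)
```

For these sizes a threshold of 10 discards the single 2-experiment plate and trims 22 experiments from the two larger plates, which is cheaper than any other candidate.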
- class batchie.retrospective.PairwisePlateGenerator(subset_size: int, anchor_size: int)¶
Bases:
RetrospectivePlateGenerator
- class batchie.retrospective.PlatePermutationPlateGenerator(force_include_plate_names: list[str] | None = None)¶
Bases:
RetrospectivePlateGenerator
This generator will create new plates by permuting the plate labels.
Plates can be excluded from permutation with the force_include_plate_names argument.
- class batchie.retrospective.SampleSegregatingPermutationPlateGenerator(max_plate_size: int)¶
Bases:
RetrospectivePlateGenerator
This generator will generate plates that only contain experiments for a single sample. If there are more than max_plate_size experiments for a single sample, then the experiments will be split across multiple equally sized plates.
- class batchie.retrospective.SparseCoverPlateGenerator(reveal_single_treatment_experiments: bool)¶
- batchie.retrospective.calculate_mse(observed_screen: Screen, thetas: ThetaHolder) float¶
Calculate the mean squared error between the masked observations and the unmasked observations
- Parameters:
observed_screen – A Screen that is fully observed
thetas – The set of model parameters to use for prediction
- Returns:
The average mean squared error between predicted and observed values
- batchie.retrospective.create_plate_balanced_holdout_set_among_masked_plates(screen: Screen, fraction: float, rng: numpy.random.BitGenerator) (Screen, Screen)¶
Create a holdout set from a retrospective screen (where all data is observed but some plates are artificially masked) by sampling a fraction of each unobserved plate.
- Parameters:
screen – The screen to create a holdout set for
fraction – The fraction of each unobserved plate to hold out
- Returns:
A tuple of (training_screen, holdout_screen)
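The idea of holding out the same fraction of every unobserved plate can be sketched on plain arrays. Everything here is illustrative: `plate_balanced_holdout_mask` is a hypothetical helper, the plate membership is represented as an integer array rather than a batchie Screen, and a `numpy.random.Generator` is used for sampling.

```python
import numpy as np


def plate_balanced_holdout_mask(plate_ids, masked_plate_ids, fraction, rng):
    """Hypothetical helper: boolean mask marking the held-out experiments."""
    plate_ids = np.asarray(plate_ids)
    holdout = np.zeros(plate_ids.shape[0], dtype=bool)
    for plate_id in masked_plate_ids:
        # Indices of the experiments that belong to this masked plate.
        members = np.flatnonzero(plate_ids == plate_id)
        n_holdout = int(round(fraction * members.size))
        # Sample without replacement so each experiment is held out at most once.
        chosen = rng.choice(members, size=n_holdout, replace=False)
        holdout[chosen] = True
    return holdout


rng = np.random.default_rng(0)
plate_ids = np.repeat([0, 1, 2], 10)  # three plates of ten experiments each
mask = plate_balanced_holdout_mask(plate_ids, masked_plate_ids=[1, 2], fraction=0.2, rng=rng)
print(mask.sum())
```

Experiments outside the masked plates are never held out, so the observed plate 0 stays entirely in the training portion.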
- batchie.retrospective.create_random_holdout(screen: Screen, fraction: float, rng: numpy.random.BitGenerator) (Screen, Screen)¶
Create a random holdout set from a screen, sized as the given fraction of the original screen.
- Parameters:
screen – The screen to create a holdout set for
fraction – The fraction of the screen to hold out
- Returns:
A tuple of (training_screen, holdout_screen)
- batchie.retrospective.reveal_plates(screen: Screen, plate_ids: list[int]) Screen¶
Utility function to reveal observations in the masked screen from the observed screen.
- Parameters:
screen – A batchie.data.Screen that is partially masked, but with real observations present in the internal observation array
plate_ids – The plate ids to reveal
batchie.sampling module¶
- batchie.sampling.sample(model, results: ThetaHolder, seed: int, n_chains: int = None, chain_index: int = None, n_burnin: int = None, thin: int = None, progress_bar=False) ThetaHolder¶
Sample from the model posterior using the given parameters.
- Parameters:
model – The model which will be sampled from.
results – The object which will store the results
seed – The seed to use for the random number generator
n_chains – The number of parallel chains to run
chain_index – The index of the current chain
n_burnin – The number of burnin steps to run
thin – The thinning factor
progress_bar – Whether to display a progress bar
- Returns:
A ThetaHolder containing the sampled parameters
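Burn-in and thinning are standard MCMC bookkeeping: the first n_burnin steps are discarded, and afterwards only every thin-th state is kept. The sketch below shows that pattern with a toy model. `ToyRandomWalk`, `sample_chain` and the `n_samples` argument are hypothetical and do not reflect how batchie.sampling.sample is implemented or how it handles seeds and chains.

```python
import numpy as np


class ToyRandomWalk:
    """Hypothetical stand-in for a model exposing step() and get_model_state()."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0.0
        self.n_steps = 0

    def step(self):
        self.state += self.rng.normal()
        self.n_steps += 1

    def get_model_state(self):
        return self.state


def sample_chain(model, n_samples, n_burnin, thin):
    # Discard the burn-in steps.
    for _ in range(n_burnin):
        model.step()
    kept = []
    for _ in range(n_samples):
        # Advance `thin` steps and keep only the last state.
        for _ in range(thin):
            model.step()
        kept.append(model.get_model_state())
    return kept


toy = ToyRandomWalk(np.random.default_rng(0))
draws = sample_chain(toy, n_samples=5, n_burnin=100, thin=10)
print(len(draws), toy.n_steps)
```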
batchie.synergy module¶
- batchie.synergy.calculate_synergy(sample_ids: numpy.ndarray, treatment_ids: numpy.ndarray, observation: numpy.ndarray, strict: bool = False)¶
Calculate synergy for a given set of observations, sample ids, and treatment ids.
If single treatment observations for all of the treatments in a multi-treatment observation are not present, the observation is skipped. If strict is True, an error is raised instead.
batchie.distance.mse module¶
- class batchie.distance.mse.MSEDistance(sigmoid: bool = True)¶
Bases:
DistanceMetric
Mean squared error distance metric
- distance(a: numpy.ndarray, b: numpy.ndarray)¶
Calculate the distance between two arrays of model predictions.
- Parameters:
a – The first array of model predictions.
b – The second array of model predictions.
- Returns:
The distance between the two arrays.
batchie.models.sparse_combo module¶
- class batchie.models.sparse_combo.LegacySparseDrugComboImpl(n_dims: int, n_drugdoses: int, n_clines: int, intercept: bool = True, fake_intercept: bool = True, individual_eff: bool = True, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0, **kwargs)¶
Bases:
object
Original implementation of Bayesian tensor factorization model for predicting combination drug response. Preserved here without changes to ensure reproducibility of results.
- bliss(cline: numpy.ndarray, dd1: numpy.ndarray, dd2: numpy.ndarray)¶
- encode_obs()¶
- ess_pars()¶
- get(attr, ix)¶
- mcmc_step() None¶
- n_obs()¶
- predict(cline: numpy.ndarray, dd1: numpy.ndarray, dd2: numpy.ndarray)¶
- predict_single_drug(cline: numpy.ndarray, dd1: numpy.ndarray)¶
- reset_model()¶
- class batchie.models.sparse_combo.SparseDrugCombo(experiment_space: ExperimentSpace, n_embedding_dimensions: int, fake_intercept: bool = True, individual_eff: bool = True, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0, rng: numpy.random.Generator | None = None, predict_interactions: bool = False, interaction_log_transform: bool = True, intercept: bool = True)¶
Bases:
BayesianModel, MCMCModel
- get_model_state() SparseDrugComboMCMCSample¶
Get the internal state of the model.
- n_obs() int¶
Return the number of observations that have been added to the model.
- Returns:
Integer number of observations
- reset_model()¶
Reset the internal state of the model to its initial state.
- property rng: numpy.random.Generator¶
Return the PRNG for this model instance.
- Returns:
The PRNG for this model instance.
- set_rng(rng: numpy.random.Generator)¶
Set the PRNG for this model instance.
- Parameters:
rng – The PRNG to use.
- step()¶
Advance the internal state of the model by one step.
In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.
- class batchie.models.sparse_combo.SparseDrugComboMCMCSample(W: numpy.ndarray, W0: numpy.ndarray, V2: numpy.ndarray, V1: numpy.ndarray, V0: numpy.ndarray, alpha: float, precision: float)¶
Bases:
Theta
A single sample from the MCMC chain for the sparse drug combo model
- V0: numpy.ndarray¶
- V1: numpy.ndarray¶
- V2: numpy.ndarray¶
- W: numpy.ndarray¶
- W0: numpy.ndarray¶
- alpha: float¶
- classmethod from_dicts(private_params, shared_params)¶
Instantiate batchie.core.Theta from dictionary
- Returns:
a dictionary mapping class variables to arrays/numerical values.
- precision: float¶
- predict_conditional_mean(data: ScreenBase) numpy.ndarray¶
Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.
- Returns:
An array of means for each item in the Experiment.
- predict_conditional_variance(data: ScreenBase) numpy.ndarray¶
Predict the conditional variance of a batchie.data.ExperimentBase.
- Returns:
An array of variances for each item in the Experiment.
- predict_viability(data: ScreenBase) numpy.ndarray¶
Predict the conditional mean of a batchie.data.ExperimentBase in viability space.
- Returns:
An array of means for each item in the Experiment.
- private_parameters_dict() dict[str, numpy.ndarray]¶
The private parameters of a batchie.core.Theta.
- Returns:
a dictionary mapping class variables to arrays/numerical values.
- batchie.models.sparse_combo.interactions_to_logits(interaction: numpy.ndarray, single_effects: numpy.ndarray, log_transform: bool)¶
- batchie.models.sparse_combo.predict(mcmc_sample: SparseDrugComboMCMCSample, data: ScreenBase, viability: bool)¶
- batchie.models.sparse_combo.predict_single_drug(mcmc_sample: SparseDrugComboMCMCSample, data: ScreenBase, viability: bool)¶
batchie.models.sparse_combo_interaction module¶
- class batchie.models.sparse_combo_interaction.LegacySparseDrugComboInteractionImpl(n_dims: int, n_drugdoses: int, n_clines: int, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0)¶
Bases:
object
This is the original implementation of the sparse drug combo interaction model. Preserved here without changes to ensure reproducibility of results.
- encode_obs()¶
- mcmc_step() None¶
- n_obs()¶
- reset_model()¶
- class batchie.models.sparse_combo_interaction.SparseDrugComboInteraction(experiment_space: ExperimentSpace, n_embedding_dimensions: int, mult_gamma_proc: bool = True, local_shrinkage: bool = True, a0: float = 1.1, b0: float = 1.1, min_Mu: float = -10.0, max_Mu: float = 10.0)¶
Bases:
BayesianModel, MCMCModel
- get_model_state() SparseDrugComboInteractionMCMCSample¶
Get the internal state of the model.
- n_obs() int¶
Return the number of observations that have been added to the model.
- Returns:
Integer number of observations
- reset_model()¶
Reset the internal state of the model to its initial state.
- property rng: numpy.random.Generator¶
Return the PRNG for this model instance.
- Returns:
The PRNG for this model instance.
- set_rng(rng: numpy.random.Generator)¶
Set the PRNG for this model instance.
- Parameters:
rng – The PRNG to use.
- step()¶
Advance the internal state of the model by one step.
In the case of an MCMC model, this would mean taking one more MCMC step. Other types of models should implement accordingly.
- class batchie.models.sparse_combo_interaction.SparseDrugComboInteractionMCMCSample(W: numpy.ndarray, V2: numpy.ndarray, precision: float, single_effect_lookup: dict)¶
Bases:
Theta
A single sample from the MCMC chain for the sparse drug combo model
- V2: numpy.ndarray¶
- W: numpy.ndarray¶
- classmethod from_dicts(private_params: dict, shared_params: dict)¶
Instantiate batchie.core.Theta from dictionary
- Returns:
a dictionary mapping class variables to arrays/numerical values.
- precision: float¶
- predict_conditional_mean(data: ScreenBase) numpy.ndarray¶
Predict the conditional mean of a batchie.data.ExperimentBase in modeling space.
- Returns:
An array of means for each item in the Experiment.
- predict_conditional_variance(data: ScreenBase) numpy.ndarray¶
Predict the conditional variance of a batchie.data.ExperimentBase.
- Returns:
An array of variances for each item in the Experiment.
- predict_viability(data: ScreenBase) numpy.ndarray¶
Predict the conditional mean of a batchie.data.ExperimentBase in viability space.
- Returns:
An array of means for each item in the Experiment.
- private_parameters_dict() dict[str, numpy.ndarray]¶
The private parameters of a batchie.core.Theta.
- Returns:
a dictionary mapping class variables to arrays/numerical values.
The shared parameters of a batchie.core.Theta.
- Returns:
a dictionary mapping class variables to arrays.
- single_effect_lookup: dict¶
batchie.policies.k_per_sample module¶
batchie.scoring.gaussian_dbal module¶
- class batchie.scoring.gaussian_dbal.GaussianDBALScorer(max_chunk=50, max_triples=5000, **kwargs)¶
Bases:
Scorer
- score(plates: dict[int, ScreenSubset], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]¶
- batchie.scoring.gaussian_dbal.dbal_fast_gauss_scoring_vectorized(predictions: numpy.ndarray, variances: numpy.ndarray, distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)¶
Compute the Monte Carlo approximation of the DBAL ideal score \(\widehat{s}_n(P)\) in a vectorized way for each of the given plates.
\[\widehat{s}_n(P) = \frac{1}{{m \choose 3}} \sum_{i < j < k} d(\theta_i, \theta_j) L_{\theta_i}(\theta_j, \theta_k ; P ) e^{2H_{\theta_i}(P)}\]
- Parameters:
predictions – model predictions over all plates of shape (n_plates, n_thetas, n_experiments)
variances – an array of variances for model predictions over all plates, of shape (n_plates, n_thetas, max_n_experiments). For plates smaller than the maximum size, the variances should be padded with NaNs up to the maximum size.
distance_matrix – a square array of shape (n_thetas, n_thetas) of distances between model parameterizations
rng – PRNG
max_combos – the maximum number of theta triplets to sample
distance_factor – a multiplicative factor for the distance matrix
- Returns:
an array of shape (n_plates,) of approximated scores for each plate in predictions
- batchie.scoring.gaussian_dbal.dbal_fast_gaussian_scoring_heteroscedastic(per_plate_predictions: list[numpy.ndarray], variances: list[numpy.ndarray], distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)¶
- batchie.scoring.gaussian_dbal.dbal_fast_gaussian_scoring_homoscedastic(per_plate_predictions: list[numpy.ndarray], variances: numpy.ndarray, distance_matrix: numpy.ndarray, rng: numpy.random.Generator, max_combos: int = 5000, distance_factor: float = 1.0)¶
- Parameters:
per_plate_predictions – Ragged array of model predictions of length n_plates, each list element is an array of shape (n_thetas, n_plate_experiments)
variances – an array of variances for model predictions over all plates, of shape (n_plates, n_thetas).
distance_matrix – a square array of shape (n_thetas, n_thetas) of distances between model parameterizations
rng – PRNG
max_combos – the maximum number of theta triplets to sample
distance_factor – a multiplicative factor for the distance matrix
- Returns:
an array of shape (n_plates,) of approximated scores for each plate in per_plate_predictions
- batchie.scoring.gaussian_dbal.generate_combination_at_sorted_index(index, n, k)¶
Generate all range(n) choose k combinations.
Represent each combination as a descending sorted tuple.
Sort all the tuples in ascending order, and return the tuple that would be found at index.
Do this without materializing the actual list of combinations.
- Parameters:
index – The index of the combination to return
n – The number of items to choose from
k – The number of items to choose
- Returns:
A tuple of length k representing the combination
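Looking up a combination by its position without building the full list is commonly done with the combinatorial number system. The sketch below is one such approach, under the assumption that it matches the ordering described above. It is not taken from the batchie source, and `combination_at_sorted_index` is a hypothetical name. A brute-force comparison against `itertools.combinations` is included as a check.

```python
from itertools import combinations
from math import comb


def combination_at_sorted_index(index, n, k):
    """Hypothetical helper: return the combination at `index` as a descending tuple."""
    if not 0 <= index < comb(n, k):
        raise IndexError("index out of range for n choose k")
    result = []
    remaining = index
    upper = n  # exclusive upper bound for the next element
    for i in range(k, 0, -1):
        # Find the largest c below `upper` with comb(c, i) <= remaining.
        c = i - 1
        while c + 1 < upper and comb(c + 1, i) <= remaining:
            c += 1
        result.append(c)
        remaining -= comb(c, i)
        upper = c
    return tuple(result)


# Brute-force reference: descending tuples sorted in ascending order.
n, k = 6, 3
expected = sorted(tuple(sorted(c, reverse=True)) for c in combinations(range(n), k))
fast = [combination_at_sorted_index(i, n, k) for i in range(comb(n, k))]
print(fast == expected)
```

Because each lookup only evaluates binomial coefficients, random triples can be drawn by sampling integer indices rather than enumerating every combination.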
- batchie.scoring.gaussian_dbal.get_combination_at_sorted_index(index, n, k)¶
- batchie.scoring.gaussian_dbal.pad_ragged_arrays_to_dense_array(arrays: list[numpy.ndarray], pad_value: float = 0.0)¶
Given a list of arrays, each with N dimensions and possibly different sizes, return a dense array of N + 1 dimensions with shape (len(arrays), maximum_of_dimension_0, … maximum_of_dimension_N), in which every array is padded to the maximum size. The padding value defaults to 0.0.
- Parameters:
arrays – A list of arrays
pad_value – A floating point number (default is 0)
- Returns:
A dense array of the arrays
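The padding behaviour described above can be sketched with plain NumPy. This is an illustrative re-implementation under the stated description, not the library's code, and the name `pad_ragged` is hypothetical.

```python
import numpy as np


def pad_ragged(arrays, pad_value=0.0):
    """Hypothetical helper: stack same-rank arrays of differing sizes into one dense array."""
    n_dims = arrays[0].ndim
    # Largest extent along each dimension across all input arrays.
    max_shape = tuple(max(a.shape[d] for a in arrays) for d in range(n_dims))
    dense = np.full((len(arrays), *max_shape), pad_value, dtype=float)
    for i, a in enumerate(arrays):
        # Copy each array into the leading corner of its slot; the rest stays padded.
        target = (i, *(slice(0, s) for s in a.shape))
        dense[target] = a
    return dense


ragged = [np.ones((2, 3)), np.ones((1, 4))]
dense = pad_ragged(ragged)
nan_padded = pad_ragged(ragged, pad_value=np.nan)
print(dense.shape)
```

Passing `pad_value=np.nan` produces the NaN-padded layout that the vectorized scoring function expects for its variances.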
batchie.scoring.main module¶
- class batchie.scoring.main.ChunkedScoresHolder(size: int)¶
Bases:
ScoresHolder
- add_score(plate_id: int, score: float)¶
Add a score for a given plate.
- Parameters:
plate_id – The plate id to add the score for.
score – The score to add.
- combine(other: ScoresHolder)¶
- classmethod concat(scores_list: list[ScoresHolder])¶
- get_score(plate_id: int) float¶
Get the score for a given plate.
- Parameters:
plate_id – The plate id to get the score for.
- Returns:
The score for the given plate.
- classmethod load_h5(fn)¶
- plate_id_with_minimum_score(eligible_plate_ids: list[int] = None) int¶
Get the plate id with the minimum score.
- Parameters:
eligible_plate_ids – The set of plates to consider.
- Returns:
The plate id with the minimum score.
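Selecting the lowest-scoring plate, optionally restricted to an eligible subset, can be sketched with a plain dictionary. This stand-in ignores the chunked storage used by ChunkedScoresHolder, and the free function below is hypothetical.

```python
def plate_id_with_minimum_score(scores, eligible_plate_ids=None):
    """Hypothetical helper: `scores` maps plate id to score."""
    # Consider every scored plate unless a restricted set is given.
    candidates = scores.keys() if eligible_plate_ids is None else eligible_plate_ids
    return min(candidates, key=lambda plate_id: scores[plate_id])


scores = {3: 0.7, 5: 0.2, 8: 0.4}
print(plate_id_with_minimum_score(scores))
print(plate_id_with_minimum_score(scores, eligible_plate_ids=[3, 8]))
```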
- save_h5(fn)¶
- batchie.scoring.main.score_chunk(scorer: Scorer, thetas: ThetaHolder, screen: Screen, distance_matrix: ChunkedDistanceMatrix, rng: numpy.random.Generator | None = None, progress_bar: bool = False, n_chunks: int = 1, chunk_index: int = 0, batch_plate_ids: list[int] | None = None) ChunkedScoresHolder¶
Score a subset of all unobserved plates in a screen.
- Parameters:
scorer – The scorer to use for scoring
thetas – The samples to use for scoring
screen – The screen to score
distance_matrix – The distance matrix to use for scoring
rng – PRNG to use for sampling
progress_bar – Whether to show a progress bar
n_chunks – The number of chunks to split the unobserved plates into
chunk_index – The index of the chunk to score
batch_plate_ids – A list of plate ids that have already been selected in the batch
- Returns:
ChunkedScoresHolder containing the scores for each plate in the current chunk
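The n_chunks and chunk_index parameters suggest that the unobserved plates are partitioned so separate workers can each score one part. A minimal sketch of such a partition follows. The use of `numpy.array_split` and the helper name `plate_ids_for_chunk` are assumptions for this example, not a description of how batchie divides the plates.

```python
import numpy as np


def plate_ids_for_chunk(unobserved_plate_ids, n_chunks, chunk_index):
    """Hypothetical helper: plate ids handled by one worker."""
    # array_split tolerates counts that do not divide evenly.
    chunks = np.array_split(np.asarray(unobserved_plate_ids), n_chunks)
    return chunks[chunk_index].tolist()


all_ids = list(range(7))
chunks = [plate_ids_for_chunk(all_ids, 3, i) for i in range(3)]
print(chunks)
```

Each plate id lands in exactly one chunk, so the per-chunk results can be merged afterwards without overlap.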
- batchie.scoring.main.select_next_plate(scores: ScoresHolder, screen: Screen, policy: PlatePolicy | None, batch_plate_ids: list[int] | None = None, rng: numpy.random.Generator | None = None) Plate | None¶
Select the next batchie.data.Plate to observe
- Parameters:
scores – The scores for each plate
screen – The screen which defines the set of plates to choose from
policy – The policy to use for plate selection
batch_plate_ids – The plates currently selected in the batch
rng – PRNG to use for sampling
- Returns:
The next plate to observe, or None if no plate is selected
batchie.scoring.size module¶
- class batchie.scoring.size.SizeScorer¶
Bases:
Scorer
A scorer that returns the number of conditions in the Plate as the score.
- score(plates: dict[int, Plate], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]¶
batchie.scoring.rand module¶
- class batchie.scoring.rand.RandomScorer¶
Bases:
Scorer
A scorer that returns a random score for each plate, used for baseline comparison
- score(plates: dict[int, Plate], distance_matrix: ChunkedDistanceMatrix, samples: ThetaHolder, rng: numpy.random.Generator, progress_bar: bool) dict[int, float]¶