Command Line Interface

Batchie provides a suite of command-line utilities for scripting the pipeline end to end.

train_model

Train the provided model by iteratively calling its #step method, conditioned on the provided data. Collect the model parameters and save them to a file.

usage: train_model [-h] [-v] [-P] --data DATA --model MODEL
                   [--model-param KEY=VALUE] --output OUTPUT
                   [--n-samples N_SAMPLES] [--n-burnin N_BURNIN] [--thin THIN]
                   [--n-chains N_CHAINS] [--chain-index CHAIN_INDEX]
                   [--seed SEED]

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--data

A batchie Screen object in hdf5 format.

--model

Fully qualified name of the BayesianModel class to use.

--model-param

Model parameters, each given as KEY=VALUE.

--output

Output file to save learned model parameters to.

--n-samples

Number of samples to save from the posterior distribution

Default: 100

--n-burnin

Number of steps to iterate before samples are saved

Default: 1000

--thin

Thinning factor for samples after burn-in is complete. A value of 2 means every second sample is saved, etc.

Default: 10

--n-chains

Number of parallel instances of this model that are being run. This information is used for RNG seed initialization.

Default: 1

--chain-index

Index of this model in the series of parallel model runs.

Default: 0

--seed

Seed to use for PRNG.

Default: 0
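
As a rough illustration of how the sampling options interact, the sketch below assumes that one sample is kept every --thin steps once burn-in is complete, so a run takes n_burnin + n_samples * thin steps in total; the actual sampler may count steps differently. It also builds one train_model argument list per chain, using only flags from the usage above. The file names and the model class name my_package.MyModel are placeholders, not names taken from Batchie.

```python
import shlex


def total_steps(n_samples=100, n_burnin=1000, thin=10):
    """Steps needed, assuming one saved sample every `thin` steps after burn-in."""
    return n_burnin + n_samples * thin


def train_model_commands(n_chains, data="screen.h5",
                         model="my_package.MyModel", seed=0):
    """Build one argv list per chain (file and class names are hypothetical)."""
    commands = []
    for chain_index in range(n_chains):
        commands.append([
            "train_model",
            "--data", data,
            "--model", model,
            "--output", f"thetas_chain{chain_index}.h5",
            "--n-chains", str(n_chains),
            "--chain-index", str(chain_index),
            "--seed", str(seed),
        ])
    return commands


if __name__ == "__main__":
    # With the documented defaults: 1000 + 100 * 10
    print(total_steps())
    for argv in train_model_commands(n_chains=2):
        print(shlex.join(argv))
```

The same seed is passed to every chain here on the assumption that --n-chains and --chain-index are what distinguish the random streams, as the --n-chains description suggests; if that does not hold for your model, use a different seed per chain.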

evaluate_model

This is a utility for evaluating model performance by predicting over an observed ‘test screen’.

usage: evaluate_model [-h] [-v] [-P] --screen SCREEN --thetas THETAS
                      [THETAS ...] --output OUTPUT [--seed SEED]

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--screen

A batchie Screen in hdf5 format with all plates observed.

--thetas

A batchie ThetaHolder in hdf5 format.

--output

Output ModelEvaluation object in h5 format.

--seed

Seed to use for PRNG.

Default: 0

calculate_distance_matrix

calculate_distance_matrix.py

usage: calculate_distance_matrix [-h] [-v] [-P] --data DATA --thetas THETAS
                                 [THETAS ...] --distance-metric
                                 DISTANCE_METRIC
                                 [--distance-metric-param KEY=VALUE]
                                 --n-chunks N_CHUNKS --chunk-index CHUNK_INDEX
                                 --output OUTPUT

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--data

A batchie Screen in hdf5 format.

--thetas

A batchie ThetaHolder in hdf5 format.

--distance-metric

Fully qualified name of the DistanceMetric class to use.

--distance-metric-param

Distance metric parameters, each given as KEY=VALUE.

--n-chunks

Number of chunks to split the distance matrix calculation into.

--chunk-index

Which of the n chunks to calculate in this invocation.

--output

Output batchie ChunkedDistanceMatrix in hdf5 format.
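
Because --n-chunks and --chunk-index are both required, a full distance matrix takes one invocation per chunk. The sketch below generates those invocations as argument lists. It assumes chunk indices are zero-based (consistent with the default --chunk-index of 0 in calculate_scores); the file names and the metric class name my_package.MyDistanceMetric are placeholders.

```python
import shlex


def distance_matrix_commands(n_chunks, data="screen.h5",
                             thetas=("thetas_chain0.h5",),
                             metric="my_package.MyDistanceMetric"):
    """Build one calculate_distance_matrix argv list per chunk.

    Assumes zero-based chunk indices; all names are hypothetical.
    """
    commands = []
    for chunk_index in range(n_chunks):
        commands.append([
            "calculate_distance_matrix",
            "--data", data,
            "--thetas", *thetas,  # --thetas accepts one or more files
            "--distance-metric", metric,
            "--n-chunks", str(n_chunks),
            "--chunk-index", str(chunk_index),
            "--output", f"distance_chunk{chunk_index}.h5",
        ])
    return commands


if __name__ == "__main__":
    for argv in distance_matrix_commands(n_chunks=4):
        print(shlex.join(argv))
```

Each printed line can be submitted as an independent job, since every chunk writes to its own output file.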

calculate_scores

calculate_scores.py

usage: calculate_scores [-h] [-v] [-P] --data DATA --thetas THETAS
                        [THETAS ...] --distance-matrix DISTANCE_MATRIX
                        [DISTANCE_MATRIX ...] [--n-chunks N_CHUNKS]
                        [--chunk-index CHUNK_INDEX] --scorer SCORER
                        [--scorer-param KEY=VALUE]
                        [--batch-plate-ids BATCH_PLATE_IDS [BATCH_PLATE_IDS ...]]
                        --output OUTPUT [--seed SEED]

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--data

A batchie Screen object in hdf5 format.

--thetas

A batchie ThetaHolder object in hdf5 format.

--distance-matrix

A batchie ChunkedDistanceMatrix object in hdf5 format.

--n-chunks

Number of chunks to parallelize scoring over.

Default: 1

--chunk-index

Which of the n chunks to calculate in this invocation.

Default: 0

--scorer

Fully qualified name of the scorer class to use.

--scorer-param

Scorer parameters, each given as KEY=VALUE.

--batch-plate-ids

The plate(s) already selected as part of this batch.

Default: []

--output

Location of output h5 file where scores will be saved.

--seed

Seed to use for PRNG.

Default: 0
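
The usage line shows that --distance-matrix accepts several files, so the chunk files written by calculate_distance_matrix can be passed together. The sketch below assembles one calculate_scores argument list under that reading, adding --batch-plate-ids only when some plates are already in the batch. The file names and the scorer class name my_package.MyScorer are placeholders.

```python
import shlex


def calculate_scores_command(distance_chunks, chunk_index=0, n_chunks=1,
                             batch_plate_ids=(), data="screen.h5",
                             thetas=("thetas_chain0.h5",),
                             scorer="my_package.MyScorer"):
    """Build a calculate_scores argv list (all names are hypothetical)."""
    argv = [
        "calculate_scores",
        "--data", data,
        "--thetas", *thetas,
        "--distance-matrix", *distance_chunks,
        "--n-chunks", str(n_chunks),
        "--chunk-index", str(chunk_index),
        "--scorer", scorer,
        "--output", f"scores_chunk{chunk_index}.h5",
    ]
    if batch_plate_ids:
        # Only pass the flag when at least one plate is already selected.
        argv += ["--batch-plate-ids", *[str(p) for p in batch_plate_ids]]
    return argv


if __name__ == "__main__":
    chunks = ["distance_chunk0.h5", "distance_chunk1.h5"]
    print(shlex.join(calculate_scores_command(chunks, batch_plate_ids=[3, 7])))
```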

select_next_plate

select_next_plate.py

usage: select_next_plate [-h] [-v] [-P] --data DATA --scores SCORES
                         [SCORES ...] [--policy POLICY]
                         [--policy-param KEY=VALUE] --output OUTPUT
                         [--seed SEED]
                         [--batch-plate-id BATCH_PLATE_ID [BATCH_PLATE_ID ...]]

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--data

A batchie Screen object in hdf5 format.

--scores

One or more ChunkedScoresHolder objects in hdf5 format.

--policy

Fully qualified name of the PlatePolicy class to use.

--policy-param

Policy parameters, each given as KEY=VALUE.

--output

Location of the output file that will contain the selected plate id.

--seed

Seed to use for PRNG.

Default: 0

--batch-plate-id

The plate(s) currently selected in the batch.

Default: []
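
Several utilities take options of the form KEY=VALUE (here --policy-param). How Batchie parses and type-converts these values is not described in this reference, so the following is only a minimal sketch of one common approach: an argparse action that splits on the first "=" and collects the pairs into a dict of strings. It assumes the option may be repeated. KeyValueAction, parse_policy_params, and the parameter names in the example are hypothetical.

```python
import argparse


class KeyValueAction(argparse.Action):
    """Collect repeated KEY=VALUE options into a dict (hypothetical helper)."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = values.partition("=")
        if not sep or not key:
            parser.error(f"{option_string} expects KEY=VALUE, got {values!r}")
        params = dict(getattr(namespace, self.dest) or {})
        params[key] = value  # values stay strings in this sketch
        setattr(namespace, self.dest, params)


def parse_policy_params(argv):
    """Parse only the --policy-param options from argv."""
    parser = argparse.ArgumentParser(prog="select_next_plate_sketch")
    parser.add_argument("--policy-param", action=KeyValueAction,
                        default={}, metavar="KEY=VALUE")
    return parser.parse_args(argv).policy_param


if __name__ == "__main__":
    print(parse_policy_params(["--policy-param", "alpha=0.5",
                               "--policy-param", "mode=greedy"]))
```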

reveal_plate

This is a utility for revealing plates in a retrospective simulation.

usage: reveal_plate [-h] [-v] [-P] --screen SCREEN --output OUTPUT --plate-id
                    PLATE_ID [PLATE_ID ...]

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--screen

A batchie Screen in hdf5 format with some plates observed.

--output

Where to save the screen with the specified plate(s) revealed.

--plate-id

The plate(s) to reveal.

extract_screen_metadata

extract_screen_metadata.py

usage: extract_screen_metadata [-h] [-v] [-P] --screen SCREEN --output OUTPUT

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--screen

A batchie Screen in hdf5 format.

--output

Output json file to save metadata to.

prepare_retrospective_simulation

This is a utility for revealing plates in a retrospective simulation, calculating the prediction error on the un-revealed plates thus far, and saving the results.

usage: prepare_retrospective_simulation [-h] [-v] [-P] --data DATA
                                        --training-output TRAINING_OUTPUT
                                        --test-output TEST_OUTPUT
                                        [--initial-plate-generator INITIAL_PLATE_GENERATOR]
                                        [--initial-plate-generator-param KEY=VALUE]
                                        [--plate-generator PLATE_GENERATOR]
                                        [--plate-generator-param KEY=VALUE]
                                        [--plate-smoother PLATE_SMOOTHER]
                                        [--plate-smoother-param KEY=VALUE]
                                        [--holdout-fraction HOLDOUT_FRACTION]
                                        [--seed SEED]

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

-P, --progress

Enable progress bar

Default: False

--data

A batchie Screen in hdf5 format.

--training-output

Output training set batchie Screen in hdf5 format.

--test-output

Output test set batchie Screen in hdf5 format.

--initial-plate-generator

Fully qualified name of the InitialRetrospectivePlateGenerator class to use.

--initial-plate-generator-param

Initial plate generator parameters, each given as KEY=VALUE.

--plate-generator

Fully qualified name of the RetrospectivePlateGenerator class to use.

--plate-generator-param

Plate generator parameters, each given as KEY=VALUE.

--plate-smoother

Fully qualified name of the RetrospectivePlateSmoother class to use.

--plate-smoother-param

Plate smoother parameters, each given as KEY=VALUE.

--holdout-fraction

Fraction of data to hold out for testing (proportion of experiments in the test set in the test/train split).

Default: 0.1

--seed

Seed to use for PRNG.

Default: 0
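
To make the effect of --holdout-fraction concrete, the sketch below performs a plain seeded random split of experiment identifiers into training and test sets. It is a generic illustration, not Batchie's implementation: the shuffling strategy, the rounding of the test-set size, and the function name split_holdout are all assumptions.

```python
import random


def split_holdout(experiment_ids, holdout_fraction=0.1, seed=0):
    """Return (train_ids, test_ids) from a seeded random split.

    The test set holds round(n * holdout_fraction) experiments;
    the rounding rule is an assumption of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(experiment_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_holdout(range(50))
    print(len(train), len(test))
```

With 50 experiments and the default fraction of 0.1, this split places 5 experiments in the test set and 45 in the training set.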