BIRDMAn Model API¶

BIRDMAn includes several default models for count regression and also supports custom models. Building a custom model involves writing a new Stan file and creating a Model object through the BIRDMAn API.

Default Models¶

These are the default models that are included in BIRDMAn. They should be usable with minimal knowledge of Stan and are good general purpose models.

class birdman.default_models.NegativeBinomial(table: Table, formula: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5)¶

Fit count data using negative binomial model on full table.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\mu_{ij} &= n_i p_{ij}\\\textrm{alr}(p_i) &= x_i \beta\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]

\[A = \ln{\frac{1}{D}}, \ D = \textrm{Number of features}\]

\[\frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s), \ s \in \mathbb{R}_{>0}\]

Parameters

table (biom.table.Table) – Feature table (features x samples)
formula (str) – Design formula to use in model
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
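The mean structure above can be traced by hand for a single sample. The following is a minimal sketch in plain Python, not BIRDMAn code: the helper names `inverse_alr` and `expected_counts` are hypothetical, and it assumes the first feature is the ALR reference, which the formulas above do not specify.

```python
import math


def inverse_alr(eta):
    """Map D-1 ALR coordinates to D proportions.

    Assumption: the reference feature comes first and has logit 0.
    """
    logits = [0.0] + list(eta)
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_counts(depth, x_i, beta):
    """Compute mu_ij = n_i * p_ij where alr(p_i) = x_i @ beta.

    beta is a list of rows, one per covariate, each with D-1 entries.
    """
    n_alr = len(beta[0])
    eta = [sum(x * row[j] for x, row in zip(x_i, beta)) for j in range(n_alr)]
    return [depth * p for p in inverse_alr(eta)]


# Hypothetical sample: depth 1000, intercept plus one binary covariate,
# D = 4 features (so 3 ALR coordinates).
x_i = [1.0, 1.0]
beta = [[0.2, -0.3, 0.1],   # intercept row
        [0.5, 0.0, -0.5]]   # covariate row
mu = expected_counts(1000, x_i, beta)

# With all coefficients at zero every feature gets an equal share.
mu_flat = expected_counts(1000, x_i, [[0.0] * 3, [0.0] * 3])
```

Because the proportions sum to one, the expected counts for a sample always sum to its depth `n_i`.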

class birdman.default_models.NegativeBinomialSingle(table: Table, feature_id: str, formula: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5)¶

Fit count data using negative binomial model on single feature.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\log(\mu_{ij}) &= \log(\textrm{Depth}_i) + x_i \beta\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]

\[A = \ln{\frac{1}{D}},\ D = \textrm{Number of features}\]

\[\frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\]

Parameters

table (biom.table.Table) – Feature table (features x samples)
feature_id (str) – ID of feature to fit
formula (str) – Design formula to use in model
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
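The log-link formula can be checked with a few lines of arithmetic. This is a sketch under stated assumptions rather than BIRDMAn's implementation: the function names are hypothetical, and the variance formula assumes the mean/dispersion parameterization of Stan's `neg_binomial_2`, which this page does not state explicitly. It also illustrates one way to read the intercept prior mean: with an intercept of A = ln(1/D) and no covariate effect, the expected count is the sample depth divided by the number of features.

```python
import math


def single_feature_mean(depth, x_i, beta):
    """mu_ij from log(mu_ij) = log(Depth_i) + x_i . beta."""
    log_mu = math.log(depth) + sum(x * b for x, b in zip(x_i, beta))
    return math.exp(log_mu)


def nb_variance(mu, phi):
    """Variance under the assumed mean/dispersion form: mu + mu^2 / phi."""
    return mu + mu ** 2 / phi


D = 100                     # hypothetical number of features
A = math.log(1 / D)         # prior mean of the intercept
beta = [A, 0.3]             # intercept and one covariate effect (made up)

mu_baseline = single_feature_mean(5000, [1.0, 0.0], beta)  # depth / D
mu_treated = single_feature_mean(5000, [1.0, 1.0], beta)   # scaled by exp(0.3)
var_baseline = nb_variance(mu_baseline, phi=2.0)
```

A smaller `phi` inflates the variance relative to a Poisson model, which is why the prior is placed on `1/phi`.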

class birdman.default_models.NegativeBinomialLME(table: Table, formula: str, group_var: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5, group_var_prior: float = 1.0)¶

Fit count data using negative binomial model considering subject as a random effect.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\mu_{ij} &= n_i p_{ij}\\\textrm{alr}(p_i) &= x_i \beta + z_i u\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]

\[A = \ln{\frac{1}{D}},\ D = \textrm{Number of features}\]

\[ \begin{align}\begin{aligned}\frac{1}{\phi_j} &\sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\\u_j &\sim \textrm{Normal}(0, u_p),\ u_p \in \mathbb{R}_{>0}\end{aligned}\end{align} \]

Parameters

table (biom.table.Table) – Feature table (features x samples)
formula (str) – Design formula to use in model
group_var (str) – Variable in metadata to use as grouping
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
group_var_prior (float) – Standard deviation for normally distributed prior values of group_var, defaults to 1.0
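The random-effect term `z_i u` adds a group-specific offset to the fixed-effect predictor. The sketch below shows this for a single coordinate in plain Python; `group_indicators` and `linear_predictor` are hypothetical helpers, and the subject labels and effect values are made up for illustration.

```python
def group_indicators(groups):
    """Build one-hot rows z_i from a grouping column (e.g. subject IDs)."""
    levels = sorted(set(groups))
    index = {g: k for k, g in enumerate(levels)}
    rows = [
        [1.0 if index[g] == k else 0.0 for k in range(len(levels))]
        for g in groups
    ]
    return levels, rows


def linear_predictor(x_i, beta, z_i, u):
    """x_i . beta + z_i . u for one sample and one coordinate."""
    fixed = sum(x * b for x, b in zip(x_i, beta))
    random = sum(z * v for z, v in zip(z_i, u))
    return fixed + random


subjects = ["s1", "s1", "s2", "s3"]     # hypothetical group_var column
levels, Z = group_indicators(subjects)

beta = [-1.0, 0.5]                      # intercept and one covariate
u = [0.2, -0.1, 0.0]                    # one offset per subject

# First sample: subject s1, covariate value 0.
eta_0 = linear_predictor([1.0, 0.0], beta, Z[0], u)
```

Samples from the same subject share the same offset, which is what lets the model account for repeated measurements.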

class birdman.default_models.NegativeBinomialLMESingle(table: Table, feature_id: str, formula: str, group_var: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5, group_var_prior: float = 1.0)¶

Fit count data using negative binomial model on single feature, considering subject as a random effect.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\log(\mu_{ij}) &= \log(\textrm{Depth}_i) + x_i \beta + z_i u\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]

\[A = \ln{\frac{1}{D}},\ D = \textrm{Number of features}\]

\[\frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\]

\[u_j \sim \textrm{Normal}(0, u_p),\ u_p \in \mathbb{R}_{>0}\]

Parameters

table (biom.table.Table) – Feature table (features x samples)
feature_id (str) – ID of feature to fit
formula (str) – Design formula to use in model
group_var (str) – Variable in metadata to use as grouping
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
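inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
group_var_prior (float) – Standard deviation for normally distributed prior values of group_var, defaults to 1.0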

Table Model¶

You should inherit/instantiate this class if you are building a custom model for estimating parameters of an entire table at once.

class birdman.model_base.TableModel(table: Table, **kwargs)¶

Fit a model on the entire table at once.

Single Feature Model¶

This class is designed for those interested in parallelizing model fitting across multiple features at once. We do not explicitly perform parallelization but rather leave that to the user.

class birdman.model_base.SingleFeatureModel(table: Table, feature_id: str, **kwargs)¶

Fit a model for a single feature.

Model Iterator¶

This is a helper class for fitting the constituent SingleFeatureModels of a given table. It may be helpful to use this iterator in conjunction with a scheduler or other means of job submission.

class birdman.model_base.ModelIterator(table: Table, model: SingleFeatureModel, num_chunks: Optional[int] = None, **kwargs)¶

Iterate through features in a table.

This class is intended for those looking to parallelize model fitting across individual features rather than across Markov chains.

Parameters

table (biom.table.Table) – Feature table (features x samples)
model (birdman.model_base.SingleFeatureModel) – BIRDMAn model for each individual feature
num_chunks (int) – Number of chunks into which to split the table's features. By default, no chunking is done.
kwargs – Keyword arguments to pass to each feature model
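To make the effect of `num_chunks` concrete, here is a minimal sketch of splitting feature IDs into chunks. It is not BIRDMAn's implementation: `chunk_features` is a hypothetical helper, the ceiling-based chunk size is an assumption, and treating `num_chunks=None` as "one feature at a time" is only one reading of the default described above.

```python
from math import ceil


def chunk_features(feature_ids, num_chunks=None):
    """Split feature IDs into at most num_chunks consecutive groups.

    With num_chunks=None, each feature forms its own group (assumed
    interpretation of "no chunking").
    """
    feature_ids = list(feature_ids)
    if num_chunks is None:
        return [[fid] for fid in feature_ids]
    size = ceil(len(feature_ids) / num_chunks)
    return [
        feature_ids[start:start + size]
        for start in range(0, len(feature_ids), size)
    ]


features = ["F1", "F2", "F3", "F4", "F5"]   # hypothetical feature IDs
per_feature = chunk_features(features)
two_chunks = chunk_features(features, num_chunks=2)
```

Each chunk could then be handed to a separate scheduler job or worker process, with the per-feature models inside a chunk fitted one after another.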

Base Model¶

This is the abstract class from which all BIRDMAn models derive. Note that this class cannot be instantiated on its own.

class birdman.model_base.BaseModel(table: Table, model_path: str)¶

Base BIRDMAn model.

Parameters

table (biom.table.Table) – Feature table (features x samples)
model_path (str) – Filepath to Stan model

add_parameters(param_dict: Optional[dict] = None)¶

Add parameters from dict to be passed to Stan.

compile_model()¶

Compile Stan model.

create_regression(formula: str, metadata: DataFrame)¶

Generate design matrix for count regression modeling.

Parameters

formula (str) – Design formula to use in model
metadata (pd.DataFrame) – Metadata for design matrix
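For readers unfamiliar with design matrices, the following hand-rolled sketch shows the kind of matrix that a one-variable formula such as `diet` typically expands to under treatment (dummy) coding. It does not call BIRDMAn or any formula library; `treatment_design` is a hypothetical helper, and the column naming and choice of reference level are assumptions made for illustration.

```python
def treatment_design(name, values, reference=None):
    """Build an intercept-plus-dummies design matrix for one categorical column.

    The reference level (alphabetically first by default) is absorbed
    into the intercept; every other level gets an indicator column.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    others = [lv for lv in levels if lv != reference]
    columns = ["Intercept"] + [f"{name}[T.{lv}]" for lv in others]
    rows = [
        [1.0] + [1.0 if v == lv else 0.0 for lv in others]
        for v in values
    ]
    return columns, rows


diet = ["low", "high", "low"]            # hypothetical metadata column
columns, X = treatment_design("diet", diet)
```

Each row of `X` plays the role of `x_i` in the model equations, and each column corresponds to one row of coefficients in `beta`.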

fit_model(method: str = 'vi', num_draws: int = 500, mcmc_warmup: Optional[int] = None, mcmc_chains: int = 4, vi_iter: int = 1000, vi_grad_samples: int = 40, vi_require_converged: bool = False, seed: float = 42, mcmc_kwargs: Optional[dict] = None, vi_kwargs: Optional[dict] = None)¶

Fit BIRDMAn model.

Parameters

method (str) – Method by which to fit model, either ‘mcmc’ for Markov Chain Monte Carlo or ‘vi’ (default) for Variational Inference
num_draws (int) – Number of output draws to sample from the posterior, default is 500
mcmc_warmup (int) – Number of warmup iterations for MCMC sampling, default is the same as num_draws
mcmc_chains (int) – Number of Markov chains to use for sampling, default is 4
vi_iter (int) – Number of ADVI iterations to use for VI, default is 1000
vi_grad_samples (int) – Number of MC draws for computing the gradient, default is 40
vi_require_converged (bool) – Whether or not to raise an error if Stan reports that “The algorithm may not have converged”, default is False
seed (int) – Random seed to use for sampling, default is 42
mcmc_kwargs – kwargs to pass into CmdStanModel.sample
vi_kwargs – kwargs to pass into CmdStanModel.variational

specify_model(params: Sequence[str], coords: dict, dims: dict, include_observed_data: bool = False, posterior_predictive: Optional[str] = None, log_likelihood: Optional[str] = None, **kwargs)¶

Specify coordinates and dimensions of model.

Parameters

params (Sequence[str]) – Posterior fitted parameters to include
coords (dict) – Mapping of entries in dims to labels
dims (dict) – Dimensions of parameters in the model
include_observed_data (bool) – Whether to include the original feature table values into the arviz InferenceData object, default is False
posterior_predictive (str, optional) – Name of posterior predictive values from Stan model to include in arviz InferenceData object
log_likelihood (str, optional) – Name of log likelihood values from Stan model to include in arviz InferenceData object
kwargs – Extra keyword arguments to save in specifications dict
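The relationship between `params`, `dims`, and `coords` is easiest to see with concrete values. The sketch below uses plain dictionaries and a hypothetical helper, `undefined_dims`, to check that every dimension named in `dims` has labels in `coords`. The parameter names, dimension names, and labels are all made up, and nothing here calls BIRDMAn itself.

```python
def undefined_dims(params, dims, coords):
    """Return dimension names used by params that have no labels in coords."""
    missing = []
    for param in params:
        for dim in dims.get(param, []):
            if dim not in coords and dim not in missing:
                missing.append(dim)
    return missing


# Hypothetical specification for a model with a coefficient matrix
# and a per-feature dispersion parameter.
params = ["beta", "phi"]
dims = {
    "beta": ["covariate", "feature"],   # beta is covariate x feature
    "phi": ["feature"],                 # one dispersion per feature
}
coords = {
    "covariate": ["Intercept", "diet[T.low]"],
    "feature": ["F1", "F2", "F3"],
}

problems = undefined_dims(params, dims, coords)
incomplete = undefined_dims(params, dims, {"covariate": coords["covariate"]})
```

Running a check like this before fitting can catch a misspelled dimension name early, before the fitted model is converted to an InferenceData object.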

abstract to_inference()¶

Convert fitted model to az.InferenceData.