BIRDMAn Model API

BIRDMAn includes several default models for count regression and also supports custom models. Building a custom model involves writing a new Stan file and creating a Model object through the BIRDMAn API.

Default Models

These are the default models that are included in BIRDMAn. They should be usable with minimal knowledge of Stan and are good general purpose models.

class birdman.default_models.NegativeBinomial(table: Table, formula: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5)

Fit count data using a negative binomial model on the full table.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\mu_{ij} &= n_i p_{ij}\\\textrm{alr}(p_i) &= x_i \beta\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]
\[A = \ln{\frac{1}{D}}, \ D = \textrm{Number of features}\]
\[\frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s), \ s \in \mathbb{R}_{>0}\]
Parameters
  • table (biom.table.Table) – Feature table (features x samples)

  • formula (str) – Design formula to use in model

  • metadata (pd.DataFrame) – Metadata for design matrix

  • beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0

  • inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
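The mean structure above (μ_ij = n_i p_ij with alr(p_i) = x_i β) can be sketched in plain Python. This is an illustrative sketch, not BIRDMAn's Stan code. The helper names `inverse_alr` and `expected_counts` are hypothetical, and we assume the first feature serves as the alr reference, with its log-ratio fixed at 0.

```python
import math


def inverse_alr(eta):
    """Map D-1 additive log-ratios to a D-part composition.

    Assumption (illustrative only): the first feature is the alr
    reference, so its log-ratio is fixed at 0.
    """
    logits = [0.0] + list(eta)
    shift = max(logits)  # subtract the max before exp() for numerical stability
    weights = [math.exp(v - shift) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_counts(depth, x_row, beta):
    """Return mu_i = n_i * p_i for one sample.

    depth -- total count n_i of sample i
    x_row -- covariate values x_i (intercept term first)
    beta  -- coefficients as a list of rows, one row per covariate,
             each row holding D-1 values (one per log-ratio)
    """
    num_ratios = len(beta[0])
    eta = [sum(x_k * beta[k][j] for k, x_k in enumerate(x_row))
           for j in range(num_ratios)]
    return [depth * p for p in inverse_alr(eta)]


D = 3
A = math.log(1 / D)          # prior mean of the intercept
beta = [[A, A],              # intercept row (j = 0)
        [0.5, -0.5]]         # one covariate row (j = 1)
mu_control = expected_counts(1000, [1.0, 0.0], beta)
mu_treated = expected_counts(1000, [1.0, 1.0], beta)
```

Because the inverse alr normalizes the probabilities, the expected counts of each sample always sum to its depth n_i.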

class birdman.default_models.NegativeBinomialSingle(table: Table, feature_id: str, formula: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5)

Fit count data using a negative binomial model on a single feature.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\log(\mu_{ij}) &= \log(\textrm{Depth}_i) + x_i \beta\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]
\[A = \ln{\frac{1}{D}},\ D = \textrm{Number of features}\]
\[\frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\]
Parameters
  • table (biom.table.Table) – Feature table (features x samples)

  • feature_id (str) – ID of feature to fit

  • formula (str) – Design formula to use in model

  • metadata (pd.DataFrame) – Metadata for design matrix

  • beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0

  • inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
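For the single-feature model, the log link with a depth offset and the negative binomial likelihood can be written out directly. This is a minimal sketch, assuming the mean/dispersion parameterization used by Stan's `neg_binomial_2` (mean μ, variance μ + μ²/φ). The function names are hypothetical, and D, the depth, and the coefficients are made-up values.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under NB(mu, phi) in the mean/dispersion
    parameterization (mean mu, variance mu + mu**2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
            + phi * math.log(phi / (mu + phi))
            + y * math.log(mu / (mu + phi)))


def single_feature_mean(depth, x_row, beta):
    """mu_i = exp(log(Depth_i) + x_i . beta) for one feature."""
    linear = sum(x_k * b_k for x_k, b_k in zip(x_row, beta))
    return math.exp(math.log(depth) + linear)


D = 4                                   # features in the full table
beta = [math.log(1 / D), 0.7]           # intercept at its prior mean A, one covariate
mu = single_feature_mean(1000, [1.0, 0.0], beta)   # 1000 * 1/4 = 250
log_p = nb_log_pmf(240, mu, phi=5.0)    # log-likelihood of observing 240 reads
```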

class birdman.default_models.NegativeBinomialLME(table: Table, formula: str, group_var: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5, group_var_prior: float = 1.0)

Fit count data using a negative binomial model with subject (group_var) as a random effect.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\mu_{ij} &= n_i p_{ij}\\\textrm{alr}(p_i) &= x_i \beta + z_i u\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]
\[A = \ln{\frac{1}{D}},\ D = \textrm{Number of features}\]
\[ \begin{align}\begin{aligned}\frac{1}{\phi_j} &\sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\\u_j &\sim \textrm{Normal}(0, u_p),\ u_p \in \mathbb{R}_{>0}\end{aligned}\end{align} \]
Parameters
  • table (biom.table.Table) – Feature table (features x samples)

  • formula (str) – Design formula to use in model

  • group_var (str) – Variable in metadata to use as grouping

  • metadata (pd.DataFrame) – Metadata for design matrix

  • beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0

  • inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5

  • group_var_prior (float) – Standard deviation for normally distributed prior values of the group_var random effects (u), defaults to 1.0
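The random-effect term z_i u can be illustrated by one-hot encoding the grouping variable so that each sample picks up the offset of its own group. This sketch assumes z_i is a one-hot indicator of the sample's group_var level and works on a single log-ratio column. The helper names and subject labels are hypothetical.

```python
def group_indicators(groups):
    """One-hot encode group labels: row i of Z has a 1 in the column of
    sample i's group (an assumed encoding for z_i)."""
    levels = sorted(set(groups))
    column = {g: k for k, g in enumerate(levels)}
    Z = [[1 if column[g] == k else 0 for k in range(len(levels))]
         for g in groups]
    return levels, Z


def linear_predictor(x_row, beta, z_row, u):
    """x_i . beta + z_i . u for a single log-ratio (one column of beta and u)."""
    fixed = sum(x_k * b_k for x_k, b_k in zip(x_row, beta))
    grouped = sum(z_k * u_k for z_k, u_k in zip(z_row, u))
    return fixed + grouped


subjects = ["s2", "s1", "s2", "s3"]     # hypothetical group_var column
levels, Z = group_indicators(subjects)
beta = [-1.0, 0.5]                      # intercept and one covariate
u = [0.2, -0.1, 0.0]                    # one offset per subject, ordered as `levels`
eta = [linear_predictor([1.0, 1.0], beta, z, u) for z in Z]
```

Samples from the same subject share the same offset, which is how the model absorbs repeated measurements.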

class birdman.default_models.NegativeBinomialLMESingle(table: Table, feature_id: str, formula: str, group_var: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5, group_var_prior: float = 1.0)

Fit count data on a single feature using a negative binomial model with subject (group_var) as a random effect.

\[ \begin{align}\begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\\log(\mu_{ij}) &= \log(\textrm{Depth}_i) + x_i \beta + z_i u\end{aligned}\end{align} \]

Priors:

\[ \begin{align}\begin{aligned}\beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases}\end{aligned}\end{align} \]
\[A = \ln{\frac{1}{D}},\ D = \textrm{Number of features}\]
\[\frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\]
\[u_j \sim \textrm{Normal}(0, u_p),\ u_p \in \mathbb{R}_{>0}\]
Parameters
  • table (biom.table.Table) – Feature table (features x samples)

  • feature_id (str) – ID of feature to fit

  • formula (str) – Design formula to use in model

  • group_var (str) – Variable in metadata to use as grouping

  • metadata (pd.DataFrame) – Metadata for design matrix

  • beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0

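  • inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5

  • group_var_prior (float) – Standard deviation for normally distributed prior values of the group_var random effects (u), defaults to 1.0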
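The priors above can be made concrete by drawing one parameter set from them. This is a quick way to see how beta_prior, inv_disp_sd, and group_var_prior scale each parameter. The sketch uses the standard-library `random` module; `draw_priors` is a hypothetical helper, not part of BIRDMAn.

```python
import math
import random


def draw_priors(num_covariates, num_groups, num_features,
                beta_prior=5.0, inv_disp_sd=0.5, group_var_prior=1.0,
                seed=42):
    """Draw one set of parameters from the priors listed above."""
    rng = random.Random(seed)
    A = math.log(1 / num_features)
    # beta_0 ~ Normal(A, B_p); beta_j ~ Normal(0, B_p) for j > 0
    beta = [rng.gauss(A if j == 0 else 0.0, beta_prior)
            for j in range(num_covariates)]
    # 1/phi ~ Lognormal(0, s), so phi is the reciprocal of a lognormal draw
    phi = 1.0 / rng.lognormvariate(0.0, inv_disp_sd)
    # u ~ Normal(0, u_p), one random intercept per group level
    u = [rng.gauss(0.0, group_var_prior) for _ in range(num_groups)]
    return {"beta": beta, "phi": phi, "u": u}


draws = draw_priors(num_covariates=2, num_groups=3, num_features=100)
```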

Table Model

Inherit from this class if you are building a custom model that estimates the parameters of an entire table at once.

class birdman.model_base.TableModel(table: Table, **kwargs)

Fit a model on the entire table at once.

Single Feature Model

This class is designed for users who want to parallelize model fitting across features. BIRDMAn does not perform the parallelization itself; that is left to the user.

class birdman.model_base.SingleFeatureModel(table: Table, feature_id: str, **kwargs)

Fit a model for a single feature.
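Because BIRDMAn leaves parallelization to the user, one option is to dispatch per-feature fits through a standard executor. In this sketch `fit_feature` is a hypothetical stand-in that returns a mean count so the example runs. In practice it would build a SingleFeatureModel for the given feature and call fit_model().

```python
from concurrent.futures import ThreadPoolExecutor


def fit_feature(feature_id, counts):
    """Hypothetical stand-in for fitting one SingleFeatureModel.

    In real use this would construct the model for `feature_id` and call
    fit_model(); here it just returns the mean count so the sketch runs.
    """
    return feature_id, sum(counts) / len(counts)


table = {                       # hypothetical features x samples counts
    "feat_a": [0, 4, 2, 6],
    "feat_b": [10, 12, 8, 10],
    "feat_c": [1, 1, 1, 1],
}

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fit_feature, fid, counts)
               for fid, counts in table.items()]
    results = dict(f.result() for f in futures)
```

On a cluster, submitting each feature (or chunk of features) as its own job is a common alternative to an in-process pool.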

Model Iterator

This is a helper class for fitting the constituent SingleFeatureModels of a given table. It may be helpful to use this iterator in conjunction with a scheduler or other means of job submission.

class birdman.model_base.ModelIterator(table: Table, model: SingleFeatureModel, num_chunks: Optional[int] = None, **kwargs)

Iterate through features in a table.

This class is intended for those looking to parallelize model fitting across individual features rather than across Markov chains.

Parameters
  • table (biom.table.Table) – Feature table (features x samples)

  • model (birdman.model_base.SingleFeatureModel) – BIRDMAn model for each individual feature

  • num_chunks (int) – Number of chunks into which to split the table features. By default, no chunking is performed.

  • kwargs – Keyword arguments to pass to each feature model
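The effect of num_chunks can be sketched as splitting the feature IDs into contiguous, near-equal groups. This assumes a split in the style of `numpy.array_split`, where earlier chunks receive the remainder; BIRDMAn's exact chunking rule may differ, and `chunk_features` is a hypothetical name.

```python
def chunk_features(feature_ids, num_chunks=None):
    """Split feature IDs into num_chunks contiguous, near-equal groups.

    With num_chunks=None every feature stands alone (no chunking).
    """
    feature_ids = list(feature_ids)
    if num_chunks is None:
        return [[fid] for fid in feature_ids]
    base, extra = divmod(len(feature_ids), num_chunks)
    chunks, start = [], 0
    for k in range(num_chunks):
        size = base + (1 if k < extra else 0)  # first `extra` chunks get one more
        chunks.append(feature_ids[start:start + size])
        start += size
    return chunks


chunks = chunk_features(["f1", "f2", "f3", "f4", "f5"], num_chunks=2)
# chunks == [["f1", "f2", "f3"], ["f4", "f5"]]
```

Each chunk can then be handed to a separate worker or job, which keeps the number of submissions fixed regardless of how many features the table has.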

Base Model

This is the abstract class from which all BIRDMAn models derive. Note that this class cannot be instantiated on its own.

class birdman.model_base.BaseModel(table: Table, model_path: str)

Base BIRDMAn model.

Parameters
  • table (biom.table.Table) – Feature table (features x samples)

  • model_path (str) – Filepath to Stan model

add_parameters(param_dict: Optional[dict] = None)

Add parameters from dict to be passed to Stan.

compile_model()

Compile Stan model.

create_regression(formula: str, metadata: DataFrame)

Generate design matrix for count regression modeling.

Parameters
  • formula (str) – Design formula to use in model

  • metadata (pd.DataFrame) – Metadata for design matrix

fit_model(method: str = 'vi', num_draws: int = 500, mcmc_warmup: Optional[int] = None, mcmc_chains: int = 4, vi_iter: int = 1000, vi_grad_samples: int = 40, vi_require_converged: bool = False, seed: float = 42, mcmc_kwargs: Optional[dict] = None, vi_kwargs: Optional[dict] = None)

Fit BIRDMAn model.

Parameters
  • method (str) – Method by which to fit model, either ‘mcmc’ for Markov Chain Monte Carlo or ‘vi’ (default) for Variational Inference

  • num_draws (int) – Number of output draws to sample from the posterior, default is 500

  • mcmc_warmup (int) – Number of warmup iterations for MCMC sampling, default is the same as num_draws

  • mcmc_chains (int) – Number of Markov chains to use for sampling, default is 4

  • vi_iter (int) – Number of ADVI iterations to use for VI, default is 1000

  • vi_grad_samples (int) – Number of MC draws for computing the gradient, default is 40

  • vi_require_converged (bool) – Whether or not to raise an error if Stan reports that “The algorithm may not have converged”, default is False

  • seed (int) – Random seed to use for sampling, default is 42

  • mcmc_kwargs (dict, optional) – Keyword arguments to pass to CmdStanModel.sample

  • vi_kwargs (dict, optional) – Keyword arguments to pass to CmdStanModel.variational
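The fitting options map naturally onto cmdstanpy's sampler arguments. The sketch below shows one plausible translation, including the documented fallback of mcmc_warmup to num_draws. `resolve_fit_kwargs` is hypothetical. The keys are real keyword arguments of `CmdStanModel.sample` and `CmdStanModel.variational`, but the exact mapping BIRDMAn uses is our assumption.

```python
def resolve_fit_kwargs(method="vi", num_draws=500, mcmc_warmup=None,
                       mcmc_chains=4, vi_iter=1000, vi_grad_samples=40,
                       vi_require_converged=False, seed=42,
                       mcmc_kwargs=None, vi_kwargs=None):
    """Translate fit_model-style options into cmdstanpy keyword arguments
    (an assumed mapping, for illustration)."""
    if method == "mcmc":
        kwargs = {
            "chains": mcmc_chains,
            # warmup defaults to the number of draws, as documented above
            "iter_warmup": num_draws if mcmc_warmup is None else mcmc_warmup,
            "iter_sampling": num_draws,
            "seed": seed,
        }
        kwargs.update(mcmc_kwargs or {})   # user-supplied kwargs take precedence
    elif method == "vi":
        kwargs = {
            "iter": vi_iter,
            "grad_samples": vi_grad_samples,
            "output_samples": num_draws,
            "require_converged": vi_require_converged,
            "seed": seed,
        }
        kwargs.update(vi_kwargs or {})
    else:
        raise ValueError("method must be 'mcmc' or 'vi'")
    return kwargs
```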

specify_model(params: Sequence[str], coords: dict, dims: dict, include_observed_data: bool = False, posterior_predictive: Optional[str] = None, log_likelihood: Optional[str] = None, **kwargs)

Specify coordinates and dimensions of model.

Parameters
  • params (Sequence[str]) – Posterior fitted parameters to include

  • coords (dict) – Mapping of entries in dims to labels

  • dims (dict) – Dimensions of parameters in the model

  • include_observed_data (bool) – Whether to include the original feature table values into the arviz InferenceData object, default is False

  • posterior_predictive (str, optional) – Name of posterior predictive values from Stan model to include in arviz InferenceData object

  • log_likelihood (str, optional) – Name of log likelihood values from Stan model to include in arviz InferenceData object

  • kwargs – Extra keyword arguments to save in specifications dict
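coords and dims follow the ArviZ convention: dims maps each parameter name to an ordered list of dimension names, and coords maps each dimension name to its labels. The parameter names `beta_var` and `phi`, the labels, and the `check_dims` helper below are hypothetical, for a fit with two covariates and four features.

```python
covariates = ["Intercept", "treatment[T.yes]"]   # hypothetical design columns
features = ["f1", "f2", "f3", "f4"]

dims = {
    "beta_var": ["covariate", "feature_alr"],    # parameter name -> dimension names
    "phi": ["feature"],
}
coords = {
    "covariate": covariates,                     # dimension name -> labels
    "feature_alr": features[1:],                 # assumes f1 is the alr reference
    "feature": features,
}


def check_dims(dims, coords, shapes):
    """Confirm every dimension has labels and label counts match parameter shapes."""
    for param, dim_names in dims.items():
        if len(dim_names) != len(shapes[param]):
            raise ValueError(f"{param!r} has {len(shapes[param])} axes "
                             f"but {len(dim_names)} dims")
        for dim_name, size in zip(dim_names, shapes[param]):
            if dim_name not in coords:
                raise KeyError(f"no coords for dimension {dim_name!r} of {param!r}")
            if len(coords[dim_name]) != size:
                raise ValueError(f"{param!r}: {dim_name!r} has "
                                 f"{len(coords[dim_name])} labels, expected {size}")
    return True


check_dims(dims, coords, {"beta_var": (2, 3), "phi": (4,)})
```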

abstract to_inference()

Convert fitted model to az.InferenceData.