BIRDMAn Model API
BIRDMAn includes several default models for count regression but also supports custom modeling. This involves creating a new Stan file and creating a Model object through the BIRDMAn API.
Default Models
These are the default models that are included in BIRDMAn. They should be usable with minimal knowledge of Stan and are good general purpose models.
- class birdman.default_models.NegativeBinomial(table: Table, formula: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5)
Fit count data using negative binomial model on full table.
\[ \begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\ \mu_{ij} &= n_i p_{ij}\\ \textrm{alr}(p_i) &= x_i \beta\end{aligned} \]
Priors:
\[ \beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases} \]
\[ A = \ln{\frac{1}{D}},\ D = \textrm{Number of features} \]
\[ \frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0} \]
- Parameters
table (biom.table.Table) – Feature table (features x samples)
formula (str) – Design formula to use in model
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
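The link between the ALR coordinates and the expected counts in the model above can be sketched in plain Python. This is an illustration of the formulas only, not BIRDMAn's implementation: the helper names `alr_inverse` and `expected_counts` are hypothetical, and the sketch assumes the reference feature of the ALR transform is the first one.

```python
import math


def alr_inverse(alr_coords):
    """Map D-1 ALR coordinates back to D proportions.

    Assumption: the reference feature comes first, so its log-ratio is 0.
    """
    logits = [0.0] + list(alr_coords)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_counts(depth, alr_coords):
    """Compute mu_ij = n_i * p_ij for one sample with sequencing depth n_i."""
    return [depth * p for p in alr_inverse(alr_coords)]


# Three features with equal log-ratios share the reads equally.
props = alr_inverse([0.0, 0.0])
mu = expected_counts(3000, [0.0, 0.0])
```

In the model, the ALR coordinates for sample i would come from the linear predictor x_i β rather than being supplied by hand.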
- class birdman.default_models.NegativeBinomialSingle(table: Table, feature_id: str, formula: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5)
Fit count data using negative binomial model on single feature.
\[ \begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\ \log(\mu_{ij}) &= \log(\textrm{Depth}_i) + x_i \beta\end{aligned} \]
Priors:
\[ \beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases} \]
\[ A = \ln{\frac{1}{D}},\ D = \textrm{Number of features} \]
\[ \frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0} \]
- Parameters
table (biom.table.Table) – Feature table (features x samples)
feature_id (str) – ID of feature to fit
formula (str) – Design formula to use in model
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
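A small worked example may help in reading the log-link formula and the intercept prior mean A = ln(1/D). The sketch below uses only the standard library; the function names are hypothetical and the numbers are made up for illustration.

```python
import math


def intercept_prior_mean(num_features):
    """A = ln(1 / D), where D is the number of features in the table."""
    return math.log(1.0 / num_features)


def expected_count(depth, x_row, beta):
    """log(mu) = log(Depth_i) + x_i . beta, solved for mu."""
    linear = sum(x * b for x, b in zip(x_row, beta))
    return math.exp(math.log(depth) + linear)


# Hypothetical table with D = 100 features and a sample of 10,000 reads.
A = intercept_prior_mean(100)

# Intercept-only design row [1, 0]: the covariate effect is switched off.
mu = expected_count(10000, [1.0, 0.0], [A, 0.5])
```

With the intercept at its prior mean, the feature is expected to receive 1/D of the sample's reads (here about 100 of 10,000), which is one way to read the choice of A.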
- class birdman.default_models.NegativeBinomialLME(table: Table, formula: str, group_var: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5, group_var_prior: float = 1.0)
Fit count data using negative binomial model considering subject as a random effect.
\[ \begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\ \mu_{ij} &= n_i p_{ij}\\ \textrm{alr}(p_i) &= x_i \beta + z_i u\end{aligned} \]
Priors:
\[ \beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases} \]
\[ A = \ln{\frac{1}{D}},\ D = \textrm{Number of features} \]
\[ \begin{aligned}\frac{1}{\phi_j} &\sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0}\\ u_j &\sim \textrm{Normal}(0, u_p),\ u_p \in \mathbb{R}_{>0}\end{aligned} \]
- Parameters
table (biom.table.Table) – Feature table (features x samples)
formula (str) – Design formula to use in model
group_var (str) – Variable in metadata to use as grouping
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
group_var_prior (float) – Standard deviation for normally distributed prior values of group_var, defaults to 1.0
- class birdman.default_models.NegativeBinomialLMESingle(table: Table, feature_id: str, formula: str, group_var: str, metadata: DataFrame, beta_prior: float = 5.0, inv_disp_sd: float = 0.5, group_var_prior: float = 1.0)
Fit count data using negative binomial model on single feature, considering subject as a random effect.
\[ \begin{aligned}y_{ij} &\sim \textrm{NB}(\mu_{ij}, \phi_j)\\ \log(\mu_{ij}) &= \log(\textrm{Depth}_i) + x_i \beta + z_i u\end{aligned} \]
Priors:
\[ \beta_j \sim \begin{cases} \textrm{Normal}(A, B_p), & j = 0\\ \textrm{Normal}(0, B_p), & j > 0 \end{cases} \]
\[ A = \ln{\frac{1}{D}},\ D = \textrm{Number of features} \]
\[ \frac{1}{\phi_j} \sim \textrm{Lognormal}(0, s),\ s \in \mathbb{R}_{>0} \]
\[ u_j \sim \textrm{Normal}(0, u_p),\ u_p \in \mathbb{R}_{>0} \]
- Parameters
table (biom.table.Table) – Feature table (features x samples)
feature_id (str) – ID of feature to fit
formula (str) – Design formula to use in model
group_var (str) – Variable in metadata to use as grouping
metadata (pd.DataFrame) – Metadata for design matrix
beta_prior (float) – Standard deviation for normally distributed prior values of beta, defaults to 5.0
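inv_disp_sd (float) – Standard deviation for lognormally distributed prior values of 1/phi, defaults to 0.5
group_var_prior (float) – Standard deviation for normally distributed prior values of group_var, defaults to 1.0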
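The random-effect term z_i u in the mixed-effects formulas can be read as a per-group offset added to the linear predictor. The following sketch assumes that z_i is a one-hot indicator of the sample's group, so that z_i u simply selects that group's offset. The function name and the example values are hypothetical.

```python
import math


def expected_count_lme(depth, x_row, beta, group, group_offsets):
    """log(mu) = log(Depth_i) + x_i . beta + z_i . u, solved for mu.

    Assumption: z_i is one-hot over groups, so z_i . u reduces to
    looking up the offset of the group this sample belongs to.
    """
    fixed = sum(x * b for x, b in zip(x_row, beta))
    random = group_offsets[group]
    return math.exp(math.log(depth) + fixed + random)


# Two hypothetical subjects with different baseline abundances.
offsets = {"subject_1": 0.0, "subject_2": math.log(2)}
beta = [math.log(0.01), 0.3]

mu_1 = expected_count_lme(5000, [1.0, 1.0], beta, "subject_1", offsets)
mu_2 = expected_count_lme(5000, [1.0, 1.0], beta, "subject_2", offsets)
```

Here the two samples share depth and covariates, so the ratio mu_2 / mu_1 is determined entirely by the difference in group offsets.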
Table Model
Inherit from or instantiate this class if you are building a custom model that estimates parameters for an entire table at once.
- class birdman.model_base.TableModel(table: Table, **kwargs)
Fit a model on the entire table at once.
Single Feature Model
This class is designed for those interested in parallelizing model fitting across multiple features at once. We do not explicitly perform parallelization but rather leave that to the user.
- class birdman.model_base.SingleFeatureModel(table: Table, feature_id: str, **kwargs)
Fit a model for a single feature.
Model Iterator
This is a helper class for fitting the constituent SingleFeatureModels of a given table. It may be helpful to use this iterator in conjunction with a scheduler or other means of job submission.
- class birdman.model_base.ModelIterator(table: Table, model: SingleFeatureModel, num_chunks: Optional[int] = None, **kwargs)
Iterate through features in a table.
This class is intended for those looking to parallelize model fitting across individual features rather than across Markov chains.
- Parameters
table (biom.table.Table) – Feature table (features x samples)
model (birdman.model_base.SingleFeatureModel) – BIRDMAn model for each individual feature
num_chunks (int) – Number of chunks into which to split the table features. By default, no chunking is performed.
kwargs – Keyword arguments to pass to each feature model
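To make the idea of chunking concrete, the sketch below splits a list of feature IDs into a fixed number of contiguous, near-equal chunks, so that each chunk could be handed to one job in a scheduler. It is an illustration only and makes no claim about how ModelIterator splits features internally. The helper name `split_into_chunks` and the feature IDs are hypothetical.

```python
def split_into_chunks(feature_ids, num_chunks):
    """Split feature IDs into num_chunks contiguous, near-equal chunks.

    Earlier chunks receive one extra feature when the split is uneven,
    which matches the sizing rule of numpy.array_split.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be a positive integer")
    ids = list(feature_ids)
    base, extra = divmod(len(ids), num_chunks)
    chunks = []
    start = 0
    for k in range(num_chunks):
        size = base + (1 if k < extra else 0)
        chunks.append(ids[start:start + size])
        start += size
    return chunks


# Five hypothetical features split across two jobs.
chunks = split_into_chunks(["f1", "f2", "f3", "f4", "f5"], 2)
```

Each chunk can then be fitted in its own job, with the per-feature results collected afterwards.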
Base Model
This is the abstract class from which all BIRDMAn models derive. Note that this class cannot be instantiated on its own.
- class birdman.model_base.BaseModel(table: Table, model_path: str)
Base BIRDMAn model.
- Parameters
table (biom.table.Table) – Feature table (features x samples)
model_path (str) – Filepath to Stan model
- add_parameters(param_dict: Optional[dict] = None)
Add parameters from dict to be passed to Stan.
- compile_model()
Compile Stan model.
- create_regression(formula: str, metadata: DataFrame)
Generate design matrix for count regression modeling.
- Parameters
formula (str) – Design formula to use in model
metadata (pd.DataFrame) – Metadata for design matrix
- fit_model(method: str = 'vi', num_draws: int = 500, mcmc_warmup: Optional[int] = None, mcmc_chains: int = 4, vi_iter: int = 1000, vi_grad_samples: int = 40, vi_require_converged: bool = False, seed: float = 42, mcmc_kwargs: Optional[dict] = None, vi_kwargs: Optional[dict] = None)
Fit BIRDMAn model.
- Parameters
method (str) – Method by which to fit model, either ‘mcmc’ for Markov Chain Monte Carlo or ‘vi’ (default) for Variational Inference
num_draws (int) – Number of output draws to sample from the posterior, default is 500
mcmc_warmup (int) – Number of warmup iterations for MCMC sampling, default is the same as num_draws
mcmc_chains (int) – Number of Markov chains to use for sampling, default is 4
vi_iter (int) – Number of ADVI iterations to use for VI, default is 1000
vi_grad_samples (int) – Number of MC draws for computing the gradient, default is 40
vi_require_converged (bool) – Whether or not to raise an error if Stan reports that “The algorithm may not have converged”, default is False
seed (int) – Random seed to use for sampling, default is 42
mcmc_kwargs – kwargs to pass into CmdStanModel.sample
vi_kwargs – kwargs to pass into CmdStanModel.variational
- specify_model(params: Sequence[str], coords: dict, dims: dict, include_observed_data: bool = False, posterior_predictive: Optional[str] = None, log_likelihood: Optional[str] = None, **kwargs)
Specify coordinates and dimensions of model.
- Parameters
params (Sequence[str]) – Posterior fitted parameters to include
coords (dict) – Mapping of entries in dims to labels
dims (dict) – Dimensions of parameters in the model
include_observed_data (bool) – Whether to include the original feature table values in the arviz InferenceData object, default is False
posterior_predictive (str, optional) – Name of posterior predictive values from Stan model to include in the arviz InferenceData object
log_likelihood (str, optional) – Name of log likelihood values from Stan model to include in the arviz InferenceData object
kwargs – Extra keyword arguments to save in specifications dict
- abstract to_inference()
Convert fitted model to az.InferenceData.
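The methods above suggest a compile, fit, convert sequence. The sketch below wraps that sequence in a small helper that works with any object exposing `compile_model`, `fit_model`, and `to_inference` as documented here. The helper `fit_and_convert` and the `_StubModel` class are hypothetical. The stub exists only so the example runs without Stan installed; in practice a BIRDMAn model would be passed instead. The sketch assumes `specify_model` has already been called, which custom models need before conversion.

```python
def fit_and_convert(model, method="vi", num_draws=500, seed=42):
    """Compile, fit, and convert a model that follows the BaseModel API."""
    model.compile_model()
    model.fit_model(method=method, num_draws=num_draws, seed=seed)
    return model.to_inference()


class _StubModel:
    """Stand-in that records calls; NOT part of BIRDMAn."""

    def __init__(self):
        self.calls = []

    def compile_model(self):
        self.calls.append("compile_model")

    def fit_model(self, method="vi", num_draws=500, seed=42):
        self.calls.append(("fit_model", method, num_draws, seed))

    def to_inference(self):
        self.calls.append("to_inference")
        # A real model would return an arviz InferenceData object here.
        return {"fitted": True}


stub = _StubModel()
result = fit_and_convert(stub, method="mcmc", num_draws=100)
```

Passing `method` explicitly, as in the usage line, avoids relying on the default fitting method.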