dynast.estimation.pi

Module Contents

Functions

read_pi(pi_path: str, group_by: Optional[List[str]] = None) → Tuple[Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]]

Read pi CSV as a dictionary.

initializer(model: pystan.StanModel)

Multiprocessing initializer.

beta_mean(alpha: float, beta: float) → float

Calculate the mean of a beta distribution.

beta_mode(alpha: float, beta: float) → float

Calculate the mode of a beta distribution.

guess_beta_parameters(guess: float, strength: int = 5) → Tuple[float, float]

Given a guess of the mean of a beta distribution, calculate beta distribution parameters such that the distribution is skewed by some strength toward the guess.

fit_stan_mcmc(values: numpy.ndarray, p_e: float, p_c: float, guess: float = 0.5, model: pystan.StanModel = None, n_chains: int = 1, n_warmup: int = 1000, n_iters: int = 1000, n_threads: int = 1, seed: Optional[int] = None) → Tuple[float, float, float, float]

Run MCMC to estimate the fraction of labeled RNA.

estimate_pi(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) → str

Estimate the fraction of labeled RNA.

Attributes

_model

dynast.estimation.pi.read_pi(pi_path: str, group_by: Optional[List[str]] = None) → Tuple[Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]]

Read pi CSV as a dictionary.

Parameters
pi_path

Path to the CSV containing pi values

group_by

Columns that were used to group estimation

Returns

Dictionary with barcodes and genes as keys

dynast.estimation.pi._model

dynast.estimation.pi.initializer(model: pystan.StanModel)

Multiprocessing initializer. https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor

This initializer performs a one-time expensive initialization for each process.

dynast.estimation.pi.beta_mean(alpha: float, beta: float) → float

Calculate the mean of a beta distribution. https://en.wikipedia.org/wiki/Beta_distribution

Parameters
alpha

First parameter of the beta distribution

beta

Second parameter of the beta distribution

Returns

Mean of the beta distribution
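As an illustration of the quantity described above, the mean of a beta distribution with parameters alpha and beta is alpha / (alpha + beta). The following is a minimal sketch of that formula only; beta_mean_sketch is a hypothetical name and is not claimed to be dynast's implementation.

```python
def beta_mean_sketch(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta). Hypothetical helper, not dynast's code."""
    return alpha / (alpha + beta)


# Beta(2, 6) has mean 2 / (2 + 6).
example_mean = beta_mean_sketch(2.0, 6.0)
print(example_mean)
```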

dynast.estimation.pi.beta_mode(alpha: float, beta: float) → float

Calculate the mode of a beta distribution. https://en.wikipedia.org/wiki/Beta_distribution

When the distribution is bimodal (alpha, beta < 1), this function returns nan.

Parameters
alpha

First parameter of the beta distribution

beta

Second parameter of the beta distribution

Returns

Mode of the beta distribution
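The behavior described above can be sketched as follows. For alpha, beta > 1 the mode is (alpha - 1) / (alpha + beta - 2), and the bimodal case (alpha, beta < 1) yields nan as documented. The handling of the remaining boundary cases (a mode at 0 or 1, and nan for the uniform case alpha = beta = 1) is an assumption made for this sketch; beta_mode_sketch is a hypothetical name, not dynast's implementation.

```python
import math


def beta_mode_sketch(alpha: float, beta: float) -> float:
    """Mode of Beta(alpha, beta). Hypothetical helper, not dynast's code."""
    if alpha > 1 and beta > 1:
        # Unimodal case with an interior mode.
        return (alpha - 1) / (alpha + beta - 2)
    if alpha < 1 and beta < 1:
        # Bimodal case: no single mode, so return nan as the documentation states.
        return math.nan
    if alpha == 1 and beta == 1:
        # Assumption: the uniform distribution has no unique mode.
        return math.nan
    if alpha <= 1:
        # Assumption: here beta >= 1, so the density peaks at 0.
        return 0.0
    # Assumption: here alpha >= 1 and beta <= 1, so the density peaks at 1.
    return 1.0


print(beta_mode_sketch(3.0, 2.0))   # interior mode
print(beta_mode_sketch(0.5, 0.5))   # bimodal case
```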

dynast.estimation.pi.guess_beta_parameters(guess: float, strength: int = 5) → Tuple[float, float]

Given a guess of the mean of a beta distribution, calculate beta distribution parameters such that the distribution is skewed by some strength toward the guess.

Parameters
guess

Guess of the mean of the beta distribution

strength

Strength of the skew

Returns

Beta distribution parameters (alpha, beta)
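One common way to turn a guessed mean into beta parameters is to treat the strength as a total pseudo-count and split it in proportion to the guess, so that alpha / (alpha + beta) equals the guess and a larger strength concentrates more mass around it. The sketch below shows that idea only. It is an assumption about one possible parameterization, and guess_beta_parameters_sketch is a hypothetical name; the exact formula used by dynast may differ.

```python
from typing import Tuple


def guess_beta_parameters_sketch(guess: float, strength: float = 5) -> Tuple[float, float]:
    """Return (alpha, beta) with mean equal to `guess`.

    Assumption: `strength` is interpreted as alpha + beta. Not dynast's code.
    """
    if not 0 < guess < 1:
        raise ValueError("guess must lie strictly between 0 and 1")
    alpha = guess * strength
    beta = (1 - guess) * strength
    return alpha, beta


alpha, beta = guess_beta_parameters_sketch(0.25, strength=4)
# The mean of the resulting distribution recovers the guess.
recovered_mean = alpha / (alpha + beta)
print(alpha, beta, recovered_mean)
```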

dynast.estimation.pi.fit_stan_mcmc(values: numpy.ndarray, p_e: float, p_c: float, guess: float = 0.5, model: pystan.StanModel = None, n_chains: int = 1, n_warmup: int = 1000, n_iters: int = 1000, n_threads: int = 1, seed: Optional[int] = None) → Tuple[float, float, float, float]

Run MCMC to estimate the fraction of labeled RNA.

Parameters
values

Array of three columns encoding a sparse array in zero-indexed (row, column, value) format, where row is the number of conversions, column is the nucleotide content, and value is the number of reads

p_e

Average mutation rate in unlabeled RNA

p_c

Average mutation rate in labeled RNA

guess

Guess for the fraction of labeled RNA

model

PyStan model to run MCMC with. If not provided, the function tries to use the _model global variable

n_chains

Number of MCMC chains

n_warmup

Number of warmup iterations

n_iters

Number of MCMC iterations, excluding any warmups

n_threads

Number of threads to use

seed

Random seed used for MCMC

Returns

(guess, alpha, beta, pi)
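To make the (row, column, value) encoding of the values parameter concrete, the sketch below converts a small dense count table into zero-indexed triplets, keeping only non-zero entries. The helper name dense_to_triplets and the counts are made up for illustration and are not part of dynast. The sketch uses plain Python lists; passing the result to numpy.array gives a three-column array.

```python
from typing import List


def dense_to_triplets(counts: List[List[int]]) -> List[List[int]]:
    """Encode a dense count table as zero-indexed (row, column, value) triplets.

    Hypothetical helper for illustration; not part of dynast.
    """
    triplets = []
    for row, row_counts in enumerate(counts):
        for column, n_reads in enumerate(row_counts):
            if n_reads > 0:
                # Only non-zero entries are stored in the sparse encoding.
                triplets.append([row, column, n_reads])
    return triplets


# Made-up counts: row index = number of conversions,
# column index = nucleotide content, entry = number of reads.
dense = [
    [5, 0, 2],
    [0, 1, 0],
]
triplets = dense_to_triplets(dense)
print(triplets)
```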

dynast.estimation.pi.estimate_pi(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) → str

Estimate the fraction of labeled RNA.

Parameters
df_aggregates

Pandas dataframe containing aggregate values

p_e

Average mutation rate in unlabeled RNA

p_c

Average mutation rate in labeled RNA

pi_path

Path to write pi estimates

group_by

Columns that were used to group cells

p_group_by

Columns that p_e/p_c estimation was grouped by

n_threads

Number of threads

threshold

Any conversion-content pairs with fewer than this many reads will not be processed

seed

Random seed

nasc

Flag to change behavior to match the NASC-seq pipeline. Specifically, the mode of the estimated beta distribution is used as pi. Defaults to False

model

PyStan model to run MCMC with. If not provided, the function tries to compile the model itself

Returns

Path to pi output