dynast.estimation

Submodules

Package Contents

Functions

estimate_alpha(df_counts: pandas.DataFrame, pi_c: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], alpha_path: str, conversions: FrozenSet[str] = frozenset({'TC'}), group_by: Optional[List[str]] = None, pi_c_group_by: Optional[List[str]] = None) → str

Estimate the detection rate alpha.

read_alpha(alpha_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read alpha CSV as a dictionary, with group_by columns as keys.

estimate_p_c(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) → str

Estimate the average conversion rate in labeled RNA.

read_p_c(p_c_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read p_c CSV as a dictionary, with group_by columns as keys.

estimate_p_e(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), group_by: Optional[List[str]] = None) → str

Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0].

estimate_p_e_control(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})})) → str

Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.

estimate_p_e_nasc(df_rates: pandas.DataFrame, p_e_path: str, group_by: Optional[List[str]] = None) → str

Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates.

read_p_e(p_e_path: str, group_by: Optional[List[str]] = None) → Dict[Union[str, Tuple[str, Ellipsis]], float]

Read p_e CSV as a dictionary, with group_by columns as keys.

estimate_pi(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) → str

Estimate the fraction of labeled RNA.

read_pi(pi_path: str, group_by: Optional[List[str]] = None) → Tuple[Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]]

Read pi CSV as a dictionary.

dynast.estimation.estimate_alpha(df_counts: pandas.DataFrame, pi_c: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], alpha_path: str, conversions: FrozenSet[str] = frozenset({'TC'}), group_by: Optional[List[str]] = None, pi_c_group_by: Optional[List[str]] = None) → str

Estimate the detection rate alpha.

Parameters
df_counts

Pandas dataframe containing conversion counts

pi_c

Labeled mutation rate

alpha_path

Path to output CSV containing alpha estimates

conversions

Conversions to consider

group_by

Columns to group by

pi_c_group_by

Columns that were used to group when calculating pi_c

Returns

Path to output CSV containing alpha estimates

dynast.estimation.read_alpha(alpha_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read alpha CSV as a dictionary, with group_by columns as keys.

Parameters
alpha_path

Path to CSV containing alpha values

group_by

Columns to group by, defaults to None

Returns

Dictionary with group_by columns as keys (tuple if multiple)
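The dictionary shape described above can be sketched with the standard library alone. This is a minimal illustration, not dynast's implementation: the helper `read_grouped_csv`, the column names, and the sample CSV contents are all hypothetical, and the real file layout may differ.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Tuple, Union

Key = Union[str, Tuple[str, ...]]


def read_grouped_csv(
    handle: Iterable[str],
    value_column: str,
    group_by: Optional[List[str]] = None,
) -> Union[float, Dict[Key, float]]:
    """Hypothetical reader: map group_by columns to a float value."""
    rows = list(csv.DictReader(handle))
    if group_by is None:
        # Assumption: an ungrouped file holds a single estimate.
        return float(rows[0][value_column])
    result: Dict[Key, float] = {}
    for row in rows:
        key = tuple(row[column] for column in group_by)
        # One group_by column gives a plain string key, several give a tuple.
        result[key[0] if len(key) == 1 else key] = float(row[value_column])
    return result


# Made-up CSV content, grouped by two hypothetical columns.
example = io.StringIO("barcode,gene,alpha\nAAAC,GeneA,0.8\nAAAG,GeneB,0.6\n")
by_pair = read_grouped_csv(example, "alpha", group_by=["barcode", "gene"])
```

With two group_by columns the keys are tuples such as `("AAAC", "GeneA")`; with a single column they are plain strings.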

dynast.estimation.estimate_p_c(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) → str

Estimate the average conversion rate in labeled RNA.

Parameters
df_aggregates

Pandas dataframe containing aggregate values

p_e

Background mutation rate of unlabeled RNA

p_c_path

Path to output CSV containing p_c estimates

group_by

Columns to group by

threshold

Read count threshold

n_threads

Number of threads

nasc

Flag to indicate whether to use the NASC-seq pipeline variant of the EM algorithm

Returns

Path to output CSV containing p_c estimates
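For intuition about what an EM estimate of p_c involves, the sketch below runs a textbook EM on a two-component binomial mixture in which the background rate p_e is known. It is not the algorithm dynast runs (neither the default nor the NASC-seq variant); the function name `em_conversion_rate`, the `(conversions, content)` key layout, and the sample counts are assumptions made for illustration.

```python
from typing import Dict, Tuple


def em_conversion_rate(
    aggregates: Dict[Tuple[int, int], float],
    p_e: float,
    p_c: float = 0.1,
    pi: float = 0.5,
    n_iter: int = 500,
) -> Tuple[float, float]:
    """Generic EM for pi * Binom(n, p_c) + (1 - pi) * Binom(n, p_e).

    aggregates maps (conversions k, nucleotide content n) to a read count.
    Returns the estimated (p_c, pi).
    """
    for _ in range(n_iter):
        resp_total = resp_k = resp_n = weight_total = 0.0
        for (k, n), weight in aggregates.items():
            # E-step: responsibility of the labeled component.
            # The binomial coefficient cancels in the ratio, so it is omitted.
            labeled = pi * p_c**k * (1 - p_c) ** (n - k)
            unlabeled = (1 - pi) * p_e**k * (1 - p_e) ** (n - k)
            resp = labeled / (labeled + unlabeled)
            resp_total += weight * resp
            resp_k += weight * resp * k
            resp_n += weight * resp * n
            weight_total += weight
        # M-step: update the mixing fraction and the labeled conversion rate.
        pi = resp_total / weight_total
        p_c = resp_k / resp_n
    return p_c, pi


# Made-up read counts for reads with 50 convertible nucleotides each.
counts = {(0, 50): 700.0, (1, 50): 60.0, (2, 50): 80.0, (3, 50): 70.0, (4, 50): 40.0}
p_c_hat, pi_hat = em_conversion_rate(counts, p_e=0.001)
```

Holding p_e fixed mirrors the signature above, where p_e is an input rather than something estimated jointly.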

dynast.estimation.read_p_c(p_c_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read p_c CSV as a dictionary, with group_by columns as keys.

Parameters
p_c_path

Path to CSV containing p_c values

group_by

Columns to group by, defaults to None

Returns

Dictionary with group_by columns as keys (tuple if multiple)

dynast.estimation.estimate_p_e(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), group_by: Optional[List[str]] = None) → str

Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0].

Parameters
df_counts

Pandas dataframe containing number of each conversion and nucleotide content of each read

p_e_path

Path to output CSV containing p_e estimates

conversions

Conversion(s) in question, defaults to frozenset({frozenset({'TC'})})

group_by

Columns to group by, defaults to None

Returns

Path to output CSV containing p_e estimates

dynast.estimation.estimate_p_e_control(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})})) → str

Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.

Parameters
df_counts

Pandas dataframe containing number of each conversion and nucleotide content of each read

p_e_path

Path to output CSV containing p_e estimates

conversions

Conversion(s) in question

Returns

Path to output CSV containing p_e estimates

dynast.estimation.estimate_p_e_nasc(df_rates: pandas.DataFrame, p_e_path: str, group_by: Optional[List[str]] = None) → str

Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates. This function imitates the procedure implemented in the NASC-seq pipeline (DOI: 10.1038/s41467-019-11028-9).

Parameters
df_rates

Pandas dataframe containing number of each conversion and nucleotide content of each read

p_e_path

Path to output CSV containing p_e estimates

group_by

Columns to group by, defaults to None

Returns

Path to output CSV containing p_e estimates

dynast.estimation.read_p_e(p_e_path: str, group_by: Optional[List[str]] = None) → Dict[Union[str, Tuple[str, Ellipsis]], float]

Read p_e CSV as a dictionary, with group_by columns as keys.

Parameters
p_e_path

Path to CSV containing p_e values

group_by

Columns to group by

Returns

Dictionary with group_by columns as keys (tuple if multiple)

dynast.estimation.estimate_pi(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) → str

Estimate the fraction of labeled RNA.

Parameters
df_aggregates

Pandas dataframe containing aggregate values

p_e

Average mutation rate in unlabeled RNA

p_c

Average mutation rate in labeled RNA

pi_path

Path to write pi estimates

group_by

Columns that were used to group cells

p_group_by

Columns that p_e/p_c estimation was grouped by

n_threads

Number of threads

threshold

Any conversion-content pairs with fewer than this many reads will not be processed

seed

Random seed

nasc

Flag to change behavior to match the NASC-seq pipeline: the mode of the estimated Beta distribution is used as pi. Defaults to False

model

PyStan model to run MCMC with. If not provided, the function will try to compile the model itself

Returns

Path to pi output
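The nasc option above takes the mode of a Beta distribution as pi. As a reminder of what that quantity is, the sketch below computes the mode of Beta(a, b) from its standard closed form, (a - 1) / (a + b - 2), valid for a > 1 and b > 1. The function name `beta_mode` and the sample parameters are illustrative; how dynast obtains a and b from the fitted model is not shown here.

```python
def beta_mode(a: float, b: float) -> float:
    """Mode of a Beta(a, b) distribution, defined here for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        # Outside this range the density has no single interior peak.
        raise ValueError("closed-form mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution, shown for comparison."""
    return a / (a + b)


# Made-up shape parameters for illustration.
mode = beta_mode(3.0, 5.0)
mean = beta_mean(3.0, 5.0)
```

For skewed distributions the mode and the mean differ, so the choice of summary statistic changes the reported pi.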

dynast.estimation.read_pi(pi_path: str, group_by: Optional[List[str]] = None) → Tuple[Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]]

Read pi CSV as a dictionary.

Parameters
pi_path

Path to CSV containing pi values

group_by

Columns that were used to group estimation

Returns

Dictionary with barcodes and genes as keys