`dynast.estimation`

Submodules

Package Contents

Functions

`estimate_alpha`(df_counts: pandas.DataFrame, pi_c: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], alpha_path: str, conversions: FrozenSet[str] = frozenset({'TC'}), group_by: Optional[List[str]] = None, pi_c_group_by: Optional[List[str]] = None) → str	Estimate the detection rate alpha.
`read_alpha`(alpha_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]	Read alpha CSV as a dictionary, with group_by columns as keys.
`estimate_p_c`(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) → str	Estimate the average conversion rate in labeled RNA.
`read_p_c`(p_c_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]	Read p_c CSV as a dictionary, with group_by columns as keys.
`estimate_p_e`(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), group_by: Optional[List[str]] = None) → str	Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0].
`estimate_p_e_control`(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})})) → str	Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.
`estimate_p_e_nasc`(df_rates: pandas.DataFrame, p_e_path: str, group_by: Optional[List[str]] = None) → str	Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates.
`read_p_e`(p_e_path: str, group_by: Optional[List[str]] = None) → Dict[Union[str, Tuple[str, Ellipsis]], float]	Read p_e CSV as a dictionary, with group_by columns as keys.
`estimate_pi`(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) → str	Estimate the fraction of labeled RNA.
`read_pi`(pi_path: str, group_by: Optional[List[str]] = None) → Tuple[Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]]	Read pi CSV as a dictionary.

dynast.estimation.estimate_alpha(df_counts: pandas.DataFrame, pi_c: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], alpha_path: str, conversions: FrozenSet[str] = frozenset({'TC'}), group_by: Optional[List[str]] = None, pi_c_group_by: Optional[List[str]] = None) → str

Estimate the detection rate alpha.

Parameters

df_counts: Pandas dataframe containing conversion counts
pi_c: Labeled mutation rate
alpha_path: Path to output CSV containing alpha estimates
conversions: Conversions to consider
group_by: Columns to group by
pi_c_group_by: Columns that were used to group when calculating pi_c

Returns

Path to output CSV containing alpha estimates

dynast.estimation.read_alpha(alpha_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read alpha CSV as a dictionary, with group_by columns as keys.

Parameters

alpha_path: Path to CSV containing alpha values
group_by: Columns to group by, defaults to None

Returns

A single float when no grouping is used; otherwise a dictionary with the group_by columns as keys (tuple keys if there are multiple columns)
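The key convention described above (plain keys for one grouping column, tuple keys for several, a bare float without grouping) can be illustrated with a minimal stdlib sketch. It assumes a CSV layout of the group_by columns plus one value column named after the estimate (here `alpha`); the helper `read_grouped_csv` and the column names `barcode` and `gene` are hypothetical, and this is not dynast's own reader.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple, Union

Key = Union[str, Tuple[str, ...]]


def read_grouped_csv(
    handle: TextIO, value_column: str, group_by: Optional[List[str]] = None
) -> Union[float, Dict[Key, float]]:
    """Hypothetical helper: read a one-value-per-group CSV into a float or dict.

    Assumes the CSV holds the group_by columns plus a numeric column named
    `value_column` (e.g. 'alpha'); this layout is an assumption.
    """
    rows = list(csv.DictReader(handle))
    if not group_by:
        # Ungrouped estimate: a single row holds the global value.
        return float(rows[0][value_column])
    result: Dict[Key, float] = {}
    for row in rows:
        if len(group_by) == 1:
            key: Key = row[group_by[0]]  # one column -> plain string key
        else:
            key = tuple(row[col] for col in group_by)  # several columns -> tuple key
        result[key] = float(row[value_column])
    return result


csv_text = "barcode,gene,alpha\nAAAC,g1,0.8\nAAAC,g2,0.9\n"
print(read_grouped_csv(io.StringIO(csv_text), "alpha", ["barcode", "gene"]))
# {('AAAC', 'g1'): 0.8, ('AAAC', 'g2'): 0.9}
```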

dynast.estimation.estimate_p_c(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) → str

Estimate the average conversion rate in labeled RNA.

Parameters

df_aggregates: Pandas dataframe containing aggregate values
p_e: Background mutation rate of unlabeled RNA
p_c_path: Path to output CSV containing p_c estimates
group_by: Columns to group by
threshold: Read count threshold
n_threads: Number of threads
nasc: Flag to indicate whether to use the NASC-seq pipeline variant of the EM algorithm, defaults to False

Returns

Path to output CSV containing p_c estimates
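estimate_p_c is described as running an EM algorithm over aggregated counts. The sketch below shows a generic expectation-maximization fit of a two-component binomial mixture, pi * Binom(n, p_c) + (1 - pi) * Binom(n, p_e), in which the background rate p_e is held fixed while p_c and the labeled fraction pi are estimated. It illustrates the technique under stated assumptions and is not dynast's implementation or its NASC-seq variant; the (conversions, content, read count) tuple layout and the names `binom_pmf` and `em_p_c` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def em_p_c(
    aggregates: Iterable[Tuple[int, int, float]],
    p_e: float,
    p_c: float = 0.5,
    pi: float = 0.5,
    max_iter: int = 10_000,
    tol: float = 1e-12,
) -> Tuple[float, float]:
    """Hypothetical sketch: fit pi * Binom(n, p_c) + (1 - pi) * Binom(n, p_e).

    `aggregates` holds (conversions k, nucleotide content n, read count w).
    p_e stays fixed; returns the fitted (p_c, pi).
    """
    data = list(aggregates)
    for _ in range(max_iter):
        num_pi = num_pc = den_pc = total = 0.0
        for k, n, w in data:
            labeled = pi * binom_pmf(k, n, p_c)
            unlabeled = (1 - pi) * binom_pmf(k, n, p_e)
            r = labeled / (labeled + unlabeled)  # E-step: P(read is labeled | k, n)
            num_pi += w * r
            num_pc += w * r * k
            den_pc += w * r * n
            total += w
        # M-step: closed-form updates for the labeled fraction and p_c.
        new_pi, new_pc = num_pi / total, num_pc / den_pc
        converged = abs(new_pi - pi) < tol and abs(new_pc - p_c) < tol
        pi, p_c = new_pi, new_pc
        if converged:
            break
    return p_c, pi


# Expected read counts generated from p_c = 0.1, pi = 0.3, p_e = 0.001, n = 20.
n, p_e = 20, 0.001
aggs = [
    (k, n, 1000 * (0.3 * binom_pmf(k, n, 0.1) + 0.7 * binom_pmf(k, n, p_e)))
    for k in range(n + 1)
]
print([round(x, 4) for x in em_p_c(aggs, p_e)])
```

Because the synthetic counts are exact expectations under the true parameters, the fit should land very close to p_c = 0.1 and pi = 0.3.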

dynast.estimation.read_p_c(p_c_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read p_c CSV as a dictionary, with group_by columns as keys.

Parameters

p_c_path: Path to CSV containing p_c values
group_by: Columns to group by, defaults to None

Returns

A single float when no grouping is used; otherwise a dictionary with the group_by columns as keys (tuple keys if there are multiple columns)

dynast.estimation.estimate_p_e(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), group_by: Optional[List[str]] = None) → str

Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0].

Parameters

df_counts: Pandas dataframe containing number of each conversion and nucleotide content of each read
p_e_path: Path to output CSV containing p_e estimates
conversions: Conversion(s) in question, defaults to frozenset({frozenset({'TC'})})
group_by: Columns to group by, defaults to None

Returns

Path to output CSV containing p_e estimates
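The averaging rule above can be sketched as follows: for each of the three bases other than the reference base of the conversion (T for `'TC'`), divide all mutations away from that base by its total content, then take the mean of the three rates. This is a minimal sketch assuming per-read rows with conversion counts keyed by two-letter codes (`'AG'` for A>G) and base content keyed by single letters; the function name `background_p_e` is hypothetical, it takes a single conversion string rather than the nested frozenset parameter, and it ignores group_by (a grouped version would apply it per group).

```python
from typing import Dict, List

BASES = "ACGT"


def background_p_e(rows: List[Dict[str, int]], conversion: str = "TC") -> float:
    """Hypothetical sketch: mean mutation rate of the bases other than conversion[0]."""
    rates = []
    for base in BASES:
        if base == conversion[0]:
            continue  # skip the base whose conversion marks labeled RNA
        # All mutations away from this base (e.g. AC, AG, AT for A).
        mutations = sum(
            row.get(base + other, 0) for row in rows for other in BASES if other != base
        )
        content = sum(row.get(base, 0) for row in rows)  # total occurrences of the base
        rates.append(mutations / content)
    return sum(rates) / len(rates)


reads = [{"A": 100, "C": 100, "G": 100, "T": 100, "AG": 1, "CT": 2, "GA": 3, "TC": 10}]
print(round(background_p_e(reads), 6))  # 0.02
```

Here the T>C count is deliberately ignored: A, C and G contribute rates of 0.01, 0.02 and 0.03, whose mean is 0.02.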

dynast.estimation.estimate_p_e_control(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})})) → str

Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.

Parameters

df_counts: Pandas dataframe containing number of each conversion and nucleotide content of each read
p_e_path: Path to output CSV containing p_e estimates
conversions: Conversion(s) in question

Returns

Path to output CSV containing p_e estimates
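In a control sample there is no labeled RNA, so every observed conversion can be treated as background. A minimal sketch, assuming per-read rows with the conversion count keyed by its two-letter code (`'TC'`) and the reference base content keyed by a single letter (`'T'`), and assuming "average" means the pooled rate over all reads; the function name `control_p_e` is hypothetical.

```python
from typing import Dict, List


def control_p_e(rows: List[Dict[str, int]], conversion: str = "TC") -> float:
    """Hypothetical sketch: pooled conversion rate in an unlabeled control sample."""
    # Total observed conversions (e.g. T>C) across all reads...
    mutations = sum(row.get(conversion, 0) for row in rows)
    # ...divided by the total number of reference bases (e.g. T) that could convert.
    content = sum(row.get(conversion[0], 0) for row in rows)
    return mutations / content


reads = [{"T": 50, "TC": 1}, {"T": 150, "TC": 1}]
print(control_p_e(reads))  # 0.01
```

Pooling weights each read by its base content; averaging per-read rates instead would give short reads as much influence as long ones.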

dynast.estimation.estimate_p_e_nasc(df_rates: pandas.DataFrame, p_e_path: str, group_by: Optional[List[str]] = None) → str

Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates. This function imitates the procedure implemented in the NASC-seq pipeline (DOI: 10.1038/s41467-019-11028-9).

Parameters

df_rates: Pandas dataframe containing mutation rates
p_e_path: Path to output CSV containing p_e estimates
group_by: Columns to group by, defaults to None

Returns

Path to output CSV containing p_e estimates
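The NASC-seq-style rule reduces to averaging two precomputed rates per group. A minimal sketch, assuming each row already holds the C>T and G>A mutation rates under the keys `'CT'` and `'GA'` plus the group_by columns; the function name `nasc_p_e` and the `barcode` column are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Key = Union[str, Tuple[str, ...]]


def nasc_p_e(
    rows: List[dict], group_by: Optional[List[str]] = None
) -> Union[float, Dict[Key, float]]:
    """Hypothetical sketch: p_e as the mean of the C>T and G>A mutation rates."""
    if not group_by:
        # Ungrouped: a single row of global rates is assumed.
        row = rows[0]
        return (row["CT"] + row["GA"]) / 2
    estimates: Dict[Key, float] = {}
    for row in rows:
        # One grouping column -> plain key, several -> tuple key.
        key = row[group_by[0]] if len(group_by) == 1 else tuple(row[c] for c in group_by)
        estimates[key] = (row["CT"] + row["GA"]) / 2
    return estimates


rates = [{"barcode": "A", "CT": 0.002, "GA": 0.004}, {"barcode": "B", "CT": 0.0, "GA": 0.002}]
print(nasc_p_e(rates, ["barcode"]))
```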

dynast.estimation.read_p_e(p_e_path: str, group_by: Optional[List[str]] = None) → Dict[Union[str, Tuple[str, Ellipsis]], float]

Read p_e CSV as a dictionary, with group_by columns as keys.

Parameters

p_e_path: Path to CSV containing p_e values
group_by: Columns to group by

Returns

Dictionary with group_by columns as keys (tuple if multiple)

dynast.estimation.estimate_pi(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) → str

Estimate the fraction of labeled RNA.

Parameters

df_aggregates: Pandas dataframe containing aggregate values
p_e: Average mutation rate in unlabeled RNA
p_c: Average mutation rate in labeled RNA
pi_path: Path to write pi estimates
group_by: Columns that were used to group cells
p_group_by: Columns that p_e/p_c estimation was grouped by
n_threads: Number of threads
threshold: Any conversion-content pairs with fewer than this many reads will not be processed
seed: Random seed
nasc: Flag to change behavior to match the NASC-seq pipeline. Specifically, the mode of the estimated Beta distribution is used as pi; defaults to False
model: PyStan model to run MCMC with. If not provided, the function tries to compile the model itself

Returns

Path to pi output
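When nasc is set, pi is taken as the mode of the estimated Beta distribution. For reference, a Beta(a, b) distribution with a > 1 and b > 1 has mode (a - 1) / (a + b - 2) and mean a / (a + b), so the two summaries generally differ. The sketch below computes both; the function names are hypothetical, and it makes no claim about which summary dynast reports when nasc is False.

```python
def beta_mode(a: float, b: float) -> float:
    """Mode of Beta(a, b); this closed form holds only for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("closed-form interior mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b)."""
    return a / (a + b)


print(beta_mode(3, 5), beta_mean(3, 5))  # 0.3333333333333333 0.375
```

For a skewed posterior such as Beta(3, 5), the mode sits below the mean, so switching the flag can shift pi noticeably.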

dynast.estimation.read_pi(pi_path: str, group_by: Optional[List[str]] = None) → Tuple[Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]]

Read pi CSV as a dictionary.

Parameters

pi_path: Path to CSV containing pi values
group_by: Columns that were used to group estimation

Returns

Tuple of three values, each either a float or a dictionary with the group_by columns (for example, barcodes and genes) as keys
