dynast.estimation
Submodules
Package Contents
Functions
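| estimate_alpha() | Estimate the detection rate alpha. |
| read_alpha() | Read alpha CSV as a dictionary, with group_by columns as keys. |
| estimate_p_c() | Estimate the average conversion rate in labeled RNA. |
| read_p_c() | Read p_c CSV as a dictionary, with group_by columns as keys. |
| estimate_p_e() | Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0]. |
| estimate_p_e_control() | Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate. |
| estimate_p_e_nasc() | Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates. |
| read_p_e() | Read p_e CSV as a dictionary, with group_by columns as keys. |
| estimate_pi() | Estimate the fraction of labeled RNA. |
| read_pi() | Read pi CSV as a dictionary. |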
- dynast.estimation.estimate_alpha(df_counts: pandas.DataFrame, pi_c: Union[float, Dict[str, float], Dict[Tuple[str, ...], float]], alpha_path: str, conversions: FrozenSet[str] = frozenset({'TC'}), group_by: Optional[List[str]] = None, pi_c_group_by: Optional[List[str]] = None) -> str
Estimate the detection rate alpha.
- Parameters
- df_counts
Pandas dataframe containing conversion counts
- pi_c
Labeled mutation rate
- alpha_path
Path to output CSV containing alpha estimates
- conversions
Conversions to consider
- group_by
Columns to group by
- pi_c_group_by
Columns that were used to group when calculating pi_c
- Returns
Path to output CSV containing alpha estimates
- dynast.estimation.read_alpha(alpha_path: str, group_by: Optional[List[str]] = None) -> Union[float, Dict[str, float], Dict[Tuple[str, ...], float]]
Read alpha CSV as a dictionary, with group_by columns as keys.
- Parameters
- alpha_path
Path to CSV containing alpha values
- group_by
Columns to group by, defaults to None
- Returns
Dictionary with group_by columns as keys (tuple if multiple)
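The return shapes suggested by the signature (a bare float without grouping, a string key for one group_by column, a tuple key for several) can be illustrated with a small standard-library sketch. This is not dynast's implementation: the helper name `read_rates_csv` and the column names `barcode`, `gene`, and `alpha` are assumptions chosen purely for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple, Union

Key = Union[str, Tuple[str, ...]]


def read_rates_csv(
    handle, value_column: str, group_by: Optional[List[str]] = None
) -> Union[float, Dict[Key, float]]:
    """Hypothetical reader: map group_by columns to a float value."""
    rows = list(csv.DictReader(handle))
    if group_by is None:
        # No grouping: assume a single row holding one global value.
        return float(rows[0][value_column])
    result: Dict[Key, float] = {}
    for row in rows:
        # One column -> plain string key; several columns -> tuple key.
        if len(group_by) == 1:
            key: Key = row[group_by[0]]
        else:
            key = tuple(row[column] for column in group_by)
        result[key] = float(row[value_column])
    return result


example_csv = "barcode,gene,alpha\nAAAC,GeneA,0.8\nAAAC,GeneB,0.6\n"
by_pair = read_rates_csv(io.StringIO(example_csv), "alpha", ["barcode", "gene"])
print(by_pair[("AAAC", "GeneA")])
```

The same pattern would apply to the p_c and p_e readers, with a different value column.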
- dynast.estimation.estimate_p_c(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, ...], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) -> str
Estimate the average conversion rate in labeled RNA.
- Parameters
- df_aggregates
Pandas dataframe containing aggregate values
- p_e
Background mutation rate of unlabeled RNA
- p_c_path
Path to output CSV containing p_c estimates
- group_by
Columns to group by
- threshold
Read count threshold
- n_threads
Number of threads
- nasc
Flag to indicate whether to use NASC-seq pipeline variant of the EM algorithm
- Returns
Path to output CSV containing p_c estimates
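To give a feel for how an EM algorithm can recover a labeled conversion rate when the background rate p_e is already known, here is a generic sketch for a two-component binomial mixture. It is an assumption-laden illustration, not dynast's or the NASC-seq pipeline's actual procedure: the function names, the data layout (conversions, convertible bases, weight), and the synthetic inputs are all hypothetical.

```python
from math import comb
from typing import List, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def em_p_c(
    data: List[Tuple[int, int, float]],  # (conversions k, convertible bases n, weight)
    p_e: float,
    p_c: float = 0.1,
    pi: float = 0.5,
    n_iter: int = 2000,
) -> Tuple[float, float]:
    """Generic EM for a two-component binomial mixture with p_e held fixed."""
    total = sum(w for _, _, w in data)
    for _ in range(n_iter):
        # E-step: probability that each (k, n) observation came from labeled RNA.
        resp = []
        for k, n, _w in data:
            labeled = pi * binom_pmf(k, n, p_c)
            unlabeled = (1 - pi) * binom_pmf(k, n, p_e)
            resp.append(labeled / (labeled + unlabeled))
        # M-step: update the labeled fraction and the labeled conversion rate.
        pi = sum(w * r for (_, _, w), r in zip(data, resp)) / total
        converted = sum(w * r * k for (k, _, w), r in zip(data, resp))
        convertible = sum(w * r * n for (_, n, w), r in zip(data, resp))
        p_c = converted / convertible
    return p_c, pi


# Synthetic, noise-free data: expected read counts for a known mixture.
true_p_e, true_p_c, true_pi, n_bases = 0.001, 0.05, 0.3, 50
data = [
    (
        k,
        n_bases,
        1e4 * (
            true_pi * binom_pmf(k, n_bases, true_p_c)
            + (1 - true_pi) * binom_pmf(k, n_bases, true_p_e)
        ),
    )
    for k in range(n_bases + 1)
]
p_c_hat, pi_hat = em_p_c(data, p_e=true_p_e)
print(p_c_hat, pi_hat)
```

On this idealized input the estimates should land close to the values used to generate it; real data adds sampling noise and read-count thresholds that this sketch ignores.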
- dynast.estimation.read_p_c(p_c_path: str, group_by: Optional[List[str]] = None) -> Union[float, Dict[str, float], Dict[Tuple[str, ...], float]]
Read p_c CSV as a dictionary, with group_by columns as keys.
- Parameters
- p_c_path
Path to CSV containing p_c values
- group_by
Columns to group by, defaults to None
- Returns
Dictionary with group_by columns as keys (tuple if multiple)
- dynast.estimation.estimate_p_e(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), group_by: Optional[List[str]] = None) -> str
Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0] (the converted base, e.g. T for a TC conversion).
- Parameters
- df_counts
Pandas dataframe containing number of each conversion and nucleotide content of each read
- p_e_path
Path to output CSV containing p_e estimates
- conversions
Conversion(s) in question, defaults to frozenset({frozenset({'TC'})})
- group_by
Columns to group by, defaults to None
- Returns
Path to output CSV containing p_e estimates
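The calculation described above can be sketched on a single dictionary of summed counts. This is one plausible reading (a pooled rate over the three non-converted bases), not a copy of dynast's code; the helper name `background_rate` and the column naming scheme ("A" for base content, "AG" for A read as G) are assumptions for illustration.

```python
from typing import Dict

BASES = "ACGT"


def background_rate(totals: Dict[str, int], conversion: str = "TC") -> float:
    """Pooled mutation rate over the three bases other than conversion[0]."""
    other_bases = [base for base in BASES if base != conversion[0]]
    # Mutation columns such as "AG": reference base A read as G.
    mutated = sum(
        totals[ref + alt] for ref in other_bases for alt in BASES if alt != ref
    )
    # Base-content columns such as "A": number of reference A positions covered.
    covered = sum(totals[ref] for ref in other_bases)
    return mutated / covered


# Hypothetical summed counts: 1000 covered positions per base, a few mutations.
totals = {"A": 1000, "C": 1000, "G": 1000, "T": 1000}
totals.update({ref + alt: 0 for ref in BASES for alt in BASES if ref != alt})
totals.update({"AG": 2, "CT": 1, "GA": 3, "TC": 40})
print(background_rate(totals))
```

Note that the 40 TC events are excluded: mutations at T are the signal of interest, so only A, C, and G positions contribute to the background estimate.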
- dynast.estimation.estimate_p_e_control(df_counts: pandas.DataFrame, p_e_path: str, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})})) -> str
Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.
- Parameters
- df_counts
Pandas dataframe containing number of each conversion and nucleotide content of each read
- p_e_path
Path to output CSV containing p_e estimates
- conversions
Conversion(s) in question
- Returns
Path to output CSV containing p_e estimates
- dynast.estimation.estimate_p_e_nasc(df_rates: pandas.DataFrame, p_e_path: str, group_by: Optional[List[str]] = None) -> str
Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates. This function imitates the procedure implemented in the NASC-seq pipeline (DOI: 10.1038/s41467-019-11028-9).
- Parameters
- df_rates
Pandas dataframe containing number of each conversion and nucleotide content of each read
- p_e_path
Path to output CSV containing p_e estimates
- group_by
Columns to group by, defaults to None
- Returns
Path to output CSV containing p_e estimates
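As a rough illustration of averaging the CT and GA rates per group, the sketch below works on plain dictionaries. The row layout and the column names `barcode`, `CT`, and `GA` are assumptions, and the real function may aggregate differently.

```python
from statistics import mean

# Hypothetical per-cell mutation rates; the column names are assumptions.
rates = [
    {"barcode": "AAAC", "CT": 0.0010, "GA": 0.0014},
    {"barcode": "AAAG", "CT": 0.0008, "GA": 0.0012},
]

# One background estimate per barcode: the mean of its CT and GA rates.
p_e_by_barcode = {row["barcode"]: mean([row["CT"], row["GA"]]) for row in rates}
print(p_e_by_barcode)
```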
- dynast.estimation.read_p_e(p_e_path: str, group_by: Optional[List[str]] = None) -> Dict[Union[str, Tuple[str, ...]], float]
Read p_e CSV as a dictionary, with group_by columns as keys.
- Parameters
- p_e_path
Path to CSV containing p_e values
- group_by
Columns to group by
- Returns
Dictionary with group_by columns as keys (tuple if multiple)
- dynast.estimation.estimate_pi(df_aggregates: pandas.DataFrame, p_e: float, p_c: float, pi_path: str, group_by: Optional[List[str]] = None, p_group_by: Optional[List[str]] = None, n_threads: int = 8, threshold: int = 16, seed: Optional[int] = None, nasc: bool = False, model: Optional[pystan.StanModel] = None) -> str
Estimate the fraction of labeled RNA.
- Parameters
- df_aggregates
Pandas dataframe containing aggregate values
- p_e
Average mutation rate in unlabeled RNA
- p_c
Average mutation rate in labeled RNA
- pi_path
Path to write pi estimates
- group_by
Columns that were used to group cells
- p_group_by
Columns that p_e/p_c estimation was grouped by
- n_threads
Number of threads
- threshold
Any conversion-content pairs with fewer than this many reads will not be processed
- seed
Random seed
- nasc
Flag to change behavior to match NASC-seq pipeline. Specifically, the mode of the estimated Beta distribution is used as pi, defaults to False
- model
PyStan model to run MCMC with. If not provided, will try to compile the model manually
- Returns
Path to pi output
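The nasc flag refers to the mode of an estimated Beta distribution. For reference, the mode of Beta(a, b) has a closed form when both shape parameters exceed 1, and it generally differs from the mean. The sketch below shows both; the shape values are made up, and how dynast obtains the shape parameters is not covered here.

```python
def beta_mode(a: float, b: float) -> float:
    """Mode of Beta(a, b); this formula holds only when a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("mode formula requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b), shown only for comparison with the mode."""
    return a / (a + b)


# Hypothetical shape parameters for one barcode-gene pair.
a, b = 3.0, 7.0
print(beta_mode(a, b), beta_mean(a, b))
```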
- dynast.estimation.read_pi(pi_path: str, group_by: Optional[List[str]] = None) -> Tuple[Union[float, Dict[str, float], Dict[Tuple[str, ...], float]], Union[float, Dict[str, float], Dict[Tuple[str, ...], float]], Union[float, Dict[str, float], Dict[Tuple[str, ...], float]]]
Read pi CSV as a dictionary.
- Parameters
- pi_path
Path to CSV containing pi values
- group_by
Columns that were used to group estimation
- Returns
Dictionary with barcodes and genes as keys