dynast.estimation.p_c
Module Contents
Functions
| read_p_c | Read p_c CSV as a dictionary, with group_by columns as keys. |
| binomial_pmf | Numbaized binomial PMF function for faster calculation. |
| expectation_maximization_nasc | NASC-seq pipeline variant of the EM algorithm to estimate average conversion rate in labeled RNA. |
| expectation_maximization | Run EM algorithm to estimate average conversion rate in labeled RNA. |
| estimate_p_c | Estimate the average conversion rate in labeled RNA. |
- dynast.estimation.p_c.read_p_c(p_c_path: str, group_by: Optional[List[str]] = None) -> Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]
Read p_c CSV as a dictionary, with group_by columns as keys.
- Parameters
- p_c_path
Path to CSV containing p_c values
- group_by
Columns to group by, defaults to None
- Returns
Dictionary with group_by columns as keys (tuple if multiple)
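The return type above implies three shapes: a bare float when group_by is None, a dictionary keyed by strings for a single column, and a dictionary keyed by tuples for several columns. The following is a minimal sketch of that behavior, not dynast's implementation. The name read_p_c_sketch, the file layout (a lone number when ungrouped, otherwise a header row), the value column name "p_c", and the use of an in-memory handle instead of a path are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple, Union

PC = Union[float, Dict[str, float], Dict[Tuple[str, ...], float]]


def read_p_c_sketch(handle: TextIO, group_by: Optional[List[str]] = None) -> PC:
    """Illustrative reader only; the real file layout may differ."""
    if group_by is None:
        # Assumption: an ungrouped file holds a single number.
        return float(handle.read().strip())
    result = {}
    for row in csv.DictReader(handle):
        # One group_by column gives a string key, several give a tuple key.
        if len(group_by) == 1:
            key = row[group_by[0]]
        else:
            key = tuple(row[column] for column in group_by)
        # Assumption: the estimate is stored in a column named "p_c".
        result[key] = float(row["p_c"])
    return result


# Hypothetical file contents, grouped by two columns.
grouped = io.StringIO("barcode,gene,p_c\nAAAC,Actb,0.05\nAAAG,Gapdh,0.04\n")
by_pair = read_p_c_sketch(grouped, group_by=["barcode", "gene"])
```

With two group_by columns, the keys of by_pair are (barcode, gene) tuples.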
- dynast.estimation.p_c.binomial_pmf(k: int, n: int, p: int) -> float
Numbaized binomial PMF function for faster calculation.
- Parameters
- k
Number of successes
- n
Number of trials
- p
Probability of success (a value between 0 and 1, although the signature above annotates it as int)
- Returns
Probability of observing k successes in n trials with probability of success p
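For reference, the quantity described here is P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). A minimal plain-Python sketch follows. It uses math.comb rather than Numba, so it illustrates the formula only and says nothing about how the library computes it; the name binomial_pmf_sketch and the handling of out-of-range k are assumptions.

```python
from math import comb


def binomial_pmf_sketch(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), in plain Python."""
    if k < 0 or k > n:
        # Assumption: out-of-range k has probability zero.
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Two successes in four fair trials: 6 * 0.25 * 0.25
print(binomial_pmf_sketch(2, 4, 0.5))  # 0.375
```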
- dynast.estimation.p_c.expectation_maximization_nasc(values: numpy.ndarray, p_e: float, threshold: float = 0.01) -> float
NASC-seq pipeline variant of the EM algorithm to estimate average conversion rate in labeled RNA.
- Parameters
- values
N x C NumPy array whose row index is the number of conversions and whose column index is the nucleotide content; the value at each position is the number of reads observed with that combination
- p_e
Background mutation rate of unlabeled RNA
- threshold
Filter threshold
- Returns
Estimated conversion rate
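This function takes values as a dense array, whereas expectation_maximization takes the same information as (row, column, value) triples. A small sketch of how the two layouts relate is given below, using nested lists so that it runs without NumPy. The helper name dense_to_triples and the choice to drop zero entries are assumptions, not documented behavior.

```python
from typing import List, Tuple


def dense_to_triples(dense: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return zero-indexed (row, column, value) triples for nonzero entries."""
    return [
        (row, col, count)
        for row, cells in enumerate(dense)
        for col, count in enumerate(cells)
        if count != 0  # assumption: empty cells are omitted from the sparse form
    ]


# Rows index the number of conversions, columns index the nucleotide content.
dense = [
    [0, 12, 7],  # reads with 0 conversions
    [0, 3, 0],   # reads with 1 conversion
]
triples = dense_to_triples(dense)
print(triples)  # [(0, 1, 12), (0, 2, 7), (1, 1, 3)]
```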
- dynast.estimation.p_c.expectation_maximization(values: numpy.ndarray, p_e: float, p_c: float = 0.1, threshold: float = 0.01, max_iters: int = 300) -> float
Run EM algorithm to estimate average conversion rate in labeled RNA.
This function runs the following two steps.
1) Constructs a sparse matrix representation of values and filters out certain indices that are expected to contain more than threshold proportion of unlabeled reads.
2) Runs an EM algorithm that iteratively updates the filtered-out data and the estimate.
See https://doi.org/10.1093/bioinformatics/bty256.
- Parameters
- values
Array of three columns encoding a sparse array in (row, column, value) format, zero-indexed, where row is the number of conversions, column is the nucleotide content, and value is the number of reads
- p_e
Background mutation rate of unlabeled RNA
- p_c
Initial p_c value
- threshold
Filter threshold: indices expected to contain more than this proportion of unlabeled reads are filtered out
- max_iters
Maximum number of EM iterations
- Returns
Estimated conversion rate
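To give a feel for the kind of estimation involved, here is a sketch of a plain two-component binomial mixture EM. This is a swapped-in, simpler technique, not the filter-and-update procedure described above: it does no filtering, and it additionally estimates a labeled fraction pi, which is not a parameter of this function. It assumes unlabeled reads follow Binomial(n, p_e) and labeled reads follow Binomial(n, p_c). The names binom and em_mixture_sketch, the tol argument, and the synthetic data are all hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def binom(k: int, n: int, p: float) -> float:
    """Binomial PMF: probability of k conversions in n positions."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def em_mixture_sketch(
    triples: Iterable[Tuple[int, int, float]],
    p_e: float,
    p_c: float = 0.1,
    pi: float = 0.5,
    max_iters: int = 300,
    tol: float = 1e-9,
) -> float:
    """Estimate p_c from (conversions, content, reads) triples, p_e fixed."""
    data = list(triples)
    for _ in range(max_iters):
        num = den = labeled = total = 0.0
        for k, n, count in data:
            # E-step: posterior probability that these reads are labeled.
            lab = pi * binom(k, n, p_c)
            unl = (1.0 - pi) * binom(k, n, p_e)
            w = lab / (lab + unl) if lab + unl > 0 else 0.0
            num += w * count * k
            den += w * count * n
            labeled += w * count
            total += count
        if den == 0.0 or total == 0.0:
            break
        # M-step: weighted conversion rate and labeled fraction.
        new_p_c = num / den
        pi = labeled / total
        converged = abs(new_p_c - p_c) < tol
        p_c = new_p_c
        if converged:
            break
    return p_c


# Synthetic expected read counts (hypothetical): 30% labeled, true p_c = 0.05.
true_pi, true_p_c, p_e, n = 0.3, 0.05, 0.001, 40
triples = [
    (k, n, 1e6 * (true_pi * binom(k, n, true_p_c) + (1 - true_pi) * binom(k, n, p_e)))
    for k in range(n + 1)
]
estimate = em_mixture_sketch(triples, p_e)
```

On this noise-free synthetic input the estimate should settle close to the true value of 0.05. Real data would be noisier, which is one reason the documented function filters indices dominated by unlabeled reads before iterating.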
- dynast.estimation.p_c.estimate_p_c(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) -> str
Estimate the average conversion rate in labeled RNA.
- Parameters
- df_aggregates
Pandas dataframe containing aggregate values
- p_e
Background mutation rate of unlabeled RNA
- p_c_path
Path to output CSV containing p_c estimates
- group_by
Columns to group by
- threshold
Read count threshold
- n_threads
Number of threads
- nasc
Flag to indicate whether to use NASC-seq pipeline variant of the EM algorithm
- Returns
Path to output CSV containing p_c estimates