`dynast.estimation.p_c`

Module Contents

Functions

`read_p_c`(p_c_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]	Read p_c CSV as a dictionary, with group_by columns as keys.
`binomial_pmf`(k: int, n: int, p: int) → float	Numbaized binomial PMF function for faster calculation.
`expectation_maximization_nasc`(values: numpy.ndarray, p_e: float, threshold: float = 0.01) → float	NASC-seq pipeline variant of the EM algorithm to estimate average conversion rate in labeled RNA.
`expectation_maximization`(values: numpy.ndarray, p_e: float, p_c: float = 0.1, threshold: float = 0.01, max_iters: int = 300) → float	Run EM algorithm to estimate average conversion rate in labeled RNA.
`estimate_p_c`(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) → str	Estimate the average conversion rate in labeled RNA.

dynast.estimation.p_c.read_p_c(p_c_path: str, group_by: Optional[List[str]] = None) → Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]]

Read p_c CSV as a dictionary, with group_by columns as keys.

Parameters

p_c_path: Path to CSV containing p_c values
group_by: Columns to group by, defaults to None

Returns

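Dictionary with group_by columns as keys (a tuple if there are multiple columns). The return annotation also allows a single float, presumably when group_by is None.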
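
As a rough illustration of the key shapes described above, the sketch below reads a small grouped CSV with the standard library. It is not dynast's implementation: the helper name `read_grouped_values`, the `p_c` value column name, and the example column names `barcode` and `GX` are all assumptions made for this example.

```python
import csv
import io


def read_grouped_values(handle, group_by, value_column="p_c"):
    """Hypothetical helper: map group_by column values to a float.

    Keys are plain strings for a single group_by column and tuples of
    strings for several columns. The value column name is an assumption.
    """
    result = {}
    for row in csv.DictReader(handle):
        key = tuple(row[col] for col in group_by)
        if len(group_by) == 1:
            key = key[0]  # single column: use the bare string as the key
        result[key] = float(row[value_column])
    return result


# One grouping column gives string keys.
single = read_grouped_values(
    io.StringIO("barcode,p_c\nAAAC,0.04\nTTTG,0.06\n"), ["barcode"]
)

# Two grouping columns give tuple keys.
double = read_grouped_values(
    io.StringIO("barcode,GX,p_c\nAAAC,g1,0.05\n"), ["barcode", "GX"]
)
```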

dynast.estimation.p_c.binomial_pmf(k: int, n: int, p: int) → float

Numbaized binomial PMF function for faster calculation.

Parameters

k: Number of successes
n: Number of trials
p: Probability of success, a value in [0, 1] (annotated as int in the signature)

Returns

Probability of observing k successes in n trials with probability of success p
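
For reference, the quantity described here is the standard binomial PMF, P(k; n, p) = C(n, k) p^k (1 - p)^(n - k). The following is a plain-Python sketch of that formula. It assumes nothing about dynast's Numba-compiled implementation beyond the documented arguments, and the name `binomial_pmf_sketch` is hypothetical.

```python
from math import comb


def binomial_pmf_sketch(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); illustrative only."""
    if k < 0 or k > n:
        return 0.0  # impossible outcomes have zero probability
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two successes in four fair trials: C(4, 2) * 0.5^2 * 0.5^2
print(binomial_pmf_sketch(2, 4, 0.5))  # 0.375
```

Summing the sketch over k = 0..n should give 1 up to floating-point error, which is a quick sanity check for any replacement implementation.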

dynast.estimation.p_c.expectation_maximization_nasc(values: numpy.ndarray, p_e: float, threshold: float = 0.01) → float

NASC-seq pipeline variant of the EM algorithm to estimate average conversion rate in labeled RNA.

Parameters

values: N x C NumPy array whose row index is the number of conversions and whose column index is the nucleotide content; each entry is the number of reads observed with that combination
p_e: Background mutation rate of unlabeled RNA
threshold: Filter threshold

Returns

Estimated conversion rate
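
To make the layout of `values` concrete, the sketch below builds a dense count matrix from per-read (conversions, content) pairs using plain lists, then flattens it into zero-indexed (row, column, value) triplets, the sparse form that `expectation_maximization` documents for its own `values` argument. The helper names `build_count_matrix` and `to_triplets` and the toy reads are assumptions for illustration; dynast itself works with NumPy arrays.

```python
def build_count_matrix(reads):
    """Dense matrix where matrix[k][n] counts reads with k conversions
    and nucleotide content n. `reads` is an iterable of (k, n) pairs."""
    reads = list(reads)
    n_rows = max(k for k, _ in reads) + 1
    n_cols = max(n for _, n in reads) + 1
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for k, n in reads:
        matrix[k][n] += 1
    return matrix


def to_triplets(matrix):
    """Zero-indexed (row, column, value) triplets for the nonzero entries."""
    return [
        (k, n, count)
        for k, row in enumerate(matrix)
        for n, count in enumerate(row)
        if count > 0
    ]


# Four toy reads: two with 0 conversions, one with 1, one with 2.
reads = [(0, 3), (0, 3), (1, 3), (2, 4)]
matrix = build_count_matrix(reads)
triplets = to_triplets(matrix)
print(triplets)  # [(0, 3, 2), (1, 3, 1), (2, 4, 1)]
```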

dynast.estimation.p_c.expectation_maximization(values: numpy.ndarray, p_e: float, p_c: float = 0.1, threshold: float = 0.01, max_iters: int = 300) → float

Run EM algorithm to estimate average conversion rate in labeled RNA.

This function runs the following two steps.

1) Constructs a sparse matrix representation of values and filters out indices that are expected to contain more than threshold proportion of unlabeled reads.

2) Runs an EM algorithm that iteratively updates the filtered-out data and the estimate.

See https://doi.org/10.1093/bioinformatics/bty256.

Parameters

values: Array of three columns encoding a sparse array in (row, column, value) format, zero-indexed, where row is the number of conversions, column is the nucleotide content, and value is the number of reads
p_e: Background mutation rate of unlabeled RNA
p_c: Initial p_c value
threshold: Filter threshold; indices expected to contain more than this proportion of unlabeled reads are filtered out
max_iters: Maximum number of EM iterations

Returns

Estimated conversion rate
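
The two steps described for this function can be sketched in plain Python as follows. This is one possible reading of the description, not dynast's Numba implementation: the name `em_sketch`, the exact masking rule, the imputation formula (a standard EM for truncated binomial data), and the `tol` convergence check are all assumptions made for this example.

```python
from math import comb


def binom(k: int, n: int, p: float) -> float:
    """Binomial PMF, P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def em_sketch(triplets, p_e, p_c=0.1, threshold=0.01, max_iters=300, tol=1e-8):
    """Illustrative masked EM over (conversions, content, reads) triplets."""
    counts = {(int(k), int(n)): float(c) for k, n, c in triplets}
    contents = sorted({n for _, n in counts})
    totals = {n: sum(c for (_, m), c in counts.items() if m == n) for n in contents}

    # Step 1 (assumed rule): mask (k, n) when the reads expected from
    # unlabeled RNA exceed `threshold` times the observed read count.
    masked = {
        (k, n)
        for n in contents
        for k in range(n + 1)
        if binom(k, n, p_e) * totals[n] > threshold * counts.get((k, n), 0.0)
    }
    observed = {n: [k for k in range(n + 1) if (k, n) not in masked] for n in contents}

    # Step 2: alternate between imputing masked counts and updating p_c.
    for _ in range(max_iters):
        num = den = 0.0
        for n in contents:
            obs_reads = sum(counts.get((k, n), 0.0) for k in observed[n])
            obs_prob = sum(binom(k, n, p_c) for k in observed[n])
            if obs_reads == 0 or obs_prob == 0:
                continue  # nothing usable at this nucleotide content
            for k in range(n + 1):
                if (k, n) in masked:
                    # E-step: expected labeled-read count at a masked entry
                    a = obs_reads * binom(k, n, p_c) / obs_prob
                else:
                    a = counts.get((k, n), 0.0)
                num += k * a
                den += n * a
        # M-step: conversions divided by convertible positions
        new_p_c = num / den if den > 0 else p_c
        if abs(new_p_c - p_c) < tol:
            return new_p_c
        p_c = new_p_c
    return p_c
```

Masking the low-conversion entries matters because those are dominated by unlabeled reads carrying only background mutations; the E-step then fills them in with what the labeled component alone would be expected to contribute under the current estimate.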

dynast.estimation.p_c.estimate_p_c(df_aggregates: pandas.DataFrame, p_e: Union[float, Dict[str, float], Dict[Tuple[str, Ellipsis], float]], p_c_path: str, group_by: Optional[List[str]] = None, threshold: int = 1000, n_threads: int = 8, nasc: bool = False) → str

Estimate the average conversion rate in labeled RNA.

Parameters

df_aggregates: Pandas dataframe containing aggregate values
p_e: Background mutation rate of unlabeled RNA
p_c_path: Path to output CSV containing p_c estimates
group_by: Columns to group by
threshold: Read count threshold
n_threads: Number of threads
nasc: Flag to indicate whether to use NASC-seq pipeline variant of the EM algorithm

Returns

Path to output CSV containing p_c estimates
