`dynast.preprocessing.snp`

Module Contents

Functions

`read_snps`(snps_path: str) → Dict[str, Dict[str, Set[int]]]	Read SNPs CSV as a dictionary
`read_snp_csv`(snp_csv: str) → Dict[str, Dict[str, Set[int]]]	Read a user-provided SNPs CSV
`extract_conversions_part`(conversions_path: str, counter: multiprocessing.Value, lock: multiprocessing.Lock, index: List[Tuple[int, int, int]], alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, update_every: int = 5000) → Dict[str, Dict[str, Dict[int, int]]]	Extract number of conversions for every genomic position.
`extract_conversions`(conversions_path: str, index_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, n_threads: int = 8) → Dict[str, Dict[str, Dict[int, int]]]	Wrapper around extract_conversions_part that works in parallel
`detect_snps`(conversions_path: str, index_path: str, coverage: Dict[str, Dict[int, int]], snps_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, threshold: float = 0.5, min_coverage: int = 1, n_threads: int = 8) → str	Detect SNPs.

Attributes

SNP_COLUMNS

dynast.preprocessing.snp.SNP_COLUMNS = ['contig', 'genome_i', 'conversion']

dynast.preprocessing.snp.read_snps(snps_path: str) → Dict[str, Dict[str, Set[int]]]

Read SNPs CSV as a dictionary

Parameters

snps_path: Path to SNPs CSV

Returns

Dictionary of contigs as keys and sets of genomic positions with SNPs as values
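As a rough illustration of the returned structure, the sketch below builds a nested dictionary from a CSV that has the three SNP_COLUMNS fields. The return type has two levels of string keys; the sketch assumes the outer key is the conversion and the inner key is the contig, which is an assumption and not something this page confirms. `load_snps_sketch` is a hypothetical helper, not part of dynast.

```python
import csv
import io
from typing import Dict, Set

# Column names taken from SNP_COLUMNS on this page.
SNP_COLUMNS = ['contig', 'genome_i', 'conversion']


def load_snps_sketch(handle) -> Dict[str, Dict[str, Set[int]]]:
    """Hypothetical reader: conversion -> contig -> set of positions.

    The nesting order is an assumption made for illustration only.
    """
    snps: Dict[str, Dict[str, Set[int]]] = {}
    for row in csv.DictReader(handle):
        positions = snps.setdefault(row['conversion'], {}).setdefault(row['contig'], set())
        positions.add(int(row['genome_i']))
    return snps


# Made-up in-memory CSV standing in for a file on disk.
example = io.StringIO(
    "contig,genome_i,conversion\n"
    "chr1,100,TC\n"
    "chr1,250,TC\n"
    "chr2,7,AG\n"
)
snps = load_snps_sketch(example)
```

With this layout, checking whether a position is a known SNP for a given conversion is a pair of dictionary lookups followed by a set membership test.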

dynast.preprocessing.snp.read_snp_csv(snp_csv: str) → Dict[str, Dict[str, Set[int]]]

Read a user-provided SNPs CSV

Parameters

snp_csv: Path to SNPs CSV

Returns

Dictionary of contigs as keys and sets of genomic positions with SNPs as values

dynast.preprocessing.snp.extract_conversions_part(conversions_path: str, counter: multiprocessing.Value, lock: multiprocessing.Lock, index: List[Tuple[int, int, int]], alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, update_every: int = 5000) → Dict[str, Dict[str, Dict[int, int]]]

Extract number of conversions for every genomic position.

Parameters

conversions_path: Path to conversions CSV
counter: Counter that keeps track of how many reads have been processed
lock: Lock guarding the counter so that multiple processes do not modify it at the same time
index: Conversions index
alignments: Set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions: Set of conversions to consider
quality: Only count conversions with PHRED quality greater than this value
update_every: Update the counter every this many reads

Returns

Nested dictionary that contains number of conversions for each contig and position
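The counter, lock, and update_every parameters describe a common progress-reporting pattern: each worker tallies reads locally and only takes the lock once per batch. The sketch below shows that pattern in isolation. `count_reads_sketch` is a hypothetical helper, and flushing the remainder at the end is an assumption about the behaviour, not something stated on this page.

```python
import multiprocessing


def count_reads_sketch(reads, counter, lock, update_every=5000):
    """Hypothetical worker loop that updates a shared counter in batches."""
    pending = 0
    for _ in reads:
        # ... per-read work would happen here ...
        pending += 1
        if pending == update_every:
            # Take the lock once per batch instead of once per read.
            with lock:
                counter.value += pending
            pending = 0
    # Assumption: leftover reads are flushed so the final total is exact.
    with lock:
        counter.value += pending


# A raw shared unsigned integer plus an explicit lock, mirroring the
# separate counter and lock arguments in the signature above.
counter = multiprocessing.Value('I', 0, lock=False)
lock = multiprocessing.Lock()
count_reads_sketch(range(12), counter, lock, update_every=5)
```

A larger update_every means less lock contention between processes, at the cost of a coarser progress readout.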

dynast.preprocessing.snp.extract_conversions(conversions_path: str, index_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, n_threads: int = 8) → Dict[str, Dict[str, Dict[int, int]]]

Wrapper around extract_conversions_part that works in parallel

Parameters

conversions_path: Path to conversions CSV
index_path: Path to conversions index
alignments: Set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions: Set of conversions to consider
quality: Only count conversions with PHRED quality greater than this value
n_threads: Number of threads

Returns

Nested dictionary that contains number of conversions for each contig and position
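A parallel wrapper of this kind has to combine the dictionaries returned by each part into one result. How dynast does this is not described here; the sketch below is one plausible way to sum per-position counts across parts. `merge_counts_sketch` and the sample keys are hypothetical.

```python
from typing import Dict, List

# Same shape as the documented return type.
Counts = Dict[str, Dict[str, Dict[int, int]]]


def merge_counts_sketch(parts: List[Counts]) -> Counts:
    """Hypothetical merge: sum counts that share both keys and a position."""
    merged: Counts = {}
    for part in parts:
        for outer, by_inner in part.items():
            for inner, by_position in by_inner.items():
                target = merged.setdefault(outer, {}).setdefault(inner, {})
                for position, n in by_position.items():
                    target[position] = target.get(position, 0) + n
    return merged


# Made-up results from two workers; key names are illustrative only.
part_a = {'TC': {'chr1': {100: 2, 250: 1}}}
part_b = {'TC': {'chr1': {100: 3}}, 'AG': {'chr2': {7: 1}}}
merged = merge_counts_sketch([part_a, part_b])
```

Summing is safe here only if every read is handled by exactly one part, so that no conversion is counted twice.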

dynast.preprocessing.snp.detect_snps(conversions_path: str, index_path: str, coverage: Dict[str, Dict[int, int]], snps_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, threshold: float = 0.5, min_coverage: int = 1, n_threads: int = 8) → str

Detect SNPs.

Parameters

conversions_path: Path to conversions CSV
index_path: Path to conversions index
coverage: Dictionary containing genomic coverage
snps_path: Path to output SNPs
alignments: Set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions: Set of conversions to consider
quality: Only count conversions with PHRED quality greater than this value
threshold: Positions with conversions / coverage > threshold will be considered as SNPs
min_coverage: Only positions with at least this many mapping reads are considered
n_threads: Number of threads

Returns

Path to SNPs CSV
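The threshold and min_coverage parameters together define a simple per-position rule. The sketch below applies that rule to small in-memory dictionaries for a single conversion type. `call_snps_sketch` is a hypothetical helper, and it returns a dictionary where detect_snps writes a CSV and returns its path.

```python
from typing import Dict, Set


def call_snps_sketch(
    conversions: Dict[str, Dict[int, int]],
    coverage: Dict[str, Dict[int, int]],
    threshold: float = 0.5,
    min_coverage: int = 1,
) -> Dict[str, Set[int]]:
    """Hypothetical SNP call: conversions / coverage > threshold."""
    snps: Dict[str, Set[int]] = {}
    for contig, by_position in conversions.items():
        for position, n_conversions in by_position.items():
            depth = coverage.get(contig, {}).get(position, 0)
            # Skip positions with too few mapping reads (and avoid dividing by zero).
            if depth == 0 or depth < min_coverage:
                continue
            # Strict inequality, matching the description of threshold.
            if n_conversions / depth > threshold:
                snps.setdefault(contig, set()).add(position)
    return snps


# Made-up counts: position 100 passes, 200 sits exactly on the threshold,
# and 300 has too little coverage.
conversions = {'chr1': {100: 8, 200: 5, 300: 1}}
coverage = {'chr1': {100: 10, 200: 10, 300: 1}}
called = call_snps_sketch(conversions, coverage, threshold=0.5, min_coverage=2)
```

Because the comparison is strict, a position where exactly half of the covering reads carry the conversion is not called at the default threshold of 0.5.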
