dynast.preprocessing.snp
Module Contents
Functions
read_snps | Read SNPs CSV as a dictionary
read_snp_csv | Read a user-provided SNPs CSV
extract_conversions_part | Extract number of conversions for every genomic position.
extract_conversions | Wrapper around extract_conversions_part that works in parallel
detect_snps | Detect SNPs.
Attributes
- dynast.preprocessing.snp.read_snps(snps_path: str) -> Dict[str, Dict[str, Set[int]]]
Read SNPs CSV as a dictionary
- Parameters
- snps_path
Path to SNPs CSV
- Returns
Dictionary of contigs as keys and sets of genomic positions with SNPs as values
- dynast.preprocessing.snp.read_snp_csv(snp_csv: str) -> Dict[str, Dict[str, Set[int]]]
Read a user-provided SNPs CSV
- Parameters
- snp_csv
Path to SNPs CSV
- Returns
Dictionary of contigs as keys and sets of genomic positions with SNPs as values
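The nested return type shared by both readers can be illustrated with a minimal sketch. The column names (contig, genome_i, conversion), the choice of conversion as the outer key, and the helper name read_snps_sketch are all assumptions made for illustration; they are not confirmed by the descriptions above.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def read_snps_sketch(handle: TextIO) -> Dict[str, Dict[str, Set[int]]]:
    """Hypothetical reader: group SNP positions by conversion, then by contig."""
    snps = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(handle):
        # Assumed columns: contig, genome_i (position), conversion (e.g. "TC").
        snps[row["conversion"]][row["contig"]].add(int(row["genome_i"]))
    # Convert the defaultdicts to plain dicts before returning.
    return {conv: dict(contigs) for conv, contigs in snps.items()}


# Small in-memory CSV standing in for a real SNPs file.
example_csv = io.StringIO(
    "contig,genome_i,conversion\n"
    "chr1,100,TC\n"
    "chr1,250,TC\n"
    "chr2,7,AG\n"
)
snps = read_snps_sketch(example_csv)
```

Sets make membership checks such as `pos in snps["TC"]["chr1"]` cheap, which is the lookup a downstream SNP filter would typically need.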
- dynast.preprocessing.snp.extract_conversions_part(conversions_path: str, counter: multiprocessing.Value, lock: multiprocessing.Lock, index: List[Tuple[int, int, int]], alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, update_every: int = 5000) -> Dict[str, Dict[str, Dict[int, int]]]
Extract number of conversions for every genomic position.
- Parameters
- conversions_path
Path to conversions CSV
- counter
Counter that keeps track of how many reads have been processed
- lock
Lock guarding the counter so that multiple processes do not modify it at the same time
- index
Conversions index
- alignments
Set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
- conversions
Set of conversions to consider
- quality
Only count conversions with PHRED quality greater than this value
- update_every
Update the counter every this many reads
- Returns
Nested dictionary that contains number of conversions for each contig and position
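The interplay of counter, lock and update_every can be sketched as follows. This is an illustration of the batching pattern only, not the library's implementation: the function name process_reads is hypothetical, and flushing the remainder at the end is an assumption.

```python
import multiprocessing


def process_reads(reads, counter, lock, update_every=5000):
    """Hypothetical worker loop: batch progress updates to limit lock contention."""
    pending = 0
    for _ in reads:
        # ... per-read work would happen here ...
        pending += 1
        if pending == update_every:
            # Take the lock only once per batch, not once per read.
            with lock:
                counter.value += pending
            pending = 0
    # Assumption: flush whatever is left so the final count is exact.
    if pending:
        with lock:
            counter.value += pending


counter = multiprocessing.Value("i", 0)  # shared integer counter
lock = multiprocessing.Lock()
process_reads(range(12001), counter, lock, update_every=5000)
processed = counter.value
```

Updating in batches keeps workers from contending on the lock for every read, at the cost of a progress display that lags by up to update_every reads per worker.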
- dynast.preprocessing.snp.extract_conversions(conversions_path: str, index_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, n_threads: int = 8) -> Dict[str, Dict[str, Dict[int, int]]]
Wrapper around extract_conversions_part that works in parallel
- Parameters
- conversions_path
Path to conversions CSV
- index_path
Path to conversions index
- alignments
Set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
- conversions
Set of conversions to consider
- quality
Only count conversions with PHRED quality greater than this value
- n_threads
Number of threads
- Returns
Nested dictionary that contains number of conversions for each contig and position
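A parallel wrapper of this kind has to combine the per-part results into one nested dictionary. The sketch below shows one way to do that by summing counts; the helper name merge_parts and the use of conversion as the outermost key are assumptions, and the library may combine results differently.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Assumed layout: outer key -> contig -> genomic position -> count.
Counts = Dict[str, Dict[str, Dict[int, int]]]


def merge_parts(parts: List[Counts]) -> Counts:
    """Hypothetical merge: sum per-position counts across partial results."""
    merged = defaultdict(lambda: defaultdict(Counter))
    for part in parts:
        for outer, contigs in part.items():
            for contig, positions in contigs.items():
                # Counter.update adds counts instead of overwriting them.
                merged[outer][contig].update(positions)
    return {
        outer: {contig: dict(pos) for contig, pos in contigs.items()}
        for outer, contigs in merged.items()
    }


part_a = {"TC": {"chr1": {100: 2, 250: 1}}}
part_b = {"TC": {"chr1": {100: 3}}, "AG": {"chr2": {7: 1}}}
merged = merge_parts([part_a, part_b])
```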
- dynast.preprocessing.snp.detect_snps(conversions_path: str, index_path: str, coverage: Dict[str, Dict[int, int]], snps_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, threshold: float = 0.5, min_coverage: int = 1, n_threads: int = 8) -> str
Detect SNPs.
- Parameters
- conversions_path
Path to conversions CSV
- index_path
Path to conversions index
- coverage
Dictionary containing genomic coverage
- snps_path
Path to output SNPs
- alignments
Set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
- conversions
Set of conversions to consider
- quality
Only count conversions with PHRED quality greater than this value
- threshold
Positions where conversions / coverage > threshold are considered SNPs
- min_coverage
Only positions with at least this many mapping reads are considered
- n_threads
Number of threads
- Returns
Path to SNPs CSV
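The threshold and min_coverage rules described above can be sketched for a single conversion type. The helper name call_snps and the input layout (contig -> position -> count) are assumptions for illustration; the real function also reads and writes files, which this sketch omits.

```python
from typing import Dict, Set


def call_snps(
    counts: Dict[str, Dict[int, int]],
    coverage: Dict[str, Dict[int, int]],
    threshold: float = 0.5,
    min_coverage: int = 1,
) -> Dict[str, Set[int]]:
    """Hypothetical SNP call: conversions / coverage > threshold at covered positions."""
    snps: Dict[str, Set[int]] = {}
    for contig, positions in counts.items():
        for pos, n_conversions in positions.items():
            cov = coverage.get(contig, {}).get(pos, 0)
            # Skip positions with too little coverage (also avoids division by zero).
            if cov < min_coverage or cov == 0:
                continue
            # Strict inequality, matching the "> threshold" wording.
            if n_conversions / cov > threshold:
                snps.setdefault(contig, set()).add(pos)
    return snps


counts = {"chr1": {100: 8, 250: 5, 300: 1}}
coverage = {"chr1": {100: 10, 250: 10, 300: 1}}
strict = call_snps(counts, coverage, threshold=0.5, min_coverage=2)
lenient = call_snps(counts, coverage, threshold=0.5, min_coverage=1)
```

In this example position 250 is never called, because its ratio equals the threshold exactly and the comparison is strict. Position 300 is called only when min_coverage is 1, which shows why a higher min_coverage guards against calls from a single read.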