dynast.preprocessing.snp

Module Contents

Functions

read_snps(snps_path: str) → Dict[str, Dict[str, Set[int]]]

Read SNPs CSV as a dictionary

read_snp_csv(snp_csv: str) → Dict[str, Dict[str, Set[int]]]

Read a user-provided SNPs CSV

extract_conversions_part(conversions_path: str, counter: multiprocessing.Value, lock: multiprocessing.Lock, index: List[Tuple[int, int, int]], alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, update_every: int = 5000) → Dict[str, Dict[str, Dict[int, int]]]

Extract number of conversions for every genomic position.

extract_conversions(conversions_path: str, index_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, n_threads: int = 8) → Dict[str, Dict[str, Dict[int, int]]]

Wrapper around extract_conversions_part that works in parallel

detect_snps(conversions_path: str, index_path: str, coverage: Dict[str, Dict[int, int]], snps_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, threshold: float = 0.5, min_coverage: int = 1, n_threads: int = 8) → str

Detect SNPs.

Attributes

SNP_COLUMNS

dynast.preprocessing.snp.SNP_COLUMNS = ['contig', 'genome_i', 'conversion']

dynast.preprocessing.snp.read_snps(snps_path: str) → Dict[str, Dict[str, Set[int]]]

Read SNPs CSV as a dictionary

Parameters
snps_path

Path to SNPs CSV

Returns

Nested dictionary whose inner dictionaries have contigs as keys and sets of genomic positions with SNPs as values
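The return shape can be illustrated with a minimal standard-library sketch. It is not dynast's implementation: `read_snps_sketch` is a hypothetical helper, and the choice of the conversion as the outer key is an assumption inferred from `SNP_COLUMNS` and the return type.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO

# Column names as listed in SNP_COLUMNS on this page.
SNP_COLUMNS = ['contig', 'genome_i', 'conversion']


def read_snps_sketch(handle: TextIO) -> Dict[str, Dict[str, Set[int]]]:
    """Group SNP rows as conversion -> contig -> set of positions.

    The conversion-as-outer-key layout is an assumption, not confirmed API.
    """
    snps = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(handle):
        snps[row['conversion']][row['contig']].add(int(row['genome_i']))
    # Return plain dicts so that missing keys raise instead of being inserted.
    return {conv: dict(contigs) for conv, contigs in snps.items()}


# Illustrative in-memory CSV; the rows are made up for this example.
example = io.StringIO(
    "contig,genome_i,conversion\n"
    "chr1,100,TC\n"
    "chr1,250,TC\n"
    "chr2,7,AG\n"
)
snps = read_snps_sketch(example)
```

With this layout, checking whether a position is a known SNP for a given conversion is a pair of dictionary lookups followed by a set membership test.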

dynast.preprocessing.snp.read_snp_csv(snp_csv: str) → Dict[str, Dict[str, Set[int]]]

Read a user-provided SNPs CSV

Parameters
snp_csv

Path to SNPs CSV

Returns

Nested dictionary whose inner dictionaries have contigs as keys and sets of genomic positions with SNPs as values

dynast.preprocessing.snp.extract_conversions_part(conversions_path: str, counter: multiprocessing.Value, lock: multiprocessing.Lock, index: List[Tuple[int, int, int]], alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, update_every: int = 5000) → Dict[str, Dict[str, Dict[int, int]]]

Extract number of conversions for every genomic position.

Parameters
conversions_path

Path to conversions CSV

counter

Counter that keeps track of how many reads have been processed

lock

Lock that guards the counter so that multiple processes do not modify it at the same time

index

Conversions index

alignments

List of (read_id, alignment_index) tuples to process. If not provided, all alignments are processed.

conversions

Set of conversions to consider

quality

Only count conversions with PHRED quality greater than this value

update_every

Update the counter every this many reads

Returns

Nested dictionary that contains number of conversions for each contig and position
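The interplay of `counter`, `lock` and `update_every` can be sketched as follows. This is an illustration of the batching pattern the parameters describe, not the function's actual body: `count_reads` is a hypothetical helper, and flushing the remainder at the end is an assumption.

```python
import multiprocessing


def count_reads(n_reads, counter, lock, update_every=5000):
    """Process n_reads items, updating the shared counter in batches."""
    processed = 0
    for _ in range(n_reads):
        # ... per-read work would happen here ...
        processed += 1
        if processed % update_every == 0:
            # Acquire the lock only once per batch to limit contention.
            with lock:
                counter.value += update_every
    # Assumed behaviour: report reads left over after the last full batch.
    with lock:
        counter.value += processed % update_every
    return processed


# Shared unsigned integer counter and the lock that guards it.
counter = multiprocessing.Value('I', 0)
lock = multiprocessing.Lock()
count_reads(12_345, counter, lock, update_every=5000)
```

Updating the counter once per batch rather than once per read keeps lock acquisitions rare, at the cost of progress being reported in steps of `update_every`.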

dynast.preprocessing.snp.extract_conversions(conversions_path: str, index_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, n_threads: int = 8) → Dict[str, Dict[str, Dict[int, int]]]

Wrapper around extract_conversions_part that works in parallel

Parameters
conversions_path

Path to conversions CSV

index_path

Path to conversions index

alignments

List of (read_id, alignment_index) tuples to process. If not provided, all alignments are processed.

conversions

Set of conversions to consider

quality

Only count conversions with PHRED quality greater than this value

n_threads

Number of threads

Returns

Nested dictionary that contains number of conversions for each contig and position
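A parallel wrapper of this kind has to combine the nested dictionaries returned for each part of the index. The sketch below shows one way to do that merge; `merge_counts` is a hypothetical helper, and treating the outermost key as the conversion type is an assumption.

```python
from collections import Counter
from typing import Dict, List

# Assumed layout: conversion -> contig -> position -> count.
Counts = Dict[str, Dict[str, Dict[int, int]]]


def merge_counts(parts: List[Counts]) -> Counts:
    """Sum per-position counts across the results of several workers."""
    merged: Dict[str, Dict[str, Counter]] = {}
    for part in parts:
        for outer, contigs in part.items():
            for contig, positions in contigs.items():
                target = merged.setdefault(outer, {}).setdefault(contig, Counter())
                # Counter.update adds counts instead of overwriting them.
                target.update(positions)
    # Convert the Counters back to plain dictionaries.
    return {
        outer: {contig: dict(pos) for contig, pos in contigs.items()}
        for outer, contigs in merged.items()
    }


# Made-up results from two workers.
part_a = {'TC': {'chr1': {100: 2, 250: 1}}}
part_b = {'TC': {'chr1': {100: 3}}, 'AG': {'chr2': {7: 1}}}
merged = merge_counts([part_a, part_b])
```

Positions seen by more than one worker have their counts added, while positions seen by a single worker are carried over unchanged.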

dynast.preprocessing.snp.detect_snps(conversions_path: str, index_path: str, coverage: Dict[str, Dict[int, int]], snps_path: str, alignments: Optional[List[Tuple[str, int]]] = None, conversions: Optional[FrozenSet[str]] = None, quality: int = 27, threshold: float = 0.5, min_coverage: int = 1, n_threads: int = 8) → str

Detect SNPs.

Parameters
conversions_path

Path to conversions CSV

index_path

Path to conversions index

coverage

Dictionary containing genomic coverage

snps_path

Path to output SNPs

alignments

List of (read_id, alignment_index) tuples to process. If not provided, all alignments are processed.

conversions

Set of conversions to consider

quality

Only count conversions with PHRED quality greater than this value

threshold

Positions with conversions / coverage > threshold will be considered as SNPs

min_coverage

Only positions with at least this many mapping reads are considered

n_threads

Number of threads

Returns

Path to SNPs CSV
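The `threshold` and `min_coverage` rule described above can be sketched for a single conversion type as follows. `call_snps` is a hypothetical helper that works on in-memory dictionaries; it does not reproduce how dynast reads its inputs or writes the output CSV.

```python
from typing import Dict, Set


def call_snps(
    conversions: Dict[str, Dict[int, int]],
    coverage: Dict[str, Dict[int, int]],
    threshold: float = 0.5,
    min_coverage: int = 1,
) -> Dict[str, Set[int]]:
    """Return contig -> positions where conversions / coverage > threshold."""
    snps: Dict[str, Set[int]] = {}
    for contig, positions in conversions.items():
        for position, n_conversions in positions.items():
            cov = coverage.get(contig, {}).get(position, 0)
            # Skip positions that are uncovered or below the coverage cutoff.
            if cov == 0 or cov < min_coverage:
                continue
            if n_conversions / cov > threshold:
                snps.setdefault(contig, set()).add(position)
    return snps


# Made-up counts: contig -> position -> number of conversions / reads.
conversions = {'chr1': {100: 9, 250: 1, 300: 2}}
coverage = {'chr1': {100: 10, 250: 10, 300: 4}}
called = call_snps(conversions, coverage, threshold=0.5, min_coverage=1)
```

Position 300 has a ratio of exactly 0.5 and is not called, because the comparison is strict. Raising `min_coverage` excludes positions whose ratio rests on only a few reads.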