dynast.preprocessing.consensus

Module Contents

Functions

call_consensus_from_reads(reads: List[pysam.AlignedSegment], header: pysam.AlignmentHeader, quality: int = 27, tags: Optional[Dict[str, Any]] = None) → pysam.AlignedSegment

Call a single consensus alignment given a list of aligned reads.

call_consensus_from_reads_process(reads, header, tags, strand=None, quality=27)

Helper function to call call_consensus_from_reads() from a subprocess.

consensus_worker(args_q, results_q, *args, **kwargs)

Multiprocessing worker.

call_consensus(bam_path: str, out_path: str, gene_infos: dict, strand: typing_extensions.Literal['forward', 'reverse', 'unstranded'] = 'forward', umi_tag: Optional[str] = None, barcode_tag: Optional[str] = None, gene_tag: str = 'GX', barcodes: Optional[List[str]] = None, quality: int = 27, add_RS_RI: bool = False, temp_dir: Optional[str] = None, n_threads: int = 8) → str

Call consensus sequences from BAM.

Attributes

BASES

BASE_IDX

dynast.preprocessing.consensus.BASES = ['A', 'C', 'G', 'T']
dynast.preprocessing.consensus.BASE_IDX
dynast.preprocessing.consensus.call_consensus_from_reads(reads: List[pysam.AlignedSegment], header: pysam.AlignmentHeader, quality: int = 27, tags: Optional[Dict[str, Any]] = None) → pysam.AlignedSegment

Call a single consensus alignment given a list of aligned reads.

All reads must map to the same contig; results are undefined otherwise. Consensus bases are called only for positions that align to the reference (i.e. insertions are not allowed).
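To make the idea of calling one base per reference position concrete, here is a minimal, pure-Python sketch of a quality-weighted vote. It is not the library's implementation. The toy read representation, the `toy_consensus` name, the assumption that `BASE_IDX` maps each base to its index in `BASES`, and the fallback to the reference base when support is below `quality` are all assumptions made for illustration.

```python
from typing import List, Tuple

BASES = ['A', 'C', 'G', 'T']
# Assumption: BASE_IDX maps each base to its index in BASES.
BASE_IDX = {base: i for i, base in enumerate(BASES)}

# Hypothetical toy read: (reference start, sequence, per-base qualities), gapless.
Read = Tuple[int, str, List[int]]


def toy_consensus(reads: List[Read], reference: str, quality: int = 27) -> str:
    """Quality-weighted vote per reference position (illustrative only)."""
    # scores[pos][base index] = summed quality of reads supporting that base
    scores = [[0] * len(BASES) for _ in reference]
    for start, seq, quals in reads:
        for offset, (base, q) in enumerate(zip(seq, quals)):
            pos = start + offset
            # Skip bases outside A/C/G/T (e.g. N) and positions off the reference.
            if base in BASE_IDX and 0 <= pos < len(reference):
                scores[pos][BASE_IDX[base]] += q

    consensus = []
    for pos, ref_base in enumerate(reference):
        best = max(range(len(BASES)), key=lambda i: scores[pos][i])
        # Assumption: keep the reference base when the best-supported base
        # does not reach the quality threshold.
        if scores[pos][best] >= quality:
            consensus.append(BASES[best])
        else:
            consensus.append(ref_base)
    return ''.join(consensus)
```

With reference `ACGT`, the reads `(0, 'ACTT', [30, 30, 30, 30])` and `(1, 'CTT', [20, 20, 5])` give `ACTT` in this sketch, because the T at position 2 accumulates a summed quality of 50.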

This function sets only the minimal set of attributes required for the alignment to be valid. These include:
* read name (SHA256 hash of the provided read names)
* read sequence and qualities
* reference name and ID
* reference start
* mapping quality (MAPQ)
* CIGAR string
* MD tag
* NM tag
* flags: not unmapped, paired, duplicate, QC fail, secondary, or supplementary

The caller is expected to further populate the alignment with additional tags, flags, and name.
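As an illustration of deriving a single read name from many, the sketch below hashes a list of names with SHA256 using only the standard library. How the library orders or joins the names before hashing is not stated here, so the sorting, the newline separator, and the `consensus_read_name` helper name are assumptions.

```python
import hashlib
from typing import Iterable


def consensus_read_name(read_names: Iterable[str]) -> str:
    """Derive one name from many read names via SHA256 (illustrative only)."""
    # Assumption: names are sorted and joined with newlines before hashing,
    # so the result does not depend on input order. The library may combine
    # the names differently.
    joined = '\n'.join(sorted(read_names))
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()
```

The result is a 64-character hexadecimal string that is identical for any ordering of the same names.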

Parameters
reads

List of reads to call a consensus sequence from

header

Header to use when creating the new pysam alignment

quality

Quality threshold

tags

Additional tags to set

Returns

New pysam alignment of the consensus sequence

dynast.preprocessing.consensus.call_consensus_from_reads_process(reads, header, tags, strand=None, quality=27)

Helper function to call call_consensus_from_reads() from a subprocess.

dynast.preprocessing.consensus.consensus_worker(args_q, results_q, *args, **kwargs)

Multiprocessing worker.
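The parameter names suggest a queue of work items and a queue of results. A generic sketch of that pattern follows. It is not the library's worker: the `toy_worker` name, the `func` parameter, and the use of `None` as an end-of-work sentinel are assumptions, and plain in-process queues stand in for multiprocessing queues so the example runs anywhere.

```python
import queue
from typing import Any, Callable


def toy_worker(args_q, results_q, func: Callable[..., Any]) -> None:
    """Consume argument tuples from args_q until a None sentinel arrives."""
    while True:
        args = args_q.get()
        if args is None:  # Assumption: None marks the end of work.
            break
        results_q.put(func(*args))


# In-process demonstration with plain queues. A multiprocessing setup would
# hand multiprocessing queues to the same function running in separate processes.
args_q, results_q = queue.Queue(), queue.Queue()
for pair in [(1, 2), (3, 4)]:
    args_q.put(pair)
args_q.put(None)

toy_worker(args_q, results_q, lambda a, b: a + b)
results = [results_q.get() for _ in range(2)]
```

Separating the argument queue from the result queue lets the parent process keep feeding work while it collects finished items.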

dynast.preprocessing.consensus.call_consensus(bam_path: str, out_path: str, gene_infos: dict, strand: typing_extensions.Literal['forward', 'reverse', 'unstranded'] = 'forward', umi_tag: Optional[str] = None, barcode_tag: Optional[str] = None, gene_tag: str = 'GX', barcodes: Optional[List[str]] = None, quality: int = 27, add_RS_RI: bool = False, temp_dir: Optional[str] = None, n_threads: int = 8) → str

Call consensus sequences from BAM.

Parameters
bam_path

Path to BAM

out_path

Output BAM path

gene_infos

Gene information, as parsed from the GTF

strand

Protocol strandedness

umi_tag

BAM tag containing the UMI

barcode_tag

BAM tag containing the barcode

gene_tag

BAM tag containing the assigned gene

barcodes

List of barcodes to consider

quality

Quality threshold

add_RS_RI

Add RS and RI BAM tags for debugging

temp_dir

Temporary directory

n_threads

Number of threads

Returns

Path to sorted and indexed consensus BAM
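A call might look like the following sketch, which wraps `call_consensus` using the keyword arguments from the signature above. The `run_consensus` wrapper is hypothetical, the `UB` and `CB` tag names are assumptions about how the input BAM was tagged, and `gene_infos` is expected to be prepared by the caller from the GTF.

```python
from typing import Optional


def run_consensus(bam_path: str, out_path: str, gene_infos: dict,
                  temp_dir: Optional[str] = None) -> str:
    """Thin wrapper showing one way to call call_consensus (requires dynast)."""
    # Imported lazily so this sketch can be defined without dynast installed.
    from dynast.preprocessing.consensus import call_consensus

    return call_consensus(
        bam_path,
        out_path,
        gene_infos,
        strand='forward',
        umi_tag='UB',      # Assumption: the UMI is stored in the UB tag.
        barcode_tag='CB',  # Assumption: the barcode is stored in the CB tag.
        gene_tag='GX',
        quality=27,
        temp_dir=temp_dir,
        n_threads=4,
    )
```

The returned string is the path to the sorted and indexed consensus BAM, as described under Returns.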