dynast.utils

Module Contents

Classes

suppress_stdout_stderr

A context manager for doing a "deep suppression" of stdout and stderr in Python.

Functions

get_STAR_binary_path() → str

Get the path to the platform-dependent STAR binary included with the installation.

get_STAR_version() → str

Get the provided STAR version.

combine_arguments(args: Dict[str, Any], additional: Dict[str, Any]) → Dict[str, Any]

Combine two dictionaries representing command-line arguments.

arguments_to_list(args: Dict[str, Any]) → List[Any]

Convert a dictionary of command-line arguments to a list.

get_file_descriptor_limit() → int

Get the current value for the maximum number of open file descriptors in a platform-dependent way.

get_max_file_descriptor_limit() → int

Get the maximum allowed value for the maximum number of open file descriptors.

increase_file_descriptor_limit(limit: int)

Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process.

get_available_memory() → int

Get the total amount of available memory (total memory - used memory) in bytes.

make_pool_with_counter(n_threads: int) → Tuple[multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock]

Create a new Process pool with a shared progress counter.

display_progress_with_counter(counter: multiprocessing.Value, total: int, *async_results, desc: Optional[str] = None)

Display a progress bar for multiprocessing progress.

as_completed_with_progress(futures: Iterable[concurrent.futures.Future])

Wrapper around concurrent.futures.as_completed that displays a progress bar.

split_index(index: List[Tuple[int, int, int]], n: int = 8) → List[List[Tuple[int, int, int]]]

Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts.

downsample_counts(df_counts: pandas.DataFrame, proportion: Optional[float] = None, count: Optional[int] = None, seed: Optional[int] = None, group_by: Optional[List[str]] = None) → pandas.DataFrame

Downsample the given counts dataframe according to the proportion or count arguments.

counts_to_matrix(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX') → scipy.sparse.csr_matrix

Convert a counts dataframe to a sparse counts matrix.

split_counts(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX', conversions: FrozenSet[str] = frozenset({'TC'})) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]

Split counts dataframe into two count matrices by a column.

split_matrix_pi(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], pis: Dict[Tuple[str, str], float], barcodes: List[str], features: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]

Split the given matrix based on provided fraction of new RNA.

split_matrix_alpha(unlabeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], labeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], alphas: Dict[str, float], barcodes: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]

Split the given matrix based on provided fraction of new RNA.

results_to_adata(df_counts: pandas.DataFrame, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), gene_infos: Optional[dict] = None, pis: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[Tuple[str, str], float]]]] = None, alphas: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[str, float]]]] = None) → anndata.AnnData

Compile all results into a single AnnData object.

patch_mp_connection_bpo_17560()

Apply the PR-10305 / bpo-17560 connection send/receive max size update.

dict_to_matrix(d: Dict[Tuple[str, str], float], rows: List[str], columns: List[str]) → scipy.sparse.csr_matrix

Convert a dictionary to a matrix.

Attributes

run_executable

open_as_text

decompress_gzip

flatten_dict_values

mkstemp

all_exists

flatten_dictionary

flatten_iter

merge_dictionaries

write_pickle

read_pickle

dynast.utils.run_executable
dynast.utils.open_as_text
dynast.utils.decompress_gzip
dynast.utils.flatten_dict_values
dynast.utils.mkstemp
dynast.utils.all_exists
dynast.utils.flatten_dictionary
dynast.utils.flatten_iter
dynast.utils.merge_dictionaries
dynast.utils.write_pickle
dynast.utils.read_pickle
exception dynast.utils.UnsupportedOSException

Bases: Exception

Common base class for all non-exit exceptions.

class dynast.utils.suppress_stdout_stderr

A context manager for doing a “deep suppression” of stdout and stderr in Python, i.e. it will suppress all printing, even if the output originates in a compiled C/Fortran sub-function.

This will not suppress raised exceptions, since exceptions are printed to stderr just before a script exits, and after the context manager has exited (at least, I think that is why it lets exceptions through). https://github.com/facebook/prophet/issues/223

__enter__(self)
__exit__(self, *_)
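The "deep" part comes from redirecting the process-level file descriptors rather than `sys.stdout`/`sys.stderr`: compiled C/Fortran code writes directly to descriptors 1 and 2, so only a descriptor-level redirect silences it. The following is a minimal sketch of that technique with `os.dup`/`os.dup2`, assuming a platform where `os.devnull` can be opened; the class name `deep_suppress_sketch` is hypothetical and this is not dynast's source.

```python
import os
import sys


class deep_suppress_sketch:
    """Hypothetical sketch: point fds 1 and 2 at os.devnull inside the block."""

    def __enter__(self):
        # Flush Python-level buffers so earlier output is not swallowed.
        sys.stdout.flush()
        sys.stderr.flush()
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        # Keep duplicates of the real stdout/stderr descriptors for restoring later.
        self.saved_fds = [os.dup(1), os.dup(2)]
        # Anything written to fd 1 or fd 2 (including from C code) now goes to devnull.
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)
        return self

    def __exit__(self, *_):
        # Flush output produced inside the block while it still goes to devnull.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(self.saved_fds[0], 1)
        os.dup2(self.saved_fds[1], 2)
        for fd in self.null_fds + self.saved_fds:
            os.close(fd)
        # Returning None means exceptions raised in the block still propagate.
```

Because `__exit__` returns a falsy value, exceptions are not swallowed, which matches the behavior described above.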
dynast.utils.get_STAR_binary_path() → str

Get the path to the platform-dependent STAR binary included with the installation.

Returns

Path to the binary

dynast.utils.get_STAR_version() → str

Get the provided STAR version.

Returns

Version string

dynast.utils.combine_arguments(args: Dict[str, Any], additional: Dict[str, Any]) → Dict[str, Any]

Combine two dictionaries representing command-line arguments.

Any duplicate keys are merged according to the following procedure:

1. If the values in both dictionaries are lists, the two lists are combined.
2. Otherwise, the value in the first dictionary is OVERWRITTEN by the value in the second.

Parameters
args

Original command-line arguments

additional

Additional command-line arguments

Returns

Combined command-line arguments
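The two merge rules can be sketched as follows. This is an illustration of the documented procedure, not dynast's source; the function name `combine_arguments_sketch` is hypothetical, and the sketch copies `args` so the caller's dictionary is left unchanged (an assumption about the desired behavior).

```python
from typing import Any, Dict


def combine_arguments_sketch(args: Dict[str, Any], additional: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the documented merge rules."""
    combined = dict(args)  # shallow copy so `args` itself is not mutated
    for key, value in additional.items():
        existing = combined.get(key)
        if isinstance(existing, list) and isinstance(value, list):
            # Rule 1: both values are lists, so concatenate them.
            combined[key] = existing + value
        else:
            # Rule 2: otherwise the value from `additional` overwrites the original.
            combined[key] = value
    return combined


merged = combine_arguments_sketch(
    {"--outSAMattributes": ["NH", "HI"], "--runThreadN": 4},
    {"--outSAMattributes": ["MD"], "--runThreadN": 8},
)
print(merged)  # {'--outSAMattributes': ['NH', 'HI', 'MD'], '--runThreadN': 8}
```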

dynast.utils.arguments_to_list(args: Dict[str, Any]) → List[Any]

Convert a dictionary of command-line arguments to a list.

Parameters
args

Command-line arguments

Returns

List of command-line arguments
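The layout of the resulting list is not specified above. A minimal sketch, assuming each key is followed by its value and list values are expanded in place (the `--option value1 value2` form STAR uses); `arguments_to_list_sketch` is a hypothetical name, not dynast's source.

```python
from typing import Any, Dict, List


def arguments_to_list_sketch(args: Dict[str, Any]) -> List[Any]:
    """Hypothetical sketch: flatten {flag: value} into [flag, value, ...]."""
    arguments: List[Any] = []
    for key, value in args.items():
        arguments.append(key)
        if isinstance(value, list):
            # Multi-valued options are expanded in place after the flag.
            arguments.extend(value)
        else:
            arguments.append(value)
    return arguments


command = arguments_to_list_sketch({"--runThreadN": 8, "--outSAMattributes": ["NH", "HI"]})
print(command)  # ['--runThreadN', 8, '--outSAMattributes', 'NH', 'HI']
```

Values keep their original types here, so non-string values such as `8` would need `str()` before being handed to `subprocess`, which only accepts strings, bytes, or path-like objects.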

dynast.utils.get_file_descriptor_limit() → int

Get the current value for the maximum number of open file descriptors in a platform-dependent way.

Returns

The current value of the maximum number of open file descriptors.

dynast.utils.get_max_file_descriptor_limit() → int

Get the maximum allowed value for the maximum number of open file descriptors.

Note that for Windows, there is not an easy way to get this, as it requires reading from the registry. So, we just return the maximum for a vanilla Windows installation, which is 8192. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=vs-2019

Similarly, on macOS, we return a hardcoded value of 10240.

Returns

Maximum allowed value for the maximum number of open file descriptors
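On POSIX systems, both values can be read from the standard-library `resource` module: the soft `RLIMIT_NOFILE` limit is the value currently enforced, and the hard limit is the ceiling an unprivileged process may raise it to. The sketch below assumes Linux or macOS (the `resource` module does not exist on Windows, so the Windows path is omitted); the function names are hypothetical and this is not dynast's source.

```python
import platform
import resource  # POSIX-only standard-library module


def current_fd_limit_sketch() -> int:
    # The soft limit is the value currently enforced for this process.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def max_fd_limit_sketch() -> int:
    if platform.system() == "Darwin":
        # Hardcoded for macOS, as described above.
        return 10240
    # Elsewhere, the hard limit is the most the soft limit can be raised to.
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return hard
```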

dynast.utils.increase_file_descriptor_limit(limit: int)

Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process. The original value is restored when execution exits the context.

This is required when running STAR with many threads.

Parameters
limit

Maximum number of open file descriptors will be increased to this value for the duration of the context
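A minimal POSIX-only sketch of such a context manager using `resource.setrlimit`, assuming only the soft limit needs to change; `increased_fd_limit_sketch` is a hypothetical name and this is not dynast's source. Restoration happens in a `finally` block so the original limit comes back even if the body raises.

```python
import contextlib
import resource  # POSIX-only standard-library module


@contextlib.contextmanager
def increased_fd_limit_sketch(limit: int):
    """Hypothetical sketch: temporarily raise the soft RLIMIT_NOFILE to `limit`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        if limit > soft:
            # Raises ValueError if `limit` exceeds the hard limit (unless privileged).
            resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
        yield
    finally:
        # Restore the original soft limit even if the body raised.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

Resource limits are inherited by child processes, so a STAR subprocess launched inside the `with` block would see the raised limit.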

dynast.utils.get_available_memory() → int

Get the total amount of available memory (total memory - used memory) in bytes.

Returns

Available memory in bytes

dynast.utils.make_pool_with_counter(n_threads: int) → Tuple[multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock]

Create a new Process pool with a shared progress counter.

Parameters
n_threads

Number of processes

Returns

Tuple of (Process pool, progress counter, lock)

dynast.utils.display_progress_with_counter(counter: multiprocessing.Value, total: int, *async_results, desc: Optional[str] = None)

Display a progress bar for multiprocessing progress.

Parameters
counter

Progress counter

total

Maximum number of units of processing

*async_results

Multiprocessing results to monitor. These are used to determine when all processes are done.

desc

Progress bar description

dynast.utils.as_completed_with_progress(futures: Iterable[concurrent.futures.Future])

Wrapper around concurrent.futures.as_completed that displays a progress bar.

Parameters
futures

Iterable of concurrent.futures.Future objects
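A minimal sketch of such a wrapper, assuming a plain `done/total` counter written to a stream in place of a real progress bar; `as_completed_with_progress_sketch` is a hypothetical name and this is not dynast's source. The futures are materialized into a list first so the total is known before any complete.

```python
import sys
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Iterable, Iterator, TextIO


def as_completed_with_progress_sketch(
    futures: Iterable[Future], stream: TextIO = sys.stderr
) -> Iterator[Future]:
    futures = list(futures)  # materialize so the total is known up front
    total = len(futures)
    for done, future in enumerate(as_completed(futures), start=1):
        stream.write(f"\r{done}/{total}")
        stream.flush()
        yield future
    stream.write("\n")


with ThreadPoolExecutor(max_workers=2) as executor:
    submitted = [executor.submit(pow, i, 2) for i in range(4)]
    squares = sorted(f.result() for f in as_completed_with_progress_sketch(submitted))
print(squares)  # [0, 1, 4, 9]
```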

dynast.utils.split_index(index: List[Tuple[int, int, int]], n: int = 8) → List[List[Tuple[int, int, int]]]

Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts. This function is used to split the conversions CSV for multiprocessing.

Parameters
index

Conversions index: a list of (file position, number of lines, alignment position) tuples, one per read

n

Number of splits, defaults to 8

Returns

List of parts
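One way to get "approximately equal" parts is to balance them by the number of CSV lines each entry covers, keeping entries whole and in order so each part maps to a contiguous block of the file. The sketch below assumes that interpretation; `split_index_sketch` is a hypothetical name and this is not dynast's source. Rounding the per-part target up guarantees at most `n` parts.

```python
from typing import List, Tuple

IndexEntry = Tuple[int, int, int]  # (file position, number of lines, alignment position)


def split_index_sketch(index: List[IndexEntry], n: int = 8) -> List[List[IndexEntry]]:
    total_lines = sum(n_lines for _, n_lines, _ in index)
    # Target lines per part, rounded up so no more than n parts are produced.
    target = max(1, -(-total_lines // n))
    parts: List[List[IndexEntry]] = []
    current: List[IndexEntry] = []
    current_lines = 0
    for entry in index:
        current.append(entry)
        current_lines += entry[1]
        if current_lines >= target:
            parts.append(current)
            current, current_lines = [], 0
    if current:
        parts.append(current)
    return parts


example_index = [(i * 100, 1, i) for i in range(10)]
print([len(part) for part in split_index_sketch(example_index, n=3)])  # [4, 4, 2]
```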

dynast.utils.downsample_counts(df_counts: pandas.DataFrame, proportion: Optional[float] = None, count: Optional[int] = None, seed: Optional[int] = None, group_by: Optional[List[str]] = None) → pandas.DataFrame

Downsample the given counts dataframe according to the proportion or count arguments. One of these two must be provided, but not both. The dataframe is assumed to be UMI-deduplicated.

Parameters
df_counts

Counts dataframe

proportion

Proportion of reads (UMIs) to keep

count

Absolute number of reads (UMIs) to keep

seed

Random seed

group_by

Columns in the counts dataframe to use to group entries. When this is provided, UMIs are no longer sampled at random, but instead grouped by this argument, and only groups that have more than count UMIs are downsampled.

Returns

Downsampled counts dataframe
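A minimal pandas sketch of the documented behavior, assuming each row of the UMI-deduplicated dataframe is one UMI, so sampling rows samples UMIs. With `group_by` and `count`, only groups larger than `count` are downsampled; how `proportion` combines with `group_by` is not specified above, so sampling that fraction within each group is an assumption. `downsample_counts_sketch` is a hypothetical name and this is not dynast's source.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def downsample_counts_sketch(
    df_counts: pd.DataFrame,
    proportion: Optional[float] = None,
    count: Optional[int] = None,
    seed: Optional[int] = None,
    group_by: Optional[List[str]] = None,
) -> pd.DataFrame:
    if (proportion is None) == (count is None):
        raise ValueError("Provide exactly one of `proportion` or `count`.")
    rng = np.random.default_rng(seed)
    if group_by is None:
        # Exactly one of n/frac is not None here, as DataFrame.sample requires.
        return df_counts.sample(n=count, frac=proportion, random_state=rng)
    parts = []
    for _, group in df_counts.groupby(group_by, sort=False):
        if proportion is not None:
            # Assumption: sample the same fraction within every group.
            parts.append(group.sample(frac=proportion, random_state=rng))
        elif len(group) > count:
            # Only groups with more than `count` UMIs are downsampled.
            parts.append(group.sample(n=count, random_state=rng))
        else:
            parts.append(group)
    return pd.concat(parts) if parts else df_counts.copy()
```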

dynast.utils.counts_to_matrix(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX') → scipy.sparse.csr_matrix

Convert a counts dataframe to a sparse counts matrix.

Counts are assumed to be appropriately deduplicated.

Parameters
df_counts

Counts dataframe

barcodes

List of barcodes that will map to the rows

features

List of features (i.e. genes) that will map to the columns

barcode_column

Column in counts dataframe to use as barcodes, defaults to 'barcode'

feature_column

Column in counts dataframe to use as features, defaults to 'GX'

Returns

Sparse counts matrix
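A minimal sketch of the conversion, assuming each dataframe row is one deduplicated UMI, so the matrix entry for (barcode, feature) is the number of matching rows. Building a `csr_matrix` from `(data, (row, col))` sums duplicate coordinates, which does the counting. Rows whose barcode or feature is not in the supplied lists are dropped, which is an assumption; `counts_to_matrix_sketch` is a hypothetical name and this is not dynast's source.

```python
from typing import List

import numpy as np
import pandas as pd
from scipy import sparse


def counts_to_matrix_sketch(
    df_counts: pd.DataFrame,
    barcodes: List[str],
    features: List[str],
    barcode_column: str = "barcode",
    feature_column: str = "GX",
) -> sparse.csr_matrix:
    barcode_index = {barcode: i for i, barcode in enumerate(barcodes)}
    feature_index = {feature: j for j, feature in enumerate(features)}
    rows = df_counts[barcode_column].map(barcode_index)
    cols = df_counts[feature_column].map(feature_index)
    # Assumption: rows whose barcode or feature is not listed are ignored.
    keep = (rows.notna() & cols.notna()).to_numpy()
    rows = rows.to_numpy()[keep].astype(np.int64)
    cols = cols.to_numpy()[keep].astype(np.int64)
    # One UMI per row; duplicate (row, col) pairs are summed into counts.
    data = np.ones(len(rows), dtype=np.int64)
    return sparse.csr_matrix((data, (rows, cols)), shape=(len(barcodes), len(features)))
```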

dynast.utils.split_counts(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX', conversions: FrozenSet[str] = frozenset({'TC'})) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]

Split counts dataframe into two count matrices by a column.

Parameters
df_counts

Counts dataframe

barcodes

List of barcodes that will map to the rows

features

List of features (i.e. genes) that will map to the columns

barcode_column

Column in counts dataframe to use as barcodes

feature_column

Column in counts dataframe to use as features

conversions

Conversion(s) in question

Returns

Tuple of (count matrix of entries whose conversion count is 0, count matrix of entries whose conversion count is greater than 0)
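A minimal sketch of the split, assuming the counts dataframe has one integer column per conversion (for example `TC`) holding each read's number of that conversion, and that the counts for all requested conversions are summed. Rows with a total of 0 go to the first matrix and the rest to the second. `pandas.Categorical` codes give each barcode or feature its position in the supplied list (`-1` if absent). The function names are hypothetical and this is not dynast's source.

```python
from typing import FrozenSet, List

import numpy as np
import pandas as pd
from scipy import sparse


def _rows_to_csr(df, barcodes, features, barcode_column, feature_column):
    # Category codes are list positions; -1 marks values not in the list.
    rows = pd.Categorical(df[barcode_column], categories=barcodes).codes.astype(np.int64)
    cols = pd.Categorical(df[feature_column], categories=features).codes.astype(np.int64)
    keep = (rows >= 0) & (cols >= 0)
    data = np.ones(int(keep.sum()), dtype=np.int64)
    return sparse.csr_matrix((data, (rows[keep], cols[keep])), shape=(len(barcodes), len(features)))


def split_counts_sketch(
    df_counts: pd.DataFrame,
    barcodes: List[str],
    features: List[str],
    barcode_column: str = "barcode",
    feature_column: str = "GX",
    conversions: FrozenSet[str] = frozenset({"TC"}),
):
    # Assumption: one integer column per conversion holds per-read conversion counts.
    n_conversions = df_counts[sorted(conversions)].sum(axis=1)
    unlabeled = _rows_to_csr(df_counts[n_conversions == 0], barcodes, features, barcode_column, feature_column)
    labeled = _rows_to_csr(df_counts[n_conversions > 0], barcodes, features, barcode_column, feature_column)
    return unlabeled, labeled
```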

dynast.utils.split_matrix_pi(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], pis: Dict[Tuple[str, str], float], barcodes: List[str], features: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]

Split the given matrix based on provided fraction of new RNA.

Parameters
matrix

Matrix to split

pis

Dictionary containing pi estimates

barcodes

All barcodes

features

All features (i.e. genes)

Returns

Tuple of (matrix of pis, matrix of unlabeled RNA, matrix of labeled RNA)
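Since pi is the fraction of new RNA, one natural split is labeled = pi × total and unlabeled = total − labeled, elementwise. The sketch below assumes that arithmetic, that the `pis` keys are `(barcode, feature)` pairs, and that entries without an estimate are treated as pi = 0 (all counts stay unlabeled). `split_matrix_pi_sketch` is a hypothetical name and this is not dynast's source.

```python
from typing import Dict, List, Tuple

import numpy as np
from scipy import sparse


def split_matrix_pi_sketch(
    matrix,
    pis: Dict[Tuple[str, str], float],
    barcodes: List[str],
    features: List[str],
):
    barcode_index = {b: i for i, b in enumerate(barcodes)}
    feature_index = {f: j for j, f in enumerate(features)}
    rows, cols, values = [], [], []
    for (barcode, feature), pi in pis.items():
        if barcode in barcode_index and feature in feature_index:
            rows.append(barcode_index[barcode])
            cols.append(feature_index[feature])
            values.append(pi)
    pi_matrix = sparse.csr_matrix((values, (rows, cols)), shape=(len(barcodes), len(features)))
    total = sparse.csr_matrix(matrix, dtype=float)
    # Labeled (new) RNA = pi * total; missing estimates act as pi = 0 (assumption).
    labeled = sparse.csr_matrix(total.multiply(pi_matrix))
    unlabeled = total - labeled
    return pi_matrix, unlabeled, labeled
```

By construction, the unlabeled and labeled matrices sum back to the input matrix.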

dynast.utils.split_matrix_alpha(unlabeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], labeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], alphas: Dict[str, float], barcodes: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]

Split the given matrix based on provided fraction of new RNA.

Parameters
unlabeled_matrix

Unlabeled matrix

labeled_matrix

Labeled matrix

alphas

Dictionary containing alpha estimates

barcodes

All barcodes

Returns

matrix of pis, matrix of unlabeled RNA, matrix of labeled RNA

dynast.utils.results_to_adata(df_counts: pandas.DataFrame, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), gene_infos: Optional[dict] = None, pis: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[Tuple[str, str], float]]]] = None, alphas: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[str, float]]]] = None) → anndata.AnnData

Compile all results into a single AnnData object.

Parameters
df_counts

Counts dataframe, with complemented reverse strand bases

conversions

Conversion(s) in question

gene_infos

Dictionary containing gene information. If this is not provided, the function assumes gene names are already in the counts dataframe.

pis

Dictionary of estimated pis

alphas

Dictionary of estimated alphas

Returns

AnnData object containing all results

dynast.utils.patch_mp_connection_bpo_17560()

Apply the PR-10305 / bpo-17560 connection send/receive max size update.

See the original issue at https://bugs.python.org/issue17560 and https://github.com/python/cpython/pull/10305 for the pull request.

This only supports Python versions 3.3 to 3.7; for Python versions outside that range, this function does nothing.

Taken from https://stackoverflow.com/a/47776649

dynast.utils.dict_to_matrix(d: Dict[Tuple[str, str], float], rows: List[str], columns: List[str]) → scipy.sparse.csr_matrix

Convert a dictionary to a matrix.

Parameters
d

Dictionary to convert

rows

Row names

columns

Column names

Returns

A sparse matrix
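A minimal sketch of the conversion, assuming each key is a `(row name, column name)` pair and that keys naming an unlisted row or column are skipped; `dict_to_matrix_sketch` is a hypothetical name and this is not dynast's source.

```python
from typing import Dict, List, Tuple

from scipy import sparse


def dict_to_matrix_sketch(
    d: Dict[Tuple[str, str], float], rows: List[str], columns: List[str]
) -> sparse.csr_matrix:
    row_index = {name: i for i, name in enumerate(rows)}
    column_index = {name: j for j, name in enumerate(columns)}
    # Assumption: keys whose row or column name is not listed are skipped.
    entries = [
        (row_index[r], column_index[c], value)
        for (r, c), value in d.items()
        if r in row_index and c in column_index
    ]
    i, j, values = zip(*entries) if entries else ((), (), ())
    return sparse.csr_matrix((values, (i, j)), shape=(len(rows), len(columns)))
```

Because the result is sparse, a pair missing from `d` is stored as an implicit zero and is indistinguishable from an explicit value of 0.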