`dynast.utils`

Module Contents

Classes

suppress_stdout_stderr

A context manager for doing a "deep suppression" of stdout and stderr in Python.

Functions

`get_STAR_binary_path`() → str	Get the path to the platform-dependent STAR binary included with the installation.
`get_STAR_version`() → str	Get the provided STAR version.
`combine_arguments`(args: Dict[str, Any], additional: Dict[str, Any]) → Dict[str, Any]	Combine two dictionaries representing command-line arguments.
`arguments_to_list`(args: Dict[str, Any]) → List[Any]	Convert a dictionary of command-line arguments to a list.
`get_file_descriptor_limit`() → int	Get the current value for the maximum number of open file descriptors in a platform-dependent way.
`get_max_file_descriptor_limit`() → int	Get the maximum allowed value for the maximum number of open file descriptors.
`increase_file_descriptor_limit`(limit: int)	Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process.
`get_available_memory`() → int	Get total amount of available memory (total memory - used memory) in bytes.
`make_pool_with_counter`(n_threads: int) → Tuple[multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock]	Create a new Process pool with a shared progress counter.
`display_progress_with_counter`(counter: multiprocessing.Value, total: int, *async_results, desc: Optional[str] = None)	Display a progress bar that tracks multiprocessing progress.
`as_completed_with_progress`(futures: Iterable[concurrent.futures.Future])	Wrapper around concurrent.futures.as_completed that displays a progress bar.
`split_index`(index: List[Tuple[int, int, int]], n: int = 8) → List[List[Tuple[int, int, int]]]	Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts.
`downsample_counts`(df_counts: pandas.DataFrame, proportion: Optional[float] = None, count: Optional[int] = None, seed: Optional[int] = None, group_by: Optional[List[str]] = None) → pandas.DataFrame	Downsample the given counts dataframe according to the `proportion` or `count` arguments.
`counts_to_matrix`(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX') → scipy.sparse.csr_matrix	Convert a counts dataframe to a sparse counts matrix.
`split_counts`(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX', conversions: FrozenSet[str] = frozenset({'TC'})) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]	Split counts dataframe into two count matrices by a column.
`split_matrix_pi`(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], pis: Dict[Tuple[str, str], float], barcodes: List[str], features: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]	Split the given matrix based on provided fraction of new RNA.
`split_matrix_alpha`(unlabeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], labeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], alphas: Dict[str, float], barcodes: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix]	Split the given matrix based on provided fraction of new RNA.
`results_to_adata`(df_counts: pandas.DataFrame, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), gene_infos: Optional[dict] = None, pis: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[Tuple[str, str], float]]]] = None, alphas: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[str, float]]]] = None) → anndata.AnnData	Compile all results to a single anndata.
`patch_mp_connection_bpo_17560`()	Apply PR-10305 / bpo-17560 connection send/receive max size update
`dict_to_matrix`(d: Dict[Tuple[str, str], float], rows: List[str], columns: List[str]) → scipy.sparse.csr_matrix	Convert a dictionary to a matrix.

Attributes

`run_executable`
`open_as_text`
`decompress_gzip`
`flatten_dict_values`
`mkstemp`
`all_exists`
`flatten_dictionary`
`flatten_iter`
`merge_dictionaries`
`write_pickle`
`read_pickle`

dynast.utils.run_executable[source]

dynast.utils.open_as_text[source]

dynast.utils.decompress_gzip[source]

dynast.utils.flatten_dict_values[source]

dynast.utils.mkstemp[source]

dynast.utils.all_exists[source]

dynast.utils.flatten_dictionary[source]

dynast.utils.flatten_iter[source]

dynast.utils.merge_dictionaries[source]

dynast.utils.write_pickle[source]

dynast.utils.read_pickle[source]

exception dynast.utils.UnsupportedOSException[source]

Bases: Exception

Common base class for all non-exit exceptions.

class dynast.utils.suppress_stdout_stderr[source]

A context manager for doing a “deep suppression” of stdout and stderr in Python, i.e. will suppress all print, even if the print originates in a compiled C/Fortran sub-function.

This will not suppress raised exceptions, since exceptions are printed to stderr just before a script exits, and after the context manager has exited (at least, I think that is why it lets exceptions through). https://github.com/facebook/prophet/issues/223

__enter__(self)[source]

__exit__(self, *_)[source]
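The "deep suppression" described above is commonly done by redirecting the process-level file descriptors 1 and 2 to the null device, which also silences output written by compiled extensions. The following is a minimal sketch of that general technique, not the code shipped in dynast; the class name `SuppressOutputSketch` is hypothetical.

```python
import os
import sys


class SuppressOutputSketch:
    """Hypothetical sketch: redirect file descriptors 1 and 2 to os.devnull."""

    def __enter__(self):
        # Flush Python-level buffers so earlier output is not lost.
        sys.stdout.flush()
        sys.stderr.flush()
        # Open one handle on the null device per stream.
        self._null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        # Keep duplicates of the real descriptors so they can be restored.
        self._saved_fds = [os.dup(1), os.dup(2)]
        # Point descriptors 1 (stdout) and 2 (stderr) at the null device.
        os.dup2(self._null_fds[0], 1)
        os.dup2(self._null_fds[1], 2)
        return self

    def __exit__(self, *_):
        # Flush anything buffered inside the block while it is still suppressed.
        sys.stdout.flush()
        sys.stderr.flush()
        # Restore the original descriptors, then close every temporary one.
        os.dup2(self._saved_fds[0], 1)
        os.dup2(self._saved_fds[1], 2)
        for fd in self._null_fds + self._saved_fds:
            os.close(fd)
```

Because the redirection happens at the descriptor level rather than by replacing `sys.stdout`, writes such as `os.write(1, b"...")` inside the `with` block are discarded as well.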

dynast.utils.get_STAR_binary_path() → str[source]

Get the path to the platform-dependent STAR binary included with the installation.

Returns: Path to the binary

dynast.utils.get_STAR_version() → str[source]

Get the provided STAR version.

Returns: Version string

dynast.utils.combine_arguments(args: Dict[str, Any], additional: Dict[str, Any]) → Dict[str, Any][source]

Combine two dictionaries representing command-line arguments.

Any duplicate keys will be merged according to the following procedure:

1. If the values in both dictionaries are lists, the two lists are combined.
2. Otherwise, the value in the first dictionary is OVERWRITTEN.

Parameters

args: Original command-line arguments
additional: Additional command-line arguments

Returns

Combined command-line arguments
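The merge procedure described for duplicate keys can be sketched in plain Python as follows. This is an illustration written from the description above, not dynast's own source; the function name `combine_arguments_sketch` and the argument keys in the usage example are hypothetical.

```python
from typing import Any, Dict


def combine_arguments_sketch(
    args: Dict[str, Any], additional: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical sketch of the documented merge rules."""
    combined = dict(args)  # shallow copy so the caller's dictionary is untouched
    for key, value in additional.items():
        both_lists = isinstance(combined.get(key), list) and isinstance(value, list)
        if key in combined and both_lists:
            # Rule 1: two lists are combined.
            combined[key] = combined[key] + value
        else:
            # Rule 2: otherwise the value from `additional` wins.
            combined[key] = value
    return combined


merged = combine_arguments_sketch(
    {"--list-option": ["a"], "--threads": 4},
    {"--list-option": ["b"], "--threads": 8, "--extra": True},
)
```

Here `merged["--list-option"]` holds both list entries, while `merged["--threads"]` takes the value from the second dictionary.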

dynast.utils.arguments_to_list(args: Dict[str, Any]) → List[Any][source]

Convert a dictionary of command-line arguments to a list.

Parameters

args: Command-line arguments

Returns

List of command-line arguments

dynast.utils.get_file_descriptor_limit() → int[source]

Get the current value for the maximum number of open file descriptors in a platform-dependent way.

Returns: The current value of the maximum number of open file descriptors.

dynast.utils.get_max_file_descriptor_limit() → int[source]

Get the maximum allowed value for the maximum number of open file descriptors.

Note that for Windows, there is not an easy way to get this, as it requires reading from the registry. So, we just return the maximum for a vanilla Windows installation, which is 8192. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=vs-2019

Similarly, on macOS, we return a hardcoded 10240.

Returns: Maximum allowed value for the maximum number of open file descriptors

dynast.utils.increase_file_descriptor_limit(limit: int)[source]

Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process. The original value is restored when execution exits the context.

This is required when running STAR with many threads.

Parameters

limit: Maximum number of open file descriptors will be increased to this value for the duration of the context
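On Unix-like systems, a temporary change of the descriptor limit can be sketched with the standard-library `resource` module. This is an assumption about one way to do it, not a copy of dynast's implementation (which is described as platform-dependent); the name `fd_limit_sketch` is hypothetical, and the sketch does not cover Windows.

```python
import contextlib
import resource  # standard library, Unix only


@contextlib.contextmanager
def fd_limit_sketch(limit: int):
    """Hypothetical sketch: set the soft RLIMIT_NOFILE for the block's duration."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit, so clamp the request
    # unless the hard limit is unlimited.
    target = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    try:
        yield
    finally:
        # Always restore the original soft limit, even if the block raised.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

The `try`/`finally` pair is what guarantees that the original value comes back when execution leaves the `with` block.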

dynast.utils.get_available_memory() → int[source]

Get total amount of available memory (total memory - used memory) in bytes.

Returns: Available memory in bytes

dynast.utils.make_pool_with_counter(n_threads: int) → Tuple[multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock][source]

Create a new Process pool with a shared progress counter.

Parameters

n_threads: Number of processes

Returns

Tuple of (Process pool, progress counter, lock)

dynast.utils.display_progress_with_counter(counter: multiprocessing.Value, total: int, *async_results, desc: Optional[str] = None)[source]

Display a progress bar that tracks multiprocessing progress.

Parameters

counter: Progress counter
total: Maximum number of units of processing
*async_results: Multiprocessing results to monitor. These are used to determine when all processes are done.
desc: Progress bar description

dynast.utils.as_completed_with_progress(futures: Iterable[concurrent.futures.Future])[source]

Wrapper around concurrent.futures.as_completed that displays a progress bar.

Parameters

futures: Iterable of concurrent.futures.Future
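A wrapper of this kind can be sketched as a generator around `concurrent.futures.as_completed`. The version below prints a simple counter to stderr instead of drawing a real progress bar, so it is only an approximation of the documented behavior; the name `as_completed_with_progress_sketch` is hypothetical.

```python
import concurrent.futures
import sys
from typing import Iterable, Iterator


def as_completed_with_progress_sketch(
    futures: Iterable[concurrent.futures.Future],
) -> Iterator[concurrent.futures.Future]:
    """Hypothetical sketch: yield futures as they finish, reporting progress."""
    futures = list(futures)  # materialize so the total is known up front
    total = len(futures)
    completed = concurrent.futures.as_completed(futures)
    for done, future in enumerate(completed, start=1):
        # Overwrite the same stderr line with the running count.
        print(f"\r{done}/{total} done", end="", file=sys.stderr, flush=True)
        yield future
    print(file=sys.stderr)  # finish the progress line


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    submitted = [executor.submit(pow, 2, n) for n in range(5)]
    results = sorted(
        f.result() for f in as_completed_with_progress_sketch(submitted)
    )
```

Futures are yielded in completion order, which is why the usage example sorts the results before using them.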

dynast.utils.split_index(index: List[Tuple[int, int, int]], n: int = 8) → List[List[Tuple[int, int, int]]][source]

Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts. This function is used to split the conversions CSV for multiprocessing.

Parameters

index: Conversions index
n: Number of splits, defaults to 8

Returns

List of parts
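One simple way to obtain n approximately equal parts is to split by number of entries, as sketched below. Whether dynast balances parts by entry count or by some other measure (for example the number of lines per read) is not stated here, so treat this as an assumption; the name `split_index_sketch` is hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, int, int]  # (file position, number of lines, alignment position)


def split_index_sketch(index: List[Entry], n: int = 8) -> List[List[Entry]]:
    """Hypothetical sketch: split `index` into n contiguous, near-equal parts."""
    base, extra = divmod(len(index), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts receive one additional entry each.
        size = base + (1 if i < extra else 0)
        parts.append(index[start:start + size])
        start += size
    return parts


example_index = [(i * 100, 1, i) for i in range(10)]
example_parts = split_index_sketch(example_index, n=3)
```

With ten entries and three parts, the parts contain four, three, and three entries, and concatenating them reproduces the original index.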

dynast.utils.downsample_counts(df_counts: pandas.DataFrame, proportion: Optional[float] = None, count: Optional[int] = None, seed: Optional[int] = None, group_by: Optional[List[str]] = None) → pandas.DataFrame[source]

Downsample the given counts dataframe according to the proportion or count arguments. One of these two must be provided, but not both. The dataframe is assumed to be UMI-deduplicated.

Parameters

df_counts: Counts dataframe
proportion: Proportion of reads (UMIs) to keep
count: Absolute number of reads (UMIs) to keep
seed: Random seed
group_by: Columns in the counts dataframe to use to group entries. When this is provided, UMIs are no longer sampled at random, but instead grouped by this argument, and only groups that have more than count UMIs are downsampled.

Returns

Downsampled counts dataframe
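The rule that exactly one of `proportion` and `count` must be given, together with seeded random sampling, can be illustrated without pandas. The sketch below works on a plain list of records rather than a dataframe and ignores `group_by` entirely; it is a stand-in for the idea, not dynast's implementation, and the name `downsample_rows_sketch` is hypothetical.

```python
import random
from typing import Any, Dict, List, Optional


def downsample_rows_sketch(
    rows: List[Dict[str, Any]],
    proportion: Optional[float] = None,
    count: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep a random subset of `rows`."""
    # Exactly one of the two arguments must be provided.
    if (proportion is None) == (count is None):
        raise ValueError("Provide exactly one of `proportion` or `count`.")
    n_keep = count if count is not None else int(len(rows) * proportion)
    n_keep = min(n_keep, len(rows))  # cannot keep more rows than exist
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    kept_positions = sorted(rng.sample(range(len(rows)), n_keep))
    return [rows[i] for i in kept_positions]


example_rows = [{"barcode": f"BC{i}", "GX": "gene"} for i in range(10)]
half = downsample_rows_sketch(example_rows, proportion=0.5, seed=0)
```

On a real dataframe, `pandas.DataFrame.sample` accepts `frac`, `n`, and `random_state` arguments that play the same roles as `proportion`, `count`, and `seed` in this sketch.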

dynast.utils.counts_to_matrix(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX') → scipy.sparse.csr_matrix[source]

Convert a counts dataframe to a sparse counts matrix.

Counts are assumed to be appropriately deduplicated.

Parameters

df_counts: Counts dataframe
barcodes: List of barcodes that will map to the rows
features: List of features (i.e. genes) that will map to the columns
barcode_column: Column in counts dataframe to use as barcodes, defaults to barcode
feature_column: Column in counts dataframe to use as features, defaults to GX

Returns

Sparse counts matrix

dynast.utils.split_counts(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX', conversions: FrozenSet[str] = frozenset({'TC'})) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix][source]

Split counts dataframe into two count matrices by a column.

Parameters

df_counts: Counts dataframe
barcodes: List of barcodes that will map to the rows
features: List of features (i.e. genes) that will map to the columns
barcode_column: Column in counts dataframe to use as barcodes
feature_column: Column in counts dataframe to use as features
conversions: Conversion(s) in question

Returns

count matrix of conversion==0, count matrix of conversion>0

dynast.utils.split_matrix_pi(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], pis: Dict[Tuple[str, str], float], barcodes: List[str], features: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix][source]

Split the given matrix based on provided fraction of new RNA.

Parameters

matrix: Matrix to split
pis: Dictionary containing pi estimates
barcodes: All barcodes
features: All features (i.e. genes)

Returns

matrix of pis, matrix of unlabeled RNA, matrix of labeled RNA

dynast.utils.split_matrix_alpha(unlabeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], labeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], alphas: Dict[str, float], barcodes: List[str]) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix][source]

Split the given matrix based on provided fraction of new RNA.

Parameters

unlabeled_matrix: Unlabeled matrix
labeled_matrix: Labeled matrix
alphas: Dictionary containing alpha estimates
barcodes: All barcodes
features: All features (i.e. genes)

Returns

matrix of pis, matrix of unlabeled RNA, matrix of labeled RNA

dynast.utils.results_to_adata(df_counts: pandas.DataFrame, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), gene_infos: Optional[dict] = None, pis: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[Tuple[str, str], float]]]] = None, alphas: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[str, float]]]] = None) → anndata.AnnData[source]

Compile all results to a single anndata.

Parameters

df_counts: Counts dataframe, with complemented reverse strand bases
conversions: Conversion(s) in question
gene_infos: Dictionary containing gene information. If this is not provided, the function assumes gene names are already in the Counts dataframe.
pis: Dictionary of estimated pis
alphas: Dictionary of estimated alphas

Returns

Anndata containing all results

dynast.utils.patch_mp_connection_bpo_17560()[source]

Apply PR-10305 / bpo-17560 connection send/receive max size update

See the original issue at https://bugs.python.org/issue17560 and https://github.com/python/cpython/pull/10305 for the pull request.

This only supports Python versions 3.3 to 3.7; the function does nothing for Python versions outside of that range.

Taken from https://stackoverflow.com/a/47776649

dynast.utils.dict_to_matrix(d: Dict[Tuple[str, str], float], rows: List[str], columns: List[str]) → scipy.sparse.csr_matrix[source]

Convert a dictionary to a matrix.

Parameters

d: Dictionary to convert
rows: Row names
columns: Column names

Returns

A sparse matrix
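The core of such a conversion is mapping each (row name, column name) key to integer coordinates. The sketch below builds the coordinate triplets in plain Python; it stops short of constructing the sparse matrix so that it has no third-party dependencies. The name `dict_to_coordinates_sketch` is hypothetical, and raising `KeyError` for unknown names is a choice made for this sketch, not documented behavior.

```python
from typing import Dict, List, Tuple


def dict_to_coordinates_sketch(
    d: Dict[Tuple[str, str], float], rows: List[str], columns: List[str]
) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical sketch: turn {(row, column): value} into COO-style triplets."""
    row_position = {name: i for i, name in enumerate(rows)}
    column_position = {name: j for j, name in enumerate(columns)}
    data, row_indices, column_indices = [], [], []
    for (row_name, column_name), value in d.items():
        # A name missing from `rows` or `columns` raises KeyError in this sketch.
        row_indices.append(row_position[row_name])
        column_indices.append(column_position[column_name])
        data.append(value)
    return data, row_indices, column_indices


data, row_indices, column_indices = dict_to_coordinates_sketch(
    {("cell1", "geneA"): 0.5, ("cell2", "geneB"): 0.25},
    rows=["cell1", "cell2"],
    columns=["geneA", "geneB", "geneC"],
)
```

With SciPy installed, these triplets can be passed to `scipy.sparse.csr_matrix((data, (row_indices, column_indices)), shape=(len(rows), len(columns)))` to obtain the sparse matrix.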
