dynast.utils
Module Contents
Classes
- suppress_stdout_stderr: A context manager for doing a "deep suppression" of stdout and stderr in Python.
Functions
- get_STAR_binary_path: Get the path to the platform-dependent STAR binary included with the installation.
- Get the provided STAR version.
- combine_arguments: Combine two dictionaries representing command-line arguments.
- arguments_to_list: Convert a dictionary of command-line arguments to a list.
- get_file_descriptor_limit: Get the current value for the maximum number of open file descriptors in a platform-dependent way.
- get_max_file_descriptor_limit: Get the maximum allowed value for the maximum number of open file descriptors.
- increase_file_descriptor_limit: Context manager that temporarily increases the maximum number of open file descriptors.
- get_available_memory: Get the total amount of available memory (total memory - used memory) in bytes.
- make_pool_with_counter: Create a new process pool with a shared progress counter.
- display_progress_with_counter: Display a progress bar for multiprocessing progress.
- as_completed_with_progress: Wrapper around concurrent.futures.as_completed that displays a progress bar.
- split_index: Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), into approximately equal parts.
- downsample_counts: Downsample the given counts dataframe according to the proportion or count arguments.
- counts_to_matrix: Convert a counts dataframe to a sparse counts matrix.
- split_counts: Split a counts dataframe into two count matrices by a column.
- split_matrix_pi: Split the given matrix based on the provided fraction of new RNA.
- split_matrix_alpha: Split the given matrices based on the provided alpha estimates.
- results_to_adata: Compile all results into a single AnnData.
- patch_mp_connection_bpo_17560: Apply the PR-10305 / bpo-17560 connection send/receive max size update.
- Convert a dictionary to a matrix.
- exception dynast.utils.UnsupportedOSException[source]
Bases: Exception
Common base class for all non-exit exceptions.
- class dynast.utils.suppress_stdout_stderr[source]
A context manager for performing a "deep suppression" of stdout and stderr in Python, i.e. it suppresses all printing, even if the print originates in a compiled C/Fortran sub-function.
This will not suppress raised exceptions, since exceptions are printed to stderr just before a script exits, after the context manager has exited. See https://github.com/facebook/prophet/issues/223
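A common way to implement this kind of deep suppression is to redirect file descriptors 1 and 2 to the null device at the OS level, which silences output from compiled extensions as well as Python-level prints. A sketch of the pattern (not necessarily this package's exact implementation):

```python
import os

class suppress_stdout_stderr:
    """Redirect file descriptors 1 (stdout) and 2 (stderr) to os.devnull."""

    def __enter__(self):
        # Open null devices and save duplicates of the real stdout/stderr fds.
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        self.save_fds = [os.dup(1), os.dup(2)]
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)
        return self

    def __exit__(self, *exc_info):
        # Restore the original file descriptors and close the duplicates.
        os.dup2(self.save_fds[0], 1)
        os.dup2(self.save_fds[1], 2)
        for fd in self.null_fds + self.save_fds:
            os.close(fd)
```

Because the redirection happens at the file-descriptor level rather than by reassigning sys.stdout, output written directly to fd 1 or 2 by C code is also suppressed.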
- dynast.utils.get_STAR_binary_path() str [source]
Get the path to the platform-dependent STAR binary included with the installation.
- Returns
Path to the binary
- dynast.utils.combine_arguments(args: Dict[str, Any], additional: Dict[str, Any]) Dict[str, Any] [source]
Combine two dictionaries representing command-line arguments.
Any duplicate keys are merged according to the following procedure: 1. If the values in both dictionaries are lists, the two lists are combined. 2. Otherwise, the value in the first dictionary is overwritten by the value in the second.
- Parameters
- args
Original command-line arguments
- additional
Additional command-line arguments
- Returns
Combined command-line arguments
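Under the stated merge procedure, the behavior can be sketched as follows (a hypothetical re-implementation, not the package's source):

```python
from typing import Any, Dict

def combine_arguments(args: Dict[str, Any], additional: Dict[str, Any]) -> Dict[str, Any]:
    combined = dict(args)
    for key, value in additional.items():
        if key in combined and isinstance(combined[key], list) and isinstance(value, list):
            # Rule 1: both values are lists, so the two lists are combined.
            combined[key] = combined[key] + value
        else:
            # Rule 2: the value from the first dictionary is overwritten.
            combined[key] = value
    return combined
```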
- dynast.utils.arguments_to_list(args: Dict[str, Any]) List[Any] [source]
Convert a dictionary of command-line arguments to a list.
- Parameters
- args
Command-line arguments
- Returns
List of command-line arguments
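For example, a flattening that expands list values in place after their key might look like this (a sketch under that assumption, not the package's source):

```python
from typing import Any, Dict, List

def arguments_to_list(args: Dict[str, Any]) -> List[Any]:
    flat = []
    for key, value in args.items():
        flat.append(key)
        # List values are expanded so each element becomes its own argument.
        if isinstance(value, list):
            flat.extend(value)
        else:
            flat.append(value)
    return flat
```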
- dynast.utils.get_file_descriptor_limit() int [source]
Get the current value for the maximum number of open file descriptors in a platform-dependent way.
- Returns
The current value of the maximum number of open file descriptors.
- dynast.utils.get_max_file_descriptor_limit() int [source]
Get the maximum allowed value for the maximum number of open file descriptors.
Note that for Windows, there is no easy way to get this value, as it requires reading from the registry, so we simply return the maximum for a vanilla Windows installation, which is 8192. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=vs-2019
Similarly, on macOS, we return a hardcoded 10240.
- Returns
Maximum allowed value for the maximum number of open file descriptors
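On Unix platforms, both the current (soft) and maximum (hard) limits can be read with the stdlib resource module. A Unix-only sketch of the idea (the actual implementation also handles Windows as described above):

```python
import resource

def get_file_descriptor_limit() -> int:
    # The soft limit is the cap currently enforced for this process.
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

def get_max_file_descriptor_limit() -> int:
    # The hard limit is the ceiling the soft limit may be raised to
    # without elevated privileges.
    return resource.getrlimit(resource.RLIMIT_NOFILE)[1]
```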
- dynast.utils.increase_file_descriptor_limit(limit: int)[source]
Context manager that temporarily increases the maximum number of open file descriptors for the current process. The original value is restored when the context exits.
This is required when running STAR with many threads.
- Parameters
- limit
Maximum number of open file descriptors will be increased to this value for the duration of the context
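A minimal sketch of such a context manager on Unix, assuming the requested limit does not exceed the hard limit:

```python
import resource
from contextlib import contextmanager

@contextmanager
def increase_file_descriptor_limit(limit: int):
    # Remember the current limits so they can be restored afterwards.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
        yield
    finally:
        # Restore the original soft limit even if the body raised.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```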
- dynast.utils.get_available_memory() int [source]
Get total amount of available memory (total memory - used memory) in bytes.
- Returns
Available memory in bytes
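On Linux, for instance, this value can be computed from sysconf; the package more likely uses a cross-platform library such as psutil, so treat this as a Linux-only sketch:

```python
import os

def get_available_memory() -> int:
    # Available physical pages times the page size, in bytes (Linux-only).
    return os.sysconf('SC_AVPHYS_PAGES') * os.sysconf('SC_PAGE_SIZE')
```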
- dynast.utils.make_pool_with_counter(n_threads: int) Tuple[multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock] [source]
Create a new Process pool with a shared progress counter.
- Parameters
- n_threads
Number of processes
- Returns
Tuple of (Process pool, progress counter, lock)
- dynast.utils.display_progress_with_counter(counter: multiprocessing.Value, total: int, *async_results, desc: Optional[str] = None)[source]
Display progress bar for displaying multiprocessing progress.
- Parameters
- counter
Progress counter
- total
Maximum number of units of processing
- *async_results
Multiprocessing results to monitor. These are used to determine when all processes are done.
- desc
Progress bar description
- dynast.utils.as_completed_with_progress(futures: Iterable[concurrent.futures.Future])[source]
Wrapper around concurrent.futures.as_completed that displays a progress bar.
- Parameters
- futures
Iterable of concurrent.futures.Future
- dynast.utils.split_index(index: List[Tuple[int, int, int]], n: int = 8) List[List[Tuple[int, int, int]]] [source]
Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts. This function is used to split the conversions CSV for multiprocessing.
- Parameters
- index
Conversions index to split
- n
Number of splits, defaults to 8
- Returns
List of parts
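A simple greedy strategy that balances parts by their total line counts might look like this (a sketch; the actual splitting strategy may differ):

```python
from typing import List, Tuple

def split_index(index: List[Tuple[int, int, int]], n: int = 8) -> List[List[Tuple[int, int, int]]]:
    # Balance parts by the total number of CSV lines each part covers.
    total = sum(num_lines for _, num_lines, _ in index)
    target = total / n
    parts, current, acc = [], [], 0
    for entry in index:
        current.append(entry)
        acc += entry[1]  # entry is (file position, number of lines, alignment position)
        if acc >= target and len(parts) < n - 1:
            parts.append(current)
            current, acc = [], 0
    if current:
        parts.append(current)
    return parts
```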
- dynast.utils.downsample_counts(df_counts: pandas.DataFrame, proportion: Optional[float] = None, count: Optional[int] = None, seed: Optional[int] = None, group_by: Optional[List[str]] = None) pandas.DataFrame [source]
Downsample the given counts dataframe according to the proportion or count arguments. One of these two must be provided, but not both. The dataframe is assumed to be UMI-deduplicated.
- Parameters
- df_counts
Counts dataframe
- proportion
Proportion of reads (UMIs) to keep
- count
Absolute number of reads (UMIs) to keep
- seed
Random seed
- group_by
Columns in the counts dataframe to use to group entries. When this is provided, UMIs are no longer sampled at random; instead, entries are grouped by these columns, and only groups that have more than count UMIs are downsampled.
- Returns
Downsampled counts dataframe
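The sampling logic can be sketched with pandas as follows (a hypothetical re-implementation; exact validation and tie-breaking may differ):

```python
import pandas as pd

def downsample_counts(df_counts, proportion=None, count=None, seed=None, group_by=None):
    # Exactly one of `proportion` or `count` must be given.
    if (proportion is None) == (count is None):
        raise ValueError('Provide exactly one of `proportion` or `count`.')
    if group_by is None:
        if proportion is not None:
            return df_counts.sample(frac=proportion, random_state=seed)
        return df_counts.sample(n=min(count, len(df_counts)), random_state=seed)

    def _sample(group):
        # Only groups with more than `count` UMIs are downsampled.
        if count is not None:
            if len(group) > count:
                return group.sample(n=count, random_state=seed)
            return group
        return group.sample(frac=proportion, random_state=seed)

    return df_counts.groupby(group_by, group_keys=False).apply(_sample)
```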
- dynast.utils.counts_to_matrix(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX') scipy.sparse.csr_matrix [source]
Convert a counts dataframe to a sparse counts matrix.
Counts are assumed to be appropriately deduplicated.
- Parameters
- df_counts
Counts dataframe
- barcodes
List of barcodes that will map to the rows
- features
List of features (i.e. genes) that will map to the columns
- barcode_column
Column in counts dataframe to use as barcodes, defaults to barcode
- feature_column
Column in counts dataframe to use as features, defaults to GX
- Returns
Sparse counts matrix
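Conceptually, each row of the dataframe contributes a single count at its (barcode, feature) position. A sketch with scipy (a hypothetical re-implementation, not the package's source):

```python
import numpy as np
import pandas as pd
from scipy import sparse

def counts_to_matrix(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX'):
    # Map barcodes and features to row/column indices of the output matrix.
    barcode_index = {barcode: i for i, barcode in enumerate(barcodes)}
    feature_index = {feature: i for i, feature in enumerate(features)}
    rows = df_counts[barcode_column].map(barcode_index)
    cols = df_counts[feature_column].map(feature_index)
    data = np.ones(len(df_counts), dtype=np.int32)
    # Duplicate (row, col) pairs are summed by the sparse constructor.
    return sparse.csr_matrix((data, (rows, cols)), shape=(len(barcodes), len(features)))
```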
- dynast.utils.split_counts(df_counts: pandas.DataFrame, barcodes: List[str], features: List[str], barcode_column: str = 'barcode', feature_column: str = 'GX', conversions: FrozenSet[str] = frozenset({'TC'})) Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix] [source]
Split counts dataframe into two count matrices by a column.
- Parameters
- df_counts
Counts dataframe
- barcodes
List of barcodes that will map to the rows
- features
List of features (i.e. genes) that will map to the columns
- barcode_column
Column in counts dataframe to use as barcodes
- feature_column
Column in counts dataframe to use as features
- conversions
Conversion(s) in question
- Returns
count matrix of conversion==0, count matrix of conversion>0
- dynast.utils.split_matrix_pi(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], pis: Dict[Tuple[str, str], float], barcodes: List[str], features: List[str]) Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix] [source]
Split the given matrix based on provided fraction of new RNA.
- Parameters
- matrix
Matrix to split
- pis
Dictionary containing pi estimates
- barcodes
All barcodes
- features
All features (i.e. genes)
- Returns
matrix of pis, matrix of unlabeled RNA, matrix of labeled RNA
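A plausible interpretation of the split is elementwise: labeled = count * pi and unlabeled = count * (1 - pi), with entries lacking a pi estimate left at zero. A dense-numpy sketch of that interpretation (the real function returns sparse csr matrices):

```python
import numpy as np

def split_matrix_pi(matrix, pis, barcodes, features):
    barcode_index = {barcode: i for i, barcode in enumerate(barcodes)}
    feature_index = {feature: i for i, feature in enumerate(features)}
    pi_matrix = np.zeros((len(barcodes), len(features)))
    mask = np.zeros_like(pi_matrix, dtype=bool)
    for (barcode, feature), pi in pis.items():
        i, j = barcode_index[barcode], feature_index[feature]
        pi_matrix[i, j] = pi
        mask[i, j] = True
    dense = np.asarray(matrix.todense()) if hasattr(matrix, 'todense') else np.asarray(matrix)
    # Entries without a pi estimate contribute to neither split matrix.
    labeled = np.where(mask, dense * pi_matrix, 0.0)
    unlabeled = np.where(mask, dense * (1.0 - pi_matrix), 0.0)
    return pi_matrix, unlabeled, labeled
```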
- dynast.utils.split_matrix_alpha(unlabeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], labeled_matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], alphas: Dict[str, float], barcodes: List[str]) Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, scipy.sparse.csr_matrix] [source]
Split the given unlabeled and labeled matrices based on the provided alpha estimates.
- Parameters
- unlabeled_matrix
unlabeled matrix
- labeled_matrix
Labeled matrix
- alphas
Dictionary containing alpha estimates
- barcodes
All barcodes
- Returns
matrix of alphas, matrix of unlabeled RNA, matrix of labeled RNA
- dynast.utils.results_to_adata(df_counts: pandas.DataFrame, conversions: FrozenSet[FrozenSet[str]] = frozenset({frozenset({'TC'})}), gene_infos: Optional[dict] = None, pis: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[Tuple[str, str], float]]]] = None, alphas: Optional[Dict[str, Dict[Tuple[str, Ellipsis], Dict[str, float]]]] = None) anndata.AnnData [source]
Compile all results to a single anndata.
- Parameters
- df_counts
Counts dataframe, with complemented reverse strand bases
- conversions
Conversion(s) in question
- gene_infos
Dictionary containing gene information. If this is not provided, the function assumes gene names are already in the Counts dataframe.
- pis
Dictionary of estimated pis
- alphas
Dictionary of estimated alphas
- Returns
Anndata containing all results
- dynast.utils.patch_mp_connection_bpo_17560()[source]
Apply PR-10305 / bpo-17560 connection send/receive max size update
See the original issue at https://bugs.python.org/issue17560 and https://github.com/python/cpython/pull/10305 for the pull request.
This patch only applies to Python versions 3.3 to 3.7; the function does nothing for versions outside that range.
Taken from https://stackoverflow.com/a/47776649