Core Functions¶
This module defines core functions used by eco_helper and its submodules.
eco_helper.core.CellTypeCollection¶
This class handles cell type sub-datasets from EcoTyper.
- class eco_helper.core.cell_types.CellTypeCollection(directories: list)¶
Bases: object
This class assembles the cell types from multiple EcoTyper results directories. It will store for each cell type the corresponding data directory from the given EcoTyper results directories. This class is iterable over the cell types identified, and can be indexed by the cell type name.
- Parameters
directories (list or str) – A single EcoTyper results (output) directory, or a list of several, to get cell types from.
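The iteration and indexing behaviour described above can be illustrated with a small stand-in. This is only a sketch and not the eco_helper implementation: the class name CellTypeIndexSketch is hypothetical, and it assumes that each immediate subdirectory of a results directory is named after a cell type.

```python
from pathlib import Path
import tempfile


class CellTypeIndexSketch:
    """Hypothetical stand-in: maps cell type names to result directories."""

    def __init__(self, directories):
        # Accept a single directory or a list of them, as the parameter allows.
        if isinstance(directories, (str, Path)):
            directories = [directories]
        self._by_cell_type = {}
        for directory in directories:
            # Assumption: each immediate subdirectory is named after a cell type.
            for sub in sorted(Path(directory).iterdir()):
                if sub.is_dir():
                    self._by_cell_type.setdefault(sub.name, []).append(sub)

    def __iter__(self):
        # Iterating yields the cell type names.
        return iter(self._by_cell_type)

    def __getitem__(self, cell_type):
        # Indexing by name returns the directories found for that cell type.
        return self._by_cell_type[cell_type]


# Build two fake results directories to demonstrate the behaviour.
root = Path(tempfile.mkdtemp())
layout = {"run1": ["B.cells", "T.cells"], "run2": ["B.cells"]}
for run, names in layout.items():
    for name in names:
        (root / run / name).mkdir(parents=True)

collection = CellTypeIndexSketch([root / "run1", root / "run2"])
cell_types = sorted(collection)
n_b_cell_dirs = len(collection["B.cells"])
single = sorted(CellTypeIndexSketch(str(root / "run2")))
```

Here a cell type found in both runs ends up with two directories, while passing a single path string behaves like a one-element list.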
eco_helper.core.CellStateCollection¶
This class handles Cell State assignments for different EcoTyper runs.
- class eco_helper.core.cell_states.CellStateCollection(directories: list)¶
Bases: CellTypeCollection
This class handles the state assignments between different EcoTyper runs. It will store for each cell type a dataframe with the cell type’s genes and their corresponding state assignments. This class is iterable over the cell types identified, and can be indexed by the cell type name.
- Parameters
directories (list) – List of EcoTyper results (output) directories to get state assignments from.
- compare_gene_overlaps(percent: bool = False) pandas.DataFrame¶
Compares the gene set overlaps between different EcoTyper runs per cell type, across all cell states.
Note
A per-state comparison makes no sense because the state labelling is arbitrary for each clustering and therefore S01 from two different runs need not correspond to the same state.
- Parameters
percent (bool) – If True, compute the overlap as a percentage of the total set of genes per state.
- Returns
df – Dataframe with the gene set overlaps for each cell type between different runs.
- Return type
pd.DataFrame
- export_to_gseapy(directory: str, prerank: bool = False, enrichr: bool = True)¶
Export the gene sets for each cell type and state into separate files in a directory. If get_genes() has not been called yet, this method will call it automatically.
Note
If both prerank and enrichr are set to True, then the output files will be placed in separate subdirectories.
- Parameters
directory (str) – The directory to export the gene sets to.
prerank (bool) – Export the gene names alongside their max. fold change for gseapy prerank (default False).
enrichr (bool) – Export only the gene names as a simple text file for gseapy enrichr (default True).
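The difference between the two export flavours can be sketched as follows. This is an illustration under assumptions, not the code of export_to_gseapy: the helper names, the file names, and the sorting by score are hypothetical, and the gene values are made up for the example.

```python
import os
import tempfile

# Hypothetical gene set for one cell type and state: gene name -> max. fold change.
genes = {"CD19": 2.5, "MS4A1": 1.8, "CD79A": 3.1}


def write_enrichr(path, genes):
    # enrichr-style export: one gene name per line, nothing else.
    with open(path, "w") as handle:
        handle.write("\n".join(genes) + "\n")


def write_prerank(path, genes):
    # prerank-style export: gene name and score, tab-separated.
    # Assumption: rows are sorted by descending score.
    ranked = sorted(genes.items(), key=lambda item: item[1], reverse=True)
    with open(path, "w") as handle:
        for gene, score in ranked:
            handle.write(f"{gene}\t{score}\n")


outdir = tempfile.mkdtemp()
enrichr_file = os.path.join(outdir, "B.cells_S01.txt")
prerank_file = os.path.join(outdir, "B.cells_S01.rnk")
write_enrichr(enrichr_file, genes)
write_prerank(prerank_file, genes)

with open(prerank_file) as handle:
    first_line = handle.readline().rstrip("\n")
with open(enrichr_file) as handle:
    enrichr_genes = handle.read().split()
```

The enrichr file carries only names, whereas the prerank file keeps the fold change so that gseapy can rank the genes.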
- get_genes()¶
Get the gene info with the fold-change data for each cell type and the assigned cell state.
- Returns
genes – A GeneSetCollection with the gene info with the fold-change data for each cell type and the assigned cell state.
- Return type
GeneSetCollection
- save(directory: str)¶
Save the state assignments of each cell type to a directory (one file per cell type).
Note
The export_to_gseapy method allows streamlined export of gene sets destined for subsequent analysis with gseapy prerank or enrichr.
- Parameters
directory (str) – The directory to save the state assignments to.
eco_helper.core.EcoTypeCollection¶
This class handles EcoType assignments between different EcoTyper runs.
- class eco_helper.core.ecotypes.Ecotype(cell_types: Optional[list] = None, states: Optional[list] = None, genes: Optional[list] = None, label: Optional[str] = None)¶
Bases: object
The base class of an Ecotype, holding the cell types and states associated with the Ecotype.
- Parameters
cell_types (list) – List of cell types associated with the Ecotype.
states (list) – List of cell states associated with the Ecotype.
genes (list) – List of pandas dataframes containing the genes associated with each celltype and state.
label (str) – An arbitrary identifier for the Ecotype.
- add(cell_type: str, state: str, genes: Optional[pandas.DataFrame] = None)¶
Add a cell type and state to the Ecotype.
- Parameters
cell_type (str) – The cell type to add.
state (str) – The cell type’s cell state.
genes (pd.DataFrame) – The genes associated with the cell state.
- property cell_types¶
- gene_set_filenames()¶
Assemble a list of gene set filenames (as created by the eco_helper.enrich.collect_gene_sets function) for all celltypes and states contributing to the Ecotype.
- Returns
List of gene set filenames.
- Return type
list
- property genes¶
- remove(cell_type: str, state: Optional[str] = None)¶
Remove a cell type and state from the Ecotype. If no state is given then all states associated with the cell-type are removed.
- property states¶
- to_df()¶
Convert the Ecotype to a pandas DataFrame with two columns, one for cell types and one for their states (as string identifiers/labels).
- to_dict()¶
Convert the Ecotype to a dictionary with cell types and states as keys and their associated genes as values.
- class eco_helper.core.ecotypes.EcotypeCollection(directories: list)¶
Bases: CellStateCollection
This class handles Ecotype assignments between separate EcoTyper runs.
- Parameters
directories (list) – List of EcoTyper results (output) directories to get ecotypes from.
- match_genes_to_states()¶
Get the gene sets associated with each cell type’s cell state. This will replace the simple string description of the cell state with the respective dataframe within the ecotype_assignments dictionary.
eco_helper.core.gene_sets module¶
Classes to handle gene sets.
- class eco_helper.core.gene_sets.BaseOverlap(a: set, b: set)¶
Bases: object
This class handles a single overlap between two sets of genes.
- Parameters
a (set) – The first set of genes.
b (set) – The second set of genes.
- get(percent: bool = False)¶
Get a pandas dataframe of the overlaps between the two sets, either in percentages or in absolute counts (in which case a “total” column is added).
- Parameters
percent (bool) – If True, return the overlap in percentages.
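The two reporting modes can be sketched with plain Python sets. This is a minimal illustration, not the BaseOverlap implementation: the function name overlap_sketch and the result keys are hypothetical, it returns a dictionary rather than a dataframe, and it assumes percentages are taken relative to the union of both sets.

```python
def overlap_sketch(a, b, percent=False):
    """Summarise shared and unique genes of two sets (hypothetical helper)."""
    a, b = set(a), set(b)
    counts = {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }
    total = len(a | b)
    if percent:
        # Assumption: percentages are relative to the union of both sets.
        if total == 0:
            return {key: 0.0 for key in counts}
        return {key: 100 * value / total for key, value in counts.items()}
    # Absolute counts additionally report the total, mirroring the "total" column.
    counts["total"] = total
    return counts


run1 = {"CD19", "MS4A1", "CD79A", "CD3E"}
run2 = {"CD19", "MS4A1", "CD8A"}
absolute = overlap_sketch(run1, run2)
relative = overlap_sketch(run1, run2, percent=True)
```

With these example sets, two genes are shared, two are unique to the first run and one to the second, so the shared fraction is 40 percent of the five genes in the union.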
- class eco_helper.core.gene_sets.GeneSetCollection(gene_sets: Optional[dict] = None)¶
Bases: object
This class handles a collection of gene sets for different cell types. It stores for each cell type a dataframe with the gene sets for each state. This class is iterable over cell types and can be indexed by cell type name and additionally by cell state.
- Parameters
gene_sets (dict) – A dictionary with cell type labels as keys and a dataframe of extracted genes with a “State” column to describe their assigned state.
- property cell_types¶
- items()¶
- keys()¶
- save(file_or_directory: str)¶
Save the gene sets either to a single condensed file or as separate files (one per cell type) into a directory.
- Parameters
file_or_directory (str) – The file or directory to save the gene sets to.
- subsets(cell_type: Optional[str] = None)¶
Return a groupby object for the given cell_type dataframe.
- Parameters
cell_type (str) – The cell type label. If none is provided a generator is returned with a groupby object for each cell type.
- Returns
A groupby object for the given cell_type dataframe, or a generator with a groupby object for each cell type.
- Return type
groupby
- values()¶
- class eco_helper.core.gene_sets.GeneSetOverlap(cell_type: str, state_assignments: pandas.DataFrame)¶
Bases: object
A class to handle the overlap between different cell states and separate EcoTyper runs for a single cell type.
- Parameters
cell_type (str) – The cell type label.
state_assignments (pandas.DataFrame) – A pandas dataframe with genes as index, a “State” column specifying the state to which the gene was assigned, and a “run” column specifying which EcoTyper run the assignment is from.
- compute_overlap(percent: bool = False)¶
Compute the overlap between separate EcoTyper runs for each cell state individually.
- Parameters
percent (bool) – If True, compute the overlap as a percentage of the total set of genes per state.
eco_helper.core.EcoTyperConfig¶
Read an EcoTyper config yaml file.
- class eco_helper.core.ecotyper_config.EcoTyperConfig(filename: str)¶
Bases: object
This class handles the EcoTyper configuration yaml data.
- Parameters
filename (str) – The path to the config file.
- property annotation_columns¶
The annotation columns used for plotting the heatmaps
- property annotation_file¶
The annotation file used
- property cophentic_cutoff¶
The cophenetic cutoff used
- property dataset¶
The dataset name used
- property expression_matrix¶
The expression matrix used
- property output_dir¶
The output directory used
- eco_helper.core.ecotyper_config.read_ecotyper_config(filename: str)¶
Reads the config file for an EcoTyper experiment.
- Parameters
filename (str) – The path to the config file.
- Returns
config – The config file as a dictionary.
- Return type
dict
eco_helper.core.Dataset¶
The class to handle EcoTyper datasets as pairs of annotation-tables and expression-matrices.
- class eco_helper.core.dataset.Dataset(annotation: str, expression: str)¶
Bases: object
This class handles an EcoTyper dataset.
- Parameters
annotation (str) – The filename of the annotation file.
expression (str) – The filename of the expression matrix file.
- read(annotation: Optional[str] = None, expression: Optional[str] = None)¶
Read in a (new) dataset from files.
- Parameters
annotation (str) – The filename of the annotation file.
expression (str) – The filename of the expression matrix file.
- write(annotation: Optional[str] = None, expression: Optional[str] = None)¶
Write the dataset to files.
- Parameters
annotation (str) – The filename of the annotation file.
expression (str) – The filename of the expression matrix file.
- eco_helper.core.dataset.read_anotation(filename: str)¶
Reads in an annotation file and returns a pandas DataFrame.
- Parameters
filename (str) – The filename of the annotation file.
- Returns
annotation – The annotation file as a pandas DataFrame.
- Return type
pandas.DataFrame
- eco_helper.core.dataset.read_expression(filename: str)¶
Reads in an expression matrix file and returns a pandas DataFrame.
- Parameters
filename (str) – The filename of the expression matrix file.
- Returns
expression – The expression matrix file as a pandas DataFrame.
- Return type
pandas.DataFrame
eco_helper.core settings¶
Generic settings for eco_helper
- eco_helper.core.settings.cell_type_col = 'CellType'¶
The data column handling the “cell type” assignment.
- eco_helper.core.settings.ecotype_col = 'Ecotype'¶
The data column handling the ecotype assignment.
- eco_helper.core.settings.ecotyper_experiment_col = 'run'¶
The data column handling the EcoTyper experiment name.
- eco_helper.core.settings.ecotypes_assignment_file = 'ecotype_assignment.txt'¶
The file containing the Ecotypes assignment to samples from an EcoTyper experiment.
- eco_helper.core.settings.ecotypes_composition_file = 'ecotypes.txt'¶
The file containing the composition data for Ecotypes from an EcoTyper experiment.
- eco_helper.core.settings.ecotypes_folder = 'Ecotypes'¶
The folder containing the Ecotypes from an EcoTyper experiment.
- eco_helper.core.settings.enrichr_outdir = 'gseapy_enrichr'¶
The output directory for the gseapy enrichr gene sets.
- eco_helper.core.settings.enrichr_results_suffix = '.enrichr.txt'¶
The suffix for gseapy enrichr results files.
- eco_helper.core.settings.gene_col = 'Gene'¶
The data column handling the gene names or identifiers.
- eco_helper.core.settings.gene_info_file = 'gene_info.txt'¶
The file containing the gene info per celltype, including max fold change and state assignments.
- eco_helper.core.settings.gene_sets_outdir = 'gene_sets'¶
The output directory for extracted gene sets files.
- eco_helper.core.settings.gene_sets_suffix = '.genes.txt'¶
The suffix for a celltype gene sets file.
- eco_helper.core.settings.gseapy_outdir = 'gseapy_results'¶
The output directory for the gseapy results.
- eco_helper.core.settings.prerank_outdir = 'gseapy_prerank'¶
The output directory for the gseapy prerank gene sets.
- eco_helper.core.settings.prerank_results_suffix = '.prerank.txt'¶
The suffix for gseapy prerank results files.
- eco_helper.core.settings.rel_expr_col = 'MaxFC'¶
The data column handling the relative expression.
- eco_helper.core.settings.state_assignments_suffix = '.state_assignment.txt'¶
The suffix for a celltype state assignments file.
- eco_helper.core.settings.state_col = 'State'¶
The data column handling the “state assignment”.
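As an illustration of how the suffix and folder settings might be combined, the sketch below builds and parses file names from them. The constant values are copied from the settings above, but the helper functions and the naming rule (cell type label followed by the suffix) are assumptions made for this example and are not confirmed by the documentation.

```python
import os

# Values copied from the settings listed above.
gene_sets_outdir = "gene_sets"
gene_sets_suffix = ".genes.txt"
state_assignments_suffix = ".state_assignment.txt"


def gene_sets_path(results_dir, cell_type):
    # Assumption: file name = cell type label + suffix, inside the output folder.
    return os.path.join(results_dir, gene_sets_outdir, cell_type + gene_sets_suffix)


def cell_type_from_filename(filename, suffix):
    # Strip the directory and the known suffix to recover the cell type label.
    name = os.path.basename(filename)
    return name[: -len(suffix)] if name.endswith(suffix) else name


path = gene_sets_path("results", "B.cells")
label = cell_type_from_filename(path, gene_sets_suffix)
other = cell_type_from_filename("T.cells.state_assignment.txt", state_assignments_suffix)
```

Keeping the suffixes in one settings module means that code writing the files and code reading them back agree on the naming without repeating string literals.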
eco_helper.core.terminal_funcs module¶
These are core functions of eco_helper that work with the terminal and running subprocesses.
They are mostly wrappers for subprocess.run(...) that capture the output directly, without the need to catch and decode it manually.
- eco_helper.core.terminal_funcs.bash()¶
Get the current bash executable.
- Returns
The path to the bash executable.
- Return type
str
- eco_helper.core.terminal_funcs.from_terminal(cmd: str) TerminalOutput¶
Run a command in the terminal and return the output.
- Parameters
cmd (str) – The command to run.
- Returns
The output of the command, which stores the stdout, stderr, and returncode.
- Return type
TerminalOutput
- eco_helper.core.terminal_funcs.returncode(cmd: str) int¶
Run a command in the terminal and return the returncode.
- Parameters
cmd (str) – The command to run.
- Returns
The returncode of the command.
- Return type
int
- eco_helper.core.terminal_funcs.run(cmd: str)¶
Run a command in the terminal without catching any outputs. Note that this runs with shell=True.
- Parameters
cmd (str) – The command to run.
- eco_helper.core.terminal_funcs.stderr(cmd: str, file: Optional[str] = None) str¶
Run a command in the terminal and return the stderr.
- Parameters
cmd (str) – The command to run.
- Returns
The stderr of the command.
- Return type
str
- eco_helper.core.terminal_funcs.stdout(cmd: str, file: Optional[str] = None) str¶
Run a command in the terminal and return the stdout.
- Parameters
cmd (str) – The command to run.
file (str) – A file to write the stdout to. Note this will overwrite any previously existing file of the same name!
- Returns
The stdout of the command.
- Return type
str
eco_helper.core.TerminalOutput¶
This class handles the stdout and stderr of a subprocess run.
- class eco_helper.core.TerminalOutput.TerminalOutput(process: CompletedProcess)¶
Bases: object
A class to capture the output of a subprocess run.
- Parameters
process (subprocess.CompletedProcess) – The completed process object from which to read the output.
- stdout¶
The stdout of the subprocess run.
- Type
str
- stderr¶
The stderr of the subprocess run.
- Type
str
- returncode¶
The return code of the subprocess run.
- Type
int
- read_output(process: CompletedProcess)¶
Read the output of a subprocess run.
- Parameters
process (subprocess.CompletedProcess) – The completed process object from which to read the output.
- success()¶
Check if the subprocess run was successful.
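The pattern described here, wrapping a subprocess.CompletedProcess and exposing its decoded output, can be sketched with the standard library alone. The names OutputSketch and from_terminal_sketch are hypothetical stand-ins rather than the eco_helper classes, and treating a zero return code as success is an assumption.

```python
import subprocess


class OutputSketch:
    """Hypothetical stand-in that keeps the decoded output of a finished process."""

    def __init__(self, process):
        self.read_output(process)

    def read_output(self, process):
        # With text=True, subprocess already decodes stdout and stderr to str.
        self.stdout = process.stdout
        self.stderr = process.stderr
        self.returncode = process.returncode

    def success(self):
        # Assumption: "successful" means a zero return code.
        return self.returncode == 0


def from_terminal_sketch(cmd):
    # Run the command through the shell and capture both output streams.
    process = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return OutputSketch(process)


result = from_terminal_sketch("echo hello")
```

Because the command string is passed to the shell, it should come only from trusted input.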
eco_helper.core.find module¶
Find data files within the EcoTyper output directories or the EcoTyper internal directories.
- eco_helper.core.find.find_files(parent: str, pattern: str)¶
Find files within a directory using glob.
- Parameters
parent (str) – The path to the parent directory.
pattern (str) – The pattern to use for finding files.
- Returns
files – The files within the directory.
- Return type
list or None
- eco_helper.core.find.find_subdirs(parent: str, pattern: str)¶
Find subdirectories within a directory using glob.
- Parameters
parent (str) – The path to the parent directory.
pattern (str) – The pattern to use for finding subdirectories.
- Returns
subdirs – The subdirectories within the directory.
- Return type
list or None
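A glob-based search of this kind can be sketched as follows. The functions find_files_sketch and find_subdirs_sketch are hypothetical illustrations, not the eco_helper code, and returning None for an empty result is an assumption based on the documented "list or None" return type.

```python
import glob
import os
import tempfile


def find_files_sketch(parent, pattern):
    # Match the pattern inside the parent directory and keep regular files only.
    matches = [p for p in glob.glob(os.path.join(parent, pattern)) if os.path.isfile(p)]
    # Assumption: an empty result is reported as None.
    return sorted(matches) or None


def find_subdirs_sketch(parent, pattern):
    # Same search, but keep directories only.
    matches = [p for p in glob.glob(os.path.join(parent, pattern)) if os.path.isdir(p)]
    return sorted(matches) or None


# Create a small directory tree to search in.
parent = tempfile.mkdtemp()
os.mkdir(os.path.join(parent, "Ecotypes"))
open(os.path.join(parent, "gene_info.txt"), "w").close()

files = find_files_sketch(parent, "*.txt")
subdirs = find_subdirs_sketch(parent, "Eco*")
missing = find_files_sketch(parent, "*.csv")
```

Callers of such a function need to check for None before iterating over the result.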