Filetype Conversion¶
The convert submodule provides the functionality to convert data from one format to another. It is the basis of the eco_helper convert command.
Specifically, the supported conversions are:
| from | to | backwards |
|---|---|---|
| tabular | tabular | yes |
| MTX | tabular | yes |
| RDS (SeuratObject) | tabular | no |
The backwards column indicates whether the conversion is possible in both directions.
The tabular data formats supported are:
| format | separator |
|---|---|
| csv | , |
| tsv | <tab> |
| txt | <space> |
Usage¶
>>> eco_helper convert [--from <from>] [--to <to>] [--output <output>] <input>
where <input> is the input file and <output> is the output file. The --from and --to options are only required if the format is not evident from the file suffix.
They are not case sensitive. If the --output option is not specified, the output file is named after the input file, with the suffix changed to the new format.
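The suffix-based format detection and the default output name described above can be sketched as follows. This is only an illustration under stated assumptions: `resolve_format` and `default_output` are hypothetical helper names, not part of eco_helper, and the actual implementation may differ.

```python
from pathlib import Path


def resolve_format(path, explicit=None):
    """Return the lower-cased format (hypothetical helper).

    The explicit --from/--to value wins; otherwise the file suffix is used.
    Lower-casing mirrors the statement that the options are not case sensitive.
    """
    if explicit is not None:
        return explicit.lower()
    return Path(path).suffix.lstrip(".").lower()


def default_output(input_path, fmt_out):
    """Derive an output name by swapping the input suffix for the new format."""
    return str(Path(input_path).with_suffix("." + fmt_out.lower()))


print(resolve_format("counts.TSV"))          # tsv
print(resolve_format("counts.data", "CSV"))  # csv
print(default_output("counts.tsv", "csv"))   # counts.csv
```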
Full CLI¶
The full command line with all options for eco_helper convert is as follows:
usage: eco_helper convert [-h] [-o OUTPUT] [-r] [--from FMT_IN] [--to FMT_OUT]
                          [-i] [-d DATA] [-m METADATA [METADATA ...]]
                          input

This command converts between different formats. It is able to convert tabular
dataformats (csv,tsv,txt) to and from mtx format. It can also extract data
from a SeuratObject (stored in an RDS file) and convert the data to tabular
formats.

positional arguments:
  input                 Input file.

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output file. By default the same as the input with
                        altered suffix.
  -r, --recursive       Use this to mark the output as a directory rather than
                        a target output file.
  --from FMT_IN         The input format in case it is not evident from the
                        input file suffix.
  --to FMT_OUT          The output format in case it is not evident from the
                        output file suffix.
  -i, --index           Use this to also save the index (rownames) to tabular
                        output files. By default the index will NOT be written
                        to the output files. In case of SeuratObject data this
                        option only applies to metadata tables. The extracted
                        data will **always** have an index.
  -d DATA, --data DATA  [Used only for Seurat-RDS] The data to extract from
                        the SeuratObject. If not specified, by default the
                        'counts' slot will be extracted.
  -m METADATA [METADATA ...], --metadata METADATA [METADATA ...]
                        [Used only for Seurat-RDS] The metadata to extract
                        from the SeuratObject. This may be any number
                        accessible slots or attributes of the SeuratObject. If
                        not specified, by default a 'meta.data' attribute is
                        tried to be extracted.
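To make the option semantics concrete, the help text above can be mirrored with a small `argparse` parser. This is a reconstruction from the help output only, not the parser eco_helper actually ships. The destination names `fmt_in` and `fmt_out` are assumptions, and `reductions` in the sample call is just an illustrative slot name.

```python
import argparse

# Reconstructed from the documented help text; not the actual eco_helper source.
parser = argparse.ArgumentParser(prog="eco_helper convert")
parser.add_argument("input", help="Input file.")
parser.add_argument("-o", "--output", help="Output file.")
parser.add_argument("-r", "--recursive", action="store_true",
                    help="Treat the output as a directory.")
parser.add_argument("--from", dest="fmt_in", metavar="FMT_IN",
                    help="Input format if not evident from the suffix.")
parser.add_argument("--to", dest="fmt_out", metavar="FMT_OUT",
                    help="Output format if not evident from the suffix.")
parser.add_argument("-i", "--index", action="store_true",
                    help="Also write the index (rownames).")
parser.add_argument("-d", "--data", default="counts",
                    help="[Seurat-RDS only] Data slot to extract.")
parser.add_argument("-m", "--metadata", nargs="+", default=["meta.data"],
                    help="[Seurat-RDS only] Metadata slots to extract.")

# Sample invocation: extract the default data plus two metadata entries as TSV.
args = parser.parse_args(
    ["data.rds", "--to", "tsv", "-i", "-m", "meta.data", "reductions"]
)
print(args.input, args.fmt_out, args.index, args.data, args.metadata)
```

Because `-m` accepts one or more values, the sample places the positional `input` first so that it is not consumed as a metadata name.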
eco_helper.convert.funcs module¶
These are the main conversion functions that are used by eco_helper convert.
- eco_helper.convert.funcs.between_tabulars(filename: str, output: str, sep_in: str, sep_out: str, **kwargs)¶
Convert a tabular file from one tabular format to another.
- Parameters
filename (str) – The name of the tabular file.
output (str) – The output filename. The files actually created use this as a base name and attach their own symbols to it.
sep_in (str) – The separator to use for the input file.
sep_out (str) – The separator to use for the output file.
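The idea behind a tabular-to-tabular conversion is simply to re-write the same table with a different separator. The sketch below shows this with the standard-library `csv` module. It is not the eco_helper implementation, which according to this page reads and writes through pandas. `convert_tabular` is a hypothetical name and the sample data is made up.

```python
import csv
import os
import tempfile


def convert_tabular(filename, output, sep_in, sep_out):
    """Re-write a delimited text file using a different separator."""
    with open(filename, newline="") as src, open(output, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=sep_in)
        writer = csv.writer(dst, delimiter=sep_out, lineterminator="\n")
        writer.writerows(reader)


# Create a small CSV file in a temporary directory and convert it to TSV.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "counts.csv")
dst_path = os.path.join(workdir, "counts.tsv")
with open(src_path, "w") as handle:
    handle.write("gene,cell_a,cell_b\nG1,1,0\nG2,0,3\n")

convert_tabular(src_path, dst_path, sep_in=",", sep_out="\t")

with open(dst_path) as handle:
    converted = handle.read()
print(converted)
```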
- eco_helper.convert.funcs.filesuffix(filename)¶
Returns the suffix of the filename and the location of the delimiting dot. The location is -1 if no dot was found (error indication).
- Parameters
filename (str) – The name of the file.
- Returns
str – The suffix of the file, or the full filename if no dot was found to delimit a suffix.
int – The location of the dot in the filename. -1 if no dot was found.
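The documented return contract can be illustrated with a short stand-in. `filesuffix_sketch` is a hypothetical re-implementation written from the description above. It assumes that the last dot delimits the suffix and that the suffix is returned without the dot; the real function may differ on both points.

```python
def filesuffix_sketch(filename):
    """Return (suffix, dot_position), or (filename, -1) if there is no dot."""
    dot = filename.rfind(".")  # assumption: the last dot delimits the suffix
    if dot == -1:
        return filename, -1
    return filename[dot + 1:], dot


print(filesuffix_sketch("counts.tsv"))      # ('tsv', 6)
print(filesuffix_sketch("archive.tar.gz"))  # ('gz', 11)
print(filesuffix_sketch("README"))          # ('README', -1)
```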
- eco_helper.convert.funcs.from_mtx_to_tabular(filename: str, output: str, sep: str, **kwargs)¶
Convert an mtx file to tabular file(s).
- Parameters
filename (str) – The name of the mtx file.
output (str) – The output filename. The files actually created use this as a base name and attach their own symbols to it.
sep (str) – The separator to use.
- eco_helper.convert.funcs.from_seurat_to_tabular(filename: str, output: str, sep: str, which: str, metadata: list, index: str)¶
Convert a Seurat object to tabular file(s) for the extracted data and any additional metadata.
- Parameters
filename (str) – The name of the RDS file containing a SeuratObject.
output (str) – The output filename. The files actually created use this as a base name and attach their own symbols to it.
sep (str) – The separator to use.
which (str) – The data to extract from the SeuratObject. This can be any slot that is accessible via GetAssayData from the SeuratObject.
metadata (list) – The metadata to extract from the SeuratObject. This can be any number of slots or attributes that are accessible from the SeuratObject.
index (bool) – Whether to include the index in the output of metadata files.
- eco_helper.convert.funcs.from_tabular_to_mtx(filename: str, output: str, sep: str, **kwargs)¶
Convert a tabular file to mtx files.
- Parameters
filename (str) – The name of the tabular file.
output (str) – The output filename. The files actually created use this as a base name and attach their own symbols to it.
sep (str) – The separator to use.
eco_helper.convert.tabular module¶
Read and write tabular data formats:
- CSV (Comma Separated Values)
- TSV (Tab Separated Values)
- TXT (Space Separated Values)
- eco_helper.convert.tabular.read(filename: str, sep: str, **kwargs)¶
Read a file into a pandas dataframe.
- Parameters
filename (str) – The name of the file to read.
sep (str) – The separator to use.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.
- eco_helper.convert.tabular.read_csv(filename: str, header: int = 0, index: Optional[int] = None, sep: Optional[str] = None, **kwargs)¶
Read a CSV file into a pandas dataframe.
- Parameters
filename (str) – The name of the file to read.
header (int, optional) – The row number of the header.
sep (str, optional) – The separator to use. By default the separator is inferred based on the first column. If a semicolon is found, then the semicolon is used as the separator. If both commas and semicolons are found, then the comma is used.
index (int, optional) – The column to use as the index.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.
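The separator inference described for `sep` can be read as the following rule. This is a sketch of the documented behaviour only: `infer_separator` is a hypothetical name, the function is assumed to inspect a text sample such as the header line, and falling back to a comma when neither character is present is an assumption.

```python
def infer_separator(sample):
    """Pick ',' or ';' for a CSV file based on a sample of its text."""
    has_comma = "," in sample
    has_semicolon = ";" in sample
    if has_semicolon and not has_comma:
        return ";"  # only semicolons found
    return ","      # commas win when both occur; also the assumed fallback


print(infer_separator("gene;cell_a;cell_b"))  # ;
print(infer_separator("gene,cell_a;cell_b"))  # ,
print(infer_separator("gene,cell_a,cell_b"))  # ,
```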
- eco_helper.convert.tabular.read_tsv(filename: str, header: int = 0, index: Optional[int] = None, **kwargs)¶
Read a TSV file into a pandas dataframe.
- Parameters
filename (str) – The name of the file to read.
header (int, optional) – The row number of the header.
index (int, optional) – The column to use as the index.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.
- eco_helper.convert.tabular.read_txt(filename: str, header: int = 0, index: Optional[int] = None, **kwargs)¶
Read a TXT file into a pandas dataframe.
Note
This assumes the data is separated by single spaces (" ").
- Parameters
filename (str) – The name of the file to read.
header (int, optional) – The row number of the header.
index (int, optional) – The column to use as the index.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.
- eco_helper.convert.tabular.separators = {'csv': ',', 'tsv': '\t', 'txt': ' '}¶
The separators corresponding to supported tabular data formats.
- eco_helper.convert.tabular.supported_formats = ('csv', 'tsv', 'txt')¶
The supported tabular data formats (file suffixes).
- eco_helper.convert.tabular.write(df: pandas.DataFrame, filename: str, sep: str, **kwargs)¶
Write a pandas dataframe to a file.
- Parameters
df (pd.DataFrame) – The dataframe to write.
filename (str) – The name of the file to write.
sep (str) – The separator to use.
**kwargs – Any additional keyword arguments to pass to pd.to_csv.
- eco_helper.convert.tabular.write_csv(filename: str, df: pandas.DataFrame, index: bool = False, sep: str = ',', **kwargs)¶
Write a pandas dataframe to a CSV file.
- Parameters
filename (str) – The name of the file to write.
df (pd.DataFrame) – The dataframe to write.
index (bool, optional) – Whether to write the index.
sep (str, optional) – The separator to use.
**kwargs – Any additional keyword arguments to pass to pd.to_csv.
- eco_helper.convert.tabular.write_tsv(filename: str, df: pandas.DataFrame, index: bool = False, **kwargs)¶
Write a pandas dataframe to a TSV file.
- Parameters
filename (str) – The name of the file to write.
df (pd.DataFrame) – The dataframe to write.
index (bool, optional) – Whether to write the index.
**kwargs – Any additional keyword arguments to pass to pd.to_csv.
- eco_helper.convert.tabular.write_txt(filename: str, df: pandas.DataFrame, index: bool = False, **kwargs)¶
Write a pandas dataframe to a TXT file.
Note
This will write the data separated by single spaces (" ").
- Parameters
filename (str) – The name of the file to write.
df (pd.DataFrame) – The dataframe to write.
index (bool, optional) – Whether to write the index.
**kwargs – Any additional keyword arguments to pass to pd.to_csv.
eco_helper.convert.mtx module¶
Read and write the Matrix Market exchange (MTX) format.
- eco_helper.convert.mtx.add_names(data: pandas.DataFrame, filename: str)¶
Adds column and row names to a dataframe.
- Parameters
data (pandas.DataFrame) – The dataframe to add names to.
filename (str) – The name of the mtx file.
- Returns
The dataframe with names.
- Return type
pandas.DataFrame
- eco_helper.convert.mtx.read(filename: str)¶
Reads an mtx file and returns a pandas dataframe.
- Parameters
filename (str) – The name of the mtx file to read.
- Returns
The dataframe containing the mtx data.
- Return type
pandas.DataFrame
- eco_helper.convert.mtx.write(data: pandas.DataFrame, filename: str)¶
Writes a dataframe to an mtx file.
- Parameters
data (pandas.DataFrame) – The dataframe to write.
filename (str) – The name of the mtx file.
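For orientation, .mtx files commonly follow the Matrix Market coordinate layout: a `%%MatrixMarket` header line, a size line with the number of rows, columns and non-zero entries, and then one 1-based `row column value` triple per non-zero entry. The sketch below round-trips a small dense table through that layout using only the standard library. `to_matrix_market` and `from_matrix_market` are hypothetical names and not the eco_helper functions. Row and column names are not part of this layout, which is presumably why a separate `add_names` step exists.

```python
def to_matrix_market(rows):
    """Serialise a dense list of numeric rows as Matrix Market coordinate text."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    entries = [
        (i + 1, j + 1, float(value))  # Matrix Market indices are 1-based
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value != 0                 # only non-zero entries are stored
    ]
    lines = [
        "%%MatrixMarket matrix coordinate real general",
        f"{n_rows} {n_cols} {len(entries)}",
    ]
    lines += [f"{i} {j} {value}" for i, j, value in entries]
    return "\n".join(lines) + "\n"


def from_matrix_market(text):
    """Parse Matrix Market coordinate text back into a dense list of rows."""
    body = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, _ = (int(x) for x in body[0].split())
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for line in body[1:]:
        i, j, value = line.split()
        dense[int(i) - 1][int(j) - 1] = float(value)
    return dense


counts = [[1, 0], [0, 3]]
text = to_matrix_market(counts)
print(text)
print(from_matrix_market(text))
```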
eco_helper.convert.seurat module¶
Defines wrapper functions that call the _seurat_rds_to_tabular.R script to read a SeuratObject stored in an RDS file and write its data and metadata to tabular files.
Note
While this module allows CLI use of the underlying R script, the R script itself also has a fully implemented CLI and can thus be used stand-alone if desired.
- eco_helper.convert.seurat.default_metadata = ['meta.data']¶
The default metadata to extract from a SeuratObject.
- eco_helper.convert.seurat.supported_formats = ('rds', 'seurat')¶
The supported formats (i.e. file suffixes) to interpret as storing a SeuratObject.
- eco_helper.convert.seurat.to_tabular(filename: str, output: str, sep: str, which: str, metadata: list, index: bool)¶
Convert a Seurat object to tabular file(s) for the extracted data and any additional metadata.
- Parameters
filename (str) – The name of the RDS file containing a SeuratObject.
output (str) – The output filename. The files actually created use this as a base name and attach their own symbols to it.
sep (str) – The separator to use.
which (str) – The data to extract from the SeuratObject. This can be any slot that is accessible via GetAssayData from the SeuratObject.
metadata (list) – The metadata to extract from the SeuratObject. This can be any number of slots or attributes that are accessible from the SeuratObject.
index (bool) – Whether to include the index in the output of metadata files.