Filetype Conversion

The convert submodule provides the functionality to convert data from one format to another. It is the basis of the eco_helper convert command.

The supported conversions are:

from	to	backwards
tabular	tabular	yes
MTX	tabular	yes
RDS (SeuratObject)	tabular	no

The backwards column indicates whether the conversion is also supported in the reverse direction.
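As a sketch, the table can be encoded as directed pairs, each carrying a flag for the reverse direction. This assumes format names are compared case-insensitively; `SUPPORTED` and `is_supported` are hypothetical names, not part of eco_helper:

```python
# Hypothetical encoding of the conversion table; not eco_helper code.
# Key: (from, to); value: whether the reverse direction is also supported.
SUPPORTED = {
    ("tabular", "tabular"): True,
    ("mtx", "tabular"): True,
    ("rds", "tabular"): False,
}


def is_supported(src: str, dst: str) -> bool:
    """Return True if src -> dst is listed directly or as a reversible pair."""
    src, dst = src.lower(), dst.lower()
    if (src, dst) in SUPPORTED:
        return True
    # The pair may be listed the other way round with backwards = yes.
    return SUPPORTED.get((dst, src), False)


print(is_supported("MTX", "tabular"))  # True
print(is_supported("tabular", "mtx"))  # True (backwards: yes)
print(is_supported("tabular", "rds"))  # False (backwards: no)
```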

The tabular data formats supported are:

format	separator
csv	,
tsv	<tab>
txt	<space>

Usage

>>> eco_helper convert [--from <from>] [--to <to>] [--output <output>] <input>

where <input> is the input file and <output> is the output file. The --from and --to options are only required if the format cannot be inferred from the file suffix; their values are not case sensitive. If --output is not specified, the output file name is the input file name with its suffix replaced by that of the target format.
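The suffix rules above can be sketched as follows. `resolve_formats` is a hypothetical helper, not the eco_helper implementation; it relies on os.path.splitext and does not model directory output:

```python
import os
from typing import Optional, Tuple


def resolve_formats(input_file: str, fmt_from: Optional[str] = None,
                    fmt_to: Optional[str] = None,
                    output: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (input format, output format, output file name)."""
    stem, in_ext = os.path.splitext(input_file)
    # --from is only needed when the input suffix does not reveal the format.
    if fmt_from is None:
        if not in_ext:
            raise ValueError("cannot infer input format; pass --from")
        fmt_from = in_ext[1:]
    # --to is only needed when no output suffix reveals the target format.
    if fmt_to is None:
        out_ext = os.path.splitext(output)[1] if output else ""
        if not out_ext:
            raise ValueError("cannot infer output format; pass --to")
        fmt_to = out_ext[1:]
    # Format names are not case sensitive.
    fmt_from, fmt_to = fmt_from.lower(), fmt_to.lower()
    if output is None:
        # Default output: the input name with the target format's suffix.
        output = f"{stem}.{fmt_to}"
    return fmt_from, fmt_to, output


print(resolve_formats("counts.csv", fmt_to="TSV"))
# ('csv', 'tsv', 'counts.tsv')
print(resolve_formats("counts.mtx", output="counts.txt"))
# ('mtx', 'txt', 'counts.txt')
```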

Full CLI

The full command line with all options for eco_helper convert is as follows:

usage: eco_helper convert [-h] [-o OUTPUT] [-r] [--from FMT_IN] [--to FMT_OUT]
                        [-i] [-d DATA] [-m METADATA [METADATA ...]]
                        input

This command converts between different formats. It is able to convert tabular
dataformats (csv,tsv,txt) to and from mtx format. It can also extract data
from a SeuratObject (stored in an RDS file) and convert the data to tabular
formats.

positional arguments:
  input                 Input file.

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output file. By default the same as the input with
                        altered suffix.
  -r, --recursive       Use this to mark the output as a directory rather than
                        a target output file.
  --from FMT_IN         The input format in case it is not evident from the
                        input file suffix.
  --to FMT_OUT          The output format in case it is not evident from the
                        output file suffix.
  -i, --index           Use this to also save the index (rownames) to tabular
                        output files. By default the index will NOT be written
                        to the output files. In case of SeuratObject data this
                        option only applies to metadata tables. The extracted
                        data will **always** have an index.
  -d DATA, --data DATA  [Used only for Seurat-RDS] The data to extract from
                        the SeuratObject. If not specified, by default the
                        'counts' slot will be extracted.
  -m METADATA [METADATA ...], --metadata METADATA [METADATA ...]
                        [Used only for Seurat-RDS] The metadata to extract
                        from the SeuratObject. This may be any number
                        accessible slots or attributes of the SeuratObject. If
                        not specified, by default a 'meta.data' attribute is
                        tried to be extracted.
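The options above can be rebuilt with argparse to see how they parse. This is a sketch, not eco_helper's own parser (which registers convert as a subcommand); the defaults for --data and --metadata are taken from the help text:

```python
import argparse

# Rebuilds the documented options of "eco_helper convert".
parser = argparse.ArgumentParser(prog="eco_helper convert")
parser.add_argument("input", help="Input file.")
parser.add_argument("-o", "--output")
parser.add_argument("-r", "--recursive", action="store_true")
parser.add_argument("--from", dest="fmt_in")  # shown as FMT_IN in usage
parser.add_argument("--to", dest="fmt_out")   # shown as FMT_OUT in usage
parser.add_argument("-i", "--index", action="store_true")
parser.add_argument("-d", "--data", default="counts")
parser.add_argument("-m", "--metadata", nargs="+", default=["meta.data"])

# Input first, then options: -m takes every value that follows it.
args = parser.parse_args(
    ["obj.rds", "--to", "csv", "-m", "meta.data", "reductions"]
)
print(args.fmt_out, args.metadata)  # csv ['meta.data', 'reductions']
```

Since -m accepts any number of values, giving the positional input before -m keeps it from being read as a metadata entry.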

eco_helper.convert.funcs module

These are the main conversion functions that are used by eco_helper convert.

eco_helper.convert.funcs.between_tabulars(filename: str, output: str, sep_in: str, sep_out: str, **kwargs)

Convert a tabular file from one tabular format to another.

Parameters

filename (str) – The name of the tabular file.
output (str) – The name of the output file. The files that are actually created use this as a base name and append their own identifiers to it.
sep_in (str) – The separator to use for the input file.
sep_out (str) – The separator to use for the output file.
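between_tabulars reads with one separator and writes with another. A minimal sketch of that idea with the standard-library csv module, working on strings instead of files; eco_helper's tabular module is built on pandas, and `convert_separator` is a hypothetical name:

```python
import csv
import io


def convert_separator(text: str, sep_in: str, sep_out: str) -> str:
    """Re-delimit tabular text from sep_in to sep_out."""
    rows = csv.reader(io.StringIO(text), delimiter=sep_in)
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" endings.
    csv.writer(out, delimiter=sep_out, lineterminator="\n").writerows(rows)
    return out.getvalue()


csv_text = "gene,cell1,cell2\nA,1,0\nB,0,3\n"
print(convert_separator(csv_text, ",", "\t"), end="")
```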

eco_helper.convert.funcs.filesuffix(filename)

Returns the suffix of the filename and the position of the delimiting dot. The position is -1 if no dot was found, which signals an error.

Parameters

filename (str) – The name of the file.

Returns

str – The suffix of the file, or the full filename if no dot was found to delimit a suffix.
int – The index of the dot in the filename, or -1 if no dot was found.
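The documented contract can be sketched with str.rfind, which already returns -1 when there is no dot, so slicing from `dot + 1` yields the full filename in that case. This assumes the last dot delimits the suffix; `filesuffix_sketch` is a hypothetical name, not the eco_helper function:

```python
def filesuffix_sketch(filename: str):
    """Return (suffix, dot index), following the documented contract."""
    dot = filename.rfind(".")  # -1 if the name contains no dot
    # With dot == -1 the slice starts at 0 and returns the whole name.
    return filename[dot + 1:], dot


print(filesuffix_sketch("counts.csv"))  # ('csv', 6)
print(filesuffix_sketch("counts"))      # ('counts', -1)
```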

eco_helper.convert.funcs.from_mtx_to_tabular(filename: str, output: str, sep: str, **kwargs)

Convert an mtx file to tabular file(s).

Parameters

filename (str) – The name of the mtx file.
output (str) – The name of the output file. The files that are actually created use this as a base name and append their own identifiers to it.
sep (str) – The separator to use.

eco_helper.convert.funcs.from_seurat_to_tabular(filename: str, output: str, sep: str, which: str, metadata: list, index: bool)

Convert a Seurat object to tabular file(s) for the extracted data and any additional metadata.

Parameters

filename (str) – The name of the RDS file containing a SeuratObject.
output (str) – The name of the output file. The files that are actually created use this as a base name and append their own identifiers to it.
sep (str) – The separator to use.
which (str) – The data to extract from the SeuratObject. This can be any slot that is accessible via GetAssayData from the SeuratObject.
metadata (list) – The metadata to extract from the SeuratObject. This can be any number of slots or attributes that are accessible from the SeuratObject.
index (bool) – Whether to include the index in the output of metadata files.

eco_helper.convert.funcs.from_tabular_to_mtx(filename: str, output: str, sep: str, **kwargs)

Convert a tabular file to mtx files.

Parameters

filename (str) – The name of the tabular file.
output (str) – The name of the output file. The files that are actually created use this as a base name and append their own identifiers to it.
sep (str) – The separator to use.

eco_helper.convert.tabular module

Read and write tabular data formats:

CSV (Comma Separated Values)
TSV (Tab Separated Values)
TXT (Space Separated Values)

eco_helper.convert.tabular.read(filename: str, sep: str, **kwargs)

Read a file into a pandas dataframe.

Parameters

filename (str) – The name of the file to read.
sep (str) – The separator to use.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.

eco_helper.convert.tabular.read_csv(filename: str, header: int = 0, index: Optional[int] = None, sep: Optional[str] = None, **kwargs)

Read a CSV file into a pandas dataframe.

Parameters

filename (str) – The name of the file to read.
header (int, optional) – The row number of the header.
sep (str, optional) – The separator to use. By default, the separator is inferred from the first column: if a semicolon is found, the semicolon is used as the separator; if both commas and semicolons are found, the comma is used.
index (int, optional) – The column to use as the index.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.
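The default separator inference can be sketched as below. This assumes the first line is inspected and that the comma is the fallback when no semicolon is present; `infer_separator` and `read_rows` are hypothetical names, and the stdlib csv module stands in for pd.read_csv:

```python
import csv
import io


def infer_separator(first_line: str) -> str:
    """Return ';' only if the line has semicolons but no commas, else ','."""
    if ";" in first_line and "," not in first_line:
        return ";"
    return ","


def read_rows(text: str):
    """Split text into rows using the separator inferred from line one."""
    sep = infer_separator(text.split("\n", 1)[0])
    return list(csv.reader(io.StringIO(text), delimiter=sep))


# Semicolon-separated files often come from locales with a decimal comma.
print(read_rows("gene;value\nA;1,5\n"))  # [['gene', 'value'], ['A', '1,5']]
```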

eco_helper.convert.tabular.read_tsv(filename: str, header: int = 0, index: Optional[int] = None, **kwargs)

Read a TSV file into a pandas dataframe.

Parameters

filename (str) – The name of the file to read.
header (int, optional) – The row number of the header.
index (int, optional) – The column to use as the index.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.

eco_helper.convert.tabular.read_txt(filename: str, header: int = 0, index: Optional[int] = None, **kwargs)

Read a TXT file into a pandas dataframe.

Note

This assumes the data is space (" ") separated.

Parameters

filename (str) – The name of the file to read.
header (int, optional) – The row number of the header.
index (int, optional) – The column to use as the index.
**kwargs – Any additional keyword arguments to pass to pd.read_csv.

eco_helper.convert.tabular.separators = {'csv': ',', 'tsv': '\t', 'txt': ' '}

The separators corresponding to the supported tabular data formats.
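A sketch of choosing the separator from this mapping, together with the -i/--index behavior of writing row names as an extra first column. It uses the stdlib csv module instead of pandas; `write_table` is a hypothetical name:

```python
import csv
import io
from typing import Optional, Sequence

# Same mapping as the documented eco_helper.convert.tabular.separators.
separators = {"csv": ",", "tsv": "\t", "txt": " "}


def write_table(header: Sequence[str], rows: Sequence[Sequence],
                fmt: str, index: Optional[Sequence[str]] = None) -> str:
    """Render rows as csv/tsv/txt text, optionally with row names first."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=separators[fmt.lower()],
                        lineterminator="\n")
    if index is None:
        writer.writerow(header)
        writer.writerows(rows)
    else:
        # Empty corner cell above the row names, as pandas writes for an
        # unnamed index with to_csv(index=True).
        writer.writerow([""] + list(header))
        for name, row in zip(index, rows):
            writer.writerow([name] + list(row))
    return out.getvalue()


print(write_table(["cell1", "cell2"], [[1, 0], [0, 3]], "TSV",
                  index=["A", "B"]), end="")
```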

eco_helper.convert.tabular.supported_formats = ('csv', 'tsv', 'txt')

The supported tabular data formats (file suffixes).

eco_helper.convert.tabular.write(df: pandas.DataFrame, filename: str, sep: str, **kwargs)

Write a pandas dataframe to a file.

Parameters

df (pd.DataFrame) – The dataframe to write.
filename (str) – The name of the file to write.
sep (str) – The separator to use.
**kwargs – Any additional keyword arguments to pass to pd.DataFrame.to_csv.

eco_helper.convert.tabular.write_csv(filename: str, df: pandas.DataFrame, index: bool = False, sep: str = ',', **kwargs)

Write a pandas dataframe to a CSV file.

Parameters

filename (str) – The name of the file to write.
df (pd.DataFrame) – The dataframe to write.
index (bool, optional) – Whether to write the index.
sep (str, optional) – The separator to use.
**kwargs – Any additional keyword arguments to pass to pd.DataFrame.to_csv.

eco_helper.convert.tabular.write_tsv(filename: str, df: pandas.DataFrame, index: bool = False, **kwargs)

Write a pandas dataframe to a TSV file.

Parameters

filename (str) – The name of the file to write.
df (pd.DataFrame) – The dataframe to write.
index (bool, optional) – Whether to write the index.
**kwargs – Any additional keyword arguments to pass to pd.DataFrame.to_csv.

eco_helper.convert.tabular.write_txt(filename: str, df: pandas.DataFrame, index: bool = False, **kwargs)

Write a pandas dataframe to a TXT file.

Note

This will write the data space (" ") separated.

Parameters

filename (str) – The name of the file to write.
df (pd.DataFrame) – The dataframe to write.
index (bool, optional) – Whether to write the index.
**kwargs – Any additional keyword arguments to pass to pd.DataFrame.to_csv.
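Space-separated output needs care when values contain spaces. pandas' DataFrame.to_csv quotes with csv.QUOTE_MINIMAL by default, the same rule the standard-library writer below applies, so such values end up quoted. This assumes write_txt does not override the quoting:

```python
import csv
import io

# With a space as delimiter, a value containing a space must be quoted,
# otherwise it would be split into two columns on reading.
out = io.StringIO()
writer = csv.writer(out, delimiter=" ", lineterminator="\n")
writer.writerow(["cell_type", "count"])
writer.writerow(["T cell", 12])
text = out.getvalue()
print(text, end="")
# cell_type count
# "T cell" 12

# Reading with the same delimiter and quote character restores the fields.
rows = list(csv.reader(io.StringIO(text), delimiter=" "))
print(rows)  # [['cell_type', 'count'], ['T cell', '12']]
```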

eco_helper.convert.mtx module

Read and write Matrix Market exchange (MTX) files.

eco_helper.convert.mtx.add_names(data: pandas.DataFrame, filename: str)

Adds column and row names to a dataframe.

Parameters

data (pandas.DataFrame) – The dataframe to add names to.
filename (str) – The name of the mtx file.

Returns

The dataframe with names.

Return type

pandas.DataFrame

eco_helper.convert.mtx.read(filename: str)

Reads an mtx file and returns a pandas dataframe.

Parameters

filename (str) – The name of the mtx file to read.

Returns

The dataframe containing the mtx data.

Return type

pandas.DataFrame
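A coordinate Matrix Market file starts with a banner line, optional % comment lines and a size line (rows, columns, stored entries), followed by one 1-based `row column value` triple per stored entry. A stdlib sketch of reading it into a dense list of rows; it only handles real or integer general matrices, `read_mtx_text` is a hypothetical name, and scipy.io.mmread is the usual library route:

```python
from typing import List


def read_mtx_text(text: str) -> List[List[float]]:
    """Parse coordinate Matrix Market text into a dense list of rows."""
    lines = iter(text.splitlines())
    banner = next(lines)
    if not banner.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a coordinate Matrix Market file")
    # Skip % comment lines; the first other line holds rows, cols, entries.
    size = next(line for line in lines if not line.startswith("%"))
    n_rows, n_cols, n_entries = (int(x) for x in size.split())
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(n_entries):
        i, j, value = next(lines).split()
        # Indices in the file are 1-based.
        dense[int(i) - 1][int(j) - 1] = float(value)
    return dense


example = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "% toy matrix\n"
    "2 3 2\n"
    "1 1 5\n"
    "2 3 7\n"
)
print(read_mtx_text(example))  # [[5.0, 0.0, 0.0], [0.0, 0.0, 7.0]]
```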

eco_helper.convert.mtx.write(data: pandas.DataFrame, filename: str)

Writes a dataframe to an mtx file.

Parameters

data (pandas.DataFrame) – The dataframe to write.
filename (str) – The name of the mtx file.
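Writing goes the other way: only non-zero entries are stored, which keeps the format compact for sparse count matrices. A stdlib sketch for a dense list of rows; row and column names are not part of the format and are not handled here, and `write_mtx_text` is a hypothetical name:

```python
def write_mtx_text(dense) -> str:
    """Serialize a dense list of rows as coordinate Matrix Market text."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    # Keep only non-zero entries, with 1-based row/column indices.
    entries = [
        (i + 1, j + 1, value)
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]
    lines = ["%%MatrixMarket matrix coordinate real general",
             f"{n_rows} {n_cols} {len(entries)}"]
    lines += [f"{i} {j} {value}" for i, j, value in entries]
    return "\n".join(lines) + "\n"


print(write_mtx_text([[5, 0, 0], [0, 0, 7]]), end="")
# %%MatrixMarket matrix coordinate real general
# 2 3 2
# 1 1 5
# 2 3 7
```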

eco_helper.convert.seurat module

Defines wrapper functions that call the _seurat_rds_to_tabular.R script to read a SeuratObject from an RDS file and write its data and metadata to tabular files.

Note

While this module makes the underlying R script available through the eco_helper CLI, the R script also has a fully implemented CLI of its own and can therefore be used stand-alone if desired.

eco_helper.convert.seurat.default_metadata = ['meta.data']

The default metadata to extract from a SeuratObject.

eco_helper.convert.seurat.supported_formats = ('rds', 'seurat')

The supported formats (i.e. file suffixes) that are interpreted as storing a SeuratObject.

eco_helper.convert.seurat.to_tabular(filename: str, output: str, sep: str, which: str, metadata: list, index: bool)

Convert a Seurat object to tabular file(s) for the extracted data and any additional metadata.

Parameters

filename (str) – The name of the RDS file containing a SeuratObject.
output (str) – The name of the output file. The files that are actually created use this as a base name and append their own identifiers to it.
sep (str) – The separator to use.
which (str) – The data to extract from the SeuratObject. This can be any slot that is accessible via GetAssayData from the SeuratObject.
metadata (list) – The metadata to extract from the SeuratObject. This can be any number of slots or attributes that are accessible from the SeuratObject.
index (bool) – Whether to include the index in the output of metadata files.