The documentation for XlsxReport is still work in progress. This file provides an overview of how XlsxReport works and a detailed description of the table template and its formatting options.
- How does XlsxReport work and what can it do?
- How does the table template look like
- The
sections
area of the table template - The
formats
area of the table template - The
conditional_formats
are of the table template - The
settings
area of the table template
To generate the formatted Excel report, XlsxReport requires tabular data, for example a CSV file, and a table template, a YAML file that contains instructions for the structure and formatting of the generated Excel report. The table template allows specifying which columns appear in the Excel file, the order of the columns, and which columns will be grouped together into sections
. Furthermore, the format of headers can be specified, and individual formats and conditional formats can be applied to the content of each column. Moreover, it is possible to specify section
supheaders that will be written above the header row into a merged cell.
Note: The term
section
ortemplate section
refers to a group of columns that are defined in the table template as a unit, and that are written to the Excel sheet as a block, or as a "section".
In general, changing values, i.e. the content of columns, is beyond the scope of XlsxReport. Specifically, it is not possible to use the table template for applying calculations to columns or for filtering and sorting rows. If such functionality is required it should be implemented in another script that can be run before XlsxReport. The only exception to this is the possibility to apply a log2 transformation on the values of a tag section
or label tag section
.
In addition, directly renaming column headers in the standard section
is not yet supported, but is planned for a future release.
The table template comprises four areas named sections
, formats
, conditional_formats
, and settings
. Each of these is encoded as a mapping in the YAML file:
sections: {}
formats: {}
conditional_formats: {}
settings: {}
The sections
area is used to select and organize columns, and to specify their formatting by assigning formats and conditional formats that are defined in the formats
and conditional_formats
areas. For example, a format can specify the number of decimal digits that are displayed or the text alignment, whereas conditional formats define cell appearance based on values. Basically, formats and conditional formats are defined by using the same parameters as in Excel. Finally, the settings
area of the table template is used to define general settings, such as row height, whether to apply an autofilter on the header row, or if an additional row for section
supheaders should be added.
Here is a basic example of a table template that generates an Excel file with three columns "Protein IDs", "Gene names", and "Unique peptides".
sections:
protein_evidence:
columns: ["Protein IDs", "Gene names", "Unique peptides"]
column_format: {"Unique peptides": "int"}
column_conditional_format: {"Unique peptides": "count"}
format: "str"
supheader: "Protein evidence"
formats:
int: {"align": "center", "num_format": "0"}
str: {"align": "left"}
header: {"bold": True, "align": "center", "bottom": 2}
supheader: {"bold": True, "align": "center", "bottom": 2}
conditional_formats:
count:
"type": "2_color_scale"
"min_type": "num"
"min_value": 0
"min_color": "#ffffbf"
"max_type": "percentile"
"max_value": 99.5
"max_color": "#f25540"
settings:
write_supheader: True
This table template example contains only one template section
with the internal name "protein_evidence". The columns
keyword is used to select which columns will appear in the Excel report. Using the argument format
the default format "str" is applied to all columns within the template section
. In addition, the "int" format and the "count" conditional format are specifically applied to the values of the "Unique peptides" column, overriding the defaults. Finally, a supheader "Protein evidence" is defined, which will be written to the Excel sheet above the header row. Writing a supheader row must be specifically enabled, which is done in the settings
area with the parameter "write_supheader: True". In the formats
area the two formats "int" and "str" are defined that have been referenced in the "protein_evidence" template section
. In addition, the format specified with the name "header" and "supheader" are special formats that are always applied to the header and supheader rows. The conditional_formats
area contains the description of the conditional format called "count", which has been assigned to the "Unique peptides" column.
Each entry in the sections
area is defined by a unique internal name and contains parameters that are used to select one or multiple columns, and parameters that define how the columns are formatted in the Excel report. There are currently three different types of template sections
, each provides a different way how the columns for the section are selected. In addition, the parameters specified in a template section
describe how the column values and headers will be formatted, if conditional formats are applied, and other settings. The order of template sections
in the table template determines the order in which the sections, and thus the columns, are written to the Excel file.
When using a standard template section
, you can directly select the columns that should be written to the Excel sheet by specifying a list of column names. This is useful when column names are constant in your output and don’t change between experiments, such as a "Protein IDs" column. Formats and conditional formats can be applied to the whole section
or to individual columns.
columns
A list of column names that should be written to the Excel sheet. The order of the columns in the sequence defines the order in which the columns will be written to the Excel sheet. Columns that are not present in the input table are ignored.- Type:
sequence[string]
- Is required
- Type:
For additional optional parameters refer to the common optional template section parameters.
You can use the tag template section
to select a group of columns based on a regular expression pattern that is matched against all column names. This is useful when the column names are not constant between multiple tables, but contain a constant common and a variable part. For example, when you have columns named "Intensity Sample_1", "Intensity Sample_2", and so on, you can use a tag template section
to select all columns that start with "Intensity".
TIP: Use the regular expression anchors
^
and$
to specify if the column needs to begin or end with the tag. To exclude columns that are an exact match to the specified tag, add a.
in front or after the tag to indicate that at least one additional character must be present.
-
tag
A regular expression pattern that is matched against all column names. Columns that match the pattern are selected for the section. Columns are written to the Excel sheet in the order they appear in the input table.- Type:
string
- Is required
- Type:
-
remove_tag
If True, the matched regular expression pattern of thetag
is removed from the column name in the Excel sheet. Removing the tag is useful in combination with adding asupheader
to the section, as in this way the removed tag can be displayed only in the supheader row.- Type:
bool
- Default:
False
- Is optional
- Type:
-
log2
Use this parameter to apply a log2 transformation to the values of all selected columns in thetag template section
. The global setting parameterslog2_tag
andevaluate_log2_transformation
affect the behavior of thetag template section
when thelog2
parameter is used. Refer to the description of the table template settings for more information.- Type:
bool
- Default:
False
- Is optional
- Type:
For additional optional parameters refer to the common optional template section parameters.
Let's assume your have the following table template, containing a tag sample section
with the name "intensities":
sections:
intensities: {tag: "^Intensity."}
and a CSV file with the following columns
Protein IDs | Intensity sample_1 | Intensity sample_2 | Mean Intensity |
---|---|---|---|
P40238 | 1,000 | 2,000 | 1,500 |
When generating a report, XlsxReport selects columns that match the regular expression pattern ^Intensity.
, which results in the selection of the columns "Intensity sample_1" and "Intensity sample_2". The specified pattern requires that a column starts with "Intensity" and that "Intensity" is followed by an additional character. This pattern does not match the column "Mean Intensity", which is therefore not included in the section
.
The label tag template section
is an extension of the tag template section
that enables more precise control over the selection of columns. While you only specify the constant part of a column name with the tag
parameter in the tag template section
, the label tag template section
allows you to also specify the variable part of the columns that should be included in the section. This is achieved with the labels
parameter, which is a list representing the variable part of the columns that should be included in the section
. For a column to be selected, the column name must exactly contain the constant part specified in the tag
parameter and one of the variable parts specified in the labels
parameter.
As the label tag template section
requires preexisting knowledge of the variable part of column names, such as sample names that are expected to change between datasets, it is typically not used in table templates
that are intended as a general template for a specific data analysis pipeline. However, it is very useful to dynamically adjust a table template to a specific dataset. For example, when you want to create a report but only select a subset of the columns that are matched by the tag
parameter.
-
tag
A regular expression pattern that is matched against all column names. Columns that match the pattern are selected for the section. Columns are written to the Excel sheet in the order they appear in the input table.- Type:
string
- Is required
- Type:
-
labels
A sequence of strings that represent the variable part of the column names that should be included in the section.- Type:
sequence[string]
- Is required
- Type:
-
remove_tag
If True, the matched regular expression pattern of thetag
is removed from the column name in the Excel sheet. Removing the tag is useful in combination with adding asupheader
to the section, as in this way the removed tag can be displayed only in the supheader row.- Type:
bool
- Default:
False
- Is optional
- Type:
-
log2
Use this parameter to apply a log2 transformation to the values of all selected columns in thetag template section
. The global setting parameterslog2_tag
andevaluate_log2_transformation
affect the behavior of thetag template section
when thelog2
parameter is used. Refer to the description of the table template settings for more information.- Type:
bool
- Default:
False
- Is optional
- Type:
For additional optional parameters refer to the common optional template section parameters.
The comparison template section
allows you to select and group columns that represent pair-wise comparisons of conditions. Examples of such columns include statistical comparisons of differences between two experimental conditions or the ratio between the mean intensities of those conditions. To be included in a comparison section
, column names must follow a specific logic:
- Condition Description: Column names must contain a part that describes the conditions being compared, such as "Control vs. Condition". It is crucial to have a consistent symbol between the two conditions to identify the comparison columns. This symbol is defined by the
tag
parameter. In the example, thetag
is " vs. ". - Comparison Type: Column names must also include a part that describes the type of comparison, such as "P-value" or "Ratio". Multiple types of comparisons can be included in a
comparison section
, and the substrings that identify the comparison type are listed in thecolumns
parameter.
-
tag
A string that corresponds to the comparison symbol between two conditions. Thetag
is used to pre-select columns that might belong to thecomparison section
.- Type:
string
- Is required
- Type:
-
columns
A sequence of strings that correspond to the substrings that identify the comparison type in the column names.- Type:
sequence[string]
- Is required
- Type:
-
replace_comparison_tag
Optional parameter that allows you to replace the comparison tag with a different string. This can be useful if you want to change the comparison tag in the Excel sheet to make it more readable.- Type:
string
- Is optional
- Type:
-
remove_tag
If True, the condition comparison string is removed from the columns, leaving only the comparison type that was specified with thecolumns
parameter. This option is useful in combination with adding asupheader
to the section, as in this way the condition comparison string is only displayed in the supheader row.- Type:
bool
- Default:
False
- Is optional
- Type:
For additional optional parameters refer to the common optional template section parameters.
NOTE: Several optional parameters work slightly different in this
section
type The parameterscolumn_format
andcolumn_conditional_format
are used to apply formats and conditional formats to columns that contain a specific comparison type, for example "P-value" or "Ratio". The column names that are specified for these parameters therefore need to correspond to entries from thecolumns
parameter. Thesupheader
parameter has no effect, as the supheader is automatically generated from the column names and corresponds to the conditions that are compared.
Let's look at an example to illustrate how the comparison template section
works. Assume you have the following columns in your input table:
- "Ratio Control vs. Condition"
- "Ratio Control vs. Another condition"
- "P-value Control vs. Condition"
- "P-value Control vs. Another condition"
- "Intensity Control vs. Condition"
- "Intensity Control vs. Another condition"
And the following table template:
sections:
statistical_comparison:
tag: " vs. "
columns: ["P-value", "Ratio"]
When generating a report, XlsxReport first collects all comparison columns, then groups them according to the conditions that are compared. All columns that compare the same two conditions are then used to write a separate section
to the Excel sheet:
In the example the first section contains the columns:
- "Ratio Control vs. Condition"
- "P-value Control vs. Condition"
And the second section contains the columns:
- "Ratio Control vs. Another condition"
- "P-value Control vs. Another condition"
The columns "Intensity Control vs. Condition" and "Intensity Control vs. Another condition" are not included in any section
, since "Intensity" was not listed in the columns
parameter of the comparison section
.
The following optional parameters can be used in all template sections
types to specify formatting and other settings:
-
format
The default format that is applied to all columns in thesection
. The format must be defined in theformats
area of the table template.- Type:
string
- Type:
-
column_format
A mapping that specifies formats that are applied to individual columns in thesection
. The column format overrides the default format. The keys are column names, and the values are format names that are defined in theformats
area of the table template.- Type:
mapping[string, string]
- Type:
-
conditional_format
The name of the conditional format that is applied to the values of all columns in thesection
. The conditional format must be defined in theconditional_formats
area of the table template.- Type:
string
- Type:
-
column_conditional_format
A mapping that specifies the conditional format that is applied to the values of individual columns in thesection
. The keys are column names, and the values are conditional format names that are defined in theconditional_formats
area of the table template.- Type:
mapping[string, string]
- Type:
-
header_format
Allows to specify additional formatting properties that are applied to the header format of thesection
. The specified formatting properties are added to the default "header" format that can be defined in theformats
area of the table template. For more information about how to define formatting properties refer to the documentation of the formats area.- Type:
mapping[string, mapping]
- Type:
-
supheader
A string that is written to the Excel sheet above the header row of thesection
. Thesupheader
is written to a merged cell that spans all columns of thesection
. Thesupheader
is only written if the global settingwrite_supheader
is set toTrue
.- Type:
string
- Type:
-
supheader_format
Allows to specify additional formatting properties that are applied to the super header format of thesection
. The specified formatting properties are added to the default "supheader" format that can be defined in theformats
area of the table template. For more information about how to define formatting properties refer to the documentation of the formats area.- Type:
mapping[string, mapping]
- Type:
-
width
Defines the column widths in pixels. The width is applied to all columns in thesection
and overwrites the default column width that is defined in thesettings
area of the table template.- Type:
float
- Type:
-
border
If set to True, a thick border line is added to the left and right side of the section.- Type:
boolean
- Default:
False
- Type:
In the formats
area the formats must be defined that are applied in the sections
area of the table template. In addition, by specifying a format called "header" and "supheader" it is possible to define the default formats for the header and supheader row.
Refer to the XlsxWriter documentation for additional information which parameters can be defined for a format. Note that entries of the Property column from the documentation correspond to the keys that can be defined in the format mapping.
Here is an example of a formats
area that defines the formats "int", "float", "str", "header", and "supheader":
formats:
int: {"align": "center", "num_format": "0"}
float: {"align": "center", "num_format": "0.00"}
str: {"align": "left", "num_format": "0"}
header: {
"bold": True,
"align": "center",
"valign": "vcenter",
"bottom": 2,
"top": 2,
"text_wrap": True
}
supheader: {
"bold": True,
"align": "center",
"valign": "vcenter",
"bottom": 2,
"left": 2,
"right": 2,
"text_wrap": True
}
In the conditional_formats
area the conditional formats must be defined that are applied in the template sections.
The type of conditional format needs to be defined with the type
parameter. Currently only the types 2_color_scale
, 3_color_scale
, and data_bar
are supported. In addition, the formatting parameters corresponding to the selected type must be defined.
Refer to the XlsxWriter documentation for additional information which parameters can be defined for different conditional format types.
Here is an example of a conditional_formats
area that defines a 3-color scale conditional format called "intensity":
conditional_formats:
intensity: {
"type": "3_color_scale",
"min_type": "min", "min_color": "#2c7bb6",
"mid_type": "percentile", "mid_value": 50, "mid_color": "#ffffbf",
"max_type": "max", "max_color": "#f25540"
}
The settings
area is used to define general settings affecting all content that is written to the Excel sheet.
-
supheader_height
Defines the supheader row height in pixels.- Type:
float
- Default:
20
- Type:
-
header_height
Defines the header row height in pixels.- Type:
float
- Default:
20
- Type:
-
column_width
Defines the default column widths in pixels. This parameter is overwritten by thewidth
parameter specified in thetemplate sections
.- Type:
float
- Default:
64
- Type:
-
log2_tag
If specified, this tag is added as a suffix to the column headers of atag template section
or alabel tag template section
to indicate that a log2 transformation has been applied. Thelog2_tag
is only added if the section parameterlog2
is set toTrue
. If the section parameterremove_tag
is set to True, thelog2_tag
is added to the section supheader instead of the column headers.- Type:
str
- Default ""
- Type:
-
append_remaining_columns
If True, all remaining columns that are not present in any section are added to the end of the Excel sheet, and the section of appended columns is hidden.- Type:
bool
- Default:
False
- Type:
-
write_supheader
If True, a supheader row is added above the header row.- Type:
bool
- Default:
False
- Type:
-
evaluate_log2_transformation
Use this setting with caution! If True, column values are evaluated if they appear to be already log transformed before a log2 transformation is applied. Assumes that values are log transformed if all values in a column are smaller or equal to 64. Intensities values (and intensity peak areas) reported by tandem mass spectrometry typically range from 10^1 to 10^12. To reach log2 transformed values greater than 64, intensities would need to be higher than 10^19, which seems to be very unlikely to be ever encountered.- Type:
bool
- Default:
False
- Type:
-
remove_duplicate_columns
If True, columns that are already present in a section are removed from subsequent sections. This option guarantees that columns are not duplicated in the Excel sheet.- Type:
bool
- Default:
True
- Type:
-
add_autofilter
If True, adds an Excel auto filter to the header row.- Type:
bool
- Default:
True
- Type:
-
freeze_cols
If a value larger than 0 is specified, freeze pane is applied in the Excel sheet. The selected row for freezing will always be the header row, the position of the selected column is defined by specified value.- Type:
int
- Default:
1
- Type: