-
Notifications
You must be signed in to change notification settings - Fork 15.6k
feat(Charts): multi page pdf report #35014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This commit introduces functionality to export multi-page PDF reports for charts of the table type. Key changes include: 1. **PDF Generation Library:** * WeasyPrint is used for converting HTML and CSS to PDF when the `PLAYWRIGHT_REPORTS_AND_THUMBNAILS` feature flag is false. (The flag was determined to be false during implementation). 2. **EmailNotification Enhancement (`superset/reports/notifications/email.py`):** * The `_get_content` method in the `EmailNotification` class now checks if the report is for a table and if the requested format is PDF. * If so, it generates a PDF using WeasyPrint. * The generated PDF includes: * The full table data (verified to be fetched completely). * Report description (typically includes chart title). * Pagination for large tables. * Customizable headers and footers. 3. **Configuration Options (`superset/config.py`):** * I added new configuration options to customize PDF exports: * `PDF_EXPORT_HEADERS_FOOTERS_ENABLED` (boolean): To enable/disable headers/footers. * `PDF_EXPORT_HEADER_TEMPLATE` (string): Template for PDF headers. Placeholders: `{report_name}`, `{page_number}`, `{total_pages}`. * `PDF_EXPORT_FOOTER_TEMPLATE` (string): Template for PDF footers. Placeholders: `{generation_date}`, `{report_name}`. * `PDF_EXPORT_PAGE_SIZE` (string): Default page size (e.g., "A4", "Letter"). * `PDF_EXPORT_ORIENTATION` (string): Default page orientation (e.g., "portrait", "landscape"). * These configurations are integrated into the PDF generation logic in `EmailNotification`. 4. **Testing (`tests/unit_tests/reports/notifications/email_tests.py`):** * I added comprehensive unit tests for the new PDF generation functionality. * Tests cover various scenarios, including: * Conditional PDF generation. * Correctness of HTML and CSS passed to WeasyPrint. * Header/footer rendering based on configuration templates and enabled status. * Application of page size and orientation. * Fallback to standard HTML email for non-PDF formats. This enhancement allows you to receive detailed, multi-page PDF versions of your table-based reports via email, complete with proper layout and metadata.
This fix addresses an issue where PDF reports for table charts were being generated as screenshots instead of multi-page PDFs based on the full dataset. The root cause was that the report execution logic did not correctly handle data preparation for PDF chart reports. It was defaulting to a screenshot-based PDF generation for all PDF reports. The following changes were made: - `superset/reports/notifications/base.py`: Added a `report_format` attribute to the `NotificationContent` dataclass. This allows the report format to be passed to the notification handlers. - `superset/commands/report/execute.py`: Modified the `_get_notification_content` method to: - Fetch the full dataset as a DataFrame (`embedded_data`) for chart reports with the PDF format. - Continue using screenshot-based PDF generation for dashboard reports. - Pass the `report_format` to the `NotificationContent` object. These changes ensure that for chart reports, the `EmailNotification` handler receives the necessary data and report format to trigger the existing WeasyPrint logic, which correctly generates a multi-page PDF from the full dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Status |
---|---|---|
HTML escaping disabled in DataFrame rendering ▹ view | ✅ Fix detected |
Files scanned
File Path | Reviewed |
---|---|
superset/reports/notifications/base.py | ✅ |
superset/reports/notifications/email.py | ✅ |
superset/commands/report/execute.py | ✅ |
superset/config.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
<body> | ||
<div class="report-description">{description}</div> | ||
<br> | ||
{df.to_html(na_rep="", index=True, escape=False)} |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be especially important with
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#python-library
First warning there and the security part of their docs mentioning this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good reference. The WeasyPrint docs emphasize HTML injection risks even further. Both DataFrame HTML and WeasyPrint HTML parsing need to be secured. We should:
- Keep
escape=True
in DataFrame.to_html() - Add additional HTML sanitization before WeasyPrint processing
- Consider WeasyPrint's URL fetching settings to prevent local file access
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify your response? We're discussing important security considerations around HTML escaping in WeasyPrint and DataFrame rendering - what specific aspect are you commenting on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's how to fix the security issue:
Find this line in the PDF generation section:
df.to_html(na_rep="", index=True, escape=False)
Change it to:
df.to_html(na_rep="", index=True, escape=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need help updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to update 2 sections:
- In the PDF generation section:
# Around line 190, change:
df.to_html(na_rep="", index=True, escape=False)
# to:
df.to_html(na_rep="", index=True, escape=True)
- Add HTML sanitization after that:
html_table = nh3.clean(df_html, tags=TABLE_TAGS, attributes=ALLOWED_TABLE_ATTRIBUTES)
Let me know if you need help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the exact steps to make the security fixes:
- Open superset/reports/notifications/email.py
- Go to line 190 (PDF section)
- Replace this line:
df.to_html(na_rep="", index=True, escape=False)
with:
df.to_html(na_rep="", index=True, escape=True)
Need me to explain any of these steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review Agent Run #28dabd
Actionable Suggestions - 1
-
tests/unit_tests/reports/notifications/email_tests.py - 1
- Test logic error in HTML escaping validation · Line 272-274
Review Details
-
Files reviewed - 5 · Commit Range:
583b903..dd59943
- superset/commands/report/execute.py
- superset/config.py
- superset/reports/notifications/base.py
- superset/reports/notifications/email.py
- tests/unit_tests/reports/notifications/email_tests.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review
- Manually triggers a full AI review. -
/pause
- Pauses automatic reviews on this pull request. -
/resume
- Resumes automatic reviews. -
/resolve
- Marks all Bito-posted review comments as resolved. -
/abort
- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Default Agent
You can customize the agent settings here or contact your Bito workspace admin at [email protected].
Documentation & Help
# Check that pandas escapes HTML by default | ||
mock_content.embedded_data = pd.DataFrame({'col1': ['<script>alert(1)</script>']}) | ||
email_content_result_escaped = notification._get_content() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test logic error: The test modifies mock_content.embedded_data
after creating the EmailNotification
instance but expects the second _get_content()
call to use the new data. Create a new EmailNotification
instance with the modified content to properly test HTML escaping.
Code suggestion
Check the AI-generated fix before applying
# Check that pandas escapes HTML by default | |
mock_content.embedded_data = pd.DataFrame({'col1': ['<script>alert(1)</script>']}) | |
email_content_result_escaped = notification._get_content() | |
# Check that pandas escapes HTML by default | |
mock_content.embedded_data = pd.DataFrame({'col1': ['<script>alert(1)</script>']}) | |
notification_escaped = EmailNotification(recipient=MagicMock(), content=mock_content) | |
email_content_result_escaped = notification_escaped._get_content() |
Code Review Run #28dabd
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superset uses Git pre-commit hooks courtesy of pre-commit. To install run the following:
A series of checks will now run when you make a git commit. Alternatively it is possible to run pre-commit by running pre-commit manually:
|
# Retrieve PDF export configurations | ||
pdf_headers_footers_enabled = app.config.get("PDF_EXPORT_HEADERS_FOOTERS_ENABLED", True) | ||
pdf_header_template = app.config.get("PDF_EXPORT_HEADER_TEMPLATE", "Report: {report_name} - Page {page_number} of {total_pages}") | ||
pdf_footer_template = app.config.get("PDF_EXPORT_FOOTER_TEMPLATE", "Generated: {generation_date}") | ||
pdf_page_size = app.config.get("PDF_EXPORT_PAGE_SIZE", "A4") | ||
pdf_orientation = app.config.get("PDF_EXPORT_ORIENTATION", "portrait") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config file has defaults, so there's no need to set them again here.
Thank you for the contribution @inamdarzaid! |
Co-authored-by: Elizabeth Thompson <[email protected]>
Multi-Page PDF Generation Implementation
Overview
This implementation adds support for generating multi-page PDFs from table data in Superset reports, replacing screenshot-based PDFs with HTML-to-PDF conversion using WeasyPrint.
Changes Made
1. Added WeasyPrint Dependency
File:
requirements/base.in
weasyprint>=61.0
as a new dependency for HTML-to-PDF conversion2. Enhanced PDF Utility Functions
File:
superset/utils/pdf.py
Added new functions for HTML-to-PDF conversion:
generate_table_html()
: Converts pandas DataFrame to properly formatted HTMLbuild_pdf_from_html()
: Converts HTML to PDF using WeasyPrintbuild_pdf_from_dataframe()
: Complete workflow for DataFrame to PDF3. Modified Report Execution Logic
File:
superset/commands/report/execute.py
Enhanced the
_get_pdf()
method:table
,pivot_table
,pivot_table_v2
)_get_embedded_data()
to get complete dataset (not just visible data)Key Features
1. Complete Data Export
embedded_data
which contains the full dataset fromChartDataResultFormat.JSON
2. Professional Multi-Page Layout
3. CSS Features for PDF
4. Error Handling
Usage Flow
_get_embedded_data()
fetches complete dataset as DataFrameBenefits
Before (Screenshot-based)
After (HTML-to-PDF)
Installation Requirements
After these changes, you'll need to:
pip install weasyprint>=61.0
or use the updated requirementsCompatibility
table
,pivot_table
, andpivot_table_v2
chartsFuture Enhancements
Potential improvements: