@sonofagl1tch sonofagl1tch commented Jan 1, 2026

Pull Request: Scan Results Import Feature

Context

This PR implements the scan results import feature requested in issue #8972. It enables users to import Prowler CLI scan results (JSON/OCSF and CSV formats) into the Prowler API, allowing distributed resources protected by network boundaries to send results to a central Prowler instance for visualization in the UI.

Closes #8972

Description

This feature adds a complete end-to-end solution for importing Prowler CLI scan results:

Backend (API)

  • Added IMPORTED to Scan.TriggerChoices for tracking imported scans
  • Created api/src/backend/api/parsers/ module with:
    • ocsf_parser.py: Parses Prowler JSON/OCSF output format
    • csv_parser.py: Parses Prowler CSV output (semicolon/comma-delimited)
  • Created api/src/backend/api/services/scan_import.py with:
    • Format detection (JSON vs CSV)
    • Provider resolution/creation
    • Bulk resource and finding creation
    • Transaction-safe import operations
  • Added POST /api/v1/scans/import endpoint supporting:
    • Multipart file upload
    • Inline JSON body
    • Provider selection or auto-creation
    • File size limit up to 1GB
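As an illustration of the detection step, a minimal sketch of distinguishing JSON/OCSF from CSV input and picking the CSV delimiter might look like the following. This is not the actual scan_import.py code; the function names and heuristics are illustrative only:

```python
import json


def detect_format(content: str) -> str:
    """Guess whether raw scan output is JSON/OCSF or CSV.

    Minimal heuristic: content that parses as JSON wins; everything
    else falls back to CSV.
    """
    stripped = content.lstrip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(content)
            return "json"
        except json.JSONDecodeError:
            pass
    return "csv"


def detect_csv_delimiter(content: str) -> str:
    """Pick between the semicolon and comma delimiters Prowler CSV may use."""
    header = content.splitlines()[0]
    # csv.Sniffer would also work; a direct count is predictable for two candidates.
    return ";" if header.count(";") >= header.count(",") else ","
```

A real implementation would also validate the OCSF schema and reject empty or malformed headers, but the two-step sniffing shown here is the general shape of the approach.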

Frontend (UI)

  • Created import components in ui/components/scans/scan-import/:
    • ScanImportSection: Main container with collapsible UI
    • ScanImportDropzone: Drag-and-drop file upload (.json, .csv)
    • ScanImportForm: Provider selection with validation
    • ScanImportProgress: Upload/processing status display
  • Added server action ui/actions/scans/import-scan.ts for API integration
  • Added API route proxy ui/app/api/scans/import/route.ts for large file handling
  • Integrated import section into scans page
  • Increased Next.js server actions body size limit to 1GB
(Screenshots: Prowler-import-scan-csv-json, prowler-viewing-imported-scan)

Documentation

  • Added user guide: docs/user-guide/tutorials/prowler-app-scan-import.mdx
  • Updated docs navigation in docs/docs.json

Testing

  • Unit tests for parsers and import service
  • API endpoint tests with authentication/permission checks
  • Playwright E2E tests for UI import flow
  • Manual test fixtures with real Prowler output samples

Implementation Status

All implementation tasks are complete. The feature has been fully implemented and tested:

  • Backend: Parsers, import service, API endpoint, and tests ✅
  • Frontend: UI components, server action, API route, and E2E tests ✅
  • Documentation: User guide and API changelog ✅
  • Manual testing: JSON, CSV, large files, and error scenarios ✅

Remaining items for PR submission:

  • Screenshots for PR checklist (mobile, tablet, desktop views)

Steps to Review

  1. Backend Review:

    • Review parsers in api/src/backend/api/parsers/
    • Review import service in api/src/backend/api/services/scan_import.py
    • Review API endpoint in api/src/backend/api/v1/views.py (ScanImportView)
    • Run backend tests: poetry run pytest api/src/backend/api/tests/test_ocsf_parser.py api/src/backend/api/tests/test_csv_parser.py api/src/backend/api/tests/test_scan_import_service.py api/src/backend/api/tests/test_scan_import_view.py -v
  2. Frontend Review:

    • Review components in ui/components/scans/scan-import/
    • Review server action in ui/actions/scans/import-scan.ts
    • Review API route in ui/app/api/scans/import/route.ts
    • Review page integration in ui/app/(prowler)/scans/page.tsx
  3. Manual Testing:

    • Start the development environment: docker-compose -f docker-compose-dev.yml up
    • Navigate to the Scans page
    • Test importing a JSON file from Prowler CLI output
    • Test importing a CSV file from Prowler CLI output
    • Verify findings appear correctly in the UI
  4. Documentation Review:

    • Review docs/user-guide/tutorials/prowler-app-scan-import.mdx
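For manual testing outside the UI, a request against the import endpoint with an inline JSON body can be built with the standard library. This is a sketch under assumptions: the host/port, the Bearer token scheme, and the payload shape are placeholders, not confirmed details of the API:

```python
import json
import urllib.request

API_URL = "http://localhost:8080/api/v1/scans/import"  # placeholder host/port
TOKEN = "YOUR_API_TOKEN"  # placeholder credential

# Placeholder inline payload; a real import would use Prowler CLI OCSF output.
payload = json.dumps([{"metadata": {"event_code": "example_check"}}]).encode()

request = urllib.request.Request(
    API_URL,
    data=payload,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {TOKEN}",
    },
)
# urllib.request.urlopen(request) would perform the call against a live API.
```

The multipart file-upload variant mentioned in the PR description would use the same endpoint with a `multipart/form-data` body instead of inline JSON.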

Checklist

  • Are there new checks included in this PR? No
  • Review if the code is being covered by tests
  • Review if code is being documented following the Google Python Style Guide
  • Review if backport is needed
  • Review if the README.md needs changes - Not needed, feature is documented in docs/
  • Ensure new entries are added to CHANGELOG.md - Added to api/CHANGELOG.md and ui/CHANGELOG.md

UI

  • All issue/task requirements work as expected on the UI
  • Screenshots/Video of the functionality flow - Mobile (X < 640px)
  • Screenshots/Video of the functionality flow - Tablet (640px < X < 1024px)
  • Screenshots/Video of the functionality flow - Desktop (X > 1024px)
  • Ensure new entries are added to ui/CHANGELOG.md

API

  • Verify if API specs need to be regenerated - OpenAPI schema decorators added
  • Check if version updates are required (e.g., specs, Poetry, etc.) - No version updates needed
  • Ensure new entries are added to api/CHANGELOG.md

Files Changed

New Files

  • api/src/backend/api/parsers/__init__.py
  • api/src/backend/api/parsers/ocsf_parser.py
  • api/src/backend/api/parsers/csv_parser.py
  • api/src/backend/api/parsers/README.md
  • api/src/backend/api/services/__init__.py
  • api/src/backend/api/services/scan_import.py
  • api/src/backend/api/services/README.md
  • api/src/backend/api/tests/test_ocsf_parser.py
  • api/src/backend/api/tests/test_csv_parser.py
  • api/src/backend/api/tests/test_scan_import_service.py
  • api/src/backend/api/tests/test_scan_import_view.py
  • api/src/backend/api/tests/test_scan_import_real_csv.py
  • api/src/backend/api/tests/test_scan_import_real_json.py
  • api/src/backend/api/migrations/0066_scan_imported_trigger.py
  • api/tests/manual/ - Manual test fixtures and scripts
  • ui/actions/scans/import-scan.ts
  • ui/app/api/scans/import/route.ts
  • ui/app/api/scans/import/README.md
  • ui/components/scans/scan-import/index.ts
  • ui/components/scans/scan-import/types.ts
  • ui/components/scans/scan-import/scan-import-dropzone.tsx
  • ui/components/scans/scan-import/scan-import-form.tsx
  • ui/components/scans/scan-import/scan-import-progress.tsx
  • ui/components/scans/scan-import/scan-import-section.tsx
  • ui/tests/scan-import.spec.ts
  • docs/user-guide/tutorials/prowler-app-scan-import.mdx

Modified Files

  • api/src/backend/api/models.py - Added IMPORTED trigger type
  • api/src/backend/api/v1/serializers.py - Added import serializers
  • api/src/backend/api/v1/views.py - Added ScanImportView
  • api/src/backend/api/v1/urls.py - Added import route
  • api/src/backend/api/rls.py - RLS policy updates
  • api/src/backend/config/django/base.py - Config updates
  • api/src/backend/config/guniconf.py - Gunicorn config updates
  • api/CHANGELOG.md - Added changelog entry
  • ui/app/(prowler)/scans/page.tsx - Integrated import section
  • ui/actions/scans/index.ts - Export updates
  • ui/components/scans/index.ts - Export updates
  • ui/components/icons/Icons.tsx - Added upload icon
  • ui/next.config.js - Increased server actions body size limit to 1GB
  • ui/package.json - Dependencies
  • ui/playwright.config.ts - Test config
  • ui/CHANGELOG.md - Added changelog entry
  • docs/docs.json - Added navigation entry

Contributions Review

Code Quality ✅

  • All Python code follows Google-style docstrings
  • Proper error handling with meaningful messages
  • Logging at appropriate levels (INFO, ERROR, DEBUG)
  • No hardcoded credentials or sensitive data
  • Test fixtures use placeholder account IDs (123456789012)

Test Coverage ✅

  • Unit tests: test_ocsf_parser.py, test_csv_parser.py, test_scan_import_service.py
  • API tests: test_scan_import_view.py (auth, permissions, tenant isolation, validation)
  • E2E tests: ui/tests/scan-import.spec.ts (Playwright)
  • Manual tests: api/tests/manual/ (real JSON/CSV, large files, error scenarios)

Documentation ✅

  • User guide: docs/user-guide/tutorials/prowler-app-scan-import.mdx
  • API docs: OpenAPI schema decorators with examples
  • Service README: api/src/backend/api/services/README.md
  • Parsers README: api/src/backend/api/parsers/README.md

Changelog Entries ✅

  • API: api/CHANGELOG.md - New endpoint entry added
  • UI: ui/CHANGELOG.md - Scan import UI entry added

Security Review

  • No hardcoded credentials or secrets
  • No real AWS account IDs or ARNs (test fixtures use placeholder 123456789012)
  • No internal hostnames or endpoints
  • No PII or sensitive business data

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

GitHub Pages Setup and others added 10 commits December 29, 2025 19:42
Implement Phase 1 of the scan results import feature (Tasks 1.1-1.3):

- Add IMPORTED trigger type to Scan.TriggerChoices for imported scans
- Create parsers module with OCSF JSON and CSV format support
- OCSF parser handles Prowler CLI JSON output (OCSF schema)
- CSV parser handles semicolon/comma-delimited Prowler CSV output
- Both parsers include validation, error handling, and provider extraction
- Add comprehensive docstrings to models.py for better documentation
- Include unit tests for both parsers

New files:
- api/src/backend/api/parsers/ (module with OCSF and CSV parsers)
- api/src/backend/api/tests/test_ocsf_parser.py
- api/src/backend/api/tests/test_csv_parser.py
- api/src/backend/api/migrations/0066_scan_imported_trigger.py
- api/docs/models.md

Implement a new service for importing scan results from various formats
(OCSF JSON, CSV, JSON-OCSF) into the Prowler API. The service handles:

- Format detection and validation
- Content parsing with error handling
- Resource UID extraction and deduplication
- Check metadata building from scan data
- Raw result storage

Includes comprehensive unit tests covering:
- Format detection edge cases
- Content parsing and error handling
- Resource UID extraction
- Check metadata building
- Import validation
- Default value handling

Add POST /api/v1/scans/import endpoint for importing Prowler CLI scan
results in JSON/OCSF or CSV format.

Changes:
- ScanImportView with multipart file upload and inline JSON support
- ScanImportSerializer with file/data validation and 50MB size limit
- URL route registration at /scans/import
- Comprehensive test suite covering JSON, CSV, provider handling,
  validation errors, authentication, permissions, and tenant isolation
- Enhanced logging with timing metrics in ScanImportService

Relates to: prowler-cloud#8972

Implement Phase 3 of the scan results import feature (Tasks 3.1-3.8):

- Add ScanImportSection component with collapsible UI and state machine
- Add ScanImportDropzone for drag-and-drop file upload (.json, .csv)
- Add ScanImportForm with provider selection and validation
- Add ScanImportProgress for upload/processing status display
- Add importScan server action with Zod validation and API integration
- Integrate ScanImportSection into scans page with permission check
- Add router.refresh() on successful import to update scan list
- Add new icons: Upload, ChevronUp, File, AlertCircle, CheckCircle, etc.

New files:
- ui/actions/scans/import-scan.ts
- ui/components/scans/scan-import/ (types, dropzone, form, progress, section)

Relates to: prowler-cloud#8972

Add comprehensive documentation for the scan import feature including:
- User guide for importing JSON/OCSF and CSV scan results
- Detailed field mappings for both formats
- API usage examples with curl commands
- Extensive troubleshooting guide covering:
  - Format detection and validation errors
  - Provider resolution issues
  - Authentication and permission errors
  - File size and performance guidance
  - Common error codes reference

Also includes:
- OpenAPI schema examples for request/response in views.py
- API changelog entry for the new endpoint
- Playwright e2e test configuration for scan import
- Navigation entry in docs.json

Relates to: prowler-cloud#8972

- Replace empty string with sentinel value for auto-detect provider option to work with Radix UI Select constraints
- Update scan import form to properly convert sentinel value back to undefined for API calls
- Change "View Imported Scan" link to navigate to findings page with scan filter instead of direct scan view
- Update link text to "View Scan Findings" for better clarity on destination
- Add comprehensive JSDoc documentation to next.config.js with feature descriptions and environment variable reference
- Enhance CSP header configuration with detailed comments explaining each directive and security implications
- Add getSentryReportEndpoint function documentation with usage examples
- Add detailed API configuration documentation covering Django settings, environment variables, and file upload limits
- Add services README documenting scan import service architecture and usage
- Update scan import service with improved error handling and validation
- Enhance scan import tests with additional edge cases and validation scenarios
- Update serializers and views to support improved scan import workflow
- Increase file upload limits to 1GB for handling large enterprise scan imports
- Add scan import API route handler for Next.js frontend integration
- Update scan import UI components with improved error handling and user feedback
- Update gunicorn configuration for optimal performance with large file uploads
- Add API documentation for scan import endpoint and configuration
- Improve finding detail component display and scan import section UX
- Update project README files with scan import feature documentation
- Add changelog entry for scan import UI components
- Create comprehensive PR description following template
- Document all changes for issue prowler-cloud#8972

prowler-cloud#8972

…cumentation

- Reorganize test files from api/src/backend/api/tests to api/tests/manual directory
- Add comprehensive README.md documentation for manual test suite
- Reformat test data generation functions with improved code readability and line length compliance
- Update CSV and JSON test data structures to match actual Prowler CLI output format
- Enhance docstrings with clearer descriptions and parameter documentation
- Improve code formatting for better maintainability and consistency across test files
- Consolidate test scenarios for error handling, large file processing, and real-world data imports
@github-actions github-actions bot added the documentation, component/ui, component/api, review-django-migrations (This PR contains changes in Django migrations), community (Opened by the Community), and has-conflicts (The PR has conflicts that need to be resolved) labels Jan 1, 2026
github-actions bot commented Jan 1, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

removing from PR as unneeded
@github-actions github-actions bot removed the has-conflicts label Jan 1, 2026
        return <SettingsIcon className={iconClass} />;
      default:
        return (
          <Loader2Icon className={cn(iconClass, isActive && "animate-spin")} />

Check warning — Code scanning / CodeQL: Useless conditional (Warning). This use of variable 'isActive' always evaluates to false.
@sonofagl1tch sonofagl1tch changed the title from "[FEATURE] Scan Results Import Feature" to "feat: Scan Results Import Feature" Jan 1, 2026
    if time_val:
        try:
            timestamp = datetime.fromtimestamp(float(time_val))
        except (ValueError, TypeError):

Check notice — Code scanning / CodeQL: Empty except (Note). 'except' clause does nothing but pass and there is no explanatory comment.
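One way to resolve the CodeQL note, assuming the intent is to skip unparsable timestamps while leaving a trace, would be to log and return an explicit None. The function name and wrapper here are illustrative, not the PR's actual code:

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)


def parse_timestamp(time_val):
    """Convert an epoch-like value to a datetime, or None if unparsable."""
    if not time_val:
        return None
    try:
        return datetime.fromtimestamp(float(time_val))
    except (ValueError, TypeError):
        # Malformed timestamp: skip it rather than failing the whole import,
        # but leave a debug trace so bad input data is discoverable.
        logger.debug("Could not parse timestamp value: %r", time_val)
        return None
```

Even keeping the bare `pass` with an explanatory comment would silence the note, but logging makes silently dropped values observable.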
@sonofagl1tch sonofagl1tch marked this pull request as ready for review January 1, 2026 23:04
@sonofagl1tch sonofagl1tch requested review from a team as code owners January 1, 2026 23:04
@sonofagl1tch
Contributor Author

All testing was completed locally on a MacBook Pro M2 in Docker containers. I'm going to do additional testing by deploying on AWS ECS, but until then I think some extra attention should be paid to reviewing my API code; this is new to me and I wasn't able to test it as thoroughly as I would have liked.

@jfagoagas
Member

Hi @sonofagl1tch thanks for this contribution! We'll review it as soon as we get the chance to.

@jfagoagas jfagoagas added the status/waiting-for-revision Waiting for maintainer's revision label Jan 5, 2026
@jfagoagas
Member

Hello @sonofagl1tch, we've been reviewing and discussing the content of this PR internally, and there are several things we want to discuss with you. We can jump on a call if you prefer, but I'm going to leave a summary of the action points:

  • For features that affect both the UI and the API we prefer to work with two PRs, one for each component. This gives us speed, since each component's owners can review their part without blocking the other, and it makes each PR more manageable.
  • In this particular case, this feature is something we were about to start working on internally, so there are several design and architecture decisions we'd like to follow. We can share our internal RFC with you.

As we don't want to leave the whole RFC in a comment, I'm leaving a summary of the key changes we'd want to discuss with you:

  • Add POST /api/v1/findings/ocsf (ingest) and GET /api/v1/findings/ocsf/{ingestion_id} (status).

  • API accepts JSON or NDJSON, supports gzip/zstd, with a hard 10 MiB limit.

  • On each request:

    1. Generate ingestion_id.
    2. Upload the raw body to S3-compatible or local storage. We need to support both.
    3. Insert an IngestionManifest row (status=STAGED).
    4. Enqueue a small Celery pointer task with {ingestion_id}.
    5. Return 202 Accepted with a link to the status endpoint.
  • Celery workers:

    • Fetch payload from S3-compatible or local storage, decompress, and stream-parse.
    • Group findings by provider.
    • Upsert Providers, create one Scan per provider per ingestion.
    • Upsert Resources and Findings (idempotent via (tenant_id, uid, time_dt)).
    • Mark manifest COMPLETED or FAILED (with retries).
  • S3-compatible or local storage holds payloads temporarily (short TTL via lifecycle rules); Redis only carries pointers.

  • Use https://github.com/prowler-cloud/py-ocsf-models instead of building a custom OCSF parser.
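The per-request flow above can be sketched with in-memory stand-ins for the storage bucket, the IngestionManifest table, and the Celery broker. All names and structures here are illustrative simplifications of the RFC summary, not actual Prowler API code:

```python
import uuid

storage: dict[str, bytes] = {}   # stands in for S3-compatible/local storage
manifests: dict[str, str] = {}   # stands in for IngestionManifest rows
queue: list[str] = []            # stands in for the Celery broker (pointers only)


def ingest(raw_body: bytes) -> dict:
    """Handle POST /api/v1/findings/ocsf: stage the payload, enqueue a pointer."""
    ingestion_id = str(uuid.uuid4())           # 1. generate ingestion_id
    storage[ingestion_id] = raw_body           # 2. upload the raw body
    manifests[ingestion_id] = "STAGED"         # 3. insert manifest row
    queue.append(ingestion_id)                 # 4. enqueue a small pointer task
    return {                                   # 5. 202 Accepted + status link
        "status": 202,
        "links": {"status": f"/api/v1/findings/ocsf/{ingestion_id}"},
    }


def worker_step() -> None:
    """A worker fetches the payload by pointer and processes it."""
    ingestion_id = queue.pop(0)
    payload = storage[ingestion_id]
    try:
        # Real workers would decompress, stream-parse, group findings by
        # provider, and upsert idempotently on (tenant_id, uid, time_dt).
        assert payload  # placeholder for actual processing
        manifests[ingestion_id] = "COMPLETED"
    except Exception:
        manifests[ingestion_id] = "FAILED"
```

The key property of the design is visible even in this toy version: the broker only ever carries the small `ingestion_id` pointer, never the (up to 10 MiB) payload itself.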

None of the above is set in stone; we're still refining it before making a decision. Below is a summary of the alternatives we also considered, one of which is the approach you implemented:

Redis / Celery Only (no local storage):

Push the full OCSF payload directly into Redis as a Celery message.

Why rejected:

  • OCSF batches can be up to 10 MiB, which is far larger than what Redis brokers handle well.
  • Celery explicitly recommends Redis for small messages only; large payloads cause memory pressure, slow ACKs, and broker congestion.
  • A single slow consumer or retry storm can stall the entire queue.

No Celery, No S3 (Process in the API request)

Aspect | Pros | Cons
--- | --- | ---
Complexity | No S3, no Redis, no workers | All processing runs in API threads
Latency | Immediate success/failure | Client blocks until full processing ends
Reliability | No async failure modes | High risk of timeouts on large payloads
Payload size | No S3 storage cost | Practically limited to ~1–2 MiB by timeouts

Why rejected

  • Large OCSF payloads would frequently timeout or overload API workers.
  • No buffering or backpressure — spikes in traffic would directly degrade API availability.

Regarding the UI work you did: that's out of scope for us right now, but we can keep it for after the Findings Ingestion API is developed.

One of our main goals is to keep you as the contributor of this feature. We'd be pleased to continue working with you if you have enough bandwidth to work on what we've shared. Take into account that the above is just a summary; there are more aspects to consider as development continues.

Next time, as you did in other contributions, we'd prefer to have a conversation first in the issue/feature request.

CC: @Alan-TheGentleman @StylusFrost

@sonofagl1tch
Contributor Author

Hey folks! I fully understand the feedback. This was a large feature submission, and I'm still learning your preferred best practices. I'm happy to continue collaborating with you on this feature.

Suggested path forward:

  1. Prowler team provides guidance on how to break out the current PR into separate PRs (I suggest keeping this PR open for the UI portion and creating new PRs for the other components you mentioned)
  2. I reorganize everything as requested, open new related PRs, clean up this PR, and update the documentation to reflect the changes
  3. We collaborate on each of the new PRs with PR-specific feedback

How does this plan sound? I'll take it as a yes if you complete step 1 and post it to this PR, and I'll then start on step 2.

cheers,
@sonofagl1tch

@jfagoagas
Member

That's great Ryan! Let's talk next week about this. Have a great weekend!

@jfagoagas
Member

Hi @sonofagl1tch, we're going to need more time before we can send you a plan to continue the development. We need to finish the RFC and prepare the work items.

By the way, we've incorporated some AI skills that will help a lot during development. You can find them in the /skills folder.

Skills provide domain-specific patterns, conventions, and guardrails that help AI coding assistants (Claude Code, OpenCode, Cursor, etc.) understand project-specific requirements.


Development

Successfully merging this pull request may close these issues.

Prowler CLI results importable to prowler UI
