[Meta] corpus generator tool enhancements #141

Open
2 of 7 tasks
shmsr opened this issue Apr 30, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@shmsr
Member

shmsr commented Apr 30, 2024

Introduction

The elastic-integration-corpus-generator-tool is a valuable resource for generating test corpora for testing and benchmarking. This issue collects the areas where we have identified room to improve its usability and functionality.

Notes

  • Enum Support:

    • Currently, enum is only supported for the keyword type.
    • Extend enum support to other relevant types, such as the long type.
    • Use case: http-response-status-code of long type could benefit from enum support.
    • Implement support for weighted enum as proposed in elastic/elastic-integration-corpus-generator-tool#130.
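As a rough illustration of the enum notes above, here is a minimal Python sketch of a weighted enum over long (integer) values. It is not the tool's implementation or the design proposed in #130; the function name `pick_weighted_enum`, the status codes, and the weights are all hypothetical.

```python
import random


def pick_weighted_enum(values, weights, rng):
    """Pick one enum value with probability proportional to its weight."""
    if not values:
        raise ValueError("enum must contain at least one value")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical long-typed enum for a field such as http-response-status-code.
STATUS_CODES = [200, 301, 404, 500]
STATUS_WEIGHTS = [85, 5, 8, 2]  # assumed relative frequencies, for illustration only

rng = random.Random(42)  # seeded so a generated corpus is reproducible
sample = [pick_weighted_enum(STATUS_CODES, STATUS_WEIGHTS, rng) for _ in range(1000)]
```

Because the values stay integers end to end, the same approach would work for a long field without routing the enum through the keyword type.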
  • Configuration Simplification:

    • Generating Schema B currently requires multiple config files.
    • Simplify the configuration by adding a types section to config.yml.
    • Allow either the types section or fields.yml, but not both. Raise a validation error if both are present to avoid ambiguity.
    • Update the package-spec to make fields.yml optional.
    • The goal is to provide a minimal configuration to start the corpus generator.
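The "either types or fields.yml, but not both" rule above could be checked along these lines. This is only a sketch under assumptions: the shape of the `types` section, the function name `resolve_field_definitions`, and the error wording are hypothetical, and the inputs are taken to be already-parsed YAML.

```python
def resolve_field_definitions(config, fields_yml):
    """Return the field definitions from exactly one source.

    config:     parsed config.yml as a dict (may contain a "types" section)
    fields_yml: parsed fields.yml, or None when the file is absent
    """
    has_types = "types" in config
    has_fields = fields_yml is not None

    if has_types and has_fields:
        # Ambiguous: refuse rather than silently preferring one source.
        raise ValueError(
            "field types are defined both in config.yml (types) and in fields.yml; "
            "keep only one of them"
        )
    if not has_types and not has_fields:
        raise ValueError("no field types found: add a types section or a fields.yml")

    return config["types"] if has_types else fields_yml


# Hypothetical minimal configuration using only config.yml.
minimal_config = {"types": [{"name": "http-response-status-code", "type": "long"}]}
definitions = resolve_field_definitions(minimal_config, None)
```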
  • Counter Type Enhancements:

  • Formatting Pattern Feature:

    • There have been requests to add helper methods like path and hostnames, but adding individual helpers is not sustainable.
    • Introduce a formatting_pattern feature to provide a more flexible and extensible solution.
    • See the discussion in elastic/integrations#8781 (comment) for more context.
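The issue does not define what formatting_pattern would look like, so the following Python sketch is purely one possible reading: a pattern string with named placeholders, each filled by a small generator, so that paths and hostnames need no dedicated helper. The `{name}` syntax, `render_pattern`, and the generator table are all assumptions made for illustration.

```python
import random
import re

# Assumed placeholder syntax: {name}, where name is a registered generator.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_pattern(pattern, generators, rng):
    """Replace every {name} placeholder with a freshly generated value."""

    def substitute(match):
        name = match.group(1)
        if name not in generators:
            raise KeyError(f"unknown placeholder: {name}")
        return str(generators[name](rng))

    return PLACEHOLDER.sub(substitute, pattern)


# Hypothetical generators; a real tool would derive these from field types.
GENERATORS = {
    "word": lambda rng: rng.choice(["api", "static", "users", "img"]),
    "num": lambda rng: rng.randint(1, 99),
    "tld": lambda rng: rng.choice(["com", "org", "net"]),
}

rng = random.Random(7)
path = render_pattern("/{word}/{word}/{num}", GENERATORS, rng)
hostname = render_pattern("host-{num}.{word}.{tld}", GENERATORS, rng)
```

With this shape, supporting a new kind of value means writing a new pattern rather than adding another helper method.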
  • Clarification Needed:

    • Seek clarification on the discussion in [Nginx] Add benchmark for nginx access logs integrations#8781 (comment)
      • As I understand it, the suggestion is to remove these variables and generate them directly by referencing, for example, $agentName. However, if a user then defines a field named agentName with type long, the two would collide, because agentName is reserved for the ECS field. In other words, we need to document that ECS fields follow a reserved naming convention (for example, host.id -> hostId) and that these names must not be reused for any other generated field. I hope this is the only point under discussion here.
  • Return better error messages
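The reserved-name concern in the clarification note above (host.id -> hostId, and a user field named agentName colliding with the ECS one) can be sketched as a small check. The helper names `ecs_variable_name` and `find_collisions` are hypothetical, and the dotted-to-camelCase rule is inferred only from the host.id -> hostId example; it does not handle underscores or other separators.

```python
def ecs_variable_name(field):
    """Map a dotted ECS field name to its reserved variable name (host.id -> hostId)."""
    head, *rest = field.split(".")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def find_collisions(ecs_fields, user_fields):
    """Return {user field name: ECS field it collides with} for reserved names."""
    reserved = {ecs_variable_name(f): f for f in ecs_fields}
    return {name: reserved[name] for name in user_fields if name in reserved}


# Hypothetical inputs: two ECS fields and two user-defined field names.
collisions = find_collisions(["host.id", "agent.name"], ["agentName", "bytes"])
for name, ecs_field in collisions.items():
    print(f"field '{name}' is reserved for ECS field '{ecs_field}'; choose another name")
```

Reporting the colliding ECS field by name, rather than failing generically, is also in the spirit of the "return better error messages" item.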

Related PRs and Issues:

The notes above are extracted from GitHub issues and the linked PR. Some tasks have been created from these notes to get started on the enhancements. There are also a few more issues in the linked PR from which notes and tasks still need to be extracted.

Tasks

CC: @tommyers-elastic @ruflin

@shmsr shmsr added the enhancement New feature or request label Apr 30, 2024
@shmsr shmsr transferred this issue from elastic/integrations Apr 30, 2024
@shmsr shmsr changed the title from "Meta: Key updates planned to the corpus generator tool" to "Roadmap: Corpus Generator Tool Enhancements" Apr 30, 2024
@ali786XI
Contributor

@shmsr
Member Author

shmsr commented Apr 30, 2024

@shmsr shmsr self-assigned this Apr 30, 2024
@lalit-satapathy lalit-satapathy changed the title from "Roadmap: Corpus Generator Tool Enhancements" to "[Meta]corpus generator tool enhancements" Apr 30, 2024
@shmsr shmsr changed the title from "[Meta]corpus generator tool enhancements" to "[Meta] corpus generator tool enhancements" May 3, 2024
2 participants