[FEA]: Unify the FileSourceStage, MultiFileSource and DirectoryWatcher functionality #976

Open
2 tasks done
Tracked by #1133
mdemoret-nv opened this issue Jun 8, 2023 · 0 comments · May be fixed by #1184
Labels
feature request New feature or request

Comments

@mdemoret-nv
Contributor

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of the problem this feature solves

Currently, there are 2 stages and a utility class that can read files and push them into the pipeline: MultiFileSource, FileSourceStage, and DirectoryWatcher. All 3 are very similar but have slightly different features. Having very similar but slightly different functionality is confusing, and it makes it difficult to use functionality from 2 of them at the same time (e.g., DirectoryWatcher with multiple search patterns).

Describe your ideal solution

This should combine the features of all 3 into a single stage to make things easier for users. Instead of having to decide which stage to use based on the features they want, users would have 1 stage with the capabilities of all 3 and options to enable and configure each feature.
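A minimal sketch of what such a single option set could look like, assuming a Python stage. The class `FileSourceOptions`, its `poll_interval` field, and the `schemes()` helper are hypothetical illustrations, not part of the existing Morpheus API. The point is that one configuration object can describe local globs, remote URLs, and watch mode together, instead of spreading them across three classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class FileSourceOptions:
    """Hypothetical option set for a unified file source stage (not the Morpheus API)."""

    files: list[str]             # local paths, globs, or URLs such as s3://bucket/dir/*.json
    watch: bool = False          # keep looking for new matching files after the first pass
    poll_interval: float = 1.0   # hypothetical: seconds between scans when watch=True

    def __post_init__(self) -> None:
        # Reject configurations that cannot produce any input.
        if not self.files:
            raise ValueError("at least one file path or glob pattern is required")
        if self.watch and self.poll_interval <= 0:
            raise ValueError("poll_interval must be positive when watch=True")

    def schemes(self) -> set[str]:
        """Return the storage schemes referenced by `files` ('file' for plain local paths).

        Note: a Windows drive letter such as "C:" would be parsed as a scheme;
        this sketch assumes POSIX-style local paths.
        """
        return {urlparse(p).scheme or "file" for p in self.files}
```

For example, `FileSourceOptions(files=["local_directory1/*.json", "s3://my_bucket/my_directory/*.json"], watch=True).schemes()` reports both `"file"` and `"s3"`, which is the information a unified stage would need to pick a local or remote reader per pattern.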

For example, the FileSourceStage should be able to support the following:

  • FileSource(watch=True, files=["my_directory/*.json"])
    • Enable watching for new files that match the glob pattern (the DirectoryWatcher functionality)
  • FileSource(watch=True, files=["s3://my_bucket/my_directory/*.json"])
    • Enable watching an S3 bucket (combining the DirectoryWatcher and MultiFileSource functionality)
  • FileSource(files=["local_directory1/*.json", "local_directory2/*.json"])
    • Use multiple glob patterns (the MultiFileSource functionality)
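The multiple-glob case above can be sketched with the standard library alone, assuming local paths only. `expand_patterns` is a hypothetical helper, not an existing Morpheus function; remote URLs such as `s3://` would need a filesystem abstraction that this sketch deliberately leaves out. Overlapping patterns are de-duplicated so a file matched by two globs is emitted once.

```python
import glob
import os


def expand_patterns(patterns: list) -> list:
    """Expand several local glob patterns into one sorted, de-duplicated file list.

    Hypothetical sketch of the multi-file source behaviour; URLs like s3:// are not handled.
    """
    matches = set()
    for pattern in patterns:
        # recursive=True lets "**" match nested directories; "*" behaves as usual.
        for path in glob.glob(pattern, recursive=True):
            if os.path.isfile(path):  # skip directories that happen to match
                matches.add(os.path.normpath(path))
    # Sorting gives a deterministic order regardless of pattern order.
    return sorted(matches)
```

Sorting and de-duplicating is one possible design choice; a real stage might instead preserve pattern order or emit files as soon as each glob resolves.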

The end goal is a single stage that has the capabilities of all 3.
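A minimal sketch of how one code path could cover both the one-shot and the watching behaviour, assuming local glob patterns. It uses simple polling as a swapped-in technique, not whatever file-system event mechanism the existing DirectoryWatcher uses. `iter_source_files`, `poll_interval`, and `max_polls` are hypothetical names; `max_polls` exists only to keep the sketch bounded and testable.

```python
import glob
import os
import time
from typing import Iterator, Optional


def _scan(patterns: list) -> set:
    """Return the set of regular files currently matching any of the glob patterns."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(pattern, recursive=True):
            if os.path.isfile(path):
                found.add(os.path.normpath(path))
    return found


def iter_source_files(patterns: list, watch: bool = False, poll_interval: float = 1.0,
                      max_polls: Optional[int] = None) -> Iterator[str]:
    """Yield each matching file once; when watch=True, keep rescanning for new files.

    With watch=False this is a single pass (multi-file source behaviour).
    With watch=True it polls every `poll_interval` seconds (directory watcher
    behaviour), stopping after `max_polls` scans if that is set.
    """
    seen = set()
    polls = 0
    while True:
        # Only files not emitted before are yielded, so each file appears once.
        new_files = _scan(patterns) - seen
        for path in sorted(new_files):
            seen.add(path)
            yield path
        polls += 1
        if not watch or (max_polls is not None and polls >= max_polls):
            return
        time.sleep(poll_interval)
```

Because the watch flag only decides whether the loop repeats, the glob expansion and de-duplication logic is shared between both modes, which is the kind of overlap this issue proposes to consolidate.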

Describe any alternatives you have considered

No response

Additional context

This is a follow-on issue that will help with #975.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
Projects
Status: Blocked