Expand extraction service to more connectors #3 #1694
9cca6a3 to 439efa2
# gcs has a unique download method so we can't utilize
# the generic download_and_extract_file func
Useful comment 👍
Is there anything we could generify, so that we don't have usages of create_temp_file in multiple places? For example, would it help if the base function could take a proc as an optional arg? Not for this PR, but something we can think about.
What is the issue with calling create_temp_file here?
I think we can look into generifying some of the non-standard downloads. The main issue is that they pipe directly to a file, which my generic download func doesn't support. I was too short on time to write two different versions of it, so for now I haven't generified the downloads that pipe directly to files.
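One way the "optional proc" idea could look, as a rough sketch only: a base helper that owns the temp file and accepts either a chunk generator (the generic path) or a callable that writes into the open file itself (for clients that pipe directly to a file). The names download_and_extract, fetch_chunks, write_to_file and extract are hypothetical and are not taken from the connectors-python code.

```python
import asyncio
import os
import tempfile


async def download_and_extract(extract, fetch_chunks=None, write_to_file=None):
    """Hypothetical base helper that owns the temp file's whole lifecycle.

    extract:       async callable taking the temp file path, returning a result
    fetch_chunks:  async generator function yielding bytes (the generic path)
    write_to_file: async callable that writes into an open binary file itself
                   (for clients that pipe directly to a file)
    """
    if (fetch_chunks is None) == (write_to_file is None):
        raise ValueError("pass exactly one of fetch_chunks or write_to_file")

    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            if write_to_file is not None:
                # Non-standard download: the caller pipes into the file.
                await write_to_file(handle)
            else:
                # Generic download: we write each yielded chunk.
                async for chunk in fetch_chunks():
                    handle.write(chunk)
        return await extract(path)
    finally:
        # Runs on success and on failure, so connectors never clean up by hand.
        os.remove(path)


async def _demo():
    seen_paths = []

    async def chunks():
        for part in (b"hello ", b"world"):
            yield part

    async def pipe(handle):
        handle.write(b"piped")

    async def read_back(path):
        seen_paths.append(path)
        with open(path, "rb") as f:
            return f.read()

    generic = await download_and_extract(read_back, fetch_chunks=chunks)
    piped = await download_and_extract(read_back, write_to_file=pipe)
    return generic, piped, seen_paths


generic, piped, seen_paths = asyncio.run(_demo())
```

With this shape, a connector with an unusual download method would pass its own write_to_file and never touch create_temp_file directly. Whether that fits the real download clients is an open question.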
My concern with calling create_temp_file is that every time it's called, there's a chance that the author does it in such a way that the temp file won't be cleaned up. We've had this issue before, where like 8/10 connectors were cleaning up their tempfiles appropriately, but due to copy-paste errors, occasionally some wouldn't. These types of bugs can be hard to catch, and it's easier to keep them from propagating if you just don't have numerous usages of the risky code.
But again, not necessary to solve right now.
@seanstory I was also concerned about that. This may alleviate your concerns: I think I have it covered with the way tempfiles are being created now. If a connector uses create_temp_file, it will clean up after itself once everything is done, including deleting the file and outputting an error if the file deletion failed.
The code in question: https://github.com/elastic/connectors-python/blob/b9fe37744bd9724b3b4b82104f0c124d70bf3b02/connectors/source.py#L771-L783
Of course we should properly check to see if this is actually the case.
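A check along these lines might look roughly like the sketch below. It does not use the real create_temp_file from source.py; temp_file here is a hypothetical stand-in that assumes the behavior described above (delete on exit, log an error if deletion fails), and check_cleanup is likewise made up for illustration.

```python
import logging
import os
import tempfile
from contextlib import contextmanager

logger = logging.getLogger("tempfile_cleanup_sketch")


@contextmanager
def temp_file():
    """Hypothetical stand-in for a self-cleaning temp file helper."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except OSError as exc:
            # Assumed behavior: report the failed deletion instead of raising.
            logger.error("Could not delete temp file %s: %s", path, exc)


def check_cleanup():
    # Case 1: the body finishes normally.
    with temp_file() as ok_path:
        existed_inside = os.path.exists(ok_path)

    # Case 2: the body raises, as a failed extraction would.
    failed_path = None
    try:
        with temp_file() as failed_path:
            raise RuntimeError("simulated extraction failure")
    except RuntimeError:
        pass

    return existed_inside, os.path.exists(ok_path), os.path.exists(failed_path)


results = check_cleanup()
```

The same two cases (normal exit and an exception inside the block) would be the ones to cover against the actual helper.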
Related to https://github.com/elastic/enterprise-search-team/issues/5857