Make Google Drive ftest use shared fixture #1719

Merged
merged 5 commits into from
Oct 13, 2023


artem-shelkovnikov (Member) commented on Oct 3, 2023

Part of https://github.com/elastic/enterprise-search-team/issues/3397

This PR changes the fixture for the respective content source (Google Drive) to use the shared test setup based on the WeightedFakeProvider class.

This class generates large amounts of fake data whose sizes follow a given distribution, for example:

# In 65% of cases, generate small files
# In 20% of cases, generate medium-size files
# In 10% of cases, generate large files
# In 5% of cases, generate huge files
fake_provider = WeightedFakeProvider(weights=[0.65, 0.2, 0.1, 0.05])

fake_provider.get_html() # <---- gets HTML of size depending on distribution
fake_provider.get_text() # <---- gets text of size depending on distribution

# Important difference:
# get_text() returns a huge amount of text that can be ingested, for example a 2MB payload
# get_html() returns a huge payload that contains very little text (for example, around 1KB of text in 2MB of HTML),
# which makes it better suited for testing download/subextraction of rich content
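
To illustrate the idea, here is a minimal sketch of how a weighted provider of this kind could work. It is not the WeightedFakeProvider from the repository: the class name SketchWeightedProvider, the SIZES buckets, the byte sizes, and the seed parameter are all assumptions made for this example, and the real fixture may generate content differently.

```python
import random

# Hypothetical size buckets in bytes; the real fixture's sizes may differ.
SIZES = {
    "small": 1_024,
    "medium": 64 * 1_024,
    "large": 512 * 1_024,
    "huge": 2 * 1_024 * 1_024,
}


class SketchWeightedProvider:
    """Illustrative stand-in, not the repository's WeightedFakeProvider."""

    def __init__(self, weights, seed=None):
        if len(weights) != len(SIZES):
            raise ValueError("expected one weight per size bucket")
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
        self._weights = list(weights)
        self._buckets = list(SIZES)
        self._rng = random.Random(seed)

    def _pick_size(self):
        # Draw one bucket according to the configured weights.
        bucket = self._rng.choices(self._buckets, weights=self._weights, k=1)[0]
        return SIZES[bucket]

    def get_text(self):
        # Payload that is entirely ingestible text.
        size = self._pick_size()
        return ("lorem ipsum " * (size // 12 + 1))[:size]

    def get_html(self):
        # Large payload with very little extractable text:
        # the bulk is an HTML comment used as padding.
        size = self._pick_size()
        prefix = "<html><body><p>lorem ipsum</p><!--"
        suffix = "--></body></html>"
        padding = "x" * max(0, size - len(prefix) - len(suffix))
        return prefix + padding + suffix


provider = SketchWeightedProvider(weights=[0.65, 0.2, 0.1, 0.05], seed=42)
text = provider.get_text()
html = provider.get_html()
```

Passing a fixed seed makes the sequence of generated sizes reproducible, which is one way to keep runs comparable between nightly executions.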

The end goal of this PR is to get more comparable benchmarks from our nightly functional tests for the amount of memory and CPU needed to run the connector.

Checklists

Pre-Review Checklist

  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Tested the changes locally

Related Pull Requests

@artem-shelkovnikov artem-shelkovnikov enabled auto-merge (squash) October 13, 2023 15:33
@artem-shelkovnikov artem-shelkovnikov merged commit d3ebaa0 into main Oct 13, 2023
@artem-shelkovnikov artem-shelkovnikov deleted the artem/make-gdrive-use-shared-fixture branch October 13, 2023 15:44
@github-actions

💚 Backport PR(s) successfully created

Status Branch Result
8.11 #1794

This backport PR will be merged automatically after passing CI.

2 participants