
replace lakefs_client with lakefs-sdk in Spark integration test harness #8403

Open · wants to merge 3 commits into base: master

Conversation

mahdikhashan

@mahdikhashan mahdikhashan commented Nov 29, 2024

Closes #8391

Change Description

Background

Replace lakefs_client (deprecated python client) with the lakefs-sdk (1.43.0).

Bug Fix

Since the Spark integration tests used the old, deprecated client, this PR introduces the new Python client (lakefs-sdk).
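For reference, a minimal sketch of initializing the new client, following the pattern documented for lakefs-sdk 1.x; the endpoint and keys below are placeholders, and the snippet checks for the SDK rather than assuming it is installed:

```python
# Hedged sketch of the lakefs_client -> lakefs-sdk swap. Host and
# credentials are placeholders; no network call is made here.
try:
    import lakefs_sdk
    from lakefs_sdk.client import LakeFSClient
    sdk_available = True
except ImportError:
    sdk_available = False

if sdk_available:
    configuration = lakefs_sdk.Configuration(
        host="http://localhost:8000/api/v1",  # placeholder endpoint
        username="AKIAIOSFODNN7EXAMPLE",      # placeholder access key
        password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # placeholder
    )
    client = LakeFSClient(configuration)
    # API groups now carry an _api suffix, e.g.:
    # client.repositories_api.list_repositories()
else:
    print("lakefs-sdk is not installed; try: pip install 'lakefs-sdk~=1.43'")
```

The main difference from the deprecated lakefs_client is that API groups on the client are accessed with an `_api` suffix, which is exactly the failure mode seen later in this thread.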

New Feature

No new feature was added.

Testing Details

Will be added later; right now I'm struggling to run the docker-compose setup that contains the testing environment on my local machine.

Breaking Change?

No breaking changes.

Additional info

Logs, outputs, screenshots of changes if applicable (CLI / GUI changes)

Contact Details

You can contact me here or by my email: [email protected]

@mahdikhashan mahdikhashan marked this pull request as ready for review November 29, 2024 13:52
@arielshaqed arielshaqed changed the title replace lakefs_client with lakefs-sdk replace lakefs_client with lakefs-sdk in Spark integration test harness Nov 30, 2024
@arielshaqed arielshaqed self-requested a review November 30, 2024 17:44
@arielshaqed arielshaqed added area/testing Improvements or additions to tests area/system-tests Improvements or additions to the system-tests area/client/spark area/sdk/python tech-debt exclude-changelog PR description should not be included in next release changelog labels Nov 30, 2024
@arielshaqed
Contributor

Hi @mahdikhashan ,
Thanks for your contribution! This is very much a needed change. I would like to explain why it may take a while to run CI on this.
There were security risks on GitHub with running CI on PRs that did not originate from the project group. As a result, it doesn't run automatically. I will make an effort to get it to run promptly on your changes.
I suspect we may need additional changes to get the Spark client integration test to run; I'll let you know. In the meantime I'm not requesting changes but also not approving; sorry for keeping you hanging. We should make progress on this PR by Monday.

@arielshaqed
Contributor

Running CI on #8405; I will push that, but it's easier to run CI there.

@arielshaqed
Contributor

Okay, I can run Esti (system tests, including the Spark test affected by this PR). First failure is due to using repositories not repositories_api on the lakeFS Client object.

A guide to running system tests locally is here. This is roughly what you'd need in order to run the Spark test locally - but unfortunately not exactly.

@mahdikhashan
Author

Okay, I can run Esti (system tests, including the Spark test affected by this PR). First failure is due to using repositories not repositories_api on the lakeFS Client object.

A guide to running system tests locally is here. This is roughly what you'd need in order to run the Spark test locally - but unfortunately not exactly.

thanks @arielshaqed, I'll set up the local test env and fix the failing ones ASAP.

@arielshaqed
Contributor

[...]

thanks @arielshaqed, I'll setup the local test env and fix the failing ones asap.

Hi @mahdikhashan ,
Worth mentioning - I believe I am not currently blocking you here. If I am, please do let me know!

@mahdikhashan
Author

mahdikhashan commented Dec 5, 2024

[...]

thanks @arielshaqed, I'll setup the local test env and fix the failing ones asap.

Hi @mahdikhashan , Worth mentioning - I believe I am not currently blocking you here. If I am, please do let me know!

Thanks for reminding me of this; I got busy with university coursework. I'll dedicate some time to it today (end of the day, Vienna time) and move it forward; hopefully I'll deliver my changes by tomorrow.

@mahdikhashan
Author

hey @arielshaqed, as you mentioned above, the doc for running the Spark tests is poorly written. I have a few questions now:

To run the Spark tests, I need a working local development setup of lakeFS with Spark, its worker, and Postgres in a Docker environment.

I have tried two docker-compose files I found in the repo:

  1. the docker-compose at the test/spark path

For this one I receive:

2024-12-06T12:42:27.527090888Z time="2024-12-06T12:42:27Z" level=fatal msg="Invalid config" func=cmd/lakefs/cmd.initConfig file="cmd/root.go:156" error="bad configuration: missing required keys: [blockstore.type]" fields.file=/home/lakefs/.lakefs.yaml file="cmd/root.go:156" phase=startup

The config is:

credentials:
  access_key_id: AKIAIOSFODNN7EXAMPLE
  secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
server:
  endpoint_url: http://lakefs:8000

could you help me figure out why this config is invalid?

I also tried the following config without any success:

database:
  type: "local"
  local:
    path: /tmp/lakefs/metadata
auth:
  encrypt:
    secret_key: "some random value that should be replaced for production environments"

blockstore:
  type: local
  local:
    path: /tmp/lakefs/data

logging:
  format: text
  level: trace
  2. the docker-compose at esti/ops

For this one I receive:

✘ lakefs Error failed to resolve reference "docker.io/treeverse/lakefs:dev": docker.io/treeverse/lakefs:dev: not found 1.1s
✘ esti Error context canceled

It seems that in set_env_vars.sh I may need to define some vars; in this case, would you be happy to share them if any exist?

Looking forward to hearing from you!

@mahdikhashan
Author

mahdikhashan commented Dec 6, 2024

Otherwise, I fixed this attribute issue: AttributeError: 'LakeFSClient' object has no attribute 'repositories'. Did you mean: 'repositories_api'?
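For anyone hitting the same AttributeError, a toy illustration of the rename (these classes are stand-ins, not the real SDK types):

```python
# Toy stand-ins illustrating the rename: the deprecated lakefs_client
# exposed API groups as plain attributes (client.repositories), while
# lakefs-sdk's LakeFSClient exposes them with an _api suffix
# (client.repositories_api).
class RepositoriesApi:
    def list_repositories(self):
        return ["example-repo"]

class LakeFSClient:
    def __init__(self):
        self.repositories_api = RepositoriesApi()

client = LakeFSClient()
# client.repositories.list_repositories()  # AttributeError on the new SDK
repos = client.repositories_api.list_repositories()
print(repos)
```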

@arielshaqed
Contributor

Hi @mahdikhashan ,
This line:

error="bad configuration: missing required keys: [blockstore.type]"
indicates that your configuration is missing a blockstore type. Indeed, you need to configure a blockstore there.

Your second issue is easier to fix. You need to build an appropriate Docker image, since you are on dev. Can you try running make build-docker? That will build a local Docker image that Esti can use.
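Concretely, the smallest addition that clears the missing-key error is a blockstore section like the one already present in the second config above (local blockstore; the path is illustrative):

```yaml
blockstore:
  type: local
  local:
    path: /tmp/lakefs/data
```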

@mahdikhashan
Author

mahdikhashan commented Dec 11, 2024

Hi @mahdikhashan , This line:

error="bad configuration: missing required keys: [blockstore.type]"
indicates that your configuration is missing a blockstore type. Indeed, you need to configure a blockstore there.

Your second issue is easier to fix. You need to build an appropriate docker image, since you are in dev. Can you try to make build-docker? That will build a local Docker image that Esti can use.

OK, thanks. I'll follow up from here and will update you.

Successfully merging this pull request may close these issues.

Use an updated lakeFS Python client in Spark integration tests