feat: add dataset support to be updated using distribution settings #5028

@jfcalvo jfcalvo commented Jun 14, 2024

Description

This PR adds support for updating dataset distribution settings. For example, it allows the min_submitted attribute to be updated when the overlap distribution strategy is in use.

Closes #5010

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)
  • Documentation update

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

  • Adding new tests.

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)


codecov bot commented Jun 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.99%. Comparing base (97dc916) to head (c8aa1a9).

Current head c8aa1a9 differs from pull request most recent head 58c8257

Please upload reports for the commit 58c8257 to get more accurate results.

Additional details and impacted files
@@                            Coverage Diff                             @@
##           feat/create-datasets-with-distribution    #5028      +/-   ##
==========================================================================
+ Coverage                                   91.97%   91.99%   +0.01%     
==========================================================================
  Files                                         136      136              
  Lines                                        5847     5859      +12     
==========================================================================
+ Hits                                         5378     5390      +12     
  Misses                                        469      469              
Flag Coverage Δ
argilla-server 91.99% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.



class DatasetUpdateValidator:
    @classmethod
    async def validate(cls, db: AsyncSession, dataset: Dataset, dataset_attrs: dict) -> None:

Even though this class method doesn't need to be asynchronous and doesn't need the session, I prefer to keep the same signature in all of the validators.
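
A minimal sketch of what keeping that uniform signature could look like. The `Dataset` stand-in, the `update_dataset` helper, and the specific rule being checked (`min_submitted` must be a positive integer) are assumptions made for illustration, not the code in this PR:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Dataset:
    # Hypothetical stand-in for the real database model.
    name: str
    distribution: dict = field(
        default_factory=lambda: {"strategy": "overlap", "min_submitted": 1}
    )


class DatasetUpdateValidator:
    @classmethod
    async def validate(cls, db: Optional[Any], dataset: Dataset, dataset_attrs: dict) -> None:
        # `db` is unused and nothing is awaited here; both are kept only so
        # that every validator can be called in the same way.
        distribution = dataset_attrs.get("distribution")
        if distribution is None:
            return

        # Assumed rule, for illustration only.
        min_submitted = distribution.get("min_submitted")
        if not isinstance(min_submitted, int) or min_submitted < 1:
            raise ValueError("min_submitted must be a positive integer")


async def update_dataset(db: Optional[Any], dataset: Dataset, dataset_attrs: dict) -> Dataset:
    # Hypothetical caller: validate first, then apply the new attributes.
    await DatasetUpdateValidator.validate(db, dataset, dataset_attrs)
    for name, value in dataset_attrs.items():
        setattr(dataset, name, value)
    return dataset
```

Because the caller always awaits `validate(db, dataset, dataset_attrs)`, a validator that later needs a database query can add one without changing any call site.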

…se model (#5118)

# Description

This PR includes the following changes:
* Added `count_submitted_responses` as a property of the `Record` database
model.
  * This property requires record responses to be pre-loaded.
  * Its value is fetched from the database using a subquery.
* Added `count_submitted_responses` to the search engine mapping.
* Record `status` is exposed by the API schemas and is calculated from the
`count_submitted_responses` column property of the `Record` database
model.
* This `status` is defined as a property of the `Record` database model
and uses the `dataset` distribution strategy to calculate the
value.
## Missing changes in this PR
- [ ] Make the test suite pass after the changes.
- [ ] Add support for the `status` value in search endpoints so we can filter
by `status=pending&response_status=pending`.
- [ ] Check that we refresh the record
`count_submitted_responses` value before indexing the record, and add a
partial update to the search engine when an associated entity (such as a
response) is created, updated, or deleted for a record. (We should probably
add a partial update of the index for this record attribute.)
- [ ] Change dataset progress metrics.
- [ ] Change user metrics.

Refs #5069


---------

Co-authored-by: Paco Aranda
@jfcalvo jfcalvo merged commit 91eb6b1 into feat/create-datasets-with-distribution Jul 1, 2024
8 checks passed
@jfcalvo jfcalvo deleted the feat/update-datasets-distribution branch July 1, 2024 10:22

github-actions bot commented Jul 1, 2024

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-5028-ki24f765kq-no.a.run.app

Successfully merging this pull request may close these issues.

[TASK] Allow datasets to be updated with specific distribution settings