[WIP] HumanOps - annotation #617

Open
wants to merge 34 commits into
base: main

Conversation

cyruszhang
Collaborator

This PR adds human annotation support to Data-Juicer (DJ). Included features:

  • boilerplate code for Label Studio-powered human annotation ops
  • a reference implementation for human preference annotation
  • a Label Studio service script that starts a local instance using Docker or pip, whichever is available
  • reference configs and data
  • an event-driven and notification mixin framework for ops
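
As a rough illustration of the "Docker or pip, whichever is available" behavior of the service script, the sketch below picks a launch command based on what is on PATH. It is not the PR's script: `build_launch_command`, the image name, the port, and the exact CLI arguments are assumptions made for this example.

```python
import shutil
from typing import Callable, List, Optional

# Assumed defaults for illustration; the PR's script may use different values.
DOCKER_IMAGE = "heartexlabs/label-studio:latest"
DEFAULT_PORT = 8080


def build_launch_command(
    port: int = DEFAULT_PORT,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a command that would start a local Label Studio instance.

    Prefers Docker when the `docker` executable is on PATH, then falls back
    to a pip-installed `label-studio` CLI. Raises if neither is available.
    The `which` parameter is injectable so the choice can be tested.
    """
    if which("docker"):
        # Map the chosen host port to the container's port 8080.
        return ["docker", "run", "--rm", "-p", f"{port}:8080", DOCKER_IMAGE]
    if which("label-studio"):
        return ["label-studio", "start", "--port", str(port)]
    raise RuntimeError("Neither docker nor a pip-installed label-studio was found")
```

The returned list can be handed to `subprocess.Popen`; building the command separately from running it keeps the availability check easy to test.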

Still working on:

  • more efficient interaction with the Label Studio SDK
  • better logging for the Label Studio localhost service
  • notification details
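
For readers unfamiliar with the mixin approach named in the feature list, here is a minimal sketch of how event-driven and notification mixins might compose on an op. Every class and method name here (`EventDrivenMixin`, `NotificationMixin`, `register_handler`, `trigger_event`, `send_notification`, `DemoAnnotationOp`) is hypothetical and is not taken from the PR; the notification is only recorded in a list, since the real delivery details are still in progress.

```python
from collections import defaultdict
from typing import Any, Callable, Dict


class EventDrivenMixin:
    """Lets an op register handlers and trigger named events (hypothetical API)."""

    def register_handler(self, event: str, handler: Callable[[Dict[str, Any]], None]) -> None:
        # Create the registry lazily so the mixin needs no __init__ cooperation.
        if not hasattr(self, "_handlers"):
            self._handlers = defaultdict(list)
        self._handlers[event].append(handler)

    def trigger_event(self, event: str, payload: Dict[str, Any]) -> None:
        # Events without handlers are silently ignored.
        for handler in getattr(self, "_handlers", {}).get(event, []):
            handler(payload)


class NotificationMixin:
    """Records notifications in memory; a real op might send email or chat messages."""

    def send_notification(self, message: str) -> None:
        if not hasattr(self, "sent_notifications"):
            self.sent_notifications = []
        self.sent_notifications.append(message)


class DemoAnnotationOp(EventDrivenMixin, NotificationMixin):
    """Toy op that notifies whenever a batch of annotations arrives."""

    def __init__(self) -> None:
        self.register_handler("batch_annotated", self._on_batch_annotated)

    def _on_batch_annotated(self, payload: Dict[str, Any]) -> None:
        self.send_notification(f"{payload['count']} samples annotated")
```

Calling `DemoAnnotationOp().trigger_event("batch_annotated", {"count": 3})` would record one notification on the op.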

@cyruszhang
Collaborator Author

Ongoing work:

  • more efficient interaction with the Label Studio SDK
  • better logging for the Label Studio localhost service
  • notification details
  • test cases are not working properly yet
  • documentation

class HumanPreferenceAnnotationMapper(LabelStudioAnnotationMapper):
"""Operator for human preference annotation using Label Studio."""

DEFAULT_LABEL_CONFIG = """
Collaborator Author

The XML is inline in the code. It could be moved to a separate file if needed; that is supported via the label_config_file param.
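
A minimal sketch of the inline-default versus file-override pattern described in this comment. Only `label_config_file` comes from the PR; the helper `resolve_label_config` and the example XML are assumptions, and the XML is an illustrative preference layout, not the PR's DEFAULT_LABEL_CONFIG.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Illustrative preference layout; NOT the PR's DEFAULT_LABEL_CONFIG.
EXAMPLE_LABEL_CONFIG = """
<View>
  <Text name="prompt" value="$prompt"/>
  <Text name="answer1" value="$answer1"/>
  <Text name="answer2" value="$answer2"/>
  <Choices name="preference" toName="prompt" choice="single">
    <Choice value="answer1"/>
    <Choice value="answer2"/>
  </Choices>
</View>
"""


def resolve_label_config(
    label_config_file: Optional[str] = None,
    default: str = EXAMPLE_LABEL_CONFIG,
) -> str:
    """Return the labeling XML, preferring an external file over the inline default."""
    if label_config_file:
        config = Path(label_config_file).read_text(encoding="utf-8")
    else:
        config = default
    config = config.strip()
    # Fail early on malformed XML instead of at project-creation time.
    ET.fromstring(config)
    return config
```

Keeping the default inline means the op works out of the box, while the file override lets annotation teams edit the layout without touching code.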

Collaborator

I recommend binding this part to the XxAnnotationMapper, so that it can be adapted automatically in an "MCP" style (e.g., agents that learn to route and modify the label_cfg given the HumanOP docstring).
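
One possible reading of this suggestion, sketched under assumptions: if the label config lives on the mapper class next to its docstring, a tool-style descriptor can be generated for an agent to inspect. The classes and the `describe` method below are hypothetical stand-ins and do not reproduce the PR's base class or the reviewer's intended design.

```python
import inspect
from typing import Any, Dict


class AnnotationMapperSketch:
    """Stand-in base class; the PR's real base class is not reproduced here."""

    DEFAULT_LABEL_CONFIG = ""

    @classmethod
    def describe(cls) -> Dict[str, Any]:
        """Expose the docstring and label config together as one descriptor."""
        return {
            "name": cls.__name__,
            "description": inspect.getdoc(cls) or "",
            "label_config": cls.DEFAULT_LABEL_CONFIG.strip(),
        }


class PreferenceMapperSketch(AnnotationMapperSketch):
    """Ask an annotator which of two answers is better."""

    DEFAULT_LABEL_CONFIG = '<View><Text name="prompt" value="$prompt"/></View>'
```

An agent could then read `PreferenceMapperSketch.describe()` to decide which op to route to, or to propose a modified label config for that op.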

Collaborator Author

Can you elaborate? A sample would be great

@cyruszhang
Collaborator Author

Getting the message below:
2025-03-14 12:32:51 | WARNING | data_juicer.core.tracer:63 - Datasets before and after op [human_preference_annotation_mapper] are all the same. Thus no comparison results would be generated.

The tracer only checks the text_key field, but this op adds data in other fields. What is the best resolution? Should we pack all the data into text_key as a JSON dump? @HYLcool @yxdyc

@yxdyc
Collaborator

yxdyc commented Mar 19, 2025

> getting below message: 2025-03-14 12:32:51 | WARNING | data_juicer.core.tracer:63 - Datasets before and after op [human_preference_annotation_mapper] are all the same. Thus no comparison results would be generated.
>
> tracer only checks text_key field. we are actually adding more data. what is the best resolution? should we pack all the data into text_key as json dump? @HYLcool @yxdyc

After discussion, we agreed to pack all the data into text_key as a JSON dump, and will make a quick PR to support this. @HYLcool
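
To make the agreed approach concrete, here is a minimal sketch of why a text-only comparison reports no change and how a JSON dump in text_key makes the added fields visible. It does not reproduce the actual tracer; `text_changed`, `pack_into_text_key`, the `"text"` key, and the sample fields are assumptions for illustration, and the follow-up PR may do this differently.

```python
import json
from typing import Any, Dict, List

TEXT_KEY = "text"  # assumed name of the text_key field


def text_changed(
    before: List[Dict[str, Any]],
    after: List[Dict[str, Any]],
    text_key: str = TEXT_KEY,
) -> bool:
    """Simplified stand-in for a comparison that only looks at text_key."""
    return [s.get(text_key) for s in before] != [s.get(text_key) for s in after]


def pack_into_text_key(sample: Dict[str, Any], text_key: str = TEXT_KEY) -> Dict[str, Any]:
    """Return a copy of the sample whose text_key holds a JSON dump of all fields."""
    packed = dict(sample)
    # sort_keys keeps the dump stable, so equal samples compare as equal text.
    packed[text_key] = json.dumps(sample, ensure_ascii=False, sort_keys=True)
    return packed


before = [{"text": "Which answer is better?"}]
after = [{"text": "Which answer is better?", "preference": "answer1"}]

# The annotation lives outside text_key, so a text-only check sees no change.
unchanged_by_text = not text_changed(before, after)

# After packing, the annotation is part of text_key and the change is visible.
packed_after = [pack_into_text_key(s) for s in after]
changed_after_packing = text_changed(before, packed_after)
```

The original fields can be recovered with `json.loads(sample["text"])`, at the cost of text_key no longer holding plain text.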

Collaborator

@yxdyc yxdyc left a comment

The current content LGTM. I'm trying to run and develop this feature locally, and to incorporate it into our new project related to humanOp_workflow & experience_manager.

2 participants