Feature Request - Conversation Import, Editing, and Export Capabilities for Dataset Creation #11

Open
furlat opened this issue Jul 24, 2023 · 2 comments
Labels: enhancement (New feature or request)

Comments

@furlat
Contributor

furlat commented Jul 24, 2023

Feature Request - Conversation Import, Editing, and Export Capabilities for Dataset Creation

Issue Description:

I am proposing a new feature that lets users load and edit their conversations with GPT and export them as a structured dataset. This would help users who have a large amount of interaction data (~10M tokens) and want to organize it into a dataset for future training, potentially working toward a 1B-token dataset.

The feature should ideally enable:

  1. Conversation Import and Structuring: Ability to load all user-GPT conversations and organize them in a structured format, like a thread, where each frame corresponds to one conversation.

  2. Message Editing: Provide functionality to edit or delete individual messages within a conversation, either by modifying their content or by removing certain interactions entirely.

  3. Text Masking through Regex/Substitution: Allow the use of regular expressions or text substitution to mask sensitive information, such as the user's name or any other identifiable information. This would be particularly useful for users who want to anonymize their conversations before exporting or sharing.

  4. Text Tagging: Offer an option to select a span of text within a message and assign it specific tags. This could be useful for further classification and analysis of the conversation data.
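A minimal Python sketch of how items 1 to 4 could fit together, assuming a simple in-memory model. The `Message` and `Conversation` classes, the `(start, end, label)` tag layout, and the JSON Lines export in `to_jsonl` are all hypothetical and are not part of the project's existing BaseThread or frame API.

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    role: str  # e.g. "user" or "assistant"
    content: str
    # Hypothetical tag layout: (start, end, label) character spans into `content`.
    tags: list[tuple[int, int, str]] = field(default_factory=list)


@dataclass
class Conversation:
    """One imported conversation (item 1: one frame per conversation)."""
    title: str
    messages: list[Message] = field(default_factory=list)

    def edit(self, index: int, new_content: str) -> None:
        """Item 2: replace a message's text; its tags are dropped since offsets may shift."""
        msg = self.messages[index]
        msg.content = new_content
        msg.tags.clear()

    def remove(self, index: int) -> None:
        """Item 2: delete an interaction entirely."""
        del self.messages[index]

    def mask(self, pattern: str, replacement: str) -> None:
        """Item 3: apply a regex substitution to every message."""
        regex = re.compile(pattern)
        for msg in self.messages:
            masked = regex.sub(replacement, msg.content)
            if masked != msg.content:
                msg.content = masked
                msg.tags.clear()  # old span offsets are no longer reliable

    def tag(self, index: int, start: int, end: int, label: str) -> None:
        """Item 4: attach a label to content[start:end] of one message."""
        msg = self.messages[index]
        if not 0 <= start < end <= len(msg.content):
            raise ValueError(f"invalid span {start}:{end} for message {index}")
        msg.tags.append((start, end, label))


def to_jsonl(conversations: list[Conversation]) -> str:
    """Export: one JSON object per conversation, one conversation per line."""
    return "\n".join(
        json.dumps(
            {"title": c.title, "messages": [asdict(m) for m in c.messages]},
            ensure_ascii=False,
        )
        for c in conversations
    )
```

Because masking shifts character offsets, this sketch clears a message's tags whenever editing or masking changes it, so masking should run before tagging.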

These capabilities would make it significantly easier to extract meaningful information from the conversations, anonymize the data, and transform it into a structured format, ready for any subsequent data analysis or machine learning task.

This is a proposal for an enhancement to our current system, and any thoughts, suggestions or improvements are highly welcome.

furlat added the enhancement (New feature or request) label on Jul 24, 2023
@joeleonard212
Contributor

joeleonard212 commented Jul 24, 2023

  1. I'm going to use the new function in BaseThread for this. The user pastes the ChatGPT link, and a new conversation tab opens in the UI with the same history. This means you can't load all the conversations at once; you have to provide the link for each conversation, one by one.

  2. no problem

  3. no problem

  4. I can do that, but I'm not sure whether you want it directly in the chat or in a memoryframe. Also, how should we visualize the frame right now?

@furlat
Contributor Author

furlat commented Jul 24, 2023

  1. OK to start with, but I would prefer to be able to load all the conversations at once from the backup zip, which I would drag and drop.
  2. For now, a visualization similar to a single column of the text/pytorch dual-column repo would be nice. To be fair, I don't know the best UI for tagging yet; we can store the tags in the frame itself.
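A sketch of loading every conversation from the backup zip in one go, assuming the ChatGPT data export layout: a `conversations.json` file holding a list of conversations, each with a `title`, a `mapping` of message nodes linked by `parent`/`children`, and a `current_node` pointing at the last message of the active branch. That layout is an assumption about the export format rather than something this project defines, and `load_export`/`flatten` are hypothetical names.

```python
import io
import json
import zipfile


def load_export(data: bytes) -> list[dict]:
    """Read conversations.json from the dropped zip (raw bytes) and flatten each conversation."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Assumption: the export contains a file ending in "conversations.json".
        names = [n for n in archive.namelist() if n.endswith("conversations.json")]
        if not names:
            raise FileNotFoundError("conversations.json not found in archive")
        raw = json.loads(archive.read(names[0]))
    return [flatten(conv) for conv in raw]


def flatten(conv: dict) -> dict:
    """Walk from the active leaf back to the root, then reverse into chronological order."""
    mapping = conv.get("mapping", {})
    # Fallback if current_node is absent: use the last node that has no children.
    leaves = [key for key, node in mapping.items() if not node.get("children")]
    node_id = conv.get("current_node") or (leaves[-1] if leaves else None)
    messages = []
    while node_id is not None:
        node = mapping[node_id]
        msg = node.get("message")
        if msg:
            parts = (msg.get("content") or {}).get("parts") or []
            # Keep only plain-text parts; skip non-string (e.g. attachment) parts.
            text = "".join(p for p in parts if isinstance(p, str))
            if text.strip():
                messages.append({"role": msg["author"]["role"], "content": text})
        node_id = node.get("parent")
    messages.reverse()
    return {"title": conv.get("title", ""), "messages": messages}
```

Walking up from `current_node` keeps only the branch shown in the ChatGPT UI and skips regenerated alternatives; the resulting dicts could then be converted into whatever frame structure the project settles on.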
