Feature Request - Conversation Import, Editing, and Export Capabilities for Dataset Creation #11

furlat · 2023-07-24T19:24:21Z


Issue Description:

I am proposing a new feature that lets users load and edit their conversations with GPT and export them as a structured dataset. This would be immensely helpful for users who have a large amount of interaction data (~10M tokens) and want to organize it into a dataset for future training, potentially growing it toward a 1B-token dataset.

The feature should ideally enable:

Conversation Import and Structuring: Load all user-GPT conversations and organize them in a structured format, such as a thread, where each frame corresponds to one conversation.
Message Editing: Edit or delete individual messages within a conversation. This includes modifying the content or removing certain interactions entirely.
Text Masking through Regex/Substitution: Use regular expressions or text substitution to mask sensitive information, such as the user's name or any other identifying details. This would be particularly useful for users who want to anonymize their conversations before exporting or sharing them.
Text Tagging: Select a span of text within a message and assign it one or more tags. This could be useful for further classification and analysis of the conversation data.
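
To make the masking item concrete, here is a minimal sketch of regex-based substitution over a list of messages. The rule list, the placeholder tokens (`<EMAIL>`, `<USER>`), the example name, and the `role`/`content` message layout are all assumptions for illustration, not part of the existing codebase.

```python
import re

# Hypothetical masking rules: (compiled pattern, replacement) pairs.
# Rules are applied in order, so put the most specific patterns first.
MASK_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\bAlice Example\b", re.IGNORECASE), "<USER>"),
]


def mask_text(text, rules=MASK_RULES):
    """Apply each substitution rule in order and return the masked text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def mask_conversation(messages, rules=MASK_RULES):
    """Return new message dicts with the 'content' field masked.

    Assumes each message is a dict with at least a 'content' key;
    the input list and its dicts are left unmodified.
    """
    return [{**m, "content": mask_text(m["content"], rules)} for m in messages]
```

Returning copies rather than editing in place would keep the raw import available, so masking rules could be adjusted and re-run before export.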

These capabilities would make it significantly easier to extract meaningful information from the conversations, anonymize the data, and transform it into a structured format, ready for any subsequent data analysis or machine learning task.
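
As a rough sketch of what the tagged, structured output could look like, the snippet below stores tags as character spans on each message and serializes conversations as JSON Lines. The `Tag` and `Message` classes, the field names, and the choice of JSONL are assumptions for illustration; the real frame classes may look quite different.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Tag:
    start: int  # inclusive character offset into the message content
    end: int    # exclusive character offset
    label: str


@dataclass
class Message:
    role: str
    content: str
    tags: list = field(default_factory=list)

    def tag(self, start, end, label):
        """Attach a label to content[start:end], rejecting invalid spans."""
        if not 0 <= start < end <= len(self.content):
            raise ValueError("tag span outside message content")
        self.tags.append(Tag(start, end, label))


def to_jsonl(conversations):
    """Serialize conversations (lists of Message) as one JSON object per line."""
    lines = []
    for conv_id, messages in enumerate(conversations):
        record = {
            "conversation_id": conv_id,
            "messages": [asdict(m) for m in messages],
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Storing offsets instead of copying the tagged text keeps the record compact, but it means tags would need to be recomputed if a message is edited or masked after tagging.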

This is a proposal for an enhancement to our current system, and any thoughts, suggestions or improvements are highly welcome.

joeleonard212 · 2023-07-24T20:07:22Z

Conversation import: I'm going to use the new function in BaseThread for this. The user inserts the ChatGPT link and it opens a new conversation tab in the UI with the same history. So you can't load all the conversations at once unless you provide the link for each conversation, one by one.
Message editing: no problem.
Text masking: no problem.
Text tagging: I can do that, but I'm not sure whether you want to do it directly in the chat or in a memoryframe. How should we visualize the frame right now, though?

furlat · 2023-07-24T20:15:43Z

That's OK to start with, but I would prefer to load all the conversations from the backup zip, which I would drag and drop.
For now, a visualization similar to a single column of the text/pytorch dual column repo would be nice. To be fair, I don't know what the best UI for tagging would be; we can store the tags in the frame itself.
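
A minimal sketch of the bulk import from a backup zip, assuming the archive contains a single JSON file holding a list of conversations. The member name `conversations.json`, the list layout, and the frame dict shape with a `tags` field are assumptions, not a confirmed description of the export format or of the existing frame classes.

```python
import json
import zipfile


def load_conversations(zip_path, member="conversations.json"):
    """Read a JSON list of conversations from one member of a backup archive.

    The default member name is an assumption about the export layout.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as fh:
            data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError(f"expected a list of conversations in {member}")
    return data


def to_frames(conversations):
    """Wrap each conversation in a hypothetical frame dict with its own tag store."""
    return [{"conversation": conv, "tags": []} for conv in conversations]
```

With something like this, a drag-and-drop handler would only need to pass the dropped file path to `load_conversations` and render one frame per conversation.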

furlat added the "enhancement" (New feature or request) label Jul 24, 2023

furlat assigned joeleonard212 Jul 24, 2023

furlat unassigned joeleonard212 Sep 15, 2023

joeleonard212 self-assigned this Sep 28, 2023
