Feature Request - Conversation Import, Editing, and Export Capabilities for Dataset Creation #11

@furlat

Issue Description:

I am proposing a new feature that would let users load and edit their conversations with GPT and then export them as a structured dataset. This would be especially helpful for users who have a large amount of interaction data (~10M tokens) and want to organize it into a dataset for future training, possibly working toward a 1B-token dataset.

The feature should ideally enable:

  1. Conversation Import and Structuring: Load all user-GPT conversations and organize them in a structured format, such as a thread in which each frame corresponds to one conversation.

  2. Message Editing: Edit or delete individual messages within a conversation, either by modifying their content or by removing certain interactions entirely.

  3. Text Masking through Regex/Substitution: Allow regular expressions or plain text substitution to mask sensitive information, such as the user's name or any other identifying details. This would be particularly useful for users who want to anonymize their conversations before exporting or sharing them.

  4. Text Tagging: Let users select a span of text within a message and assign it one or more tags, to support further classification and analysis of the conversation data.
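To make features 1 through 4 more concrete, here is a minimal Python sketch of one possible in-memory data model. Everything in it (`Conversation`, `Message`, `Tag`, `edit_message`, `remove_message`, `mask_conversation`, `tag_span`) is hypothetical and is not an existing API of this project; it assumes conversations have already been loaded as lists of role/content messages.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A labeled span [start, end) of character offsets within one message."""
    start: int
    end: int
    label: str


@dataclass
class Message:
    role: str  # e.g. "user" or "assistant"
    content: str
    tags: list[Tag] = field(default_factory=list)


@dataclass
class Conversation:
    """One imported conversation, i.e. one frame of the thread."""
    conversation_id: str
    messages: list[Message] = field(default_factory=list)


def edit_message(conv: Conversation, index: int, new_content: str) -> None:
    """Replace a message's text. Its tags are dropped because their offsets
    may no longer point at the intended text."""
    msg = conv.messages[index]
    msg.content = new_content
    msg.tags.clear()


def remove_message(conv: Conversation, index: int) -> None:
    """Delete a message (and its tags) from the conversation."""
    del conv.messages[index]


def mask_conversation(conv: Conversation, rules: list[tuple[str, str]]) -> None:
    """Apply (regex pattern, replacement) rules, in order, to every message."""
    compiled = [(re.compile(pattern), repl) for pattern, repl in rules]
    for msg in conv.messages:
        masked = msg.content
        for regex, repl in compiled:
            masked = regex.sub(repl, masked)
        if masked != msg.content:
            msg.content = masked
            msg.tags.clear()  # substitutions can shift tag offsets


def tag_span(msg: Message, start: int, end: int, label: str) -> Tag:
    """Attach a label to msg.content[start:end]."""
    if not 0 <= start < end <= len(msg.content):
        raise ValueError(
            f"invalid span [{start}, {end}) for message of length {len(msg.content)}"
        )
    tag = Tag(start, end, label)
    msg.tags.append(tag)
    return tag


if __name__ == "__main__":
    conv = Conversation("conv-001", [
        Message("user", "Hi, I'm Alice Smith, email alice@example.com"),
        Message("assistant", "Hello Alice Smith! How can I help?"),
        Message("user", "Never mind."),
    ])
    mask_conversation(conv, [
        (r"Alice Smith", "[NAME]"),
        # Illustrative email pattern only; not a complete email matcher.
        (r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]"),
    ])
    remove_message(conv, 2)
    tag_span(conv.messages[1], 0, 5, "greeting")
    for m in conv.messages:
        print(m.role, m.content, m.tags)
```

Dropping tags whenever a message's text changes is the simplest way to keep offsets valid; a fuller tool could instead remap tag offsets through each edit or substitution so that existing annotations survive masking.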

These capabilities would make it significantly easier to extract meaningful information from the conversations, anonymize the data, and transform it into a structured format, ready for any subsequent data analysis or machine learning task.
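For the export step, one option is JSON Lines (one JSON object per line), a format commonly used for chat-style training data. The sketch below is only an illustration: `to_record` and `export_jsonl` are hypothetical names, and the record layout (`id`, `messages`, `annotations`) is an assumption rather than a format this project defines. Each conversation is passed in as an ID plus a list of plain dicts with `role`, `content`, and optional `tags`.

```python
import json
from pathlib import Path
from typing import Iterable, Union


def to_record(conversation_id: str, messages: list[dict]) -> dict:
    """Build one dataset record (hypothetical layout).

    Each message is expected to be a dict with "role", "content", and an
    optional "tags" list of {"start", "end", "label"} dicts.
    """
    return {
        "id": conversation_id,
        # Plain role/content pairs, a common shape for chat datasets.
        "messages": [{"role": m["role"], "content": m["content"]} for m in messages],
        # Tags are flattened into span annotations that point back at a
        # message index and character offsets within that message.
        "annotations": [
            {"message": i, "start": t["start"], "end": t["end"], "label": t["label"]}
            for i, m in enumerate(messages)
            for t in m.get("tags", [])
        ],
    }


def export_jsonl(
    conversations: Iterable[tuple[str, list[dict]]],
    path: Union[str, Path],
) -> int:
    """Write one JSON object per line and return the number of records written.

    Conversations left with no messages (e.g. after every message was
    deleted) are skipped.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for conversation_id, messages in conversations:
            if not messages:
                continue
            record = to_record(conversation_id, messages)
            # ensure_ascii=False keeps non-English text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Keeping tags in a separate `annotations` list leaves the `messages` field in a plain role/content shape, while the span labels remain available for classification and analysis.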

This is a proposed enhancement to the current system; any thoughts, suggestions, or improvements are very welcome.

Labels: enhancement (New feature or request)