[Enhancement]: Local database in the project's folder #991

Open
Michal-Mikolas opened this issue Jun 3, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@Michal-Mikolas

Michal-Mikolas commented Jun 3, 2024

Version

Command-line (Python) version

Suggestion

Hello,
currently GPT-Pilot has some drawbacks due to its database design:

  1. There is no way for a user to share a project with other programmers so that they can also build new features into it using GPT-Pilot.
  2. There is no easy way to use a version control system (like Git) over the whole project (including GPT-Pilot's state). I can't use git branches, switch between them on the fly, and expect GPT-Pilot to work correctly with its current design.
  3. To iterate on an existing project, the user needs to remember or retrieve the project's ID. That's inconvenient and not user-friendly.

I can see one solution to all of the issues above:

  • There should be a separate GPT-Pilot database for every project. The database should be stored in the project's directory, for example in a .gpt-pilot sub-folder.
  • The database could be a JSON-type database; you could use a Python library like pysonDB for that.

This way I can see only pros for GPT-Pilot users:

  • Easy to use: When a user wants to iterate on an existing project, they would just specify the project's folder, like main.py projects/myproject, and that's it.
  • Version control system: The user can store several phases of the project using git, go back to them, and use branches. GPT-Pilot would become fully compatible with Git.
  • Transparency: It would be easy for the user to look at what's inside GPT-Pilot's database; users could even correct their previous mistakes in a prompt using nothing but a code editor. It would be easy to build tools on top of GPT-Pilot, e.g. to integrate manually created files into the pilot's database, modify the project plan, etc.
  • Cooperation: The user could share a GPT-Pilot project with colleagues using git, and they could build new features using GPT-Pilot as well and then send them back to the original repository. Because GPT-Pilot's database would be text-based (JSON files), git could easily manage changes and merge branches, even in the pilot's database.
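To make the proposal concrete, here is a minimal sketch of what a per-project, JSON-backed state file could look like. Everything in it is hypothetical: the state.json file name and the load_state/save_state helpers are illustrative assumptions, not part of GPT-Pilot, and it uses the standard json module rather than pysonDB.

```python
import json
from pathlib import Path

STATE_DIR_NAME = ".gpt-pilot"   # sub-folder name suggested in the proposal
STATE_FILE_NAME = "state.json"  # hypothetical file name, chosen for illustration


def state_path(project_dir):
    """Return where the per-project state file would live."""
    return Path(project_dir) / STATE_DIR_NAME / STATE_FILE_NAME


def load_state(project_dir):
    """Load the project's state, or an empty state for a new project."""
    path = state_path(project_dir)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(project_dir, state):
    """Write state as sorted, indented JSON so that git diffs stay readable."""
    path = state_path(project_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(state, indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")
```

Sorting keys and indenting the output is a deliberate choice here: it keeps the file stable between writes, which is what would make line-based diffs and merges in git plausible at all.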

Overall, I think separating the pilot's database would bring only good to its users. Every project should have all the info it needs in its own folder.

@Michal-Mikolas Michal-Mikolas added the enhancement New feature or request label Jun 3, 2024
@senko
Collaborator

senko commented Jun 4, 2024

Hey @Michal-Mikolas thanks for the thoughtful suggestions!

As you may know, we're using a file-based SQLite database in GPT Pilot. Besides not requiring the user to set up a database server such as PostgreSQL, it also has the convenience that you can just copy it as you would a regular file.

Database sharing

So you can already share your project with others just by sending them the file. In gpt-pilot 0.1.x this was a bit messy because we stored files' absolute paths on your disk and used the system-dependent path separator (i.e. \ on Windows, / elsewhere), which meant some patching was required to make it work (see the AUTOFIX_FILE_PATHS env option for 0.1.x). In the recently released 0.2.x version, we store file paths relative to the project root and always use / as the path separator, meaning you can literally just copy-paste the database and it'll work.
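As an illustration of the path scheme described above (relative to the project root, always with / separators), here is a minimal sketch using only the standard library. The to_portable helper is a hypothetical name and this is not GPT Pilot's actual code.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_portable(abs_path, project_root, windows=False):
    """Return abs_path relative to project_root, using / separators.

    Set windows=True when the inputs are Windows-style paths.
    Raises ValueError if abs_path is not inside project_root.
    """
    flavour = PureWindowsPath if windows else PurePosixPath
    relative = flavour(abs_path).relative_to(flavour(project_root))
    return relative.as_posix()
```

With this, a file stored at C:\Users\me\proj\src\app.py on Windows and at /home/me/proj/src/app.py on Linux both map to src/app.py, which is why a database holding such paths can be copied between machines.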

Direct editing

Although at first glance JSON might be nicer to edit with an editor, it wouldn't really work, due to the amount of data we store in the database (a lot!).

It would be really messy and not at all easy to edit. In fact, editing the DB with any database tool (and since SQLite is popular, there are many) is probably much easier than editing JSON manually in a text editor. Personally I use sqlitebrowser for a GUI and plain ol' sqlite3 on the command line.
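For a quick look inside the database from a script rather than a GUI, Python's built-in sqlite3 module works too. The sketch below only lists table names; it assumes nothing about GPT Pilot's schema, and list_tables is an illustrative helper name.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```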

Per-project database

With gpt-pilot 0.2 that's possible by having a per-project config file in the config.json format. For example, if your project is in /Users/Michal/Projects/workspace/my-project, you could add a my-project.json file to the workspace folder with these contents (I've omitted all entries not relevant to file/db locations):

{
  "db": {
    "url": "sqlite+aiosqlite:////Users/Michal/Projects/workspace/my-project.db"
  },
  "fs": {
    "type": "local",
    "workspace_root": "/Users/Michal/Projects/workspace"
  }
}

Then, you could run Pythagora like: python main.py -f /Users/Michal/Projects/workspace/my-project.json to use that database.

You'd still need to specify the project ID, and having to specify the database location all the time would be tedious, but you could work around that using a simple shell alias.
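As an alternative to a shell alias, the same workaround can be sketched as a small Python helper. It derives the db and fs entries shown above from a project folder and builds the python main.py -f invocation. The function names are hypothetical, only the two config entries from the example are produced (a real config file needs the omitted entries as well), and specifying the project ID is left out because the option for it isn't shown here.

```python
import sys
from pathlib import Path


def build_config(project_dir):
    """Build the db/fs entries for a project, mirroring the example above.

    The database file is placed next to the project folder, e.g.
    workspace/my-project -> workspace/my-project.db
    """
    project = Path(project_dir).resolve()
    workspace = project.parent
    db_file = workspace / (project.name + ".db")
    return {
        "db": {"url": "sqlite+aiosqlite:///" + db_file.as_posix()},
        "fs": {"type": "local", "workspace_root": workspace.as_posix()},
    }


def launch_command(config_path):
    """Return the argv for running main.py with the given config file."""
    return [sys.executable, "main.py", "-f", str(config_path)]
```

The resulting list can be passed to subprocess.run from the gpt-pilot checkout once the config has been written to disk with json.dump.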

Git

Could you explain more how you'd use Git for Pythagora projects in general? When (at which points) would you make commits, how would collaboration work (if sharing with others)?

In my mind, I'd use it only for the project files (e.g. everything in workspace/my-project in the above example), not the database itself. I don't think merging the databases (e.g. between multiple collaborators) would work in practice: there'd be conflicts that would be hard to resolve manually (think of mismatched curly braces a few thousand lines apart).

Branching

In 0.2 we've prepared the groundwork for project branching. Currently we only ever use the "main" branch and you can't create others, but we do want to support having multiple branches and switching easily between them. It's still on our TODO list for now, tho.

Tooling

In 0.2 we've made it possible to import the core as a package. This makes it possible to write scripts that do something extra on top of what gpt-pilot already does. We're not providing any API guarantees (we treat all of it as internal API) but realistically don't plan to change it a lot, especially the database models and low-level utilities.

For example, you could make a script that would tweak how configuration is set and which project is loaded to make per-project database/config simpler than the manual workarounds I mentioned earlier.

@Michal-Mikolas
Author

Wow, this sounds very nice. I've tried several AI code assistants and GPT-Pilot is by far the best one. I'm glad it's heading in the right direction, from what you've written :-)

Could you explain more how you'd use Git for Pythagora projects in general?

Basically feature branches. Let's say I am working on some feature in a feature1 branch. My colleague is working on another feature in a feature2 branch. After the features are done and we have code-reviewed each other's branches, we want to merge the work into the main branch. At this point I am not sure whether I'm causing some confusion for GPT-Pilot, because merging branches can add or change a lot of files in the project, but these changes are not reflected in GPT-Pilot's database. So I'm afraid GPT-Pilot could break, revert some files, bring back bugs that were already solved earlier, etc.
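To illustrate the kind of mismatch I mean, here is a rough sketch that compares a recorded snapshot of the project files with what is on disk after a merge. It is purely hypothetical: I'm assuming a tool could keep a mapping from relative path to content hash, and I don't know how GPT-Pilot actually tracks files internally.

```python
import hashlib
from pathlib import Path


def snapshot(project_dir):
    """Map each file's relative path (with / separators) to its SHA-256 hash."""
    root = Path(project_dir)
    result = {}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and anything inside the .git folder.
        if path.is_file() and ".git" not in relative.parts:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[relative.as_posix()] = digest
    return result


def drift(recorded, current):
    """Report files added, removed, or changed between two snapshots."""
    common = set(recorded) & set(current)
    return {
        "added": sorted(set(current) - set(recorded)),
        "removed": sorted(set(recorded) - set(current)),
        "changed": sorted(p for p in common if recorded[p] != current[p]),
    }
```

Taking a snapshot before the merge and another one after it would at least show which files changed behind the tool's back.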


2 participants