pre merge test #1

Open
wants to merge 46 commits into
base: nlu
46 commits
0abb194
add paper reference (#273)
moutozf Jun 24, 2024
91fe094
add paper reference to head (#274)
moutozf Jun 28, 2024
e5b92a9
add missing pyproject.toml
SonglinLyu Jul 15, 2024
a48fd2b
create dbgpt-hub-graph
SonglinLyu Jul 23, 2024
c392a09
add a prototype of query similarity evaluator
SonglinLyu Aug 13, 2024
1db6ffb
a demo for grammar parser generated from .g4 file
SonglinLyu Aug 14, 2024
bbd46eb
add lcypher and gql eavluator
SonglinLyu Aug 16, 2024
c4e5444
remove unnecessary data file
SonglinLyu Aug 16, 2024
e2028ab
force commit all current changes
SonglinLyu Aug 16, 2024
2c096d0
delete data preparation related folder
SonglinLyu Aug 16, 2024
87b15b0
remove useless dataset
SonglinLyu Aug 16, 2024
e332786
add tugraph-analytics dataset
SonglinLyu Aug 16, 2024
15e74ca
rename dbgpt-hub-graph to dbgpt-hub-gql
SonglinLyu Aug 16, 2024
e52e1b2
rename dbgpt-hub-graph to dbgpt-hub-gql
SonglinLyu Aug 16, 2024
c56e7e8
remove eval_data folder
SonglinLyu Aug 16, 2024
310cb18
remove unnecessary log file
SonglinLyu Aug 16, 2024
ebf9318
ignore wandb folder
SonglinLyu Aug 16, 2024
9e8337d
add README.md
SonglinLyu Aug 16, 2024
c41e61e
add table to README
SonglinLyu Aug 16, 2024
b63a504
rename sql to gql
SonglinLyu Aug 19, 2024
40c0fd2
remove unused data_process module
SonglinLyu Aug 19, 2024
0dee13c
remove baseline
SonglinLyu Aug 19, 2024
4bbc197
correct dataset path
SonglinLyu Aug 19, 2024
8258e92
use prettytable to format evaluation output
SonglinLyu Aug 19, 2024
0e03386
add detail log for evaluation
SonglinLyu Aug 19, 2024
9c3e508
remove sql
SonglinLyu Aug 20, 2024
e78e775
remove ouputs that are not query from dataset
SonglinLyu Aug 20, 2024
2c3cf0e
change tugraph-db to tugraph-db-example, this folder only contains ab…
SonglinLyu Aug 20, 2024
99edb59
remove tugraph-analytics folder
SonglinLyu Aug 20, 2024
1792b45
add tugraph-db-example, a mini dataset
SonglinLyu Aug 20, 2024
ccc4f58
update readme, include tugraph-analytics dataset download method
SonglinLyu Aug 20, 2024
c15e240
delete unneeded notation and print
SonglinLyu Aug 20, 2024
7a41f06
reformate with black
SonglinLyu Aug 20, 2024
bc307b0
update readme
SonglinLyu Aug 20, 2024
973efe8
update readme
SonglinLyu Aug 20, 2024
3e2168c
remove temporary change for hub_sql
SonglinLyu Aug 21, 2024
715fe9d
update introduction
SonglinLyu Aug 21, 2024
6d60913
remove temporary change for hub_sql
SonglinLyu Aug 21, 2024
9c60eb3
update readme
SonglinLyu Aug 21, 2024
4b82cbf
update baseline test result
SonglinLyu Aug 26, 2024
be20757
add link to tugraph-analytics parser
SonglinLyu Aug 26, 2024
5906fcd
fix: readme add text2nlu & gql
csunny Aug 26, 2024
c29f447
fix readme comments
SonglinLyu Aug 26, 2024
906bd68
Merge branch 'text2gql_lsl' of https://github.com/SonglinLyu/DB-GPT-H…
SonglinLyu Aug 26, 2024
5c01d59
update readme(dataset link, baseline result)
SonglinLyu Aug 26, 2024
7adfdd7
Merge branch 'main' into text2gql_lsl
zhanghy-sketchzh Aug 27, 2024
17 changes: 17 additions & 0 deletions .gitignore
@@ -14,19 +14,36 @@ data/eval
output_pred/
wandb/
src/dbgpt-hub-sql/dbgpt_hub_sql/data/*
src/dbgpt-hub-gql/dbgpt_hub_gql/data/*
src/dbgpt-hub-sql/codellama/*
src/dbgpt-hub-gql/codellama/*
src/dbgpt-hub-sql/wandb/*
src/dbgpt-hub-gql/wandb/*
# But track the data/eval_data folder itself
!src/dbgpt-hub-sql/dbgpt_hub_sql/data/eval_data/
!src/dbgpt-hub-sql/dbgpt_hub_sql/data/dataset_info.json
!src/dbgpt-hub-sql/dbgpt_hub_sql/data/example_text2sql.json
!src/dbgpt-hub-gql/dbgpt_hub_gql/data/tugraph-db-example
!src/dbgpt-hub-gql/dbgpt_hub_gql/data/dataset_info.json
!src/dbgpt-hub-gql/dbgpt_hub_gql/data/example_text2sql.json

# Ignore everything under dbgpt_hub_sql/output/ except the adapter directory
src/dbgpt-hub-sql/dbgpt_hub_sql/output/
src/dbgpt-hub-sql/dbgpt_hub_sql/output/adapter/*
!src/dbgpt-hub-sql/dbgpt_hub_sql/output/adapter/.gitkeep
src/dbgpt-hub-sql/dbgpt_hub_sql/output/logs/*
!src/dbgpt-hub-sql/dbgpt_hub_sql/output/logs/.gitkeep
src/dbgpt-hub-sql/dbgpt_hub_sql/output/pred/*
!src/dbgpt-hub-sql/dbgpt_hub_sql/output/pred/.gitkeep

src/dbgpt-hub-gql/dbgpt_hub_gql/output/
src/dbgpt-hub-gql/dbgpt_hub_gql/output/adapter/*
!src/dbgpt-hub-gql/dbgpt_hub_gql/output/adapter/.gitkeep
src/dbgpt-hub-gql/dbgpt_hub_gql/output/logs/*
!src/dbgpt-hub-gql/dbgpt_hub_gql/output/logs/.gitkeep
src/dbgpt-hub-gql/dbgpt_hub_gql/output/pred/*
!src/dbgpt-hub-gql/dbgpt_hub_gql/output/pred/.gitkeep

# Ignore NLU output
src/dbgpt-hub-nlu/output
src/dbgpt-hub-nlu/data
22 changes: 16 additions & 6 deletions README.md
@@ -24,9 +24,17 @@
</p>



[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
[**Text2SQL**](README.zh.md) | [**Text2GQL**](src/dbgpt-hub-gql/README.zh.md) | [**Text2NLU**](src/dbgpt-hub-nlu/README.zh.md)

</div>


## 🔥🔥🔥 News
- Support [Text2NLU](src/dbgpt-hub-nlu/README.zh.md) fine-tuning to improve semantic understanding accuracy.
- Support [Text2GQL](src/dbgpt-hub-gql/README.zh.md) fine-tuning to generate graph queries.

## Baseline
- update time: 2023/12/08
- metric: execution accuracy (ex)
@@ -675,14 +683,16 @@ Our work is primarily based on the foundation of numerous open-source contributi
Thanks to all the contributors, especially @[JBoRu](https://github.com/JBoRu), who raised the [issue](https://github.com/eosphoros-ai/DB-GPT-Hub/issues/119) that prompted us to add a promising new evaluation method, Test Suite. As the paper 《SQL-PALM: IMPROVED LARGE LANGUAGE MODEL ADAPTATION FOR TEXT-TO-SQL》 states, "We consider two commonly-used evaluation metrics: execution accuracy (EX) and test-suite accuracy (TS). EX measures whether the SQL execution outcome matches ground truth (GT), whereas TS measures whether the SQL passes all EX evaluations for multiple tests, generated by database augmentation. Since EX contains false positives, we consider TS as a more reliable evaluation metric".
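
The difference between EX and TS described in the quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's evaluator: the `singer` table, the two toy databases, and the helper names `run_query`, `execution_match`, and `suite_match` are all hypothetical, and SQLite's in-memory mode stands in for the real benchmark databases.

```python
import sqlite3
from collections import Counter


def run_query(rows, sql):
    """Run `sql` on a fresh in-memory database that holds `rows` (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
        # Compare results as multisets so that row order does not matter.
        return Counter(conn.execute(sql).fetchall())
    finally:
        conn.close()


def execution_match(rows, pred_sql, gold_sql):
    """EX on one database: both queries must return the same multiset of rows."""
    try:
        predicted = run_query(rows, pred_sql)
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    return predicted == run_query(rows, gold_sql)


def suite_match(databases, pred_sql, gold_sql):
    """TS: the prediction must pass EX on every database in the suite."""
    return all(execution_match(rows, pred_sql, gold_sql) for rows in databases)


gold = "SELECT name FROM singer WHERE age > 30"
pred = "SELECT name FROM singer WHERE age >= 30"  # subtly wrong boundary

original_db = [("Ann", 25), ("Bob", 41)]   # no singer is exactly 30
augmented_db = [("Cat", 30), ("Dan", 52)]  # augmentation adds the boundary case

ex_result = execution_match(original_db, pred, gold)
ts_result = suite_match([original_db, augmented_db], pred, gold)
print(ex_result, ts_result)  # True False
```

On the original database the wrong predicate happens to return the same rows as the gold query, so EX reports a false positive. The augmented database contains a row at the boundary and exposes the error, which is why TS is treated as the stricter metric.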

## 7. Citation
-Please consider citing our project if you find it useful:
+If you find `DB-GPT-Hub` useful for your research or development, please cite the following <a href="https://arxiv.org/abs/2406.11434" target="_blank">paper</a>:

```bibtex
-@software{db-gpt-hub,
-author = {DB-GPT-Hub Team},
-title = {{DB-GPT-Hub}},
-url = {https://github.com/eosphoros-ai/DB-GPT-Hub},
-year = {2023}
+@misc{zhou2024dbgpthub,
+title={DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models},
+author={Fan Zhou and Siqiao Xue and Danrui Qi and Wenhui Shi and Wang Zhao and Ganglin Wei and Hongyang Zhang and Caigai Jiang and Gangwei Jiang and Zhixuan Chu and Faqiang Chen},
+year={2024},
+eprint={2406.11434},
+archivePrefix={arXiv},
+primaryClass={cs.DB}
}
```

20 changes: 14 additions & 6 deletions README.zh.md
@@ -23,10 +23,16 @@
</p>



[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
[**Text2SQL**](README.zh.md) | [**Text2GQL**](src/dbgpt-hub-gql/README.zh.md) | [**Text2NLU**](src/dbgpt-hub-nlu/README.zh.md)
</div>


## 🔥🔥🔥 News
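- Support [Text2NLU](src/dbgpt-hub-nlu/README.zh.md) fine-tuning to improve natural language understanding accuracy.
- Support [Text2GQL](src/dbgpt-hub-gql/README.zh.md) fine-tuning to generate graph query statements from natural language.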

## Baseline
- Update date: 2023/12/08
- Metric: execution accuracy (ex)
@@ -648,14 +654,16 @@ poetry run python dbgpt_hub_sql/eval/evaluation.py --plug_value --input Your_mo
**20231104**: Special thanks to @[JBoRu](https://github.com/JBoRu) for the [issue](https://github.com/eosphoros-ai/DB-GPT-Hub/issues/119) that pointed out the shortcomings of our previous evaluation approach, which used the 95M database from the official website. As the paper 《SQL-PALM: IMPROVED LARGE LANGUAGE MODEL ADAPTATION FOR TEXT-TO-SQL》 notes, "We consider two commonly-used evaluation metrics: execution accuracy (EX) and test-suite accuracy (TS) [32]. EX measures whether SQL execution outcome matches ground truth (GT), whereas TS measures whether the SQL passes all EX evaluation for multiple tests, generated by database-augmentation. Since EX contains false positives, we consider TS as a more reliable evaluation metric".

## 7. Citation
-If you find our project helpful for your research or production work, please consider citing `DB-GPT-Hub` in your references:
+If you find `DB-GPT-Hub` useful for your research or development, please cite the following <a href="https://arxiv.org/abs/2406.11434" target="_blank">paper</a>:

```bibtex
-@software{db-gpt-hub,
-author = {DB-GPT-Hub Team},
-title = {{DB-GPT-Hub}},
-url = {https://github.com/eosphoros-ai/DB-GPT-Hub},
-year = {2023}
+@misc{zhou2024dbgpthub,
+title={DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models},
+author={Fan Zhou and Siqiao Xue and Danrui Qi and Wenhui Shi and Wang Zhao and Ganglin Wei and Hongyang Zhang and Caigai Jiang and Gangwei Jiang and Zhixuan Chu and Faqiang Chen},
+year={2024},
+eprint={2406.11434},
+archivePrefix={arXiv},
+primaryClass={cs.DB}
}
```
