
Domain-Specific Large Model Benchmarking Based on KubeEdge-Ianvs #95

Labels: kind/feature
MooreZheng commented May 7, 2024

What would you like to be added/modified:
Based on existing datasets, this issue aims to build a benchmark for domain-specific large models on KubeEdge-Ianvs, helping Edge AI application developers validate and select the best-matched domain-specific large models. This issue includes:

  1. Benchmark Dataset Map: a mapping document, e.g., a table, listing test datasets and their download methods for various specific domains.
  2. Large Model Interfaces: integrate open-source benchmarking projects such as OpenCompass, and provide model API addresses and keys for online large-model invocation.
  3. Domain-Specific Large Model Benchmark: focus on NLP or multimodal tasks; construct a suite for the government sector, including test datasets, evaluation metrics, testing environments, and usage guidelines.
  4. (Advanced) Industrial/Medical Large Model Benchmark: include metrics and examples.
  5. (Advanced) Efficient Evaluation: enable concurrent execution of tasks with automatic request dispatch and result collection.
  6. (Advanced) Task Execution and Monitoring: visualize the large-model invocation process.
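To make points 2 and 3 above concrete, here is a minimal sketch of what "invoke a model via an API and score it against a domain test set" could look like. This is an illustrative assumption, not Ianvs or OpenCompass code: the `invoke_model` callable stands in for a real API client (e.g., an OpenAI-compatible endpoint reached with a base URL and key), and exact-match accuracy is just one example metric a domain suite might define.

```python
from typing import Callable, Dict, List


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions that exactly match their reference answer."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def run_benchmark(invoke_model: Callable[[str], str],
                  dataset: List[Dict[str, str]]) -> float:
    """Send each prompt to the model and score the collected answers."""
    predictions = [invoke_model(sample["prompt"]) for sample in dataset]
    references = [sample["answer"] for sample in dataset]
    return exact_match_accuracy(predictions, references)


if __name__ == "__main__":
    # Tiny stand-in dataset; a real suite would load a domain test set
    # from the Benchmark Dataset Map.
    toy_dataset = [
        {"prompt": "Capital of France?", "answer": "Paris"},
        {"prompt": "2 + 2 = ?", "answer": "4"},
    ]
    # Stub model client instead of a live HTTP call.
    stub_model = lambda p: {"Capital of France?": "Paris",
                            "2 + 2 = ?": "5"}.get(p, "")
    print(run_benchmark(stub_model, toy_dataset))  # 0.5
```

The same shape extends naturally to the advanced items: concurrent evaluation wraps the per-sample calls in a worker pool, and monitoring hooks into the loop that dispatches requests and collects results.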

Why is this needed:
As large models enter the era of scaled applications, the cloud already provides infrastructure and services for them. Customers have further raised targeted requirements on the edge side, including personalization, data compliance, and real-time capability, making AI services with cloud-edge collaboration a major trend. However, two major challenges remain across product definition, service quality, service qualification, and industry influence: general competitiveness and customer trust. The crux is that current large-model benchmarking focuses on assessing general basic capabilities and fails to drive large-model applications from an industry- or domain-specific perspective.

This issue aims to reflect the real value of large models through industry applications, from the perspectives of domain-specific large models and cloud-edge collaborative AI, using industry benchmarks to drive the incubation of large-model applications. Based on the collaborative AI benchmark suite KubeEdge-Ianvs, it supplements the large-model testing tool interface, provides matching test datasets, and constructs large-model test suites for specific domains, e.g., the government sector.

Recommended Skills:
KubeEdge-Ianvs, Python, LLMs

Useful links:
Introduction to Ianvs
Quick Start
How to test algorithms with Ianvs
Testing incremental learning in industrial defect detection
Benchmarking for embodied AI
KubeEdge-Ianvs
Example LLMs Benchmark List
Ianvs v0.1 documentation
(China) National standard plan "Artificial Intelligence Pretrained Models, Part 2: Evaluation Metrics and Methods" (《人工智能 预训练模型 第2部分:评测指标与方法》), and standardization documents for government, industrial, and other large models

MooreZheng changed the title from "Domain-specific Large Model Benchmarking Based on KubeEdge-Ianvs" to "Domain-Specific Large Model Benchmarking Based on KubeEdge-Ianvs" on May 7, 2024

MooreZheng commented May 9, 2024

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.

IcyFeather233 commented:

I have a question about the Benchmark Dataset Map: which domains should this dataset cover? Is it for all domains, or just industrial and government sectors?
Also, if I need to submit a preliminary version, where would be the most appropriate directory to submit it?

MooreZheng commented:

> I have a question about the Benchmark Dataset Map: which domains should this dataset cover? Is it for all domains, or just industrial and government sectors? Also, if I need to submit a preliminary version, where would be the most appropriate directory to submit it?

  1. For this issue, preferred domains are those where LLMs are currently making a significant impact, e.g., government affairs, industry, and medicine.
  2. It depends on what the submitted version includes. To start with, a proposal would be preferred.

IcyFeather233 commented:

/assign
