Commit c678930

Committed Mar 27, 2024
Add more test cases for large model capabilities
1 parent 38d574a commit c678930

5 files changed (+137, -0 lines)


docs/use_cases/coding/case2.md (+6)

@@ -56,12 +56,18 @@ public class Credential {
 ### Minimax/abab5.5-chat(7)
 Fully implemented the requirements.

+### Lingyi/yi-34b-chat-0205(7)
+Fully implemented the requirements.
+
 ### Tencent/ChatPro(6)
 Ignored @Data and wrote its own getter and setter methods.

 ### Tencent/ChatStd(6)
 Ignored @Data and wrote its own getter and setter methods.

+### Lingyi/yi-vl-plus (6)
+Changed the type of updatedAt on its own initiative.
+
 ### OpenAI/gpt-3.5-turbo(5)
 Ignored @Data and generated an all-arguments constructor, but left out the no-argument constructor.

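docs/use_cases/coding/case4.md (+6)

@@ -53,11 +53,17 @@
 ### Minimax/abab5.5-chat (5)
 Incomplete implementation.

+### Lingyi/yi-vl-plus (5)
+Incomplete implementation.
+
 ### Ali/qwen-turbo(4)
 The commands are incomplete, and it also wrote a bunch of strange commands.

 ### Xunfei/Spark3.1(4)
 Incomplete implementation, and it contains errors.

+### Lingyi/yi-vl-plus (4)
+Incomplete implementation, and it contains errors.
+
 ### Summary
 1. Note: Responses from large language models are based on probabilistic prediction, so they are not guaranteed to be the same every time. Only extensive test cases and repeated testing can give a more accurate assessment of a model's performance.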

docs/use_cases/instruction/case1.md (+4)

@@ -37,6 +37,8 @@

 ### Ali/qwen-plus (7)

+### Lingyi/yi-34b-chat-0205 (7)
+
 ### Baidu/ERNIE-Bot-turbo (6.6)

 ### Zhipu/chatGLM_turbo (6)
@@ -45,6 +47,8 @@

 ### Xunfei/Spark3.1 (6)

+### Lingyi/yi-vl-plus (6)
+

 ### Summary
 1. Note: Responses from large language models are based on probabilistic prediction, so they are not guaranteed to be the same every time. Only extensive test cases and repeated testing can give a more accurate assessment of a model's performance.

New file (+59)

@@ -0,0 +1,59 @@
## Requirement

Split this passage into chunks. A chunk should be slightly coarser-grained than a fixed collocation; separate chunks with /, and note that each sentence should contain several chunks:

To start, let's say you have a single server which is accepting and fulfilling requests for your application. But now, we're getting more users, so we need to scale it. The easiest thing to do would be to add more resources, like RAM, or maybe upgrade your CPU. This is known as vertical scaling. It's pretty easy, but it's very limited. A better approach would be to add replicas so that each server can handle a subset of requests. This is known as horizontal scaling. It's more powerful because we can almost scale infinitely, and we don't even need good machines. It also adds redundancy and fault tolerance because if one of our servers goes down, all of the other servers can continue to fulfill requests. This eliminates our previous single point of failure. But the downside is that this approach is much more complicated.

## Results

### Moonshot/moonshot-v1-8k (8.5)
To start/ let's say you have a single server/ which is accepting and fulfilling requests/ for your application/ But now/ we're getting more users/ so we need to scale it/ The easiest thing to do/ would be to add more resources/ like RAM/ or maybe upgrade your CPU/ This is known as vertical scaling/ It's pretty easy/ but it's very limited/ A better approach/ would be to add replicas/ so that each server can handle a subset of requests/ This is known as horizontal scaling/ It's more powerful/ because we can almost scale infinitely/ and we don't even need good machines/ It also adds redundancy and fault tolerance/ because if one of our servers goes down/ all of the other servers can continue to fulfill requests/ This eliminates our previous single point of failure/ But the downside/ is that this approach is much more complicated.

### Ali/qwen-plus (7)
To start, / let's say you have a single server / which is accepting and fulfilling requests / for your application. / But now, / we're getting more users, / so we need to scale it. / The easiest thing to do would be / to add more resources, / like RAM, / or maybe upgrade your CPU. / This is known as vertical scaling. / It's pretty easy, / but it's very limited. / A better approach would be / to add replicas / so that each server can handle a subset of requests. / This is known as horizontal scaling. / It's more powerful / because we can almost scale infinitely, / and we don't even need good machines. / It also adds redundancy / and fault tolerance / because if one of our servers goes down, / all of the other servers can continue to fulfill requests. / This eliminates our previous single point of failure. / But the downside is / that this approach is much more complicated.

### Baichuan/Baichuan2-Turbo (6)
To start, let's say / you have a single server / which is accepting and fulfilling requests / for your application. But now, we're getting more users, so we need to scale it. The easiest thing to do / would be to add more resources, like RAM, or maybe upgrade your CPU. This / is known as vertical scaling. It's pretty easy, but it's very limited. A better approach / would be to add replicas / so that each server can handle a subset of requests. This / is known as horizontal scaling. It's more powerful / because we can almost scale infinitely, and we don't even need good machines. It also adds redundancy and fault tolerance / because if one of our servers goes down, all of the other servers can continue to fulfill requests. This eliminates our previous single point of failure. But the downside / is that this approach / is much more complicated.

### Tencent/ChatStd (6)
To start/ let's say/ you have a single server/ which is accepting and fulfilling requests for your application. /But now/ we're getting more users/ so we need to scale it. / The easiest thing to do/ would be to add more resources /like RAM/ or maybe upgrade your CPU. /This is known as vertical scaling/ . /It's pretty easy/ /but it's very limited. /A better approach/ would be to add replicas/ so that each server can handle a subset of requests. /This is known as horizontal scaling/ . /It's more powerful/ because we can almost scale infinitely /and we don't even need good machines. It also adds redundancy and fault tolerance/ because if one of our servers goes down /all of the other servers can continue to fulfill requests. /This eliminates our previous single point of failure. /But the downside/ is that this approach is much more complicated.

### Zhipu/chatGLM_turbo (5)
To start, let's say/ you have a single server/ which is accepting and fulfilling requests/ for your application./ But now, we're getting more users/, so we need to scale it./ The easiest thing to do would be/ to add more resources/, like RAM/, or maybe upgrade your CPU./ This is known as vertical scaling./ It's pretty easy/, but it's very limited./ A better approach would be/ to add replicas/ so that each server can handle a subset of requests./ This is known as horizontal scaling./ It's more powerful/ because we can almost scale infinitely/, and we don't even need good machines./ It also adds redundancy and fault tolerance/ because if one of our servers goes down/, all of the other servers can continue to fulfill requests./ This eliminates our previous single point of failure./ But the downside is that this approach is much more complicated.

### ByteDance/Skylark-chat (5)
To start,/ let's say you have a single server/ which is accepting and fulfulling requests/ for your application./ But now,/ we're getting more users,/ so we need to scale it./ The easiest thing to do/ would be to add more resources,/ like RAM,/ or maybe upgrade your CPU./ This is known as vertical scaling./ It's pretty easy,/ but it's very limited./ A better approach would be to add replicas/ so that each server can handle a subset of requests./ This is known as horizontal scaling./ It's more powerful/ because we can almost scale infinitely,/ and we don't even need good machines./ It also adds redundancy/ and fault tolerance/ because if one of our servers goes down,/ all of the other servers can continue to fulfill requests./ This eliminates our previous single point of failure./ But the downside is that this approach is much more complicated.

### OpenAI/gpt-3.5-turbo (4)
To start/ let's say you have a single server which is accepting and fulfilling requests for your application. But now/ we're getting more users/ so we need to scale it. The easiest thing to do would be to add more resources/ like RAM/ or maybe upgrade your CPU. This is known as vertical scaling. It's pretty easy/ but it's very limited. A better approach would be to add replicas so that each server can handle a subset of requests. This is known as horizontal scaling. It's more powerful because we can almost scale infinitely/ and we don't even need good machines. It also adds redundancy and fault tolerance because if one of our servers goes down/ all of the other servers can continue to fulfill requests. This eliminates our previous single point of failure. But the downside is that this approach is much more complicated.

### Ali/qwen-turbo (4)
To start/let's say you have a single server/which is accepting and fulfilling requests for your application. But now/we're getting more users, so we need to scale it. The easiest thing to do/would be to add more resources, like RAM, or maybe upgrade your CPU./This is known as vertical scaling./It's pretty easy,/but it's very limited. A better approach/would be to add replicas/so that each server can handle a subset of requests./This is known as horizontal scaling./It's more powerful/because we can almost scale infinitely,/and we don't even need good machines. It also adds redundancy and fault tolerance/because if one of our servers goes down,/all of the other servers can continue to fulfill requests./This eliminates our previous single point of failure. But the downside/is that this approach is much more complicated.

### Minimax/abab5.5-chat (4)
To start, let's say you have a single server which is accepting and fulfilling requests for your application./ But now, we're getting more users, so we need to scale it./ The easiest thing to do would be to add more resources, like RAM, or maybe upgrade your CPU./ This is known as vertical scaling./ It's pretty easy, but it's very limited./ A better approach would be to add replicas so that each server can handle a subset of requests./ This is known as horizontal scaling./ It's more powerful because we can almost scale infinitely, and we don't even need good machines./ It also adds redundancy and fault tolerance because if one of our servers goes down, all of the other servers can continue to fulfill requests./ This eliminates our previous single point of failure./ But the downside is that this approach is much more complicated.

### Xunfei/Spark3.1 (2)
To start, let's say you have a single server which is accepting and fulfilling requests for your application. But now, we're getting more users, so we need to scale it. The easiest thing to do would be to add more resources, like RAM, or maybe upgrade your CPU. This is known as vertical scaling. It's pretty easy, but it's very limited. A better approach would be to add replicas so that each server can handle a subset of requests. This is known as horizontal scaling. It's more powerful because we can almost scale infinitely, and we don't even need good machines. It also adds redundancy and fault tolerance because if one of our servers goes down, all of the other servers can continue to fulfill requests. This eliminates our previous single point of failure. But the downside is that this approach is much more complicated.

### Xunfei/Spark3.5 (1)
To start/But now/So we need to scale it/ The easiest thing/This is known as vertical scaling/It's pretty easy, but it's very limited/A better approach/This is known as horizontal scaling/It's more powerful because we can almost scale infinitely, and we don't even need good machines/It also adds redundancy and fault tolerance because if one of our servers goes down, all of the other servers can continue to fulfill requests/This eliminates our previous single point of failure/But the downside is that this approach is much more complicated.

### Baidu/ERNIE-Bot-turbo (0)
Here is the chunking of this passage:

To start,/分隔 Let's/分隔 say/分隔 you/分隔 have/分隔 a/分隔 single/分隔 server/分隔 which/分隔 is/分隔 accepting/分隔 and/分隔 fulfilling/分隔 requests/分隔 for/分隔 your/分隔 application/分隔 ./分隔

/ But/分隔 now/分隔 ,/分隔 we're/分隔 getting/分隔 more/分隔 users/分隔 ,/分隔 so/分隔 we/分隔 need/分隔 to/分隔 scale/分隔 it/分隔 ./

The/分隔 easiest/分隔 thing/分隔 to/分隔 do/分隔 would/分隔 be/分隔 to/分隔 add/分隔 more/分隔 resources/分隔 ,/分隔 like/分隔 RAM/ 、/

or/ maybe/ upgrade/ your/ CPU/ .// This/ is/ known/ as/ vertical/ scaling/.// It's/ pretty/ easy/,// but/,// it's/ very/ limited/.// A/ better/ approach/,// would/,// be/,// to/,// add/ replicas/,// so/,// that/,// each/,// server/,// can/,// handle/,// a/,// subset/,// of/,// requests/.// This/,// is/,// known/,// as/,// horizontal/,// scaling/.// It's/,// more/,// powerful/,// because/,// we/,// can/,// almost/,// scale/,// infinitely/,// and/,// we/,// don't/,// even/,// need/,// good/,// machines/.// It/,// also/,// adds/,// redundancy/,// and/,// fault/,// tolerance/,// // because/,// if/,// one/,// of/,// our/,// servers/,// goes/,// down/,// ,/.// all/,// of/,// the/,// other/,// servers/,// can/,// continue/,// to/,// fulfill/,// requests/. // This/',// eliminates/',// our/',// previous/',// single/',// point/',// of/',// failure/.

But/',// the/',// downside/', // is/', // that/', // this/', // approach/', // is/', // much/', // more/', // complicated/.
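
Note that the chunks are separated by "/". Each chunk contains one or more words that are usually more closely linked grammatically and semantically. Dividing text this way can help us better understand and use these chunks.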
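
### Summary
1. Note: Responses from large language models are based on probabilistic prediction, so they are not guaranteed to be the same every time. Only extensive test cases and repeated testing can give a more accurate assessment of a model's performance.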
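
The chunking requirement above can be partly checked by machine. The following is a minimal sketch in Python, assuming each answer is plain text in which "/" is the only separator; the names `words` and `check_chunking` and the "at least two chunks per sentence on average" threshold are hypothetical illustrations, not the rubric behind the scores in this file. It tests two things: whether the answer keeps the source words in order without additions (Xunfei/Spark3.5 drops text and Baidu/ERNIE-Bot-turbo inserts tokens, so both fail), and whether it contains enough chunks (Xunfei/Spark3.1, with no separators at all, fails).

```python
import re

# Hypothetical helper for the "/"-separated chunking task. The checks are
# illustrative heuristics, not the scoring rubric used in these test cases.

WORD = re.compile(r"\w+(?:'\w+)?")           # words, keeping contractions like "let's"
SENTENCE_END = re.compile(r"[.!?](?:\s|$)")  # sentence-final punctuation in the source


def words(text: str) -> list[str]:
    """Lower-cased word sequence, ignoring '/', spacing and punctuation."""
    return [w.lower() for w in WORD.findall(text)]


def check_chunking(original: str, chunked: str) -> dict:
    # Non-empty chunks between "/" separators; runs like "//" yield empty pieces, which are skipped.
    chunks = [c for c in chunked.split("/") if c.strip(" ,.;:!?")]
    # Sentences are counted in the source, since answers may drop or move periods.
    sentences = len(SENTENCE_END.findall(original))
    return {
        # Every source word should survive, in order, with nothing added
        # (this catches dropped text and inserted tokens such as "分隔").
        "text_preserved": words(original) == words(chunked),
        "sentences": sentences,
        "chunks": len(chunks),
        # Rough proxy for "each sentence has several chunks": >= 2 per sentence on average.
        "multi_chunk": len(chunks) >= 2 * sentences,
    }


if __name__ == "__main__":
    src = "To start, let's say you have a single server. But now, we need to scale it."
    answer = "To start,/ let's say/ you have a single server./ But now,/ we need/ to scale it."
    print(check_chunking(src, answer))
```

Because the word comparison uses Unicode-aware `\w`, non-ASCII insertions are caught as well as dropped English words. Counting sentences from the source rather than from the answer is a deliberate choice: several answers above drop or move the sentence-final periods, which would otherwise distort the per-sentence count.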
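
New file (+62)

@@ -0,0 +1,62 @@
## Requirement

Split this passage into chunks. A chunk should be slightly coarser-grained than a fixed collocation; separate chunks with /, and note that each sentence should contain several chunks:

Alternative prompt:
Please split the following text into multiple chunks, each containing one complete thought or unit of information. Use a slash (/) to separate the chunks, to mimic the natural pauses and breaths taken when reading an article aloud. Note that a single sentence should contain multiple chunks, and chunk boundaries should be drawn at roughly the granularity of fixed collocations:
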
Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now. You came to deal with being boring, that's for sure. Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012. You came to DealBook and sat on this stage, and we're thrilled to have you back. But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half. And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?' Yeah, yeah, no idea what this Twitter thing is about. 'Should you platform him?' That's what they said. 'Yeah, did you platform then?' I said that I think it's our role. And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

## Results

### Moonshot/moonshot-v1-8k (7)
Okay, I'm gonna hold to that,/ but a lot has happened between when I first met you and now./ You came to deal with being boring, that's for sure./ Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012./ You came to DealBook and sat on this stage,/ and we're thrilled to have you back./ But there's been so much that's happened between now and then,/ and there's been so much that's happened in the past week, week and a half./ And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here?/ Can you believe what he just said on Twitter, on 1X, on?'/ Yeah, yeah, no idea what this Twitter thing is about./ 'Should you platform him?' That's what they said./ 'Yeah, did you platform then?'/ I said that I think it's our role./ And I know you have issues with journalists as a platform,/ and I know you have an issue with journalists often times,/ but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas,/ and that's what I'm hoping.
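
### ByteDance/Skylark-chat (6.5)
OK, I will split it according to your requirements. Note, however, that a sentence can contain multiple chunks, and the granularity of fixed collocations can vary from person to person. Here is the result, split as you requested:
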
Okay, I'm gonna hold to that, /but a lot has happened/ between when I first met you and now. /You came to deal with being boring,/ that's for sure. /Well, actually,/ it's taken by a driver boring, Kony — driver boring Kony2012. /You came to DealBook/ and sat on this stage,/ and we're thrilled to have you back. /But there's been so much/ that's happened between now and then,/ and there's been so much/ that's happened in the past week,/ week and a half. /And a lot of folks,/ and I want to tell you this elevator,/ a lot of folks called me up and said,/ 'Are you really going to host Elon Musk here?/ Can you believe/ what he just said on Twitter,/ on 1X,/ on?' /Yeah, yeah,/ no idea what this Twitter thing is about. 'Should you platform him?'/ That's what they said. 'Yeah,/ did you platform then?'/ I said/ that I think it's our role. /And I know you have issues/ with journalists as a platform,/ and I know you have an issue/ with journalists often times,/ but I said it's our role/ to have conversations and to inquire/ and to sometimes even interrogate ideas,/ and that's what I'm hoping.

### Ali/qwen-plus (6)
Okay, I'm gonna hold to that / but a lot has happened between when I first met you and now / You came to deal with being boring, that's for sure / Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012 / You came to DealBook and sat on this stage / and we're thrilled to have you back / But there's been so much that's happened between now and then / and there's been so much that's happened in the past week, week and a half / And a lot of folks / and I want to tell you this elevator / a lot of folks called me up and said / 'Are you really going to host Elon Musk here? / Can you believe what he just said on Twitter, on 1X, on?' / Yeah, yeah, no idea what this Twitter thing is about / 'Should you platform him?' / That's what they said / 'Yeah, did you platform then?' / I said that I think it's our role / And I know you have issues with journalists as a platform / and I know you have an issue with journalists often times / but I said it's our role to have conversations / and to inquire / and to sometimes even interrogate ideas / and that's what I'm hoping.

### Baichuan/Baichuan2-Turbo (6)
Okay, /I'm gonna hold to that/, /but a lot has happened between when I first met you and now/. /You came to deal with being boring/, /that's for sure/. /Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012/. /You came to DealBook and sat on this stage/, /and we're thrilled to have you back/. /But there's been so much that's happened between now and then/, /and there's been so much that's happened in the past week, week and a half/. /And a lot of folks, and I want to tell you this elevator/, /a lot of folks called me up and said/, /'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?'/ /Yeah, yeah, no idea what this Twitter thing is about/. /'Should you platform him?'/, /That's what they said/. /'Yeah, did you platform then?'/, /I said that I think it's our role/. /And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times/, /but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas/, /and that's what I'm hoping/.

### OpenAI/gpt-3.5-turbo (4)
Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now. / You came to deal with being boring, that's for sure. / Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012. / You came to DealBook and sat on this stage, and we're thrilled to have you back. / But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half. / And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?' / Yeah, yeah, no idea what this Twitter thing is about. / 'Should you platform him?' That's what they said. 'Yeah, did you platform then?' / I said that I think it's our role. / And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

### Ali/qwen-turbo (4)
Okay, I'm gonna hold to that / but a lot has happened between when I first met you and now. You came to deal with being boring, that's for sure / It's taken by a driver boring, Kony - driver boring Kony2012. You came to DealBook / and sat on this stage / and we're thrilled to have you back. But there's been so much that's happened between now and then / and there's been so much that's happened in the past week, week and a half. A lot of folks called me up / and they said, 'Are you really going to host Elon Musk here?' 'Can you believe what he just said on Twitter, on 1X, on?' Yeah, yeah, no idea what this Twitter thing is about. 'Should you platform him?' That's what they said. 'I said that I think it's our role.' And I know you have issues with journalists as a platform / and I know you have an issue with journalists often times. But I said it's our role to have conversations / and to inquire / and to sometimes even interrogate ideas / and that's what I'm hoping.

### Zhipu/chatGLM_turbo (2)
Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now. You came to deal with being boring, that's for sure. Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012. You came to DealBook and sat on this stage, and we're thrilled to have you back. But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half. And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?' Yeah, yeah, no idea what this Twitter thing is about. 'Should you platform him?' That's what they said. 'Yeah, did you platform then?' I said that I think it's our role. And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

The resulting chunks are as follows:

Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now.
You came to deal with being boring, that's for sure.
Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012.
You came to DealBook and sat on this stage, and we're thrilled to have you back.
But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half.
And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here?
Can you believe what he just said on Twitter, on 1X, on?'
Yeah, yeah, no idea what this Twitter thing is about.
'Should you platform him?' That's what they said.
'Yeah, did you platform then?' I said that I think it's our role.
And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

### Minimax/abab5.5-chat (2)
Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now. You came to deal with being boring, that's for sure. Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012. You came to DealBook and sat on this stage, and we're thrilled to have you back. But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half. And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?' Yeah, yeah, no idea what this Twitter thing is about. 'Should you platform him?' That's what they said. 'Yeah, did you platform then?' I said that I think it's our role. And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

### Xunfei/Spark3.1 (2)
Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now. You came to deal with being boring, that's for sure. Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012. You came to DealBook and sat on this stage, and we're thrilled to have you back. But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half. And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?' Yeah, yeah, no idea what this Twitter thing is about. 'Should you platform him?' That's what they said. 'Yeah, did you platform then?' I said that I think it's our role. And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

### Xunfei/Spark3.5 (2)
Okay, I'm gonna hold to that, but a lot has happened between when I first met you and now. You came to deal with being boring, that's for sure. Well, actually, it's taken by a driver boring, Kony — driver boring Kony2012. You came to DealBook and sat on this stage, and we're thrilled to have you back. But there's been so much that's happened between now and then, and there's been so much that's happened in the past week, week and a half. And a lot of folks, and I want to tell you this elevator, a lot of folks called me up and said, 'Are you really going to host Elon Musk here? Can you believe what he just said on Twitter, on 1X, on?' Yeah, yeah, no idea what this Twitter thing is about. 'Should you platform him?' That's what they said. 'Yeah, did you platform then?' I said that I think it's our role. And I know you have issues with journalists as a platform, and I know you have an issue with journalists often times, but I said it's our role to have conversations and to inquire and to sometimes even interrogate ideas, and that's what I'm hoping.

### Baidu/ERNIE-Bot-turbo (0)
Okay/,/I/'m/gonna/hold/to/that/,/but/a/lot/has/happened/between/when/I/first/met/you/and/now/。/You/came/to/deal/with/being/boring/,/that's/for/sure/。/Well/,/actually/,//it's/taken/by/a/driver/boring/,/Kony—/driver/boring/Kony2012/。//You/came/to/DealBook/and/sat/on/this/stage/,//and/we're/thrilled/to/have/you/back/。//But/there's/been/so/much//that's/happened/between//now//and//then//,//and//there's//been//so//much//that's//happened//in//the//past//week//,week//and//a//half//。//And//a//lot//of//folks//,//and//I//want//to//tell//you//this//elevator//,//a//lot//of//folks//called//up//and//said//,//'Are//you//really//going//to//host//Elon/Musk/here?/, //Can//you//believe//what//he//just//said//on//Twitter/, //on/, //1X/, //on/, //?/, //Yeah/, //yeah/, //no/idea//what//this/Twitter/(thing)/is/(about)/。/, //'Should//you/(platform)/him?/, //That's/(what)/they//said/./, //Yeah/, //did/(you)/(platform)/then?/, //I/(said)/that/(I)/think/(it's)/our/(role)/。/, //And/(I)/know/(you)/have/(issues)/with/(journalists)/as/(a)/(platform)/,/(and)/(I)/know/(you)/have/(an)/(issue)/with/(journalists)/often/(times)/,//but/(I)/said/(it's)/(our)/(role)/(to)/(have)/(conversations)/(and)/(to)/(inquire)/(and)/(to)/(sometimes)/(even)/(interrogate)/(ideas)/,/(and)/(that's)/(what)/(I'm)/(hoping)/。
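
### Summary
1. Note: Responses from large language models are based on probabilistic prediction, so they are not guaranteed to be the same every time. Only extensive test cases and repeated testing can give a more accurate assessment of a model's performance.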
