
[Bug]: Truncating stopped, causing logservice to crash by OOM in CN during chaos test (continuously killing one logservice) #20853

Open
aressu1985 opened this issue Dec 20, 2024 · 4 comments
Labels
kind/bug Something isn't working severity/s-1

Comments

@aressu1985
Contributor

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

2.0-dev

Commit ID

9f2b7bc

Other Environment Information

- Hardware parameters:
3*CN: 7C 28G
1*DN: 7C 28G
3*PROXY: 2C 5G
3*LOG: 1C 7G
- OS type:
- Others:

Actual Behavior

Test load:

- run TPC-C 10-10
- insert data into a table with 2 threads

During the test, the chaos tool continuously killed one log pod at 10-minute intervals.

Then, after about 3 hours, TN stopped truncating, which caused logservice to crash by OOM.
[github@mo-srv-128 root]$ kubectl -n mo-chaos-9f2b7bc-202412192128 get pod
NAME                                  READY   STATUS             RESTARTS          AGE
mo-chaos-regression-dis-dn-0          0/1     CrashLoopBackOff   98 (37s ago)      14h
mo-chaos-regression-dis-log-0         0/1     CrashLoopBackOff   15 (4m ago)       57m
mo-chaos-regression-dis-log-2         0/1     CrashLoopBackOff   13 (4m6s ago)     47m
mo-chaos-regression-dis-log-3         1/1     Running            0                 72m
mo-chaos-regression-dis-proxy-9qwvx   1/1     Running            0                 14h
mo-chaos-regression-dis-proxy-wrv76   1/1     Running            0                 14h
mo-chaos-regression-dis-tp-cn-6ph65   1/1     Running            0                 14h

/matrixorigin/matrixone/pkg/logservice/service.go:137\nmain.startLogService\n\t/go/src/github.com/matrixorigin/matrixone/cmd/mo-service/main.go:354\nmain.startService\n\t/go/src/github.com/matrixorigin/matrixone/cmd/mo-service/main.go:230\nmain.main\n\t/go/src/github.com/matrixorigin/matrixone/cmd/mo-service/main.go:118\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:272"}
{"level":"INFO","time":"2024/12/20 03:01:24.953969 +0000","caller":"motrace/syncer.go:89","msg":"Wait signal done."}
panic: no space left on device

goroutine 1 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x2, 0xc0002420d0, {0x0, 0x0, 0x0})
/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:198 +0xa7
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0002420d0, {0x0, 0x0, 0x0})
/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:264 +0x643
go.uber.org/zap.(*SugaredLogger).log(0xc00118a008, 0x4, {0x728a371, 0x3}, {0xc00adda290, 0x1, 0x1}, {0x0, 0x0, 0x0})
/go/pkg/mod/go.uber.org/[email protected]/sugar.go:295 +0x171
go.uber.org/zap.(*SugaredLogger).Panicf(0xc00118a008, {0x728a371, 0x3}, {0xc00adda290, 0x1, 0x1})
/go/pkg/mod/go.uber.org/[email protected]/sugar.go:189 +0x65
github.com/matrixorigin/matrixone/pkg/logutil.DragonboatAdaptLogger.Panicf({0xc00118a008, 0xc00118a000, {0x729e0c4, 0xa}}, {0x728a371, 0x3}, {0xc00adda290, 0x1, 0x1})
/go/src/github.com/matrixorigin/matrixone/pkg/logutil/dragonboat.go:65 +0x59
github.com/lni/dragonboat/v4/logger.(*dragonboatLogger).Panicf(0xc00217f8f0, {0x728a371, 0x3}, {0xc00adda290, 0x1, 0x1})
/go/pkg/mod/github.com/matrixorigin/dragonboat/[email protected]/logger/logger.go:132 +0x6b
github.com/lni/dragonboat/v4.panicNow({0x78bb120, 0xa1c9800})
/go/pkg/mod/github.com/matrixorigin/dragonboat/[email protected]/nodehost.go:2265 +0xf5
github.com/lni/dragonboat/v4.(*NodeHost).startShard(0xc001186808, 0x0, 0x0, 0xc009790e48, {0x20000, 0x0, 0x1, 0x1, 0xa, 0x1, ...}, ...)
/go/pkg/mod/github.com/matrixorigin/dragonboat/[email protected]/nodehost.go:1684 +0x98e
github.com/lni/dragonboat/v4.(*NodeHost).StartReplica(0xc001186808, 0x0, 0x0, 0x74a45e8, {0x20000, 0x0, 0x1, 0x1, 0xa, 0x1, ...})
/go/pkg/mod/github.com/matrixorigin/dragonboat/[email protected]/nodehost.go:508 +0x165
github.com/matrixorigin/matrixone/pkg/logservice.(*store).startHAKeeperReplica(0xc0000071

mo-log:
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22LqA%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-chaos-9f2b7bc-202412192128%5C%22,%20matrixorigin_io_component%3D%5C%22DNSet%5C%22%7D%20%7C%3D%20%60TRACE-WAL-TRUNCATE%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221734624000000%22,%22to%22:%221734667199000%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

Test load:

- run TPC-C 10-10
- insert data into a table with 2 threads

Additional information

No response

@volgariver6
Contributor

#19122 (comment)

Please change this parameter and try again to see whether the problem still occurs.

@XuPeng-SH XuPeng-SH assigned volgariver6 and unassigned XuPeng-SH Dec 23, 2024
@XuPeng-SH
Contributor

@volgariver6

@volgariver6
Contributor

PR submitted.

@XuPeng-SH XuPeng-SH assigned Wenbin1002 and unassigned volgariver6 Dec 25, 2024
@XuPeng-SH
Contributor

@Wenbin1002 should be fixed
