
The error log does not print the full file path, making the problem hard to locate #5214

Closed
TOMO-CAT opened this issue Jun 13, 2024 · 46 comments

@TOMO-CAT

Xmake version

v2.9.2

Operating system version and architecture

Linux 720ce3a659a2 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Describe the problem

The error message looks like a file deserialization failure, but the file path is not printed in full, so I can't locate the problem:
[screenshot]

Expected result

I'd like the fatal log to print more detailed information.
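A minimal Python sketch of the kind of error being asked for. It assumes, purely for illustration, a JSON-serialized references file; xmake's real file is read with `io.load` and uses its own serialization format. The helper names `load_references` and `CorruptFileError` are hypothetical. When parsing fails, the exception reports the absolute path of the file and the position of the error, so the corrupted file can be found directly.

```python
import json
import os


class CorruptFileError(Exception):
    """Raised when a cached data file cannot be deserialized (hypothetical)."""


def load_references(path):
    """Load a references file, treating a missing file as empty.

    Mirrors the shape of `os.isfile(f) and io.load(f) or {}`, but the
    JSON format is an assumption made for this sketch.
    """
    if not os.path.isfile(path):
        return {}
    abspath = os.path.abspath(path)
    with open(abspath, "r", encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # Include the full path and the parse position in the message,
        # so the broken file can be located and inspected.
        raise CorruptFileError(
            f"failed to deserialize {abspath}: {e.msg} "
            f"(line {e.lineno}, column {e.colno})"
        ) from e
```

With this approach, a file that ends in a stray `}` raises `CorruptFileError`, and the message contains the file's absolute path.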

Project configuration

Additional information and error logs

TOMO-CAT added the bug label Jun 13, 2024

@waruqi
Member

waruqi commented Jun 13, 2024

local references = os.isfile(references_file) and io.load(references_file) or {}

Print the file here and take a look; it's probably corrupted. Then delete the package, reinstall it, and try again.

@TOMO-CAT
Author

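Deleting the package fixed it, but I don't know why the file got corrupted, and I didn't keep the broken state. Does this file have a maximum size? Is some of its content pruned periodically? Given how many builds we run every day, it will probably reproduce again soon.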

@waruqi
Member

waruqi commented Jun 13, 2024

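Not sure yet. I haven't reproduced it here, and in principle it shouldn't get corrupted, so you'll need to help by debugging the source code directly.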

@TOMO-CAT
Author

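> Not sure yet. I haven't reproduced it here, and in principle it shouldn't get corrupted, so you'll need to help by debugging the source code directly.

OK. I've already deleted the broken state. When it reproduces, I'll check what the file was written as.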

@TOMO-CAT
Author

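> Not sure yet. I haven't reproduced it here, and in principle it shouldn't get corrupted, so you'll need to help by debugging the source code directly.

Found two occurrences of the same problem today:
[screenshot]
I downloaded the file and found an extra closing curly brace at the end. Could every dump do a full overwrite? Also, this directory keeps growing on our machines, because every Jenkins pipeline run creates a new workspace. Is it possible to disable the references.txt feature?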
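A common way to make every dump a complete overwrite is to write the data to a temporary file in the same directory, then atomically rename that file over the target. The sketch below shows this pattern in Python. `save_atomic` is a hypothetical helper and is not xmake's `io.save`.

The pattern only prevents torn or half-written files. It does not stop two processes from overwriting each other's updates, which is the cause identified later in this thread.

```python
import os
import tempfile


def save_atomic(path, data):
    """Replace `path` with `data` (a str) so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes reach disk before renaming
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```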

@TOMO-CAT
Author

[screenshot]

@waruqi
Member

waruqi commented Jun 17, 2024

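It is supposed to overwrite the file. It works fine here, and I can't reproduce it for now. This can't be disabled; the only option is to find the root cause and fix it.

Besides, if io writes were the problem, disabling it here would not help, because similar writes elsewhere could hit the same issue. That would treat the symptom, not the root cause.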

@TOMO-CAT
Author

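> It is supposed to overwrite the file. It works fine here, and I can't reproduce it for now. This can't be disabled; the only option is to find the root cause and fix it.
>
> Besides, if io writes were the problem, disabling it here would not help, because similar writes elsewhere could hit the same issue. That would treat the symptom, not the root cause.

It now reproduces almost every time. The version we use is xmake v2.9.2+HEAD.1883b6b9f. Looking through the code, the file is written with io.save, right?
[screenshot]
Could this be a concurrency issue in io.save?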

@TOMO-CAT
Author

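Even with an environment that reproduces the problem, there is no way to reconstruct why the file got corrupted at that moment. This can only be analyzed at the code level.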

@waruqi
Member

waruqi commented Jun 17, 2024

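Perhaps your Jenkins is running several projects at the same time and installing the same package concurrently. Multiple xmake subprocesses then write the same package's configuration at once and overwrite each other.

In references.txt I can see that N projects reference this package.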

@TOMO-CAT
Author

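> Perhaps your Jenkins is running several projects at the same time and installing the same package concurrently. Multiple xmake subprocesses then write the same package's configuration at once and overwrite each other.
>
> In references.txt I can see that N projects reference this package.

Each Jenkins pipeline creates its own folder and deletes it after the build, so this directory keeps growing. Can't you add concurrency protection here?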

@waruqi
Member

waruqi commented Jun 17, 2024

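Package download and installation are currently locked, but this file is read and written during the fetch stage, which is not locked yet.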

@TOMO-CAT
Author

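> Package download and installation are currently locked, but this file is read and written during the fetch stage, which is not locked yet.

I think our number of build jobs is fairly high, which is why this reproduces almost every time. Have you considered handling the concurrency problem in io.save?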

@waruqi
Member

waruqi commented Jun 17, 2024

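It's unrelated to io.save. It's a problem of concurrent reads and writes on the package. I can add a lock to avoid part of it, but I still don't recommend doing this. Even without errors, installing the same package concurrently makes processes wait on the lock, which is actually slower.

It may even leave other jobs waiting for a long time, since installing a package is very slow.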

@TOMO-CAT
Author

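> It's unrelated to io.save. It's a problem of concurrent reads and writes on the package. I can add a lock to avoid part of it, but I still don't recommend doing this. Even without errors, installing the same package concurrently makes processes wait on the lock, which is actually slower.

Concurrent reads and writes of packages are fairly normal for us. A Jenkins build machine never runs just one build job, and several jobs may need to install the same package. But from what I saw in the source, package install is already locked, right? Then could I bypass reading and writing references.txt by overriding on_fetch?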

@TOMO-CAT
Author

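> It's unrelated to io.save. It's a problem of concurrent reads and writes on the package. I can add a lock to avoid part of it, but I still don't recommend doing this. Even without errors, installing the same package concurrently makes processes wait on the lock, which is actually slower.
>
> It may even leave other jobs waiting for a long time, since installing a package is very slow.

Slower package installation or a printed warning would both be fine with me. Either is better than references.txt getting corrupted after concurrent installs. It still feels like an io.save concurrency issue: multiple xmake processes write the same package's references.txt at the same time. Could a lock be added at that granularity?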

@waruqi
Member

waruqi commented Jun 17, 2024

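> It's unrelated to io.save. It's a problem of concurrent reads and writes on the package. I can add a lock to avoid part of it, but I still don't recommend doing this. Even without errors, installing the same package concurrently makes processes wait on the lock, which is actually slower.

> Concurrent reads and writes of packages are fairly normal for us. A Jenkins build machine never runs just one build job, and several jobs may need to install the same package. But from what I saw in the source, package install is already locked, right?

As I just said: install is locked, fetch is not, and fetch also reads and writes package files.

> Then could I bypass reading and writing references.txt by overriding on_fetch?

The internal fetch can't be bypassed, and bypassing it wouldn't help anyway. As I said, that treats the symptom, not the root cause. This isn't the only file being read and written. When concurrent accesses conflict, any package file can be corrupted, so bypassing this one file achieves nothing.

I just added the lock to fetch as well, so in principle the same package should no longer be accessed concurrently. However, when multiple processes install packages concurrently and run into a conflict, they will unavoidably block until the other process finishes installing. Either way, I don't recommend doing it like this. Build jobs can run concurrently, but there is no need to install packages concurrently.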
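The fix described above serializes access to a package's files across processes. The Python sketch below illustrates the same idea. It is not xmake's implementation, and it rests on three assumptions: a POSIX system (`fcntl.flock`), a hypothetical sidecar lock file `references.txt.lock`, and a JSON list as the file format. The whole read-modify-write of the references file happens while holding an exclusive lock, so two processes can no longer interleave their updates or read a half-written file.

```python
import fcntl
import json
import os


def add_reference(ref_path, project_dir):
    """Record `project_dir` in the JSON list at `ref_path`, safely across processes."""
    lock_path = ref_path + ".lock"  # hypothetical sidecar lock file
    with open(lock_path, "a") as lock_file:
        # Block until no other process (or open file description) holds the lock.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            refs = []
            if os.path.isfile(ref_path):
                with open(ref_path, "r", encoding="utf-8") as f:
                    refs = json.load(f)
            if project_dir not in refs:
                refs.append(project_dir)
                # Truncate and rewrite the whole file while still holding the lock.
                with open(ref_path, "w", encoding="utf-8") as f:
                    json.dump(refs, f)
            return refs
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
```

Readers must take the same lock for this to work. That requirement is why the lock has to cover fetch as well as install, not just the write in one place.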

@TOMO-CAT
Author

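> Build jobs can run concurrently, but there is no need to install packages concurrently.

Each build job just runs xmake -bvD directly, and isolating different Jenkins jobs from each other is hard. We'll look into how to avoid installing packages concurrently.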

@TOMO-CAT
Author

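> I just added the lock to fetch as well, so in principle the same package should no longer be accessed concurrently. However, when multiple processes install packages concurrently and run into a conflict, they will unavoidably block until the other process finishes installing. Either way, I don't recommend doing it like this. Build jobs can run concurrently, but there is no need to install packages concurrently.

Thinking about it again, this probably isn't a problem of concurrently installing the same package. Rather, multiple build jobs reference the same package at the same time, and at the start of the build, find package has to modify that package's references.txt. I'm not sure whether your change fixes this.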

@waruqi
Member

waruqi commented Jun 17, 2024

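> I just added the lock to fetch as well, so in principle the same package should no longer be accessed concurrently. However, when multiple processes install packages concurrently and run into a conflict, they will unavoidably block until the other process finishes installing. Either way, I don't recommend doing it like this. Build jobs can run concurrently, but there is no need to install packages concurrently.
>
> Thinking about it again, this probably isn't a problem of concurrently installing the same package. Rather, multiple build jobs reference the same package at the same time, and at the start of the build, find package has to modify that package's references.txt. I'm not sure whether your change fixes this.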

fetch == find

@TOMO-CAT
Author

> fetch == find

Then I'll update and keep an eye on it.

@waruqi
Member

waruqi commented Jun 24, 2024

Any remaining problems?

@TOMO-CAT
Author

> Any remaining problems?

It hasn't reproduced since.
