sidebar_label | sidebar_position | slug |
---|---|---|
Fault Diagnosis and Analysis |
5 |
/fault_diagnosis_and_analysis |
import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';
JuiceFS client will output logs for troubleshooting while running. The level of logs in terms of fatality follows DEBUG < INFO < WARNING < ERROR < FATAL. Since DEBUG logs are not printed by default, you need to explicitly enable it if needed, e.g. by adding the --debug
option when running the JuiceFS client.
Different JuiceFS clients print logs in different ways, which are described as follows.
When a JuiceFS file system is mounted with the -d
option (indicating running in the background), it will print logs to the system log file and local log file simultaneously. Depending on which user is running when mounting the file system, the paths of the local log files are slightly different. For root, the local log file locates at /var/log/juicefs.log
, while it locates at $HOME/.juicefs/juicefs.log
for non-root users. Please refer to --log
option for details.
Depending on the operating system, there are different commands to retrieve system logs or read local log files directly.
tail -n 100 /var/log/juicefs.log
syslog | grep 'juicefs'
cat /var/log/syslog | grep 'juicefs'
cat /var/log/messages | grep 'juicefs'
You can use the grep
command to filter different levels of logs for performance analysis or troubleshooting:
cat /var/log/syslog | grep 'juicefs' | grep '<ERROR>'
Depending on the version of the JuiceFS CSI Driver, there are different ways to retrieve logs. Please refer to CSI Driver documentation for details.
The S3 gateway can only run in the foreground, so client logs are output directly to the terminal. If you deploys the S3 gateway in Kubernetes, you can get logs from the corresponding pods.
The JuiceFS client logs will be mixed into the logs of processes using JuiceFS Hadoop Java SDK, e.g. Spark executor. Thus, you need to use keywords, e.g. juicefs
(case-insensitive), to filter out the logs you do not want.
Each JuiceFS client has an access log that records all operations on the file system in detail, such as operation type, user ID, group ID, file inodes and time cost. Access logs can be used for various purposes such as performance analysis, auditing, and troubleshooting.
An example format of an access log is as follows:
2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
The meaning of each column is:
2021.01.15 08:26:11.003330
: The time of the current operation[uid:0,gid:0,pid:4403]
: User ID, group ID, process ID of the current operationwrite
: Operation type(17669,8666,4993160)
: The input parameters of the current operation type. For example, the input parameters of thewrite
operation in the example are the inode of the written file, the size of the written data, and the offset of the written file. Different operation types have different parameters. For details, please refer to thevfs.go
file.OK
: Indicate the current operation is successful or not. If it is unsuccessful, specific failure information will be output.<0.000010>
: The time (in seconds) that the current operation takes.
You can debug and analyze performance issues with access log, or try using juicefs profile <mount-point>
to see real-time statistics. Run juicefs profile -h
or refer to Operations Profiling for details.
Different JuiceFS clients obtain access log in different ways, which are described below.
There is a virtual file named .accesslog
in the root directory of the JuiceFS file system mount point, the contents of which can be viewed by the cat
command (the command will not exit), for example (assuming the root directory of the mount point is /jfs
):
$ cat /jfs/.accesslog
2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>
2021.01.15 08:26:11.003473 [uid:0,gid:0,pid:4403] write (17675,198,997439): OK <0.000014>
2021.01.15 08:26:11.003616 [uid:0,gid:0,pid:4403] write (17666,390,951582): OK <0.000006>
Please refer to CSI Driver documentation to find the mount pod or CSI driver pod depending on the version of JuiceFS CSI Driver you are using, and the .accesslog
file can be viewed in the root directory of the JuiceFS file system mount point in the pod. The mount point path in the pod is /jfs/<pv_volumeHandle>
. Assuming there is a mount pod named as juicefs-1.2.3.4-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373
, in which pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373
is <pv_volumeHandle>
, you can then use the following command to view the .accesslog
file:
kubectl -n kube-system exec juicefs-chaos-k8s-002-pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373 -- cat /jfs/pvc-d4b8fb4f-2c0b-48e8-a2dc-530799435373/.accesslog
You need to add the --access-log
option when starting the S3 gateway to specify the path to output the access log. By default, the S3 gateway does not output the access log.
You need to add the juicefs.access-log
configuration item in the client configurations of the JuiceFS Hadoop Java SDK to specify the path of the access log output, and the access log is not output by default.
By default, JuiceFS clients will listen to a TCP port locally via pprof to get runtime information such as Goroutine stack information, CPU performance statistics, memory allocation statistics. You can see the specific port number that the current JuiceFS client is listening on by using the system command (e.g. lsof
):
:::note
If you mount JuiceFS as the root user, you need to add sudo
before the lsof
command.
:::
$ lsof -i -nP | grep LISTEN | grep juicefs
juicefs 32666 user 8u IPv4 0x44992f0610d9870b 0t0 TCP 127.0.0.1:6061 (LISTEN)
juicefs 32666 user 9u IPv4 0x44992f0619bf91cb 0t0 TCP 127.0.0.1:6071 (LISTEN)
juicefs 32666 user 15u IPv4 0x44992f062886fc5b 0t0 TCP 127.0.0.1:9567 (LISTEN)
By default, pprof listens on port numbers ranging from 6060 to 6099. That's why the actual port number in the above example is 6061. Once you get the listening port number, you can view all the available runtime information by accessing http://localhost:<port>/debug/pprof
, and some important runtime information will be shown as follows:
- Goroutine stack information:
http://localhost:<port>/debug/pprof/goroutine?debug=1
- CPU performance statistics:
http://localhost:<port>/debug/pprof/profile?seconds=30
- Memory allocation statistics:
http://localhost:<port>/debug/pprof/heap
To make it easier to analyze this runtime information, you can save it locally, e.g.:
$ curl 'http://localhost:<port>/debug/pprof/goroutine?debug=1' > juicefs.goroutine.txt
$ curl 'http://localhost:<port>/debug/pprof/profile?seconds=30' > juicefs.cpu.pb.gz
```bash
$ curl 'http://localhost:<port>/debug/pprof/heap' > juicefs.heap.pb.gz
If you have the go
command installed, you can analyze it directly with the go tool pprof
command. For example to analyze CPU performance statistics:
$ go tool pprof 'http://localhost:<port>/debug/pprof/profile?seconds=30'
Fetching profile over HTTP from http://localhost:<port>/debug/pprof/profile?seconds=30
Saved profile in /Users/xxx/pprof/pprof.samples.cpu.001.pb.gz
Type: cpu
Time: Dec 17, 2021 at 1:41pm (CST)
Duration: 30.12s, Total samples = 32.06s (106.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 30.57s, 95.35% of 32.06s total
Dropped 285 nodes (cum <= 0.16s)
Showing top 10 nodes out of 192
flat flat% sum% cum cum%
14.73s 45.95% 45.95% 14.74s 45.98% runtime.cgocall
7.39s 23.05% 69.00% 7.41s 23.11% syscall.syscall
2.92s 9.11% 78.10% 2.92s 9.11% runtime.pthread_cond_wait
2.35s 7.33% 85.43% 2.35s 7.33% runtime.pthread_cond_signal
1.13s 3.52% 88.96% 1.14s 3.56% runtime.nanotime1
0.77s 2.40% 91.36% 0.77s 2.40% syscall.Syscall
0.49s 1.53% 92.89% 0.49s 1.53% runtime.memmove
0.31s 0.97% 93.86% 0.31s 0.97% runtime.kevent
0.27s 0.84% 94.70% 0.27s 0.84% runtime.usleep
0.21s 0.66% 95.35% 0.21s 0.66% runtime.madvise
Runtime information can also be exported to visual charts for a more intuitive analysis. The visual charts can be exported to various formats such as HTML, PDF, SVG, PNG, etc. For example, the command to export memory allocation statistics as a PDF file is as follows:
:::note The export to visual chart function relies on Graphviz, so please install it first. :::
go tool pprof -pdf 'http://localhost:<port>/debug/pprof/heap' > juicefs.heap.pdf
For more information about pprof, please see the official documentation.
Pyroscope is an open source continuous profiling platform. It will help you:
- Find performance issues and bottlenecks in your code
- Resolve issues of high CPU utilization
- Understand the call tree of your application
- Track changes over time
JuiceFS supports using the --pyroscope
option to pass in the pyroscope server address, and metrics are pushed to the server every 10 seconds. If permission verification is enabled on the server, the verification information API Key can be passed in by the environment variable PYROSCOPE_AUTH_TOKEN
:
export PYROSCOPE_AUTH_TOKEN=xxxxxxxxxxxxxxxx
juicefs mount --pyroscope http://localhost:4040 redis://localhost /mnt/jfs
juicefs dump --pyroscope http://localhost:4040 redis://localhost dump.json