fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=0 --size=512M --numjobs=4 --runtime=240 --group_reporting
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size=512M --numjobs=4 --runtime=240 --group_reporting
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
- Sequential reads
- Async mode
- 8K block size
- Direct IO
- 100% Reads
fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=8k --numjobs=8 --size=1G --runtime=600 --group_reporting
- Sequential writes
- Async mode
- 32K block size
- Direct IO
- 100% Writes
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k --numjobs=4 --size=2G --runtime=600 --group_reporting
- Random reads
- Async mode
- 8K block size
- Direct IO
- 100% Reads
fio --name=randread --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=16 --size=1G --runtime=600 --group_reporting
- Random writes
- Async mode
- 64K block size
- Direct IO
- 100% Writes
fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=64k --numjobs=8 --size=512m --runtime=600 --group_reporting
- Random mixed reads/writes
- Async mode
- 16K block size
- Direct IO
- 90% Reads/10% Writes
fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=8 --rwmixread=90 --size=1G --runtime=600 --group_reporting
Creates 8 files (numjobs=8), each 512 MB in size (size=512m), and performs random reads and writes (rw=randrw) at a 64K block size (bs=64k) with a mixed workload of 70% reads and 30% writes. The job runs for the full 5 minutes (runtime=300 and time_based) even if the files have already been created and fully read/written.
fio --name=randrw --ioengine=libaio --iodepth=1 --rw=randrw --bs=64k --direct=1 --size=512m --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70
Compare disk performance with a simple 3:1 4K read/write mix. The test creates a 4 GB file and performs 4 KB reads and writes using a 75%/25% split within the file, with 64 operations in flight at a time. The 3:1 ratio approximates a typical database workload.
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
dd's cons:
- It is a single-threaded, sequential-write test. If you are running the typical web + database server on your VPS, the number is largely meaningless, because typical services do not do long-running sequential writes.
- The amount of data written (1 GB) is small, so the result can be strongly influenced by caching on the host server or on the host's RAID controller. (The conv=fdatasync only applies inside the VPS, not on the host.)
- It executes for a very short period of time, just a few seconds on faster I/O subsystems, which is not enough to get a consistent result.
- There is no read performance testing at all.
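A fio run along the following lines avoids most of these shortcomings: it bypasses the page cache (direct=1), mixes reads with writes, and runs time-based for long enough to give a stable number. Treat it as a sketch; the file name, size, queue depth and runtime are arbitrary assumptions to adapt to your system.
fio --name=dd-alternative --filename=fio-testfile --ioengine=libaio --direct=1 --rw=randrw --rwmixread=75 --bs=4k --iodepth=32 --size=2G --runtime=120 --time_based --group_reporting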
ioping allows you to see whether storage is performing as expected, or whether there are performance issues that show up as general slowness and as latency spikes for some requests. These latency issues are not always visible in historical graphs that plot averages.
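As a quick first check, a short interactive run against the filesystem you care about usually exposes obvious latency problems; the request count here is an arbitrary example, and '.' means the filesystem backing the current directory.
ioping -c 10 .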
What Storage Latencies Matter Most for MySQL?
Before we look at using ioping to measure them, what I/O latencies matter most for MySQL?
The first is sequential synchronous writes to the InnoDB log file. Any stalls in these will stall write transaction commits, and all following transaction commits as well. Even though MySQL supports group commit, only one such group commit operation can be in progress at any moment in time.
The second is random reads, which are submitted through asynchronous I/O, typically as Direct IO operations. These are critical for serving your general I/O-intensive queries: SELECTs, UPDATEs, DELETEs and most likely INSERTs will rely on fetching such data from storage. These fetches are latency sensitive: since they must complete during query execution, they cannot be delayed.
You may ask, "What about random writes?" Random (non-sequential) writes happen in the background as InnoDB flushes dirty pages from its buffer pool. What matters here is that storage has enough throughput to keep up with the workload; this path is not latency sensitive, since it is not on any query execution critical path.
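If you prefer fio for these two access patterns, the sketches below approximate them (16K matches the default InnoDB page size; the file sizes, queue depths and runtimes are arbitrary assumptions). The first mimics the latency-sensitive random Direct IO reads issued during query execution; the second mimics the background-style random writes, where throughput matters more than latency.
fio --name=innodb-randread --ioengine=libaio --direct=1 --rw=randread --bs=16k --iodepth=16 --size=1G --runtime=60 --time_based --group_reporting
fio --name=innodb-randwrite --ioengine=libaio --direct=1 --rw=randwrite --bs=16k --iodepth=16 --size=1G --runtime=60 --time_based --group_reporting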
One more access pattern important for MySQL performance is writing binary logs (especially with sync_binlog=1). This is different from writing to the InnoDB log file, because writes go to the end of the file and cause it to grow, which requires constant updates to the file system metadata. Unfortunately, ioping does not appear to support this I/O pattern yet.
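As a rough substitute, fio can approximate the binary log's small synchronous sequential writes by calling fsync after every write; treat this as a sketch (file name and sizes are assumptions), and note it does not fully reproduce the constant file-growth/metadata aspect described above, since fio wraps around and rewrites the same file after the first pass.
fio --name=binlog-sim --ioengine=sync --rw=write --bs=4k --fsync=1 --fallocate=none --size=256M --runtime=60 --time_based --group_reporting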
To simulate writing to the InnoDB log file, we will use a medium-sized file (64M) and sequential 4K writes to assess the latency:
ioping -S64M -L -s4k -W -c 10 .
To simulate the asynchronous Direct IO random reads used during query processing (16K matches the default InnoDB page size):
ioping -A -D -s16k -c 10 .
The same log-write test can be run in batch mode (-B), keeping the working file between runs (-k) and printing raw statistics that are easy to feed into monitoring:
ioping -k -B -S64M -L -s4k -W -c 100 -i 0.1 .
For monitoring you might want to look at fields 6, 7 and 8 of this raw output, which give the average, maximum and standard deviation of the request times, measured in nanoseconds (the field list and a parsing sketch follow below):
ioping -p 100 -c 200 -i 0 -q .
99 10970974 9024 36961531 90437 110818 358872 30756 100 12516420
100 9573265 10446 42785821 86849 95733 154609 10548 100 10649035
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
(1) count of requests in statistics
(2) running time (nanoseconds)
(3) requests per second (iops)
(4) transfer speed (bytes per second)
(5) minimal request time (nanoseconds)
(6) average request time (nanoseconds)
(7) maximum request time (nanoseconds)
(8) request time standard deviation (nanoseconds)
(9) total requests (including warmup, too slow or too fast)
(10) total running time (nanoseconds)
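For monitoring scripts, a small wrapper like the sketch below pulls the average, maximum and standard deviation (fields 6, 7 and 8 above) out of the raw output; the target path and request counts are reused from the example above.
ioping -p 100 -c 200 -i 0 -q . | awk '{print "avg_ns=" $6, "max_ns=" $7, "stdev_ns=" $8}'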
- How to install ioping
~ # git clone https://github.com/koct9i/ioping.git
Cloning into 'ioping'...
remote: Enumerating objects: 55, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 746 (delta 33), reused 37 (delta 18), pack-reused 691
Receiving objects: 100% (746/746), 202.24 KiB | 180.00 KiB/s, done.
Resolving deltas: 100% (423/423), done.
~ # cd ioping/
~ # make
gcc -o ioping ioping.c -DEXTRA_VERSION=\".16.gf549dff\" -g -O2 -funroll-loops -ftree-vectorize -std=gnu99 -Wall -Wextra -pedantic -lm -lrt
~ # make install
mkdir -p /usr/local/bin
install -m 0755 ioping /usr/local/bin
mkdir -p /usr/local/share/man/man1
install -m 644 ioping.1 /usr/local/share/man/man1
~ # ioping -R /dev/dm-5
--- /dev/dm-5 (block device 5 GiB) ioping statistics ---
157 requests completed in 2.99 s, 628 KiB read, 52 iops, 210.3 KiB/s
generated 158 requests in 3.01 s, 632 KiB, 52 iops, 210.2 KiB/s
min/avg/max/mdev = 18.7 ms / 19.0 ms / 22.6 ms / 512.7 us
~ # ioping -RL /dev/dm-5
--- /dev/dm-5 (block device 5 GiB) ioping statistics ---
59 requests completed in 2.70 s, 14.8 MiB read, 21 iops, 5.47 MiB/s
generated 60 requests in 3.03 s, 15 MiB, 19 iops, 4.94 MiB/s
min/avg/max/mdev = 37.8 ms / 45.7 ms / 185.5 ms / 24.2 ms
~ # ioping /dev/dm-5
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=1 time=19.8 ms (warmup)
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=2 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=3 time=19.2 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=4 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=5 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=6 time=19.0 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=7 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=8 time=19.9 ms (slow)
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=9 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=10 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=11 time=19.2 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=12 time=19.1 ms
4 KiB <<< /dev/dm-5 (block device 5 GiB): request=13 time=19.2 ms
~ # rpm -q open-iscsi
open-iscsi-2.0.873-20.4.x86_64
~ # rpm -q multipath-tools
multipath-tools-0.5.0-30.1.x86_64
~ # cat /etc/multipath.conf
defaults {
    verbosity 2
    no_path_retry "fail"
    user_friendly_names "yes"
    # find_multipaths "no"
    polling_interval 10
    path_checker tur
    max_fds 8192
    flush_on_last_del yes
    force_sync yes
}
blacklist {
    # devnode ".*"
    devnode "^(ram|raw|loop|fd|md|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor "VMware"
        product "Virtual disk"
    }
}
devices {
    device {
        vendor "DGC"
        product "VRAID"
        path_grouping_policy "group_by_prio"
        path_selector "queue-length 0"
        prio alua
        prio_args alua
        detect_prio yes
        hardware_handler "1 alua"
        failback followover
        dev_loss_tmo 60
    }
}
~ # service multipathd start
~ # sudo multipath -v2 -d
~ # service multipathd status
multipathd.service - Device-Mapper Multipath Device Controller
Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
Active: active (running) since Mon 2020-06-08 04:24:31 EDT; 2 months 10 days ago
Process: 478 ExecStartPre=/sbin/modprobe dm-multipath (code=exited, status=0/SUCCESS)
Main PID: 512 (multipathd)
Status: "running"
CGroup: /system.slice/multipathd.service
└─512 /sbin/multipathd -d -s
~ # cat /etc/iscsi/initiatorname.iscsi
## To discover the iSCSI target, try the command below
# iscsiadm -m discoverydb -p <portal ip> -t st -D
~ # iscsiadm -m discoverydb -p 10.228.44.62 -t st -D
## To log in to the iSCSI target
# iscsiadm -m node -p <spa portal ip> -T <target port iqn> -l
~ # iscsiadm -m node -p 10.228.44.62 -T iqn.1992-04.com.emc:cx.fcnch0972c2c3b.a1 -l
~ # iscsiadm -m node -p 10.228.44.63 -T iqn.1992-04.com.emc:cx.fcnch0972c2c3b.b1 -l
## To log out of the iSCSI target
# iscsiadm -m node -p <spa portal ip> -T <target port iqn> -u
~ # iscsiadm -m node -T iqn.1992-04.com.emc:cx.fcnch0972c2c3b.a1 -p 10.228.44.62 -u
##To Delete iSCSI sessions
~ # iscsiadm -m node -o delete -T iqn.1992-04.com.emc:cx.fcnch0972c2c3b.a1 --portal 10.228.44.62:3260
## Allocate the LUN to the host from the array side
## To discover a new LUN on a system running multipath, use one of the commands below to rescan the disks
~ # iscsiadm -m session --rescan
or
~ # iscsiadm -m session -R
##Rescan a specific session
~ # iscsiadm -m session --sid=N --rescan
##Note: N is the specific session ID
~ # iscsiadm -m session
tcp: [2] 10.228.44.62:3260,2 iqn.1992-04.com.emc:cx.fcnch0972c2c3b.a1 (non-flash)
tcp: [3] 10.228.44.63:3260,1 iqn.1992-04.com.emc:cx.fcnch0972c2c3b.b1 (non-flash)
~ # iscsiadm -m session --sid=2 --rescan
Rescanning session [sid: 2, target: iqn.1992-04.com.emc:cx.fcnch0972c2c3b.a1, portal: 10.228.44.62,3260]
~ # iscsiadm -m session --sid=3 --rescan
Rescanning session [sid: 3, target: iqn.1992-04.com.emc:cx.fcnch0972c2c3b.b1, portal: 10.228.44.63,3260]
##Rescan using the SCSI rescan script
~ # /usr/bin/rescan-scsi-bus.sh
##List the devices and check the LUN is assigned to host properly
~ # multipath -ll
mpathe (3600601607dd30a00afaa3b5faa581574) dm-4 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 4:0:0:4 sdl 8:176 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 5:0:0:4 sdk 8:160 active ready running
mpathd (3600601607dd30a00aeaa3b5f08d56f3f) dm-3 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 5:0:0:3 sdi 8:128 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 4:0:0:3 sdj 8:144 active ready running
mpathc (3600601607dd30a00aeaa3b5fa29d0a47) dm-2 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 4:0:0:2 sdh 8:112 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 5:0:0:2 sdg 8:96 active ready running
mpathb (3600601607dd30a00adaa3b5f7d6cdd27) dm-1 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 5:0:0:1 sde 8:64 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 4:0:0:1 sdf 8:80 active ready running
## If a LUN is not listed in the `multipath -ll` output, check whether its Host ID (HUID) on the array is '0' and, if so, update it to another value
## Then rescan the iSCSI sessions with `iscsiadm -m session -R`
~ # lsscsi
[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[0:0:1:0] disk VMware Virtual disk 1.0 /dev/sdb
[2:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0
[4:0:0:0] disk DGC LUNZ 5100 /dev/sdc
[4:0:0:1] disk DGC VRAID 5100 /dev/sdf
[4:0:0:2] disk DGC VRAID 5100 /dev/sdh
[4:0:0:3] disk DGC VRAID 5100 /dev/sdj
[4:0:0:4] disk DGC VRAID 5100 /dev/sdl
[4:0:0:5] disk DGC VRAID 5100 /dev/sdn
[5:0:0:0] disk DGC LUNZ 5100 /dev/sdd
[5:0:0:1] disk DGC VRAID 5100 /dev/sde
[5:0:0:2] disk DGC VRAID 5100 /dev/sdg
[5:0:0:3] disk DGC VRAID 5100 /dev/sdi
[5:0:0:4] disk DGC VRAID 5100 /dev/sdk
[5:0:0:5] disk DGC VRAID 5100 /dev/sdm
~ # multipath -ll
mpathe (3600601607dd30a00afaa3b5faa581574) dm-4 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 4:0:0:4 sdl 8:176 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 5:0:0:4 sdk 8:160 active ready running
mpathd (3600601607dd30a00aeaa3b5f08d56f3f) dm-3 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 5:0:0:3 sdi 8:128 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 4:0:0:3 sdj 8:144 active ready running
mpathc (3600601607dd30a00aeaa3b5fa29d0a47) dm-2 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 4:0:0:2 sdh 8:112 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 5:0:0:2 sdg 8:96 active ready running
mpathb (3600601607dd30a00adaa3b5f7d6cdd27) dm-1 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 5:0:0:1 sde 8:64 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 4:0:0:1 sdf 8:80 active ready running
mpathf (3600601607dd30a00acaa3b5fa126fa52) dm-5 DGC,VRAID
size=5.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 4:0:0:5 sdn 8:208 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
`- 5:0:0:5 sdm 8:192 active ready running
## To run I/O through multipath, use the corresponding 'dm-X' node (e.g. /dev/dm-5) as the target device
~ # ls -l /dev/dm*
brw-rw---- 1 root disk 254, 0 Jun 8 04:24 /dev/dm-0
brw-rw---- 1 root disk 254, 1 Aug 18 06:24 /dev/dm-1
brw-rw---- 1 root disk 254, 2 Aug 18 06:24 /dev/dm-2
brw-rw---- 1 root disk 254, 3 Aug 18 06:24 /dev/dm-3
brw-rw---- 1 root disk 254, 4 Aug 18 06:24 /dev/dm-4
brw-rw---- 1 root disk 254, 5 Aug 19 02:55 /dev/dm-5
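For example, a read-only fio latency check against one of the multipath devices (the device name /dev/dm-5 is taken from the listing above; keep the run read-only unless the LUN contains no data you care about) could look like:
fio --name=mpath-randread --filename=/dev/dm-5 --ioengine=libaio --direct=1 --rw=randread --bs=8k --iodepth=16 --runtime=60 --time_based --group_reporting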
~ # lsscsi
[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[0:0:1:0] disk VMware Virtual disk 1.0 /dev/sdb
[2:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0
[4:0:0:0] disk DGC LUNZ 5100 /dev/sdc
[4:0:0:1] disk DGC VRAID 5100 /dev/sdf
[4:0:0:2] disk DGC VRAID 5100 /dev/sdh
[4:0:0:3] disk DGC VRAID 5100 /dev/sdj
[4:0:0:4] disk DGC VRAID 5100 /dev/sdl
[5:0:0:0] disk DGC LUNZ 5100 /dev/sdd
[5:0:0:1] disk DGC VRAID 5100 /dev/sde
[5:0:0:2] disk DGC VRAID 5100 /dev/sdg
[5:0:0:3] disk DGC VRAID 5100 /dev/sdi
[5:0:0:4] disk DGC VRAID 5100 /dev/sdk
or
~ # lsscsi -w -c -l
~ # ls /dev/sd*
/dev/sda /dev/sda1 /dev/sdb /dev/sdb1 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl
~ # lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 50G 0 disk
└─sda1 8:1 0 50G 0 part /
sdb 8:16 0 300G 0 disk
└─sdb1 8:17 0 300G 0 part /user_data_disk
sdc 8:32 0 5G 0 disk
sdd 8:48 0 5G 0 disk
sde 8:64 0 5G 0 disk
└─mpathb 254:1 0 5G 0 mpath
sdf 8:80 0 5G 0 disk
└─mpathb 254:1 0 5G 0 mpath
sdg 8:96 0 5G 0 disk
└─mpathc 254:2 0 5G 0 mpath
sdh 8:112 0 5G 0 disk
└─mpathc 254:2 0 5G 0 mpath
sdi 8:128 0 5G 0 disk
└─mpathd 254:3 0 5G 0 mpath
sdj 8:144 0 5G 0 disk
└─mpathd 254:3 0 5G 0 mpath
sdk 8:160 0 5G 0 disk
└─mpathe 254:4 0 5G 0 mpath
sdl 8:176 0 5G 0 disk
└─mpathe 254:4 0 5G 0 mpath
sr0 11:0 1 1024M 0 rom
loop0 7:0 0 100G 0 loop
└─docker-8:17-272252041-pool 254:0 0 100G 0 dm
loop1 7:1 0 2G 0 loop
└─docker-8:17-272252041-pool 254:0 0 100G 0 dm
## Once the disk is seen by the OS, execute `multipath` or `multipath -v4` (verbose) to build the new MPIO map
~ # multipath -v4
## On SLES 10 execute `udevtrigger`. (On SLES 11, `udevadm trigger` is executed automatically.)
- LUNs are not seen by the driver
lsscsi can be used to check whether the SCSI devices are seen correctly by the OS. When the LUNs are not seen by the HBA driver, check the zoning setup of the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
- LUNs are seen by the driver, but there are no corresponding block devices
When LUNs are seen by the HBA driver, but not as block devices, additional kernel parameters are needed to change the SCSI device scanning behavior, e.g. to indicate that LUNs are not numbered consecutively.
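As a hedged example (exact parameter names depend on the kernel version and the HBA driver, so verify them against your driver's documentation), you can also force a wildcard rescan of all channels/targets/LUNs on a SCSI host through sysfs; host4 is taken from the lsscsi output above. On older kernels of the SLES 10/11 era, the number of LUNs probed could additionally be raised with a module parameter such as scsi_mod.max_luns, which is an assumption to confirm for your kernel.
~ # echo "- - -" > /sys/class/scsi_host/host4/scan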