Getting started with storage research
Welcome to the group! Thanks for your interest in joining the research work in the storage domain.
We broadly work with high-performance NVMe storage devices and research how the modern storage stack on top of them should be designed and built.
Goal: The high-level goal is to find an artifact and try to modify, enhance, and develop it. With systems programming, learning-by-doing is the only way forward. There is only so much you can learn by looking at slides and textbooks.
Understand storage basics:
- https://pages.cs.wisc.edu/~remzi/OSTEP/ (chapters 36, 37, 39, 40, and 44)
- Modern Operating Systems, 4th Edition, https://csc-knu.github.io/sys-prog/books/Andrew%20S.%20Tanenbaum%20-%20Modern%20Operating%20Systems.pdf (chapters 4 and 5)
- The 5th edition is also out now (June 2023)
- Linux specific, https://en.wikibooks.org/wiki/The_Linux_Kernel (work in progress)
Read past BSc and MSc theses from the group on storage topics: https://animeshtrivedi.github.io/team/
Read one paper and see how much of it you can understand:
- Performance Characterization of Modern Storage Stacks: POSIX I/O, libaio, SPDK, and io_uring, https://dl.acm.org/doi/10.1145/3578353.3589545
Do some "hello storage" coding in QEMU (this section needs more specific, step-by-step instructions)
- What is QEMU? A whole-system emulator in which we do our development work, so that when we mess up the code, it does not crash or corrupt our machine. https://www.qemu.org/
- Set up an NVMe development environment in a VM by following these instructions: https://qemu-project.gitlab.io/qemu/system/devices/nvme.html
- Write a simple program that creates a file, writes to it, and then reads it back. It should report how much time it takes to repeat this sequence 100,000 times.
- You can ask us for the Storage Systems handbook (which we used in the MSc course) for a more detailed step-by-step guide on setting up NVMe devices with QEMU.
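The file exercise above (create, write, read back, and time the loop) could be sketched as follows. This is a minimal sketch in Python chosen for brevity, not a reference solution; the function name `time_file_roundtrips`, the file name `hello_storage.dat`, and the `sync` parameter are made up for illustration. Note the assumption that matters most: without `fsync`, the writes may be served entirely from the page cache and never reach the device.

```python
import os
import tempfile
import time


def time_file_roundtrips(iterations, directory, payload=b"hello storage\n", sync=False):
    """Create, write, read back, and delete a file `iterations` times.

    Returns the total elapsed wall-clock time in seconds.
    """
    # Hypothetical file name, used only for this sketch.
    path = os.path.join(directory, "hello_storage.dat")
    start = time.perf_counter()
    for _ in range(iterations):
        # Create (or truncate) the file and write the payload.
        with open(path, "wb") as f:
            f.write(payload)
            if sync:
                # Push the data out of the page cache to the device.
                f.flush()
                os.fsync(f.fileno())
        # Read the file back and check that the contents match.
        with open(path, "rb") as f:
            data = f.read()
        if data != payload:
            raise RuntimeError("read back different data than written")
        os.remove(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    iterations = 1_000  # the exercise asks for 100,000; start small
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = time_file_roundtrips(iterations, tmp)
    print(f"{iterations} iterations in {elapsed:.3f} s "
          f"({elapsed / iterations * 1e6:.1f} us per iteration)")
```

The temporary directory used here may live on a RAM-backed file system. To measure the emulated NVMe device, pass a directory on a file system mounted from that device, and compare the results with `sync=False` and `sync=True`.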
Consider answering the following questions to see if you understand what is happening at the device driver level
- What is NVMe? Where is the NVMe code in the Linux source code? (Hint: https://en.wikibooks.org/wiki/The_Linux_Kernel/Storage#Storage_drivers)
- Do you understand the various parameters that you can pass to this device driver?
- Do you understand how an I/O request is prepared and completed?
- Can you write a device driver for NVMe devices in Linux from scratch? (You can even do this in user space using libpci.)
- Can you write an infinitely fast NVMe storage device?
- ...
Learn how to interact with block devices, and how to change them
- Can you write a simple block device in memory? See the Linux RAM block device driver, https://github.com/torvalds/linux/blob/master/drivers/block/brd.c.
- What are the design considerations of a block layer? What are its responsibilities?
- Can you benchmark the performance at the block layer of a storage device?
- What is an I/O scheduler, and what does it do for NVMe devices?
- Can you write a new I/O scheduler?
- Do you know how to build RAID? (See the md driver in Linux, https://linux.die.net/man/4/md)
- ...
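As a warm-up for the in-memory block device question above, the sector-addressed read/write interface of a block device can be modelled in a few lines of user-space Python. This is only a toy model under stated assumptions, not a kernel driver and not how `brd.c` is implemented: the class name `RamBlockDevice`, the 512-byte default sector size, and the lazy allocation of sectors are all choices made for this sketch.

```python
class RamBlockDevice:
    """A toy RAM-backed block device addressed in fixed-size sectors."""

    def __init__(self, num_sectors, sector_size=512):
        self.num_sectors = num_sectors
        self.sector_size = sector_size
        # Sectors are allocated lazily on first write;
        # sectors that were never written read back as zeros.
        self._sectors = {}

    def _check_range(self, lba, count):
        # Reject any request that falls outside the device.
        if lba < 0 or count < 0 or lba + count > self.num_sectors:
            raise ValueError("I/O beyond the end of the device")

    def write(self, lba, data):
        """Write whole sectors starting at logical block address `lba`."""
        if len(data) % self.sector_size != 0:
            raise ValueError("length must be a multiple of the sector size")
        count = len(data) // self.sector_size
        self._check_range(lba, count)
        for i in range(count):
            offset = i * self.sector_size
            self._sectors[lba + i] = bytes(data[offset:offset + self.sector_size])

    def read(self, lba, count):
        """Read `count` sectors starting at logical block address `lba`."""
        self._check_range(lba, count)
        zero = bytes(self.sector_size)
        return b"".join(self._sectors.get(lba + i, zero) for i in range(count))


if __name__ == "__main__":
    dev = RamBlockDevice(num_sectors=8)
    dev.write(2, b"A" * 512)
    print(dev.read(2, 1)[:4], len(dev.read(0, 8)))
```

A kernel block driver has the same core contract (fixed-size sectors, bounds checks, reads of unwritten space returning defined data) but additionally has to register with the block layer and handle request queues, which is what reading `brd.c` will show you.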
Learn how to install a file system on a block device
- Can you compile and install a file system from scratch? (see the mount command)
- Can you write your own file system?
- What is FUSE? https://www.kernel.org/doc/html/next/filesystems/fuse.html
- Can you write and compile a hello world file system in FUSE?
- Where is the source code of the ext4, F2FS, and XFS file systems in the kernel source tree?
- ...
Learn about common storage applications
- Key-value stores: redis, memcached, RocksDB
- Databases: MySQL, DuckDB, PostgreSQL and friends
- Workload specific storage: graphs (GraphDB, neo4j)
Benchmarks:
- fio, Filebench, YCSB
- https://github.com/mlcommons/storage
What does benchmarking a storage stack or an application mean?
- How to calculate latency, bandwidth, and IOPS
- What are the 90th, 95th, and 99th percentile latencies?
- https://cacm.acm.org/magazines/2018/7/229031-always-measure-one-level-deeper/fulltext
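To make the metrics above concrete, here is a minimal sketch that derives IOPS, bandwidth, and tail latencies from a list of per-request latencies. The function names `percentile` and `summarize` are made up for this example, the sample values are synthetic, and the percentile uses the nearest-rank definition, which is one of several common definitions; tools such as fio may compute theirs differently.

```python
import math


def percentile(sorted_samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(latencies_us, io_size_bytes, elapsed_s):
    """Summarize a run of equally sized I/O requests.

    latencies_us:  per-request completion latencies in microseconds
    io_size_bytes: size of each request in bytes
    elapsed_s:     wall-clock duration of the whole run in seconds
    """
    ordered = sorted(latencies_us)
    n = len(ordered)
    return {
        "iops": n / elapsed_s,                               # requests per second
        "bandwidth_MBps": n * io_size_bytes / elapsed_s / 1e6,  # decimal MB/s
        "mean_us": sum(ordered) / n,
        "p90_us": percentile(ordered, 90),
        "p95_us": percentile(ordered, 95),
        "p99_us": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # Synthetic data: 1,000 requests of 4 KiB, a few of them slow outliers.
    samples = [80] * 950 + [200] * 40 + [5000] * 10
    for name, value in summarize(samples, 4096, elapsed_s=0.1).items():
        print(f"{name}: {value}")
```

In this synthetic run the mean is pulled up by ten outliers while the median stays low, which is why storage papers report tail percentiles alongside averages.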
Learn how to use eBPF for systems profiling: https://ebpf.io/
- See our notes on this: https://github.com/stonet-research/stonet-research.github.io/wiki/Tracing-with-BPF
- See example code: https://github.com/stonet-research/zns-tools
- Write a "hello world" kernel module
- Set up communication between a user application and your kernel module. You may consider using mmap or ioctl.
- Add a /sys and/or /proc configuration parameter to control various settings in your kernel module
- Set up QEMU and the kernel so that when the kernel crashes, you can debug it. See the kdump and crash utilities: https://ubuntu.com/server/docs/kernel-crash-dump.