Skip to content

animeshtrivedi/storage-systems-wiki-reading-list

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

90 Commits
 
 
 
 

Repository files navigation

Welcome to the VU Amsterdam - storage-systems wiki!

We will collect and grow the reading list for the Storage Systems class (https://atlarge-research.com/courses/storage-systems-vu/) at VU Amsterdam.


If you find material from the class useful, then consider citing our work as PDF:

Reviving Storage Systems Education in the 21st Century — An experience report, Animesh Trivedi, Matthijs Jansen, Krijn Doekemeijer, Sacheendra Talluri, Nick Tehrany (2024 May) In 2024 IEEE/ACM 24rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid'24), https://www.computer.org/csdl/proceedings-article/ccgrid/2024/956600a616/20ShedG8GeQ.

@INPROCEEDINGS {2024-ccgrid-stosys-experience,
author = { Trivedi, Animesh and Jansen, Matthijs and Doekemeijer, Krijn and Talluri, Sacheendra and Tehrany, Nick },
booktitle = { 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) },
title = { Reviving Storage Systems Education in the 21st Century — An experience report },
year = {2024},
volume = {},
ISSN = {},
pages = {616-625},
doi = {10.1109/CCGrid59990.2024.00074},
url = {https://doi.ieeecomputersociety.org/10.1109/CCGrid59990.2024.00074},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {May}
}

Contributions: please open a pull request with 1-2 line description of the paper!

Table of Content

  1. NVM storage, introduction, device-level details
  2. Host interfacing, OS and Storage I/O Stack
  3. SSD internals (FTL, GC, buffering, staging, scheduling)
  4. NVMe, Flash, PMEM, SSD File Systems
  5. Key-Value Storage and Caches
  6. Persistent Memories / disaggregation / CXL
  7. Networked/distributed Flash/NVMoF/Storage Disaggregation
  8. Programmable storage, acceleration, offloading, computational storage, workload-specific storage
  9. Storage Virtualization, Emulation, Simulation
  10. Flash I/O Scheduling and quality-of-service/multi-tenancy
  11. Reliability and failures studies
  12. Graphs Storage and Processing Systems
  13. Performance, Efficiency, Scalability
  14. NVM storage and Energy consumption
  15. Database, Timeseries, VectorDB, Lookup, Indexes on Storage
  16. Emerging systems architectures
  17. Emerging storage interfaces and features
  18. SNIA/NVMe weblinks
  19. Benchmarking, traces, profiling, monitoring, and characterization
  20. RAID, Compression, De-duplication
  21. ML and (Storage) Systems
  22. A selection of storage related surveys

1. NVM storage, introduction, device-level details

  • Michael Cornwell. 2012. Anatomy of a Solid-state Drive: While the ubiquitous SSD shares many features with the hard-disk drive, under the surface they are completely different. Queue 10, 10 (October 2012), 30–36. https://doi.org/10.1145/2381996.2385276
  • Mihir Nanavati, Malte Schwarzkopf, Jake Wires, and Andrew Warfield. 2015. Non-volatile Storage: Implications of the Datacenter’s Shifting Center. Queue 13, 9 (November-December 2015), 33–56. https://doi.org/10.1145/2857274.2874238
  • Ethan Miller, Achilles Benetopoulos, George Neville-Neil, Pankaj Mehra, and Daniel Bittman. 2023. Pointers in Far Memory: A rethink of how data and computations should be organized. Queue 21, 3, Pages 50 (May/June 2023), 19 pages. https://doi.org/10.1145/3606029

2. Host interfacing, OS and Storage I/O Stack

3. SSD internals (FTL, GC, buffering, staging, scheduling)

  • HMB-SSD: Framework for Efficient Exploiting of the Host Memory Buffer in the NVMe SSD, https://ieeexplore.ieee.org/abstract/document/8868151
  • Improving Flash Storage Performance by Caching Address Mapping Table in Host Memory, https://www.usenix.org/conference/hotstorage17/program/presentation/jeong
  • Fantastic SSD internals and how to learn and use them. In Proceedings of the 15th ACM International Conference on Systems and Storage (SYSTOR '22). Association for Computing Machinery, New York, NY, USA, 72–84. https://doi.org/10.1145/3534056.3534940
  • LavaStore: ByteDance’s Purpose-built, High-performance, Cost-effective Local Storage Engine for Cloud Services, https://www.vldb.org/pvldb/vol17/p3799-jiao.pdf
  • Jinghan Sun, Shaobo Li, Yunxin Sun, Chao Sun, Dejan Vucinic, and Jian Huang. 2023. LeaFTL: A Learning-Based Flash Translation Layer for Solid-State Drives. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 442–456. https://doi.org/10.1145/3575693.3575744
  • Aayush Gupta, Youngjae Kim, and Bhuvan Urgaonkar. 2009. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In Proceedings of the 14th international conference on Architectural support for programming languages and operating systems (ASPLOS XIV). Association for Computing Machinery, New York, NY, USA, 229–240. https://doi.org/10.1145/1508244.1508271
  • S. Jiang, Lei Zhang, XinHao Yuan, Hao Hu and Yu Chen, "S-FTL: An efficient address translation for flash memory by exploiting spatial locality," 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), Denver, CO, 2011, pp. 1-12, doi: 10.1109/MSST.2011.5937215.
  • You Zhou, Fei Wu, Ping Huang, Xubin He, Changsheng Xie, and Jian Zhou. 2015. An efficient page-level FTL to optimize address translation in flash memory. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys '15). Association for Computing Machinery, New York, NY, USA, Article 12, 1–16. https://doi.org/10.1145/2741948.2741949
  • LearnedFTL: A Learning-Based Page-Level FTL for Reducing Double Reads in Flash-Based SSDs, https://doi.ieeecomputersociety.org/10.1109/HPCA57654.2024.00054
  • Rino Micheloni, Alessia Marelli, and Kam Eshghi. 2012. Inside Solid State Drives (SSDs). Springer Publishing Company, Incorporated. https://dl.acm.org/doi/10.5555/2412028

4. NVMe, Flash, PMEM, SSD File Systems

5. Key-Value Storage and Caches

6. Persistent Memories / disaggregation / CXL

7. Networked/distributed Flash/NVMoF/Storage Disaggregation

8. Programmable storage, acceleration, offloading, computational storage, workload-specific storage

9. Storage Virtualization, Emulation, Simulation

10. Flash I/O Scheduling and quality-of-service/multi-tenancy

11. Reliability and failures studies

12. Graphs Storage and Processing Systems

13. Performance, Efficiency, Scalability

14. NVM storage and Energy consumption

  • Can Storage Devices be Power Adaptive? https://theanoli.github.io/files/hotstorage24-xie.pdf
  • EMPower: The Case for a Cloud Power Control Plane https://hotcarbon.org/assets/2024/pdf/hotcarbon24-final57.pdf
  • The carbon footprint of distributed cloud storage https://arxiv.org/abs/1803.06973 (2019)
  • On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud. In Proceedings of the Nineteenth European Conference on Computer Systems (EuroSys '24). Association for Computing Machinery, New York, NY, USA, 924–941. https://doi.org/10.1145/3627703.3650079
  • Energy Implications of IO Interface Design Choices. In Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage '23). Association for Computing Machinery, New York, NY, USA, 58–64. https://doi.org/10.1145/3599691.3603411
  • When poll is more energy efficient than interrupt. In Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems, HotStorage ’22, page 59–64, New York, NY, USA, 2022. Association for Computing Machinery. https://www.hotstorage.org/2022/slides/hotstorage22-paper44-presentation_slides.pdf
  • Ultra‐low latency ssds’ impact on overall energy efficiency. In Proceedings of the 12th USENIX Conference on Hot Topics in Storage and File Systems, HotStorage’20, USA, 2020. USENIX Association. https://www.usenix.org/system/files/hotstorage20-paper61-slides-harris.pdf
  • On the energy overhead of mobile storage systems. In Proceedings of the 12th USENIX Conference on File and Storage Technologies, FAST’14, page 105–118, USA, 2014. USENIX Association.
  • Storage on your smartphone uses more energy than you think. In Proceedings of the 9th USENIX Conference on Hot Topics in Storage and File Systems, HotStorage’17, page 9, USA, 2017. USENIX Association.
  • Hamming Tree: The Case for Energy-Aware Indexing for NVMs. In Proceedings of the ACM on Management of Data. SIGMOD'2023. ACM New York, NY, USA, 2023.
  • Treehouse: A Case For Carbon-Aware Datacenter Software. SIGENERGY Energy Inform. Rev. 3, 3 (October 2023), 64–70. https://doi.org/10.1145/3630614.3630626

15. Database, Timeseries, VectorDB, Lookup, Indexes on Storage

  • SingleStore-V: An Integrated Vector Database System in SingleStore, VLDB 2024, https://doi.org/10.14778/3685800.3685805
  • Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. 2024. Why TPC is Not Enough: An Analysis of the Amazon Redshift Fleet. Proc. VLDB Endow. 17, 11 (July 2024), 3694–3706. https://doi.org/10.14778/3681954.3682031
  • Apache TsFile: An IoT-native Time Series File Format, VLDB 2024.
  • Cloud-Native Database Systems and Unikernels: Reimagining OS Abstractions for Modern Hardware. Proc. VLDB Endow. 17, 8 (April 2024), 2115–2122. https://doi.org/10.14778/3659437.3659462
  • An Empirical Evaluation of Columnar Storage Formats. Proc. VLDB Endow. 17, 2 (October 2023), 148–161. https://doi.org/10.14778/3626292.3626298
  • Are There Fundamental Limitations in Supporting Vector Data Management in Relational Databases? A Case Study of PostgreSQL. Proceedings of International Conference on Data Engineering (ICDE), 2024.
  • Vector Database Management Techniques and Systems. In Companion of the 2024 International Conference on Management of Data (SIGMOD/PODS '24). Association for Computing Machinery, New York, NY, USA, 597–604. https://doi.org/10.1145/3626246.3654691
  • Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21). Association for Computing Machinery, New York, NY, USA, 2614–2627. https://doi.org/10.1145/3448016.3457550
  • Vexless: A Serverless Vector Data Management System Using Cloud Functions. Proc. ACM Manag. Data 2, 3, Article 187 (June 2024), 26 pages. https://doi.org/10.1145/3654990
  • Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines. Proc. ACM Manag. Data 2, 1, Article 47 (February 2024), 28 pages. https://doi.org/10.1145/3639302
    • We present Limousine, a self-designing key-value storage engine, that can automatically morph to the near-optimal storage engine architecture shape given a workload, a cloud budget, and target performance
  • Viktor Leis. 2024. LeanStore: A High-Performance Storage Engine for NVMe SSDs. PVLDB 17(12). https://www.vldb.org/pvldb/vol17/p4536-leis.pdf
  • What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines https://www.vldb.org/pvldb/vol16/p2090-haas.pdf
  • AirIndex: Versatile Index Tuning Through Data and Storage, https://dl.acm.org/doi/10.1145/3617308
  • Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment. Proc. ACM Manag. Data 2, 1, Article 14 (February 2024), 27 pages. https://doi.org/10.1145/3639269
  • Chenguang Fang, Zijie Chen, Shaoxu Song, Xiangdong Huang, Chen Wang, and Jianmin Wang. 2024. On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDB. Proc. VLDB Endow. 17, 11 (July 2024), 2974–2986. https://doi.org/10.14778/3681954.3681977
  • Time Series Representation for Visualization in Apache IoTDB. Proc. ACM Manag. Data 2, 1, Article 35 (February 2024), 26 pages. https://doi.org/10.1145/3639290
  • MOST: Model-Based Compression with Outlier Storage for Time Series Data, (SIGMOD 2023) https://dl.acm.org/doi/10.1145/3626737.
  • Jalal Mostafa, Sara Wehbi, Suren Chilingaryan, Andreas Kopmann. 2022. SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things. https://arxiv.org/abs/2204.09795v2
    • Peter Bailis, Camille Fournier, Joy Arulraj, and Andy Pavlo. 2016. Research for Practice: Distributed Consensus and Implications of NVM on Database Management Systems: Expert-curated Guides to the Best of CS Research. Queue 14, 3 (May-June 2016), 53–67. https://doi.org/10.1145/2956641.2967618

16. Emerging systems architectures

17. Emerging storage interfaces and features

17.1 ZNS: Explanations, research directions and ZNS extensions

17.2 Other interfaces

  • The Design and Implementation of a Capacity-Variant Storage System, Ziyang Jiao and Xiangqun Zhang, Syracuse University; Hojin Shin and Jongmoo Choi, Dankook University; Bryan S. Kim, Syracuse University, https://www.usenix.org/conference/fast24/presentation/jiao
  • LightNVM: The Linux Open-Channel SSD Subsystem, https://www.usenix.org/system/files/conference/fast17/fast17-bjorling.pdf
  • "KAML: A Flexible, High-Performance Key-Value SSD," 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, USA, 2017, pp. 373-384, doi: 10.1109/HPCA.2017.15. https://ieeexplore.ieee.org/abstract/document/7920840/
  • "Elevating Commodity Storage with the SALSA Host Translation Layer," 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Milwaukee, WI, USA, 2018, pp. 277-292, doi: 10.1109/MASCOTS.2018.00035. https://ieeexplore.ieee.org/abstract/document/8526893/
  • AutoStream: automatic stream management for multi-streamed SSDs. In Proceedings of the 10th ACM International Systems and Storage Conference (SYSTOR '17). Association for Computing Machinery, New York, NY, USA, Article 3, 1–11. https://doi.org/10.1145/3078468.3078469
  • Jeong-Uk Kang, Jeeseok Hyun, Hyunjoo Maeng, and Sangyeun Cho. 2014. The multi-streamed solid-state drive. In Proceedings of the 6th USENIX conference on Hot Topics in Storage and File Systems (HotStorage'14). USENIX Association, USA, 13.

17.3 FDP

  • J. Park, H. Kim, J. Ha, H. Jung and H. Eom, "FDPVirt: Flexible Data Placement SSD Emulator," 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Kobe, Japan, 2024, pp. 198-199, doi: 10.1109/CLUSTERWorkshops61563.2024.00057.

18. SNIA/NVMe weblinks

19. Benchmarking, traces, profiling, monitoring, and characterization

  • LST-Bench: Benchmarking Log-Structured Tables in the Cloud. Proc. ACM Manag. Data 2, 1, Article 59 (February 2024), 26 pages. https://doi.org/10.1145/3639314.
  • Phitchaya Mangpo Phothilimthana, Saurabh Kadekodi, Soroush Ghodrati, Selene Moon, and Martin Maas. 2024. Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS '24), Vol. 3. Association for Computing Machinery, New York, NY, USA, 1016–1032. https://doi.org/10.1145/3620666.3651337
  • Jinhong Li, Qiuping Wang, Patrick P. C. Lee, and Chao Shi. 2023. An In-depth Comparative Analysis of Cloud Block Storage Workloads: Findings and Implications. ACM Trans. Storage 19, 2, Article 16 (May 2023), 32 pages. https://doi.org/10.1145/3572779
  • Gala Yadgar, MOSHE Gabel, Shehbaz Jaffer, and Bianca Schroeder. 2021. SSD-based Workload Characteristics and Their Performance Implications. ACM Trans. Storage 17, 1, Article 8 (February 2021), 26 pages. https://doi.org/10.1145/3423137
  • A. K. Paul, O. Faaland, A. Moody, E. Gonsiorowski, K. Mohror and A. R. Butt, "Understanding HPC Application I/O Behavior Using System Level Statistics," 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), Pune, India, 2020, pp. 202-211, doi: 10.1109/HiPC50609.2020.00034.
  • S. Kavalanekar, B. Worthington, Qi Zhang and V. Sharda, "Characterization of storage workload traces from production Windows Servers," 2008 IEEE International Symposium on Workload Characterization, Seattle, WA, 2008, pp. 119-128, doi: 10.1109/IISWC.2008.4636097.
  • Tirthak Patel, Suren Byna, Glenn K. Lockwood, and Devesh Tiwari. 2019. Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19). Association for Computing Machinery, New York, NY, USA, Article 65, 1–13. https://doi.org/10.1145/3295500.3356183
  • Omkar Desai, Seungmin Shin, Eunji Lee, and Bryan S. Kim. 2022. A principled approach for selecting block I/O traces. In Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage '22). Association for Computing Machinery, New York, NY, USA, 52–58. https://doi.org/10.1145/3538643.3539754
  • Large-Scale Analysis of Docker Images and Performance Implications for Container Storage Systems. IEEE Trans. Parallel Distributed Syst. 32(4): 918-930 (2021)
  • Marc-André Vef, Vasily Tarasov, Dean Hildebrand, and André Brinkmann. 2018. Challenges and Solutions for Tracing Storage Systems: A Case Study with Spectrum Scale. ACM Trans. Storage 14, 2, Article 18 (May 2018), 24 pages. https://doi.org/10.1145/3149376
  • Yang Liu, Raghul Gunasekaran, Xiaosong Ma, and Sudharshan S. Vazhkudai. 2014. Automatic identification of application I/O signatures from noisy server-side traces. In Proceedings of the 12th USENIX conference on File and Storage Technologies (FAST'14). USENIX Association, USA, 213–228.
  • I. Ahmad, "Easy and Efficient Disk I/O Workload Characterization in VMware ESX Server," 2007 IEEE 10th International Symposium on Workload Characterization, Boston, MA, USA, 2007, pp. 149-158, doi: 10.1109/IISWC.2007.4362191.
  • Jayanta Basak, Kushal Wadhwani, and Kaladhar Voruganti. 2016. Storage Workload Identification. ACM Trans. Storage 12, 3, Article 14 (June 2016), 30 pages. https://doi.org/10.1145/2818716
  • Ajay Gulati, Chethan Kumar, and Irfan Ahmad. Storage workload characterization and consolidation in virtualized environments. In Proc. Int'l Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT'09), 2009.
  • Bin Yang, Wei Xue, Tianyu Zhang, Shichao Liu, Xiaosong Ma, Xiyang Wang, and Weiguo Liu. 2023. End-to-end I/O Monitoring on Leading Supercomputers. ACM Trans. Storage 19, 1, Article 3 (February 2023), 35 pages. https://doi.org/10.1145/3568425 (NSDI: https://www.usenix.org/conference/nsdi19/presentation/yang)
  • V. Tarasov, S. Kumar, J. Ma, D. Hildebrand, A. Povzner, G. Kuenning, and E. Zadok. 2012. Extracting flexible, replayable models from large block traces. In Proceedings of the 10th USENIX conference on File and Storage Technologies (FAST'12). USENIX Association, USA, 22. https://static.usenix.org/events/fast12/tech/full_papers/Tarasov.pdf
  • COSBench: cloud object storage benchmark. In Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering (ICPE '13). Association for Computing Machinery, New York, NY, USA, 199–210. https://doi.org/10.1145/2479871.2479900

20. RAID, Compression, De-duplication

21. ML and (Storage) Systems

22. A selection of storage related surveys

23. Company specific stack and details (biased towards AI/ML)

23.1 Alibaba

* "Optimizing NVMe Storage for Large-Scale Deployment: Key Technologies and Strategies in Alibaba Cloud," in IEEE Micro, vol. 44, no. 5, pp. 47-56, Sept.-Oct. 2024, doi: 10.1109/MM.2024.3426514.

23.2 Baidu

* CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections. In Proceedings of the Eighteenth European Conference on Computer Systems (EuroSys '23). Association for Computing Machinery, New York, NY, USA, 331–346. https://doi.org/10.1145/3552326.3587443

23.3 Meta

* Facebook's Tectonic Filesystem: Efficiency from Exascale, https://www.usenix.org/conference/fast21/presentation/pan 
* Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training, https://www.usenix.org/conference/atc23/presentation/zhao 
* 2024, Training LlaMa - A Storage Perspective | Sumit Gupta & Robin Battey, https://www.facebook.com/atscaleevents/videos/training-llama-a-storage-perspective-sumit-gupta-robin-battey/1146039320066497/ 
* 2024, Building Meta’s GenAI Infrastructure, https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/ 
* 2022, Scaling data ingestion for machine learning training at Meta, https://engineering.fb.com/2022/09/19/ml-applications/data-ingestion-machine-learning-training-meta/ 
* The Llama 3 Herd of Models, https://arxiv.org/pdf/2407.21783, 2024.  

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •