Chapter 01: Understanding Filesystems

Welcome, friend. If you're reading this, you probably want to know how ls works under the hood. It's one of those commands we type a hundred times a day without really thinking about it. But when you peel back the layers, you find a beautiful machine essentially unchanged since the 1970s.

We can't really build ls until we understand what we're looking at. So, before we write a single line of C, we need to talk about filesystems. And fair warning: we aren't going to memorize textbook definitions. We're going to build a mental model of how data actually lives on your disk.

The Big Lie: Filenames

Here is the first thing you need to internalize: Files do not have names.

I know, I know. You see "main.c" right there in your project folder. But the filesystem doesn't see "main.c". It sees an inode.

Think of a filesystem like a massive library. But instead of books being on shelves with titles on the spine, every book is just thrown into a giant pile in the back room, and each book has a unique serial number stamped on it. That serial number is the inode (index node).

The inode holds almost everything about the file:

  • Who owns it?
  • How big is it?
  • When was it last touched?
  • Where are the actual data blocks on the disk?
  • But notably missing: the name of the file.
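To see this for yourself, here is a minimal sketch, assuming a POSIX system (Linux, macOS, the BSDs), that asks the kernel for an inode's metadata with stat. The function name show_inode is hypothetical, chosen just for this illustration. Notice that struct stat has no field for the name: the path we pass in is only used to find the inode. Also note that stat doesn't reveal where the data blocks live on disk; that mapping stays inside the filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: print some of the metadata the kernel keeps in the
   inode behind `path`. struct stat has no "name" field; the path is only
   used to locate the inode. Returns 0 on success, -1 if it can't be found. */
int show_inode(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1) {
        perror(path);
        return -1;
    }
    printf("inode:    %lu\n", (unsigned long)st.st_ino);    /* the "serial number" */
    printf("owner:    uid %lu\n", (unsigned long)st.st_uid); /* who owns it */
    printf("size:     %lld bytes\n", (long long)st.st_size); /* how big it is */
    printf("links:    %lu\n", (unsigned long)st.st_nlink);  /* how many names point here */
    printf("modified: %s", ctime(&st.st_mtime));            /* ctime's string ends in '\n' */
    return 0;
}
```

Call it from a main of your own, for example show_inode("main.c"), and compare the inode number with what ls -i main.c prints.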

So where does the name come from?

That's where directories come in. A directory is just a special file. It's a file that contains a list. And that list is incredibly simple. It maps names to inode numbers.

Directory entries:
  "main.c"   ->  Inode #4251
  "Makefile" ->  Inode #9923
  ".git"     ->  Inode #1024

When you type ls, you aren't asking the computer to "show me the files." You are asking it to "read this directory file and tell me what names are in the list."
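Here is what "read this directory file" can look like in C: a minimal sketch, assuming a POSIX system, that prints each name and the inode number stored next to it, in the same shape as the directory listing above. list_dir is a hypothetical helper name. Note that readdir also hands back the "." and ".." entries, and the order is whatever the filesystem happens to store, so ls sorts the names itself before printing.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print every (name -> inode) pair stored in the
   directory at `path`. Returns the number of entries printed, or -1 if
   the directory can't be opened. */
long list_dir(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    long count = 0;
    struct dirent *ent;
    /* Each readdir call yields the next entry: just a name and an inode number. */
    while ((ent = readdir(dir)) != NULL) {
        printf("%-20s -> Inode #%lu\n", ent->d_name, (unsigned long)ent->d_ino);
        count++;
    }

    closedir(dir);
    return count;
}
```

Calling list_dir(".") from your own main prints the raw map for the current directory, dot entries included.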

Just a Big Map

This design, separating the identity of the file (inode) from its label (filename), is one of the most brilliant decisions in Unix history.

Dennis Ritchie and Ken Thompson designed it this way for flexibility. Because of this split, we can do things like have multiple names for the same file.

Hard Links

If filenames are just labels pointing to an inode, what happens if we create two labels pointing to the same inode number?

"project_notes.txt" -> Inode #500
"todo_list.txt"     -> Inode #500

This is a hard link. Both filenames are equally valid. Neither is the "original." They are just two entry doors into the same room. If you delete "project_notes.txt", the file isn't gone. You just removed one label. The inode (and the data) persists until zero labels point to it and no process still has the file open.

This is why the rm command does its work with the unlink system call. It doesn't necessarily delete a file; it unlinks a name from an inode.
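A minimal sketch of both ideas, assuming a POSIX system and a filesystem that supports hard links: create a file, add a second name with link(), confirm that both names lead to the same inode with a link count of 2, then drop the first name with unlink() (the same operation rm performs) and check that the data is still reachable under the second. hard_link_demo and describe are hypothetical helper names; the file names are borrowed from the example above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: print the inode number and link count behind a name. */
static int describe(const char *name, struct stat *st)
{
    if (stat(name, st) == -1) { perror(name); return -1; }
    printf("%-40s -> Inode #%lu (links: %lu)\n", name,
           (unsigned long)st->st_ino, (unsigned long)st->st_nlink);
    return 0;
}

/* Hypothetical demo: create `first`, add the name `second` with link(2),
   then remove `first` with unlink(2). Returns 0 if both names shared one
   inode and the data survived under `second`, -1 otherwise. */
int hard_link_demo(const char *first, const char *second)
{
    int fd = open(first, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) { perror(first); return -1; }
    if (write(fd, "buy milk\n", 9) != 9) { close(fd); return -1; }
    close(fd);

    /* A second label for the same inode: no data is copied. */
    if (link(first, second) == -1) { perror("link"); return -1; }

    struct stat a, b;
    if (describe(first, &a) == -1 || describe(second, &b) == -1) return -1;
    if (a.st_dev != b.st_dev || a.st_ino != b.st_ino || a.st_nlink != 2) return -1;

    /* What rm does under the hood: remove one name, keep the inode. */
    if (unlink(first) == -1) { perror("unlink"); return -1; }
    if (describe(second, &b) == -1) return -1;

    /* Same inode, one fewer link, data intact. */
    return (b.st_ino == a.st_ino && b.st_nlink == 1 && b.st_size == 9) ? 0 : -1;
}
```

Run it in a scratch directory, for example hard_link_demo("project_notes.txt", "todo_list.txt"); afterwards only todo_list.txt remains, still holding the data.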

The Filesystem Hierarchy

So if directories are just files that list other files, and those other files can be directories too... we get a tree.

The root (/) is just a directory with a known inode number (2 on ext2/ext3/ext4, for example). Inside /, there's an entry for home -> Inode #10. Inside home, there's an entry for user -> Inode #55.

When you ask for /home/user/main.c, the kernel is hopping from map to map:

  1. Go to Inode 2 (Root)
  2. Look up "home" in its list -> Get Inode 10
  3. Go to Inode 10
  4. Look up "user" in its list -> Get Inode 55
  5. Go to Inode 55
  6. Look up "main.c" -> Get Inode 4251

And there's your file.
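The hop-by-hop walk above can be imitated from user space. This is a sketch under stated assumptions: a POSIX system, an absolute path, and inode number 0 meaning "not found" (we assume no live entry uses it). It starts from the root's inode, then for each path component opens the directory reached so far and scans its entries for the next name. Since opendir itself asks the kernel to resolve each prefix, and the kernel's real lookup is internal and heavily cached, treat walk_path and lookup (both hypothetical names) as an illustration of the steps, not a replacement for them. One quirk: at a mount point, the inode number stored in the parent directory can differ from the one stat reports, because stat sees the root of the mounted filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: scan the directory at `dirpath` for `name`.
   Returns its inode number, or 0 if the name isn't in the list. */
static unsigned long lookup(const char *dirpath, const char *name)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL) return 0;
    unsigned long ino = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, name) == 0) {
            ino = (unsigned long)ent->d_ino;
            break;
        }
    }
    closedir(dir);
    return ino;
}

/* Hypothetical helper: resolve an absolute path one component at a time,
   printing each hop. Returns the final inode number, or 0 on failure. */
unsigned long walk_path(const char *abspath)
{
    struct stat root;
    if (abspath[0] != '/' || stat("/", &root) == -1) return 0;
    printf("Go to Inode %lu (Root)\n", (unsigned long)root.st_ino);

    char copy[4096], prefix[4096] = "/";
    strncpy(copy, abspath, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    unsigned long ino = (unsigned long)root.st_ino;
    for (char *part = strtok(copy, "/"); part != NULL; part = strtok(NULL, "/")) {
        ino = lookup(prefix, part);
        if (ino == 0) return 0;
        printf("Look up \"%s\" in %s -> Get Inode %lu\n", part, prefix, ino);
        /* Extend the prefix so the next lookup happens one level down. */
        if (strcmp(prefix, "/") != 0)
            strncat(prefix, "/", sizeof prefix - strlen(prefix) - 1);
        strncat(prefix, part, sizeof prefix - strlen(prefix) - 1);
    }
    return ino;
}
```

For a regular file inside an ordinary directory, the number walk_path returns should match the st_ino that stat reports for the same path.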

Why This Matters for Us

We are building ls. Our job is to act as the interpreter of this structure for the human user.

The C library gives us tools to peek at these maps, built on top of system calls the kernel provides:

  • opendir: "Open this directory file for reading."
  • readdir: "Give me the next entry in the list."
  • closedir: "I'm done."

And once we have an entry (a name + an inode number), we can ask the kernel for the details of the inode behind that name by passing its path to stat.

It sounds simple, and in essence it is. But the devil is in the details: handling hidden files (names starting with .), recursing into subdirectories (finding a directory entry and jumping into it), and formatting the output so it's readable.
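A sketch that combines those pieces, assuming a POSIX system: it skips any name starting with '.' (which also covers the "." and ".." entries, so the recursion never climbs back up), uses lstat to get each entry's size and type, and descends into subdirectories. walk_tree is a hypothetical name; paths longer than 4096 bytes are silently truncated, and errors on individual entries are printed and skipped rather than handled carefully. Using lstat instead of stat means a symbolic link to a directory is listed but not followed, which keeps a link cycle from recursing forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: recursively print the tree under `path`, indented
   two spaces per level, skipping hidden names. Returns the number of
   entries printed, or -1 if `path` itself can't be opened. */
long walk_tree(const char *path, int depth)
{
    DIR *dir = opendir(path);
    if (dir == NULL) { perror(path); return -1; }

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')   /* hidden file, ".", or ".." */
            continue;

        char child[4096];
        snprintf(child, sizeof child, "%s/%s", path, ent->d_name);

        struct stat st;
        if (lstat(child, &st) == -1) { perror(child); continue; }

        printf("%*s%s (%lld bytes)\n", depth * 2, "", ent->d_name,
               (long long)st.st_size);
        count++;

        /* lstat reports a symlink as a link, not as its target, so
           S_ISDIR is false for links and we never follow them. */
        if (S_ISDIR(st.st_mode)) {
            long sub = walk_tree(child, depth + 1);
            if (sub > 0) count += sub;
        }
    }
    closedir(dir);
    return count;
}
```

For example, walk_tree(".", 0) prints the non-hidden part of the current directory tree.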

"A file system is a set of named objects, called files, and a set of operations that can be performed on those objects." (D. M. Ritchie and K. Thompson, The UNIX Time-Sharing System)

They kept it simple. We should too.

In the next chapter, we'll get our hands dirty with stat, the system call that lets us interrogate inodes for their secrets.