One thing that's confusing is why git allows you to have one version of
a file in the current HEAD, a second version in the index, and possibly a
third in the working directory. Why doesn't the index just contain a copy
of the current HEAD until you commit a new one? The answer is merging,
which does all its work in the index. Neither the object database nor
the working directory lets you have multiple files with the same name.

The index is really very simple. It's a series of structures, each
describing one file. There's an object ID (SHA-1) of the contents,
some file metadata to detect changes (timestamps, inode number, size,
permissions, owner, etc.), and the path name relative to the root of
the working directory. It's always stored sorted by path name, for
efficient merging.
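
As a rough illustration of the structure just described, here is a
minimal Python sketch. It is not git's on-disk format: the IndexEntry
class, its field names, and the add_entry helper are hypothetical, and
only a few of the metadata fields are shown. The blob_id helper uses the
standard "blob <size>\0<contents>" SHA-1 hashing for file contents.

```python
import bisect
import hashlib
from dataclasses import dataclass


def blob_id(data: bytes) -> str:
    """Object ID of file contents: SHA-1 over a 'blob <size>\\0' header plus data."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


@dataclass(frozen=True)
class IndexEntry:          # hypothetical name, for illustration only
    path: str              # path relative to the root of the working directory
    object_id: str         # hex SHA-1 of the contents
    mtime: int             # cached metadata, used only to detect changes
    size: int
    mode: int


def sort_key(entry: IndexEntry) -> bytes:
    # Compare paths as raw bytes so the order does not depend on locale.
    return entry.path.encode()


def add_entry(index: list, entry: IndexEntry) -> None:
    """Insert or replace an entry, keeping the list sorted by path name."""
    keys = [sort_key(e) for e in index]
    pos = bisect.bisect_left(keys, sort_key(entry))
    if pos < len(index) and index[pos].path == entry.path:
        index[pos] = entry         # same path: replace the old version
    else:
        index.insert(pos, entry)   # new path: insert at its sorted position
```

Keeping the list sorted means two indexes (or an index and a tree) can be
compared in a single linear pass, which is what makes merging cheap.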

At almost any time, you can take a snapshot of the index and write
it as a tree object. The exception is an incomplete merge, described below.
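
To make the snapshot step concrete, the following Python sketch computes
the ID a tree object would have for a given list of index entries. It is
an illustration under simplifying assumptions, not git's implementation:
the write_tree and hash_object functions here are hypothetical helpers
that only compute IDs and store nothing, and each entry is reduced to a
(path, mode, object ID) tuple.

```python
import hashlib


def hash_object(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' followed by the object body."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def write_tree(entries) -> str:
    """Return the hex ID of the tree for (path, mode, hex_object_id) tuples."""
    items = {}     # name in this directory -> (mode, hex object ID)
    subdirs = {}   # subdirectory name -> entries below it
    for path, mode, oid in entries:
        name, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(name, []).append((rest, mode, oid))
        else:
            items[name] = (mode, oid)

    # Each subdirectory becomes its own tree, referenced with mode 40000.
    for name, children in subdirs.items():
        items[name] = ("40000", write_tree(children))

    def key(name: str) -> bytes:
        # Tree entries are ordered as if directory names ended in '/'.
        return name.encode() + (b"/" if items[name][0] == "40000" else b"")

    body = b"".join(
        items[n][0].encode() + b" " + n.encode() + b"\0" + bytes.fromhex(items[n][1])
        for n in sorted(items, key=key)
    )
    return hash_object("tree", body)
```

Because the result depends only on the entries, two indexes with the
same paths, modes, and object IDs always produce the same tree ID.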

The only interesting feature is that each entry has a 2-bit stage number.
Normally this is zero, but each path name is allowed up to three
different versions (object IDs) in the index at once, held at stages 1
through 3. This is used to
represent an incomplete merge, and an unmerged index entry (with more
than one version) prevents committing the index to the object database.
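
The rule above can be sketched in a few lines of Python. This is an
illustration only: the function names and the (path, stage, object ID)
tuple layout are assumptions made for the example, and the object IDs in
the usage note are placeholders. It treats any entry with a non-zero
stage as unmerged and refuses to "commit" while one exists.

```python
def unmerged_paths(entries):
    """Return the sorted path names that have any entry at a non-zero stage.

    entries: iterable of (path, stage, object_id) with stage in 0..3.
    """
    return sorted({path for path, stage, _ in entries if stage != 0})


def check_committable(entries):
    """Raise ValueError if the index still holds an incomplete merge."""
    conflicts = unmerged_paths(entries)
    if conflicts:
        raise ValueError("unmerged paths: " + ", ".join(conflicts))
```

For example, an index holding ("a.c", 0, "id0") alongside ("b.c", 1, "id1"),
("b.c", 2, "id2") and ("b.c", 3, "id3") reports b.c as unmerged; once b.c
is replaced by a single stage-0 entry, the check passes.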