WinehouseLabs

Understanding how Git thinks about data is one of the most important things you can learn about it. Most developers pick up Git commands quickly, but many use it for years without understanding the mental model underneath — and that gap is exactly where confusion and mistakes come from. This topic closes that gap.

Snapshots, Not Differences

The most fundamental thing to understand about Git is how it stores data. This is where Git differs from almost every other version control system that came before it.

Most version control systems — CVS, Subversion, Perforce — store data as a list of file-based changes. They record what changed in each file over time. This is called delta-based version control: each version is represented as the original file plus a series of deltas (differences) applied on top of it.

Git does not work this way.

Instead, Git thinks of its data as a series of snapshots of a miniature filesystem. Every time you commit, Git takes a picture of what all your files look like at that exact moment and stores a reference to that snapshot. If a file hasn't changed since the last commit, Git doesn't store it again — it simply stores a link to the identical file already stored. Git's history is a stream of snapshots, not a stream of differences.
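The snapshot idea can be sketched in a few lines of Python. This is a simplified in-memory model, not Git's actual storage code: the names object_store, store_blob and take_snapshot are hypothetical, and real Git also compresses and packs its objects.

```python
import hashlib

# Hypothetical in-memory model of an object database.
object_store = {}  # content hash -> file contents


def store_blob(content: bytes) -> str:
    """Store content under its SHA-1 hash and return the hash."""
    digest = hashlib.sha1(content).hexdigest()
    object_store.setdefault(digest, content)  # identical content is stored once
    return digest


def take_snapshot(files: dict) -> dict:
    """Return a snapshot: a mapping of every path to the hash of its content."""
    return {path: store_blob(content) for path, content in files.items()}


first = take_snapshot({"README": b"hello\n", "main.py": b"print(1)\n"})
# Only main.py changes before the second snapshot.
second = take_snapshot({"README": b"hello\n", "main.py": b"print(2)\n"})

print(first["README"] == second["README"])  # True: the unchanged file is a link
print(len(object_store))                    # 3 objects stored, not 4
```

Each snapshot lists every file, yet the unchanged README costs nothing extra because both snapshots point at the same stored object.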

This distinction shapes how Git's branching, merging, and history work, and it is part of why many everyday Git operations are fast.

The Three States

These states are what make the rest of Git's workflow make sense. Every file that Git is tracking is in one of three main states:

State	What It Means
Modified	You have changed the file but have not yet told Git to include it in the next snapshot.
Staged	You have marked a modified file to go into the next commit snapshot.
Committed	The data is safely stored in your local Git database.

These three states map directly onto three distinct areas of a Git project.

The Three Sections of a Git Project

Every Git repository has three sections that correspond to the three states above:

1. The Working Directory (Working Tree)

This is the directory on your computer where you actually edit files. It is a single checkout of one version of the project — the files have been pulled out of Git's compressed database and placed on your disk for you to work with.

When you modify an existing tracked file, that change exists only in the working directory. Git is aware of it but has not been instructed to do anything with it yet, so the file is in the modified state. A brand-new file that Git has never been told about is untracked until you add it.

2. The Staging Area (Index)

The staging area is where you prepare the exact set of changes you want to include in your next commit. Its technical name in Git is the index, but "staging area" is the term most developers use.

When you run git add on a file, Git records a copy of its current contents in the staging area; the file itself stays in the working directory. At this point the file is in the staged state: it is queued up to be included in the next snapshot.

The staging area exists so you have fine-grained control over what goes into each commit. You can modify ten files but only stage three of them, keeping the other seven for a separate commit. This allows you to create precise, meaningful commits rather than one massive dump of all your changes.
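As a rough illustration of that control, the sketch below models the ten-file case. It is a toy model under stated assumptions: working_dir, index, add and commit are hypothetical names, and the model ignores previously committed files, which the real index also lists.

```python
# Hypothetical model: each area is a dict of path -> content.
working_dir = {f"file{i}.txt": f"edit {i}" for i in range(1, 11)}  # ten modified files
index = {}  # the staging area


def add(path: str) -> None:
    """Copy the current working-directory content of one path into the index."""
    index[path] = working_dir[path]


def commit() -> dict:
    """Return a snapshot of what is staged; the working directory is not consulted."""
    return dict(index)


for name in ("file1.txt", "file2.txt", "file3.txt"):
    add(name)

snapshot = commit()
left_for_later = sorted(set(working_dir) - set(snapshot))

print(sorted(snapshot))     # ['file1.txt', 'file2.txt', 'file3.txt']
print(len(left_for_later))  # 7
```

The commit is built from the index alone, which is why the seven unstaged files stay out of it and remain available for a later commit.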

3. The Git Directory (.git)

The Git directory — the .git folder at the root of your project — is where Git stores the metadata and object database for your repository. This is the most important part of Git. It is what gets copied when you clone a repository, and it contains the complete history of the project.

When a file's snapshot has been permanently stored in the Git directory, the file is in the committed state.

How a Change Moves Through the Three Sections

The basic Git workflow follows a consistent cycle:

1. You modify files in your working directory.
2. You stage the changes you want to include: running git add records those specific changes in the staging area.
3. You commit: Git takes everything in the staging area and stores that snapshot permanently in the Git directory.

A file whose working-directory copy matches the version in the Git directory is unmodified. A file that has been changed since the last commit but not yet staged is modified. A file that has been changed and then added to the staging area is staged.
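These rules amount to comparing three copies of a file. The function below is a minimal sketch of that comparison, not Git's own status logic: classify is a hypothetical helper, and it reports a single label even though in real Git a file can be staged and then modified again, making it both at once.

```python
from typing import Optional


def classify(head: Optional[str], staged: Optional[str], working: str) -> str:
    """Classify one file from its content in the last commit, the index,
    and the working directory. None means the file is absent from that area.
    """
    if head is None and staged is None:
        return "untracked"   # Git has never been told about this file
    if working != staged:
        return "modified"    # edited since it was last staged or committed
    if staged != head:
        return "staged"      # the index differs from the last commit
    return "unmodified"      # all three copies agree


print(classify("v1", "v1", "v1"))  # unmodified
print(classify("v1", "v1", "v2"))  # modified
print(classify("v1", "v2", "v2"))  # staged
```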

Working Directory  →  git add  →  Staging Area  →  git commit  →  Git Directory
   (modified)                       (staged)                        (committed)

This cycle repeats for every piece of work you do. Understanding it is the foundation for understanding every other Git concept.

Nearly Every Operation Is Local

Because your machine holds a complete copy of the entire repository — including its full history — most Git operations require no network connection. Browsing history, creating branches, comparing versions, making commits — all of this reads from and writes to your local .git directory.

This makes Git feel dramatically faster than version control systems that require a server round-trip for common operations. It also means you can work completely offline: on a plane, without WiFi, without a VPN. Your commits, branches, and history remain fully functional. Changes are synchronised with a remote server only when you explicitly push or pull.

Git Has Integrity

Everything stored in Git is checksummed before it is stored, and is then referred to by that checksum. The mechanism Git uses is called a SHA-1 hash — a 40-character hexadecimal string calculated from the contents of a file or directory:

24b9da6552252987aa493b52f8696cd6d3b00373

You will see these hash values throughout Git: in commit logs, in branch references, and in the output of many commands. Git stores everything in its database by hash value, not by filename. As a result, the contents of a file or commit cannot change without Git being able to detect it, because the altered content would no longer match its hash. Data corruption or silent modification does not go unnoticed.
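To make the checksum concrete, here is a small sketch of how the ID for file content can be computed. The function name git_blob_hash is hypothetical. The format it uses is the one Git applies to file contents (blobs): a short header followed by the content. The sketch assumes a SHA-1 repository, which is the default.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the SHA-1 ID Git assigns to file content stored as a blob.

    Git hashes a header of the form "blob <size>" plus a NUL byte,
    followed by the content itself.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"hello\n")
tampered = git_blob_hash(b"hellO\n")

print(len(original))         # 40
print(original == tampered)  # False: a one-byte change yields a different ID
```

If Git is installed, the result for a given file should match what git hash-object prints for that file.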

Git Generally Only Adds Data

Almost every action you take in Git adds data to the Git database. It is hard to make Git do something that cannot be undone or that erases data. Changes you have not yet committed can still be lost or overwritten, but once you commit a snapshot, it is very difficult to lose, especially if you regularly push to a remote repository.

This makes Git safe to experiment with. You can try things, break things, create experimental branches, and discard them — knowing that your committed history is intact and recoverable. The fear of "breaking something" that holds many developers back from using Git confidently is largely unfounded once you understand this property.
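A toy model shows why discarding an experiment is low-risk. The names objects, branches, commit and delete_branch are hypothetical. The model also leaves out garbage collection, which in real Git eventually removes objects that nothing refers to any more.

```python
# Hypothetical model: commits sit in an object store; a branch is only a name
# that points at a commit ID.
objects = {}
branches = {}


def commit(branch: str, commit_id: str, message: str) -> None:
    """Add a commit object and move the branch pointer to it."""
    objects[commit_id] = message
    branches[branch] = commit_id


def delete_branch(name: str) -> None:
    """Remove the pointer only; the commit object is left in place."""
    del branches[name]


commit("main", "a1", "stable work")
commit("experiment", "b2", "risky idea")
delete_branch("experiment")

print("experiment" in branches)  # False: the name is gone
print(objects["b2"])             # risky idea
```

Deleting the branch only removes a name; the committed data it pointed at is still in the store and can be reached by its ID.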

What the `.git` Directory Contains

When you initialise a Git repository with git init, Git creates a .git directory. This hidden folder is the entire repository. If you deleted it, you would lose all version history — the working files would remain, but Git would have no memory of any of them.

Inside .git, Git stores:

Contents	What It Is
`objects/`	The object database — all commits, file contents, and trees stored as compressed objects.
`refs/`	References to commits — branches, tags, and remote tracking branches.
`HEAD`	A pointer to the branch you are currently on (or directly to a commit when no branch is checked out).
`index`	The staging area: a record of the file versions that will make up the next commit.
`config`	Repository-specific configuration settings.

You will rarely need to interact with .git directly. But knowing it exists and what it contains demystifies a lot of Git's behaviour — particularly why certain commands are instant (they read local data) and why cloning creates a full, independent copy of a project.
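As an illustration of how little is needed to answer "which commit am I on?", the sketch below resolves HEAD using only local files. It builds a mock .git layout by hand so it runs without Git installed. current_commit is a hypothetical helper, and it assumes the branch is stored as a loose file under refs/; real repositories may instead keep refs in a packed-refs file, which this sketch does not read.

```python
import tempfile
from pathlib import Path


def current_commit(git_dir: Path) -> str:
    """Resolve HEAD to a commit hash by reading local files only."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):                  # HEAD names a branch
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    return head                                   # HEAD holds a raw commit hash


# Mock layout for illustration; the hash is the example value used earlier.
mock = Path(tempfile.mkdtemp()) / ".git"
(mock / "refs" / "heads").mkdir(parents=True)
(mock / "HEAD").write_text("ref: refs/heads/main\n")
(mock / "refs" / "heads" / "main").write_text(
    "24b9da6552252987aa493b52f8696cd6d3b00373\n"
)

print(current_commit(mock))  # 24b9da6552252987aa493b52f8696cd6d3b00373
```

Two small file reads are enough here, with no server involved, which is consistent with how quickly such commands return.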

Summary

Concept	What to Remember
Snapshots	Git stores the full state of all files at each commit, not just the differences.
Three states	Every tracked file is modified, staged, or committed.
Three sections	The working directory, staging area, and Git directory each correspond to one state.
Local operations	Most Git operations work entirely from your local `.git` directory — no network required.
Integrity	Every object is identified by a SHA-1 hash. Changed content no longer matches its hash, so Git can detect it.
Additive	Git almost never deletes data. Committed snapshots are safe and recoverable.
