Daniel Jomphe

git Archives: as promised, docs: git for the confused

  • 2005-12-08
  • A good rule of thumb is that the commands with one-word names (git-diff, git-commit, git-merge, git-push, git-pull, git-status, git-tag, etc.) are designed for end-user use. Multi-word names (git-count-objects, git-write-tree, git-cat-file) are generally designed for use from a script. This isn't ironclad. The first command to start using git is git-init-db, and git-show-branch is pure porcelain, while git-mktag is a primitive. And you don't often run git-daemon by hand. But still, it's a useful guideline.
  • One thing that's confusing is why git allows you to have one version of a file in the current HEAD, a second version in the index, and possibly a third in the working directory. Why doesn't the index just contain a copy of the current HEAD until you commit a new one? The answer is merging, which does all its work in the index. Neither the object database nor the working directory let you have multiple files with the same name. The index is really very simple. It's a series of structures, each describing one file. There's an object ID (SHA1) of the contents, some file metadata to detect changes (time-stamps, inode number, size, permissions, owner, etc.), and the path name relative to the root of the working directory. It's always stored sorted by path name, for efficient merging. At (almost) any time, you can take a snapshot of the index and write it as a tree object. The only interesting feature is that each entry has a 2-bit stage number. Normally, this is always zero, but each path name is allowed up to three different versions (object IDs) in the index at once. This is used to represent an incomplete merge, and an unmerged index entry (with more than one version) prevents committing the index to the object database.
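A toy model can make the stage numbers concrete. The sketch below is not git's on-disk index format: the `IndexEntry` tuple and both helper functions are hypothetical names invented for illustration, and the stage meanings in the comments (1 = common ancestor, 2 = current HEAD, 3 = branch being merged) follow git's usual convention rather than anything stated in the excerpt above.

```python
from collections import namedtuple

# Illustrative stand-in for one index entry; real entries also carry stat metadata.
IndexEntry = namedtuple("IndexEntry", ["path", "stage", "object_id"])

def unmerged_paths(index):
    """Return the sorted path names that have any entry with a nonzero stage."""
    return sorted({entry.path for entry in index if entry.stage != 0})

def can_write_tree(index):
    """A tree snapshot is only allowed when every entry is at stage 0."""
    return not unmerged_paths(index)

# The index is kept sorted by path name (then stage, in this toy model).
index = sorted([
    IndexEntry("README", 0, "a" * 40),
    IndexEntry("main.c", 1, "b" * 40),  # stage 1: common ancestor
    IndexEntry("main.c", 2, "c" * 40),  # stage 2: current HEAD
    IndexEntry("main.c", 3, "d" * 40),  # stage 3: branch being merged
])
print(unmerged_paths(index))   # ['main.c']
print(can_write_tree(index))   # False
```

Keeping the stage inside the entry, rather than in a separate structure, is what lets one sorted list represent both a clean index and a half-finished merge.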
  • The most common object needed by git primitives is a tree. Since a commit points to a tree and a tag points to a commit, both of these are acceptable "tree-ish" objects and can be used interchangeably. Likewise, a tag is "commit-ish" and can be used where a commit is required. As soon as you get to the porcelain, the most commonly used object is a commit. Also known as a revision, this is a tree plus a history.
  • A "head" is mostly synonymous with a "branch", but the terms have different emphasis. The "head" is particularly the tip of the branch, where future development will be appended. A "branch" is the entire development history leading to the head. However, as far as git is concerned, they're both references to commit objects, referred to from refs/heads/.
  • When you specify the name of a reference, it is searched for in one of the directories:
      .git/ (or $GIT_DIR)
      .git/refs/ (or $GIT_DIR/refs/)
      .git/refs/heads/ (or $GIT_DIR/refs/heads/)
      .git/refs/tags/ (or $GIT_DIR/refs/tags/)
    You may use subdirectories by including slashes in the reference name. There is no search order; if searching the above four path prefixes produces more than one match for the reference name (it's ambiguous), then the name is not valid.
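The lookup rule described here (four prefixes, no search order, more than one match is an error) can be sketched in a few lines. This is only a model of the rule as stated in the text, not git's actual implementation; `resolve_ref` and `make_demo_dir` are hypothetical helpers, and the demo directory is a throwaway stand-in for `.git/`.

```python
import os
import tempfile

# The four search prefixes from the text, relative to the git directory.
PREFIXES = ["", "refs", "refs/heads", "refs/tags"]

def resolve_ref(git_dir, name):
    """Return the one file that `name` matches, or raise if it matches zero or several."""
    candidates = [os.path.join(git_dir, prefix, name) for prefix in PREFIXES]
    matches = [path for path in candidates if os.path.isfile(path)]
    if len(matches) != 1:
        raise ValueError(f"{name!r} matched {len(matches)} files; not a valid reference name")
    return matches[0]

def make_demo_dir():
    """Build a temporary directory that looks like .git/ with a few reference files."""
    git_dir = tempfile.mkdtemp()
    for ref in ("refs/heads/master", "refs/tags/v1.0", "refs/tags/master"):
        path = os.path.join(git_dir, ref)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("0" * 40 + "\n")   # placeholder object ID plus trailing newline
    return git_dir

git_dir = make_demo_dir()
print(resolve_ref(git_dir, "v1.0"))          # found only under refs/tags/
print(resolve_ref(git_dir, "heads/master"))  # slashes select a subdirectory
try:
    resolve_ref(git_dir, "master")           # exists as both a head and a tag
except ValueError as err:
    print("rejected:", err)
```

Because there is no search order, the sketch collects every match before deciding, instead of returning the first hit.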
  • (commit^0 gives the commit object itself. A no-op if you're starting from a commit, but it lets you get the commit object from a tag object.)
  • although the most primitive git tools don't care, a convention among all the porcelain is that the current head of development is .git/HEAD, a symbolic link to a reference under refs/heads/. git-init-db creates HEAD pointing to refs/heads/master, and that is traditionally the name used for the "main trunk" of development. Note that initially refs/heads/master doesn't exist - HEAD is a dangling symlink! This is okay, and will cause the initial commit to have zero parents.
  • While you can always use the full object ID, you can also use a reference. A reference is a file that contains a 40-character hex SHA1 object ID (and a trailing newline).
  • When you actually do more (with git-commit, or git-merge), then the current HEAD reference is overwritten with the new commit's id, and the old HEAD becomes HEAD^. Since HEAD is a symlink, it's the file in refs/heads/ that's actually overwritten. (See the git-update-ref documentation for further details.)
  • The git-checkout command actually changes the HEAD symlink. git-checkout enforces the rule that it will only check out a branch under refs/heads. You can use refs/tags as a source for git-diff or any other command that only examines the revision, but if you want to check it out, you have to copy it to refs/heads.
  • As mentioned, the index and the working directory versions of a file could both be different from the HEAD. Git lets you merge "under" your current working directory edits, as long as the merge doesn't change the files you're editing.
  • git-reset --soft: Only overwrite the reference. If you can find the old object ID, you can put everything back with a second git-reset --soft OLD_HEAD.
    git-reset --mixed: This is the default, which I always think of as "--medium". Overwrite the reference, and (using git-read-tree) read the commit into the index. The working directory is unchanged.
    git-reset --hard: Do everything --mixed does, and also check out the index into the working directory. This really erases all traces of the previous version. (One caveat: this will not delete any files in the working directory that were added as part of the changes being undone.)
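The three modes differ only in how far the reset propagates: reference, then index, then working directory. A minimal sketch over a dictionary-based toy repository shows the layering; the `repo` layout and the `reset` function are assumptions made up for this example and do not mirror git's internals.

```python
def reset(repo, commit, mode="mixed"):
    """Toy git-reset over a dict-based repository (all keys here are invented)."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    branch = repo["HEAD"]                      # e.g. "refs/heads/master"
    repo["refs"][branch] = commit              # every mode overwrites the reference
    if mode in ("mixed", "hard"):
        repo["index"] = dict(repo["trees"][commit])   # read the commit into the index
    if mode == "hard":
        # Check the index out over the working directory. Files that are not in
        # the index are left alone, which is the caveat mentioned above.
        repo["workdir"].update(repo["index"])

repo = {
    "HEAD": "refs/heads/master",
    "refs": {"refs/heads/master": "c2"},
    "trees": {"c1": {"a.txt": "v1"}, "c2": {"a.txt": "v2", "b.txt": "new"}},
    "index": {"a.txt": "v2", "b.txt": "new"},
    "workdir": {"a.txt": "v2", "b.txt": "new"},
}
reset(repo, "c1", mode="hard")
print(repo["refs"]["refs/heads/master"])  # c1
print(repo["index"])                      # {'a.txt': 'v1'}
print(repo["workdir"])                    # {'a.txt': 'v1', 'b.txt': 'new'}
```

Note how `b.txt`, added by the commit being undone, survives in the working directory even after a hard reset.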
  • git-reset with no commit specified is "git-reset HEAD", which is much safer because the object reference is not actually changed. This can be used to undo changes in the index or working directory that you did not intend.
  • Like being sure what directory you're in when typing "rm -r", think carefully about what branch you're on when typing "git-reset <commit>".
  • There is an undelete: git-reset stores the previous HEAD commit in ORIG_HEAD. And git-lost-found can find leftover commits until you do a git-prune.
  • Merging is central to git operations. Indeed, a big difference between git and other version control systems is that git assumes that a change will be merged more often than it's written, as it's passed around different developers' repositories. Even "git checkout" is a merge.
  • The "undo" command for commits to the object database is git-reset. Like all deletion-type commands, be careful or you'll hurt yourself. Given a commit (using any of the syntaxes mentioned above), this sets the current HEAD to refer to the given commit. This does NOT alter the HEAD symlink (as git-checkout <branch> will do), but actually changes the reference pointed to by HEAD (e.g. refs/heads/master) to contain a new object ID. The classic example: to undo an erroneous commit, use "git-reset HEAD^".
  • the procedure for the general 3-way merge case: merging branch B into branch A (the current HEAD).
  • If the merge is possible and safe, the versions are collapsed into one final result version.
  • 1) Given two commits, find a common ancestor O to serve as the origin of the merge.
  • 2) Add all three input trees (the Origin, A, and B) to the index by "git-read-tree -m O A B". The index now contains up to three copies of every file.
  • Then, for each file in the index, git-read-tree does the following:
  • 2a) For each file, git-read-tree tries to collapse the various versions into one using the "trivial in-index merge". This just uses the file blob object names to see if the file contents are identical, and if two or more of the three trees contain an identical copy of this file, it is merged. A missing (deleted) file matches another missing file. Note that this is NOT a majority vote. If A and B agree on the contents of the file, that's what is used. (Whether O agrees is irrelevant in this case.) But if O and A agree, then the change made in B is taken as the final value. Likewise, if O and B agree, then A is used.
  • 2b) If this is possible, then a check is made to see if the merge would conflict with any uncommitted work in the index or change the index out from under a modified working directory file. If either of those cases happens, the entire merge is backed out and fails.
  • 2c) If all three versions differ, the trivial in-index merge is not possible, and the three source versions are left in the index unmerged. Again, if there was uncommitted work in the index or the working directory, the entire merge fails.
  • 3) Use git-merge-index to iterate over the remaining unmerged files, and apply an intra-file merge. The intra-file merge is usually done with git-merge-one-file, which does a standard RCS-style three-way merge (see "man merge").
  • 4) Check out all the successfully merged files into the working directory.
  • 5) If automatic merging was successful on every file, commit the merged version immediately and stop.
  • 6) If automatic merging was not complete, then replace the working directory copies of any remaining unmerged files with a merged copy with conflict markers (again, just like RCS or CVS) in the working directory. All three source versions are available in the index for diffing against.
  • A "2-way merge" is basically a 3-way merge with the contents of the index as the "current HEAD", and the original HEAD as the Origin. However, this merge is designed only for simple cases and only supports the "trivial merge" cases. It does not fall back to an intra-file merge.
  • 8) Commit the final merged version of the conflicting file(s), replacing the unmerged versions with the single finished version.
  • Note that if the merge is simple, with no one file edited on both branches, git never has to open a single file. It reads three tree objects (recursively) and stat(2)s some working directory files to verify that they haven't changed. Also note that this aborts and backs out rather than overwrite anything not committed. You can merge "under" uncommitted edits only if those edits are to files not affected by the merge.
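The "trivial in-index merge" rule from step 2a is small enough to write out. The sketch below compares blob IDs only and never opens a file, as the text describes; `trivial_merge` and `merge_trees` are hypothetical names, trees are modeled as plain `{path: blob_id}` dictionaries, and `None` stands for a missing (deleted) file. The uncommitted-work checks of step 2b are deliberately left out.

```python
def trivial_merge(o, a, b):
    """Merge one path given its blob IDs in O, A and B (None means the file is absent).

    Returns (True, blob_id) when the trivial merge applies, else (False, None).
    """
    if a == b:
        return True, a      # both sides agree; whether O agrees is irrelevant
    if o == a:
        return True, b      # only B changed the file, so take B
    if o == b:
        return True, a      # only A changed the file, so take A
    return False, None      # all three differ: left unmerged in the index

def merge_trees(tree_o, tree_a, tree_b):
    """Apply trivial_merge to every path; trees are {path: blob_id} dicts."""
    merged, unmerged = {}, []
    for path in sorted(set(tree_o) | set(tree_a) | set(tree_b)):
        ok, blob = trivial_merge(tree_o.get(path), tree_a.get(path), tree_b.get(path))
        if not ok:
            unmerged.append(path)
        elif blob is not None:   # None here means the merged result deletes the file
            merged[path] = blob
    return merged, unmerged

O = {"keep": "k1", "edit_b": "e1", "gone": "g1", "both": "x1"}
A = {"keep": "k1", "edit_b": "e1", "gone": "g1", "both": "x2"}
B = {"keep": "k1", "edit_b": "e2", "both": "x3"}
print(merge_trees(O, A, B))   # ({'edit_b': 'e2', 'keep': 'k1'}, ['both'])
```

Only `both`, which was edited on each side, would go on to the intra-file merge of step 3.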
  • A major source of git's speed is that it tries to avoid accessing files unnecessarily. In particular, files can be compared based on their object IDs without needing to open and read them. As part of this, the responsibility for finding file differences (printing diffs) is divided into finding what files have changed, and finding the changes within those files. This is all explained in the Documentation/diffcore.txt in the git distribution, but the basic idea is that many of the primitives spit out a line like this when asked for a diff:
      :100755 100755 68838f3fad1d22ab4f14977434e9ce73365fb304 0000000000000000000000000000000000000000 M git-bisect.sh
    This is known as a "raw diff". They can be told to generate a human-readable diff with the "-p" (patch) flag. The git-diff command includes this by default.
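A raw diff line like the one quoted above is easy to pick apart by whitespace. The following is a minimal sketch, assuming the simple one-path form shown in the text; `RawDiff` and `parse_raw_diff` are hypothetical names, the field names are descriptive guesses, and forms with two paths (such as renames) are not handled.

```python
from collections import namedtuple

# Field names are illustrative labels for the six columns of the example line.
RawDiff = namedtuple("RawDiff", "old_mode new_mode old_id new_id status path")

def parse_raw_diff(line):
    """Split one raw-diff line into its six fields."""
    if not line.startswith(":"):
        raise ValueError("raw diff lines start with ':'")
    # Splitting on any whitespace accepts either a space or a tab before the path.
    fields = line[1:].rstrip("\n").split(None, 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return RawDiff(*fields)

line = (":100755 100755 68838f3fad1d22ab4f14977434e9ce73365fb304 "
        "0000000000000000000000000000000000000000 M git-bisect.sh")
entry = parse_raw_diff(line)
print(entry.status, entry.path)   # M git-bisect.sh
print(entry.old_mode, entry.old_id[:8])   # 100755 68838f3f
```

A script can decide which files changed from these fields alone and only ask for patch text when it actually needs it.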
  • This merge is used by git-checkout to switch between two branches, while preserving any changes in the working directory and index. Like the 3-way case, if a particular file hasn't changed between the two heads, then git will preserve any uncommitted edits. If the file has changed in any way, git doesn't try to perform any sort of intra-file merge, it just fails.
  • 1-way merging This is not actually used by the git-core porcelain, and so is only useful to someone writing more porcelain, but I'll describe it for completeness. Plain (non-merging) git-read-tree will overwrite the index entries with those from the tree. This invalidates the cached stat data, causing git to think all the working directory files are "potentially changed" until you do a git-update-index --refresh. By specifying a 1-way merge, any index entry whose contents (object ID) matches the incoming tree will have its cached stat data preserved. Thus, git will know if the working directory file is not changed, and will not overwrite if you execute git-checkout-index. This is purely an efficiency hack.
  • There are two cases of 3-way or 2-way merging that are special. Recall that the basic merge pattern is
            B--------> A+B
           /          /
          /          /
         O -----> A
    The two special cases arise if one of A or B is a direct ancestor of the other. In that case, the common ancestor of both A and B is the older of the two commits. And the merged result is simply the newer of the two, unchanged.
  • Recalling that we are merging B into A, if B is a direct ancestor of A, then A already includes all of B. A is "already up to date" and not changed at all by the merge. The other case you'll hear mentioned, because it happens a lot when pulling, is when A is a direct ancestor of B. In this case, the result of the merge is a "fast-forward" to B. Both of these cases are handled very efficiently by the in-index merge done by git-read-tree.
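Both special cases come down to an ancestry test. The sketch below models history as a `{commit: [parents]}` dictionary, which is an assumption made for illustration; `ancestors` and `classify_merge` are hypothetical helpers, not git commands.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` (itself included) in a {commit: [parents]} map."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def classify_merge(parents, a, b):
    """Describe what merging B into A (the current HEAD) amounts to."""
    if b in ancestors(parents, a):
        return "already up to date"   # A already includes all of B
    if a in ancestors(parents, b):
        return "fast-forward"         # the result is simply B
    return "true merge"               # the histories have diverged

# O is the root; A follows O; B follows A; C is a sibling branch from O.
parents = {"O": [], "A": ["O"], "B": ["A"], "C": ["O"]}
print(classify_merge(parents, "A", "B"))   # fast-forward
print(classify_merge(parents, "B", "A"))   # already up to date
print(classify_merge(parents, "B", "C"))   # true merge
```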
  • A pack is a file, built all at once, which contains many delta-compressed objects. With each .pack file, there's an accompanying .idx file that indexes the pack so that individual objects can be retrieved quickly. You can reduce the disk space used by your repositories by periodically repacking them with git-repack. Normally, this makes a new incremental pack of everything not already packed. With the -a flag, this repacks everything for even greater compression (but takes longer).
  • The git wire protocol basically consists of negotiation over what objects need to be transferred followed by sending a custom-built pack. The .idx file can be reconstructed from the .pack file, so it's never transferred.
  • 7) Manually edit the conflicts and resolve the merge. As long as an unmerged, multi-version file exists in the index, committing the index is forbidden.
  • * Advice on using git
  • If you're used to CVS, where branches and merges are "advanced" features that you can go a long time without using, you need to learn to use branches in git a lot more.
  • Branch early and often. Every time you think about developing a feature or fixing a bug, create a branch to do it on.
  • In fact, avoid doing any development on the master branch. Merges only.
  • A branch is the git equivalent of a patch, and merging a branch is the equivalent of applying that patch. A branch gives it a name that you can use to refer to it. This is particularly useful if you're sharing your changes.
  • Once you're done with a branch, you can delete it. This is basically just removing the refs/heads/<branch> file, but "git-branch -d" adds a few extra safety checks. Assuming you merged the branch in, you can still find all the commits in the history, it's just the name that's been deleted.
  • Periodically merge all of the branches you're working on into a testing branch to see if everything works. Blow away and re-create the testing branch whenever you do this. When you like the result, merge them into the master.
  • * The .git directory
  • index - The actual index file.
    objects/ - The object database. Can be overridden by $GIT_OBJECT_DIRECTORY.
    hooks/ - Where the hook scripts are kept. The standard git template includes examples, but disabled by being marked non-executable.
    info/exclude - Default project-wide list of file patterns to exclude from notice. To this is added the per-directory list in .gitignore. See the git-ls-files docs for full details.
    refs/ - References to development heads (branches) and tags.
    remotes/ - Short names of remote repositories we pull from or push to. Details are in the "git-fetch" man page.
  • git-lost-found.sh Find (using git-fsck-objects) any unreferenced commits and tags in the object database, and place them in a .git/lost-found directory. This can be used to recover from accidentally deleting a tag or branch reference that you wanted to keep. This is the opposite of git-prune.
  • HEAD - The current default development head.
      - Created by git-init-db and never deleted
      - Changed by git-checkout
      - Used by git-commit and any other command that commits changes.
      - May be a dangling pointer, in which case git-commit does an "initial checkin" with no parent.
    COMMIT_EDITMSG - Temp used by git-commit to edit a commit message.
    COMMIT_MSG - Temp used by git-commit to form a commit message, post-processed from COMMIT_EDITMSG.
    FETCH_HEAD - Just-fetched commits, to be merged into the local trunk.
      - Created by git-fetch.
      - Used by git-pull as the source of data to merge.
  • ORIG_HEAD - Previous HEAD commit prior to a merge or reset operation.
    LAST_MERGE - Set by the "resolve" strategy to the most recently merged-in branch. Basically, a copy of MERGE_HEAD. Not used by the other merge strategies, and resolve is no longer the default, so its utility is very limited.
    BISECT_LOG - History of a git-bisect operation.
      - Can be replayed (or, more usefully, a prefix can) by "git-bisect replay"
    BISECT_NAMES - The list of files to be modified by git-bisect.
      - Set by "git-bisect start"
    TMP_HEAD (used by git-fetch)
    TMP_ALT (used by git-fetch)
  • * Git command summary There are slightly over a hundred git commands. This section tries to classify them by purpose, so you can know which commands are intended to be used for what. You can always use the low-level plumbing directly, but that's inconvenient and error-prone.
  • I include ".sh", ".perl", etc. suffixes to show what the programs are written in, so you can read those scripts written in languages you're familiar with. These are the names in the git source tree, but the suffix is not included in the /usr/bin copies.
  • * Detailed list Here's a repeat, including descriptions. I don't try to include every detail you can find on the man page, but to explain when you'd want to use a command.
  • + Administrative commands git-init-db This creates an empty git repository in ./.git (or $GIT_DIR if that is non-null) using a system-wide template. It won't hurt an existing repository.
  • git-fsck-objects Validate the object database. Checks that all references point somewhere, all the SHA1 hashes are correct, and that sort of thing. This walks the entire repository, uncompressing and hashing every object, so it takes a while. Note that by default, it skips over packs, which can make it seem misleadingly fast.
  • MERGE_HEAD - Keeps track of what heads are currently being merged into HEAD.
      - Created by git-merge --no-commit with the heads used
      - Deleted by git-checkout and git-reset (since you're abandoning the merge)
      - Used by git-commit to supply additional parents to the current commit. (And deleted when done.)
    MERGE_MSG - Generated by git-merge --no-commit.
      - Used by git-commit as the commit message for a merge (If present, git-commit doesn't prompt.)
    MERGE_SAVE - cpio archive of all locally modified files created by "git-merge" before starting to do anything, if multiple merge strategies are being attempted. Used to rewind the tree in case a merge fails.
  • git-prune.sh Delete all unreachable objects from the object database. It deletes useless packs, but does not remove useless objects from the middle of partially useful packs. Git leaks objects in a number of cases, such as unsuccessful merges. The leak rate is generally a small fraction of the rate at which the desired history grows, so it's not very alarming.
  • + Pack maintenance The classic git format is to compress and store each object separately. This is still used for all newly created changes. However, objects can also be stored en masse in "packs" which contain many objects and can take advantage of delta-compressing. Repacking your repositories periodically can save space. (Repacking is pretty quick but not quick enough to be comfortable doing every commit.)
  • git-count-objects.sh Print the number and total size of unpacked objects in the repository, to help you decide when is a good time to repack.
  • git-repack.sh Make a new pack with all the unpacked objects. With -a, include already-packed objects in the new pack. With -d as well, deletes all the old packs thereby made redundant.
  • + Important primitives Although these primitives are not used directly very frequently, understanding them will help you understand other git commands that wrap them.
  • git-commit-tree Create a new commit object from a tree and a list of parent commits. This is the primitive that's the heart of git-commit. (It's also used by git-am, git-applypatch, git-merge, etc.)
  • git-rev-parse This is a very widely used command line canonicalizer for git scripts. It converts relative commit references (e.g. master~3) to absolute SHA1 hashes, and can also pass through arguments not recognizable as references, so the script can interpret them. It is important because it defines the <rev> syntax.
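To illustrate what "converting relative commit references" involves, here is a toy resolver for the `~n` and `^n` suffixes. It is a sketch under stated assumptions, not git-rev-parse: `rev_parse`, the short commit IDs and the dictionary-based history are all invented for the example, and it does not cover tags, abbreviations or any other part of the real `<rev>` syntax.

```python
import re

def rev_parse(spec, refs, parents):
    """Resolve specs like 'master~3', 'master^' or 'master^0' against toy data.

    refs maps names to commit IDs; parents maps a commit ID to its parent list.
    """
    m = re.fullmatch(r"([^~^]+)((?:[~^]\d*)*)", spec)
    if m is None or m.group(1) not in refs:
        raise ValueError(f"cannot parse {spec!r}")
    commit = refs[m.group(1)]
    for op, digits in re.findall(r"([~^])(\d*)", m.group(2)):
        n = int(digits) if digits else 1
        if op == "~":
            for _ in range(n):            # ~n: follow the first parent n times
                commit = parents[commit][0]
        elif n > 0:                       # ^n: the n-th parent; ^0 is the commit itself
            commit = parents[commit][n - 1]
    return commit

refs = {"master": "c4"}
# c4 is a merge commit with parents c3 and m1; everything descends from c1.
parents = {"c4": ["c3", "m1"], "c3": ["c2"], "c2": ["c1"], "c1": [], "m1": ["c1"]}
print(rev_parse("master~3", refs, parents))   # c1
print(rev_parse("master^2", refs, parents))   # m1
print(rev_parse("master^0", refs, parents))   # c4
```

The real command prints full 40-character SHA1 IDs; the short names here only keep the example readable.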
  • + Useful primitives These primitives are potentially useful directly.
  • git-ls-files List files in the index and/or working directory. A variety of options control which files to list, based on whether they are the same in both places or have been modified. This command is the start of most check-in scripts.
  • + General script helpers (used only by scripts) These are almost exclusively helpers for use in porcelain scripts and have little use by themselves from the command line.
  • git-cat-file Extract a file from the object database. You can ask for an object's type or size given only an object ID, but to get its contents, you have to specify the type. This is a deliberate safety measure.
  • git-hash-object Very primitive helper to turn an arbitrary file into an object, returning just the ID or actually adding it to the database. Used by the cvs-to-git and svn-to-git import filters.
  • git-ls-tree List the contents of a tree object. Will tell you all the files in a commit. Used by the checkout scripts git-checkout and git-reset.
  • git-symbolic-ref This queries or creates symlinks to references such as HEAD. Basically equivalent to readlink(1) or ln -s, this also works on platforms that don't have symlinks. See the man page.
  • + Code browsing
  • git-diff.sh Show changes between various trees. Takes up to two tree specifications, and shows the difference between the versions.
      Zero arguments: index vs. working directory (git-diff-files)
      One: tree vs. working directory (git-diff-index)
      One, --cached: tree vs. index (git-diff-index)
      Two: tree vs. tree (git-diff-tree)
    This wrapper always produces human-readable patch output. The helpers all produce "diff-raw" format unless you supply the -p option. There are some interesting options. Unfortunately, the git-diff man page is annoyingly sparse, and refers to the helper scripts' documentation rather than describing the many useful options they all have in common. Please do read the man pages of the helpers to see what's available. In particular, although git does not explicitly record file renames, it has some pretty good heuristics to notice things.
      -M tries to detect renamed files by matching up deleted files with similar newly created files.
      -C tries to detect copies as well. By default, -C only looks among the modified files for the copy source. For common cases like splitting a file in two, this works well.
      The --find-copies-harder option searches ALL files in the tree for the copy source. This can be slow on large trees!
    See Documentation/diffcore.txt for an explanation of how all this works.
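The argument-count dispatch described for the git-diff wrapper can be summarized as a small function. This is a sketch of the mapping given in the text, not the contents of git-diff.sh; `diff_helper` is a hypothetical name, and it only builds the helper command line as a list without running anything.

```python
def diff_helper(trees, cached=False):
    """Build the helper command line that the text pairs with each git-diff form."""
    trees = list(trees)
    if len(trees) == 0 and not cached:
        return ["git-diff-files", "-p"]             # index vs. working directory
    if len(trees) == 1:
        flags = ["--cached"] if cached else []      # --cached: compare against the index
        return ["git-diff-index", "-p"] + flags + trees
    if len(trees) == 2 and not cached:
        return ["git-diff-tree", "-p"] + trees      # tree vs. tree
    raise ValueError("unsupported combination of arguments")

print(diff_helper([]))                      # ['git-diff-files', '-p']
print(diff_helper(["HEAD"], cached=True))   # ['git-diff-index', '-p', '--cached', 'HEAD']
print(diff_helper(["v1", "v2"]))            # ['git-diff-tree', '-p', 'v1', 'v2']
```

The `-p` in every result reflects the statement that the wrapper always produces patch output while the helpers default to the raw format.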
  • git-diff-tree Compare two trees. This is the git equivalent of the two-operand form of "cvs diff". This command is sometimes useful by itself to see the changes made by a single commit. If you give it only one commit on the command line, it shows the diff between that commit and its first parent. If the commit specification is long and awkward to type, using "git-diff-tree -p <commit>" can be easier than "git-diff <commit>^ <commit>".
  • git-grep.sh A very simple wrapper that runs git-ls-files and greps the output looking for a file name. Does nothing fancy except save typing.
  • git-log.sh Wrapper around git-rev-list --pretty. Shows a history of changes made to the repository. Takes all of git-rev-list's options for specifying which revisions to list.
  • git-shortlog.perl This is a filter for the output of "git-log --pretty=short" to generate a one-line-per-change "shortlog" as Linus likes.
  • git-show-branch Visually show the merge history of the references given as arguments. Prints one column per reference and one line per commit showing whether that commit is an ancestor of each reference.
  • git-whatchanged.sh A simple wrapper around git-rev-list and git-diff-tree, this shows the change history of a repository. Specify a directory or file on the command line to limit the output to changes affecting those files. This isn't the same as "cvs annotate", but it serves a similar purpose among git folks. You can add the -p option to include patches as well as log comments. You can also add the -M or -C option to follow history back through file renames. -S is interesting: it's the "pickaxe" option. Given a string, this limits the output to changes that make that string appear or disappear. This is for "digging through history" to see when a piece of code was introduced. The string may (and often does) contain embedded newlines. See Documentation/cvs-migration.txt.
  • + Making local changes All of these are examples of "porcelain" scripts. Reading the scripts themselves can be informative; they're generally not too confusing.
  • git-add.sh A simple wrapper around "git-ls-files | git-update-index --add" to add new files to the index. You may specify directories. You need to invoke this for every new file you want git to track.
  • git-bisect.sh Utility to do a binary search to find the change that broke something. The heart of this is in "git-rev-list --bisect". A very handy little utility! Kernel developers love it when you tell them exactly which patch broke something.
  • There are three steps:
      git-bisect start [<files>] - Reset to start bisecting. If any files are specified, only they will be checked out as bisection proceeds.
      git-bisect good [<revision>] - Record the revision as "good". The change being sought must be after this revision.
      git-bisect bad [<revision>] - Record the revision as "bad". The change being sought must be before or equal to this revision.
    As soon as you have specified one good version and one bad version, git-bisect will find a halfway point and check out that revision. Build and test it, then report it as good or bad, and git-bisect will narrow the search. Finally, git-bisect will tell you exactly which change caused the problem.
      git-bisect log - Show a history of revisions.
      git-bisect replay - Replay (part of) a git-bisect log. Generally used to recover from a mistake, you can truncate the log before the mistake and replay it to continue.
    If git-bisect chooses a version that cannot build, or you are otherwise unable to determine whether it is good or bad, you can change revisions with "git-reset --hard <revision>" to another checkout between the current good and bad limits, and continue from there. "git-reset --hard <revision>" is generally dangerous, but you are on a scratch branch. This can, of course, be used to look for any change, even one for the better, if you can avoid being confused by the terms "good" and "bad".
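The good/bad narrowing is an ordinary binary search. The sketch below assumes a strictly linear history, which is a simplification: real histories contain merges, and the text credits "git-rev-list --bisect" with picking the halfway point. The `bisect` function and the `is_good` callback are hypothetical stand-ins for "check out, build, test, report".

```python
def bisect(revisions, is_good):
    """Find the first bad revision in an oldest-to-newest list.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    Returns (first_bad_revision, number_of_revisions_tested).
    """
    good, bad = 0, len(revisions) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2          # the halfway point to check out and test
        tested += 1
        if is_good(revisions[mid]):
            good = mid                   # the change being sought is after mid
        else:
            bad = mid                    # the change is at or before mid
    return revisions[bad], tested

revisions = [f"r{i}" for i in range(1000)]
first_bad, tested = bisect(revisions, lambda rev: int(rev[1:]) < 637)
print(first_bad)   # r637
print(tested)      # number of builds needed, roughly log2 of the range
```

With a thousand candidate revisions, only about ten builds are needed, which is why reporting the exact offending patch is practical.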
  • git-branch.sh Most commonly used bare, to show the available branches. Show, create, or delete a branch. The current branches are simply the contents of .git/refs/heads/. Note that this does NOT switch to the created branch! For the common case of creating a branch and immediately switching to it, "git-checkout -b <branch>" is simpler.
  • git-checkout.sh This does two superficially similar but very different things depending on whether any files or paths are specified on the command line.
      git-checkout [-f] [-b <new-branch>] <branch>
        This switches (changes the HEAD symlink to) the specified branch, updating the index and working directory to reflect the change. This preserves changes in the working directory unless -f is specified. If -b is specified, a new branch is started from the specified point and switched to. If <branch> is omitted, it defaults to HEAD. This is the usual way to start a new branch.
      git-checkout [<branch>] [--] <paths>...
        This replaces the files specified by the given paths with the versions from the index or the specified branch. It does NOT affect the HEAD symlink, just replaces the specified paths. This form is like a selective form of "git-reset". Normally, this can guess whether the first argument is a branch name or a path, but you can use "--" to force the latter interpretation. With no branch, this is used to revert a botched edit of a particular file.
    Both forms use git-read-tree internally, but the net effect is quite different.
  • git-commit.sh Commit changes to the revision history. In terms of primitives, this does three things: 1) Updates the index file with the working directory files specified on the command line, or -a for all (using git-diff-files --name-only | git-update-index), 2) Prompts for or generates a commit message, and then 3) Creates a commit object with the current index contents. This also executes the pre-commit, commit-msg, and post-commit hooks if present. This will remove deleted files from the index, but will not add new files to the index, even if explicitly specified on the command line; you must use git-add for that.
  • git-reset.sh Explained in detail in "resetting", above. This modifies the current branch head reference (as pointed to by .git/HEAD) to refer to the given commit. It does not modify .git/HEAD. Reset the current HEAD to the specified commit, so that future checkins will be relative to it. There are three variations:
      --soft: Just move the HEAD link. The index is unchanged.
      --mixed (default): Move the HEAD link and update the index file. Any local changes will appear not checked in.
      --hard: Move the HEAD link, update the index file, and check out the index, overwriting the working directory. Like "cvs update -C".
    In case of accidents, this copies the previous head object ID to ORIG_HEAD (which is NOT a symlink).
  • git-status.sh Show all files in the directory not current with respect to the git HEAD. The basic categories are: 1) Changed in the index, will be included in the next commit. 2) Changed in the working directory but NOT in the index; will be committed only if added via git-update-index or the git-commit command line. 3) Not tracked by git.
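The three categories fall out of two comparisons: HEAD against the index, and the index against the working directory. A minimal sketch, assuming each of the three is modeled as a `{path: content_id}` dictionary; the `status` function and its category labels are invented for illustration.

```python
def status(head, index, workdir):
    """Classify paths; each argument maps path -> content ID (toy values)."""
    # 1) Differs between HEAD and the index: will be included in the next commit.
    staged = sorted(p for p in set(head) | set(index) if head.get(p) != index.get(p))
    # 2) Differs between the index and the working directory: not yet in the index.
    unstaged = sorted(p for p in index if workdir.get(p) != index[p])
    # 3) Present in the working directory but unknown to the index.
    untracked = sorted(p for p in workdir if p not in index)
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}

head = {"a.c": "1", "b.c": "1"}
index = {"a.c": "2", "b.c": "1"}
workdir = {"a.c": "2", "b.c": "9", "notes.txt": "x"}
print(status(head, index, workdir))
# {'staged': ['a.c'], 'unstaged': ['b.c'], 'untracked': ['notes.txt']}
```

A file can appear in both of the first two lists when it was updated in the index and then edited again.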
  • + Cherry-picking Cherry-picking is the process of taking part of the changes introduced on one tree and applying those changes to another. This doesn't produce a parent/descendant relationship in the commit history. To produce that relationship, there's a special type of merge you can do if you've taken everything you want off a branch and want to show it in the merge history without actually importing any changes from it: ours. "git-merge -s ours" will generate a commit that shows some branches were merged in, but not actually alter the current HEAD source code in any way. One thing cherry-picking is sometimes used for is taking a development branch and re-organizing the changes into a patch series for submission to the Linux kernel.
  • git-cherry.sh This searches a branch for patches which have not been applied to another. Basically, it finds the unpicked cherries. It searches back to the common ancestor of the named branch and the current head using git-patch-id to identify similarity in patches.
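The idea of identifying "the same patch" on two branches can be approximated by hashing a diff after discarding the parts that change when it is applied at a different position. The sketch below is only an approximation written for illustration and should not be read as git-patch-id's actual algorithm; `toy_patch_id` and `unpicked` are hypothetical names.

```python
import hashlib

def toy_patch_id(diff_text):
    """Hash a diff while ignoring whitespace and hunk-header line numbers."""
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith(("@@", "index ")):
            continue                       # these lines vary with position or blob IDs
        digest.update("".join(line.split()).encode())
    return digest.hexdigest()

def unpicked(branch_patches, upstream_patches):
    """Return names of branch patches whose toy ID is not present upstream."""
    upstream_ids = {toy_patch_id(d) for d in upstream_patches.values()}
    return [name for name, d in branch_patches.items()
            if toy_patch_id(d) not in upstream_ids]

fix_here = "--- a/f.c\n+++ b/f.c\n@@ -10,3 +10,3 @@\n-int x = 1;\n+int x = 2;\n"
fix_there = "--- a/f.c\n+++ b/f.c\n@@ -42,3 +42,3 @@\n-int x = 1;\n+int x = 2;\n"
feature = "--- a/g.c\n+++ b/g.c\n@@ -1 +1 @@\n-old\n+new\n"

print(unpicked({"fix": fix_here, "feature": feature}, {"applied": fix_there}))
# ['feature']
```

The fix is recognized as already applied even though its hunk landed on different line numbers upstream.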
  • git-cherry-pick.sh Given a commit (on a different branch), compute a diff between it and its immediate parent, and apply it to the current HEAD. This is actually the same script as "git revert", but works forward. git-cherry finds the patches, this merges them. Handles failures gracefully.
  • git-rebase.sh Move a branch to a more recent "base" release. This just extracts all the patches applied on the local head since the last merge with upstream (using git-format-patch) and re-applies them relative to the current upstream with git-am (explained under "accepting changes by e-mail"). Finally, it deletes the old branch and gives its name to the new one, so your branch now contains all the same changes, but relative to a different base. Basically the same as cherry-picking an entire branch.
  • git-revert.sh Undo a commit. Basically "patch -R" followed by a commit. This is actually the same script as "git-cherry-pick", just applies the patch in reverse, undoing a change that you don't wish to back up to using git-reset. Handles failures gracefully by telling the user what to do.
  • + Accepting changes by e-mail
  • git-apply Apply a (git-style extended) patch to the current index and working directory.
  • git-am.sh The new and improved "apply an mbox" script. Takes an mbox-style concatenation of e-mails as input and batch-applies them, generating one commit per message. Can resume after stopping on a patch problem. (Invoke it as "git-am --skip" or "git-am --resolved" to deal with the problematic patch and continue.)
  • + Publishing changes by e-mail
  • git-format-patch.sh Generate a series of patches, in the preferred Linux kernel (Documentation/SubmittingPatches) format, for posting to lkml or the like. This formats every commit on a branch as a separate patch.
  • git-send-email.perl Actually e-mail the output of git-format-patch.
  • + Merging
  • git-merge.sh Merge one or more "remote" heads into the current head. Some changes, when there has been change only on one branch or the same change has been made to all branches, can be resolved by the "trivial in-index" merge done by git-read-tree. For more complex cases, git provides a number of different merge strategies (with reasonable defaults). Note that merges are done on a filename basis. While git tries to detect renames when generating diffs, most merge strategies don't track them by renaming. (The "recursive" strategy, which recently became the default, is a notable exception.)
  • git-merge-ours.sh A "dummy" merge strategy helper. Claims that we did the merge, but actually takes the current tree unmodified. This is used to cleanly terminate side branches that have been cherry-picked in.
  • + Making releases
  • git-tag.sh Create a tag in the refs/tags directory. There are two kinds: "lightweight tags" are just references to commits. More serious tags are GPG-signed tag objects, and people receiving the git tree can verify that it is the version that you released.
  • + Accepting changes by network Pulling consists of two steps: retrieving the remote commit objects and everything they point to (including ancestors), then merging that into the desired tree. There are still separate fetch and merge commands, but it's more commonly done with a single "git-pull" command. git-fetch leaves the commit objects, one per line, in .git/FETCH_HEAD. git-merge will merge those in if that file exists when it is run. References to remote repositories can be made with long URLs, or with files in the .git/remotes/ directory. The latter also specifies the local branches to merge the fetched data into, making it very easy to track a remote repository.
  • git-clone.sh Create a new local clone of a remote repository. (Can do a couple of space-sharing hacks when "remote" is on a local machine.) You only do this once.
  • git-fetch.sh Fetch the named refs and all linked objects from a remote repository. The resultant refs (tags and commits) are stored in .git/FETCH_HEAD, which is used by a later git-resolve or git octopus. This is the first half of a "git pull" operation.
  • git-ls-remote.sh Show the contents of the refs/heads/ and/or refs/tags/ directories of a remote repository. Useful to see what's available.
  • git-pull.sh Fetches specific commits from the given remote repository, and merges everything into the current branch. If a remote commit is named as src:dst, this merges the remote head "src" into the branch "dst" as well as the trunk. Typically, the "dst" branch is not modified locally, but is kept as a pristine copy of the remote branch. One very standard example of this convention is that a repository that is tracking another specifies "master:origin" to provide a pristine local copy of the remote "master" branch in the local branch named "origin".
  • git-shell A shell that can be used for git-only users. Allows git push (git-receive-pack) and pull (git-upload-pack) only.
  • + Publishing changes by network
  • git-daemon A daemon that serves up the git native protocol so anonymous clients can fetch data. For it to allow export of a directory, the magic file name "git-daemon-export-ok" must exist in it. This does not accept (receive) data under any circumstances.
  • git-push.sh Git-pull, only backwards. Send local changes to a remote repository. The same .git/remotes/ short-cuts can be used, and the same src:dst syntax. (But this time, the src is local and the dst is remote.)
  • git-request-pull.sh Generate an e-mail summarizing the changes between two commits, and request that the recipient pull them from your repository. Just a little helper to generate a consistent and informative format.
  • git-update-server-info To run git over http, auxiliary info files are required that describe what objects are in the repository (since git-upload-pack can't generate this on the fly). If you want to publish a repository via http, run this after every commit. (Typically via the hooks/post-update script.)
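To make the cherry-picking notes above concrete, here is a minimal sketch that cherry-picks one commit from a side branch, asks git cherry whether any "unpicked cherries" remain, and then closes the side branch with the "ours" strategy. The repository, the branch names (main, topic), the file names and the commit messages are all hypothetical, and the modern spaced spellings (git cherry-pick rather than git-cherry-pick.sh) are assumed.

```shell
set -e
# Throwaway repository; every name below is made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo base > file.txt
git add file.txt
git commit -q -m "base"

# A side branch carrying one change we want.
git checkout -q -b topic
echo fix > fix.txt
git add fix.txt
git commit -q -m "add fix"

# Let main diverge, then copy the change over.
git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "main work"
git cherry-pick topic >/dev/null

# git cherry prefixes a commit with "-" when an equivalent patch is
# already upstream (main) and with "+" when it is still unpicked.
cherry_mark=$(git cherry main topic | cut -c1)

# Record the side branch as merged without changing any file content.
tree_before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m "terminate topic" topic
tree_after=$(git rev-parse 'HEAD^{tree}')
parent_count=$(git rev-list --parents -n 1 HEAD | wc -w)
```

The cherry-pick creates no parent/descendant link between the two branches; only the final "ours" merge does, which is why the tree is unchanged while the new commit gains a second parent.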
Daniel Jomphe

An introduction to git-svn for Subversion/SVK users and deserters - 0 views

  • This article is aimed at people who want to contribute to projects which are using Subversion as their code-wiki
  • Subversion users can skip SVK and move straight onto git-svn with this tutorial.
  • People who are responsible for Subversion servers and are converting them to git in order to lay them down to die are advised to consider the one-off git-svnimport, which is useful for bespoke conversions where you don't necessarily want to leave SVN/CVS/etc breadcrumbs behind. I'll mention bespoke conversions at the end of the tutorial, and the sort of thing that you end up doing with them.
  • A lot of this tutorial is dedicated to advocacy, sadly necessary. Those who would rather just cut to the chase will probably want to skip straight to
  • Yes, it's a bunch of small programs that do one thing and do it well, get over it, they're being unified
  • we've got a simple and efficient filesystem which competes with RevML but is XML free
  • Subversion added nothing to CVS' development model.
  • Another way of looking at it is to say that it's really a content- addressable filesystem, used to track directory trees.
  • Writing a tool to do something that you want is often quite a simple matter of plugging together a few core commands. It's simple enough that once a few basic concepts are there, you begin to feel comfortable knowing that the repository just can't wedge, changes can be discarded yet not lost unless you request them to be cleaned up, etc.
  • I used to push strongly for SVK, but got brow-beaten by people who were getting far more out of their version control system than I knew possible until I saw what they were talking about.
  • SVK could easily use git as a backing filesystem and drop the dependency on Subversion altogether. So could bzr or hg.
  • The repository model (see right) is also simple enough that there are complete git re-implementations you can draw upon, in a variety of languages.
  • git is first and foremost a toolkit for writing VCS systems
  • You might use a continual integration server that is responsible for promoting branches to trunk should they pass the strictures that you set.
  • I really haven't seen a nicer tool than gitk for browsing a repository.
  • gitk does some really cool things but is most useful when looking at projects that have cottoned onto feature branches (see feature branches, below). If you're looking at a project where everyone commits largely unrelated changes to one branch it just ends up a straight line, and not very interesting.
  • You can easily publish your changes for others who are switched on to git to pull. At a stretch, you can just throw the .git directory on an HTTP server somewhere and publish the path.
  • There's the git-daemon for more efficient serving of repositories (at least, in terms of network use), and gitweb.cgi to provide a visualisation of a git repository.
  • With Subversion, everyone has to commit their changes back to the central wiki, I mean repository, to share them.
  • With Git (actually this is completely true for other distributed systems), it's trivial to push and pull changes between each other. If what you're pulling has common history then git will just pull the differences.
  • If the person publishes their repository as described above, using the git-daemon(1), http or anything else that you can get your kernel to map to its VFS, then you can set it up as a "remote" and pull from it
  • Most people say "but I don't want branches". But users of darcs report that they never knew how much they really wanted branches until darcs made it so easy. In essence every change can behave as a branch, and this isn't painful.
  • Because you can easily separate your repositories into stable branches, temporary branches, etc, then you can easily set up programs that only let commits through if they meet criteria of your choosing.
  • There's also a pure Java
  • Some repositories, for instance the Linux kernel, run a policy of no commit may break the build. What this means is that if you have a problem, you can use bisection to work out which patch introduced the bug.
  • Because you can readily work on branches without affecting the stable branch, it is perfectly acceptable for a stable branch to be updated by a single maintainer only
  • There is an awful lot less to keep in your head, and you don't have to do things like plan branching in advance.
  • Good feature branches mean you end up prototyping well-developed changes; the emphasis shifts away from making atomic commits. If you forgot to add a file, or made some other little mistake, it's easy to go back and change it. If you haven't even pushed your changes anywhere, that's not only fine, but appreciated by everyone involved. Review and revise before you push is the counter-balance to frequent commits.
  • Not only is the implementation fast locally, it's very network efficient, and the protocol for exchanging revisions is also very good at figuring out what needs to be transferred quickly. This is a huge difference - one repository hosted on Debian's Alioth SVN server took 2 days to synchronise because the protocol is so chatty. Now it fits in 3 megs and would not take that long to synchronise over a 150 baud modem.
  • Disk might be cheap, but my /home is always full - git has a separate step for compacting repositories, which means that delta compression can be far more effective. If you're a compression buff, think of it as having an arbitrarily sized window, because when delta compressing git is able to match strings anywhere else in the repository - not just the file which is the notional ancestor of the new revision. This space efficiency affects everything - the virtual memory footprint in your buffercache while mining information from the repository, how much data needs to be transferred during "push" and "pull" operations, and so on. Compare that to Subversion, which even when merging between branches is incapable of using the same space for the changes hitting the target branch. The results speak for themselves - I have observed an average of 10 to 1 space savings going from Subversion FSFS to git.
  • Perhaps somebody has already made a conversion of the project and put it somewhere
  • git-svn fetch
  • But people who use git are used to treating their repositories as a revision data warehouse which they use to mine useful information when they are trying to understand a codebase.
  • importing the whole repository from Subversion
  • git svn init
  • git svn fetch
  • If you like, you can skip early revisions using the -r option to git-svn fetch.
  • make a local branch for development
  • The name "foo" is completely private; it's just a local name you're assigning to the piece of work you're doing. Eventually you will learn to group related commits onto branches, called "topic branches", as described in the introduction.
  • Say you want to take a project, and work on it somewhere else in a different direction, you can just make a copy using cp or your favourite file manager. Contrast this with Subversion, where you have to fiddle around with branches/ paths, svn cp, svn switch, etc
  • Each of those copies is fully independent, and can diverge freely. You can easily push and pull changes between them without tearing your hair out.
  • Each time you have a new idea, make a new branch and work in that.
  • But anyway, that copying was too slow and heavy. We don't want to copy 70MB each time we want to work on a new idea. We want to create new branches at the drop of a hat. Maybe you don't want to copy the actual repository, just make another checkout. We can use git-clone again
  • The -l option to git-clone told git to hardlink the objects together, so not only are these two sharing the same repository but they can still be moved around independently. Cool. I now have two checkouts I can work with, build software in, etc.
  • But all that's a lot of work and most of the time I don't care to create lots of different directories for all my branches. I can just make a new branch and switch to it immediately with git-checkout:
  • Once you have some edits you want to commit, you can use git-commit to commit them. Nothing (not even file changes) gets committed by default; you'll probably find yourself using git-commit -a to get similar semantics to svn commit.
  • There is also a GUI for preparing commits in early (but entirely functional) stages of development.
  • People used to darcs or SVK's interactive commit will like to try git add -i
  • correcting changes in your local branch
  • If it's the top commit, you can just add --amend to your regular git-commit command to, well, amend the last commit. If you explored the git-gui interface, you might have noticed the "Amend Last Commit" switch as well.
  • You can also uncommit. The command for this is git-reset
  • HEAD~1 is a special syntax that means "one commit before the reference called HEAD". HEAD^ is a slightly shorter shorthand for the same thing. I could have also put a complete revision number, a partial (non-ambiguous) revision number, or something like remotes/trunk. See git-rev-parse(1) for the full list of ways in which you can specify revisions.
  • I sometimes write commands like gitk --all `git-fsck | awk '/dangling commit/ {print $3}'` to see all the commits in the repository, not just the ones with "post-it notes" (aka references) stuck to them.
  • "Another" way to revise commits is to make a branch from the point a few commits ago, then make a new series of commits that is revised in the way that you want. This is the same scenario as before.
  • I've introduced a new command there - git-cherry-pick. This takes a commit and tries to copy its changes to the branch you've currently got checked out. This technique is called rebasing commits. There is also a git-rebase command which probably would have been fewer commands than the above. But that's my way.
  • Using Git opens the door to a bazaar of VCS tools rather than sacrificing your projects at the altar of one.
  • keep your local branch up to date with Subversion
  • The recommended way to do this for people familiar with Subversion is to use git-svn rebase.
  • Note: before you do this, you should have a "clean" working tree - no local uncommitted changes. You can use git-stash (git 1.5.3+) to hide away local uncommitted changes for later.
  • This command is doing something similar to the above commands that used git-cherry-pick; it's copying the changes from one point on the revision tree to another
  • Better still is to bunch up your in-progress working copy changes into a set of unfinished commits, using git add -i (or git-gui / git-citool). Then try the rebase. You'll end up this time with more commits on top of the SVN tree than just one, so using Stacked Git you can "stg uncommit -n 4" (if you broke your changes into 4 commits), then use "stg pop" / "stg push" to wind around the stack (as well as "stg refresh" when finished making changes) to finish them - see
  • Once you grok that, you'll only need to use stg and git-svn fetch.
  • in my experience stg is the best tool for rebasing
  • Ok, so you've already gone and made the commits locally that you wanted to publish back to the Subversion server. Perhaps you've even made a collection of changes, revising each change to be clearly understandable, making a single small change well such that the entire series of changes can be easily reviewed by your fellow project contributors. It is now time to publish your changes back to Subversion. The command to use is git svn dcommit. The d stands for delta
  • git-svn won't let the server merge revisions on the fly; if there were updates since you fetched / rebased, you'll have to do that again.
  • People are not used to this, thinking somehow that if somebody commits something to file A, then somebody else commits something to file B, the server should make a merged version with both changes, despite neither of the people committing actually having a working tree with both changes. This suffers from the same fundamental problem that darcs' patch calculus does - that just because patches apply 'cleanly' does not imply that they make sense - such a decision can only be automatically made with a dedicated continual integration (smoke) server.
  • This is normally what I use in preference to rebase.
  • This will merge all the commits that aren't in your ancestry, but are in the ancestry of the branch trunk (try setting rightmost drop-down in gitk to 'ancestor' and clicking around to get a feel for what this means), and make a new commit which has two parents - your old HEAD, and whatever commit trunk is up to.
  • there are many shortcomings in git.
  • Sadly, this model is in use by virtually every Subversion hosted project out there. And that is going to be hard to undo.
  • It is possible to use git in this way (see the figure to the right) - but it's not trivial, and not default. In fact git itself is developed in this way, using feature branches, aka topic branches.
  • Left: what darcs thinks when you start committing without marking tag points.
  • Right: Subversion has a somewhat smaller brain...
  • bzr comes with some great utilities like the Patch Queue Manager which helps show you your feature branches. With PQM, you just create a branch with a description of what you're trying to do, make it work against the version that you branched off, and then you're done. The branch can be updated to reflect changes in trunk, and eventually merged and closed.
  • Windows support is good. Consistent implementation. Experience with the distributed development model. Friendly and approachable author and core team.
  • Actually the models of git and bzr are similar enough that bzr could be fitted atop of the git repository model
  • Mercurial is missing the lightweight branches that make git so powerful, and there is no content hashing, so it doesn't really do the whole "revision protocol" thing like git.
  • If you're on Windows it's probably a lot easier to get going.
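The local-branch steps highlighted above (make a private branch with git-checkout, fix the top commit with --amend, uncommit with git-reset) can be sketched without a Subversion server. This is only an illustration under stated assumptions: the repository, file names and commit messages are hypothetical, the branch name "foo" is borrowed from the highlights, and the git-svn commands themselves are left out because they need a real SVN repository.

```shell
set -e
# Throwaway repository; file names and messages are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo one > a.txt
git add a.txt
git commit -q -m "first"

# Make a private local branch and switch to it in one step.
git checkout -q -b foo
echo two > b.txt
git add b.txt
git commit -q -m "secnd"

# Top commit has a typo in its message: amend it in place.
git commit -q --amend -m "second"
amended_subject=$(git log -1 --format=%s)

# HEAD~1 and HEAD^ are two spellings of "one commit before HEAD".
parent_tilde=$(git rev-parse HEAD~1)
parent_caret=$(git rev-parse HEAD^)

# Uncommit: move the branch back one commit, keep the file on disk.
git reset -q HEAD~1
subject_after_reset=$(git log -1 --format=%s)
status_after_reset=$(git status --porcelain)
```

After the reset, b.txt is still in the working tree but no longer committed, so it shows up as untracked and can be committed again in a revised form.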
Daniel Jomphe

nvie.com » Blog Archive » A successful Git branching model - 0 views

shared by Daniel Jomphe on 19 Jan 10 - Cached
despairblue liked it
  • Looks like a very fine high-level learning resource
Daniel Jomphe

GitFaq - GitWiki - 0 views

shared by Daniel Jomphe on 13 Oct 08 - Cached
Daniel Jomphe

Git - 0 views

shared by Daniel Jomphe on 12 Oct 08 - Cached
Daniel Jomphe

git awsome-ness [git rebase --interactive] - MadBlog - 0 views

  • What the small help doesn't say is that you can actually reorder your commits, and it will do what you expect it to do. I used it 10 minutes ago: I have a string buffer module that I extend on a regular basis, and I squashed every API extension of that module into one commit this way. Whenever a change needs editing, either because you asked for it or because one of the changes you asked for generated a conflict, the rebase stops as usual. You will be prompted to make the change, fix the conflict, or merge comments (in the case of a squash), and when all is in order, you just need to run: $ git rebase --continue This is just awesomely simple and intuitive.
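The reorder-and-squash described above is normally done by hand in an editor. As a rough, scripted sketch of the same thing, the GIT_SEQUENCE_EDITOR environment variable can point at a small helper that rewrites the todo list, and GIT_EDITOR=true accepts the combined squash message unchanged. The repository, file names, commit messages and the helper script name are hypothetical; this is not the blog author's exact session.

```shell
set -e
# Throwaway repository with a base commit and three follow-ups.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo base > base.txt;     git add base.txt;     git commit -q -m "base"
echo app  > append.txt;   git add append.txt;   git commit -q -m "api: add append"
echo misc > misc.txt;     git add misc.txt;     git commit -q -m "unrelated"
echo pre  > prepend.txt;  git add prepend.txt;  git commit -q -m "api: add prepend"

first=$(git rev-parse HEAD~2)
middle=$(git rev-parse HEAD~1)
last=$(git rev-parse HEAD)

# Hypothetical helper: overwrite the todo file git passes as $1 so the
# two API commits become adjacent and the second is squashed into the first.
editor="$repo/.git/reorder-todo.sh"
cat > "$editor" <<EOF
#!/bin/sh
printf 'pick %s\nsquash %s\npick %s\n' $first $last $middle > "\$1"
EOF

GIT_SEQUENCE_EDITOR="sh $editor" GIT_EDITOR=true git rebase -q -i HEAD~3

commit_count=$(git rev-list --count HEAD)
top_subject=$(git log -1 --format=%s)
squashed_subject=$(git log -1 --format=%s HEAD~1)
```

Because each commit touches a different file, the reordering cannot conflict here; with overlapping changes the rebase would stop and wait for git rebase --continue, as the excerpt describes.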
Daniel Jomphe

The Thing About Git - 0 views

  • Version control systems have traditionally required a lot of up-front planning followed by constant interaction to get changes to the right place at the right time and in the right order. And woe unto thee if a rule is broken somewhere along the way, or you change your mind about something, or you just want to fix this one thing real quick before having to commit all the other crap in your working copy.
  • You can work on five separate logical changes in your working copy – without interacting with the VCS at all – and then build up a series of commits in one fell swoop. Or, you can take the opposite extreme and commit really frequently and mindlessly, returning later to rearrange commits, annotate log messages, squash commits together, tease them apart, or rip stuff out completely. It’s up to you, really. Git doesn’t have an opinion on the matter.
  • I’ve personally settled into a development style where coding and interacting with version control are distinctly separate activities. I no longer find myself constantly weaving in and out due to the finicky workflow rules demanded by the VCS. When I’m coding, I’m coding. Period. Version control - out of my head. When I feel the need to organize code into logical pieces and write about it, I switch into version control mode and go at it. I’m not saying this is the Right Way to use Git: in the end, it all goes to the same place. I’m saying that this is the way I seem naturally inclined to develop software, and Git is the first VCS I’ve used that accommodates the style.
  • Taking Control of Your Local Workflow
  • Git means never having to say, “you should have”
  • The big problem here is models.rb - it’s “tangled” in the sense that it includes modifications from two different logical changes. I need to tease these changes apart into two separate commits, somehow. This is the type of situation that occurs fairly regularly (to me, at least) and that very few VCS’s are capable of helping out with. We’ll call it, “The Tangled Working Copy Problem.”
  • Git is quite different in this regard. You can work on five separate logical changes in your working copy — without interacting with the VCS at all — and then build up a series of commits in one fell swoop. Or, you can take the opposite extreme and commit really frequently and mindlessly, returning later to rearrange commits, annotate log messages, squash commits together, tease them apart, or rip stuff out completely. It’s up to you, really. Git doesn’t have an opinion on the matter.
  • The Index is also sometimes referred to as The Staging Area, which makes for a much better conceptual label in this case. I tend to think of it as the next patch: you build it up interactively with changes from your working copy and can later review and revise it. When you're happy with what you have lined up in the staging area, which basically amounts to a diff, you commit it. And because your commits are no longer bound directly to what’s in your working copy, you're free to stage individual pieces on a file-by-file, hunk-by-hunk basis. Once you've wrapped your head around it, this seemingly simple and poorly named layer of goo between your working copy and the next commit can have some really magnificent implications on the way you develop software.
  • We want to commit all of the changes to synchronize-bookmarks and some of the changes to models.rb, so let’s add them to the staging area:
  • add bin/synchronize-bookmarks
  • add --patch models.rb
  • Stage this hunk [y/n/a/d/j/J/?]?
  • I run into The Tangled Working Copy Problem so often that I've devised a manual process for dealing with it under VCS’s that punt on the problem. For instance, if I were using Subversion, I might go at it like this:
  • The magic is in the --patch argument to git-add(1). This instructs Git to display all changes to the files specified on a hunk-by-hunk basis and lets you choose one of the following options for each hunk: y – stage this hunk; n – do not stage this hunk; a – stage this and all the remaining hunks in the file; d – do not stage this hunk nor any of the remaining hunks in the file; j – leave this hunk undecided, see next undecided hunk; J – leave this hunk undecided, see next hunk; k – leave this hunk undecided, see previous undecided hunk; K – leave this hunk undecided, see previous hunk; s – split the current hunk into smaller hunks.
  • I like to review that the changes in the staging area match my expectations before committing: $ git diff --cached [diff of changes in staging area]
  • I also like to verify that my unstaged / working copy changes are as I expect: $ git diff [diff of changes in working copy that are not in the staging area]
  • Everything looks good, so I commit the staged changes: $ git commit -m "fix bookmark sucking problems"
  • git add --patch is actually a shortcut to features in git add --interactive, a powerful front-end for managing all aspects of the staging area. The git-add(1) manual page is a treasure trove of worthwhile information that’s often passed over due to the traditional semantics of VCS “add” commands. Remember that git-add(1) does a lot more than just add stuff – it’s your interface for modifying the staging area.
  • git commit --amend takes the changes staged in the index and squashes them into the previous commit. This lets you fix a problem with the last commit, which is almost always where you see the technique prescribed, but it also opens up the option of a commit-heavy workflow where you continuously revise and annotate whatever it is you're working on. See the git-commit(1) manual page for more on this.
  • And then there’s git rebase --interactive, which is a bit like git commit --amend hopped up on acid and holding a chainsaw – completely insane and quite dangerous but capable of exposing entirely new states of mind. Here you can edit, squash, reorder, tease apart, and annotate existing commits in a way that’s easier and more intuitive than it ought to be. The “INTERACTIVE MODE” section of the git-rebase(1) manual page is instructive but Pierre Habouzit’s demonstration is what flipped the light on for me.
  • There’s a section of the Git User’s Manual called The Workflow that describes, at a fairly low level, the various interactions between the working copy, the index, and the object database.
Daniel Jomphe

Capi's Corner » Blog Archive » Git on Windows: "You have some suspicious patc... - 0 views

  • use “git-config core.autocrlf true” and “git-config core.safecrlf true”
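As a small sketch of the two settings quoted above, the commands below apply them to a single throwaway repository and read them back. Current git spells the command git config rather than git-config. Setting the values per repository is an assumption made here for safety; adding --global would apply them to every repository for the user.

```shell
set -e
# Throwaway repository so the settings do not touch any real config.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Convert line endings on checkout/commit, and refuse conversions
# that could not be reversed.
git config core.autocrlf true
git config core.safecrlf true

autocrlf=$(git config --get core.autocrlf)
safecrlf=$(git config --get core.safecrlf)
```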