Some of (remote) git internals, part II

Previously.

First of all, if you want to share your project with all its history, you can just tar it with the .git directory included. A user can download it, unpack it, and run git log, git diff, etc. Indeed, there is usually nothing secret in a git repo: no account info, no passwords.
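A minimal sketch of the tarball approach, assuming a throwaway project in a temporary directory (the names demo, proj.tar.gz and the commit identity are made up for illustration):

```shell
# Sketch: share a project with its full history as a tarball.
# All names here (demo, proj.tar.gz, Demo) are hypothetical.
work=$(mktemp -d)
mkdir "$work/demo" && cd "$work/demo"
git init -q
echo hello > README
git add README
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "first commit"

# Pack the working tree together with the .git directory.
cd "$work"
tar czf proj.tar.gz demo

# The recipient unpacks it and has the whole history, no server involved.
mkdir "$work/recipient" && cd "$work/recipient"
tar xzf ../proj.tar.gz
cd demo
git log --oneline
```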

Git server via SSH and "bare" repo

Configuring a remote git server amounts to initializing a bare repo on a *NIX box somewhere and configuring user accounts.

A bare repo is a repo with no files checked out. It contains only what would normally live inside the .git directory. Try: git init --bare.
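To see this for yourself, here is a small sketch (the path project.git is just an example name):

```shell
# Sketch: create a bare repo and look inside (the path is hypothetical).
bare=$(mktemp -d)/project.git
git init -q --bare "$bare"

# The files that usually sit under .git are at the top level here.
ls "$bare"

# Prints "true" for a bare repo.
git -C "$bare" rev-parse --is-bare-repository
```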

One very important thing is that you can clone from it locally. This is the bare git repo for my website on my server:

~/git/yurichev.com.git$ ls
branches  config  description  HEAD  hooks  info  objects  packed-refs  refs

I can clone it locally, on the same server:

~/tmp$ git clone /home/i/git/yurichev.com.git
Cloning into 'yurichev.com'...
done.

~/tmp$ cd yurichev.com/

~/tmp/yurichev.com$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

(Ordinary non-bare git repos, with all the files checked out, can be cloned locally as well.)

This is important: you can access a bare git repo locally, and in the same manner you can access it via SSH. Anyone who has an account on my server and can read these git files can do this. Instead of rolling out his own network protocol, Linus Torvalds decided to rely on SSH and the *NIX user account system. Why bother with user accounts, credentials, user rights, etc., if all this headache can be offloaded to the operating system? If a user has read-only (SSH) access to the git files in your bare repo, he/she will be able to clone your repo and pull from it. If a user has read-write (SSH) access to the git files in your repo, he/she will be able to push changes to it. (Git creates lock files while it updates refs during a push, so that concurrent writers do not clobber each other's updates.)
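Here is a sketch of that idea with a plain filesystem path playing the role of the server. The directory names (project.git, alice, bob) and identities are made up. Over SSH only the URL would change, to something like user@host:/path/to/project.git.

```shell
# Sketch: a bare repo acting as the "server", reached through a plain path.
# Over SSH the URL would look like user@host:/path/to/project.git (hypothetical).
srv=$(mktemp -d)
git init -q --bare "$srv/project.git"

# First client: clone (the repo is still empty), commit, push.
git clone -q "$srv/project.git" "$srv/alice" 2>/dev/null
cd "$srv/alice"
echo hello > file.txt
git add file.txt
git -c user.name=Alice -c user.email=alice@example.com commit -q -m "add file"
git push -q origin HEAD

# Second client: clone and see the pushed commit.
git clone -q "$srv/project.git" "$srv/bob"
git -C "$srv/bob" log --oneline
```

Whether the push succeeds depends only on whether the pushing user can write to the files under project.git.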

Heck, you can even access a git repo over NFS, if you really like. I think it would also be possible to keep a bare git repo in Dropbox storage, but that would be utterly perverse. Still, all this is possible.

All this implies that whoever has write access to your repo can forge commits by writing anything in the committer's name and email: yours, or Linus Torvalds'. But usually a repo has only one owner with write access. Also, git commits can be signed using PGP.
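To illustrate how little git checks here, a sketch that creates a commit under someone else's name. The email address is a placeholder, not a real one.

```shell
# Sketch: git records whatever identity it is told; nothing verifies it.
# The email address below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo x > f
git add f
GIT_AUTHOR_NAME="Linus Torvalds" GIT_AUTHOR_EMAIL="torvalds@example.com" \
GIT_COMMITTER_NAME="Linus Torvalds" GIT_COMMITTER_EMAIL="torvalds@example.com" \
  git commit -q -m "forged"

# Show author and committer as stored in the commit object.
git log -1 --format='%an <%ae> / %cn <%ce>'
```

This is exactly the gap that signed commits are meant to close.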

When I stopped using GitHub in Nov-2019 (it's too distracting for my taste, like any other social network service), several of my contributors continued to help me with my books. One contributor pulls from my repo on the web, like this one: https://math.recipes/git/. After committing changes, he pushes them to an SSH account I made for him on my server. The path is like /home/john/SSBE.git, and he can write to it, so he pushes his changes to that bare repo. I pull them to my local computer from the same path on the server, using root access (obviously, I have it). I review the changes, merge them, and push to my main repo on my webserver. After some time, he pulls updates from my webserver to the repo on his local computer.

By the way, getting rid of GitHub is a valuable experience, if only as an exercise: it helps you understand git better, because GitHub does some things for you to make it all easier.

However, another contributor of mine got accustomed to all that fancy eye candy on GitHub and wasn't ready to quit it. So he pulled my repo from my webserver to his local computer. Then he created a GitHub repo like SSBE fork (under his account, of course) and pushed everything to it. Then he made some changes there and wrote me an email about them. I pulled his changes from his GitHub account, like https://github.com/user/SSBE_fork.git, to my local repo. I reviewed them, merged, and pushed all that to my main public repo on my webserver. Some time later, he pulls updates from my repo to his local computer and updates his forked repo on GitHub.
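Both workflows boil down to the same thing: the contributor pushes to a repo he can write to, and I pull from it. A rough sketch, with local bare repos standing in for my webserver and for the contributor's repo (all paths, names, and file contents are invented):

```shell
# Sketch: maintainer merges work from a contributor's separate repo.
# main.git and fork.git are local stand-ins; all names are hypothetical.
base=$(mktemp -d)
git init -q --bare "$base/main.git"   # stands in for the public repo
git init -q --bare "$base/fork.git"   # stands in for the contributor's repo

# Maintainer publishes an initial commit.
git clone -q "$base/main.git" "$base/maintainer" 2>/dev/null
cd "$base/maintainer"
git config user.name Maintainer
git config user.email maintainer@example.com
echo one > book.txt
git add book.txt
git commit -q -m "initial"
git push -q origin HEAD
branch=$(git symbolic-ref --short HEAD)

# Contributor clones the public repo, commits, pushes to his own repo.
git clone -q "$base/main.git" "$base/contributor"
cd "$base/contributor"
git config user.name Contributor
git config user.email contributor@example.com
echo two >> book.txt
git commit -q -am "fix typo"
git push -q "$base/fork.git" HEAD

# Maintainer pulls from the contributor's repo, reviews, and publishes.
cd "$base/maintainer"
git pull -q "$base/fork.git" "$branch"
git log --oneline
git push -q origin HEAD
```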

Git server via HTTP(S)

Point your browser to https://math.recipes/git/:

That looks familiar: the files of a typical .git directory. You can clone my git repo from it using the command: git clone https://math.recipes/git/

Apache2 on my webserver shows the directory of the bare repo, as is.

Thanks to Apache2's fancy indexing, you can just wget all the files and try using my bare repo:

% wget --mirror -np https://math.recipes/git/
...

% cd math.recipes/git

% git log
commit 1d4a686e9da06805e645b876da7832cb53f71e7b (HEAD -> master)
Author: Dennis Yurichev 
Date:   Tue Jan 5 19:11:43 2021 +0200

    ...

commit e45e531b6f133a44691fc37701afcc33461244e8
Author: Dennis Yurichev 
Date:   Tue Jan 5 15:04:17 2021 +0200

    ...

You can even clone it locally and check out all the files. But this is not very practical.

Just wondering, when you clone my repo using the usual git client, what files are fetched via HTTP(S)? Let's take a look into my webserver's log files:

123.456.789.1 - - [17/Jan/2021:00:32:03 +0100] "GET /git/info/refs?service=git-upload-pack HTTP/1.1" 200 3929 "-" "git/2.25.1"
123.456.789.1 - - [17/Jan/2021:00:32:03 +0100] "GET /git/HEAD HTTP/1.1" 200 247 "-" "git/2.25.1"
123.456.789.1 - - [17/Jan/2021:00:32:03 +0100] "GET /git/objects/1d/4a686e9da06805e645b876da7832cb53f71e7b HTTP/1.1" 200 385 "-" "git/2.25.1"
...
123.456.789.1 - - [17/Jan/2021:00:32:03 +0100] "GET /git/objects/info/http-alternates HTTP/1.1" 404 458 "-" "git/2.25.1"
...
123.456.789.1 - - [17/Jan/2021:00:32:03 +0100] "GET /git/objects/ce/1fa7f5e6042b4bc79ef38509f929b15680e2b2 HTTP/1.1" 200 310 "-" "git/2.25.1"
...
123.456.789.1 - - [17/Jan/2021:00:32:03 +0100] "GET /git/objects/info/alternates HTTP/1.1" 404 458 "-" "git/2.25.1"
...
123.456.789.1 - - [17/Jan/2021:00:32:04 +0100] "GET /git/objects/5d/1eacb1a99b9479abfc62d4da285bd67a198c63 HTTP/1.1" 404 458 "-" "git/2.25.1"
...
123.456.789.1 - - [17/Jan/2021:00:32:04 +0100] "GET /git/objects/info/packs HTTP/1.1" 200 278 "-" "git/2.25.1"
...
123.456.789.1 - - [17/Jan/2021:00:32:04 +0100] "GET /git/objects/pack/pack-d9f69399a75f1ed7b3494fd245aa498979cd1c60.idx HTTP/1.1" 200 8994 "-" "git/2.25.1"
...

You see, a git client fetches the info/refs and HEAD files first. Let's see what is in there:

% curl -s https://math.recipes/git/HEAD
ref: refs/heads/master

% curl -s https://math.recipes/git/info/refs
1d4a686e9da06805e645b876da7832cb53f71e7b        refs/heads/master

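Aha, this is how git on the client side finds the first object's ID (1d4a686e9da06805e645b876da7832cb53f71e7b). It then fetches that object from the objects/1d/4a686e9da06805e645b876da7832cb53f71e7b file. Then it fetches everything else, probably recursively. When a loose object is missing (see the 404 for objects/5d/... in the log above), the client reads objects/info/packs and looks for the object in the pack files listed there.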
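The mapping from an object ID to a file name is easy to reproduce locally. A small sketch on a throwaway repo (file names and identity are made up):

```shell
# Sketch: map an object ID to its loose-object file, as a dumb-HTTP client would.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo x > f
git add f
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "one"

id=$(git rev-parse HEAD)
# The first two hex digits name the directory, the rest name the file.
path=".git/objects/$(echo "$id" | cut -c1-2)/$(echo "$id" | cut -c3-)"
ls -l "$path"

# The commit object names a tree (and parents, if any): the next things to fetch.
git cat-file -p "$id"
```

The file itself is zlib-compressed, which is why git cat-file is used here instead of cat.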

This works even for the Linux kernel repo. This is the main URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/.

Append HEAD and info/refs to the URL:

 % curl -s https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/HEAD
ref: refs/heads/master

 % curl -s https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/info/refs | head
e2da783614bb8930aa89753d3c3cd53d5604665d        refs/heads/master
5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c        refs/tags/v2.6.11
c39ae07f393806ccf406ef966e9a15afc43cc36a        refs/tags/v2.6.11^{}
5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c        refs/tags/v2.6.11-tree
c39ae07f393806ccf406ef966e9a15afc43cc36a        refs/tags/v2.6.11-tree^{}
26791a8bcf0e6d33f43aef7682bdb555236d56de        refs/tags/v2.6.12
9ee1c939d1cb936b1f98e8d81aeffab57bae46ab        refs/tags/v2.6.12^{}
9e734775f7c22d2f89943ad6c745571f1930105f        refs/tags/v2.6.12-rc2
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2        refs/tags/v2.6.12-rc2^{}
0397236d43e48e821cce5bbe6a80a1a56bb7cc3a        refs/tags/v2.6.12-rc3
...

You can fetch an arbitrary object via HTTP(S):

% wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/objects/e2/da783614bb8930aa89753d3c3cd53d5604665d
--2021-01-18 00:07:15--  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/objects/e2/da783614bb8930aa89753d3c3cd53d5604665d
Resolving git.kernel.org (git.kernel.org)... 136.144.49.103, 2604:1380:40b0:1a00::1
Connecting to git.kernel.org (git.kernel.org)|136.144.49.103|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8968 (8.8K) [application/octet-stream]
Saving to: ‘da783614bb8930aa89753d3c3cd53d5604665d’

da783614bb8930aa89753d3c3cd53d5604 100%[===============================================================>]   8.76K  --.-KB/s    in 0s

2021-01-18 00:07:15 (68.3 MB/s) - ‘da783614bb8930aa89753d3c3cd53d5604665d’ saved [8968/8968]

However, GitHub doesn't support this dumb protocol anymore:

% curl -s https://github.com/torvalds/linux.git/info/refs
Please upgrade your git client.
GitHub.com no longer supports git over dumb-http: https://github.com/blog/809-git-dumb-http-transport-to-be-turned-off-in-90-days

Yes, git supports more advanced ("smart") protocols, and you should use them instead of the "dumb" one: 1 2.

But I use the dumb protocol because it can be deployed without effort. I push my changes to a "bare" repo on my webserver and anyone can fetch all the files via Apache2, as is.

If you want to use it too, do not forget about the post-update hook: it has to run git update-server-info after each push, so that info/refs and objects/info/packs stay up to date.
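A sketch of such a setup on a local bare repo. The names site.git and index.html are invented. The hook body is the same one-liner that git ships as hooks/post-update.sample.

```shell
# Sketch: a bare repo prepared for dumb HTTP (names are hypothetical).
srv=$(mktemp -d)
git init -q --bare "$srv/site.git"

# Install a post-update hook that regenerates info/refs and objects/info/packs.
mkdir -p "$srv/site.git/hooks"
printf '#!/bin/sh\nexec git update-server-info\n' > "$srv/site.git/hooks/post-update"
chmod +x "$srv/site.git/hooks/post-update"

# Push something; the hook runs on the receiving side after the refs are updated.
git clone -q "$srv/site.git" "$srv/work" 2>/dev/null
cd "$srv/work"
echo hi > index.html
git add index.html
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "init"
git push -q origin HEAD

# This is the file a dumb-HTTP client requests first.
cat "$srv/site.git/info/refs"
```

Without the hook, info/refs would not be rewritten on push and clients would keep seeing old refs.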

