On the surface: - Git is slow on large repos, even on an SSD. - Git has trouble ...

5e92cb50239222b · on July 5, 2022

> Git is slow on large repos, even on an SSD.

Maybe on Windows, but then everything is slow on Windows. On my 2015-era machine `git pull` on the Linux kernel source tree is nearly instantaneous after the remote objects are downloaded. Same with `git status`, `git diff`, etc. I mean, that's what it was developed for, because everything else was slow.

nine_k · on July 5, 2022

How about `git status`?

The first SSD I bought back in 2008 was to put a large git repo on it; it helped. With much larger repos, like those I had to work with at Facebook, even an NVMe drive becomes a bit uncomfortable, and one has to use something like Watchman [1] to track changes without a rather noticeable delay.

[1]: https://github.com/facebook/watchman

vlovich123 · on July 5, 2022

Facebook uses Mercurial across the board, at least from what I saw a couple years back. A good chunk of the large-file / large-repo support has been open sourced as EdenFS [1], which uses file notifications to update the status as you make changes, amortizing the cost so that it's already computed by the time you query (Watchman is integrated). That being said, very few code bases grow to this size unless you are a major tech company with a single monorepo (with a few major OSS projects as notable counterexamples).

[1] https://github.com/facebookexperimental/eden

nine_k · on July 5, 2022

This is correct! Sorry. It's sometimes better to sleep in the middle of the night than to post. So embarrassing.

glandium · on July 5, 2022

FWIW, see https://news.ycombinator.com/item?id=31928844

Akronymus · on July 5, 2022

maybe this is of interest to you? https://github.blog/2022-06-29-improve-git-monorepo-performa...

mikewarot · on July 5, 2022

>Git is line-oriented and has no notion of semantic diffs and semantic merges. This makes it a raw tool when working with, ironically, source code.

Git is a content-addressable snapshot system, with bolted-on code to make it retrospectively appear to be a line-oriented system.

It's worse than you thought.
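For readers unfamiliar with the term, here is a minimal sketch of what "content addressable" means for file contents. It assumes the default SHA-1 object format (repositories created with SHA-256 hash differently), and `git_blob_id` and `ToyObjectStore` are hypothetical names for illustration, not Git APIs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to a blob with this content (SHA-1 format).

    The ID is the hash of a small header ("blob <size>\\0") followed by the
    raw bytes, so it depends only on the content, never on the file name.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        object_id = git_blob_id(content)
        self.objects[object_id] = content  # same content -> same key
        return object_id


store = ToyObjectStore()
first = store.put(b"hello\n")   # e.g. README in commit 1
second = store.put(b"hello\n")  # same bytes under another path or commit
print(first == second, len(store.objects))
```

Nothing here knows about lines: the store maps whole-file snapshots to IDs, and anything line-shaped has to be computed on top of it.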

morelisp · on July 5, 2022

It's not worse. Snapshots are exactly what you want if you would like to have format-aware diff/merge or to experiment with alternate algorithms.

But it's easier to complain about git and throw out pie-in-the-sky ideas about "modernizing our tools" than to try the actually-existing AST-based diff/merge tools and realize it's 100x more complex for no workflow gain.

kardos · on July 5, 2022

As I understand it, you can configure git's difftool to be something more clever than line-based.
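One mechanism for this is the `GIT_EXTERNAL_DIFF` environment variable, which makes `git diff` call a program with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode). That is not necessarily the setup meant above. The sketch below is a hypothetical driver that compares whitespace-separated tokens instead of lines; the `token_diff` name and the output format are assumptions made for illustration.

```python
import difflib
import sys


def token_diff(old_text: str, new_text: str) -> list:
    """Diff two texts token by token rather than line by line."""
    old = old_text.split()
    new = new_text.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag in ("delete", "replace"):
            out.append("-" + " ".join(old[i1:i2]))
        if tag in ("insert", "replace"):
            out.append("+" + " ".join(new[j1:j2]))
    return out


def read_text(path: str) -> str:
    # Git passes temporary file paths (or /dev/null for a missing side).
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Only act as a diff driver when called with Git's seven arguments.
if __name__ == "__main__" and len(sys.argv) == 8:
    path, old_file, new_file = sys.argv[1], sys.argv[2], sys.argv[5]
    print("token diff for " + path)
    for line in token_diff(read_text(old_file), read_text(new_file)):
        print(line)
```

Assuming the script is saved as an executable file, it could be tried with something like `GIT_EXTERNAL_DIFF=/path/to/script git diff`.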

beermonster · on July 5, 2022

See https://tekin.co.uk/2020/10/better-git-diff-output-for-ruby-...

globular-toast · on July 5, 2022

> - Git is slow on large repos, even on an SSD.

I think this is an example of induced demand[0]. One of git's main advantages compared to other options is its speed. Git was so fast it completely changed the way you could work. It went from reluctantly interacting with version control when you needed to check in work, to integrating it tightly into your workflow. But, like with many things, people always find a way to "use up" the resource and make it slow again.

[0] https://en.wikipedia.org/wiki/Induced_demand

pmeunier · on July 5, 2022

> - Git is line-oriented and has no notion of semantic diffs and semantic merges. This makes it a raw tool when working with, ironically, source code.

Compare this with Pijul as well!

I've been working on legit.pijul.com (you can try it, but nothing is ready!), which leverages byte-level storage to get higher-level diffs (I know this sounds counter-intuitive, but finer storage granularity gives you more flexibility to compute diffs).

That said, Git isn't actually line-oriented; 3-way merge is. But then even a byte-oriented 3-way merge would give the same shitty merges as Git.

formerly_proven · on July 5, 2022

Diffs aren't actually a first-class object either, they're made up on the spot. Git just stores a complete snapshot for each commit; delta compression in the repo is incidental and unrelated to the diffs you see.
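A rough sketch of that idea, with Python's `difflib` standing in for Git's diff machinery (an assumption for illustration; Git's own algorithms and output details differ). Only full snapshots are stored, and `diff_snapshots` is a hypothetical helper that derives the diff when asked.

```python
import difflib

# Each "commit" stores the complete file content, not a patch.
snapshots = {
    "commit1": "a\nb\n",
    "commit2": "a\nc\n",
}


def diff_snapshots(old: str, new: str, path: str = "file.txt") -> list:
    """Compute a unified diff between two stored snapshots on demand."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="a/" + path,
            tofile="b/" + path,
            lineterm="",
        )
    )


for line in diff_snapshots(snapshots["commit1"], snapshots["commit2"]):
    print(line)
```

Because the diff is derived rather than stored, swapping in a different diff algorithm needs no change to the stored history.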

hinkley · on July 5, 2022

Git doesn't understand move and copy operations very well, and has to be tricked into doing it.
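Part of the reason is that Git records no rename or copy operations at all; it guesses them at diff time by comparing the content of deleted and added files, by default treating pairs that are at least 50% similar as renames. The sketch below imitates that guess with `difflib`'s line-based ratio, which is an assumption and not Git's actual similarity metric; `detect_renames` is a hypothetical name.

```python
import difflib


def detect_renames(deleted: dict, added: dict, threshold: float = 0.5) -> list:
    """Pair each deleted path with the most similar added path, if any.

    `deleted` and `added` map path -> file text. A pair counts as a rename
    only when its similarity ratio reaches `threshold`.
    """
    renames = []
    remaining = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in remaining.items():
            score = difflib.SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines(), autojunk=False
            ).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path, best_score))
            del remaining[best_path]  # each added file is matched at most once
    return renames


deleted = {"old.py": "a\nb\nc\nd\n"}
added = {"new.py": "a\nb\nc\nx\n", "other.py": "z\n"}
print(detect_renames(deleted, added))
```

If a file is moved and heavily edited in the same commit, the similarity drops below the threshold and the move shows up as an unrelated delete plus add, which is one reason people split the move and the edit into separate commits.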

With Java, checking in your dependencies was always complicated by the trouble handling binaries efficiently. With NodeJS that's not a problem, but conflict resolution often ends up with duplicate files, so checking them in is still challenging.