Hacker News

On the surface:

- Git is slow on large repos, even on an SSD.

- Git has trouble with large objects; git-annex and git-lfs sort of help, but are bolted on, not integral.

- Git's submodules are unergonomic at best.

- Git's CLI is a mess.

Deeper:

- Git has no idea of a conflict as a first-class object; hence merges and rebases with the user fixing the same conflicts multiple times (and `git rerere`). Compare this to Pijul.

- Git is line-oriented and has no notion of semantic diffs and semantic merges. This makes it a raw tool when working with, ironically, source code.

Don't get me wrong: the data structures and ideas on which git is based are beautiful and reliable. But something (even) better can be built on these ideas.



> Git is slow on large repos, even on an SSD.

Maybe on Windows, but then everything is slow on Windows. On my 2015-era machine `git pull` on the Linux kernel source tree is nearly instantaneous after the remote objects are downloaded. Same with `git status`, `git diff`, etc. I mean, that's what it was developed for, because everything else was slow.


How about `git status`?

The first SSD I bought back in 2008 was to put a large git repo on it; it helped. With much larger repos, like those I had to work with at Facebook, even an NVMe drive becomes a bit uncomfortable, and one has to use something like Watchman [1] to track changes without a rather noticeable delay.

[1]: https://github.com/facebook/watchman


Facebook uses Mercurial across the board, at least from what I saw a couple of years back. A good chunk of the large-file / large-repo tooling has been open sourced as EdenFS [1], which uses file notifications to update the status as you make changes. That amortizes the cost, so the result is already computed by the time you query (Watchman is integrated). That being said, very few code bases grow to this size unless you are a major tech company with a single monorepo (with a few major OSS projects as notable counterexamples).

[1] https://github.com/facebookexperimental/eden


This is correct! Sorry. It's sometimes better to sleep in the middle of the night than to post. So embarrassing.




> Git is line-oriented and has no notion of semantic diffs and semantic merges. This makes it a raw tool when working with, ironically, source code.

Git is a content addressable snapshot system, with bolted on code to make it retrospectively appear to be a line-oriented system.

It's worse than you thought.
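To make "content addressable" concrete, here is a minimal sketch of how a blob's ID is derived purely from its bytes. It assumes the default SHA-1 object format, and the function name `git_blob_id` is made up for illustration; it is not Git's own code.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the object ID Git would assign to a blob with these bytes.

    Assumes the SHA-1 object format: the hash covers a small header
    ("blob <size>\\0") followed by the raw file content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# The empty blob has a well-known ID in every SHA-1 repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The ID depends only on the content, never on the file name or on any notion of lines, which is why anything line-oriented has to be layered on top.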


It's not worse. Snapshots are exactly what you want if you would like to have format-aware diff/merge or to experiment with alternate algorithms.

But it's easier to complain about git and throw out pie-in-the-sky ideas about "modernizing our tools" than to try the actually-existing AST-based diff/merge tools and realize it's 100x more complex for no workflow gain.


As I understand it, you can configure git's difftool to be something more clever than line-based.



> - Git is slow on large repos, even on an SSD.

I think this is an example of induced demand[0]. One of git's main advantages compared to other options is its speed. Git was so fast it completely changed the way you could work. It went from reluctantly interacting with version control when you needed to check in work, to integrating it tightly into your workflow. But, like with many things, people always find a way to "use up" the resource and make it slow again.

[0] https://en.wikipedia.org/wiki/Induced_demand


> - Git is line-oriented and has no notion of semantic diffs and semantic merges. This makes it a raw tool when working with, ironically, source code.

Compare this with Pijul as well!

I've been working on legit.pijul.com (you can try it, but nothing is ready!), which leverages byte-level storage to get higher-level diffs (I know this sounds counter-intuitive, but finer storage granularity gives you more flexibility to compute diffs).

That said, Git isn't actually line-oriented; 3-way merge is. But then even a byte-oriented 3-way merge would give the same shitty merges as Git.


Diffs aren't actually a first-class object either, they're made up on the spot. Git just stores a complete snapshot for each commit; delta compression in the repo is incidental and unrelated to the diffs you see.
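A rough sketch of what "made up on the spot" means: only full snapshots are kept, and a diff is computed when someone asks for one. This uses Python's `difflib` as a stand-in; the `snapshots` dict and `diff_snapshots` helper are hypothetical, and Git's real diff machinery is different.

```python
import difflib

# Hypothetical store: each commit maps to the complete content of a file.
snapshots = {
    "c1": "alpha\nbeta\ngamma\n",
    "c2": "alpha\nBETA\ngamma\n",
}

def diff_snapshots(old: str, new: str, path: str = "file.txt") -> str:
    """Compute a unified diff between two full snapshots on demand."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )

# Nothing diff-shaped is stored; this text exists only once it is requested.
print(diff_snapshots(snapshots["c1"], snapshots["c2"]))
```

Because the diff is derived rather than stored, swapping in a different algorithm (word-based, AST-based) does not require changing what is in the repository.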


Git doesn't understand move and copy operations very well, and has to be tricked into doing it.
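For context, Git does not record moves at all; it guesses them afterwards by comparing the content of deleted and added paths. Below is a minimal sketch of that idea, assuming a 50% similarity threshold. The `detect_renames` helper is hypothetical and uses `difflib.SequenceMatcher` as a stand-in for Git's own similarity metric.

```python
from difflib import SequenceMatcher

def detect_renames(old_tree: dict, new_tree: dict, threshold: float = 0.5):
    """Pair deleted paths with added paths whose content is similar enough.

    old_tree and new_tree map path -> file content (two snapshots).
    Returns a list of (old_path, new_path, similarity) tuples.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    renames = []
    for old_path, old_content in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_content in sorted(added.items()):
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path, best_score))
            del added[best_path]  # each added path is matched at most once
    return renames

old = {"util.py": "def f():\n    return 1\n"}
new = {"helpers.py": "def f():\n    return 1\n"}
print(detect_renames(old, new))  # pairs util.py with helpers.py
```

If a file is moved and heavily edited in the same commit, the similarity drops below the threshold and the move shows up as an unrelated delete plus add.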

With Java, checking in your dependencies was always complicated by the trouble handling binaries efficiently. With NodeJS that's not a problem, but conflict resolution often ends up with duplicate files, so checking them in is still challenging.



