Hacker News | molf's comments

After bun [1] this is another high-profile project that was ported to Rust by extensively using LLMs.

Very curious to see how these rewrites play out. Is the LLM foundation solid enough to build upon and iterate on? Or does this cause projects to become unmaintainable because no person understands the implementation anymore?

[1]: https://news.ycombinator.com/item?id=48132488


I'd love to see Asciidoctor vibe-ported to Rust! Have either of these people detailed their methodology & costs?

I think there are a few Rust ones knocking around, some LLM-assisted. I've recently had Claude re-implement the inline parser (https://github.com/oxidecomputer/react-asciidoc) using our large corpus (600+) of AsciiDoc documents.

I had it convert each document with both this and the stock version, compare the output, fix, and repeat.
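The compare-and-fix loop described above is essentially differential testing. A minimal sketch of the comparison step, assuming both converters can be called as plain string-to-string functions. `findMismatches`, `referenceConvert`, and `candidateConvert` are hypothetical names for illustration, and the toy regex converters stand in for the real Asciidoctor and re-implemented parsers:

```typescript
// A converter takes source markup and returns rendered output.
type Converter = (source: string) => string;

interface Mismatch {
  index: number;    // position of the document in the corpus
  expected: string; // output of the reference (stock) converter
  actual: string;   // output of the candidate (new) converter
}

// Run both converters over every document and collect the differences.
function findMismatches(
  corpus: readonly string[],
  reference: Converter,
  candidate: Converter,
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  corpus.forEach((doc, index) => {
    const expected = reference(doc);
    const actual = candidate(doc);
    if (expected !== actual) {
      mismatches.push({ index, expected, actual });
    }
  });
  return mismatches;
}

// Toy stand-ins (assumptions): the candidate uses a greedy match by mistake.
const referenceConvert: Converter = (s) =>
  s.replace(/\*(.+?)\*/g, "<strong>$1</strong>");
const candidateConvert: Converter = (s) =>
  s.replace(/\*(.+)\*/g, "<strong>$1</strong>");

const corpus = ["plain text", "*bold*", "*one* and *two*"];
const mismatches = findMismatches(corpus, referenceConvert, candidateConvert);

// Only the document with two bold spans exposes the greedy-match bug.
console.log(mismatches.map((m) => m.index));
```

Each reported mismatch becomes one "fix" iteration; the loop ends when the list is empty for the whole corpus.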


OXC is not the only consumer, so using the OXC AST wouldn't particularly make sense? I thought it was pretty well explained in the PR:

> Note that the conversion from any AST into our HIR is complex, and we can only maintain one version. Hence we've aligned on using a Babel-like AST as our public API. Another key point is that we don't yet implement our own scope analysis (since the TS version of the compiler relied on Babel's scope analysis), so for now we require that the scope data be serialized. It's a denormalized graph, and some metadata has to be stored to associate nodes with scopes. We're open to feedback about the AST and scope representation - we iterated a bit just to get things to work, but it can be more optimal.


I saw, I just don't understand the rationale for picking Babel over OXC or something else as the interchange format -- other than "we were already doing it this way". After all, you know what they say about temporary solutions.

So isn't not changing more sensible than changing to an arbitrary alternative?

The current developers surely are more familiar with the Babel representation than OXC, so why switch?


What I mean is, if you're going to rewrite it in Rust, why rewrite Babel rather than leaning on the existing ecosystem? I know they're not actually rewriting Babel, just reusing the semantic layout of its AST, but it's feeling a bit like the MediaWiki parser situation to me (roughly "if we started from scratch today, we wouldn't choose to have it this way, but we started a different way before, and it's been a difficult path to get to where we want to be"). Maybe that's a fairly remote analogy but it feels similar.

> if we started from scratch today, we wouldn't choose to have it this way, but we started a different way before, and it's been a difficult path to get to where we want to be

This is just how software development for a large public project goes. If you want to land big changes in a finite amount of time, you take the path that breaks the fewest things along the way.

Once the whole ecosystem is Rust-based, they can always trim out some of the redundant intermediate representations for performance gains.


Yes absolutely.

It's brilliant: all useMemo and useCallback can be removed and you get the same runtime performance and then some, at the cost of only a slight increase in code size.

A small downside at the moment is the build time. This change will hopefully help address that because the compiler will no longer depend on Babel.


Very insightful, thanks. I just delved into it, starting here: https://react.dev/learn/react-compiler/introduction

I haven't tried the compiler yet, but I have been very skeptical of automatic memoization features, for two reasons: sometimes the default strategy for deciding when to memoize is not good enough, and the hidden flow that triggers memoization can cause hard-to-spot performance regressions.

I would be interested to hear how it worked out for you.


It really does work very well in practice. A few things really help:

- Lints [1] that flag code that cannot be (correctly) optimised. Usually this is obscure code that is too smart for its own good. But the compiler leaves it alone and flags it for review, so most things just keep working.

- Lints that flag code that violates the rules of hooks. These rules have become more critical to follow: failing to do so may break rendering. But non-compliant code can easily be excluded from compilation [2], so you do not have to fix everything at once.

- Popular libraries that are not compatible (yet) are flagged and excluded automatically [3].

The compiler is better than manual memoization, because 1) it is hard not to forget memoizations, and 2) the compiler's output memoizes more granularly than manual memoization realistically could.

I have not found performance regressions. Not saying they're not possible; but we haven't encountered them.

We have a very performance-sensitive project that used Preact (chosen for performance) via its compatibility layer, which we switched to React + React Compiler. Performance is noticeably better than with Preact. Previously, the React-only version was incredibly slow even with carefully placed memoizations, because they were very hard to get right.

[1]: https://react.dev/learn/react-compiler/installation#eslint-i...

[2]: https://react.dev/learn/react-compiler/incremental-adoption

[3]: https://react.dev/reference/eslint-plugin-react-hooks/lints/...


I was thinking mainly of cases like this:

const nestedDependency = { a: { b: { c: 'c' } } }

useMemo(() => nestedDependency.a.b.c, [nestedDependency])

vs

useMemo(() => nestedDependency.a.b.c, [nestedDependency.a.b.c])

Neither triggers React hook lint warnings, although I guess this is more relevant to useEffect than to memoization.
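To make the difference between the two dependency arrays concrete, here is a minimal sketch that runs without React. React compares each dependency with `Object.is`; `depsChanged` below is a hypothetical, simplified stand-in for that comparison, not React's actual implementation:

```typescript
// Simplified stand-in (assumption) for a shallow dependency comparison:
// the memo is recomputed when any entry differs under Object.is.
function depsChanged(
  prev: readonly unknown[],
  next: readonly unknown[],
): boolean {
  if (prev.length !== next.length) return true;
  return prev.some((value, i) => !Object.is(value, next[i]));
}

type Nested = { a: { b: { c: string } } };

const first: Nested = { a: { b: { c: "c" } } };
// Same leaf value, but a freshly created object, as on a new render.
const rebuilt: Nested = { a: { b: { c: "c" } } };

// Depending on the whole object: identity differs, so the memo recomputes.
const objectDepChanged = depsChanged([first], [rebuilt]); // true
// Depending on the leaf string: the value is equal, so the memo is reused.
const leafDepChanged = depsChanged([first.a.b.c], [rebuilt.a.b.c]); // false

// In-place mutation is the opposite case.
const leafBefore = first.a.b.c;
first.a.b.c = "d";
// Same object reference: the object dependency misses the change.
const sameObjectChanged = depsChanged([first], [first]); // false
// The leaf dependency sees the new string.
const mutatedLeafChanged = depsChanged([leafBefore], [first.a.b.c]); // true

console.log({ objectDepChanged, leafDepChanged, sameObjectChanged, mutatedLeafChanged });
```

So the object dependency recomputes too often when the object is rebuilt on each render and too rarely when it is mutated in place, while the leaf dependency tracks only the value that is actually read.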


If you’re interested in what a specific piece of code compiles to, it’s worth checking out the online compiler playground [1].

[1]: https://playground.react.dev/


There is no SQL successor: SQL is here to stay.

Applying the Lindy effect [1]: after half a century of SQL we can expect it to survive for at least as long.

Disruption/displacement of SQL is like attempting to replace email. It's not going to happen. At best an alternative technology can carve out a small niche (and there's nothing wrong with that).

[1]: https://en.wikipedia.org/wiki/Lindy_effect


That Wikipedia article was super interesting, I'd never heard of the Lindy effect before. A bit difficult to wrap my noggin around but really fascinating to think about.

Read the books of Nassim Taleb. They are full of this kind of interesting stuff. Sadly, he blocked me on twitter back when I asked him why he had a paid subscription for a self-described communist hardcore-Putinist hardcore-Antisemite :/

Never heard of the Lindy effect either, learn something new from this site every day haha

It was made famous by Nassim Nicholas Taleb in Incerto.

I have no idea; but I presume they don't, given that ZJIT today is still much slower than YJIT? [1]

[1]: https://rubybench.github.io


Were there 1M line diffs in the past, before LLMs? That seems (seemed?) legitimately insane.

Simultaneously a very good example of how Github needs to adapt to the changing software development landscape?


Agreed. I absolutely adore the idea of it! But all the brownish colours tell the same story.

For some additional context: many old pigments were not stable at all.

https://www.vangoghstudio.com/what-were-the-original-colors-...


Is there enough color data left in the brown to correct it?

Or do you need to infer it based on location, budget, time, climate etc?


This specific painting was reinterpreted based on specific descriptions of the colours in a letter from the painter.

As far as I'm aware there is no way to know for sure what colours originally looked like, especially if the information is limited. There are so many variables, we can only guess.


It's absolutely lossy, and you'd have to know a lot about each piece to know how big the loss is.


> Most useful async blocks are big enough that the overhead for the error cases disappears.

Is it really though?

In my experience many Rust applications/libraries can be quite heavy on the indirection. One of the points from the article is that, unlike in sync Rust, in async Rust each indirection has a runtime cost. Example from the article:

    async fn bar(blah: SomeType) -> OtherType {
        foo(blah).await
    }

I would naively expect the above to be a 'free' indirection, paying only a compile-time cost for the compiler to inline the code. But after reading the article I understand this is not true, and it has a runtime cost as well.


It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.

If we would guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right. And if we guess incorrectly, in the worst case, the algorithm degrades to a linear scan.

Unless we have prior knowledge. For example: if there is a particular distribution, or if we know we're dealing with integers without any repetition (i.e. each element is strictly greater than the previous one), etc.


> It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.

You have another piece of information: you don't only know whether the element was before or after the compared element. You also know the delta between what you looked at and what you're looking for. And you have the delta from the previous item you looked at.


Assuming your key space is anything like randomly distributed.

Thinking about it--yeah, if you can anticipate anything like a random distribution it's a few extra instructions to reduce the number of values looked up. In the old days that would have been very unlikely to be a good deal, but with so many algorithms dominated by the cache (I've seen more than one case where a clearly less efficient algorithm that reduced memory reads turned out better) I suspect there's a lot of such things that don't go the way we learned them in the stone age.
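Using the delta between the probed value and the target to pick the next probe is usually called interpolation search. A minimal sketch, assuming a sorted array of numeric keys that are roughly evenly spread; `interpolationSearch` is an illustrative name, not something from the thread or the article:

```typescript
// Interpolation search over a sorted numeric array.
// Instead of probing the midpoint, estimate where the target should sit
// from how far it lies between the values at the current bounds.
function interpolationSearch(
  sorted: readonly number[],
  target: number,
): number {
  let lo = 0;
  let hi = sorted.length - 1;

  while (lo <= hi && target >= sorted[lo] && target <= sorted[hi]) {
    // All remaining keys are equal: avoid dividing by zero.
    if (sorted[hi] === sorted[lo]) {
      return sorted[lo] === target ? lo : -1;
    }

    // Fraction of the value range covered by the target, in [0, 1].
    const fraction = (target - sorted[lo]) / (sorted[hi] - sorted[lo]);
    const probe = lo + Math.floor(fraction * (hi - lo));

    if (sorted[probe] === target) return probe;
    if (sorted[probe] < target) {
      lo = probe + 1;
    } else {
      hi = probe - 1;
    }
  }
  return -1; // not found
}

const sortedKeys = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
console.log(interpolationSearch(sortedKeys, 70)); // 6
console.log(interpolationSearch(sortedKeys, 75)); // -1
```

On evenly spread keys like these the first probe lands on the target directly. On heavily skewed keys the estimate can be far off on every step, which is the degradation toward a linear scan mentioned earlier in the thread.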


And you always start off knowing the total length of the array, and the width of the datatype.

Actually deciding what to do with that information without incurring a bunch more cache misses in the process may be tricky.


Is the disconnect here that in many datasets there is some implicit distribution? For example, if we are searching for English words we can assume that the number of words or sentences starting with "Q" or "Z" is very small while the ones starting with "T" are many. Or if the first three lookups in a binary search all start with "T" we are probably being asked to search just the "T" section of a dictionary.

Depending on the problem space such assumptions can prove right enough to be worth using despite sometimes being wrong. Of course if you've got the compute to throw at it (and the problem is large) take the Contact approach: why do one when you can do two in parallel for twice the price (cycles)?


> If we would guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right.

This is true for abstract and random data. I don't think it's true for real world data.

For example, Python's sort function "knows nothing" about the data you're passing in. But it does look for some shortcuts, and these end up saving time on average.
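One such shortcut in Python's sort (Timsort) is detecting runs that are already ordered, either ascending or strictly descending, before doing any merging. A minimal sketch of that idea, written in TypeScript to match the other examples here; `leadingRunLength` is a hypothetical helper for illustration and is not CPython's code:

```typescript
interface Run {
  length: number;      // how many leading elements are already ordered
  descending: boolean; // true if the run is strictly descending
}

// Measure the ordered run at the start of the array.
// Ascending runs allow equal neighbours; descending runs must be strict,
// so that reversing them cannot reorder equal elements.
function leadingRunLength(values: readonly number[]): Run {
  if (values.length < 2) {
    return { length: values.length, descending: false };
  }

  const descending = values[1] < values[0];
  let end = 1;

  if (descending) {
    while (end < values.length && values[end] < values[end - 1]) end++;
  } else {
    while (end < values.length && values[end] >= values[end - 1]) end++;
  }
  return { length: end, descending };
}

console.log(leadingRunLength([1, 2, 3, 4, 2])); // length 4, ascending
console.log(leadingRunLength([5, 4, 3, 6]));    // length 3, descending
```

An input that is already sorted is a single run spanning the whole array, so it can be recognised in one linear pass. That is the kind of saving on real-world data the comment refers to.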


Just tried it out and it works great and is really fast! It's a breath of fresh air compared to VS Code. Lots of other editors are fast, but this seems feature complete as well as fast.

Migrating from VS Code was also super simple, and integrations with AI assistants seem to just work.

I can definitely appreciate the engineering work that went into it. Loving it so far! Thanks!

