I genuinely worked somewhere that used the term API to mean "a person in India". The same company had someone order me not to use the term "postmortem" as part of the SRE function. I did not stay long after that.
The use of the word "chore" by many users of conventional commits has always riled me. I've always tended to favour the "linux kernel"[0] style of commit subject, which thankfully gets a mention here.
Completely agree, the attitude implied by “chore” is very off-putting to me. As if the rest should all be marked “fun” or “indifferent”. That kind of emotional judgement doesn’t belong in a commit message.
I’ve never personally used the chore term, but it doesn’t bother me to see it and I don’t feel it has a negative connotation.
Cleaning my kitchen after a meal may be a chore, but it’s not an intrinsically bad or unpleasant experience most of the time, it’s just good hygiene and afterwards I have the satisfaction of things being clean. Not cleaning the kitchen feels way worse to me as it ultimately leads to other far more unpleasant situations.
So it is with updating dependencies: it generally needs to be done, so it’s good to do it, but it’s in no way noteworthy, so chore describes it perfectly. To me it signals: “it’s work that needed to be done, but not for a feature, functionality change or bug fix on this particular code base, so you’re unlikely to see much change”.
You just made me realize why I've always considered 'chore' the most ambiguous type. In addition to being loosely defined ("transparent change with zero functional impact"?), this one is indeed a word related to emotion. No wonder it has a more subjective meaning than 'fix' or 'feat'.
This is why I never use it and almost always pick 'feat' to please the linter. Because I can't help considering that any change worth committing is improving the quality of the code in one way or the other, and thus a feature.
It is bad terminology, yes. But it is also a pretense that you know the overarching influence of a commit ahead of time, which you don't. And once you have conventional commits, everyone on the team (and the LLMs) has to spend time/tokens inventing that stupid nomenclature.
I'd have thought that by now, most would have been swapping to WebAssembly. It's really nicely sandboxed, you expose it to only what you want, and you can compile a lot of languages into a WASM form meaning you're not stuck with only Javascript or similar. Am I naive for thinking that?
I have done extensive research on CDC and it almost never works out, because most utilities don't create compressed archives in an "rsyncable" (rsync does CDC) format. I actually saved a lot of storage using restic when I switched my backups of certain things so that files were stored in archives uncompressed and sorted in a stable order. I know syncthing eventually removed CDC and just went with constant-size blocks.
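To illustrate why uncompressed, stably ordered data dedupes well under CDC, here is a toy content-defined chunker. It is only a sketch: real tools use a proper rolling hash plus minimum and maximum chunk sizes, and `WINDOW`, `MASK` and the function names are made up for this example. It cuts wherever a hash of the preceding few bytes hits a pattern, so a one-byte insertion at the front disturbs only the first chunk, while every fixed-size block shifts.

```python
import random
import zlib

WINDOW = 16   # bytes hashed at each position (illustrative value)
MASK = 0x3F   # cut when the low 6 bits are zero, so chunks average ~64 bytes

def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the CRC of the preceding WINDOW bytes matches the mask."""
    cuts = [i for i in range(WINDOW, len(data))
            if zlib.crc32(data[i - WINDOW:i]) & MASK == 0]
    edges = [0, *cuts, len(data)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]

def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Plain constant-size blocks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))
new = b"!" + old  # a single byte inserted at the front

# Count how many chunks of the old version can be reused for the new one.
shared_cdc = len(set(cdc_chunks(old)) & set(cdc_chunks(new)))
shared_fixed = len(set(fixed_chunks(old)) & set(fixed_chunks(new)))
print("reusable CDC chunks:", shared_cdc, "of", len(cdc_chunks(old)))
print("reusable fixed blocks:", shared_fixed, "of", len(fixed_chunks(old)))
```

Compress the data first and a small change typically rewrites most of the bytes that follow it, so neither scheme finds much to share.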
Bazel, on the other hand, is completely in control of this, and it makes perfect sense to do this at that point -- and it seems to be a relatively efficient implementation too, really nice to see!
As someone who predominantly writes in Go, cider-v was a massive step backwards compared to cider. I eventually moved entirely over to using vim (with the set of internal plugins for blaze etc) which became so much more useful, but I still missed the features of a proper IDE that cider just excelled at.
I imagine a lot of it came from that push to "use outside world tools more rather than writing our own" which is great in theory, but really felt like a huge leap backwards in terms of convergence.
Yeah, I was working out of the Sydney office. Almost everything was incredibly slow due to that latency, not just chromoting but also just accessing most sites through beyondcorp.
Cider (and p4/g4c etc) was amazing when I left back in 2020, I loved it so much, and truly miss it. I rejoined Google last year, and they'd replaced it with a VSCode clone that truly was just a glorified text editor and most were all-in on mercurial as a piper/citc shim -- I was only there for 5 months before I decided not to stay, and I never managed to get Go type definition hints working.
That is not quite the right word. For Python, the headcount was moved from the Bay Area (the most expensive place in the world to hire software engineers) to Munich (the most expensive place in Germany to hire SWEs), for cost-saving reasons.
Most of the engineers making most of the tools being praised in this thread are in Germany, so I don't think that generalization quite holds.
Even if the best SWEs are better in the Bay area, there's also a lot more competition for them, so Google in Germany might be able to get top 1% there (and in neighboring countries) but Google in the Bay Area is probably having a tough time getting even top 10%.
That's a good point, and it's why I'm happy to see remote offices pop up in many locations. The problem is that the top 0.1%, who can live anywhere, are often a poor representation of the depth of talent in a given place.
Similar to IBM/Rational ClearCase: both are so unfriendly compared to subversion/cvs or git/mercurial that I always struggle to understand why someone would torture themselves by using them. Probably admins love them because they allow some tooling to be added.
> Just create separate AWS accounts for separate services
My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.
I've been irritated at AWS (and the other large cloud providers) that they charge $0.01/GB for cross-az traffic. That's $3.24 per Mbps per month, about the same as I was paying for internet transit (as in: from London to anywhere in the world) 20 years ago, and this is just between two datacenters in the same city controlled by the same organisation. The markup must be 10,000x or more, considering these places are cross-connected with massive bundles of fiber!
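For anyone checking the per-Mbps figure, the conversion goes roughly like this. It is a back-of-the-envelope sketch that assumes a 30-day month, decimal gigabytes and a link saturated the whole time; the function name is just for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month

def monthly_cost_per_mbps(price_per_gb: float) -> float:
    """Cost of sustaining 1 Mbps for a whole month at a per-GB transfer price."""
    bits = 1_000_000 * SECONDS_PER_MONTH   # bits moved at 1 Mbps
    gigabytes = bits / 8 / 1e9             # decimal GB, i.e. 324 GB
    return price_per_gb * gigabytes

print(f"${monthly_cost_per_mbps(0.01):.2f} per Mbps per month")  # $3.24 per Mbps per month
```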
> My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.
As far as I remember, accounts within the same organization will have the same mapping. You also can use stable zone names these days, instead of the regular mappings.
And yeah, egress traffic pricing is freaking insane at this point. It's the biggest reason to NOT use AWS.
Insanely high S3 storage charges too. $23/TB/month? Even with the insane HDD pricing that we see today, that's paying off a drive in 1 month (at retail) that will last for 50-100 months. Sure, there's probably some encoding overhead, but it's still mad.
S3 is pretty competitive if you want similarly-performing storage with consistent millisecond-level latency, high scalability, and at least 3x redundancy. Try looking at how much it's going to cost you in enterprise SSDs :)
Is it 3x redundancy forever? I always just kinda assumed it was RS encoded after a while, so only 30-50% larger than a single copy. Plus, almost all object storage is written to / read from hard disks, not to SSDs. Unless they're in a caching layer that is.
I know Azure has done a bunch of work around Pyramid Codes (essentially a locally repairable EC/RS variant), and Google obviously have the Colossus infrastructure that allows variable encodings, I'd be surprised if AWS is still triple-replicated everywhere.
Yes, and S3 is multiply redundant and is designed to survive a total AZ failure. So your data is physically replicated into at least 2 different AZs and might be multiply-redundant within them. They also provide a crazy SLA for data integrity, meaning that data must never be lost.
S3 also has a reduced redundancy tier and infrequent access tiers that are quite a bit cheaper.
It _is_ expensive, but once you crunch all the numbers, it's actually not unreasonable. I'd argue that using the real S3 is overkill for most scenarios that don't need infinite scalability.
GCS / Azure can survive a total AZ failure too. Think of 3 x Replicas as RAID-1, whereas Erasure Coding is more like RAID-6, only it's actually more resilient:
Let's say you have 10 data blocks, and you have 4 parity blocks. You can now lose 4 servers containing a block and still be able to repair the data, whereas in 3 x Replica you can only lose 2, and have to store everything 300% of size, instead of only 140%.
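The comparison above can be put into a few lines of code. This is a sketch under the assumption of an MDS code such as Reed-Solomon, where any `redundant_blocks` losses are survivable; the `Scheme` class is a hypothetical helper, not any provider's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scheme:
    data_blocks: int
    redundant_blocks: int  # parity blocks, or extra full copies for replication

    @property
    def storage_ratio(self) -> float:
        """Raw bytes stored per byte of user data."""
        return (self.data_blocks + self.redundant_blocks) / self.data_blocks

    @property
    def tolerated_losses(self) -> int:
        """Blocks (servers) that can be lost with the data still recoverable."""
        return self.redundant_blocks

replication = Scheme(data_blocks=1, redundant_blocks=2)   # 3 x replica
rs_10_4 = Scheme(data_blocks=10, redundant_blocks=4)      # 10 data + 4 parity

for name, s in [("3x replica", replication), ("RS(10,4)", rs_10_4)]:
    print(f"{name}: {s.storage_ratio:.0%} of size, survives {s.tolerated_losses} losses")
```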
And yes, it is unreasonable how much they charge for both storage and inter-az bandwidth.
The problem with erasure coding is that if a disk fails, you suddenly need to read 3-5x more data to reconstruct the missing data from parity blocks. This is especially problematic if your replicas are split across zones. The inter-zonal bandwidth is large, but not infinite.
So I'm pretty sure that you need to have at least 2 full copies in different AZs, and then likely at least some additional redundancy within a single AZ (in the form of erasure codes or a full mirror).
So that's at least 3-4x the amount of data. 1TB of NVMe SSD capacity is around $200 and with 3x redundancy that's $600, or about 2 years of AWS S3 storage. As I said, it's expensive but not unreasonable.
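The break-even arithmetic, using the $200/TB and 3x figures from this comment and the $23/TB/month S3 price quoted earlier in the thread. It is a rough sketch that counts drive cost only (no servers, power, network or staff), and the function name is made up.

```python
def months_to_break_even(hardware_cost_per_tb: float,
                         copies: float,
                         s3_price_per_tb_month: float) -> float:
    """Months of S3 fees that add up to the up-front drive cost."""
    return hardware_cost_per_tb * copies / s3_price_per_tb_month

months = months_to_break_even(200, 3, 23)
print(f"{months:.1f} months")  # 26.1 months
```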
That's what Locally Recoverable Codes (including Pyramid Codes) are designed to address: you can repair a missing block using only a subset of the blocks, which you can make sure are placed in just one zone, eliminating the cross-az bandwidth requirements. Sure, if you lose multiple blocks, you are going to end up needing that extra bandwidth, but the chances of having two blocks offline at once is very low -- if it's just for a failure rather than extended maintenance, then you have probably already repaired the first block by the time the second fails.
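A toy version of the local-repair idea, assuming one plain XOR parity per local group. Real LRC and Pyramid Code layouts also carry global parities for multi-block failures, which are left out here, and all names below are illustrative. The point is that rebuilding one block reads only the rest of its own group, which can sit in a single zone.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_groups(data_blocks, group_size):
    """Split data blocks into local groups and attach one XOR parity per group."""
    groups = []
    for i in range(0, len(data_blocks), group_size):
        members = list(data_blocks[i:i + group_size])
        groups.append({"data": members, "parity": xor_blocks(members)})
    return groups

def repair_local(group, lost_index):
    """Rebuild one lost data block from the survivors in its own group only."""
    survivors = [b for i, b in enumerate(group["data"]) if i != lost_index]
    return xor_blocks(survivors + [group["parity"]])

# Six data blocks in two local groups of three (think: one group per zone).
blocks = [bytes([i] * 4) for i in range(1, 7)]
groups = encode_groups(blocks, group_size=3)

# Lose the first block of the second group and rebuild it by reading just
# three blocks (two data + one local parity) instead of touching all six.
rebuilt = repair_local(groups[1], lost_index=0)
print(rebuilt == blocks[3])  # True
```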
In fact, the RS configuration is often in the >50 data blocks and >10 parity blocks range (albeit with an LRC/nested RS config) for object stores because it's more important to have that recoverability than repair efficiency. While one large provider I worked at did have a system whereby they did effectively have two copies of the RS-encoded data (so that 130% turned into 260%) across two AZs, they were actively in the process of swapping to the blocks being evenly distributed across the AZs, near-halving the total required disk space.
As I said before, most object storage is not on SSD, it's on hard disk: it's 20% of the price per TB, and most objects are read very infrequently. I can promise you that they're not paying $200 for 1TB of SSD either... I realise prices are higher than sensible in the last 6 months, but it was fairly easy to pick up SSDs for under $50/TB at retail pricing (and hard disks for under $10/TB) only a year ago.