I’m interested in learning more about your theory that these models can be trained more cheaply. Is anyone doing it from scratch, rather than adversarial distillation?
It is a lot cheaper to train a 27b model such as qwen3.6, which you can even vibe code or do agentic coding with, than it is to train a 1t+ parameter model. It runs on a single commodity GPU, for goodness' sake.
It's not a theory. These smaller models that are coming out are huge advances for the field.
I can't comment on companies' training practices. That would be proprietary stuff, I guess. I think the claims that the advances being made are due to distillation alone are completely unfair. The advances are not just about data.
Several times the speed of sound? That is meaningless when there is no medium for the sound waves.
I think a better unit might be furlongs per fortnight.
> 2.43 kilometers a second, or 1.51 miles a second, or 5,400 miles an hour, or 8,700 kilometers an hour.
> There is, of course, no air and no sound on the Moon, so a "Mach number" doesn't really make sense. But if there were air, the speed would be about Mach 7, seven times the speed of sound.
"If there were air". Air at which temperature though? Th sound of speed, and hence what Mach numbers mean, depends on the temperature of the air. The temperature air would have at the moon's surface? By day or by night? Or the air at Earth's surface? Or at some other altitude?
Well, there is a speed of sound on the moon. Sound does travel through the regolith. If you were standing on the moon you would indeed "hear" this impact as the sound moved up through your feet. It would sound/feel like standing beside a subwoofer.
I still call it sound when I hear things under water. Gas isn't special.
When someone knocks on your door, you still say you heard the sound, even though the pressure wave was transmitted through the solid door material (before then being transmitted to the gas in the room). Likewise, we still file a noise complaint when the neighbor is throwing a raging party, even though we are feeling the bass as much as we are hearing it.
Didn’t Anthropic vibe code all of those integrations? If AI coding is as useful and successful as it is touted, then those integrations should be no moat at all.
SWE are paid that because the industry makes so much money off advertising, and it marks the market for everything else.
It's more business model than skillset, because RF engineering is, in many ways, so much more technically challenging.
People who care about pay should mostly be thinking about how their potential employers make money. Do they have fat variable margins? Is there volume? Do I have the opportunity to impact those margins in some way? If you do, there's a good chance you can make good money, regardless of the actual technical challenge at hand.
For a lot of RF engineering, the answers are generally no, at least enough such that the general market isn't getting set at a high clearing rate.
Hardware engineers can get paid that, although it’s rarer. That said, there’s also a much broader base of hardware engineers than just the Bay Area… so cost of living is a lot lower, therefore salaries don’t need to be as sky high to compensate.
Imagine vibe coding your core consumer application and associated backend…
Oh wait, I don’t have to imagine. That’s what Anthropic does. A nice preview for what is in store for those who chose to turn off their brains and turn on their AI agents.
Given how inefficient Meta et al are, why do they pay so much more than the nimbler, smaller companies? (Rhetorical question, I already know the answer: monopoly and regulatory capture)
Of course those engineers would rather have more meaningful work if it came with similar compensation and work life balance.
I am a terrible person to ask. My employers get their money's worth from me: I genuinely like my work and regularly work more than 8hrs a day. I also work in a field with others who, with some exceptions, do the same, so it's strange for me to see "normal people" clock out on the dot.
Meta offices are pretty full at 5pm lmao. In fact they are still decently full at 7pm after dinner at 6. Baffles me why people just make up random crap in areas they clearly know less than nothing about.
This is my experience too. I actually briefly took the cool exciting climate change related science job and then realized that I couldn’t actually support my family’s lifestyle on $160k so I left and went back to surveillance capitalism. I do feel guilt about that decision, but I like to imagine I’ll be able to go back to working on interesting and ethical things after my kids are out of the house.
I’m curious how it tracks with class sizes and tracks (remedial, regular, honors, “gifted”).
The level is so low in my local elementary school that the single track math class is still doing addition within 20 for first grade.
The funny thing is that the state standards actually measure 2 and 3 digit addition for K and 1st graders, and proficiency at that level would be p75 for a 1st grader.
So why is the actual class teaching level at p<50?
My child took the “CASTL” test as part of IEP assessment. According to the result percentiles, a 6 year old who can do 2 digit vertical addition is at the 70th percentile of 6 year olds.
How come in kindergarten the curriculum is counting to 100 and addition within 10, while the assessment test shows double digit addition is well within the norm?
BTW my child didn’t learn any arithmetic from school.
EDIT: In case I wasn’t clear: there is a huge discrepancy between the standardized tests and the actual curriculum “standards”.
You gotta do what you think is best, but I hope for future you's sake you decide to not pull the money out. Or if you do you have other retirement plans.
I'm trying to help my parents now that they're at retirement age, and I'm seeing first hand what not planning for your future looks like. They hit retirement with nothing but a small social security check every month. Not even enough to cover rent in most places.
I don't know how much you have in your 401k, but it could be worth literally hundreds of thousands more if you leave it alone and pull it out when you retire. You aren't just paying the penalties now, you're giving up potentially decades of compounding.
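To put rough numbers on the compounding point, here is a minimal sketch. Every input is a made-up assumption for illustration: a $50,000 balance, a 7% average annual return, 30 years until retirement and a 22% income tax rate. The 10% figure is the usual US additional tax on early 401k withdrawals. It also ignores the tax you would eventually owe on withdrawals in retirement, so treat it as an order-of-magnitude comparison only.

```python
def future_value(balance: float, annual_return: float, years: int) -> float:
    """Balance after compounding once per year at a fixed return."""
    return balance * (1 + annual_return) ** years


def cash_after_early_withdrawal(
    balance: float, penalty_rate: float, income_tax_rate: float
) -> float:
    """Cash left after paying the early-withdrawal penalty and income tax."""
    return balance * (1 - penalty_rate - income_tax_rate)


# All of these inputs are hypothetical; plug in your own.
BALANCE = 50_000.0
ANNUAL_RETURN = 0.07      # assumed long-run average, not a prediction
YEARS = 30
PENALTY_RATE = 0.10       # usual additional tax on early withdrawals
INCOME_TAX_RATE = 0.22    # assumed marginal rate

cash_now = cash_after_early_withdrawal(BALANCE, PENALTY_RATE, INCOME_TAX_RATE)
left_alone = future_value(BALANCE, ANNUAL_RETURN, YEARS)

print(f"Cash in hand if withdrawn today: ${cash_now:,.0f}")
print(f"Balance if left for {YEARS} years: ${left_alone:,.0f}")
```

With those inputs you walk away with about $34,000 today versus a balance of roughly $380,000 at retirement. The gap shrinks or grows a lot depending on the return and the number of years you assume.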
You could just buy deep out of money SP500 puts expiring in 1+ year. That way you would be "insured" against the bubble popping.
The thing is, every dollar you spend on insurance is a dollar (and its interest) you lose. Furthermore, we don't know when it will pop. 1 year? 5 years?
The more reasonable solution is probably to gradually reduce exposure to US markets by selling SP500 shares and moving into Europe and emerging markets ETFs. No need to cash out the 401k.
If you just look at the past 20 years, the US has had exceptional returns compared to the rest of the world.
The thing is, historically, high PE ratios like what we're seeing in the US do not correlate with short term returns that are as high. Expected future returns decrease as the PE ratios go up in a pretty linear fashion.
Why 20 years? Just because we know, post hoc, that the USA outperformed other places in the last 20 years in no way means the next 20 years will be the same.
If you want a different point to backtest from, try Japan in the 80s and early 90s
Sure, and she gets that, but at some point she completely memorizes the stories. She also asks if we can get new books at the store, but they don't make 'em that fast.
Sure, and she already got that lesson when there literally weren't more. Then she got another lesson: we can just make our own. In fact that may be one of the most important lessons to learn: you have agency, and you can use the tools you have as accelerants to better yourself, further increasing your agency.