Most of the memory cost of a POSIX thread is in the stack, and you can customize the stack size to be quite small. Small stacks are properly thought of as a property that GC enables, not a property that M:N threading enables.
> Most of the memory cost of a POSIX thread is in the stack, and you can customize the stack size to be quite small.
The problem there is that you need to very carefully size your stack as a mis-sizing will lead to a risky stack overflow. I'm not sure it's necessary either as allocating a "large stack" but using very little of it means most of it is never committed, and thus only costs memory mappings.
Is there any systems where the C stack is growable? Do stack frames get prefixed with an explicit request for some amount of stack memory, leading to the stack possibly being moved before the funcall happens?
We used to have growable C stacks in Rust using that technique. They worked (though were too slow for us). It could have been fixed by using stack copying like Go does.
As I recall the minimum total user + kernel stack size is 10kB in the Linux kernel, so 1M threads is 10GB of space. It should be doable, though you will probably have to bump up kernel limits.
A million threads is an extreme case, though. No system can reliably spawn that many threads that are actually doing something interesting without a very large amount of memory. When you leave the realm of microbenchmarks you have to expect that an unknown quantity of threads will have deep call stacks at any given time, so you really need to give yourself leeway to avoid the risk of OOM.
So a big advantage that Go's M:N goroutine model brings to the table is how cheap they are. Cheap enough that tons of concurrency-related stuff you want to do in application code, like implementing a highly-concurrent algorithm, can be done with goroutines directly, without having to think too hard about mechanical sympathy and e.g. translate logical concurrency to physical threading. Go processes commonly have 1M or even 10M active goroutines at once.
So I don't think it's fair to say POSIX threads are comparable or whatever if they don't have this property.
Go processes do not typically have 1M or 10M active goroutines at once. The initial stack size for goroutines is 2kB, so 10M goroutines would mean 20GB just for stacks, even assuming that the goroutines never grow their stack (which cannot be assumed for anything nontrivial). The 2kB minimum stack size is on the same order of magnitude as the 10kB POSIX thread stack size.