Hacker News | 331c8c71's comments

These are not bankers, but the culture is still bonkers

Interesting.

I am wondering why anyone would use a t-test when the experiment is clearly modelled by a binomial distribution: 250 independent questions, each of which is either answered correctly or not (the null is that the success rate is the same).


The methods could be better described in the paper, but my understanding is that they did 10 runs for each question for each prompt and took the average of those, so the compared values are not binary. You could do a sign test, but you'd lose power and answer a slightly different question.
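To make the sign-test option concrete, here is a minimal sketch using only the Python standard library. It assumes each question has an accuracy averaged over its runs for each of two prompts. The function names and the example values are hypothetical and do not come from the paper.

```python
from math import comb

def sign_counts(acc_a, acc_b, eps=1e-12):
    """Count questions where prompt A beats prompt B and vice versa (ties dropped)."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n_pos = sum(d > eps for d in diffs)
    n_neg = sum(d < -eps for d in diffs)
    return n_pos, n_neg

def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign test p-value under H0: P(A better) = P(B better) = 0.5."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-question accuracies, each an average over 10 runs
acc_a = [0.9, 0.4, 1.0, 0.7]
acc_b = [0.8, 0.4, 0.6, 0.9]
n_pos, n_neg = sign_counts(acc_a, acc_b)
p_value = sign_test_p(n_pos, n_neg)
```

The sign test only uses the direction of each per-question difference, not its size, which is where the loss of power relative to a paired t-test comes from.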

You can fit a generalised linear mixed-effects model with a binomial outcome (i.e. a binomial test with an added random effects structure). But unless you want to introduce a richer random effects structure with more variables, it is overkill and overcomplicates things, and the result should be the same as with t-tests.

I don't know much about stats, but does "the null is that the success rate is the same" imply that it's a sketchy methodology because they can come up with some findings ("ruder prompts are better/worse!") more often?

You are asking about one-sided vs two-sided tests. Not really "more often", because the formal type 1 error rate is still the same. I'd say two-sided tests leave more room for post-hoc theorizing, but there are valid situations where there is no clear one-sided hypothesis a priori. Do we really know that the hypothesis should have been "ruder prompts are better"?

I'd say this is benign compared to other ways of (mis)using statistics, e.g. looking at which way the difference goes and then running one-sided tests, or tweaking the setup until one gets "significant" p-values.

EDIT: I looked at the paper again and noticed that they actually ran pairwise t-tests on all possible combinations of tones. They should have adjusted for multiple testing, since they are doing 10 tests (choose 2 from 5) and not one.
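To show why 10 unadjusted tests matter, here is a rough sketch in plain Python. The family-wise error rate formula assumes independent tests, which pairwise comparisons that share groups are not, so treat that number as an approximation. The function names are hypothetical, and Bonferroni and Holm are just two common adjustments, not something the paper used.

```python
def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(m, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / m

def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns a reject/keep flag for each p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value is compared to alpha/m, the next to alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are kept
    return reject

fwer_10 = familywise_error_rate(10)      # roughly 0.40 for 10 tests at alpha = 0.05
per_test_alpha = bonferroni_threshold(10)  # 0.005
```

With 10 tests at the usual 0.05 level, the chance of at least one spurious "significant" result is around 40% under these assumptions, so an unadjusted p-value just below 0.05 is weak evidence.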


That's the usual null hypothesis for these kinds of tests.

10k only??? Incomparable to the value delivered any way you measure it...


Yeah, that's pocket change for NVIDIA. Doesn't sound legit.


It's suddenly you who's deciding for others what's stupid?


Exponential productivity gains? ;)


> Customers dictate what gets produced.

Sure? It seems to me that the companies dictate what I consume. Many, many times I wanted to buy exactly the same item of clothing or pair of shoes to replace an old one (because I know exactly how it'd fit and wear), only to discover it has been discontinued with no obvious "heir". Sometimes only 6 months later...

What's the percentage of people chasing "fashion", especially after their mid 30s?


More accurate to say that it's the other customers who dictate what you consume, by outvoting you with their wallets.


Why do massive discounts seem to be much more of a thing in the US compared to Europe?


RA is autoimmune.


If anything, focus gets better without sugar and excessive carbs for me, but I find those work well for outdoors or workouty days.


Definitely. Carbs mean alternating cycles of drowsiness and hunger as blood sugar levels swing, while an even level makes it easier to get in the zone.


Looks like it's true that low-carb adapted athletes rely more on fat oxidation during exercise, but performance suffers nonetheless because of increased oxygen demands that basically cannot be met.

