LessWrong 2.0 Reader
I agree. This unfortunately happens often across research fields: a familiar word gets reused as a technical term.
For example, in ordinary language "organic" means "of biological origin", while in chemistry "organic" describes a class of carbon compounds. The two definitions mostly coincide on Earth (most such compounds are of biological origin), but when astronomers announce they have found "organic" material on an asteroid, this leads to confusion.
maxime-riche on Scaling of AI training runs will slow down after GPT-5
Thanks for the great comment!
Do we know whether distributed training is expected to scale well to GPT-6-sized models (~100 trillion parameters) trained across something like 20 data centers? How does the communication cost scale with the size of the model and with the number of data centers? Linearly in both?
After reading this for 3 minutes:
Google Cloud demonstrates the world's largest distributed training job for large language models across 50,000+ TPU v5e chips (Google, November 2023). Scaling seems to work efficiently at least up to 50k chips (GPT-6 would need something like 2.5M). There is also a surprisingly linear increase in start time with the number of chips: 13 min for 32k chips. What is the SOTA?
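Not from the post, but one rough way to frame the "linearly in both?" question is a back-of-the-envelope cost model for synchronous data parallelism with ring all-reduce. Every number below (parameter count, precision, site count, link bandwidth) is an illustrative assumption, not a measurement:

```python
# Back-of-the-envelope communication cost for synchronous data-parallel
# training with a ring all-reduce. All numbers are assumptions for
# illustration, not measurements.

def allreduce_bytes_per_worker(param_count: float, num_workers: int,
                               bytes_per_param: int = 2) -> float:
    """Per-worker traffic for one all-reduce of the gradients.

    A ring all-reduce moves 2 * (N - 1) / N times the model size per
    worker, so per-worker traffic is nearly independent of N, while
    total traffic grows linearly with N.
    """
    model_bytes = param_count * bytes_per_param
    return 2 * (num_workers - 1) / num_workers * model_bytes

def step_comm_seconds(param_count: float, num_workers: int,
                      link_gbps: float) -> float:
    """Time for one gradient sync, bottlenecked by the slowest link."""
    traffic = allreduce_bytes_per_worker(param_count, num_workers)
    return traffic / (link_gbps * 1e9 / 8)  # Gb/s -> bytes/s

# Hypothetical GPT-6-scale run: 100e12 params in fp16, 20 data centers,
# an assumed 400 Gb/s link between sites.
print(f"{step_comm_seconds(100e12, 20, 400):,.0f} s per gradient sync")
```

Under this toy model, per-step traffic is linear in model size but nearly flat per worker as the number of sites grows; the step time is set by the slowest inter-site link, which is why proposals for multi-data-center training try to synchronize less often or compress what they send.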
I don't think staging a civil war is generally a good way of saving lives. Moreover, ordinary aging has about a 100% chance of "killing literally everyone" prematurely, so it's unclear to me what moral distinction you're trying to make in your comment. It's possible you think that:
1. being forced to die from an AI takeover is worse than being forced to die from aging, or
2. an AI takeover is worse because of what replaces humanity, not merely because everyone dies.
In the case of (1) I'm not sure I share the intuition. Being forced to die from old age seems, if anything, worse than being forced to die from AI, since it is long, drawn-out, and presumably more painful. You might also think about this dilemma in terms of act vs. omission, but I am not convinced there's a clear asymmetry here.
In the case of (2), whether AI takeover is worse depends on how bad you think an "AI civilization" would be in the absence of humans. I recently wrote a post about some reasons to think that it wouldn't be much worse than a human civilization.
In any case, I think both scenarios are comparisons between "everyone literally dies" vs. "everyone literally dies but in a different way". So I don't think it's clear that pushing for one over the other makes someone a "Dark Lord", in the morally relevant sense, compared to the alternative.
interstice on Spatial attention as a "tell" for empathetic simulation?
Tangentially related: some advanced meditators report that their sense that perception has a center vanishes at a certain point along the meditative path, and this is associated with a reduction in suffering.
johannes-c-mayer on Johannes C. Mayer's Shortform
You write:
…But I think people can be afraid of heights without past experience of falling…
I have seen it claimed that crawling-age babies are afraid of heights, in that they will not crawl from a solid floor to a glass platform over a yawning gulf. And they’ve never fallen into a yawning gulf. At that age, probably all the heights they’ve fallen from have been harmless, since the typical baby is both bouncy and close to the ground.
nathan-young on Nathan Young's Shortform
I'm discussing with Carson. I might change my mind, but I don't know that I'll argue with both of you at once.
alexander-gietelink-oldenziel on Why I stopped being into basin broadness
This is all answered very elegantly by singular learning theory.
You seem to have a strong math background! I really encourage you to take the time to study the details of SLT. :-)
alexander-gietelink-oldenziel on Examples of Highly Counterfactual Discoveries?
I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.
The story that symmetries mean that the parameter-to-function map is not injective is true but already well-understood outside of SLT. It is a common misconception that this is what SLT amounts to.
To be sure, generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.
The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second, 'green' book. Unrealizability is key to the most important insight of SLT, contained in the last sections of the second-to-last chapter of the green book: algorithmic development during training through phase transitions in the free energy.
I don't have the time to recap this story here.
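For readers who want the statement behind "the RLCT is independent of the prior", here is a sketch of Watanabe's free-energy expansion, under the usual assumption that the prior is positive at the optimal parameter (conditions paraphrased, not quoted from the books):

```latex
% Watanabe's asymptotic expansion of the Bayesian free energy F_n,
% sketched under a nonvanishing prior, \varphi(w_0) > 0.
F_n \;=\; -\log \int_W \varphi(w) \prod_{i=1}^{n} p(X_i \mid w)\, dw
    \;=\; n L_n(w_0) \;+\; \lambda \log n \;-\; (m-1)\log\log n \;+\; O_p(1)
```

Here λ is the RLCT, m its multiplicity, and L_n(w_0) the empirical negative log-likelihood at an optimal parameter; the prior φ only enters at the O_p(1) term, which is the sense in which it is almost irrelevant.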
richard_kennaway on Breadboarding a Whistle Synth
This means C2 should be 8.4µF, but I didn't have one so I used a 4.7µF and 3.3µF in series for a total of 8µF.
You want those in parallel for them to add. The series combination (which I see in the breadboard pic, not just the text) is only 2µF, making your high-pass frequency a little over 10kHz.
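A quick sanity check of the arithmetic. The resistance R below is a hypothetical value (the post's actual resistor isn't quoted here), picked so that the intended 8.4µF lands near 2.5kHz:

```python
# Sanity-check the capacitor arithmetic. R is a placeholder assumption;
# the post's actual resistor value isn't quoted here.
import math

def series(*caps):
    """Capacitors in series combine like resistors in parallel."""
    return 1 / sum(1 / c for c in caps)

def parallel(*caps):
    """Capacitors in parallel simply add."""
    return sum(caps)

def highpass_cutoff_hz(r_ohms, c_farads):
    """First-order RC high-pass corner frequency: f = 1 / (2*pi*R*C)."""
    return 1 / (2 * math.pi * r_ohms * c_farads)

C1, C2 = 4.7e-6, 3.3e-6
print(f"series:   {series(C1, C2) * 1e6:.2f} uF")    # ~1.94 uF
print(f"parallel: {parallel(C1, C2) * 1e6:.2f} uF")  # 8.00 uF

R = 7.5  # ohms -- hypothetical, chosen so 8.4 uF gives ~2.5 kHz
print(f"cutoff with series pair: {highpass_cutoff_hz(R, series(C1, C2)):.0f} Hz")
```

With the accidental series combination (~1.94µF) the same R puts the corner near 10.9kHz, matching the "little over 10kHz" above.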