LessWrong 2.0 Reader
I can forget one particular thing, but preserve most of my self-identification information
alexander-gietelink-oldenziel on Examples of Highly Counterfactual Discoveries?
Did I just say SLT is the Newtonian gravity of deep learning? Hubris of the highest order!
But also yes... I think I am saying that
This doesn't get into the groundbreaking upcoming work by Simon-Pepin Lehalleur recovering the RLCT as the asymptotic dimension of jet schemes, which suggests a much more mathematically precise conception of basins and their breadth.
zac-hatfield-dodds on Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
It's a sparse autoencoder because part of the loss function is an L1 penalty encouraging sparsity in the hidden layer. Otherwise, it would indeed learn a simple identity map!
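As a minimal sketch of the point (in NumPy, with made-up dimensions and random weights; the actual dictionary-learning setup is larger and trained), a sparse autoencoder loss pairs a reconstruction term with an L1 penalty on the hidden activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 8-dim inputs, 16-dim (overcomplete) hidden layer.
d_in, d_hidden = 8, 16
W_enc = rng.normal(size=(d_in, d_hidden)) * 0.1
W_dec = rng.normal(size=(d_hidden, d_in)) * 0.1

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction loss plus an L1 penalty on hidden activations.

    Without the L1 term, an overcomplete autoencoder could learn an
    identity map; the penalty pushes hidden activations toward sparsity.
    """
    h = np.maximum(0.0, x @ W_enc)           # ReLU encoder
    x_hat = h @ W_dec                        # linear decoder
    recon = np.mean((x - x_hat) ** 2)        # reconstruction term
    sparsity = l1_coeff * np.sum(np.abs(h))  # L1 sparsity term
    return recon + sparsity

x = rng.normal(size=(4, d_in))               # a batch of 4 examples
print(sae_loss(x))
```

With `l1_coeff` set to zero, nothing stops the hidden layer from densely copying the input; the penalty is what makes the learned features sparse.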
richard_kennaway on Bayesian inference without priors
Using a discrete hypothesis space avoids big parts of the problem.
Only if there is a "natural" discretisation of the hypothesis space. It's fine for coin tosses and die rolls, but if the problem itself is continuous, different discretisations will give the same problems as different continuous parameterisations.
In general, when infinities naturally arise but cause problems, decreeing that everything must be finite does not solve those problems, and introduces problems of its own.
nathan-young on Difference between European and US healthcare systems [discussion post]
This comment may be replied to by anyone.
Other comments are for the discussion group only.
ann-brown on Thoughts on seed oil
Raw spinach in particular also has high levels of oxalic acid, which can interfere with the absorption of other nutrients, and cause kidney stones when binding with calcium. Processing it by cooking can reduce its concentration and impact significantly without reducing other nutrients in the spinach as much.
Grinding and blending foods is itself processing. I don't know what impact it has on nutrition, but mechanically speaking, you can imagine digestion proceeding differently depending on how much of it has already been done.
You do need a certain amount of macronutrients each day, and some from fat. You also don't necessarily want to overindulge on every micronutrient. If we're putting a number of olives in our salad equivalent to the amount of olive oil we'd otherwise use (say 100 olives at 4 g each, with the sodium lowered by some means to keep that reasonable), that's 72% of the recommended daily value of iron and 32% of calcium. We just mentioned that spinach + calcium can be a problem; and the pound of spinach itself contains 67% of our iron and 45% of our calcium.
... That's also 460 calories worth of olives. I'm not sure if we've balanced our salad optimally here. Admittedly, if I'm throwing this many olives in with this much spinach in the first place, I'm probably going to cook the spinach, throw in some pesto and grains or grain products, and then I've just added more olive oil back in again ... ;)
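The arithmetic above can be sanity-checked with approximate per-100 g nutrient figures (the per-100 g values and daily values below are assumed USDA-style numbers, not from the comment itself):

```python
# Assumed per-100 g values: olives ~115 kcal, ~3.3 mg iron, ~88 mg calcium;
# raw spinach ~2.7 mg iron, ~99 mg calcium.
# Assumed daily values: 18 mg iron, 1000 mg calcium.

OLIVE_G = 100 * 4            # 100 olives at ~4 g each
SPINACH_G = 454              # one pound

olive_kcal = OLIVE_G / 100 * 115
olive_iron_pct = OLIVE_G / 100 * 3.3 / 18 * 100
olive_ca_pct = OLIVE_G / 100 * 88 / 1000 * 100
spinach_iron_pct = SPINACH_G / 100 * 2.7 / 18 * 100
spinach_ca_pct = SPINACH_G / 100 * 99 / 1000 * 100

print(f"olives: {olive_kcal:.0f} kcal, {olive_iron_pct:.0f}% iron DV, {olive_ca_pct:.0f}% calcium DV")
print(f"spinach: {spinach_iron_pct:.0f}% iron DV, {spinach_ca_pct:.0f}% calcium DV")
```

These assumptions land close to the comment's figures: roughly 460 kcal and ~70% of the iron DV from the olives, and ~45% of the calcium DV from the spinach.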
And yeah, greens with oil might taste better or be easier to eat than greens just with fatty additions like nuts, seeds, meat, or eggs.
Hey Bogdan, I'd be interested in doing a project on this or at least putting together a proposal we can share to get funding.
I've been brainstorming new directions (with @Quintin Pope [LW · GW]) this past week, and we think it would be good to use/develop some automated interpretability techniques we can then apply to a set of model interventions to see if there are techniques we can use to improve model interpretability (e.g. L1 regularization).
I saw the MAIA paper, too; I'd like to look into it some more.
Anyway, here's a related blurb I wrote:
Project: Regularization Techniques for Enhancing Interpretability and Editability
Explore the effectiveness of different regularization techniques (e.g. L1 regularization, weight pruning, activation sparsity) in improving the interpretability and/or editability of language models, and assess their impact on model performance and alignment. We expect we could apply automated interpretability methods (e.g. MAIA) to this project to test how well the different regularization techniques impact the model.
In some sense, this research is similar to the work Anthropic did with SoLU activation functions. Unfortunately, they needed to add layer norms to make the SoLU models competitive, which seems to have hidden away the superposition in other parts of the network, making SoLU unhelpful for making the models more interpretable.
That said, we can increase our ability to interpret these models through regularization techniques. A technique like L1 regularization should help because it encourages the model to learn sparse representations by penalizing non-zero weights or activations. Sparse models tend to be more interpretable as they rely on a smaller set of important features.
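As a toy illustration of that mechanism (a small linear model rather than a language model, with made-up data; all names and values are hypothetical), proximal gradient descent with an L1 penalty drives most weights to exactly zero, leaving a sparse, more inspectable set of features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends on only 2 of 20 features, so a sparse solution exists.
n, d = 200, 20
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[[3, 7]] = [2.0, -3.0]
y = X @ true_w + 0.1 * rng.normal(size=n)

def train(l1_coeff, steps=500, lr=0.01):
    """Proximal gradient descent: a squared-error gradient step followed by
    soft-thresholding, which is the proximal operator of the L1 penalty."""
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = w - lr * grad
        # Soft-threshold: shrink toward zero, zeroing small weights exactly.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1_coeff, 0.0)
    return w

w_plain = train(l1_coeff=0.0)
w_l1 = train(l1_coeff=0.5)
print("nonzero weights without L1:", int(np.sum(np.abs(w_plain) > 1e-3)))
print("nonzero weights with L1:   ", int(np.sum(np.abs(w_l1) > 1e-3)))
```

The unregularized fit keeps small nonzero weights on every feature, while the L1-penalized fit retains essentially only the two features that matter, which is the sparsity-for-interpretability trade the blurb is gesturing at.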
Whether this works or not, I'd be interested in making more progress on automated interpretability, along lines similar to what you are proposing.
slapstick on Thoughts on seed oil
I would consider most bread sold in stores to be processed or ultra-processed, and I think that's a pretty standard view, but it's true there might be some confusion.
Or take traditional soy sauce or cheese or beer or cured meats
I would consider all of those to be processed and unhealthy, and I think that's a pretty standard view, but fair enough if there's some confusion around those things.
So as a natural category "ultra processed" is mostly hogwash.
I guess my view is that it's mostly not hogwash?
The least healthy things are clearly and broadly much more processed than the healthiest things.
slapstick on Thoughts on seed oil
I typically consume my greens with ground flax seeds in a smoothie.
I feel very confident that adding refined oil to vegetables shouldn't be considered healthy, in the sense that the opportunity cost of 1 Tablespoon of olive oil is 120 calories, which is over a pound of spinach for example. Certainly it's difficult to eat that much spinach and it's probably unwise, but I just say that to illustrate that you can get a lot more nutrition from 120 calories than the oil will be adding, even if it makes the greens more bioavailable.
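A quick check of that opportunity-cost figure, assuming ~120 kcal per tablespoon of olive oil and ~23 kcal per 100 g of raw spinach (approximate USDA-style values):

```python
# How much raw spinach carries the same calories as 1 tbsp of olive oil?
OIL_KCAL = 120               # ~1 tbsp olive oil
SPINACH_KCAL_PER_100G = 23   # raw spinach, approximate

spinach_g = OIL_KCAL / SPINACH_KCAL_PER_100G * 100
print(f"{spinach_g:.0f} g of spinach ({spinach_g / 454:.2f} lb) for the same calories")
```

Under these assumptions the tablespoon of oil is calorically equivalent to a bit over 500 g of spinach, consistent with the "over a pound" claim.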
That said "healthy" is a complicated concept. If adding some oil to greens helps something eat greens they otherwise wouldn't eat for example, that's great.
ann-brown on eggsyntax's Shortform
For the first point, there's also the question of whether 'slightly superhuman' intelligences would actually fit any of our intuitions about ASI. There's a bit of an assumption that we jump headfirst into recursive self-improvement at some point; but if that has diminishing returns, we happen to hit a plateau a bit over human level, and it still has notable costs to train, host, and run, the impact could be limited to something not unlike giving a random set of especially intelligent expert humans the specific powers of the AI system. Additionally, if we happen to set regulations on computation somewhere that allows training of slightly superhuman AIs and not past it ...
Those are definitely systems that are easier to negotiate with, or even consider as agents in a negotiation. There's also a desire specifically not to build them, which might lead to systems with an architecture that isn't like that, but still implementing sentience in some manner. And the potential complication of multiple parts and specific applications a tool-oriented system is likely to be in - it'd be very odd if we decided the language processing center of our own brain was independently sentient/sapient separate from the rest of it, and we should resent its exploitation.
I do think the drive, or "just a thing it does", that we're pointing at with "what the model just does" is distinct from goals as they're traditionally imagined, and indeed I was picturing something more instinctual and automatic than deliberate. In a general sense, though, there is an objective being optimized for (predicting the data, whatever that is, generally without losing too much predictive power on other data the trainer doesn't want to lose prediction on).