LessWrong 2.0 Reader
Decomposability seems like a fundamental assumption for interpretability and condition for it to succeed. E.g. from Toy Models of Superposition:
'Decomposability: Neural network activations which are decomposable can be decomposed into features, the meaning of which is not dependent on the value of other features. (This property is ultimately the most important – see the role of decomposition in defeating the curse of dimensionality.) [...]
The first two (decomposability and linearity) are properties we hypothesize to be widespread, while the latter (non-superposition and basis-aligned) are properties we believe only sometimes occur.'
If this assumption is true, it seems favorable to the prospects of safely automating large parts of interpretability work (e.g. using [V]LM agents like MAIA) differentially sooner than many other alignment research subareas and than the most consequential capabilities research (and likely before AIs become x-risky, e.g. before they pass many dangerous capabilities evals). For example, in a t-AGI framework [AF · GW], using an interpretability LM agent to search for the feature corresponding to a certain semantic direction should be a much shorter-horizon task than e.g. coming up with a new conceptual alignment agenda or a new ML architecture (and should have much faster feedback loops than e.g. training a SOTA LM using a new architecture).
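The kind of short-horizon search described above can be made concrete. A minimal sketch, assuming decomposability holds and that we have a dictionary of learned feature directions (e.g. rows of a sparse autoencoder's decoder); all names, data, and dimensions here are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a dictionary of learned feature directions
# and a target semantic direction we want to locate within it.
n_features, d_model = 512, 64
feature_dirs = rng.normal(size=(n_features, d_model))
semantic_dir = rng.normal(size=d_model)

def best_matching_feature(features: np.ndarray, direction: np.ndarray) -> int:
    """Return the index of the feature with the highest cosine
    similarity to the given semantic direction."""
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    direction = direction / np.linalg.norm(direction)
    return int(np.argmax(features @ direction))

idx = best_matching_feature(feature_dirs, semantic_dir)
```

An interpretability agent's inner loop could be little more than this lookup plus validation of the candidate feature, which is part of why the task looks short-horizon compared with open-ended conceptual research.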
slapstick on Thoughts on seed oil
If you only eat potatoes, you wouldn't die from lack of sodium; the average person would probably become healthier eating only potatoes. It's been done, though I'm not endorsing it. Potatoes and water already contain sodium, maybe not quite at the ideal ratio per average calorie, but it's pretty close, or in that range depending on the person.
We certainly need some sodium/salt but I think the extent to which most people crave salt is a result of miscalibration due to overexposure and adaptations which aren't aligned with our current environment.
I minimize added sodium and I don't really have any cravings for salt anymore, unless you count the cravings I have generally for the food/nutrition I need to sustain myself, which contains roughly enough sodium naturally.
If someone is eating a varied diet of whole foods with no added salt it's possible that adding a very marginal amount of extra salt would be healthier in some cases, but that's very far from what is typical.
jkaufman on Extra Tall Crib
The reason I want to stick with a crib over a bed or floor mattress (and, I assume, the reason most people use cribs) is that it keeps the child in bed during the time they're supposed to be sleeping.
jeffreycaruso on RobertM's Shortform
Have you read this? https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/
"“You can’t have these AI companies jumping through hoops in each and every single different jurisdiction, and from our point of view of course our principal relationship is with the U.S. AI Safety Institute,” Meta’s president of global affairs Nick Clegg — a former British deputy prime minister — told POLITICO on the sidelines of an event in London this month."
"OpenAI and Meta are set to roll out their next batch of AI models imminently. Yet neither has granted access to the U.K.’s AI Safety Institute to do pre-release testing, according to four people close to the matter."
"Leading AI firm Anthropic, which rolled out its latest batch of models in March, has yet to allow the U.K. institute to test its models pre-release, though co-founder Jack Clark told POLITICO it is working with the body on how pre-deployment testing by governments might work.
“Pre-deployment testing is a nice idea but very difficult to implement,” said Clark."
slapstick on Thoughts on seed oil
I agree that seed oils should be avoided, yes. I am skeptical of explanations pointing to some element particular to seed oils as the main source of obesity and health problems, and I'd worry this might lead people to be less concerned about consuming other unhealthy things.
bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform
Selected fragments (though not really cherry-picked, no reruns) of a conversation with Claude Opus on operationalizing something like Activation vector steering with BCI by applying the methodology of Concept Algebra for (Score-Based) Text-Controlled Generative Models to the model from High-resolution image reconstruction with latent diffusion models from human brain activity (website with nice illustrations of the model).
My prompts bolded:
'Could we do concept algebra directly on the fMRI of the higher visual cortex?
Yes, in principle, it should be possible to perform concept algebra directly on the fMRI signals from the higher visual cortex. The key idea would be to identify subspaces in the fMRI signal space that correspond to specific semantic concepts, and then manipulate these subspaces to control the reconstructed image.
Here's a hypothetical workflow:
The main challenges would be:
Nonetheless, this is an intriguing direction that could potentially unite the biological understanding from Takagi et al.'s work with the interpretable semantic manipulation of concept algebra. It suggests a way to manipulate images directly from brain-space, without the need for text prompts. This could provide insights into how the brain itself represents and manipulates semantic concepts.'
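The subspace-manipulation idea in the answer above can be sketched in a few lines. This is a toy illustration only, with fabricated data standing in for preprocessed fMRI patterns; the subspace is estimated with PCA (via SVD), as the conversation later suggests:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch: treat (preprocessed) higher-visual-cortex fMRI
# patterns as vectors, estimate a low-dimensional "concept subspace"
# from examples that vary only along that concept, then transplant
# the concept component of a target pattern into a source pattern.
d = 128                                 # voxel count after preprocessing (assumed)
concept_examples = rng.normal(size=(40, d))  # fabricated stand-in data

# Estimate the concept subspace via PCA on the concept-varying examples.
centered = concept_examples - concept_examples.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:2]                          # top-2 principal directions
proj = basis.T @ basis                  # orthogonal projector onto the subspace

def transplant_concept(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Replace source's component in the concept subspace with target's,
    leaving the orthogonal complement of source untouched."""
    return source - proj @ source + proj @ target

src, tgt = rng.normal(size=d), rng.normal(size=d)
edited = transplant_concept(src, tgt)
```

The edit only moves the pattern within the estimated subspace, which is the concept-algebra property that makes the manipulation interpretable.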
'How might the above be relevant to the AI alignment problem?
The hypothetical approach of performing concept algebra directly on fMRI signals from the higher visual cortex could potentially offer some interesting insights relevant to the AI alignment problem:
However, it's important to note that these are speculative connections, and there would be significant technical and ethical challenges in realizing this approach. Nonetheless, exploring the parallels between the structure of human thought and the structure of AI representations could be a valuable direction for informing AI alignment research.'
'How would this be relevant in the context of this proposal? [note: the proposal being Activation vector steering with BCI]
The proposed concept algebra on fMRI signals could be highly relevant to the goals outlined in this funding proposal. Specifically:
In summary, while the original proposal does not explicitly mention concept algebra, the ideas discussed around applying concept algebra to fMRI signals could potentially enhance and extend the proposal's goals of connecting human brain states with AI representations, improving reward model generalization, and advancing scientific understanding of human-AI analogies. The proposed fMRI experiments could provide the necessary data, and the planned ML experiments could be extended to include concept algebra on the brain signals.'
'Above you wrote: 'Use these samples to estimate a "semantic subspace" for each dimension in the fMRI signal space, similar to how concept algebra estimates these subspaces in the latent space of the diffusion model. You could use techniques like PCA or linear discriminant analysis to identify the directions in fMRI space that best capture the variation along each semantic dimension.' What if instead I just took a large diverse dataset of fMRI of images of men and fMRI of images of women, took the means of the two, subtracted them, and used that as the sex direction in concept algebra?
Your proposed approach of taking the difference between the mean fMRI signals for men and women to define the "sex direction" is a great idea and could indeed work as a simple alternative to more complex methods like PCA or LDA.
Here's why this approach makes sense:
To implement this:
The main advantages of this approach over PCA or LDA are simplicity and interpretability. It doesn't require any complex eigendecompositions or model fitting, and the resulting direction has a clear interpretation as the difference between men and women.
Potential limitations to keep in mind:
Despite these potential limitations, I think this is a promising and elegantly simple approach to try for implementing concept algebra directly on fMRI signals. It's a great starting point that could be refined with more sophisticated methods if needed.'
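The mean-difference proposal discussed above is simple enough to sketch directly. All data here is fabricated (random vectors standing in for fMRI patterns), and the steering step is one assumed way of using the resulting direction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch of the mean-difference idea: estimate a "sex
# direction" in fMRI space as the difference between the mean pattern
# for images of men and the mean pattern for images of women, then
# steer an individual pattern along that direction.
d = 128
fmri_men = rng.normal(loc=0.3, size=(100, d))     # fabricated stand-ins
fmri_women = rng.normal(loc=-0.3, size=(100, d))  # for real fMRI data

sex_direction = fmri_men.mean(axis=0) - fmri_women.mean(axis=0)
sex_direction /= np.linalg.norm(sex_direction)    # normalize to unit length

def steer(pattern: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a pattern by alpha along a unit-norm concept direction."""
    return pattern + alpha * direction

x = rng.normal(size=d)
x_shifted = steer(x, sex_direction, alpha=2.0)
```

As the conversation notes, the appeal over PCA or LDA is that this needs no eigendecomposition or model fitting, at the cost of the limitations listed above (e.g. the direction absorbs any confounds that differ between the two groups).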
sparr on Cohabitive Games so Far
fully cooperative: Hanabi, The Crew, The Captain Is Dead, Pandemic
partially cooperative: Red November, Betrayal at House on the Hill (original and Legacy), Dead of Winter, Gloomhaven
competitive [...] strongly mutually beneficial deals: Catan, Diplomacy, 18XX (and almost any other game where players own stock in each other's positions)
competitive [...] placement in each round matters: Power Grid, 18XX
competitive [...] where alliances (not usually permanent) are critical to victory: Diplomacy, Twilight Imperium (all of them), Cosmic Encounter
These are just the ones I've played in recent memory. I'd wager I could name 20 games in each category, with some overlap as above, given more time to think and research.
Thank you. We just had some writers join who are, among other things, going to make an up-to-date About Us section. Some out-of-date material is available at https://aiplans.substack.com
Something that we use internally is: https://docs.google.com/document/d/1wcVlWRTKJqiXOvKNl6PMHCBF3pQItCCcnYwlWvGgFpc/edit?usp=sharing
We're primarily focused on a site rebuild at the moment, which has a lot of the new and improved features users have been asking for. Preview (lots of form-factor stuff broken atm) at: https://ai-plans-site.pages.dev/
Perhaps that can work depending on the circumstances. In the specific case of a toddler, at the risk of not giving him enough credit, I think that type of distinction is too nuanced. I suspect that in practice this will simply make him litigate every particular application of any given rule (since it gives him hope that it might work) which raises the cost of enforcement dramatically. Potentially it might also make him more stressed, as I think there's something very mentally soothing / non-taxing about bright line rules.
With older kids, though, I think it's obviously an important lesson that the letter of the law and the spirit of the law do not always coincide. There's a bit of a blackpill that comes with it, though: people can get away with violating the spirit as long as they comply with the letter, and complying with the spirit (which is easier to grok) does not always guarantee compliance with the letter, which puts you at risk of getting in trouble.