LessWrong 2.0 Reader
Decomposability seems like a fundamental assumption for interpretability and condition for it to succeed. E.g. from Toy Models of Superposition:
'Decomposability: Neural network activations which are decomposable can be decomposed into features, the meaning of which is not dependent on the value of other features. (This property is ultimately the most important – see the role of decomposition in defeating the curse of dimensionality.) [...]
The first two (decomposability and linearity) are properties we hypothesize to be widespread, while the latter (non-superposition and basis-aligned) are properties we believe only sometimes occur.'
If this assumption is true, it seems favorable to the prospects of safely automating large parts of interpretability work (e.g. using [V]LM agents like MAIA) differentially sooner than many other alignment research subareas and than the most consequential capabilities research (and likely before AIs become x-risky, e.g. before they pass many dangerous capabilities evals). For example, in a t-AGI framework [AF · GW], using an interpretability LM agent to search for the feature corresponding to a certain semantic direction should be a much shorter-horizon task than e.g. coming up with a new conceptual alignment agenda or a new ML architecture (and should have much faster feedback loops than e.g. training a SOTA LM using a new architecture).
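The kind of short-horizon search described above can be made concrete. A minimal sketch, assuming decomposability holds and that we have a dictionary of learned feature directions (e.g. rows of a sparse autoencoder's decoder); all names, data, and dimensions here are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a dictionary of learned feature directions
# and a target semantic direction we want to locate within it.
n_features, d_model = 512, 64
feature_dirs = rng.normal(size=(n_features, d_model))
semantic_dir = rng.normal(size=d_model)

def best_matching_feature(features: np.ndarray, direction: np.ndarray) -> int:
    """Return the index of the feature with the highest cosine
    similarity to the given semantic direction."""
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    direction = direction / np.linalg.norm(direction)
    return int(np.argmax(features @ direction))

idx = best_matching_feature(feature_dirs, semantic_dir)
```

An interpretability agent's inner loop could be little more than this lookup plus validation of the candidate feature, which is part of why the task looks short-horizon compared with open-ended conceptual research.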
slapstick on Thoughts on seed oil
If you only eat potatoes, you wouldn't die from lack of sodium; the average person would probably become healthier eating only potatoes. It's been done, though I'm not endorsing it. Potatoes and water already contain sodium, maybe not quite at the ideal ratio per average calorie, but it's pretty close, or in that range depending on the person.
We certainly need some sodium/salt but I think the extent to which most people crave salt is a result of miscalibration due to overexposure and adaptations which aren't aligned with our current environment.
I minimize added sodium and I don't really have any cravings for salt anymore, unless you count the cravings I have generally for the food/nutrition I need to sustain myself, which contains roughly enough sodium naturally.
If someone is eating a varied diet of whole foods with no added salt it's possible that adding a very marginal amount of extra salt would be healthier in some cases, but that's very far from what is typical.
jkaufman on Extra Tall Crib
The reason I want to stick with a crib over a bed or floor mattress (and, I assume, the reason most people use cribs) is that it keeps the child in bed during the time they're supposed to be sleeping.
jeffreycaruso on RobertM's Shortform
Have you read this? https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/
"“You can’t have these AI companies jumping through hoops in each and every single different jurisdiction, and from our point of view of course our principal relationship is with the U.S. AI Safety Institute,” Meta’s president of global affairs Nick Clegg — a former British deputy prime minister — told POLITICO on the sidelines of an event in London this month."
"OpenAI and Meta are set to roll out their next batch of AI models imminently. Yet neither has granted access to the U.K.’s AI Safety Institute to do pre-release testing, according to four people close to the matter."
"Leading AI firm Anthropic, which rolled out its latest batch of models in March, has yet to allow the U.K. institute to test its models pre-release, though co-founder Jack Clark told POLITICO it is working with the body on how pre-deployment testing by governments might work.
“Pre-deployment testing is a nice idea but very difficult to implement,” said Clark."
slapstick on Thoughts on seed oil
I agree that seed oils should be avoided, yes. I am skeptical of explanations pointing to some element particular to seed oils as the main source of obesity and health problems, and I'd worry this might lead people to be less concerned about consuming other unhealthy things.
bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform
Selected fragments (though not really cherry-picked, no reruns) of a conversation with Claude Opus on operationalizing something like Activation vector steering with BCI by applying the methodology of Concept Algebra for (Score-Based) Text-Controlled Generative Models to the model from High-resolution image reconstruction with latent diffusion models from human brain activity (website with nice illustrations of the model).
My prompts bolded:
'Could we do concept algebra directly on the fMRI of the higher visual cortex?
Yes, in principle, it should be possible to perform concept algebra directly on the fMRI signals from the higher visual cortex. The key idea would be to identify subspaces in the fMRI signal space that correspond to specific semantic concepts, and then manipulate these subspaces to control the reconstructed image.
Here's a hypothetical workflow:
The main challenges would be:
Nonetheless, this is an intriguing direction that could potentially unite the biological understanding from Takagi et al.'s work with the interpretable semantic manipulation of concept algebra. It suggests a way to manipulate images directly from brain-space, without the need for text prompts. This could provide insights into how the brain itself represents and manipulates semantic concepts.'
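The subspace-manipulation idea in the answer above can be sketched in a few lines. This is a toy illustration only, with fabricated data standing in for preprocessed fMRI patterns; the subspace is estimated with PCA (via SVD), as the conversation later suggests:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch: treat (preprocessed) higher-visual-cortex fMRI
# patterns as vectors, estimate a low-dimensional "concept subspace"
# from examples that vary only along that concept, then transplant
# the concept component of a target pattern into a source pattern.
d = 128                                 # voxel count after preprocessing (assumed)
concept_examples = rng.normal(size=(40, d))  # fabricated stand-in data

# Estimate the concept subspace via PCA on the concept-varying examples.
centered = concept_examples - concept_examples.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:2]                          # top-2 principal directions
proj = basis.T @ basis                  # orthogonal projector onto the subspace

def transplant_concept(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Replace source's component in the concept subspace with target's,
    leaving the orthogonal complement of source untouched."""
    return source - proj @ source + proj @ target

src, tgt = rng.normal(size=d), rng.normal(size=d)
edited = transplant_concept(src, tgt)
```

The edit only moves the pattern within the estimated subspace, which is the concept-algebra property that makes the manipulation interpretable.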
'How might the above be relevant to the AI alignment problem?
The hypothetical approach of performing concept algebra directly on fMRI signals from the higher visual cortex could potentially offer some interesting insights relevant to the AI alignment problem:
However, it's important to note that these are speculative connections, and there would be significant technical and ethical challenges in realizing this approach. Nonetheless, exploring the parallels between the structure of human thought and the structure of AI representations could be a valuable direction for informing AI alignment research.'
'How would this be relevant in the context of this proposal? [note: the proposal being Activation vector steering with BCI]
The proposed concept algebra on fMRI signals could be highly relevant to the goals outlined in this funding proposal. Specifically:
In summary, while the original proposal does not explicitly mention concept algebra, the ideas discussed around applying concept algebra to fMRI signals could potentially enhance and extend the proposal's goals of connecting human brain states with AI representations, improving reward model generalization, and advancing scientific understanding of human-AI analogies. The proposed fMRI experiments could provide the necessary data, and the planned ML experiments could be extended to include concept algebra on the brain signals.'
'Above you wrote: 'Use these samples to estimate a "semantic subspace" for each dimension in the fMRI signal space, similar to how concept algebra estimates these subspaces in the latent space of the diffusion model. You could use techniques like PCA or linear discriminant analysis to identify the directions in fMRI space that best capture the variation along each semantic dimension.' What if instead I just took a large diverse dataset of fMRI of images of men and fMRI of images of women, took the means of the two, subtracted them, and used that as the sex direction in concept algebra?
Your proposed approach of taking the difference between the mean fMRI signals for men and women to define the "sex direction" is a great idea and could indeed work as a simple alternative to more complex methods like PCA or LDA.
Here's why this approach makes sense:
To implement this:
The main advantages of this approach over PCA or LDA are simplicity and interpretability. It doesn't require any complex eigendecompositions or model fitting, and the resulting direction has a clear interpretation as the difference between men and women.
Potential limitations to keep in mind:
Despite these potential limitations, I think this is a promising and elegantly simple approach to try for implementing concept algebra directly on fMRI signals. It's a great starting point that could be refined with more sophisticated methods if needed.'
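The mean-difference proposal discussed above is simple enough to sketch directly. All data here is fabricated (random vectors standing in for fMRI patterns), and the steering step is one assumed way of using the resulting direction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch of the mean-difference idea: estimate a "sex
# direction" in fMRI space as the difference between the mean pattern
# for images of men and the mean pattern for images of women, then
# steer an individual pattern along that direction.
d = 128
fmri_men = rng.normal(loc=0.3, size=(100, d))     # fabricated stand-ins
fmri_women = rng.normal(loc=-0.3, size=(100, d))  # for real fMRI data

sex_direction = fmri_men.mean(axis=0) - fmri_women.mean(axis=0)
sex_direction /= np.linalg.norm(sex_direction)    # normalize to unit length

def steer(pattern: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a pattern by alpha along a unit-norm concept direction."""
    return pattern + alpha * direction

x = rng.normal(size=d)
x_shifted = steer(x, sex_direction, alpha=2.0)
```

As the conversation notes, the appeal over PCA or LDA is that this needs no eigendecomposition or model fitting, at the cost of the limitations listed above (e.g. the direction absorbs any confounds that differ between the two groups).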
sparr on Cohabitive Games so Far
fully cooperative: Hanabi, The Crew, The Captain Is Dead, Pandemic
partially cooperative: Red November, Betrayal at House on the Hill (original and Legacy), Dead of Winter, Gloomhaven
competitive [...] strongly mutually beneficial deals: Catan, Diplomacy, 18XX (and almost any other game where players own stock in each other's positions)
competitive [...] placement in each round matters: Power Grid, 18XX
competitive [...] where alliances (not usually permanent) are critical to victory: Diplomacy, Twilight Imperium (all of them), Cosmic Encounter
These are just the ones I've played in recent memory. I'd wager I could name 20 games in each category, with some overlap as above, given more time to think and research.
Thank you. We just had some writers join who are, among other things, going to make an up-to-date About Us section. Some out-of-date material is available at https://aiplans.substack.com
Something that we use internally is: https://docs.google.com/document/d/1wcVlWRTKJqiXOvKNl6PMHCBF3pQItCCcnYwlWvGgFpc/edit?usp=sharing
We're primarily focused on a site rebuild at the moment, which has a lot of the new and improved features users have been asking for. Preview (lots of form-factor stuff broken atm) at: https://ai-plans-site.pages.dev/
Perhaps that can work depending on the circumstances. In the specific case of a toddler, at the risk of not giving him enough credit, I think that type of distinction is too nuanced. I suspect that in practice this will simply make him litigate every particular application of any given rule (since it gives him hope that it might work) which raises the cost of enforcement dramatically. Potentially it might also make him more stressed, as I think there's something very mentally soothing / non-taxing about bright line rules.
With older kids, though, I think it's obviously an important lesson that the letter of the law and the spirit of the law do not always coincide. There's a bit of a blackpill that comes with it, though: people can get away with violating the spirit as long as they comply with the letter, and complying with the spirit (which is easier to grok) does not always guarantee compliance with the letter, which puts you at risk of getting in trouble.