LessWrong 2.0 Reader
I wasn't able to find the full video on the site you linked, but I found it here, if anyone else has the same issue:
the-gears-to-ascension on Deep Honesty
Being able to credibly commit to doing this at appropriate times seems useful. I wouldn't want to commit to doing it at all times; becoming cooperatebot makes it rational for cooperative-but-preference-misaligned actors to exploit you. Shallow honesty seems like a good starting point for being able to say when you are attempting deep honesty, perhaps. But, for example, I would sure appreciate it if people could be less deeply honest about the path to AI capabilities. I do think the "deeply honest at the meta level" idea has some promise.
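To make the cooperatebot point concrete, here is a minimal iterated prisoner's dilemma sketch (the payoff numbers and strategies are illustrative assumptions, not anything from the comment): an unconditional cooperator is maximally exploited by a defector, while even a simple retaliating strategy caps the loss.

```python
# Toy iterated prisoner's dilemma; payoff values are standard
# illustrative ones, chosen here as an assumption.
PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def play(strategy_a, strategy_b, rounds=100):
    """Run the iterated game; each strategy sees the opponent's history."""
    score_a = score_b = 0
    history_a, history_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

cooperate_bot = lambda opp: "C"                    # unconditional cooperation
defector = lambda opp: "D"                         # exploits cooperatebot
tit_for_tat = lambda opp: opp[-1] if opp else "C"  # retaliates after a defection

print(play(cooperate_bot, defector))  # (0, 500): fully exploited
print(play(tit_for_tat, defector))    # (99, 104): retaliation caps the damage
```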
mikbp on Extra Tall Crib
> at that point you should just move to something optimized for being easy to get in and out of, like a bed
Yes, yes, exactly. Isn't it much more practical to put her on a bed/mattress on the floor? That's what we do, just using the mattress from the crib.
abhimanyu-pallavi-sudhir on Some Experiments I'd Like Someone To Try With An Amnestic
It's extremely high immediate value -- it solves IP rights entirely.
It's the barbed wire for IP rights.
yori-92 on Can Kauffman's NK Boolean networks make humans swarm?
Nice! I actually had this as a loose idea in the back of my mind for a while: a network of people connected like this who signal their track of the day to each other, which could be actual fun. It is a feasible use case as well. The underlying reasoning is also that (at least for me) I would be more open to adopting an idea from a person with whom I feel a shared sense of collectivity than from an algorithm that thinks it knows me. Intrinsically, I want such an algorithm to be wrong, for the sake of my own autonomy :)
The way I see it, the relevance for alignment is to ask: what do we actually mean when saying that two intelligent agents are aligned? Are you and I aligned if we would make the same decision in a trolley problem? If we motivate our decisions in the same way? Or if we just don't kill each other? None of these are meaningful indicators of two people being aligned, let alone humans and AI, and with unreliable indicators, will we ever succeed in solving the issue? I'd say two agents are aligned when one agent's most rewarding decision also benefits the other. Generalizing and scaling that alignment to many situations and many agents/people necessitates a 'theory of mind' mechanism, as well as a way to keep certain properties invariant under scaling and translation in complex networks. This is really a physicist's way of thinking about the problem, and I am only slowly picking up the language that others in the AI/alignment fields use.
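That criterion can be stated as a small check. Here is a toy formalization (the action names, payoffs, and zero baseline are made-up assumptions): two agents count as aligned, in this narrow sense, when every reward-maximizing action for one also leaves the other better off than its baseline.

```python
# Toy version of "aligned = A's most rewarding decision also benefits B".
# Payoff tables and the zero baseline are illustrative assumptions.

def aligned(payoff_a, payoff_b, baseline_b=0.0):
    """True if every reward-maximizing action for A also benefits B."""
    best = max(payoff_a.values())
    best_actions = [a for a, r in payoff_a.items() if r == best]
    return all(payoff_b[a] > baseline_b for a in best_actions)

payoff_a = {"share": 4, "hoard": 5, "destroy": 1}
payoff_b = {"share": 3, "hoard": -2, "destroy": -5}

print(aligned(payoff_a, payoff_b))  # False: A's best action ("hoard") harms B
```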
jacques-thibodeau on jacquesthibs's Shortform
Do we expect future model architectures to be biased toward out-of-context reasoning (reasoning internally rather than in a chain-of-thought)? As in, what kinds of capabilities would lead companies to build models that reason less and less in token-space?
I mean, the first obvious thing is cost: you are training the model to internalize some of the reasoning rather than paying for the additional tokens each time you want to do complex reasoning.
The thing is, I expect we'll eventually move away from just relying on transformers with scale. And so I'm trying to refine my understanding of the capabilities that are simply bottlenecked in this paradigm, and that model builders will need to resolve through architectural and algorithmic improvements. (Of course, based on my previous posts, I still think data is a big deal.)
Anyway, this kind of thinking eventually leads to the infohazardous area of, "okay then, what does the true AGI setup look like?" This is really annoying because it has alignment implications. If we start to move increasingly towards models that are reasoning outside of token-space, then alignment becomes harder. So, are there capability bottlenecks that eventually get resolved through something that requires out-of-context reasoning?
So far, it seems like the current paradigm will not be an issue on this front. Keep scaling transformers, and you don't really get any big changes in the model's likelihood of using out-of-context reasoning.
This is not limited to out-of-context reasoning. I'm trying to get a better understanding of the (dangerous) properties future models may develop simply as a result of needing to break a capability bottleneck. My worry is that many people over-index on the current transformer+scale paradigm (which may prove insufficient for ASI), and so don't work on the right kinds of alignment or governance projects.
---
I'm unsure how big of a deal this architecture will end up being, but the rumoured xLSTM just dropped. It seemingly outperforms other models at the same size.
Maybe it ends up just being another drop in the bucket, but I think we will see more attempts in this direction.
Claude summary:
The key points of the paper are:
- Exponential gating with memory mixing, which lets the model revise what it has stored instead of being locked into past writes.
- A matrix memory in place of the scalar memory cell, which increases storage capacity.
- A reformulation that removes the sequential bottleneck, making training parallelizable.
This work is important because it presents a path forward for scaling LSTMs to billions of parameters and beyond. By overcoming key limitations of vanilla LSTMs - the inability to revise storage, limited storage capacity, and lack of parallelizability - xLSTMs are positioned as a compelling alternative to transformers for large language modeling.
Instead of doing all computation step-by-step as tokens are processed, advanced models might need to store and manipulate information in a compressed latent space, and then "reason" over those latent representations in a non-sequential way.
The exponential gating with memory mixing introduced in the xLSTM paper directly addresses this need: exponential gates let the model sharply revise what it has already stored, and memory mixing lets information flow between memory cells through the recurrent connections, so the recurrent state can act as a workspace rather than a strict transcript of the token sequence.
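To make the mechanism concrete, here is a minimal NumPy sketch of an sLSTM-style recurrence step with exponential gating and the stabilizer state, written from my reading of the paper; the weight layout, function signature, and omission of biases are my own simplifying assumptions, not the paper's reference code.

```python
import numpy as np

def slstm_step(x, h, c, n, m, W, R):
    """One sLSTM-style step. c: cell state, n: normalizer state,
    m: stabilizer state; W/R: dicts of input/recurrent weight matrices."""
    def pre(gate):  # pre-activation for a gate
        # R provides the memory mixing: the hidden state feeds back into every gate.
        return W[gate] @ x + R[gate] @ h

    z = np.tanh(pre("z"))                  # candidate cell input
    i_tilde, f_tilde = pre("i"), pre("f")  # log-space gate pre-activations
    o = 1.0 / (1.0 + np.exp(-pre("o")))    # ordinary sigmoid output gate

    # Stabilizer state keeps the exponentials from overflowing.
    m_new = np.maximum(f_tilde + m, i_tilde)
    i_gate = np.exp(i_tilde - m_new)       # exponential input gate
    f_gate = np.exp(f_tilde + m - m_new)   # exponential forget gate

    c_new = f_gate * c + i_gate * z        # revisable cell state
    n_new = f_gate * n + i_gate            # normalizer tracks total gate mass
    h_new = o * (c_new / n_new)            # normalized hidden state
    return h_new, c_new, n_new, m_new
```

The point of exponential (rather than sigmoid) gates is that a large input-gate pre-activation can effectively overwrite old memory in a single step, which is what "revising storage" means here.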
In this way, the xLSTM takes a significant step towards the kind of "reasoning outside token-space" that I suggested would be important for highly capable models. The memory acts as a workspace for flexible computation that isn't strictly tied to the input token sequence.
Now, this doesn't mean the xLSTM is doing all the kinds of reasoning we might eventually want from an advanced AI system. But it demonstrates a powerful architecture for models to store and manipulate information in a latent space, at a more abstract level than individual tokens. As we scale up this approach, we can expect models to perform more and more "reasoning" in this compressed space rather than via explicit token-level computation.
joseph-bloom on Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Neuronpedia has an API (copying from a message Johnny recently wrote to someone else):
"Docs are coming soon but it's really simple to get JSON output of any feature. just add "/api/feature/" right after "neuronpedia.org".for example, for this feature: https://neuronpedia.org/gpt2-small/0-res-jb/0
the JSON output of it is here: https://www.neuronpedia.org/api/feature/gpt2-small/0-res-jb/0
(both are GET requests so you can do it in your browser)note the additional "/api/feature/"i would prefer you not do this 100,000 times in a loop though - if you'd like a data dump we'd rather give it to you directly."
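For anyone scripting against this, here is a minimal sketch of the URL pattern described above, using the example feature from the quoted message (the use of `requests` is my assumption; any HTTP client works):

```python
import requests

# Fetch one feature as JSON by inserting "/api/feature/" after the domain,
# per the quoted message. Identifiers below are the message's own example.
FEATURE_URL = "https://www.neuronpedia.org/api/feature/gpt2-small/0-res-jb/0"

response = requests.get(FEATURE_URL, timeout=30)
response.raise_for_status()
feature = response.json()
print(sorted(feature.keys()))  # inspect which fields the API returns
```

As the message says, don't hammer this endpoint in a large loop; ask for a data dump instead.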
Feel free to join the OSMI slack and post in the Neuronpedia or Sparse Autoencoder channels if you have similar questions in the future :) https://join.slack.com/t/opensourcemechanistic/shared_invite/zt-1qosyh8g3-9bF3gamhLNJiqCL_QqLFrA
yori-92 on The Social Impact of Trolley Problems
Although I somewhat agree with the comment about style, I feel the point you're making deserves more enthusiasm. How well-recognized is this trolley-problem fallacy? The way I see it, the energy spent on thinking about the trolley problem in isolation illustrates innate human short-sightedness, and perhaps a clear limit of human intelligence as well. 'Correctly' solving one trolley problem does not prevent you or someone else from being confronted with the next. My line of argument is that ethical decision-making requires an agent to also have a proper 'theory of mind': if I make this decision, what decision will the next person or agent have to deal with? If my car with four passengers chooses to avoid running over five people and hits just one, could it also put an oncoming car in the position of choosing between a collision with 8 people and evading and killing 5? And of course: whose decisions resulted in the trolley problem I'm currently facing, and what is their responsibility? I recently contributed a piece that is essentially about the propagating consequences of decisions, and I'm curious how it will be received. Could this be a bit of a blind spot in ethics and/or AI safety? Given the situations we've gotten ourselves into as a society, I feel this is also an area in which humans can very easily be outsmarted...
johannes-c-mayer on Atoms to Agents Proto-Lectures
I made a slightly improved version that adds subtitles and skips silence.
johannes-c-mayer on Applied Linear Algebra Lecture Series
Made a slightly improved version.