LessWrong 2.0 Reader
Unclear why this is supposed to be a scary result.
"If prompting a model to do something bad generalizes to it being bad in other domains, this is also evidence for the idea that prompting a model to do something good will generalize to it doing good in other domains" - Matthew Barnett
buck on Express interest in an "FHI of the West"

I'll also note that if you want to show up anywhere in the world and get good takes from people on the "how aliens might build AGI" question, Constellation might currently be the best bet (especially if you're interested in decision-relevant questions about this).
eggsyntax on Transformers Represent Belief State Geometry in their Residual Stream

I struggled with the notation on the figures; this comment tries to clarify a few points for anyone else who may be confused by it.
@Adam Shai [LW · GW] please correct me if I got any of that wrong!
Here are the details if you want them after you've understood the rest. Each node label represents some path that could be taken to that node (& not to other nodes), but there can be multiple such paths. For example, n_11 could also be labeled as n_010, because those are both sequences that could have left us in that state. So as we take some path through the Mixed-State Presentation, we build up a label. If we start at n_s and follow the 1 path, we get to n_1. If we follow the 0 path, we reach n_10. If we then follow the 0 path, the next node could be called n_100, reflecting the path we've taken. But in fact any path that goes through 00 will reach that node, so it's just labeled n_00. So initially it seems as though we can just append the symbol emitted by whichever path we take, but often there's some step where that breaks down and you get what initially seems like a totally random different label.
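To make the labeling rule concrete: node labels are really names for equivalence classes of paths, and a natural convention is to label each distinct belief state by the first (shortest) path that reaches it. Here's a minimal sketch of that idea. The `TOY` machine below is a hypothetical three-state example of my own, not the actual process from the post:

```python
from collections import deque

def build_msp(update, start, alphabet="01"):
    """Breadth-first walk over belief states.

    `update(state, symbol)` returns the next belief state (hashable).
    Each distinct reachable state gets labeled by the FIRST (shortest)
    path that reaches it; later paths landing on the same state reuse
    that label -- which is why a path like 010 can end at the node
    already labeled n_11.
    """
    labels = {start: "s"}              # belief state -> node label
    queue = deque([(start, "")])
    while queue:
        state, path = queue.popleft()
        for sym in alphabet:
            nxt = update(state, sym)
            if nxt not in labels:
                labels[nxt] = path + sym
                queue.append((nxt, path + sym))
    return labels

# Hypothetical 3-state toy machine (NOT the process from the post):
TOY = {"A": {"0": "B", "1": "A"},
       "B": {"0": "C", "1": "A"},
       "C": {"0": "C", "1": "C"}}
labels = build_msp(lambda s, c: TOY[s][c], "A")
# State C is first reached via "00", so it is labeled n_00 -- but the
# longer path "100" also lands there, mirroring the relabeling above.
```

Any path that passes through two consecutive 0s ends up at the node labeled by the shortest such path, which is exactly the "append the symbol until it breaks down" behavior described in the comment.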
(I work out of Constellation and am closely connected to the org in a bunch of ways)
I think you're right that most people at Constellation aren't going to seriously and carefully engage with the aliens-building-AGI question, but I think describing it as a difference in culture is missing the biggest factor leading to the difference: most of the people who work at Constellation are employed to do something other than the classic FHI activity of "self-directed research on any topic", so obviously aren't as inclined to engage deeply with it.
I think there also is a cultural difference, but my guess is that it's smaller than the effect from difference in typical jobs.
wassname on Evolution did a surprising good job at aligning humans...to social status

We establish institutions to channel and utilize status-seeking behavior by putting us in status-conscious groups where we have ceremonies and titles that draw our attention to status. This works! Is it more effective to educate a child individually or in a group of peers? Is it easier to lead a solitary soldier or a whole squad? Do people seek a promotion or a pay rise?
From this perspective, our culture and inclination for seeking status have developed in tandem, making it challenging to determine which influences the other more. However, it appears that culture progresses more rapidly than genes, suggesting that culture conforms to our genes, rather than the reverse.
We also waste a lot of effort on status, which seems like a nonfunctional drive. People will compete for high-status professions like musician, streamer, or celebrity, and most will fail, which makes it seem like an unwise investment of time. This seems misaligned, as it's not adaptive.
adam-shai on Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Thanks John and David for this post! It has really helped people understand the full story. I'm especially interested in thinking more about plans for how this type of work can be helpful for AI safety. I do think the one you presented here is a great one, but I hope there are other potential pathways. I have some ideas, which I'll present in a post soon, but my views on this are still evolving.
adam-shai on Transformers Represent Belief State Geometry in their Residual Stream

Thanks! I'll have more thorough results to share about layer-wise representations of the MSP soon. I've already run some of the analysis concatenating over all layers' residual streams with the RRXOR process, and it is quite interesting. It seems there's a lot more to explore in the relationship between the number of states in the generative model, the number of layers in the transformer, the residual stream dimension, and the token vocab size. All of these (I think) play some role in how the MSP is represented in the transformer. For RRXOR, things look crisper when concatenating.
Even for cases where redundant info is discarded, we should be able to see the distinctions somewhere in the transformer. One thing I'm keen on really exploring is such a case, where we can very concretely follow the path/circuit through which redundant info is first distinguished and then is collapsed.
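The concatenation setup described above can be sketched in a few lines of numpy. This is a hedged illustration only: the arrays here are random stand-ins for real residual-stream activations and belief-state coordinates, and the dimensions are arbitrary; the probe is an ordinary least-squares linear map, which is one common way to read a geometric structure out of activations:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_layers, d_model, n_states = 500, 4, 16, 3

# Stand-ins for real data (hypothetical): per-layer residual activations
# and the ground-truth belief-state coordinates for each token position.
resid = rng.normal(size=(n_layers, n_tokens, d_model))
beliefs = rng.dirichlet(np.ones(n_states), size=n_tokens)

# Concatenate across layers: one long feature vector per token,
# so structure distributed over layers can be probed jointly.
X = np.concatenate([resid[l] for l in range(n_layers)], axis=1)
# X has shape (n_tokens, n_layers * d_model)

# Linear probe: least-squares map from concatenated residuals to beliefs.
W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
pred = X @ W  # projected belief-state coordinates, shape (n_tokens, n_states)
```

With real activations in place of `resid`, plotting `pred` in the belief simplex is how one would check whether the concatenated representation looks "crisper" than any single layer's.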
dagon on Experiment on repeating choices

For me, these topics seem extremely contextual and variable with the situation and specifics of the tradeoff in the moment. For many of them, I do somewhat frequently explore consciously what it might feel like (and for cheap ones, try out) to make a different tradeoff, but those experiments don't generalize well.
I suspect that for the impactful ones (heavily repeated or large), your first two bullet points don't apply - feedback is delayed from the decision, and if harmful, it will be significant.
Still, it's VERY GOOD to be reminded that these decisions are mostly made by type-1 thinking, out of habit or instinct (aka deep/early learning) that deserves reconsideration from time to time.
lost-futures on AI #60: Oh the Humanity

The Devin mishap is a reminder of how tricky it often is for the general public to gauge what's currently possible and what isn't for AI. A lot of people, including myself, assumed the claimed performance was legitimate. No doubt many AI startups like the one behind Devin are waiting for the rising tide of improving foundation models to make their ideas feasible. I wonder how many are engaging in similar deceptive marketing tactics or will do so in the future.
romeostevensit on Raemon's Shortform

CRPGs with a lot of open-world dynamics might work, where the goal is for the person to identify the most important experiments to run in a limited time window in order to min-max certain stats.