LessWrong 2.0 Reader

View: New · Old · Top

← previous page (newer posts) · next page (older posts) →

← previous page (newer posts) · next page (older posts) →

Recent comments

radford-neal-1 on AI #62: Too Soon to Tell

Regarding Cortez and the Aztecs, it is of interest to note that Cortez's indigenous allies (enemies of the Aztecs) actually ended up in a fairly good position afterwards.

From https://en.wikipedia.org/wiki/Tlaxcala

For the most part, the Spanish kept their promise to the Tlaxcalans. Unlike Tenochtitlan and other cities, Tlaxcala was not destroyed after the Conquest. They also allowed many Tlaxcalans to retain their indigenous names. The Tlaxcalans were mostly able to keep their traditional form of government.

raemon on Buck's Shortform

It implies that AI control is organizationally simpler, because most applications can be made trivially controlled.

I didn't get this from the premises fwiw. Are you saying it's trivial because "just don't use your AI to help you design AI" (seems organizationally hard to me), or did you have particular tricks in mind?

romeostevensit on How would you navigate a severe financial emergency with no help or resources?

Finding a loan to move to somewhere with jobs is probably your best bet. This may devolve to begging amongst any social circles as well, which is a big pride hit. Many probably won't believe you will wind up with the means to pay it back. Minimizing the cost of the move by getting rid of non essential belongings. It is probably somewhat easier these days to line up a far away job via zoom interviewing. Quantity over quality.

One thing that I think is non-obvious: if you lay out the case for the loan in detail, that demonstrates intelligence and conscientiousness and will increase people's sense that you are doing something useful and thus willingness to lend. Basically, treat it as a business case, but the business happens to just be getting you and your partner employed. Show past cash flow, expected future cash flow given salaries of the sorts of places you are applying to, estimates of how many jobs you can reasonably apply to, how much of a monthly payment you could pay back, etc. Use chatgpt for help with outlining this.

romeostevensit on Which skincare products are evidence-based?

'Evidence based' in the skincare industry mostly means a company paid a consultancy to do 'scientific testing.' Very little shows any signs of actual large effect sizes other than blocking UV damage, afaik. Moisturizer does seem to help, especially in low humidity climes. Which moisturizer your skin responds best to seems to be trial and error, ie the detectable/subjective metrics are all we really have. Retinoids are one exception, but come with tradeoffs that I don't fully understand.

I settled on snail mucin as a more natural alternative, but a significant fraction of people have mild allergic reactions to it apparently.

buck on Buck's Shortform

[epistemic status: I think I’m mostly right about the main thrust here, but probably some of the specific arguments below are wrong. In the following, I'm much more stating conclusions than providing full arguments. This claim isn’t particularly original to me.]

I’m interested in the following subset of risk from AI:

Early: That comes from AIs that are just powerful enough to be extremely useful and dangerous-by-default (i.e. these AIs aren’t wildly superhuman).
Scheming: Risk associated with loss of control to AIs that arises from AIs scheming
- So e.g. I exclude state actors stealing weights in ways that aren’t enabled by the AIs scheming, and I also exclude non-scheming failure modes. IMO, state actors stealing weights is a serious threat, but non-scheming failure modes aren’t (at this level of capability and dignity).
Medium dignity [EA · GW]: that is, developers of these AIs are putting a reasonable amount of effort into preventing catastrophic outcomes from their AIs (perhaps they’re spending the equivalent of 10% of their budget on cost-effective measures to prevent catastrophes).
Nearcasted: no substantial fundamental progress on AI safety techniques, no substantial changes in how AI works.

This subset of risk is interesting because I think it’s a natural scenario at which to target technical work on AI safety. (E.g. it’s the main scenario we’re targeting with our AI control agenda [LW · GW].)

I claim that the majority of this risk comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

It’s very expensive to refrain from using AIs for this application.
There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

If I’m right that the risk from scheming early-transformative models is concentrated onto this pretty specific scenario, it implies a bunch of things:

It implies that work on mitigating these risks should focus on this very specific setting.
It implies that AI control is organizationally simpler, because most applications can be made trivially controlled.
It is pretty plausible to me that AI control is quite easy, because you actually can remove affordances from the AIs that are doing AI R&D such that it’s hard for them to cause problems.

adam-shai on Transformers Represent Belief State Geometry in their Residual Stream

This is a great question, and one of the things I'm most excited about using this framework to study in the future! I have a few ideas but nothing to report yet.

But I will say that I think we should be able to formalize exactly what it would mean for a transformer to create/discover new knowledge, and also to apply the structure from one dataset and apply it to another, or to mix two abstract structures together, etc. I want to have an entire theory of cognitive abilities and the geometric internal structures that support them.

adam-shai on Transformers Represent Belief State Geometry in their Residual Stream

If I'm understanding your question correctly, then the answer is yes, though in practice it might be difficult (I'm actually unsure how computationally intensive it would be, haven't tried anything along these lines yet). This is definitely something to look into in the future!

adam-shai on Transformers Represent Belief State Geometry in their Residual Stream

It's surprising for a few reasons:

The structure of the points in the simplex is NOT
- The next token prediction probabilities (ie. the thing we explicitly train the transformer to do)
- The structure of the data generating model (ie. the thing the good regulator theorem talks about, if I understand the good regulator theorem, which I might not)

The first would be not surprising because it's literally what our loss function asks for, and the second might not be that surprising since this is the intuitive thing people often think about when we say "model of the world." But the MSP structure is neither of those things. It's the structure of inference over the model of the world, which is quite a different beast than the model of the world.

Others might not find it as surprising as I did - everyone is working off their own intuitions.

edit: also I agree with what Kave said about the linear representation.

benito on LessOnline Festival Updates Thread

Launched a few days ago, the plan is:

Kids tickets are $50
There's daycare purchasable on-site from 10am to 7pm, for like $10/hour if you book ahead of time or $30/hour if you use it on-the-day
If you want connection to a nanny for outside of those hours we have a service that can help with that at $45/hour.

Happy to get feedback on this, still figuring out what exactly helps parents and how to set it up right.

habryka4 on Please stop publishing ideas/insights/research about AI

Sorry, what? I thought the fear was that we don't know how to make helpful AI at all. (And that people who think they're being helped by seductively helpful-sounding LLM assistants are being misled by surface appearances; the shoggoth underneath has its own desires that we won't like when it's powerful enough to persue them autonomously.) In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

My steelman of this (though to be clear I think your comment makes good points):

There is a large difference between a system being helpful and a system being aligned. Ultimately AI existential risk is a coordination problem where I expect catastrophic consequences because a bunch of people want to build AGI without making it safe. Therefore making technologies that in a naive and short-term sense just help AGI developers build whatever they want to build will have bad consequences. If I trusted everyone to use their intelligence just for good things, we wouldn't have anthropogenic existential risk on our hands.

Some of those technologies might end up useful for also getting the AI to be more properly aligned, or maybe to help with work that reduces the risk of AI catastrophe some other way, though my current sense is that kind of work is pretty different and doesn't benefit remotely as much from generically locally-helpful AI.

In-general I feel pretty sad about conflating "alignment" with "short-term intent alignment". I think the two problems are related but have really important crucial differences, I don't think the latter generalizes that well to the former (for all the usual sycophancy/treacherous-turn reasons), and indeed progress on the latter IMO mostly makes the world marginally worse because the thing it is most likely to be used for is developing existentially dangerous AI systems faster.

Edit: Another really important dimension to model here is also not just the effect of that kind of research on what individual researchers will do, but what effect this kind of research will have on what the market wants to invest in. My standard story of doom is centrally rooted in there being very strong short-term individual economic incentives to build more capable AGI, enabling people to make billions to trillions of dollars, while the downside risk is a distributed negative externality that is not at all priced into the costs of AI development. Developing applications of AI that make a lot of money without accounting for the negative extinction externalities therefore can be really quite bad for the world.

LessWrong 2.0 Reader

Archive

Recent comments