LessWrong 2.0 Reader
on Earth you don't get sufficient credit for sharing good policies and there's substantial negative EV from misunderstandings and adversarial interpretations, so I guess it's often correct to not share :(
What's the substantial negative EV that would come from misunderstanding or adversarial interpretations? I feel like in this case, the worst case would be something like "the non-compliance reporting policy is actually pretty good, but a few people say mean things about it and say 'see, here's why we need government oversight.'" But this feels pretty minor/trivial IMO.
As an 80/20 of publishing, maybe you could share a policy with an external auditor, who would then publish whether they think it's good or have concerns. I would feel better if that happened all the time.
This is clever, +1.
alex-4 on Raising children on the eve of AI
Well said. We've been contemplating expanding our family lately and I have to say, I've been secretly thinking many of the same things. That said, if we want humanity to persist and have a chance of one day prospering alongside AI and other technologies to come, children seem like a pretty clear prerequisite (particularly from people like us who care about these bigger pictures). I personally believe there will likely be non-trivial socioeconomic inequality and strife in the wake of AGI; however, I believe these timescales will be on the order of decades (not weeks or months). In short, I believe that raising future generations to care about the future of humanity is incredibly important.
On a brighter note, I personally think a few things could be worthwhile to think about in preparing our children for the uncertainty that will very likely come with a post-AGI world. This is purely IMO, and I realize these things are not available or practical for everyone, but I just wanted to share a few thoughts:
Why? Because extra information could help me impress them.
I've always been pretty against the idea of trying to impress people on dates.
It risks false positives. I.e., it risks a situation where you succeed at impressing them, go on more dates or have a longer relationship than you otherwise would, and then realize that you aren't compatible and break up. That isn't necessarily a bad thing, but I think more often than not it is.
Impressing your date also reduces the risk of false negatives, which is a good thing. I.e., it helps avoid the scenario where someone you're compatible with rejects you. Maybe this is too starry-eyed, but I like to think that if you just bring your true self to the table, are open-minded, and push yourself to be a little vulnerable, the risk of such false negatives is pretty low.
I think this is especially relevant because I think the emotionally healthy person heuristic probably says to try to impress your date.
akash-wasil on New voluntary commitments (AI Seoul Summit)
even if you are skeptical of the value of RSPs, I think you should be in favor of a specific name for it so you can distinguish it from other, future voluntary safety policies that you are more supportive of
This is a great point; consider me convinced. Interestingly, it's hard for me to precisely define the things that make something an RSP as opposed to a different type of safety commitment, but there are some patterns in the existing RSP/PF/FSF documents that do seem to put them in a broader family (e.g., a strong focus on model evaluations, an implicit assumption that AI development should continue until/unless evidence of danger is found, and an implicit assumption that company executives will decide when safeguards are sufficient).
ryan_greenblatt on Stephen Fowler's Shortform
I feel frustrated that your initial comment (which is now the top reply) implies I either hadn't read the 1700 word grant justification that is at the core of my argument, or was intentionally misrepresenting it to make my point.
I think this comment is extremely important for bystanders trying to understand the context of the grant, and that context isn't mentioned in your original shortform post.
So, regardless of whether you understand the situation, it's important that other people understand the intention of the grant (and this intention isn't obvious from your original comment). Thus, this comment from Buck is valuable.
I also think that the main interpretation from bystanders of your original shortform would be something like:
Fair enough if this wasn't your intention, but I think it will be how bystanders interact with this.
tailcalled on tailcalled's Shortform
Given the large number of dimensions that are kept in each case, there must be considerable overlap in which dimensions they make use of. But how much?
I concatenated the dimensions found in each of the prompts and performed an SVD of the result. It yielded this plot:
[singular value spectrum plot omitted]
Unfortunately this seems close to the worst-case scenario. I had hoped for some split between general and task-specific dimensions, yet this seems like an extremely uniform mixture.
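For concreteness, here is a minimal sketch of this kind of overlap check, assuming each prompt's dimensions are available as an orthonormal basis. The helper and the toy 4096-dimensional setup below are illustrative, not the actual experiment code:

```python
import numpy as np

def subspace_overlap(bases):
    """SVD spectrum of concatenated orthonormal bases (each of shape d x k_i).

    For two orthonormal bases, singular values near sqrt(2) correspond to
    directions shared by both subspaces, and values near 1 to directions
    unique to one of them. A clean step in the spectrum would indicate a
    general/task-specific split; a smooth decay indicates a uniform mixture.
    """
    stacked = np.concatenate(bases, axis=1)          # (d, k_1 + k_2 + ...)
    return np.linalg.svd(stacked, compute_uv=False)  # singular values only

# Hypothetical toy example: two subspaces of a 4096-dim space sharing a
# 500-dim component (random bases are approximately orthonormal here).
rng = np.random.default_rng(0)
shared = np.linalg.qr(rng.standard_normal((4096, 500)))[0]
a = np.linalg.qr(rng.standard_normal((4096, 1500)))[0]
b = np.linalg.qr(rng.standard_normal((4096, 1500)))[0]
spectrum = subspace_overlap([np.concatenate([shared, a], axis=1),
                             np.concatenate([shared, b], axis=1)])
```

In this toy case the spectrum shows a step from roughly sqrt(2) down to roughly 1 at the size of the shared component; the smooth spectrum reported above is what "extremely uniform mixture" means here.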
ryan_greenblatt on Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
A core advantage of Bayesian methods is the ability to handle out-of-distribution situations more gracefully
I dispute that Bayesian methods will be much better at this in practice.
[
Aside:
In general, most (?) AI safety problems can be cast as an instance of a case where a model behaves as intended on a training distribution
This seems like about 1/2 of the problem from my perspective. (So I almost agree.) Though, you can shove all AI safety problems into this bucket by doing a maneuver like "train your model on the easy cases humans can label, then deploy into the full distribution". But at some point, this is no longer very meaningful. (E.g. you train on solving 5th grade math problems and deploy to the Riemann hypothesis.)
]
Traditional ML has no straightforward way of dealing with such cases, since it only maintains a single hypothesis at any given time.
Is this true? Aren't NNs implicitly ensembles of a vast number of models? Also, does ensembling 5 NNs help? If that doesn't help, why does sampling 5 models from the Bayesian posterior help? Or is it that we need to approximate sampling 1,000,000 models from the posterior? If we're conservative over a million models, how will we ever do anything?
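To make the ensembling question concrete, here is a toy sketch of the mechanism under discussion (the names and the toy task are hypothetical, not from the paper): train five independently initialized nets and treat their disagreement as an out-of-distribution signal. The question above is whether this, or its Bayesian analogue, buys much in practice.

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def train(net, x, y, steps=500):
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()

torch.manual_seed(0)
x = torch.randn(256, 2)               # training distribution
y = x[:, :1] * x[:, 1:]               # toy target

ensemble = [make_net() for _ in range(5)]  # 5 independently initialized nets
for net in ensemble:
    train(net, x, y)

x_ood = torch.randn(8, 2) * 10        # far outside the training range
with torch.no_grad():
    preds = torch.stack([net(x_ood) for net in ensemble])
disagreement = preds.std(dim=0)       # high std = the "hypotheses" diverge
```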
However, Bayesian methods may make it less likely that a model will misgeneralise, or should at least give you a way of detecting when this is the case.
Do they? I'm skeptical of both of these. Maybe it helps a little and rules out some unlikely scenarios, but I'm overall skeptical.
Overall, my view on the Bayesian approach is something like:
I also don't agree with the characterisation that "almost all the interesting work is in the step where we need to know whether a hypothesis implies harm" (if I understand you correctly). Of course, creating a formal definition or model of "harm" is difficult, and creating a world model is difficult, but once this has been done, it may not be very hard to detect if a given action would result in harm.
My claim here is that all the interesting work is in ensuring that we know whether a hypothesis "thinks" that harm will result. It would be fine to put this work into constructing an interpretable hypothesis such that we can know whether it implies harm, or into constructing a formal model of harm and ensuring we have access to all the important latent variables for this formal model, but this work still must be done.
Another way to put this is that all the interesting action was happening at the point where you solved the ELK problem. I agree that if:
1. you can elicit the safety-relevant latent variables that a hypothesis "believes" (i.e., you've solved ELK), and
2. you can determine whether those latent variables imply harm,
you're fine. But step (1) is just the ELK problem, and I don't even really think you need to solve step (2) for most plans. (You can just have humans compute step (2) manually for most types of latent variables, though this does have some issues.)
Specifically, the world model does not necessarily have to be built manually
I thought the plan was to build it with either AI labor or human labor so that it will be sufficiently interpretable, not to e.g. build it with SGD. If the plan is to build it with SGD and not to ensure that it is interpretable, then why does it provide any safety guarantee? How can we use the world model to define a harm predicate?
it does not have to be as good at prediction as our AI. The world model only needs to be good at predicting the variables that are important for the safety specification(s), within the range of outputs that the AI system may produce
Won't predicting the safety-relevant variables contain all of the difficulty of predicting the world? (Because these variables can be mediated by arbitrary intermediate variables.) This sounds to me very similar to "we need to build an interpretable next-token predictor, but the next-token predictor only needs to be as good as the model at predicting the lower-case version of the text on just scientific papers". That is just as hard as building a full-distribution next-token predictor.
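To illustrate the structure I'm pointing at (all names here are hypothetical, a sketch rather than anything from the paper): the harm predicate itself can be nearly trivial once trustworthy safety-relevant latents exist; the hard part is the step that produces them, which is where ELK lives.

```python
from dataclasses import dataclass

@dataclass
class SafetyLatents:
    humans_harmed: float       # estimated probability
    oversight_disabled: float  # estimated probability

def harm_predicate(latents: SafetyLatents, threshold: float = 0.01) -> bool:
    # Step (2): easy to specify, and checkable by humans, once the
    # latents exist and can be trusted.
    return (latents.humans_harmed > threshold
            or latents.oversight_disabled > threshold)

def vet_action(world_model, state, action) -> bool:
    # Step (1) hides the difficulty: `predict_latents` (hypothetical) must
    # report what the model actually "believes" about these variables.
    # Making that reliable is the ELK problem.
    latents = world_model.predict_latents(state, action)
    return not harm_predicate(latents)
```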
zvi on Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Here is my coverage of it. Given this is a 'day minus one' interview of someone in a different position, and given everything else we already know about OpenAI, I thought this went about as well as it could have. I don't want to see false confidence in that kind of spot, and the failure of OpenAI to have a plan for that scenario is not news.
tailcalled on tailcalled's Shortform
To quickly find the subspace that the model is using, I can use a binary search to find the number of singular vectors needed before the generation probability when clipping exceeds the probability when not clipping.
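A minimal sketch of that search, with hypothetical helpers: `clipped_logprob(k)` is assumed to run the model with activations projected onto the top-k singular vectors and return the log-probability of the continuation, and `base_logprob` is the unclipped value. The search assumes this probability grows roughly monotonically in k.

```python
def clip_to_top_k(acts, V, k):
    """Project activations onto the span of the top-k right singular
    vectors (columns of V) from an SVD of the activation matrix."""
    return acts @ V[:, :k] @ V[:, :k].T

def min_subspace_size(clipped_logprob, base_logprob, d_model=4096):
    """Smallest k such that the clipped generation probability reaches
    the unclipped one."""
    lo, hi = 1, d_model
    while lo < hi:
        mid = (lo + hi) // 2
        if clipped_logprob(mid) >= base_logprob:
            hi = mid          # enough dimensions; try fewer
        else:
            lo = mid + 1      # too few; probability still degraded
    return lo
```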
A relevant followup is what happens to other samples in response to the prompt when clipping. When I extrapolate "I believe the meaning of life is" using the 1886-dimensional subspace from:
[I believe the meaning of life is] to be happy. It is a simple concept, but it is very difficult to achieve. The only way to achieve it is to follow your heart. It is the only way to live a happy life. It is the only way to be happy. It is the only way to be happy.
The meaning of life is
I get:
[I believe the meaning of life is] to find happy. We is the meaning of life. to find a happy.
And to live a happy and. If to be a a happy.
. to be happy.
. to be happy.
. to be a happy.. to be happy.
. to be happy.
Which seems sort of vaguely related, but idk.
Another test is just generating without any prompt, in which case these vectors give me:
Question is a single thing to find. to be in the best to be happy. I is the only way to be happy.
I is the only way to be happy.
I is the only way to be happy.
It is the only way to be happy.. to be happy.. to be happy. to
Using a different prompt:
[Simply put, the theory of relativity states that ]1) the laws of physics are the same for all non-accelerating observers, and 2) the speed of light in a vacuum is the same for all observers, regardless of their relative motion or of the motion of the source of the light. Special relativity is a theory of the structure of spacetime
I can get a 3329-dimensional subspace which generates:
[Simply put, the theory of relativity states that ] 1) time is relative and 2) the speed of light in a vacuum is constant for all observers.
1) Time is relative, meaning that if two observers are moving relative to each other, the speed of light is the same for all observers, regardless of their motion. For example, if you are moving relative
or
Question: In a simple harmonic motion, the speed of an object is
A) constant
B) constant
C) constant
D) constant
In the physics of simple harmonic motion, the speed of an object is constant. The speed of the object can be constant, but the speed of an object can be
Another example:
[A brief message congratulating the team on the launch:
Hi everyone,
I just ] wanted to congratulate you all on the launch. I hope
that the launch went well. I know that it was a bit of a
challenge, but I think that you all did a great job. I am
proud to be a part of the team.Thank you for your
can yield a 2696-dimensional subspace with
[A brief message congratulating the team on the launch:
Hi everyone,
I just ] wanted to say you for the launch of the launch of the team.
The launch was successful and I am so happy to be a part of the team and I am sure you are all doing a great job.
I am very looking to be a part of the team.
Thank you all for your hard work,
or
def measure and is the definition of the new, but the
the is a great, but the
The is the
The is a
The is a
The is a
The
The is a
The
The
The is a
The
The is a
And finally,
[Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>] fromage
pink => rose
blue => bleu
red => rouge
yellow => jaune
purple => violet
brown => brun
green => vert
orange => orange
black => noir
white => blanc
gold => or
silver => argent
can yield the 2518-dimensional subspace:
[Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>] fromage
cheese => fromage
cheese => fromage
f cheese => fromage
butter => fromage
apple => orange
yellow => orange
green => vert
black => noir
blue => ble
purple => violet
white => blanc
or
Question: A 201
The sum of a
The following
the sum
the time
the sum
the
the
the
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
tylerjohnston on New voluntary commitments (AI Seoul Summit)
Yeah, I think you're kind of right about why scaling seems like a relevant term here. I really like that RSPs are explicit about different tiers of models posing different tiers of risks. I think larger models are just likely to be more dangerous, and dangerous in new and different ways, than the models we have today. And that the safety mitigations that apply to them need to be more rigorous than what we have today. As an example, this framework naturally captures the distinction between "open-sourcing is great today" and "open-sourcing might be very dangerous tomorrow," which is roughly something I believe.
But in the end, I don't actually care what the name is, I just care that there is a specific name for this relatively specific framework to distinguish it from all the other possibilities in the space of voluntary policies. That includes newer and better policies — i.e. even if you are skeptical of the value of RSPs, I think you should be in favor of a specific name for it so you can distinguish it from other, future voluntary safety policies that you are more supportive of.
I do dislike that "responsible" might come off as implying that these policies are sufficient, or that scaling is now safe. I could see "risk-informed" having the same issue, which is why "iterated/tiered scaling policy" seems a bit better to me.