LessWrong 2.0 Reader
jenniferrm on Scientific Notation Options
Quoting me, last time you said this:
"The label 'RSP' isn't perfect but it's kinda established now. My friends all call things like this 'RSPs.' . . . I predict change in terminology will happen ~iff it's attempted by METR or multiple frontier labs together. For now, I claim we should debate terminology occasionally but follow standard usage when trying to actually communicate."
I see it. If you always start with a digit and always follow it with a decimal point, then the remaining digits imply the measurement precision, and the mantissa lets you ensure a dot after the first digit <3
The most amusing exceptional case I could think of: "0.1e1" :-D
This would be like "I was trying to count penguins by eyeball in the distance against the glare of snow and maybe it was a big one, or two huddled together, or maybe it was just a weirdly shaped rock... it could have been a count of 0 or 1 or 2."
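To make that concrete, here's a minimal sketch (my illustration, not anything from the thread) of reading a scientific-notation string as a precision claim, assuming the convention above that the last written digit is uncertain by about one unit in its place; the implied_interval helper is hypothetical:

```python
# Hypothetical helper: treat the number of written mantissa digits as a
# precision claim, with the last digit uncertain by +/- 1 unit in its place.
def implied_interval(s: str) -> tuple[float, float]:
    """Return the range of values a scientific-notation string is
    consistent with under the last-digit-uncertain convention."""
    mantissa_str, _, exp_str = s.lower().partition("e")
    exponent = int(exp_str) if exp_str else 0
    # Size of one unit in the last written digit of the mantissa.
    if "." in mantissa_str:
        last_place = 10.0 ** -len(mantissa_str.split(".")[1])
    else:
        last_place = 1.0
    value = float(mantissa_str) * 10.0 ** exponent
    half_width = last_place * 10.0 ** exponent
    return (value - half_width, value + half_width)

print(implied_interval("1.0e0"))  # (0.9, 1.1): pretty sure it was one penguin
print(implied_interval("0.1e1"))  # (0.0, 2.0): could be 0, 1, or 2
```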
wei-dai on OpenAI: Exodus
I'd like to hear from people who thought that AI companies would act increasingly reasonably (from an x-safety perspective) as AGI got closer. Is there still a viable defense of that position (e.g., that SamA being in his position / doing what he's doing is just uniquely bad luck, not reflecting what is likely to be happening / will happen at other AI labs)?
Also, why is there so little discussion of x-safety culture at other AI labs? I asked on Twitter and did not get a single relevant response. Are other AI company employees also reluctant to speak out? If so, that seems bad (every explanation I can think of seems bad, including default incentives + companies not proactively encouraging transparency).
sdm on New voluntary commitments (AI Seoul Summit)
If you assume good faith, then the partial, gappy RSPs we've seen are still a major step towards a functional internal policy against developing dangerous AI systems, because you'll assume the gaps will be filled in due course. However, if we don't assume a good-faith commitment to implement a functional version of what's suggested in a preliminary RSP without some kind of external pressure, then they might not be worth much more than the paper they're printed on.
But even if the RSPs aren't drafted in good faith and the companies don't have a strong safety culture (which seems to be true of OpenAI, judging by what Jan Leike said), the RSP commitments can still be a foundation for actually effective policies down the line.
For comparison, if a lot of dodgy water companies sign on to a 'voluntary compact' to develop some sort of plan to assess the risk of sewage spills, then the risk is probably reduced by a bit, but it also becomes easier to develop better requirements later, for example by saying "Our new requirement is the same as last year's, but now you must publish your risk assessment results openly" and daring them to back out. You can encourage them to compete on PR by making their commitments more comprehensive than their competitors', creating a virtuous cycle, and it probably draws more attention to the plans than there was before.
mesaoptimizer on Overconfidence
I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem [LW · GW] for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).
I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligned superintelligences, but also extinction-level asteroids.
It's a pretty complicated picture and I don't really have clean models of these things, but I do think that for most contexts I interact in, the long-term upside of having better models of reality is significantly higher than the benefit of systematic self-delusion.
tylerjohnston on New voluntary commitments (AI Seoul Summit)
My only concern with "voluntary safety commitments" is that the term seems to encompass too much, when the RSPs in question here are a pretty specific framework with unique strengths I wouldn't want overlooked.
I've been using "iterated scaling policy," but I don't think that's perfect. Maybe "evaluation-based scaling policy"? or "tiered scaling policy"? Maybe even "risk-informed scaling policy"?
michael-chen on Is deleting capabilities still a relevant research question?
I think unlearning model capabilities is definitely not a solved problem! See "Eight Methods to Evaluate Robust Unlearning in LLMs" and "Rethinking Machine Unlearning for Large Language Models", and the limitations sections of more recent papers like the WMDP Benchmark and SOPHON.
mesaoptimizer on Overconfidence
"According to Eliezer Yudkowsky [? · GW], your thoughts should reflect reality."
I expect that the more your beliefs track reality, the better you'll get at decision making, yes.
"According to Paul Graham, the most successful people are slightly overconfident."
Ah, but VCs benefit from the ergodicity of the startup founders! From the individual founder's perspective, it's a non-ergodic situation. It's better to make Kelly bets instead if you prefer not to fall into gambler's ruin, given whatever real-world situation maps onto the abstract concept of being 'ruined' here.
It usually pays to have your own causal model of reality rather than relying on what person X says to inform your actions.
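To make the ergodicity point concrete, here is a rough simulation with made-up numbers (a 60%-win, even-odds bet; the median_final_wealth helper is hypothetical, not anything from the thread): the ensemble average a VC sees keeps growing even for all-in betting, but the median individual trajectory, which is the founder's situation, goes to zero, while the Kelly fraction compounds reliably.

```python
# Illustrative simulation: a positive-EV bet (win 60% at even odds) that
# still ruins almost every individual who stakes everything each round.
import random

def median_final_wealth(fraction: float, p_win: float = 0.6, odds: float = 1.0,
                        rounds: int = 1000, trials: int = 2001) -> float:
    """Median wealth after repeatedly betting `fraction` of current wealth."""
    finals = []
    for _ in range(trials):
        wealth = 1.0
        for _ in range(rounds):
            stake = wealth * fraction
            # A win pays `odds` per unit staked; a loss forfeits the stake.
            wealth += stake * odds if random.random() < p_win else -stake
        finals.append(wealth)
    finals.sort()
    return finals[len(finals) // 2]

kelly_fraction = 0.6 - 0.4 / 1.0  # Kelly: f* = p - q/b = 0.2 for this bet
print("Kelly bettor (20% stakes):", median_final_wealth(kelly_fraction))
print("All-in bettor:", median_final_wealth(1.0))  # ~0.0, i.e. gambler's ruin
```

The all-in bettor's expected wealth across the whole ensemble is still 1.2^1000, which is roughly the VC's view of a portfolio of founders; each founder only gets one trajectory.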
"Can you think of anyone who has changed history who wasn't a little overconfident? It is advantageous to be friends with the kind of people who do things and never give up."
I think I do things and never give up in general, while I can be pessimistic about specific things and tasks I could do. You can be generally extremely confident in yourself and your ability to influence reality while also being specifically pessimistic about a wide range of particular things you could be doing.
Here's a Nate post that gives his perspective on this orientation to reality, the kind that leads to a generalized confidence with social benefits. [LW · GW]
gyrodiot on What are some infohazards?
Welcome! One gateway for you might be the LW Concepts page about it [? · GW]!
Most of the posts discuss, of course, infohazard policy and the properties of information that would be harmful to know or think about. Directly sharing blatantly harmful information would be irresponsible.
jacques-thibodeau on jacquesthibs's Shortform
I would find it valuable if someone could gather an easy-to-read bullet point list of all the questionable things Sam Altman has done throughout the years.
I usually link to Gwern's comment thread (https://www.lesswrong.com/posts/KXHMCH7wCxrvKsJyn/openai-facts-from-a-weekend?commentId=toNjz7gy4rrCFd99A [LW(p) · GW(p)]), but I would prefer something more easily consumable.