LessWrong 2.0 Reader

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes
Andrea_Miotti (AndreaM) · 2023-02-24T23:03:04.917Z · comments (7)
Retrospective on the 2022 Conjecture AI Discussions
Andrea_Miotti (AndreaM) · 2023-02-24T22:41:13.131Z · comments (5)
How popular is ChatGPT? Part 1: more popular than Taylor Swift
Harlan · 2023-02-24T22:30:04.340Z · comments (0)
Are you stably aligned?
Seth Herd · 2023-02-24T22:08:23.098Z · comments (0)
Puzzle Cycles
Screwtape · 2023-02-24T21:35:09.052Z · comments (2)
[link] Sam Altman: "Planning for AGI and beyond"
LawrenceC (LawChan) · 2023-02-24T20:28:00.430Z · comments (54)
A Proposed Test to Determine the Extent to Which Large Language Models Understand the Real World
Bruce G · 2023-02-24T20:20:22.582Z · comments (7)
[link] Meta "open sources" LMs competitive with Chinchilla, PaLM, and code-davinci-002 (Paper)
LawrenceC (LawChan) · 2023-02-24T19:57:24.402Z · comments (19)
[link] Relationship Orientations
DaystarEld · 2023-02-24T19:43:41.463Z · comments (1)
The alien simulation meme doesn't make sense
[deleted] · 2023-02-24T19:27:11.916Z · comments (1)
[link] Exit Duty Generator by Matti Häyry
Oldphan · 2023-02-24T18:35:58.502Z · comments (0)
2023 Stanford Existential Risks Conference
elizabethcooper · 2023-02-24T18:35:39.663Z · comments (0)
How major governments can help with the most important century
HoldenKarnofsky · 2023-02-24T18:20:08.530Z · comments (0)
Consent Isn't Always Enough
jefftk (jkaufman) · 2023-02-24T15:40:05.048Z · comments (16)
[question] Training for corrigibility: obvious problems?
Ben Amitay (unicode-70) · 2023-02-24T14:02:38.420Z · answers+comments (6)
Death and Desperation
Ustice · 2023-02-24T12:43:36.259Z · comments (3)
[question] Are there rationality techniques similar to staring at the wall for 4 hours?
trevor (TrevorWiesinger) · 2023-02-24T11:48:45.944Z · answers+comments (8)
The fast takeoff motte/bailey
lc · 2023-02-24T07:11:10.392Z · comments (7)
AGI systems & humans will both need to solve the alignment problem
Jeffrey Ladish (jeff-ladish) · 2023-02-24T03:29:21.043Z · comments (14)
A poor but certain attempt to philosophically undermine the orthogonality of intelligence and aims
Jay95 · 2023-02-24T03:03:57.927Z · comments (1)
I wanna Gandalf here
Igor Timofeev (igor-timofeev-1) · 2023-02-24T01:22:06.964Z · comments (4)
[link] A community alert about Ziz
DanielFilan · 2023-02-24T00:06:00.027Z · comments (126)
Teleosemantics!
abramdemski · 2023-02-23T23:26:15.894Z · comments (26)
AI that shouldn't work, yet kind of does
Donald Hobson (donald-hobson) · 2023-02-23T23:18:55.194Z · comments (8)
The AGI Optimist’s Dilemma
kaputmi · 2023-02-23T20:20:22.507Z · comments (1)
Searching for a model's concepts by their shape – a theoretical framework
Kaarel (kh) · 2023-02-23T20:14:46.341Z · comments (0)
[link] Why I'm Skeptical of De-Extinction
Niko_McCarty (niko-2) · 2023-02-23T19:42:52.618Z · comments (1)
[question] What causes randomness?
lotsofquestions · 2023-02-23T18:50:31.315Z · answers+comments (12)
Somerville Roads Getting More Dangerous?
jefftk (jkaufman) · 2023-02-23T18:20:03.354Z · comments (1)
EIS XII: Summary
scasper · 2023-02-23T17:45:55.973Z · comments (0)
How to survive in an AGI cataclysm
RomanS · 2023-02-23T14:34:53.998Z · comments (3)
Covid 2/23/23: Your Best Possible Situation
Zvi · 2023-02-23T13:10:01.887Z · comments (9)
Full Transcript: Eliezer Yudkowsky on the Bankless podcast
remember · 2023-02-23T12:34:19.523Z · comments (89)
Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results
Esben Kran (esben-kran) · 2023-02-23T10:48:08.766Z · comments (0)
[question] How to estimate a pre-aligned value for a common discussion ground?
EL_File4138 · 2023-02-23T10:38:18.489Z · answers+comments (12)
Interpersonal alignment intuitions
TekhneMakre · 2023-02-23T09:37:22.603Z · comments (18)
[link] Hello, Elua.
Tamsin Leake (carado-1) · 2023-02-23T05:19:07.246Z · comments (18)
Big Mac Subsidy?
jefftk (jkaufman) · 2023-02-23T04:00:03.996Z · comments (24)
[question] What moral systems (e.g. utilitarianism) are common among LessWrong users?
hollowing · 2023-02-23T03:33:05.811Z · answers+comments (9)
AGI is likely to be cautious
PonPonPon · 2023-02-23T01:16:02.296Z · comments (14)
Short Notes on Research Process
Shoshannah Tekofsky (DarkSym) · 2023-02-22T23:41:45.279Z · comments (0)
[link] Video/animation: Neel Nanda explains what mechanistic interpretability is
DanielFilan · 2023-02-22T22:42:45.054Z · comments (7)
A Telepathic Exam about AI and Consequentialism
alkexr · 2023-02-22T21:00:21.994Z · comments (4)
[question] Injecting noise into GPT to get multiple answers
bipolo · 2023-02-22T20:02:13.644Z · answers+comments (1)
EIS XI: Moving Forward
scasper · 2023-02-22T19:05:52.723Z · comments (2)
Building and Entertaining Couples
Jacob Falkovich (Jacobian) · 2023-02-22T19:02:24.928Z · comments (11)
[link] Can submarines swim?
jasoncrawford · 2023-02-22T18:48:18.530Z · comments (14)
Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities?
Christopher King (christopher-king) · 2023-02-22T16:49:01.190Z · comments (7)
The male AI alignment solution
TekhneMakre · 2023-02-22T16:34:12.414Z · comments (24)
[link] Progress links and tweets, 2023-02-22
jasoncrawford · 2023-02-22T16:23:56.159Z · comments (0)