LessWrong 2.0 Reader


next page (older posts) →

Seeking submissions for short AI-safety course proposals
Sergio (sergio-abriola) · 2022-12-01T00:32:40.816Z · comments (0)
Reestablishing Reliable Sources: A System for Tagging URLs
Riley Mueller (rileymueller) · 2022-12-01T02:27:18.629Z · comments (1)
Notes on Caution
David Gross (David_Gross) · 2022-12-01T03:05:21.490Z · comments (0)
SBF's comments on ethics are no surprise to virtue ethicists
c.trout (ctrout) · 2022-12-01T04:18:25.877Z · comments (30)
[link] Did ChatGPT just gaslight me?
ThomasW (ThomasWoodside) · 2022-12-01T05:41:46.560Z · comments (45)
Safe Development of Hacker-AI Countermeasures – What if we are too late?
Erland Wittkotter (Erland) · 2022-12-01T07:59:11.862Z · comments (0)
Theories of impact for Science of Deep Learning
Marius Hobbhahn (marius-hobbhahn) · 2022-12-01T14:39:46.062Z · comments (0)
Research request (alignment strategy): Deep dive on "making AI solve alignment for us"
JanB (JanBrauner) · 2022-12-01T14:55:23.569Z · comments (3)
[link] ChatGPT discussion
JanB (JanBrauner) · 2022-12-01T15:04:45.257Z · comments (8)
[link] ChatGPT: First Impressions
specbug (rishit-vora) · 2022-12-01T16:36:19.592Z · comments (2)
Covid 12/1/22: China Protests
Zvi · 2022-12-01T17:10:00.839Z · comments (2)
The Machine Stops (Chapter 9)
Justin Bullock (justin-bullock) · 2022-12-01T19:20:26.031Z · comments (0)
Finding gliders in the game of life
paulfchristiano · 2022-12-01T20:40:04.230Z · comments (7)
The Plan - 2022 Update
johnswentworth · 2022-12-01T20:43:50.516Z · comments (37)
The LessWrong 2021 Review: Intellectual Circle Expansion
Ruby · 2022-12-01T21:17:50.321Z · comments (55)
Re-Examining LayerNorm
Eric Winsor (EricWinsor) · 2022-12-01T22:20:23.542Z · comments (12)
Take 1: We're not going to reverse-engineer the AI.
Charlie Steiner · 2022-12-01T22:41:32.677Z · comments (4)
Playing with Aerial Photos
jefftk (jkaufman) · 2022-12-01T22:50:04.609Z · comments (0)
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger (RobbBB) · 2022-12-01T23:11:44.279Z · comments (33)
[link] Understanding goals in complex systems
Johannes C. Mayer (johannes-c-mayer) · 2022-12-01T23:49:49.321Z · comments (0)
Lumenators for very lazy British people
shakeelh · 2022-12-02T00:18:36.876Z · comments (3)
Against meta-ethical hedonism
Joe Carlsmith (joekc) · 2022-12-02T00:23:26.039Z · comments (4)
Quick look: cognitive damage from well-administered anesthesia
Elizabeth (pktechgirl) · 2022-12-02T00:40:01.344Z · comments (0)
Update on Harvard AI Safety Team and MIT AI Alignment
Xander Davies (xanderdavies) · 2022-12-02T00:56:45.596Z · comments (4)
[link] Mastering Stratego (Deepmind)
[deleted] · 2022-12-02T02:21:56.672Z · comments (0)
New Feature: Collaborative editing now supports logged-out users
RobertM (T3t) · 2022-12-02T02:41:52.297Z · comments (0)
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout · 2022-12-02T02:43:20.915Z · comments (22)
Deconfusing Direct vs Amortised Optimization
beren · 2022-12-02T11:30:46.754Z · comments (17)
Jailbreaking ChatGPT on Release Day
Zvi · 2022-12-02T13:10:00.860Z · comments (77)
[question] Is ChatGPT right when advising to brush the tongue when brushing teeth?
ChristianKl · 2022-12-02T14:53:02.123Z · answers+comments (14)
[link] NeurIPS Safety & ChatGPT. MLAISU W48
Esben Kran (esben-kran) · 2022-12-02T15:50:16.938Z · comments (0)
[ASoT] Finetuning, RL, and GPT's world prior
Jozdien · 2022-12-02T16:33:41.018Z · comments (8)
Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis
NickGabs · 2022-12-02T19:14:59.825Z · comments (2)
Apply for the ML Upskilling Winter Camp in Cambridge, UK [2-10 Jan]
hannah wing-yee (hannah-erlebach) · 2022-12-02T20:45:10.768Z · comments (0)
Brun's theorem and sieve theory
Ege Erdil (ege-erdil) · 2022-12-02T20:57:39.956Z · comments (1)
Three Fables of Magical Girls and Longtermism
Ulisse Mini (ulisse-mini) · 2022-12-02T22:01:30.225Z · comments (11)
Research Principles for 6 Months of AI Alignment Studies
Shoshannah Tekofsky (DarkSym) · 2022-12-02T22:55:17.165Z · comments (3)
Subsets and quotients in interpretability
Erik Jenner (ejenner) · 2022-12-02T23:13:34.204Z · comments (1)
D&D.Sci December 2022: The Boojumologist
abstractapplic · 2022-12-02T23:39:49.398Z · comments (9)
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.
Charlie Steiner · 2022-12-03T00:54:03.059Z · comments (1)
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC (LawChan) · 2022-12-03T00:58:36.973Z · comments (35)
Causal scrubbing: Appendix
LawrenceC (LawChan) · 2022-12-03T00:58:45.850Z · comments (4)
Causal scrubbing: results on a paren balance checker
LawrenceC (LawChan) · 2022-12-03T00:59:08.078Z · comments (2)
Causal scrubbing: results on induction heads
LawrenceC (LawChan) · 2022-12-03T00:59:18.327Z · comments (1)
Great Cryonics Survey of 2022
Mati_Roy (MathieuRoy) · 2022-12-03T05:10:14.536Z · comments (0)
MrBeast's Squid Game Tricked Me
lsusr · 2022-12-03T05:50:02.339Z · comments (1)
[question] Is school good or bad?
tailcalled · 2022-12-03T13:14:22.737Z · answers+comments (76)
Our 2022 Giving
jefftk (jkaufman) · 2022-12-03T15:40:01.678Z · comments (0)
Utilitarianism is the only option
aelwood · 2022-12-03T17:14:19.532Z · comments (7)
[link] ChatGPT's views on Metaphysics and Ethics
Cole Killian (cole-killian) · 2022-12-03T18:12:19.290Z · comments (3)