LessWrong 2.0 Reader

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (17)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

Recent comments

going-durden on When do "brains beat brawn" in Chess? An experiment

A related thought: an intelligence can only work on the information that it has, regardless of its veracity, and it can only work on information that actually exists.

My hunch is that the plan of "AI bootstraps itself to superintelligence, then superpower, then wipes out humanity" relies on it having access to information that is either too well hidden to divine through sheer calculation and information-gathering, regardless of its intelligence (e.g., the locations of all of humanity's military bunkers and nuclear submarines), or simply does not exist (e.g., future human strategic choices based on coin flips).

Most AI apocalypse scenarios depend not only on the AI being superhumanly smart, but also on it being inexplicably omniscient about things that nobody could plausibly be omniscient about.

lone17 on Refusal in LLMs is mediated by a single direction

Thanks for the insight on the locality check experiment.

For inducing refusal, I used the code from the demo notebook provided in your post. It doesn't have a section on inducing refusal, but I just inverted the difference-in-means vector and set the intervention layer to the single layer where that vector was extracted. I believe this has the same effect as what you described, which is to apply the intervention to every token at a single layer. I will check out your repo to see if I missed something. Thank you for the discussion.
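A minimal numpy sketch of the intervention described above, assuming residual-stream activations have already been cached as arrays. The function names and array shapes are hypothetical and this is not the notebook's code. The difference-in-means vector is the mean activation over harmful prompts minus the mean over harmless prompts at one layer; the sign of `scale` controls which way the stream is pushed, and treating "inverting" as a sign flip is my assumption.

```python
import numpy as np


def diff_in_means_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-in-means vector between two sets of activations.

    Both inputs have shape (n_prompts, d_model), cached at one layer and
    one token position (hypothetical layout).
    """
    return harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)


def add_direction_at_layer(resid: np.ndarray, direction: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Add `scale * direction` to every token position of a single layer.

    `resid` has shape (seq_len, d_model); broadcasting adds the same
    vector to each row, i.e. to every token. A negative `scale` flips
    the direction (assumed here to model "inverting" the vector).
    """
    return resid + scale * direction
```

In a real model this addition would run inside a forward hook at the chosen layer; the sketch only shows the arithmetic, applied identically to every token at one layer.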

going-durden on Bitter lessons about lucid dreaming

This might not always be beneficial. Lucid dreaming also means you remember much more of your dreams, which can extend the lifespan of your recurring nightmares. Not to mention that if you dream lucidly, your consciousness is not resting, and intrusive thoughts will pile up.

christian-z-r on D&D.Sci September 2022: The Allocation Helm

Just putting a guess in here, before I go check if it is true:

Actually, the 'Houses' have no effect; they are just the names of the different groups. To get a good rating, the members of each house should be as close as possible in Stat space, or perhaps all be high in one stat (still experimenting with this). Since the early students were all placed by a functioning hat, each house had a well-defined place in Stat space that it would carry on with. But since all current students have been randomly assigned, we don't have to worry about this historical data. Instead, we should try to get the new students as close as possible to the randomly generated spot in Stat space for the current students. As such, I think Serpentyne might become the new House of Integrity. (I do believe a strange thing like this also happens in real life, and is one of the main ways that political parties gradually change their positions in Stat space.)
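The closeness half of this hypothesis (put each new student in the house whose current members sit nearest in Stat space) amounts to nearest-centroid assignment. A minimal Python sketch, assuming Euclidean distance and equally weighted stats; the stat names and house labels are hypothetical placeholders, not the puzzle's actual columns, and the "all high in one stat" variant is not modeled.

```python
import math

# Hypothetical stat names; the puzzle's real columns may differ.
STATS = ("courage", "integrity", "humility", "patience")


def centroid(students: list[dict[str, float]]) -> dict[str, float]:
    """Mean position of a group of students in stat space."""
    return {s: sum(st[s] for st in students) / len(students) for s in STATS}


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two points in stat space (assumed metric)."""
    return math.sqrt(sum((a[s] - b[s]) ** 2 for s in STATS))


def nearest_house(student: dict[str, float], houses: dict[str, list[dict[str, float]]]) -> str:
    """Return the house whose current members' centroid is closest to `student`."""
    centroids = {name: centroid(members) for name, members in houses.items()}
    return min(centroids, key=lambda name: distance(student, centroids[name]))
```

If the stats live on different scales, standardizing them first would change which house counts as "nearest", so that choice is worth checking against the historical ratings.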

going-durden on Bitter lessons about lucid dreaming

My hypothesis is that a lot of things that seem impossible or very hard in a dream are simply too boring to focus on. It's entirely possible to consciously dream up a page of text, but who would really want to waste precious dream time typing?

tiago-macedo on Conservation of Expected Evidence and Random Sampling in Anthropics

But the Heads outcome in Incubator Sleeping Beauty is not. You are not randomly selected among two immaterial souls to be instantiated. You are a sample of one. And as there is no random choice happening, you are not twice as likely to exist when the coin is Tails, and you get no new information when you are created.

I am twice as likely to exist when the coin is Tails! After all, if the coin is Tails, then there are two of me. I understand how this can lead to a thirder conclusion:

Heads implies one chance for me to exist.
Tails implies two chances for me to exist.
I observe that I exist. This is predicted "twice as much" by the coin being Tails than by it being Heads, so the probability of Tails is 2/3.

However, there is a mistake in this reasoning. The correct reasoning is the following:

Heads implies the number of "mes" will be 1.
Tails implies the number of "mes" will be 2.
I observe that I exist. Does this mean that there is 1 of me, or 2 of me? I don't know.

So we can't extract information from my existence, and we're back to normalcy: a 1/2 chance of Heads or Tails.
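The two arguments can be written as Bayes' rule with different weights for the observation "I exist". The notation is my own shorthand, and the thirder's weights of 2 and 1 are relative weights (number of copies), not probabilities.

```latex
% Thirder: weight "I exist" by the number of copies (1 under Heads, 2 under Tails)
P(\text{Tails} \mid \text{I exist})
  = \frac{2 \cdot \tfrac{1}{2}}{2 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2}}
  = \frac{2}{3}

% Halfer: "I exist" is certain under both outcomes, so it carries no evidence
P(\text{Tails} \mid \text{I exist})
  = \frac{1 \cdot \tfrac{1}{2}}{1 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2}}
  = \frac{1}{2}
```

The whole disagreement sits in the likelihood assigned to the observation; the prior of 1/2 on each coin outcome is the same in both lines.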

going-durden on Bitter lessons about lucid dreaming

I have a suspicion that "flying dreams" have more to do with the state of your physical body than just your mind. I noticed I only dream of flight (or rather, levitation) if my muscles are very relaxed, like after a good massage, a long hot bath, or good stretching. If I'm physically tense, either from effort or from stress, then I either cannot fly in a dream at all, or I keep losing the ability and falling, often with enough distress to wake myself up.

going-durden on Bitter lessons about lucid dreaming

In my experience, conscious Daydreaming can achieve the same results but more consistently. But then again, my imagination is extremely visual, I tend to "think in VR movies", so Lucid Daydreaming comes easier than Lucid Dreaming, and is far more controllable.

going-durden on Bitter lessons about lucid dreaming

I noticed that the ability to LD is strongly correlated with the condition known as "Maladaptive Daydreaming" (the "maladaptive" part here is subjective and situational, but it basically means the ability and need to have very addictive, vivid, VR-like daydreams that obscure waking reality).

I used to suffer from MD, until I learned to control it well enough to just be benign Daydreaming. Simultaneously, I achieved the ability to LD, which works on very similar principles to controlled Daydreaming.

The trick to LD, if you are a person who daydreams visually, is to focus on plausibility. Consciously training your daydreaming mind to enforce realistic, plausible daydream scenarios leads to the same mental need to "fix" unrealistic dreams, which either wakes you up from the dream or makes it lucid.

Now, all that being said, LDs rarely approach the quality of Daydreams. It's extremely hard to make a Lucid Dream realistic and detailed enough not to feel trippy. Moreover, while most Daydreamers can make their Daydreams simulate tactile sensations, you cannot do the same in an actual dream. For one, erotic Lucid Dreaming is almost always pointless, because your lucid mind cannot force your sleeping body to actually experience sexual pleasure, let alone orgasm. If you are a bio male, you likely won't even achieve an erection, so LD sex feels like trying to play pool with a rope.

The only good use I ever got from LDs is that they let you remember bits of your dreams better and use them as raw footage to edit into your Daydreams.

khafra on The salt in pasta water fallacy

Note also that there are several free parameters in this example. E.g., I just moved to Germany, and now have wimpy German burners on my stove. If I put on a large container with 6L or more of water, and I do not cover it, the water will never go beyond bubble formation into a light simmer, let alone a rolling boil. If I cover the container at this steady state, it reaches a rolling boil in about another 90s.