LessWrong 2.0 Reader

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)

Individually incentivized safe Pareto improvements in open-source bargaining
Nicolas Macé (NicolasMace) · 2024-07-17T18:26:43.619Z · comments (2)

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

[link] [Paper] Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee (bruce-lee) · 2024-02-22T18:52:32.237Z · comments (23)

D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]
abstractapplic · 2024-01-22T19:20:05.001Z · comments (7)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (60)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (9)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

Recent comments

going-durden on When do "brains beat brawn" in Chess? An experiment

A related thought: an intelligence can only work on the information that it has, regardless of its veracity, and it can only work on information that actually exists.

My hunch is that the plan of "AI bootstraps itself to superintelligence, then superpower, then wipes out humanity" relies on it having access to information that is either too well hidden to divine through sheer calculation and infogathering, regardless of its intelligence (ex: the location of all the military bunkers and nuclear submarines humanity has), or simply does not exist (ex: future human strategic choices based on coin flips).

Most AI Apocalypse scenarios depend not only on the AI being superhumanly smart, but being inexplicably Omniscient about things that nobody could be plausibly Omniscient about.

lone17 on Refusal in LLMs is mediated by a single direction

Thanks for the insight on the locality check experiment.

For inducing refusal, I used the code from the demo notebook provided in your post. It doesn't have a section on inducing refusal, so I just inverted the difference-in-means vector and set the intervention layer to the single layer where that vector was extracted. I believe this has the same effect as what you described, which is to apply the intervention to every token at a single layer. I will check out your repo to see if I missed something. Thank you for the discussion.
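For readers following along, the general idea can be sketched on toy data. This is not the notebook's code: the helper names (`difference_in_means`, `ablate_direction`, `add_direction`), the `scale` parameter, and the two-dimensional "activations" are all hypothetical, and a real run would operate on residual-stream tensors from a model rather than Python lists.

```python
# Toy sketch: a difference-in-means direction, then two interventions with it.
# All names and numbers here are illustrative assumptions.

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def difference_in_means(harmful_acts, harmless_acts):
    """Mean activation on harmful prompts minus mean on harmless prompts."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    return [a - b for a, b in zip(mu_harmful, mu_harmless)]

def ablate_direction(act, direction):
    """Remove the component of `act` that lies along `direction`."""
    coeff = dot(act, direction) / dot(direction, direction)
    return [a - coeff * d for a, d in zip(act, direction)]

def add_direction(act, direction, scale=1.0):
    """Push `act` along `direction`."""
    return [a + scale * d for a, d in zip(act, direction)]

# Hypothetical activations collected at one layer.
harmful = [[2.0, 0.0], [4.0, 0.0]]
harmless = [[0.0, 1.0], [0.0, 3.0]]
direction = difference_in_means(harmful, harmless)

# Apply the same intervention to every token position at that single layer.
tokens = [[1.0, 1.0], [0.5, -1.0]]
induced = [add_direction(t, direction) for t in tokens]
ablated = [ablate_direction(t, direction) for t in tokens]
print(direction, induced, ablated)
```

In this toy version, the only difference between the two interventions is whether the direction is added to each token's activation or projected out of it.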

going-durden on Bitter lessons about lucid dreaming

This might not always be beneficial. Lucid dreaming also means you remember much more from your dreams, which can extend the lifespan of your recurring nightmares. Not to mention, if you dream lucidly, your consciousness is not resting, and intrusive thoughts will pile up.

christian-z-r on D&D.Sci September 2022: The Allocation Helm

Just putting a guess in here, before I go check if it is true:

Actually the 'Houses' have no effect; they are just the names of the different groups. In order to get a good rating, the members of each house should be as close as possible in Stat-space, or perhaps all be high in one stat (still experimenting with this). Since the early students were all placed by a functioning hat, each house had a well-defined place in Stat-space that it would carry on with. But since all current students have been randomly selected, we don't have to worry about this historical data. Instead, we should try to get the new students as close as possible to the randomly generated spot in Stat-space for the current students. As such, I think Serpentyne might become the new House of Integrity. (I do believe a strange thing like this is also happening in real life, and is one of the main ways that political parties gradually change their positions in Stat-space).

going-durden on Bitter lessons about lucid dreaming

My hypothesis is that a lot of things that seem impossible or very hard in a dream are simply too boring to focus on. It's totally possible to consciously dream up a page of text, but who would really want to waste precious dreamtime to type?

tiago-macedo on Conservation of Expected Evidence and Random Sampling in Anthropics

But Heads outcome in Incubator Sleeping Beauty is not. You are not randomly selected among two immaterial souls to be instantiated. You are a sample of one. And as there is no random choice happening, you are not twice as likely to exist when the coin is Tails and there is no new information you get when you are created.

I am twice as likely to exist when the coin is Tails! After all, if the coin is Tails, then there are two of me. I understand how this can lead to a thirder conclusion:

Heads implies one chance for me to exist.
Tails implies two chances for me to exist.
I observe that I exist. This is predicted "twice as much" by the coin being Tails than Heads, so the probability of Tails is 2/3.

However, there is a mistake in this reasoning. The correct version is the following:

Heads implies the number of "mes" will be 1.
Tails implies the number of "mes" will be 2.
I observe that I exist. Does this mean that there is 1 of me, or 2 of me? I don't know.

So we can't extract information from my existence, and we're back to normalcy: 1/2 chance of Heads or Tails.
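One way to make the two counting schemes in this comment concrete is a small simulation. This is a sketch under assumptions not in the thread: the function name `incubator_frequencies`, the trial count, and the seed are illustrative. It reports the fraction of experiments that land Tails and the fraction of created people who live in a Tails experiment. It does not settle which of the two numbers is the right credence.

```python
import random

def incubator_frequencies(trials=100_000, seed=0):
    """Simulate the incubator setup: Heads creates 1 person, Tails creates 2.

    Returns (fraction of experiments that are Tails,
             fraction of created people who are in a Tails experiment).
    """
    rng = random.Random(seed)
    tails_runs = 0
    people_total = 0
    people_in_tails = 0
    for _ in range(trials):
        tails = rng.random() < 0.5      # fair coin
        created = 2 if tails else 1     # number of "mes" in this experiment
        people_total += created
        if tails:
            tails_runs += 1
            people_in_tails += created
    return tails_runs / trials, people_in_tails / people_total

per_experiment, per_person = incubator_frequencies()
print(f"Tails frequency per experiment: {per_experiment:.3f}")
print(f"Tails frequency per person:     {per_person:.3f}")
```

Counting per experiment gives a value near 1/2 and counting per created person gives a value near 2/3, so the disagreement in the thread is about which population "I" should be treated as sampled from, not about the arithmetic.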

going-durden on Bitter lessons about lucid dreaming

I have a suspicion that "flying dreams" have more to do with the state of your physical body than just your mind. I noticed I only dream of flight (or rather, levitation) if my muscles are very relaxed, like after a good massage, long hot bath, or good stretching. If I'm physically tense, either from effort or from stress, then I either cannot fly in a dream at all, or I keep losing the ability and falling, often with enough distress to wake myself up.

going-durden on Bitter lessons about lucid dreaming

In my experience, conscious Daydreaming can achieve the same results but more consistently. But then again, my imagination is extremely visual, I tend to "think in VR movies", so Lucid Daydreaming comes easier than Lucid Dreaming, and is far more controllable.

going-durden on Bitter lessons about lucid dreaming

I noticed that the ability to LD is strongly correlated with the condition known as "Maladaptive Daydreaming" (the "maladaptive" part here is subjective and situational, but it basically means the ability and need to have very addictive, vivid, VR-like daydreams that obscure waking reality).

I used to suffer from MD, until I learned to control it well enough to just be benign Daydreaming. Simultaneously, I achieved the ability to LD, which works on very similar principles to controlled Daydreaming.

The trick to LD, if you are a person who daydreams visually, is to focus on plausibility. Consciously training your daydreaming mind to enforce realistic, plausible daydream scenarios leads to the same mental need to "fix" unrealistic dreams, which either wakes you up from the dream or makes it Lucid.

Now, all that being said, LDs rarely approach the quality of Daydreams. It's extremely hard to make a Lucid Dream realistic and detailed enough not to feel trippy. Moreover, while most Daydreamers can make their Daydreams simulate tactile sensations, you cannot do the same in an actual dream. For one, erotic Lucid Dreaming is almost always pointless, because your lucid mind cannot force your sleeping body to actually experience sexual pleasure, let alone orgasm. If you are a bio male, it is likely you won't even achieve erection, so LD sex feels like trying to play pool with a rope.

The only good use I ever got from LDs is that they let you remember bits of your dreams better and use them as raw footage to edit into your Daydreams.

khafra on The salt in pasta water fallacy

Note also that there are several free parameters in this example. E.g., I just moved to Germany, and now have wimpy German burners on my stove. If I put on a large container with 6L or more of water, and I do not cover it, the water will never go beyond bubble formation into a light simmer, let alone a rolling boil. If I cover the container at this steady state, it reaches a rolling boil in about another 90s.