LessWrong 2.0 Reader

[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)

Love, Reverence, and Life
Elizabeth (pktechgirl) · 2023-12-12T21:49:04.061Z · comments (7)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery (arjun-panickssery) · 2024-01-15T21:21:03.962Z · comments (0)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)

Boston Solstice 2023 Retrospective
jefftk (jkaufman) · 2024-01-02T03:10:05.694Z · comments (0)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)

Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

Important open problems in voting
Closed Limelike Curves · 2024-07-01T02:53:44.690Z · comments (1)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

[link] Vacuum: Theory and Technologies
ethanmorse · 2024-01-21T17:23:49.257Z · comments (0)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov (igor-ivanov) · 2024-01-24T15:45:08.795Z · comments (4)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

Basics of Handling Disagreements with People
Camille Berger (Camille Berger) · 2024-11-12T17:55:08.143Z · comments (4)

What AI companies should do: Some rough ideas
Zach Stein-Perlman · 2024-10-21T14:00:10.412Z · comments (10)

Empathy/Systemizing Quotient is a poor/biased model for the autism/sex link
tailcalled · 2024-11-04T21:11:57.788Z · comments (0)

[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (12)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (30)

Recent comments

seth-herd on Making a conservative case for alignment

I agree with everything you've said. The advantages are primarily from not aligning to values but only to following instructions rather than using RL or any other process to infer underlying values. Instruction-following AGI is easier and more likely than value aligned AGI.

I think creating real AGI based on an LLM aligned to be helpful, harmless and honest would probably be the end of us, as carrying the set of values implied by RLHF to their logical conclusions outside of human control would probably be pretty different from our desired values. Instruction-following provides corrigibility.

fabien-roger on AI Control: Improving Safety Despite Intentional Subversion

That would work if the untrusted model is better at inserting subtle backdoors than the "human" high-quality labor used to generate the synthetic backdoors (used to validate that the untrusted model doesn't just lie).

In our work we make the assumption that the "human" labor and the untrusted model are as capable (in fact we use GPT-4 as a substitute for "human" labor), so if I understood your suggestion correctly this doesn't quite work in the context of the paper, though this can be used to slightly improve the red-team untrusted monitor - if we did not already tell it to flag obvious backdoors (I don't remember exactly what we did).

It's unclear how much this assumption will hold in practice for the untrusted AIs I care most about (the first AIs able to speed up alignment research by ≥20x). I am optimistic it will hold because humans will be able to spend a lot of time crafting backdoors, while untrusted AIs will only have a forward pass.

atillayasar on AtillaYasar's Shortform

Cognitive workspaces, not end-to-end algorithms

It's summer 2020 and gpt3 is superhuman in many ways and probably could take over the world but only when its alien goo flows through the right substrate, channeled into the right alchemical chambers, then forked and rejoined and reshaped in the right way, but somehow nobody can build it.

Robustness via pauseability

The system can be made robust by letting you stop execution when it's going wrong, and edit a previous output, or edit the code that generated it, and retry.

Neat things like perfect JSON-schema following, if the overarching system is implemented correctly, should speed up any given cognitive task, simply by making it less likely that you have to edit and retry the steps along the way. It just makes the step of `prompt + object shape --> object` run flawlessly, as a single "step", rather than a bunch of things you have to do manually.
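A minimal sketch of the `prompt + object shape --> object` step when schema following is not perfect: validate the output and retry, counting attempts. The names `matches_shape`, `prompt_to_object` and `fake_generate` are hypothetical, and `fake_generate` is a stand-in for a real model call, not any particular API.

```python
import json
from typing import Callable


def matches_shape(obj, shape: dict) -> bool:
    """True if obj is a dict with exactly the keys in shape, each of the stated type."""
    return (
        isinstance(obj, dict)
        and obj.keys() == shape.keys()
        and all(isinstance(obj[key], typ) for key, typ in shape.items())
    )


def prompt_to_object(prompt: str, shape: dict,
                     generate: Callable[[str], str], max_retries: int = 3):
    """Call generate until its output parses as JSON matching shape.

    Returns (object, number_of_attempts_used).
    """
    for attempt in range(1, max_retries + 1):
        raw = generate(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry
        if matches_shape(obj, shape):
            return obj, attempt
    raise ValueError(f"no valid object after {max_retries} attempts")


# Hypothetical stand-in for a model: fails once, then returns valid JSON.
_replies = iter(["not json", '{"title": "On Retries", "words": 2}'])


def fake_generate(prompt: str) -> str:
    return next(_replies)


result, attempts = prompt_to_object(
    "Title this paragraph", {"title": str, "words": int}, fake_generate
)
print(result, attempts)
```

With perfect schema following, `attempts` would always be 1 and the loop disappears, which is the sense in which the step becomes a single "step".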

The same goes for how strong of a model you use. Increasing model strength should be a trade of dollars for retry count and tinker time and number of intermediate steps, it shouldn't enable qualitatively different tasks.

(At least not at every model upgrade step. There are (probably) some things that you can't do with GPT-2, that you can do with GPT-4, no matter how powerful your UI and workflows are.)

A cool way of seeing it is that you turn the time dimension, which is a single point (0-D) in end-to-end algorithms, into a 1-D line, where each point along the path contains a pocket dimension you can go into. (because you can pause, then edit and retry)

Practical example

I'll start with a simple task and a naive solution, then by showing the problems that come up, I'll show the engineering challenges (or "app specs") that exist.

I personally learned these ideas by doing exactly this, by trying to build apps that help me, because *surely*, a general intelligence that is basically superhuman (I thought this even with gpt3) can completely transform my life and the world! Right?! No.

I believe that such problems are fundamental to tasks where you're trying to get AI to help you, and that's why an interactive UI-based approach is the right one, as opposed to an end-to-end approach.

Task: you have an article with untitled paragraphs (shoutout Paul Graham) and you want AI to give it titles.

1)
Strong model solves it one shot, returning the full article.

1 -- problem)
Okay, fine.
But what if it's a really long article and you don't want the AI wasting 95% of its tokens on regurgitating paragraphs, or you're very particular about your titles and you don't wanna retry more than 5 times?

2) You build a simple script that splits it into paragraphs, prompts each instance of gpt3 to give a title, outputs your full article, and ignores certain paragraphs based on indices that are stored in a single variable.
It solves paragraph regurgitation: you selectively regenerate only the titles you want to change, so iteration cost is much lower, and you can even have custom prompts for certain titles.
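A rough sketch of what such a script could look like, assuming paragraphs are separated by blank lines. `title_article` and `fake_title` are hypothetical names, and `fake_title` stands in for a per-paragraph model call.

```python
from typing import Callable, Dict, List, Optional, Set


def split_paragraphs(article: str) -> List[str]:
    """Split on blank lines, dropping empty chunks."""
    return [p.strip() for p in article.split("\n\n") if p.strip()]


def title_article(article: str,
                  make_title: Callable[[str], str],
                  ignore: Set[int] = frozenset(),
                  titles: Optional[Dict[int, str]] = None) -> str:
    """Return the article with a title above every non-ignored paragraph.

    titles maps paragraph index -> already accepted title. Only missing
    indices are generated, so deleting one entry regenerates just that title.
    """
    titles = {} if titles is None else titles
    out = []
    for i, para in enumerate(split_paragraphs(article)):
        if i not in ignore:
            if i not in titles:
                titles[i] = make_title(para)
            out.append(titles[i])
        out.append(para)
    return "\n\n".join(out)


def fake_title(paragraph: str) -> str:
    # Hypothetical stand-in for a model call: use the first three words.
    return "Title: " + " ".join(paragraph.split()[:3])


article = ("Intro text here for readers.\n\n"
           "Startups grow fast when focused.\n\n"
           "Final note.")
titled = title_article(article, fake_title, ignore={0})
print(titled)
```

The model only ever sees one paragraph at a time, and the script reassembles the full text itself, so no tokens are spent on regurgitation.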

2 -- problem)
You actually had to write the script, for one. Now, what if you wanted to do this for more than just 1 article? You need to manually write the indices of paragraphs to ignore (and rewrite them whenever you change the shape of the article). If you want it to automatically keep track of which paragraphs to ignore, you need to build a whole UI for it that contains the text, where you can toggle which ones are ignored. Or you create a mini scripting language where you can maybe add an $ignore$ line, which the parser stores, and deletes from the final article when it outputs it with the added titles.
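One way the $ignore$ idea might be sketched, assuming the marker sits on its own line directly above the paragraph it applies to (the comment does not specify placement, so that is an assumption). `parse_ignores` is a hypothetical name.

```python
IGNORE_MARKER = "$ignore$"


def parse_ignores(article: str):
    """Return (paragraphs_without_markers, set_of_ignored_paragraph_indices)."""
    paragraphs, ignored = [], set()
    for block in article.split("\n\n"):
        lines = block.strip().splitlines()
        if not lines:
            continue  # skip empty chunks
        if lines[0].strip() == IGNORE_MARKER:
            lines = lines[1:]  # delete the marker from the output text
            if not lines:
                continue  # a marker with no paragraph under it
            ignored.add(len(paragraphs))  # store the index for the titler
        paragraphs.append("\n".join(lines))
    return paragraphs, ignored


source = "$ignore$\nIntro text.\n\nBody paragraph.\n\n$ignore$\nFootnote."
paragraphs, ignored = parse_ignores(source)
print(paragraphs, ignored)
```

The ignore set now travels with the article text, so changing the article's shape no longer means rewriting indices by hand.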

3 and beyond)
This just goes on and on, as the scope of the task expands.
Say you're not titling paragraphs, but doing something of the shape, "I write something, code turns it into something, AI helps with some of the parts, code turns it back into the format it was originally in". Now you have a way broader thing that a script may not be enough for, and that a paragraph-titler-UI is not enough for.
And why shouldn't the task scope expand? We want AI to solve all problems and create utopia, why can't they? They're smarter than me and almost everyone I know (most of the time anyway), so they should be like 1000x more useful. Even if they're not *that* smart, they're still 100x faster at reading and writing, and ~1 million x more knowledgeable.
The task scope should expand, but it can't because we have skill issues. Not because we need to build 100 billion dollar compute clusters powered by nuclear reactors.

martin-randall on The Online Sports Gambling Experiment Has Failed

No, the effect size on bankruptcies is about 10x larger than expected. So while offline gambling may be comparable to alcohol, smartphone gambling is in a different category if we trust this research.

afeller08 on Happiness and Goodness as Universal Terminal Virtues

I'm just reading this for random reasons either for the first time or for the first time that I have a response. I think what I see differently from you is not happiness but motivation. And that at the time I wrote this, I believe your process of making decisions was more future-oriented than mine was. (I believe I have converged towards you in how my motivation works in the ten years since I wrote this.) When I wrote the above, my past was clinging to me in many ways that were adverse to happiness. (Trauma) What I wasn't quite saying in my previous comment, is that I was at the time (like many other people I know) holding onto trauma in stupid ways because I needed to hold onto it to make it feel like that part of my life had a reason to have happened that wasn't purely just worthless-badness happens. Holding onto trauma was a core part of my identity and was a core part of many other people's identities. On the silliest extreme of people holding onto trauma are the people who keep playing games that they expect to keep losing with the hope of eventually winning in a way that makes all their past losses up to them. (Before 2016, being a Cubs fan was a particularly lighthearted example of people behaving this way; whereas, gambling addictions are a much less lighthearted one. Many gambling addicts are deeply aware that their hope to one day recover all they've lost through continuing to gamble is not founded in reality. I've never been a gambling addict. But I've had several relationships where I was holding onto the baseless hope that someone would change or that someone's true colors were never the colors that actually came out, etc. and I think that's more or less psychologically the same thing as a very potent gambling addiction. 
I think the psychological mechanism of staying in an abusive relationship is almost exactly the same as the psychological mechanism of being addicted to playing slots, but much, much stronger because abusive people are like intelligent slot machines that are studying you to make you maximally addicted to interacting with them. My overall thesis is that the brain has an enormous number of traits that I would say seem more like bugs to me than like features and that I believe your article is a much more accurate description of how human motivation should work than an accurate explanation of how human motivation does work.) And I used silly examples to express this because I didn't want to talk about any of the real ones.

ltm on Making a conservative case for alignment

I agree with your points about avoiding political polarisation and allowing people with different ideological positions to collaborate on alignment. I'm not sure about the idea that aligning to a single group's values (or to a coherent ideology) is technically easier than a more vague 'align to humanity's values' goal.

Groups rarely have clearly articulated ideologies - more like vibes which everyone normally gets behind. An alignment approach from clearly spelling out what you consider valuable doesn't seem likely to work. Looking to existing models which have been aligned to some degree through safety testing, the work doesn't take the form of injecting a clear structured value set. Instead, large numbers of people with differing opinions and world views continually correct the system until it generally behaves itself. This seems far more pluralistic than 'alignment to one group' suggests.

This comes with the caveat that these systems are created and safety tested by people with highly abnormal attitudes when compared to the rest of their species. But sourcing viewpoints from outside seems to be an organisational issue rather than a technical one.

martin-randall on The Online Sports Gambling Experiment Has Failed

Of course some of those can be influenced by gambling, eg it is a type of overspending. Even so, Claude estimated that legalized online gambling would raise the bankruptcy rate by 2-3% and agreed that 28% is surprising.

daniel-tan on LLMs Look Increasingly Like General Reasoners

Nice article! I'm still somewhat concerned that the performance increase of o1 can be partially attributed to the benchmarks (Blocksworld, ARC-AGI) having existed for a while on the internet, and thus having made their way into updated training corpora (which of course we don't have access to). So an alternative hypothesis would simply be that o1 is still doing pattern matching, just that it has better and more relevant data to pattern-match towards here. Still, I don't think this can fully explain the increase in capabilities observed, so I agree with the high-level argument you present.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Looking for specific tips and tricks to break AI out of formal/corporate writing patterns. Tried style mimicry ('write like Hemingway') and direct requests ('be more creative') - both fell flat. What works?

Should I be using different AI models (I am using GPT and Claude)? The base models output an enormous creative storm, but somehow the RLHF has partially lobotomized LLMs such that they always seem to output either cheesy stereotypes or overly verbose academese/corporatespeak.

anthonyc on AI #90: The Wall

I'm not sure where I'm proposing bureaucracy? The value is in making sure a conversation efficiently adds value for both parties, by not spending much of the time rehashing 101-level prerequisites that are much faster absorbed in advance. A very modest amount of groundwork beforehand maximizes the rate of insight in discussion.

I'm drawing in large part from personal experience. A significant part of my job is interviewing researchers, startup founders, investors, government officials, and assorted business people. Before I get on a call with these people, I look them (and their current and past employers, as needed) up on LinkedIn and Google Scholar and their own webpages. I briefly familiarize myself with what they've worked on and what they know and care about and how they think, as best I can anticipate, even if it's only for 15 minutes. And then when I get into a conversation, I adapt. I'm picking their brain to try and learn, so I try to adapt to their communication style and translate between their worldview and my own. If I go in with an idea of what questions I want answered, and those turn out to not be the important questions, or this turns out to be the wrong person to discuss it with, I change direction. Not doing this often leaves everyone involved frustrated at having wasted their time.

Also, should I be thinking of this as a debate? Because that's very different than a podcast or interview or discussion. These all have different goals. A podcast or interview is where I think the standard I am thinking of is most appropriate. If you want to have a deep discussion, it's insufficient, and you need to do more prep work or you'll never get into the meatiest parts of where you want to go. I do agree that if you're having a (public-facing) debate where the goal is to win, then sure, this is not strictly necessary. The history of e.g. "debates" in politics, or between creationists and biologists, shows that clearly. I'm not sure I'd consider that "meaningful" debate, though. Meaningful debates happen by seriously engaging with the other side's ideas, which requires understanding those ideas.