LessWrong 2.0 Reader


Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

[link] Conflict in Posthuman Literature
Martín Soto (martinsq) · 2024-04-06T22:26:04.051Z · comments (1)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (59)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

[link] Some rules for life (v.0,0)
Neil (neil-warren) · 2023-08-17T00:43:57.913Z · comments (13)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

Startup Roundup #1: Happy Demo Day
Zvi · 2023-09-12T13:20:03.883Z · comments (5)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (9)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]
abstractapplic · 2024-01-22T19:20:05.001Z · comments (7)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Debate series: should we push for a pause on the development of AI?
Xodarap · 2023-09-08T16:29:51.367Z · comments (1)

[link] Seth Explains Consciousness
Jacob Falkovich (Jacobian) · 2023-08-22T18:06:42.653Z · comments (125)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)


Recent comments

tailcalled on Why I’m not a Bayesian

Idk, I guess the more fundamental issue is this treats the goal as simply being assigning probabilities to statements in predicate logic, whereas his point is more about whether one can do compositional reasoning about relationships while dealing with nebulosity, and it's this latter thing that's the issue.

cousin_it on OpenAI defected, but we can take honest actions

The situation where AI is a good tool for manipulating public opinion, and the leading AI company has a bad reputation, seems unstable. Maybe AI just needs to get a little better, and then AI-written arguments in favor of AI will win public opinion decisively? This could "lock in" our trajectory even worse than now, and could happen long before AGI.

cubefox on A brief theory of why we think things are good or bad

If I believe eating meat is bad because I engage in motivated reasoning, then this is, like all forms of motivated reasoning, just an irrational belief. But if I believe eating meat is bad because I believe it creates a disproportionate amount of suffering, there is nothing irrational about that belief. So motivated reasoning can only explain some irrational beliefs. However, when something being bad means that it decreases some sort of welfare in some general way, then we don't have this problem. Now, what exactly does "welfare" etc. mean? That's a question that normative ethicists try to figure out, for example via various proposed theories of utilitarianism. If philosophers are analyzing a subject matter, it's safe to assume they are analyzing some sort of concept. Now, what's a concept? It's a meaning of a word. Like "good" or "bad".

remmelt-ellen on OpenAI defected, but we can take honest actions

Sam Altman demonstrating what kind of actions you can get away with in front of everyone's eyes seems problematic.

Very much agreeing with this.

remmelt-ellen on OpenAI defected, but we can take honest actions

Appreciating your inquisitive question!

One way to think about it:

For OpenAI to scale more toward “AGI”, the corporation needs more data, more automatable work, more profitable uses for working machines, and more hardware to run those machines.

If you look at how OpenAI has been increasing those four variables, you can notice that there are harms associated with each. This tends to result in increasing harms.

One obvious example: if they increase hardware, this also increases pollution (from mining, producing, installing, and running the hardware).

Note that the above is not a claim that the harms outweigh the benefits. But if OpenAI & co continue down their current trajectory, I expect that most communities would look back and say that the harms to what they care about in their lives were not worth it.

I wrote a guide to broader AI harms meant to emotionally resonate for laypeople here.

viliam on Information vs Assurance

Related: StackExchange: Why are estimates treated like deadlines?

There is even a comment perfectly illustrating that some people will interpret all information as an assurance (and insist that it is your fault that they do so), unless you explicitly tell them not to:

I will consider your estimate to be an educated projection based on your experience, skills and qualification. [...] if you made an estimate, you need to stand by it. [...] if you say a time then that time becomes yours to live and die by. You said it, you own it. I recalculated my projections, my finances and my resources based on it. You should have said a range of estimates but you didn't. Estimates become deadlines because of your own fault.
If a Doctor gave an estimate to a patient of 6-12 months of life remaining and the actual amount was 1 month we would investigate why the Doctor was so incorrect. If a builder estimates X tons of concrete and the requirements are double we do not blame the owner of the house. [...] I would be perfectly comfortable raising a legal counter-charge against a contractor for poor estimates.
If your estimate is off you are either fired, charged or will fail to be re-hired. When a carpenter makes an estimate of between X and Y inches he is expected to be right. I apply the same scrutiny to developers regardless of voodoo regarding statistics. If you cannot make accurate estimates you are either not very good at your job (I doubt that) or are unwilling to accept responsibility.
[If you respond to this attitude by padding your estimates] you would be out of a contract for padding estimates by a ridiculous factor margin. Not to mention a counter-charge against you or your agency for misconduct. [...] don't think for one second a good development team can hide behind the word estimate or any agile principle as some sort of sandbag against criticism. [...] It sounds like you have no confidence in your own abilities.

(I guess this person wouldn't even have to fire me for making a wrong estimate; if I worked with them I would happily quit.)

jeremy-gillen on (Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need

I think the problem might be that you've given this definition of heuristic:

A heuristic is a local, interpretable, and simple function (e.g., boolean/arithmetic/lookup functions) learned from the training data. There are multiple heuristics in each layer and their outputs are used in later layers.

Taking this definition seriously, it's easy to decompose a forward pass into such functions.

But you have a much more detailed idea of a heuristic in mind. You've pointed toward some properties this might have in your point (2), but haven't put it into specific words.

Some options:

A single heuristic is causally dependent on <5 heuristics below and influences <5 heuristics above.
The inputs and outputs of heuristics are strong information bottlenecks with a limit of 30 bits.
The function of a heuristic can be understood without reference to >4 other heuristics in the same layer.
A single heuristic is used in <5 different ways across the data distribution.
A model is made up of <50 layers of heuristics.
Large arrays of parallel heuristics often output information of the same type.

Some combination of these (or similar properties) would turn the heuristics intuition into a real hypothesis capable of making predictions.

If you don't go into this level of detail, it's easy to trick yourself into thinking that (2) basically kinda follows from your definition of heuristics, when it really really doesn't. And that will lead you to never discover the value of the heuristics intuition, if it is true, and never reject it if it is false.

gokceozantoptas on Advice on Communicating Concisely

I have also suffered from this (and still do, really). I will share some lessons that I have picked up along the way, followed by a couple of book recommendations.

The lessons:

Focus: Most of the time the issue is that you are trying to communicate way too many things. Now I try to contain my message to one single point. This helped me immensely.
Message House: A framework from branding and PR; I advise you to do a quick web search on it. Building on my previous point, I "construct" my message house with these components whenever possible: (1) Anecdote, preferably a personal one. Kicking off with a very short story that is central to your idea helps capture people's attention immediately; (2) why this matters, why I am telling you about this; (3) Sizzle, or a very quotable quote. If people wanted to tweet one sentence from your "speech", this is it; (4) Data point or one last anecdote to reinforce the central theme.
Fluency: You might know what you are talking about, but if you are not fluent in it, it takes a lot of time to put things together, and from the outside it looks like you don't know what you are talking about. To overcome this, you can drill specific pieces: anecdotes or data points you occasionally use; word blocks of your worldview that signal where you are looking at things from; and short identity-related blocks. Practice these constantly until you are fluent in each specific little block, which you can then use as springboards or as solid middle or end points in your speech.

Book recommendations:

Smart Brevity by Jim VandeHei, Mike Allen, and Roy Schwartz
You've Got 8 Seconds by Paul Hellman

cubefox on Information vs Assurance

There is a sort of opposite to assurance, where someone communicates their intention to do something (not merely a prediction that they will do it), but without creating any responsibility to follow through. This is usually done via non-verbal communication, like tone of voice, facial expression and "body language". The fact that the intention was communicated non-verbally creates plausible deniability. This happens, for example, in relationships. E.g. when asking after a date if they want to go to his/her room.

amalthea on OpenAI defected, but we can take honest actions

I'd agree the OpenAI product line is net positive (though not super hung up on that). Sam Altman demonstrating what kind of actions you can get away with in front of everyone's eyes seems problematic.