LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)

[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)

Essaying Other Plans
Screwtape · 2024-03-06T22:59:06.240Z · comments (4)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

The Sequences on YouTube
Neil (neil-warren) · 2024-01-07T01:44:39.663Z · comments (9)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)

Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)

Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)

[question] How to Model the Future of Open-Source LLMs?
Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z · answers+comments (9)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

Decent plan prize winner & highlights
lukehmiles (lcmgcd) · 2024-01-19T23:30:34.242Z · comments (2)

Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)

Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)

[link] Report: Evaluating an AI Chip Registration Policy
Deric Cheng (deric-cheng) · 2024-04-12T04:39:45.671Z · comments (0)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

Decent plan prize announcement (1 paragraph, $1k)
lukehmiles (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)

[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)

[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)

D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup
aphyer · 2024-01-18T03:06:39.344Z · comments (5)

AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)

[link] **In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley**
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

christiankl on The Online Sports Gambling Experiment Has Failed

Yes, most people are generally bad at updating. That has nothing to do with whether or not someone is a libertarian.

The reason Zvi is surprised comes downstream of numerical literacy and not downstream of him being libertarian-leaning.

28% percent increase suggests that more than one in five go bankrupt because of sports online gambling (which would be a subset of gambling in general).

If I ask Claude for the top five reasons people go bankrupt in the US in 2023 I get:

1. Medical Debt
2. Job Loss/Income Reduction
3. Credit Card/Consumer Debt
4. Divorce/Family Issues
5. Housing Market Issues/Mortgage Debt

ChatGPT gives me:

Loss of Income
Medical Expenses
Unaffordable Mortgages or Foreclosures
Overspending and Credit Card Debt
Providing Financial Assistance to Others

I think Claude and ChatGPT do summarize the common wisdom about what normal people believe about what the most important factors for bankruptcy happen to be and that does not include gambling (let alone sports online gambling specifically). If you ask a normal person for the top five reasons they would likely come up with a similar list that does not mention sports online betting.

leon-lang on OpenAI Email Archives (from Musk v. Altman)

Do we know anything about why they were concerned about an AGI dictatorship created by Demis?

peterh on OpenAI Email Archives (from Musk v. Altman)

There is no meeting where an actual rational discussion of considerations and theories of change happens, everything really is people flying by the seat of their pants even at highest level. Talk of ethics usually just gets you excluded from the power talk.

This seems overstated. E.g. Musk and Altman both read Superintelligence, and they both met Bostrom at least once in 2015. Sam published reflective blog posts on AGI in 2015, and it's clear that the OpenAI founders had lengthy, reflective discussions from the YC Research days onwards.

peterh on OpenAI Email Archives (from Musk v. Altman)

That's probably not what Page meant. On consideration, he would probably have clarified […]

[…]

A little more careful conversation would've prevented this whole thing.

The passage you quote earlier suggests that they had multiple lengthy conversations:

Larry Page and I used to be close friends and I would stay at his house in Paolo Alto and I would talk to him late into the night about AI safety […]

Quick discussions via email are not strong evidence of a lack of careful discussion and reflection in other contexts.

startattheend on Alexander Gietelink Oldenziel's Shortform

Great post!

It's a habit of mine to think in very high levels of abstraction (I haven't looked much into category theory though, admittedly), and while it's fun, it's rarely very useful. I think it's because of a width-depth trade-off. Concrete real-world problems have a lot of information specific to that problem, you might even say that the unique information is the problem. An abstract idea which applies to all of mathematics is way too general to help much with a specific problem, it can just help a tiny bit with a million different problems.

I also doubt the need for things which are so complicated that you need a team of people to make sense of them. I think it's likely a result of bad design. If a beginner programmer made a slot machine game, the code would likely be convoluted and unintuitive, but you could probably design the program in a way that all of it fits in your working memory at once. Something like "A slot machine is a function from the cartesian product of wheels to a set of rewards". An understanding which would simply the problem so that you could write it much shorter and simpler than the beginner. What I mean is that there may exist simple designs for most problems in the world, with complicated designs being due to a lack of understanding.

The real world values the practical way more than the theoretical, and the practical is often quite sloppy and imperfect, and made to fit with other sloppy and imperfect things.

The best things in society are obscure by statistical necessity, and it's painful to see people at the tail ends doubt themselves at the inevitable lack of recognition and reward.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Yes thats worded too strongly and a result of me putting in some key phrases into Claude and not proofreading. :p

I agree with you that most modern math is within-paradigm work.

satron on Sabotage Evaluations for Frontier Models

Looking at the 2 examples that you gave me, I can see a few issues. I wouldn't really say that saying "I don't know" once is necessarily a lie. If anything, I could find such an answer somewhat more honest in some contexts. Other than that, there is also the issue of both of the examples being of a much different scope and scale. Saying IDK to the committee and trying to take down someone's comment on the forum on the internet and definitely not on the same scale as the elaborate scheme of tricking or bribing/silencing multiple government employees who have access to your model. But even with all that aside, these 2 examples are only tangential to the topic of governmental oversight over OpenAI or Anthropic and don't necessarily provide direct evidence.

I can believe that you genuinely have information from private sources, but without any way of me verifying them, I am fine to leave this one at that.

satron on Sabotage Evaluations for Frontier Models

To modify my example to include an accountability mechanism that's also similar to the real life, the King takes exactly the same vaccines as everyone else. So if he messed up with the chemicals, he also dies.

I believe similar accountability mechanism works in our real world case. If CEOs build unsafe AI, they and everyone they valued in this life die. This seems like a really good incentive for them to not build unsafe AI.

At the end of the day, voluntary commitment such as debating with the critics are not as strong in my option. Imagine that they agree with you and go to the debate. Without the incentive of "if I mess up, everyone dies", the CEOs could just go right back to doing what they were doing. As far as I know voluntary debates have few (if any) actual legal mechanisms to hold CEOs accountable.

q-home on Q Home's Shortform

There's an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; Eliciting Latent Knowledge [? · GW].

I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.

So... how do humans do it?

Humans create very small models for predicting very small/basic aspects of sensory input (mini-models).
Humans use mini-models as puzzle pieces for building models for predicting ALL of sensory input.
As a result, humans get models in which it's easy to identify "real objects" corresponding to sensory input.

For example, imagine you're just looking at ducks swimming in a lake. You notice that ducks don't suddenly disappear from your vision (permanence), their movement is continuous (continuity) and they seem to move in a 3D space (3D space). All those patterns ("permanence", "continuity" and "3D space") are useful for predicting aspects of immediate sensory input. But all those patterns are also useful for developing deeper theories of reality, such as atomic theory of matter. Because you can imagine that atoms are small things which continuously move in 3D space, similar to ducks. (This image stops working as well when you get to Quantum Mechanics, but then aspects of QM feel less "real" and less relevant for defining object.) As a result, it's easy to see how the deeper model relates to surface-level patterns.

In other words: reality contains "real objects" to the extent to which deep models of reality are similar to (models of) basic patterns in our sensory input.

jonas-hallgren on OpenAI Email Archives (from Musk v. Altman)

Do you have any thoughts on what this actionably means? For me it seems a bit like being able to influence such coversations is potentially a bit intractable but maybe one could host forums and events for this if one has the right network?

I think it's a good point and I'm wondering about how it actionably looks, I can see it for someone with the right contacts and so the message for people who don't have that is to create it or what are your thoughts there?