LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

My AI Predictions 2023 - 2026
HunterJay · 2023-10-16T00:50:52.968Z · comments (28)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (17)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

OpenAI’s new Preparedness team is hiring
leopold · 2023-10-26T20:42:35.966Z · comments (2)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

kvmanthinking on The Treacherous Path to Rationality

And I’m not even mentioning the strange sexual dynamics

Is this a joke? I'm confused.

milan-w on Thoughts after the Wolfram and Yudkowsky discussion

I asked GPT4o to perform a web search for podcast appearances by Yudkowsky. It dug up these two lists (apparently, autogenerated from scrapped data). When I asked it to base use these lists as a starting point to look for high quality debates and after some further elicitation and wrangling, the best we could find was this moderated panel discussion featuring Yudkowsky, Liv Boeree, and Joscha Bach. There's also the Yudkowsky v/s George Hotz debate on Lex Fridman, and the time Yudkowsky explained AI risk to the streamer and political commentaror known as Destiny. I have watched none of the three debates I just mentioned; but I know that Hotz is a heavily vibes-based (rather than object-level-based) thinker, and that Destiny has no background in AI risk, but has good epistemics. I think he probably offered reasonable-at-first-approximation-yet-mostly-uninformed pushback.

cole-wyeth on Heresies in the Shadow of the Sequences

Perhaps Legg-Hutter intelligence.
I'm not sure how much the goal matters - probably the details depend on the utility function you want to optimize. I think you can do about as well as possible by carving out a utility function module and designing the rest uniformly to pursue the objectives of that module. But perhaps this comes at a fairly significant cost (i.e. you'd need a somewhat larger computer to get the same performance if you insist on doing it this way).
...And yes, there does exist a computer program which is remarkably good at just chess and nothing else, but that's not the kind of thing I'm talking about here.
Yes, the I/O channels should be fixed along with the hardware.

olli-jaerviniemi on The Parable of the Dagger

dagon on nikola's Shortform

Hmm. I think there are two dimensions to the advice (what is a reasonable distribution of timelines to have, vs what should I actually do). It's perfectly fine to have some humility about one while still giving opinions on the other. "If you believe Y, then it's reasonable to do X" can be a useful piece of advice. I'd normally mention that I don't believe Y, but for a lot of conversations, we've already had that conversation, and it's not helpful to repeat it.

tomcatfish on Drawing Less Wrong: Should You Learn to Draw?

People seeing this in the future: Check out Draw a Box for some low-level mechanical stuff.

milan-w on Heresies in the Shadow of the Sequences

Though there are elegant and still practical specifications for intelligent behavior, the most intelligent agent that runs on some fixed hardware has completely unintelligible cognitive structures and in fact its source code is indistinguishable from white noise.

What does "most intelligent agent" mean?
Don't you think we'd also need to specify "for a fixed (basket of) tasks"?
Are the I/O channels fixed along with the hardware?

yams on yams's Shortform

I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don't 'think differently', in a structural sense, from their forebears; they just don't believe in that God anymore.

The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.

They're no less tribal or dogmatic, or more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can't really think about those levels).

You should still be nice to them, and honest with them, but you should understand what you're getting into.

The mere biographical detail of having a religious background or being religious isn't a strong mark against someone's thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people that firmly belong in this category, and would not categorically discourage engagement. Just don't be so surprised when the usual jutsu falls flat!

mako-yass on nikola's Shortform

Timelines are a result of a person's intuitions about a technical milestone being reached in the future, it is super obviously impossible for us to have a consensus about that kind of thing.

Talking only synchronises beliefs if you have enough time to share all of the relevant information, with technical matters, you usually don't.

alok-singh on Derivative AT a discontinuity

added some open circles