LessWrong 2.0 Reader

Efficiency and resource use scaling parity
Ege Erdil (ege-erdil) · 2023-08-21T00:18:01.243Z · comments (0)

Barbieheimer: Across the Dead Reckoning
Zvi · 2023-08-01T13:00:05.700Z · comments (17)

[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)

They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)

Job listing: Communications Generalist / Project Manager
Gretta Duleba (gretta-duleba) · 2023-11-06T20:21:03.721Z · comments (7)

AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)

AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)

AI #26: Fine Tuning Time
Zvi · 2023-08-24T15:30:06.626Z · comments (6)

Basic Mathematics of Predictive Coding
Adam Shai (adam-shai) · 2023-09-29T14:38:28.517Z · comments (6)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

AI #24: Week of the Podcast
Zvi · 2023-08-10T15:00:04.438Z · comments (5)

So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)

Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)

Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

Public Weights?
jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · comments (19)

[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter · 2023-11-08T11:37:43.997Z · comments (0)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Wrong answer bias
lukehmiles (lcmgcd) · 2024-02-01T20:05:38.573Z · comments (24)

Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

AISC 2024 - Project Summaries
NickyP (Nicky) · 2023-11-27T22:32:23.555Z · comments (3)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

Making Bad Decisions On Purpose
Screwtape · 2023-11-09T03:36:59.611Z · comments (8)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

“Why can’t you just turn it off?”
Roko · 2023-11-19T14:46:18.427Z · comments (25)

Experiments as a Third Alternative
Adam Zerner (adamzerner) · 2023-10-29T00:39:31.399Z · comments (21)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)

On the lethality of biased human reward ratings
Eli Tyre (elityre) · 2023-11-17T18:59:02.303Z · comments (10)

Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

The Handbook of Rationality (2021, MIT press) is now open access
romeostevensit · 2023-10-10T00:30:05.589Z · comments (4)

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

What is the next level of rationality?
lsusr · 2023-12-12T08:14:14.846Z · comments (24)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)

Competitive, Cooperative, and Cohabitive
Screwtape · 2023-09-28T23:25:52.723Z · comments (12)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

[link] Spaced repetition for teaching two-year olds how to read (Interview)
Chipmonk · 2023-11-26T16:52:58.412Z · comments (9)

[link] Every Mention of EA in "Going Infinite"
KirstenH · 2023-10-07T14:42:32.217Z · comments (0)

The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)

[link] Urging an International AI Treaty: An Open Letter
Olli Järviniemi (jarviniemi) · 2023-10-31T11:26:25.864Z · comments (2)

Recent comments

seth-herd on Chris_Leong's Shortform

I have no interest in honor if it's celebrated on a field of the dead. Virtue ethics is fine, as long as it's not an excuse to not figure out what needs doing and how it's going to get done.

Doing one's own part and trusting that the other parts are done by anonymous unknown others is a very silly coordination strategy. We need plans that amount to success, not just everyone doing whatever sounds nice to them.

christiankl on Why I’m not a Bayesian

the thing a logician would call a "logic" or possibly a "logic augmented with some probabilities"

The main point of the article is that once you add probabilities you can't do predicate calculus anymore. It's a mathematical operation that's not defined for the entities that you get when you do your augmentation.

seth-herd on When is reward ever the optimization target?

Interesting. There's certainly a lot going on in there, and some of it very likely is at least vague models of future word occurrences (and corresponding events). The definition of model-based gets pretty murky outside of classic RL, so it's probably best to just directly discuss what model properties give rise to what behavior, e.g. optimizing for reward.

Model-free systems can produce goal-directed behavior. They do this if they have seen some relevant behavior that achieves a given goal, and their input or some internal representation includes the current goal, and they can generalize well enough to apply what they've experienced to the current context. (This is by the neuroscience definition of habitual vs goal-directed: behavior changes to follow the current goal, usually hungry, thirsty or not).

So if they're strong enough generalizers, I think even a model-free system actually optimizes for reward.

I think the claim should be stronger: for a smart enough RL system, reward is the optimization target.

sherrinford on Open Thread Fall 2024

So you think that looking up "random idiots" helps me find "arguments related to or a more detailed discussion about this disagreement"?

ablue on Chris_Leong's Shortform

You want to help? Figure out what kind of incremental changes you can begin to introduce in any of them, in order to begin extinguishing the sort of problems you've now elevated to the rank of "saving-worthy" in your own head. Note that, in all likelihood, by extinguishing one you will merrily introduce a whole bunch of others - something you won't get to discover until much later on. Yet that is, realistically, what you can actually go on to accomplish.

I read this paragraph as saying ~the same thing as the original post in a different tone

jenniferrm on Monthly Roundup #23: October 2024

It is true that there are some favorable properties that many systems other than the best system have compared to FPTP.

I like methods that are cloneproof and which can't be spoofed by irrelevant alternatives, and if there is ONLY a choice between "something mediocre" and "something mediocre with one less negative feature" then I guess I'll be in favor of hill climbing since "some mysterious force" somehow prevents "us" from doing the best thing.

However, I think cloning and independence are "nice to haves" whereas the Condorcet criterion is probably a "need to have".

((The biggest design fear I have is actually the "participation criterion". One of the very very few virtues of FPTP is that it at least satisfies the criterion where someone showing up and "wasting their vote on a third party" doesn't cause their least preferred candidate to jump ahead of a more preferred candidate. But something similar can happen in every method I know of that reliably selects the Condorcet Winner when one exists :-(

Mathematically, I've begun to worry that maybe I should try to prove that Condorcet and Participation simply cannot both be satisfied at the same time?

Pragmatically, I'm not sure what it looks like to "attack people's will to vote" (or troll sad people into voting in ways that harm their interests and have the sad people fight back righteously by insisting that they shouldn't vote, because voting really will net harm their interests).

One can hope that people will simply "want to vote" because it makes civic sense, but it actually looks like a huge number of humans are biased to feel like a peasant, and to have a desire to be ruled? Or something? And maybe you can just make it "against the law to not vote" (like in Australia) but maybe that won't solve the problems that could hypothetically "sociologically arise" from losing the participation criterion in ways that might be hard to foresee.))

In general, I think people should advocate for the BEST thing. The BEST thing I currently know of for picking an elected civilian commander in chief is "Ranked Pairs tabulation over Preference Ballots (with a law that requires everyone to vote during the two day Voting Holiday)".
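For readers unfamiliar with the Condorcet criterion discussed in this comment, here is a minimal Python sketch of what "selects the Condorcet Winner when one exists" is checking for. It is not Ranked Pairs tabulation and not anything the commenter wrote; the function name `condorcet_winner` and the ballot format (each ballot a full ranking, most preferred first) are assumptions made purely for illustration.

```python
from itertools import combinations


def condorcet_winner(ballots):
    """Return the candidate who beats every other candidate head-to-head, or None.

    Each ballot is a list of candidates ordered from most to least preferred.
    Assumption for this sketch: every ballot ranks every candidate.
    """
    candidates = sorted(set(ballots[0]))
    # pairwise_wins[c] counts the head-to-head contests that c wins outright.
    pairwise_wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            pairwise_wins[a] += 1
        elif b_over_a > a_over_b:
            pairwise_wins[b] += 1
    for c in candidates:
        # A Condorcet winner wins against all of the other candidates.
        if pairwise_wins[c] == len(candidates) - 1:
            return c
    return None  # No Condorcet winner, e.g. a rock-paper-scissors cycle.


# Three voters prefer A, two prefer B; A wins every pairwise contest.
majority = [["A", "B", "C"]] * 3 + [["B", "A", "C"]] * 2
# A beats B, B beats C, and C beats A, so no candidate beats all others.
cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
```

Calling `condorcet_winner(majority)` returns `"A"`, while `condorcet_winner(cycle)` returns `None`. Methods such as Ranked Pairs differ mainly in how they resolve the cyclic case; this sketch only detects whether a Condorcet winner exists.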

thane-ruthenis on LLMs can learn about themselves by introspection

What definition of introspection do you have in mind and how would you test for this?

"Prompts involving longer responses" seems like a good start. Basically, if the model could "reflect on itself" in some sense, this presumably implies the ability to access some sort of hierarchical self-model, i.e., make high-level predictions about its behavior, without actually engaging in that behavior. For example, if it has a "personality trait" of "dislikes violent movies", then its review of a slasher flick would presumably be negative – and it should be able to predict the sentiment of this review as negative in advance, without actually writing this review or running a detailed simulation of itself-writing-its-review.

The ability to engage in "self-simulation" already implies the above ability: if it has a model of itself detailed enough to instantiate it in its forward passes and then fetch its outputs, it'd presumably be even easier for it to just reason over that model without running a detailed simulation. (The same way, if you're asked to predict whether you'd like a movie from a genre you hate, you don't need to run an immersive mental simulation of watching the movie – you can just map the known self-fact "I dislike this genre" to "I would dislike this movie".)

richard_kennaway on Open Thread Fall 2024

Look up anti-natalism, and the Voluntary Human Extinction Movement. And random idiots everywhere saying "well maybe we all deserve to die", "the earth would be better off without us", "evolution made a huge mistake in inventing consciousness", etc.

seth-herd on the case for CoT unfaithfulness is overstated

My guess was that the primary reason OAI doesn't show the scratchpad/CoT is to prevent competitors from training on those CoTs and replicating much of o1's abilities without spending time and compute on the RL process itself.

But now that you mention it, there's also their not wanting to show the whole CoT when it's not necessarily nice or aligned in itself. I guess it's like you wouldn't want someone reading your thoughts even if you intended to be mostly helpful to them.

raemon on OODA your OODA Loop

I like having more historical context here.

Part of this sounds like either I didn't successfully communicate something about my goals, or I haven't successfully understood something you meant to say.

I'm not super well versed in the original military use of OODA, but I definitely didn't mean OODA was a simple cycle. I hadn't explicitly dwelled on how each part feeds into the other parts but I do think I was doing that implicitly.

I'm also not particularly using it in a "competitive" domain so those aspects of it don't seem particularly relevant.