LessWrong 2.0 Reader

Crafting Polysemantic Transformer Benchmarks with Known Circuits
Evan Anders (evan-anders) · 2024-08-23T22:03:15.288Z · comments (0)

Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)

how to rapidly assimilate new information
dhruvmethi · 2024-10-24T02:18:00.648Z · comments (3)

Derivative AT a discontinuity
Alok Singh (OldManNick) · 2024-10-24T02:48:24.573Z · comments (3)

Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (3)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

[question] What are some of the proposals for solving the control problem?
Dakara (chess-ice) · 2024-08-14T23:04:44.863Z · answers+comments (0)

Simultaneous Footbass and Footdrums II
jefftk (jkaufman) · 2024-08-11T23:50:01.982Z · comments (0)

[link] Apply to Aether - Independent LLM Agent Safety Research Group
RohanS · 2024-08-21T09:47:11.493Z · comments (0)

A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)

[question] What do you expect AI capabilities may look like in 2028?
nonzerosum · 2024-08-23T16:59:53.007Z · answers+comments (5)

[question] Are UV-C Air purifiers so useful?
JohnBuridan · 2024-09-04T14:16:01.310Z · answers+comments (0)

Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)

[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (9)

Thinking About a Pedalboard
jefftk (jkaufman) · 2024-10-08T11:50:02.054Z · comments (2)

[question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal
doomyeser · 2024-09-11T18:07:19.385Z · answers+comments (9)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

[link] Molecular dynamics data will be essential for the next generation of ML protein models
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-26T14:50:23.790Z · comments (0)

Open letter to young EAs
Leif Wenar · 2024-10-11T19:49:10.818Z · comments (10)

[link] Why do firms choose to be inefficient?
Nicholas D. (nicholas-d) · 2024-08-28T18:39:41.664Z · comments (4)

Will AI and Humanity Go to War?
Simon Goldstein (simon-goldstein) · 2024-10-01T06:35:22.374Z · comments (4)

Rationalist Gnosticism
tailcalled · 2024-10-10T09:06:34.149Z · comments (10)

Electric Mandola
jefftk (jkaufman) · 2024-09-21T13:40:04.772Z · comments (0)

The Other Existential Crisis
James Stephen Brown (james-brown) · 2024-09-21T01:16:38.011Z · comments (24)

[link] Testing Genetic Engineering Detection with Spike-Ins
jefftk (jkaufman) · 2024-10-22T17:20:54.947Z · comments (0)

AGI's Opposing Force
SimonBaars (simonbaars) · 2024-08-16T04:18:06.900Z · comments (2)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

Thinking About Propensity Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:23:55.091Z · comments (0)

[link] Michael Streamlines on Buddhism
Chris_Leong · 2024-08-09T04:44:52.126Z · comments (0)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (24)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

The Geometric Importance of Side Payments
StrivingForLegibility · 2024-08-07T01:38:04.635Z · comments (4)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

[question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double · 2024-09-05T00:35:39.504Z · answers+comments (9)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (7)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

Recent comments

tsvibt on Scissors Statements for President?

IDK, but I'll note that IME, calling for empathy for "the other side" (in either direction) is received with incuriosity / indifference at best, often hostility.

One thing that stuck with me is one of those true crime Youtube videos, where at some stage of the interrogation, the investigator stops being nice, and instead will immediately and harshly contradict anything that the suspect Bob is saying to paint a story where he's innocent. The commentator claimed that the reason the investigator does this is to avoid giving Bob confidence: if Bob's statements hung in the air unchallenged, Bob might think he's successfully creating a narrative and getting that narrative bought. Even if the investigator is not in danger of being fooled (e.g. because she already has video evidence contradicting some of Bob's statements), Bob might get more confident and spend more time lying instead of just confessing.

A conjecture is that for Susan, empathizing with Robert seems like giving room for him to gain more political steam; and the deeper the empathy, the more room you're giving Robert.

annasalamon on Scissors Statements for President?

If we can get good enough models of however the scissors-statements actually work, we might be able to help more people be more in touch with the common humanity of both halves of the country, and more able to heal blind spots.

E.g., if the above model is right, maybe we could tell at least some people "try exploring the hypothesis that Y-voters are not so much in favor of Y, as against X -- and that you're right about the problems with Y, but they might be able to see something that you and almost everyone you talk to is systematically blinded to about X."

We can build a useful genre-savviness about common/destructive meme patterns and how to counter them, maybe. LessWrong is sort of well-positioned to be a leader there: we have analytic strength, and aren't too politically mindkilled.

moisentinel on Going Beyond "immaturity"

Do you have any tips on improving my writing, or my worldview?

nunosempere on Why I’m not a Bayesian

Maybe you could address these problems, but could you do so in a way that is "computationally cheap"? E.g., for forecasting on something like extinction, it is much easier to forecast on a vague outcome than to precisely define it.

martinkunev on Correspondence visualizations for different interpretations of "probability"

frequentist correspondence is the only type that has any hope of being truly objective

I'd counter this.

If I have enough information about an event and enough computational power, I get only objectively true and false statements. There are limits to my knowledge of the laws of the universe and of the event in question (e.g. due to measurement limits), and limits to my computational power. The situation is further complicated by being embedded in the universe and by epistemic concerns (e.g. do I trust my eyes and cognition?).

The need for a concept "probability" comes from all these limits. There is nothing objective about it.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

That's why I said: "In expectation", "win or lose"

That the coinflip came out one way rather than another doesn't prove the guy had actual inside knowledge. He bought a large part of the shares at crazy odds because his market impact moved the price so much.

But yes, he could be a sharp in sheep's clothing. I doubt it but who knows.

Point is that the winners contribute epistemics and the losers contribute money. The real winner is society [if the questions are about socially-relevant topics].

nunosempere on Survival without dignity

I have a writeup on solar storm risk here that could be of interest.

charlie-steiner on How to put California and Texas on the campaign trail!

So a proportional vote interstate compact? :)

I like it - I think one could specify an automatic method for striking a fair bargain between states (and only include states that use that method in the bargain). Then you could have states join the compact asynchronously.

E.g. if the goal is to have the pre-campaign expected electors be the same, and Texas went 18/40 Biden in 2020 while California went 20/54 Trump in 2020, maybe in 2024 Texas assigns all its electors proportionally, while California assigns 49 electors proportionally and the remaining 5 by majority. That would cause the numbers to work out the same (plus or minus a rounding error).
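The arithmetic behind the 49-elector figure can be checked with a short sketch. The function name and argument names below are made up for illustration; the only inputs are the numbers quoted above (18 expected minority electors from Texas, and California at 20 of 54).

```python
from fractions import Fraction

def electors_to_assign(target_minority, state_electors, state_minority):
    """How many electors a state should assign proportionally so that its
    expected minority electors come out near target_minority.
    (Hypothetical helper, not part of any real compact.)"""
    minority_share = Fraction(state_minority, state_electors)
    return round(target_minority / minority_share)

california = electors_to_assign(18, state_electors=54, state_minority=20)
print(california)                             # 49
print(float(california * Fraction(20, 54)))   # about 18.15 expected minority electors
```

The remaining 54 - 49 = 5 electors are the ones assigned by majority.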

Suppose Connecticut also wants to join the compact, but it's also a blue state. I think the obvious thing to do is to distribute the expected minority electors proportional to total elector count - if Connecticut has 7 electors, it's responsible for balancing 7/61 of the 18 minority electors that are being traded, or just about exactly 2 of them.

But the rounding is sometimes awkward - if we lived in a universe where Connecticut had 9 electors instead, it would be responsible for just about exactly 2.5 minority electors, which is super awkward especially if a lot of small states join and start accumulating rounding errors.
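A minimal sketch of the share calculation in the Connecticut examples above, assuming the share is simply the state's electors divided by the combined elector count of the majority-side states (54 for California plus Connecticut's own). The function name is hypothetical.

```python
def minority_electors_owed(state_electors, majority_side_total, traded_minority):
    """Share of the traded minority electors a joining state must balance,
    proportional to its elector count. (Illustrative helper only.)"""
    return traded_minority * state_electors / majority_side_total

print(round(minority_electors_owed(7, 54 + 7, 18), 2))  # 2.07
print(round(minority_electors_owed(9, 54 + 9, 18), 2))  # 2.57
```

The second figure lands between whole numbers, which is the rounding problem described above.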

What you could do instead is specify a loss function: you take the variance of the proportion of electors assigned proportionally among the states that are on the 'majority' side of the deal, multiply that by a constant (probably something small like 0.05, but obviously you do some simulations and pick something more informed), add the squared rounding error of expected minority electors, and that's your measure for how imperfect the assignment of proportional electors to states is. Then you just pick the assignment that's least imperfect.
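One way to read that loss function is sketched below. This is a guess at the details, not a worked-out proposal: each majority-side state is a pair (total electors, expected minority fraction), an assignment says how many electors each state hands out proportionally, and the search is plain brute force. The names `assignment_loss` and `best_assignment` are invented here, and 0.05 is just the placeholder constant mentioned above.

```python
from itertools import product
from statistics import pvariance

def assignment_loss(assignment, states, target, weight=0.05):
    """states: list of (total_electors, expected_minority_fraction).
    assignment: electors each state assigns proportionally."""
    # Proportion of each state's electors that is assigned proportionally.
    proportions = [n / total for n, (total, _) in zip(assignment, states)]
    # Expected minority electors produced by this assignment.
    expected_minority = sum(n * frac for n, (_, frac) in zip(assignment, states))
    rounding_error = expected_minority - target
    return weight * pvariance(proportions) + rounding_error ** 2

def best_assignment(states, target, weight=0.05):
    """Try every assignment and keep the least imperfect one."""
    candidates = product(*(range(total + 1) for total, _ in states))
    return min(candidates, key=lambda a: assignment_loss(a, states, target, weight))

# California alone (54 electors, 20/54 expected minority share) balancing 18.
print(best_assignment([(54, 20 / 54)], target=18))  # (49,)
```

With one state the variance term is zero and the search just recovers the 49 from earlier. The number of candidate assignments grows multiplicatively with each state that joins, so a real version would need something smarter than exhaustive search.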

Add in some automated escape hatches in case of change of major parties, change of voting system, or being superseded by a more ambitious interstate compact, and bada bing.

q-home on Stable Pointers to Value II: Environmental Goals

I don't understand the Model-Utility Learning (MUL) section: what pathological behavior does the AI exhibit?

Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.

So it's like overfitting? If I train MUL AI to play piano in a green room, MUL AI learns that "playing piano" means "playing piano in a green room" or "playing piano in a room which would be chosen for training me in the past"?

Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.

But "sensory data being a certain way" is a physical event which happens in reality, so MUL AI might still learn to be a solipsist? MUL isn't guaranteed to solve misgeneralization in any way?

If the answer to my questions is "yes", what did we even hope for with MUL?

benito on Are Your Enemies Innately Evil?

Do you think this is typical of people you know?