LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Gratitudes: Rational Thanks Giving
Seth Herd · 2024-11-29T03:09:47.410Z · comments (2)

[question] Why are there no interesting (1D, 2-state) quantum cellular automata?
Optimization Process · 2024-11-26T00:11:37.833Z · answers+comments (13)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

Acknowledging Background Information with P(Q|I)
JenniferRM · 2024-12-24T18:50:25.323Z · comments (8)

Disagreement on AGI Suggests It’s Near
tangerine · 2025-01-07T20:42:43.456Z · comments (15)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)

NYC Congestion Pricing: Early Days
Zvi · 2025-01-14T14:00:07.445Z · comments (0)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (8)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L (LRudL) · 2024-10-28T21:02:51.215Z · comments (0)

Aligning AI Safety Projects with a Republican Administration
Deric Cheng (deric-cheng) · 2024-11-21T22:12:27.502Z · comments (1)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

[link] Five Recent AI Tutoring Studies
Arjun Panickssery (arjun-panickssery) · 2025-01-19T03:53:47.714Z · comments (0)

[question] Which things were you surprised to learn are metaphors?
Gordon Seidoh Worley (gworley) · 2024-11-22T03:46:02.845Z · answers+comments (18)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Is AI Alignment Enough?
Aram Panasenco (panasenco) · 2025-01-10T18:57:48.409Z · comments (6)

Mini Go: Gateway Game
jefftk (jkaufman) · 2025-01-14T03:30:02.020Z · comments (1)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

First Solo Bus Ride
jefftk (jkaufman) · 2024-12-03T12:20:02.344Z · comments (1)

Two flavors of computational functionalism
EuanMcLean (euanmclean) · 2024-11-25T10:47:04.584Z · comments (9)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

vladimir_nesov on quetzal_rainbow's Shortform

There is a difference in external behavior only if you need to communicate knowledge about the environment and the other players explicitly. If this knowledge is already part of an agent (or rock), there is no behavior of learning it, and so no explicit dependence on its observation. Yet still there is a difference in how one should interact with such decision-making algorithms.

I think this describes minds/models better (there are things they've learned long ago in obscure ways and now just know) than learning that establishes explicit dependence of actions on observed knowledge in behavior (which is more like in-context learning).

ouguoc on meemi's Shortform

This is extremely informative, especially the bit about the holdout set. I think it'd reassure a lot of people about the FrontierMath's validity to know more here. Have you used it to assess GPT-4o or o1? If so, how, and what were the results?

bodry on Numberwang: LLMs Doing Autonomous Research, and a Call for Input

On the spectrum from stochastic parrot to general reasoner I'm at 70%. We're definitely closer to a general reasoner than a parrot.

I don't have a clear answer as to what I expect the outcome to be. I was a physics major and I wish there were less discrete jumps in physics. Special relativity and general relativity seem like giant jumps in terms of their difficulty to derive and there aren't any intermediate theories. When this paper comes out we'll probably be saying something the AI is 50% between being being able to derive this law and a conceptually harder one.

With respect to something analogous to Newtonian Mechanics I think that it does heavily depend on what kind of information can be observed. If a model can directly observe the equivalents of forces and acceleration than I believe most current models could derive it. If a model can only observe the equivalent of distances between objects and the equivalents of corresponding times and has to derive a second-order relationship from that I suspect that only o3 could do that. In six months, I believe that all frontier models will be able to do that.

Given that Terrence Tao described o1 as a mediocre graduate student it probably won't be long until frontier models are actually contributing to research and that will be the most valuable feedback. I say all this with a lot of uncertainty and if I'm wrong this project will prove that. Likewise there's going to be a long period of time where some people will insist that AI can do legitimate automated R&D and others who insist that it can't. At that point this will be a useful test to argue one way or another.

It would be interesting to vary the amount of information an AI is given until can derive the whole set of equations. For example, see if it can solve for the Maxwell equation given the other 3 and the ability to perform experiments or can it solve for the dynamic version of the equations given only the static ones and the ability to perform experiments.

matthew-barnett on meemi's Shortform

I can't make any confident claims or promises in my personal capacity right now, but my best guess is that we will make sure this new benchmark stays entirely private and under Epoch's control, to the extent this is feasible for us. However, I want to emphasize that by saying this, I'm not making a public commitment on behalf of Epoch.

daniel-kokotajlo on meemi's Shortform

Well, I'd sure like to know whether you are planning to give the dataset to OpenAI or any other frontier companies! It might influence my opinion of whether this work is net positive or net negative.

daniel-kokotajlo on MIRI 2024 Communications Strategy

Here's how I'd deal with those examples:

Theory X: Jesus will come again: Presumably this theory assigns some probability mass >0 to observing Jesus tomorrow, whereas theory Y assigns ~0. If jesus is not observed tomorrow, that's a small amount of evidence for theory Y and a small amount of evidence against theory X. So you can say that theory X has been partially falsified. Repeat this enough times, and then you can say theory X has been fully falsified, or close enough. (Your credence in theory X will never drop to 0 probably, but that's fine, that's also true of all sorts of physical theories in good standing e.g. all the major theories of cosmology and cognitive science, which allow for tiny probabilities of arbitrary sequences of experiences happening in e.g. Boltzmann Brains)

With the sky color example:
My way of thinking about falsifiability is, we say two theories are falsifiable relative to each other if there is evidence we expect to encounter that will distinguish them / cause us to shift our relative credence in them.

In the case of Theory Z, there is an implicit theory Z2 which is "NOT blah blah, and therefore the sky could be green or not green." (Presumably that's what you are holding in the back of your mind as the alternative to Z, when you imagine updating for or against Z on the basis of seeing blue sky, and decide that you wouldn't?) Because the theory Z3 "NOT blah blah and therefore the sky is blue" would be confirmed by seeing a blue sky, and if somehow you were splitting your credence between Z and Z3, then you would decrease your credence in Z if you saw a blue sky.

lesswronguser123 on Don’t ignore bad vibes you get from people

Great post, I was inner simulating [LW · GW] you posting something about bad vibes from people given your multiagent model of the mind and baggage which comes with it, and here it is.

quetzal_rainbow on quetzal_rainbow's Shortform

More technical definition of "fairness" here is that environment doesn't distinguish between algorithms with same policies, i.e. mappings <prior, observation_history> -> action? I think it captures difference between CooperateBot and FairBot.

As I understand, "fairness" was invented as responce to statement that it's rational to two-box and Omega just rewards irrationality.

jacob-g-w on Building intuition with spaced repetition systems

So far, I have trouble because it lacks some form of spacial structure, and the algorithm feels too random to build meaningful connections btw different cards

Hmm, I think that after doing a lot of anki, my brain kind of formed it's own spatial structure [LW · GW], but I don't think this happens to everyone.

I just use basic card type for math with some latex. Here are some examples:

I find that doing fancy card types is kind of like premature optimization. Doing the reviews is the most important part. On the other hand, it's really important that the cards themselves are written well. This essay contained my most refined views on card creation. Some other nice ones are the 20 rules of knowledge formulation and How to write good prompts: using spaced repetition to create understanding. Hope this answer helped!

vladimir_nesov on quetzal_rainbow's Shortform

What distinguishes a cooperate-rock from an agent that cooperates in coordination with others is the decision-making algorithm. Facts about this algorithm also govern the way outcome can be known in advance or explained in hindsight, how for a cooperate-rock it's always "cooperate", while for a coordinated agent it depends on how others reason, on their decision-making algorithms.

So in the same way that Newcomblike problems are the norm [LW · GW], so is the "unfair" interaction with decision-making algorithms. I think it's just a very technical assumption that doesn't make sense conceptually and shouldn't be framed as "unfairness".