LessWrong 2.0 Reader

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (5)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (8)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

Recent comments

will_pearson on Will_Pearson's Shortform

Unearthing my old dissertation. Still think there is something to it

https://docs.google.com/document/d/1-lmOXSfUXYvbhlFcs04VAzl-mKB8ZJfR/edit?usp=drivesdk&ouid=113969196762487274190&rtpof=true&sd=true

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

Epistemic status: at least somewhat rant-mode.

I find it pretty ironic that many in AI risk mitigation would make asks for if-then commitments/RSPs from the top AI capabilities labs, but they won't make the same asks for AI safety orgs/funders. E.g.: if you're an AI safety funder, what kind of evidence ('if') will make you accelerate how much funding you deploy per year ('then')?

eternallyblissful on Self-administered EMDR without a therapist is very useful for a lot of things!

Thank you so much for taking the time to write this comment! It feels really good to read this! :))
You might want to check out my other mental health posts as well!

ignatzmouse on Thoughts after the Wolfram and Yudkowsky discussion

I agree with the frustration. Wolfram was being deliberately obtuse. Eliezer summarised it well toward the end, something like "I am telling you that the forest is on fire and you are telling me that we first need to define what we mean by fire". I understand that we need definitions for things like "agency" or technology "wanting" something, and even for what we mean by a "human" in the year 2070. But Wolfram went a bit too far. A naive genius who did not want to play along in the conversation. Smart teenagers talk like that.

Another issue with this conversation was that, even though they were listening to each other, Wolfram was too keen to go back to his current pet ideas. Eliezer's argument is (not sure) independent of whether we think the AIs will fall under computational "irreducibility", but Wolfram kept going back to this over and over.

I blame the ineffective exchange primarily on Wolfram in this case. Eliezer is also somewhat responsible for the useless rabbitholes in this conversation. He explains his ideas vividly and clearly. But there is something about his rhetorical style that does not persuade those who have not spent time engaging with his ideas beforehand, even someone as impressive as Wolfram. He also goes on too long about some detail or some contrived example rather than ensuring that the interlocutor is on the same epistemological plane.

Anyway, fun thing to listen to

leon-lang on Leon Lang's Shortform

"Scaling breaks down", they say. By which they mean one of the following wildly different claims with wildly different implications:

1. When you train on a normal dataset, with more compute/data/parameters, subtract the irreducible entropy from the loss, and then plot in a log-log plot: you don't see a straight line anymore.
2. Same setting as before, but you see a straight line; it's just that downstream performance doesn't improve.
3. Same setting as before, and downstream performance improves, but: it improves so slowly that the economics is not in favor of further scaling this type of setup instead of doing something else.
4. A combination of one of the last three items and "btw., we used synthetic data and/or other more high-quality data, still didn't help".
5. Nothing in the realm of "pretrained models" and "reasoning models like o1" and "agentic models like Claude with computer use" profits from a scale-up in a reasonable sense.
6. Nothing which can be scaled up in the next 2-3 years, when training clusters are mostly locked in, will demonstrate a big enough success to motivate the next scale of clusters costing around $100 billion.

Be precise. See also.
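
To make the first claim in the list concrete, here is a minimal sketch of the check it describes: subtract an assumed irreducible loss, take logs of both axes, and see how far the points deviate from a straight line. The constants `E`, `a`, `b` and the compute values are hypothetical placeholders chosen for illustration, not measurements from any real training run.

```python
import math

def loglog_fit(compute, loss, irreducible):
    """Least-squares line through (log compute, log(loss - irreducible)).

    Returns (slope, intercept, max_abs_deviation_from_line).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l - irreducible) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    max_dev = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    return slope, intercept, max_dev

# Hypothetical power law: loss = E + a * compute**(-b)
E, a, b = 1.7, 10.0, 0.05
compute = [10.0 ** k for k in range(18, 25)]
loss = [E + a * c ** (-b) for c in compute]

# Clean power law: the log-log points lie on a line with slope -b.
slope, intercept, max_dev = loglog_fit(compute, loss, E)

# Same data with an extra unmodelled floor of 0.2: the curve bends.
loss_bent = [l + 0.2 for l in loss]
slope_bent, _, max_dev_bent = loglog_fit(compute, loss_bent, E)

print(f"clean: slope={slope:.4f}, max deviation={max_dev:.2e}")
print(f"bent:  slope={slope_bent:.4f}, max deviation={max_dev_bent:.2e}")
```

Note that the result depends on the assumed value of the irreducible loss: a wrong estimate bends the line even when the underlying power law still holds, which is one more reason to state which claim is being made.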

jonas-hallgren on Jonas Hallgren's Shortform

Okay, so I don't have much time to write this so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion as someone who's at least spent 10 deep work hours trying to understand Wolfram's perspective of the world.

With some of the older floating megaminds like Wolfram and Friston who are also physicists you have the problem that they get very caught up in their own ontology.

From the perspective of a physicist morality could be seen as an emergent property of physical laws.

Wolfram likes to think of things in terms of computational reducibility. A way this can be described in the agent foundations frame is that the agent modelling the environment will be able to predict the world dependent on its own speed. It's like some sort of agent-environment relativity where the information processing capacity determines the space of possible ontologies. An example of this is that if we have an intelligence that's a lot closer to operating at the speed of light, the visual field might not be a useful vector of experience to model.

Another way to say it is that there's only modelling and modelled. An intuition from this frame is that there's only differently good models of understanding specific things and so the concept of general intelligence becomes weird here.

IMO this is the problem with the first 2 hours of the conversation: to some extent Wolfram doesn't engage with the human perspective much, nor with any ought questions. He has a very physics floating megamind perspective.

Now, I personally believe there's something interesting to be said about an alternative hypothesis to the individual superintelligence that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism is then it should outcompete the others in this system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete individual AGI as the outer alignment functions would enforce the inner functions enough.

viliam on The Humanitarian Economy

Sometimes it feels like the society is a big computer program, and it doesn't matter if you have the general idea right, as long as there is a syntax error in line 1013, the program is not going to work. (Running a company seems to be the same thing, on a much smaller scale.) Some errors can be fixed by adding a missing semicolon. Sometimes merely fixing an error in one place introduces a related error in a different place, so many places need to be changed in sync.

On top of that, it is a living system. People try to find new exploits all the time. Plus there is a cultural momentum, so that things that work okay in one country will completely fail in a different one; or the things that worked okay a few decades ago no longer work now. The simple model is that people follow the incentives, but in addition to the formal incentives, you have informal ones, such as the opinion of your neighbors. (Sometimes the fear of being rejected by your neighbors is stronger than the fear of legal consequences. And depending on your neighbors, sometimes they push you towards obeying the law, and sometimes they push you towards breaking it.) Now consider that half of the population has IQ 100 or less, some people are psychopaths or drug addicts, so even in the hypothetically optimal system, you will still get people who hurt themselves or others for no good reason, just because the idea occurred to them at the moment.

Also, unlike the situation with programming, there is no clear distinction between the programmer and the system that is programmed. Your attempts to change the system, even for the better, will be actively rejected by those who profit from the way the things currently are, plus everyone who falls for their propaganda. Also, all idealists who have a different vision. Even if you are a dictator, your situation is actually not much better (from the perspective of social engineering), because now you have to keep your army and foreign allies happy, and prevent the population from rebelling against you, which may dramatically limit your options.

...in summary, sometimes it feels to me like magic that things work at all, considering the number of reasons why they should not. I guess it's because there are also millions of people who try to improve things, mostly locally, and they push back against the forces of entropy. But they are often uncoordinated individuals; and also, as individuals, sometimes they die, or burn out, or start a family and no longer have time for their previous activities; and in such cases, sometimes there is a replacement for them, and sometimes there is not and then the local good things fall apart again.

The reason I am writing this is that I don't want to discourage you, but really the devil is in the details.

One typical problem when trying to design a society is: "who will guard the guards themselves?" Like, if you propose an "army of inspectors" to check the business, the obvious next question is who will check this army of inspectors. If you don't have a good answer, sooner or later the inspectors will naturally start doing things for their own benefit, rather than to make the system work as intended. Two typical ways are taking bribes, and trying to make their own work as easy as possible. Taking bribes may motivate them to lobby for making the regulations as strict as possible; seemingly for the benefit of the customers (it will be easy to get popular support for such a proposal), but in fact to give more opportunities to take bribes. (From their perspective, the perfect outcome is when the regulation is so difficult that it is virtually impossible to comply with, or at least so difficult that it would be impossible to make a profit while complying with it, so everyone needs to pay a bribe to get approved.) Optimizing for less work means that whenever the business owner proposes a small change, the answer is an automatic no; no one has an incentive to actually think about the proposal. To address this, you would need a second army of meta-inspectors who would check the inspectors, but then the problem might reappear at another level.

And this is not just empty speculation; you can see it in many places. (For example, you need police to reduce crime, but now the USA has a problem with criminal policemen protected by the police unions.) I grew up in socialist Czechoslovakia, which in theory was supposed to be a paradise for workers and peasants, governed by wise and benevolent people in the Party. (We typically called it "the Party", because there was only one.) In theory, it was a perfect opportunity to make everything work great. In practice, that didn't happen. Not only was the entire economy mismanaged (the proverbial shortages of toilet paper), but practically all aspects of life were dysfunctional somehow.

The housing situation... well, you applied for a waiting list, waited for a decade or more, and then you were assigned a place to live (you couldn't choose the part of the city; you were happy that you were allowed to stay in the same city because sometimes even that wasn't guaranteed). During that decade or two, you had to stay with your parents, or on your friend's couch; I think there was not an opportunity to rent. (Technically, you could stay in a hotel all the time, but most people didn't have enough money for that.) If you were lucky, there was a job opportunity offering temporary free housing for their employees. So, even if money technically wasn't the problem, the housing still was.

Food... was cheap (heavily subsidized) and available, but only the basic forms. If you walk in a supermarket today, imagine that you would have to choose a subset of maybe 15% of the stuff that is there, and that will be all that is ever available, in the entire country (except for a few super expensive luxury shops). Forget about things like "yogurt with fruit flavor" or "low-fat yogurt". Be happy to buy the yogurt if they have one in the shop; there is only one kind, so it's easy to choose. One kind of bread, two kinds of milk, etc. All restaurants in the country cook the same set of meals, based on the government-approved book of recipes, and the inspectors check that they never deviate from a recipe, even if the customers would really prefer something different. But, yeah, unlike in the Soviet Union, at least nobody was starving.

Before you object to comparison with socialism, my point is that this (as far as I know) didn't happen on purpose. The ruling party might have had its ideological objections against the ways markets work, but they had no reason to prevent the workers from getting housing soon or eating tasty meals. Actually, considering that most workers mostly care about their houses and food and beer, improving the housing and meals would increase the stability of the regime. And yet. The lesson is that things can easily go wrong even with good intentions, if you regulate a bit too much.

myyycroft on Orthogonal's Formal-Goal Alignment theory of change

I endorse alignment proposals which aim to be formally grounded; however, I'd like to know some concrete ideas on how to handle the common hard subproblems.

In the beginning of the post, you say that you want to 1) build a formal goal which leads to good worlds when pursued and 2) design an AI which pursues this goal.

It seems to me that 1) includes some form of value learning (since we speak about good worlds). Can you give a high-level overview of how concretely you plan to deal with complexity and fragility of value?
Now suppose 1) is solved. Can you give a high-level overview of how you plan to design the AI? In particular, how to make it aimable?

cubefox on Dalcy's Shortform

This approach goes back to Hans Reichenbach's book The Direction of Time. I think the problem is that the set of independencies alone is not sufficient to determine a causal and temporal order. For example, the same independencies between three variables could be interpreted as the chains $A \rightarrow B \rightarrow C$ and $A \leftarrow B \leftarrow C$. I think Pearl talks about this issue in the last chapter.
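
As a short worked check of that claim (a standard textbook derivation, written out here as an illustration rather than taken from Reichenbach or Pearl), the two opposite chains factorize the joint distribution differently but imply the same conditional independence. For the chain $A \rightarrow B \rightarrow C$:

$$P(A,B,C) = P(A)\,P(B \mid A)\,P(C \mid B) \;\Rightarrow\; P(A,C \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}\,P(C \mid B) = P(A \mid B)\,P(C \mid B)$$

For the chain $A \leftarrow B \leftarrow C$:

$$P(A,B,C) = P(C)\,P(B \mid C)\,P(A \mid B) \;\Rightarrow\; P(A,C \mid B) = P(A \mid B)\,\frac{P(C)\,P(B \mid C)}{P(B)} = P(A \mid B)\,P(C \mid B)$$

Both give $A \perp C \mid B$ and no other independence, so observational independencies alone cannot tell the two orderings apart.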

taras-kutsyk on Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?

Thanks for the insight! I expect the same to hold though for Gemma 2B base (pre-trained) vs Gemma 2B Instruct models? Gemma-2b-Python-codes is just a full finetune on top of the Instruct model (probably produced without a large number of update steps), and previous work that studied Instruct models indicated that SAEs don't transfer to the Instruct Gemma 2B either.