LessWrong 2.0 Reader

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

[link] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (13)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

How to use bright light to improve your life.
Nat Martin (nat-martin) · 2024-11-18T19:32:10.667Z · comments (10)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb · 2024-10-28T17:10:04.272Z · comments (3)

[link] College technical AI safety hackathon retrospective - Georgia Tech
yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · comments (2)

[link] Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
TurnTrout · 2024-11-19T18:36:20.721Z · comments (5)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (13)

Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

[link] FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Tamay · 2024-11-14T06:13:22.042Z · comments (0)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (4)

[question] Are You More Real If You're Really Forgetful?
Thane Ruthenis · 2024-11-24T19:30:55.233Z · answers+comments (25)

What happens next?
Logan Zoellner (logan-zoellner) · 2024-12-29T01:41:33.685Z · comments (19)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (9)

Analysis of Global AI Governance Strategies
Sammy Martin (SDM) · 2024-12-04T10:45:25.311Z · comments (10)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Litigate-for-Impact: Preparing Legal Action against an AGI Frontier Lab Leader
Sonia Joseph (redhat) · 2024-12-07T21:42:29.038Z · comments (7)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (39)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

Doing Research Part-Time is Great
casualphysicsenjoyer (hatta_afiq) · 2024-11-22T19:01:15.542Z · comments (7)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[link] Locally optimal psychology
Chipmonk · 2024-11-25T18:35:11.985Z · comments (7)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (37)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (8)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (11)

Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)

Recent comments

thane-ruthenis on Shortform

We don't know yet. I expect so.

quetzal_rainbow on Shortform

No current AI system could generate a research paper that would receive anything but the lowest possible score from each reviewer

Is that true in the case of o3?

simon on D&D.Sci Dungeonbuilding: the Dungeon Tournament

Looks like I won't have figured this out before the time limit despite the extra time. What I have so far:

I'm modeling this as follows, but I haven't fully worked it out and am getting complications and hard-to-explain dungeons that suggest it might not be exactly correct:

the adventurers go through the dungeons using rightwards and downwards moves only, thus going through 5 rooms in total.
at each room they choose the next room based on a preference order (which I am assuming is deterministic, but possibly dependent on, e.g. what the current room is)
the score is dependent only on the rooms they pass through (but again, am getting complications)
I'm assuming a simple addition of scores to start with, but then adding epicycles (which so far have been based on the previous room, generally)
there is some randomness in the individual score contributions from each encounter.

For the dungeon generation: dungeon generation seems to treat rooms 1-8 equally (room 9 is different and tends to have harder encounters). Encounters of the same types (and some related "themes") tend to be correlated. Scores in each tournament seem to be whole numbers from each judge and averaged between 3 or 4 judges; I am not sure if any tournaments are judged by 2 or 1, but if so they're relatively less common.
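The whole-number-per-judge observation suggests a quick consistency check: an average is compatible with n judges only if n times the average is (nearly) a whole number. The following is a minimal sketch under that assumption; the function name, the candidate panel sizes, and the tolerance are illustrative and not taken from the original analysis.

```python
def possible_judge_counts(avg_score, counts=(1, 2, 3, 4), tol=1e-6):
    """Return the panel sizes n for which n * avg_score is a whole number.

    Assumes every judge awards a whole-number score, so the sum over
    n judges (n * average) must be an integer up to rounding error.
    """
    compatible = []
    for n in counts:
        total = n * avg_score
        if abs(total - round(total)) <= tol:
            compatible.append(n)
    return compatible


print(possible_judge_counts(7.25))    # only a 4-judge panel fits
print(possible_judge_counts(22 / 3))  # only a 3-judge panel fits
print(possible_judge_counts(8.0))     # ambiguous: every panel size fits
```

If the dataset stores rounded averages, `tol` would need to be loosened accordingly, and whole-number averages stay ambiguous under this check.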

In theory, I'd like to plug in a preference model and a score model to a simulator and iterate to refine, but I'm not there yet, still working out plausible scores and preferences.

One possibility for the scores and preference order:

baseline average scores:

Nothing: 0; Goblins: 1.5 (1d2?); Whirling Blade Trap: 3; Orcs: 3; Hag: 4; Boulder Trap: 4.5; Clay Golem: 6; Dragon: 6?; Steel Golem: 7.5

With Goblins and Orcs being increased (doubled?) if following goblins/orcs/any trap? (edit - or golems?)

Plus with the adventurers seemingly avoiding Orcs and Hags more than their difficulty warrants? (I found them to be relatively late in the preference order, then found that they were in practice lower in score, so I'm having to adjust ad hoc if I keep the assumption that score contribution and preference order are related. 1.5x multiplier? 2x multiplier? Fixed addition?) (I'm assuming a 1.5x multiplier at the moment, since I initially had Hag avoided over anything but Orcs, but found one dungeon that looks suspiciously like, but does not prove, Hag being chosen over Dragon.) (I suppose +2 would also work.)

Assuming the above is correct, and I'm pretty sure it isn't but hopefully has some relationship with reality, one strategy might be:

CHN/WON/BOD <--- my current answer

where the idea is to use the encounters the adventurers avoid too much relative to their actual score contributions (Hag, Orcs) to herd the adventurers away from the Nothing rooms. One of the Orcs is left in after a Boulder Trap in the belief that this will make it score higher than the Hag. The Whirling Blade Trap (WBT) is left in the preferred path to lead the adventurers along; I don't immediately see a way to avoid this.

EV if above model is correct: 6+3+4.5+6+6=25.5
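The model and the EV figure above can be reproduced with a short sketch. Several pieces are assumptions made only for illustration: the single-letter codes are inferred from the "CHN/WON/BOD" answer (with `G` and `S` invented for Goblins and Steel Golem), the preference order is one hypothetical ordering consistent with "Orcs and Hag are avoided more than their score warrants", and the doubling rule takes the "doubled after goblins/orcs/any trap" guess at face value.

```python
# Baseline average scores quoted in the comment
# (the Dragon value is marked as uncertain there).
BASE_SCORE = {
    "N": 0.0,  # Nothing
    "G": 1.5,  # Goblins (letter code is an assumption)
    "W": 3.0,  # Whirling Blade Trap
    "O": 3.0,  # Orcs
    "H": 4.0,  # Hag
    "B": 4.5,  # Boulder Trap
    "C": 6.0,  # Clay Golem
    "D": 6.0,  # Dragon
    "S": 7.5,  # Steel Golem (letter code is an assumption)
}

# ASSUMPTION: hypothetical preference order, most preferred first.
PREFERENCE = ["N", "G", "W", "B", "O", "H", "C", "D", "S"]

# ASSUMPTION: Goblins and Orcs score double right after Goblins, Orcs, or a trap.
BOOSTED = {"G", "O"}
BOOSTERS = {"G", "O", "W", "B"}


def walk(layout):
    """Follow rightwards/downwards moves from top-left to bottom-right."""
    rows = layout.split("/")
    last = len(rows) - 1  # assumes a square grid
    r = c = 0
    path = [rows[r][c]]
    while (r, c) != (last, last):
        options = []
        if c < last:
            options.append((r, c + 1))  # move right
        if r < last:
            options.append((r + 1, c))  # move down
        # Pick the option whose encounter is most preferred.
        r, c = min(options, key=lambda rc: PREFERENCE.index(rows[rc[0]][rc[1]]))
        path.append(rows[r][c])
    return path


def expected_score(path):
    """Sum baseline scores, doubling boosted encounters after a booster."""
    total = 0.0
    previous = None
    for room in path:
        value = BASE_SCORE[room]
        if room in BOOSTED and previous in BOOSTERS:
            value *= 2
        total += value
        previous = room
    return total


path = walk("CHN/WON/BOD")
print(path, expected_score(path))  # path C, W, B, O, D with total 25.5
```

Under these assumptions the party passes through five rooms and the total matches the 25.5 above. A different preference order or boost rule would change both the path and the score.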

How I've gotten here (mainly used Claude and Claude-written code, including the analysis tool, which is good for prototyping if you don't mind JavaScript):

found initial basic encounter score contribution estimates from linear regression on whole dungeon
after determining that rooms 1-8 were interchangeable as far as dungeon generation is concerned, looked at room importance to score, guessed the basic model based on that iirc (might have been more complicated than this) (I do remember considering and rejecting a model where each room is selected one at a time from the full set of available rooms, and rejecting any "symmetrical" model based on working out the full path in advance)
initially assumed that adventurers preferred easier encounters based on the initial score estimates
refined preference order based on minimizing variance between same-predicted-sequence-of-encounters dungeons
tried to work out how scores actually work by filtering for specific predicted sequences of encounters and finding their scores
found epicycles from that and started refining model, including preference order adjustments
haven't really finished the above step, epicycles might be because model is wrong/incomplete?
hypothetical todo: apply model to entire dataset, also develop model for variations in score from each encounter, compare to known 3-judge and 4-judge tournaments for full Bayes assessment, refine further with this as feedback
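The "minimize variance between same-predicted-sequence dungeons" step in the list above can be sketched as follows. This is only an illustration: the records are made-up toy data rather than tournament results, each toy "dungeon" offers just two rooms, and the summed squared deviation within each group stands in for whatever variance measure was actually used.

```python
from collections import defaultdict


def within_group_spread(records, predict_sequence):
    """Total squared deviation of scores among dungeons that share
    the same predicted encounter sequence (lower is better)."""
    groups = defaultdict(list)
    for layout, score in records:
        groups[tuple(predict_sequence(layout))].append(score)
    spread = 0.0
    for scores in groups.values():
        mean = sum(scores) / len(scores)
        spread += sum((s - mean) ** 2 for s in scores)
    return spread


def make_predictor(order):
    """Toy predictor: the party visits whichever offered room
    comes first in the candidate preference order."""
    return lambda layout: [min(layout, key=order.index)]


# Made-up toy records: (rooms offered, score). NOT real tournament data.
toy_records = [
    (("G", "O"), 2.0),
    (("O", "G"), 2.0),
    (("O", "H"), 3.0),
    (("G", "H"), 2.0),
]

spread_a = within_group_spread(toy_records, make_predictor(["G", "O", "H"]))
spread_b = within_group_spread(toy_records, make_predictor(["H", "O", "G"]))
print(spread_a, spread_b)  # the first order explains the toy scores better
```

One caveat with this criterion: groups containing a single dungeon contribute zero spread, so a candidate that splits the data very finely can look better than it deserves.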

Edit after posting this: I've now read other people's comments. I did not notice any 1-point jump in scores (I didn't check for it), and I'm not sure I would have noticed it if it is a judging difference as opposed to a strategy change (I wouldn't have noticed if it is just a strategy change). I also did not notice anything special about Steel Golems at the entrance vs. other spots, did not check for any change in the distribution of 3- vs. 4-judge tournaments, etc.

dr_s on quila's Shortform

I feel like this is a bit incorrect. There are imaginable things that are smarter than humans at some tasks and as smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate into an economy without immediately exploding into a utopian (or dystopian) singularity. The question is whether we are liable to build such things before we build the exploding singularity kind, or if the latter is in some sense easier to build and thus stumble upon first. Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.

quwgri on I would like to try double crux.

We can leave theology. It is not so important. I am more concerned with the questions of finitism and infinitism in relation to the paradox of sets.

Finitism is logically consistent. However, it seems to me that it suffers from the same problem as the ontological proof of the existence of God. It is an attempt to make a global prediction about the nature of the Universe based on a small thought experiment. Predictions like "Time cannot be infinite" and "Space cannot be infinite" follow directly from finitism. It turns out that we make these predictions based on our mathematical problems with the paradox of sets. At the same time, the paradox of sets itself resembles the paradox "I'm telling a lie now," and it seems we should look for a solution somewhere in the same area. Thinking off the cuff, it seems to me, naively, that the very concept of an "ordinary set" is constructed in such a way as to lead to paradoxes. That is a problem with the concept of "ordinary set", not a problem of the existence or non-existence of physical infinity.

Oh, okay. I don't really understand this topic. But as far as I know, not all mathematicians are finitists. So it seems that the proofs of finitism are not flawless.

On the other hand, how is the problem of the set paradox solved in cosmological infinitism? Something like "The Infinite Universe may exist, but it is forbidden to talk about it as an object"? Because any attempt to do so will bring you back to the set paradox, if you take it seriously. "Talk about any particular part of the Universe as much as you like, but don't even think about the Universe as a whole"? This risks forming a somewhat patchwork model of the worldview. "It may exist, but you cannot think about it intelligently and rationally." One is reminded of Zeno's attempts to prove that one cannot think about motion without contradictions.

roland-pihlakas on Building AI safety benchmark environments on themes of universal human values

Thank you for your question!

I agree that the simulations need to have sufficient complexity. Indeed, that was one of the main reasons I became interested in creating multi-objective benchmarks in the past. Various AI safety toy problems seemed to me so simplified that they lacked essential objectives and other decisive nuances. This is still very much one of my main driving motivations.

That being said, complexity also has downsides:
1) The complexity introduces confounding factors. When a model fails such a benchmark, it is not clear whether it was because it did not have the required perceptual capabilities (so it is a capabilities problem), or because it is using a model/framework that is unsuitable for alignment (so it is an alignment problem).
2) Running the simulations will be more time consuming and it would make the research elitist in the sense that various people would not be able to afford it.

My plan is to start with a preference for simple environments, but not simpler than necessary, and then gradually make them more complex. That means trying to use the gridworlds and introducing as many symbols as are needed to represent the important objectives, objects, other concepts and phenomena, and their interactions.

I believe symbolic approaches should not be entirely dismissed. As an illustrative metaphor, I am thinking of books: they contain symbols, yet we consider them a cornerstone of our civilization. Similarly to the current dilemma with benchmarks, we may then worry whether books are too simple and symbol-based, or whether one should prefer watching movies instead, since they represent reality in more detail. But would that claim necessarily be true? It does not seem so obvious after all.

In case more complexity is needed, there are currently at least five ideas:
1) Adding more feature layers to the gridworld. I did not mention it before, but the observation format already supports multiple concurrent observable layers on top of each other. One of the layers could be for example facial expressions, or any other observable and unobservable metrics relevant to objects they accompany.
2) Adding textual messages between agents as a side panel to the gridworlds.
3) Making the environment bigger, so there are more objects and more phenomena.
4) Making the environment bigger and also making the objects bigger, so that they cover multiple cells in the grid. Thus the objects will become composite, consisting of sub-parts with their own dynamics.
5) Using some other framework, for example Sims.
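To make idea 1) concrete, here is a minimal sketch of what "multiple concurrent observable layers on top of each other" could look like as a data structure. The class name, layer names, and cell values are hypothetical and are not the benchmark's actual observation format.

```python
from dataclasses import dataclass


@dataclass
class LayeredObservation:
    """Hypothetical container: named layers, each a grid of identical shape."""
    layers: dict

    def cell(self, row, col):
        """Return everything observable at one grid cell, keyed by layer name."""
        return {name: grid[row][col] for name, grid in self.layers.items()}


obs = LayeredObservation(layers={
    # What occupies each cell.
    "objects": [["agent", "food"], ["wall", "empty"]],
    # An extra layer accompanying the objects, e.g. facial expressions.
    "expressions": [["neutral", ""], ["", ""]],
})
print(obs.cell(0, 0))
```

Adding a new kind of signal then means adding one more named grid, without changing how existing layers are read.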

Curious, how do these thoughts and considerations land with you?

lgs on Some arguments against a land value tax

The value extractable is rent on both the land and the improvement. LVT taxes only the former. E.g. if land can earn $10k/month after an improvement of $1mm, and if interest is 4.5%, and if that improvement is optimal, a 100% LVT is not $10k/mo but $10k/mo minus the improvement's carrying cost of $1mm*0.045/12 = $3,750/mo. So a 100% LVT would be merely $6,250/mo.

If your improvement can't extract $6.3k from the land, preventing you from investing in that improvement is a feature, not a bug.
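The arithmetic above can be checked with a few lines. This is a sketch of the comment's simplified model only (rent minus the monthly interest cost of the improvement); the function and parameter names are illustrative.

```python
def full_lvt_per_month(monthly_rent, improvement_cost, annual_interest):
    """Monthly land rent left over after paying the improvement's carrying cost.

    Under the comment's simplified model, a 100% LVT captures exactly this amount.
    """
    carrying_cost = improvement_cost * annual_interest / 12
    return monthly_rent - carrying_cost


tax = full_lvt_per_month(10_000, 1_000_000, 0.045)
print(round(tax, 2))  # 6250.0
```

An improvement pays for itself under this model only if the extra rent it brings in exceeds its carrying cost, which is the point of the "feature, not a bug" remark.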

tailcalled on Capital Ownership Will Not Prevent Human Disempowerment

Very powerful AIs may very well be created in order to defend the current capitalist system.

Like the most plausible proposal for what distinguishes bounded tool AI vs dangerous AI is that dangerous AI does adversarial/minimax-like reasoning whereas bounded tool AI mostly just assumes the world will allow it to do whatever it tries, so it doesn't need to try very hard.

This means the main people who will be forced to create dangerous AI are the ones working in hardcore adversarial contexts, which will especially be the military and police (as well as their opponents, including rogue states and gangsters). But the military and police have as their primary goal to maintain the current system.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

I probably shouldn't have used the free energy terminology. Does "complexity-accuracy tradeoff" work better?

To be clear, I very much don't mean these things as a metaphor. I am thinking there may be an actual numerical complexity-accuracy tradeoff, some elaboration of Watanabe's "free energy" formula, that actually describes these tendencies.

dagon on Some arguments against a land value tax

You can't sell the improvements if they're tied to land that is taxed higher than the improvements bring in (due to mistakes in improvement or changed environment that has increased the land value and the improvements haven't stayed optimal). The land is taxed at its full theoretical value, less than the improvements bring in, and the improvements are literally connected to it.