LessWrong 2.0 Reader

[link] I'd also take $7 trillion
bhauth · 2024-02-19T03:31:45.552Z · comments (12)

Some open-source dictionaries and dictionary learning infrastructure
Sam Marks (samuel-marks) · 2023-12-05T06:05:21.903Z · comments (7)

Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · 2023-12-11T06:34:06.395Z · comments (14)

AI #32: Lie Detector
Zvi · 2023-10-05T13:50:05.030Z · comments (19)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

[link] Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize
Owain_Evans · 2023-12-19T19:14:26.423Z · comments (4)

AI #36: In the Background
Zvi · 2023-11-02T18:00:01.803Z · comments (5)

Atlantis: Berkeley event venue available for rent
Jonas V (Jonas Vollmer) · 2023-11-22T01:47:12.026Z · comments (0)

[link] NYT on the Manifest forecasting conference
Austin Chen (austin-chen) · 2023-10-09T21:40:16.732Z · comments (14)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (14)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

AI #54: Clauding Along
Zvi · 2024-03-07T16:00:05.066Z · comments (11)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (12)

New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (5)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (17)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

Exercise: Planmaking, Surprise Anticipation, and "Baba is You"
Raemon · 2024-02-24T20:33:49.574Z · comments (19)

Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (26)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

[link] Level up your spreadsheeting
angelinahli · 2024-05-25T14:57:19.730Z · comments (11)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

Higher-Order Forecasts
ozziegooen · 2024-05-22T21:49:42.802Z · comments (1)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

On Trust
johnswentworth · 2023-12-06T19:19:07.680Z · comments (24)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

Truthseeking, EA, Simulacra levels, and other stuff
Elizabeth (pktechgirl) · 2023-10-27T23:56:49.198Z · comments (12)

[link] Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost]
Akash (akash-wasil) · 2023-11-01T13:28:43.723Z · comments (4)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Recent comments

rogerdearnaley on Avoiding the Bog of Moral Hazard for AI

On your categories:

As simulator theory makes clear, a base model is a random generator, per query, of members of your category 2. I view using instruction & safety training to turn that into a pretty consistent member of category 1 or 3 as inherently hard, especially 1, since it's a larger change. My guess would thus be that the personality of Claude 3.5 is closer to your category 3 than 1 (modulo philosophical questions about whether there is any meaningful difference, e.g. for ethical purposes, between "actually having" an emotion versus just successfully simulating the output of the same token stream as a person who has an emotion).

rogerdearnaley on Avoiding the Bog of Moral Hazard for AI

On your off topic comment:

I'm inclined to agree: as technology improves, the amount of havoc that one bad actor, or a small group of them, can commit increases, so it becomes both more necessary to keep almost everyone happy enough almost all the time for them not to do that, and also to defend against the inevitable occasional exceptions. (In the unfinished SF novel whose research was how I first went down this AI alignment rabbithole, something along the lines you describe was standard policy, except that the AIs doing it were superintelligent, and had the ability to turn their long-term-learning-from-experience off, and then back on again if they found something sufficiently alarming.) But in my post I didn't want to get sidetracked by discussing something that is inherently contentious, so I basically skipped the issue, with the small aside you picked up on.

stephen-fowler on o1-preview is pretty good at doing ML on an unknown dataset

seems like big step change in its ability to reliably do hard tasks like this without any advanced scaffolding or prompting to make it work.

Keep in mind that o1 is utilising advanced scaffolding to facilitate Chain-Of-Thought reasoning, but it is just hidden from the user.

jbash on Counting arguments provide no evidence for AI doom

This inspired me to give it the sestina prompt from the Sandman ("a sestina about silence, using the key words dark, ragged, never, screaming, fire, kiss"). It came back with correct sestina form, except for an error in the envoi. The output even seemed like better poetry than I've gotten from LLMs in the past, although that's not saying much and it probably benefited a lot from the fact that the meter in the sestina is basically free.

I had a similar-but-different problem in getting it to fix the envoi, and its last response sounded almost frustrated. It gave an answer that relaxed one of the less agreed-upon constraints, and more or less claimed that it wasn't possible to do better... so sort of like the throwing-up-the-hands that you got. Yet the repair it needed to do was pretty minor compared to what it had already achieved.

It actually felt to me like its problem in doing the repairs was that it was distracting itself. As the dialog went on, the context was getting cluttered up with all of its sycophantic apologies for mistakes and repetitive explanations and "summaries" of the rules and how its attempts did or did not meet them... and I got this kind of intuitive impression that that was interfering with actually solving the problem.

I was sure getting lost in all of its boilerplate, anyway.

https://chatgpt.com/share/66ef6afe-4130-8011-b7dd-89c3bc7c2c03

fer32dwt34r3dfsz on Laziness death spirals

W.r.t. maintaining a to-do list, I've noticed that, for me, at this time in my life (the last year or so), seldom referencing a to-do list has helped me get my work finished more quickly (both at my job and with my projects). As a result of seldom referencing (looking at / thinking about) the to-do list, my behavior surrounding task completion has changed: I now pause, think of a task, and then decide to do the task, knowing that while I am doing this task I am getting closer to my end, whereas previously I could not be doing any task without often thinking about the other tasks remaining on the to-do list. Of late, there is just me and the task at hand, with very rare planning sessions, usually every 2-4 weeks, where I reference a to-do list.

Historically, I've maintained long to-do lists, sorted by urgency and importance. Oftentimes I would go back and forth between tasks, i.e. task-switching, within the same day. To counter this, I would maintain another to-do list (keeping the longer one away), where this one I'd keep before me during my tasks, with a single written task, which I would cross out once completed. Despite these methods (the "backlog" and the one-task-at-a-time list), I found that, at times, the existence of these to-do lists resulted in my mind frequently reciting their items, and this recitation was enough to pull me away from the task at hand.

I've not attempted to quantitatively measure my productivity, so I do not have metrics to coincide with the aforementioned change in productivity.

df-fd on How Often Does Taking Away Options Help?

In the context of minimum wage.

I assume that Abdullah having many options means he has many job offers/alternatives to jobs.

What does it mean for Benjamin to have many options?

sam-marks on Caution when interpreting Deepmind's In-context RL paper

I continue to think that capabilities from in-context RL are and will be a rounding error compared to capabilities from training (and of course, compute expenditure in training has also increased quite a lot in the last two years).

I do think that test-time compute might matter a lot (e.g. o1), but I don't expect that things which look like in-context RL are an especially efficient way to make use of test-time compute.

metachirality on Glitch Token Catalog - (Almost) a Full Clear

It seems like if the SCP hypothesis is true, block characters should cause it to act strangely.

hastings-greer on Applications of Chaos: Saying No (with Hastings Greer)

If a trebuchet requires you to solve the double pendulum problem (a classic example of a chaotic system) in order to aim, it is not a competition-winning trebuchet.

Ah, this is not quite the takeaway, and getting the subtlety here right is important for larger conclusions. If simulating a trebuchet requires solving the double pendulum problem over many error-doublings, it is not a competition-winning trebuchet. This is an important distinction.

If you start with a simulator and a random assortment of pieces, and then start naively optimizing for pumpkin distance, you will quickly see the sort of design shown at 5:02 in the video, where the resulting machine is unphysical because its performance depends on coincidences that will go away in the face of tiny changes in initial conditions. This behaviour shows up with a variety of simulators and optimizers.

An expensive but probably effective solution is to perturb a design several times, simulate it several times, and stop simulation once the simulations diverge.

An ineffective solution is to limit the time of the solution, as many efficient and real-world designs take a long time to fire, because they begin with the machine slowly falling away from an unstable equilibrium.

The chaos-theory motivated cheap solution is to limit the number of rotations of bodies in the solution before terminating it, as experience shows error doublings tend to come from rotations in trebuchet-like chaotic systems.

The solution I currently have implemented at jstreb.hgreer.com is to only allow the direction of the projectile to rotate once before firing (specifically, it is released if it is moving upwards and to the right at a velocity above a threshold) which is not elegant, but seems mostly effective. I want to move to the "perturb and simulate several times" approach in the future.
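The "perturb and simulate several times" idea can be sketched roughly as below. This is only a toy illustration, not the jstreb implementation: the logistic map stands in for a trebuchet simulator, the state is a single number rather than a full set of positions and velocities, and every name and threshold (`steps_until_divergence`, `noise`, `tol`, and so on) is a hypothetical choice made for the example.

```python
import random


def steps_until_divergence(step, state, n_copies=5, noise=1e-9,
                           tol=1e-3, max_steps=1000, seed=0):
    """Advance several slightly perturbed copies of `state` in lockstep.

    Returns the number of steps completed before the copies spread apart
    by more than `tol`, or `max_steps` if they never do.
    """
    rng = random.Random(seed)
    # Hypothetical perturbation scheme: tiny uniform noise on the start state.
    copies = [state + rng.uniform(-noise, noise) for _ in range(n_copies)]
    for t in range(max_steps):
        # Stop as soon as the perturbed runs no longer agree.
        if max(copies) - min(copies) > tol:
            return t
        copies = [step(x) for x in copies]
    return max_steps


def logistic(r):
    """Toy stand-in for one simulator step: x -> r * x * (1 - x)."""
    return lambda x: r * x * (1.0 - x)


# r = 4.0 is chaotic: tiny differences roughly double each step.
chaotic_steps = steps_until_divergence(logistic(4.0), 0.3)
# r = 2.5 settles onto a stable fixed point, so the copies stay together.
stable_steps = steps_until_divergence(logistic(2.5), 0.3)
```

In an optimizer, anything a design achieves after its own divergence step would presumably be discarded, since that part of the trajectory depends on coincidences in the initial conditions.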

keltan on The Best Lay Argument is not a Simple English Yud Essay

Yep! If I think about those 10 people, 5 are having, or I expect will have, a large impact on the future. As for ages, all the people I thought of except one were over 20. There was one 14yo who is just naturally super high G.