LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

The Pragmatic Side of Cryptographically Boxing AI
Bart Jaworski (bart-jaworski) · 2024-08-06T17:46:21.754Z · comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality
Wynn Walker · 2024-08-04T18:49:30.892Z · comments (0)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom
Ghdz (gal-hadad) · 2024-08-05T18:27:20.709Z · comments (2)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

[link] Labelling, Variables, and In-Context Learning in Llama2
Joshua Penman (joshua-penman) · 2024-08-03T19:36:34.721Z · comments (0)

[question] Request for AI risk quotes, especially around speed, large impacts and black boxes
Nathan Young · 2024-08-02T17:49:48.898Z · answers+comments (0)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Grounding self-reference paradoxes in reality
Fiora from Rosebloom · 2024-09-29T05:50:30.559Z · comments (3)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

christiankl on Three Notions of "Power"

If we take the issue of forced prostitution and the official numbers are estimates and by their nature estimates are not exact.

https://www.spiegel.de/international/germany/human-trafficking-persists-despite-legality-of-prostitution-in-germany-a-902533.html would be a journalistic story about prostitution in Germany that describes what happens here with legalized prostitution.

I was once talking with someone who in the past was thinking about opening a brothel and who had some insight about how brothels are run in Germany and who said that a lot of coercion is used.

Recently, I read something from a policeman who was complaining about how the standard of proving coercion for prostitutes is too high. Proving that a prostitute who's over 21 who left was beaten was not enough in court to convince the court that she falls under the criteria of outlawed exploitation of prostitutions.

simon-lermen on The Compendium, A full argument about extinction risk from AGI

Here is a way in which it doesn't generalize in observed behavior:

Alignment does not transfer well from chat models to agents

TLDR: There are three new papers which all show the same finding, i.e. the safety guardrails from chat models don’t transfer well from chat models to the agents built from them. In other words, models won’t tell you how to do something harmful, but they will do it if given the tools. Attack methods like jailbreaks or refusal-vector ablation do transfer.

Here are the three papers, I am the author of one of them:

https://arxiv.org/abs/2410.09024

https://static.scale.com/uploads/6691558a94899f2f65a87a75/browser_art_draft_preview.pdf

https://arxiv.org/abs/2410.10871

I thought of making a post here about this if it is interesting

rob-lucas on avturchin's Shortform

When I was trekking in Qinghai my guide suggested we do a hike around a lake on our last day on the way back to town. It was just a nice easy walk around the lake. But there were tibetan nomads (nomadic yak herders, he just referred to them as nomads) living on the shore of the lake, and each family had a lot of dogs (Tibetan Mastiffs as well as a smaller local dog they call "three eyed dogs"). Each time we got near their territory the pack would come out very aggressively.

He showed me how to first always have some stones ready, and second when they approached to throw a stone over their head when they got too close. "Don't hit the dogs" he told me, "the owners wouldn't be happy if you hit them, and throwing a stone over their heads will warn them off".

When they came he said, "You watch those three, I need to keep an eye on the ones that will sneak up behind us." Each time the dogs used the same strategy. There'd be a few that were really loud and ran up to us aggressively. Then there'd be a couple sneaking up from the opposite side, behind us. It was my job to watch for them and throw a couple of stones in their direction if they got too close.

He also made sure to warn me, "If one of them does get to you, protect your throat. If you have to give it a forearm to bite down on instead of letting it get your throat." He had previously shown me the large scar on his arm where he'd used that strategy in the past. When I looked at him sort of shocked he said, "don't worry, it probably won't come to that." At this point I was wondering if maybe we should skip the lake walk, but I did go there for an adventure. Luckily the stone throwing worked, and we were walking on a road with plenty of stones, so it never really got too dangerous.

Anyway, +1 to your advice, but also look out for the dogs that are coming up behind you, not just the loud ones that are barking like mad as a distraction.

alexej-gerstmaier-1 on The Case For Bullying

Hi Justin, I already read both the posts you linked there.

My desire for Truth is overwhelmingly strong, I would change my stance if anyone would present some actual counter-arguments that go beyond the surface level.

Will give longer rebuttal later, am currently on vacation in Spain 🤝

alexej-gerstmaier-1 on The Case For Bullying

Thanks for linking, I love Worm

tailcalled on Alexander Gietelink Oldenziel's Shortform

For everyday life, flat earth is more convenient than round earth geocentrism, which in turn is more convenient than heliocentrism. Like we don't constantly change our city maps based on the time of year, for instance, which we would have to do if we used a truly heliocentric coordinate system as the positions of city buildings are not even approximately constant within such a coordinate system.

This is mainly because the sun and the earth are powerful enough to handle heliocentrism for you, e.g. the earth pulls you and the cities towards the earth so you don't have to put effort into staying on it.

The sun and the planetary motion does remain the most important governing factor for predicting activities on earth, though, even given this coordinate change. We just mix them together into ~epicyclic variables like "day"/"night" and "summer"/"autumn"/"winter"/"spring" rather than talking explicitly about the sun, the earth, and their relative positions.

tailcalled on Three Notions of "Power"

Can you explain what this coordination would look like?

khafra on Three Notions of "Power"

Your definition seems like it fits the Emperor of China example--by reputation, they had few competitors for being the most willing and able to pessimize another agent's utility function; e.g. 9 Familial Exterminations.
And that seems to be a key to understanding this type of power, because if they were able to pessimize all other agents' utility functions, that would just be an evil mirror of bargaining power. Being able to choose a sharply limited number of unfortunate agents, and punish them severely pour encourager les autres, seems like it might just stop working when the average agent is smart enough to implicitly coordinate around a shared understanding of payoff matrices.
So I think I might have arrived back to the "all dominance hierarchies will be populated solely by scheming viziers" conclusion.

fread2281 on Alexander Gietelink Oldenziel's Shortform

I guess this is sorta about your 3, which I disbelieve (though algorithms for tasks other than learning are also important). Currently, Bayesian inference vs SGD is a question of how much data you have (where SGD wins except for very little data). For small to medium amounts of data, even without AGI, I expect SGD to lose eventually due to better inference algorithms. For many problems I have the intuition that it's ~always possible to improve performance with more complicated algorithms (eg sat solvers). All that together makes me expect there to be inference algorithms that scale to very large amounts of data (that aren't going to be doing full Bayesian inference but rather some complicated approximation).

bolverk on I got dysentery so you don’t have to

Sequence 1 length:3 

Sequence 2 length:6 

Alignment length: 6 

Identity: 3/6 (50.00%) 

Similarity: 3/6 (50.00%) 

Gaps: 3/6 (50.00%)

---AGC
   |||
AGCAGC

Like this. Difference between lengths is considered non-matching.

https://en.vectorbuilder.com/tool/sequence-alignment.html