LessWrong 2.0 Reader


[question] How tokenization influences prompting?
Boris Kashirin (boris-kashirin) · 2024-07-29T10:28:25.056Z · answers+comments (4)

[link] Markets Are Information - Beating the Sportsbooks at Their Own Game
JJXW · 2024-11-07T20:58:43.389Z · comments (1)

NYU Debate Training Update: Methods, Baselines, Preliminary Results
samarnesen · 2024-07-06T18:28:54.053Z · comments (0)

Thinking About a Pedalboard
jefftk (jkaufman) · 2024-10-08T11:50:02.054Z · comments (2)

[link] Testing Genetic Engineering Detection with Spike-Ins
jefftk (jkaufman) · 2024-10-22T17:20:54.947Z · comments (0)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

[link] Anthropic teams up with Palantir and AWS to sell AI to defense customers
Matrice Jacobine · 2024-11-09T11:50:34.050Z · comments (0)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

My covid-related beliefs and questions
Severin T. Seehrich (sts) · 2024-07-23T03:27:09.348Z · comments (0)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

[link] Can AI agents learn to be good?
Ram Rachum (ram@rachum.com) · 2024-08-29T14:20:04.336Z · comments (0)

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

Thinking About Propensity Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:23:55.091Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

[question] What are some positive developments in AI safety in 2024?
Satron · 2024-11-15T10:32:39.541Z · answers+comments (1)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

[link] An Uncanny Moat
Adam Newgas (BorisTheBrave) · 2024-11-15T11:39:15.165Z · comments (0)

[link] Michael Streamlines on Buddhism
Chris_Leong · 2024-08-09T04:44:52.126Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, July '24
gasteigerjo · 2024-08-05T13:00:46.028Z · comments (0)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

The Geometric Importance of Side Payments
StrivingForLegibility · 2024-08-07T01:38:04.635Z · comments (4)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (8)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double · 2024-09-05T00:35:39.504Z · answers+comments (9)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

[link] Approval-Seeking ⇒ Playful Evaluation
Jonathan Moregård (JonathanMoregard) · 2024-08-28T21:03:51.244Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

Consider tabooing "I think"
Adam Zerner (adamzerner) · 2024-11-12T02:00:08.433Z · comments (2)

Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)

[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

The grass is always greener in the environment that shaped your values
Karl Faulks (karl-faulks) · 2024-11-17T18:00:15.852Z · comments (0)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)


Recent comments

tailcalled on Evolution's selection target depends on your weighting

I think there's something to this. Also since making the OP, I've been thinking that human control of fire seems important. If trees have the majority of the biomass, but humans can burn the trees for energy or just to make space, then that also makes humans special (and overlaps a lot with what you say about energy controlled).

This also neatly connects human society to the evolutionary ecology since human dominance hierarchies determine who is able to control what energy (or set fire to what trees).

ben-lang on Evolution's selection target depends on your weighting

I think a reasonable-seeming metric on which humans are doubtless the winners is "energy controlled".

Total up all the human metabolic energy, plus the output of the world's power grids, plus the energy of all the petrol and gas burned in cars and boilers. If you are feeling generous you could give humans a percentage of all the metabolic energy going through farm animals.

It's a bit weird, because on the one hand it's obvious that collectively humans control the planet in a way no other organism does. But you are looking for a metric where plants and single-celled organisms are allowed to participate, and they can't properly be said to control anything, even themselves.

tailcalled on Evolution's selection target depends on your weighting

The OP is more of a statement that you get different results depending on whether you focus on organism count or biomass or energy flow. I motivate this line of inquiry by a question about what evolution selects for, but that's secondary to the main point.

gerardus-mercator on Claude seems to be smarter than LessWrong community

I have a few disagreements there, but the most salient one is that I don't think that the policy of "when considering the net upside/downside of an action, calculate it with the utility function that you'll have at the time the action is finished" would even be helpful in your new example.
The agent can't magically reach into the future and grab its future utility function; the agent has to try to predict its future utility function.
And if the agent doesn't currently think that paperclip factories are valuable, it's not going to predict that in the future it'll think that paperclip factories are valuable. (It's worth noting that terminal value and incidental value are not the same thing, although I'm speaking as if they are to make the argument simpler.)
Because if the agent predicted that it was going to change its mind eventually, it'd just change its mind immediately and skip the wait.
So I don't think it would have done the agent any good in this example to try to use its future utility function, because its predicted future utility function would just average out to its current utility function.
Yes, the agent should be at least a little cautious, but using its future utility function won't help with that.

stefan42 on StefanHex's Shortform

CLDR (Cross-layer distributed representation): I don't think Lee has written his up anywhere yet so I've removed this for now.

Also, just wanted to flag that the links on 'this picture' and 'motivation image' don't currently work.

Thanks for the flag! It's these two images; I realize now that they don't seem to have direct links.

Images taken from AMFTC and Crosscoders by Anthropic.

lc on Dragon Agnosticism

I think the entire point of rationalism is that you don't do things like this.

ape-in-the-coat on What are the good rationality films?

I Am Mother

Rational protagonist, who reasons under uncertainty and tries to do the right thing to the best of her knowledge, even when it requires opposing an authority figure or risking her life. A lot of focus on ethics.

The film gives the viewer a good opportunity to practise noticing their own confusion: plot twists are masterfully hidden in plain sight, and all the apparent contradictions are mysteries to be solved. It is also the best depiction of AI I've seen in any media.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

'China hawk and influential Trump AI advisor Jacob Helberg asserted to Reuters that “China is racing towards AGI,” but I couldn't find any evidence in the report to support that claim.' https://x.com/GarrisonLovely/status/1859022323799699474

bogdan-ionut-cirstea on Why Don't We Just... Shoggoth+Face+Paraphraser?

AFAICT, there seems to be quite heavy overlap between the proposal (and Daniel's motivation for it) and safety case (sketch) #3 in https://alignment.anthropic.com/2024/safety-cases/.

kaj_sotala on Evolution's selection target depends on your weighting

I guess I don't really understand what you're asking. I meant my comment as an answer to this bit in the OP:

I think it's common on LessWrong to think of evolution's selection target as inclusive genetic fitness - that evolution tries to create organisms which make as many organisms with similar DNA to themselves as possible. But what exactly does this select for?

In that evolution selecting for "inclusive genetic fitness" doesn't really mean selecting for anything in particular; what exactly that ends up selecting for is completely dependent on the environment (where "the environment" also includes the species itself, which is relevant for things like sexual selection or frequency-dependent selection).

If you fix the environment, assuming for the sake of argument that it's possible to do that, then what it selects for is just the set of traits that are useful in that environment.

Do humans have high inclusive genetic fitness?

I think it's a bit of a category mistake to ask about the inclusive fitness of a species. You could calculate the average fitness of an individual within the species, but at least to my knowledge (caveat: I'm not a biologist) that's not very useful. Usually it's individual genotypes or phenotypes within the species that are assigned a fitness.