LessWrong 2.0 Reader



[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)
Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)
Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)
Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)
[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)
[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)
[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)
Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)
AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)
[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)
SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
A suite of Vision Sparse Autoencoders
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-10-27T04:05:20.377Z · comments (0)
Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)
Thoughts after the Wolfram and Yudkowsky discussion
Tahp · 2024-11-14T01:43:12.920Z · comments (13)
[link] Social events with plausible deniability
Chipmonk · 2024-11-18T18:25:17.339Z · comments (24)
[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)
[link] Let's Design A School, Part 2.2 School as Education - The Curriculum (General)
Sable · 2024-05-07T19:22:21.730Z · comments (3)
Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)
[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)
[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)
How to put California and Texas on the campaign trail!
Yair Halberstadt (yair-halberstadt) · 2024-11-06T06:08:25.673Z · comments (4)
[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)
[question] How to Model the Future of Open-Source LLMs?
[deleted] · 2024-04-19T14:28:00.175Z · answers+comments (9)
[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)
[link] Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)
You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)
How likely is brain preservation to work?
Andy_McKenzie · 2024-11-18T16:58:54.632Z · comments (3)
A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)
[link] A Theory of Equilibrium in the Offense-Defense Balance
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-15T13:51:33.376Z · comments (6)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
No Electricity in Manchuria
winstonBosan · 2024-11-19T01:11:58.661Z · comments (0)
A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)
[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)
[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)
Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)
AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)
[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)
The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)
Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)
Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)
Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)
An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)
Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)