LessWrong 2.0 Reader


The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)
Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)
[link] Microdooms averted by working on AI Safety
nikola (nikolaisalreadytaken) · 2023-09-17T21:46:06.239Z · comments (2)
Sparse MLP Distillation
slavachalnev · 2024-01-15T19:39:02.926Z · comments (3)
[link] How "Pause AI" advocacy could be net harmful
Tamsin Leake (carado-1) · 2023-12-26T16:19:20.724Z · comments (10)
[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)
[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)
AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)
Against "argument from overhang risk"
RobertM (T3t) · 2024-05-16T04:44:00.318Z · comments (11)
Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)
Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)
Taking features out of superposition with sparse autoencoders more quickly with informed initialization
Pierre Peigné (pierre-peigne) · 2023-09-23T16:21:42.799Z · comments (8)
AI Alignment Breakthroughs this week (10/08/23)
Logan Zoellner (logan-zoellner) · 2023-10-08T23:30:54.924Z · comments (14)
The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)
Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)
AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)
AI Safety 101: Reward Misspecification
markov (markovial) · 2023-10-18T20:39:34.538Z · comments (4)
[link] There is no IQ for AI
Gabriel Alfour (gabriel-alfour-1) · 2023-11-27T18:21:26.196Z · comments (10)
[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)
Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)
Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)
[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)
Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)
Is the Wave non-disparagement thingy okay?
Ruby · 2023-10-14T05:31:21.640Z · comments (13)
Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)
Dishonorable Gossip and Going Crazy
Ben Pace (Benito) · 2023-10-14T04:00:35.591Z · comments (31)
Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)
[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)
[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)
[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)
“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)
[link] GDP per capita in 2050
Hauke Hillebrandt (hauke-hillebrandt) · 2024-05-06T15:14:30.934Z · comments (8)
D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)
Let's talk about Impostor syndrome in AI safety
Igor Ivanov (igor-ivanov) · 2023-09-22T13:51:18.482Z · comments (4)
Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)
Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)
Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)
Non-myopia stories
[deleted] · 2023-11-13T17:52:31.933Z · comments (10)
DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)
[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)
[link] What fuels your ambition?
Cissy · 2024-01-31T18:30:53.274Z · comments (1)
Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)
End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)
Scorable Functions: A Format for Algorithmic Forecasting
ozziegooen · 2024-05-21T04:14:11.749Z · comments (0)
[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)
[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)
Impact stories for model internals: an exercise for interpretability researchers
jenny · 2023-09-25T23:15:29.189Z · comments (3)
Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)
AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)
Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)