LessWrong 2.0 Reader
View: New · Old · Top
thanks, it worked! https://web.archive.org/web/20150412211654/http://reducing-suffering.org/wp-content/uploads/2015/02/wild-animals_2015-02-28.pdf
mathieuroy on Mati_Roy's Shortform
i want a better conceptual understanding of what "fundamental values" means, and how to disentangle that from beliefs (e.g., in an LLM). like, is there a meaningful way we can say that a "cat classifier" values classifying cats even though it sometimes fails?
nathan-helm-burger on Please stop publishing ideas/insights/research about AI
A bit of a rant, yes, but some good thoughts here.
I agree that unenforceable regulation can be a bad thing. On the other hand, it can also work in some limited ways. For example, the international agreements against heritable human genetic engineering seem to have held up fairly well. But I think that requires certain supporting facts about the world to be true: defection must not be obviously highly profitable, the activity must be relatively inaccessible to most people (requiring specialized tech and knowledge), and the prohibition must fit with our collective intuitions (bio-engineering humans seems kinda icky to a lot of people).
The trouble is, all of these things fail to help us with the problem of dangerous AI! As you point out, many bitcoin miners have enough GPUs to be dangerous if we get even a couple more orders of magnitude of algorithmic efficiency improvement. So it's accessible. AI and AGI offer many tempting ways to acquire power and money in society. So it's immediately and incrementally profitable. And people aren't as widely and instinctively outraged by AI experiments as by bio-engineering experiments. So it's not intuitively repulsive.
So yes, this seems to me to be very much a situation in which we should not place any trust in unenforceable regulation.
I also agree that we probably do need some sort of organization which enforces the necessary protections (detection and destruction) against rogue AI.
And it does seem like a lot of human satisfaction could potentially be bought in the near future by making sure everyone in the world gets a reasonable minimum amount of satisfaction from their physical and social environments, as you describe here:
Usually, the median person is interested in: jobs, a full fridge, rituals, culture, the spread of their opinion leader's information, dopamine, political and other random and inherited values, life, continuation of life, and the like. Provide a universal way of obtaining this and just monitor it calmly.
As Connor Leahy has said, we should be able to build sufficiently powerful tool-AI to not need to build AGI! Stop while we still have control! Use the wealth to buy off those who would try anyway. Also, build an enforcement agency to stop runaway AI or AI misuse.
I don't know how we get there from here though.
Also, the offense-dominant weapons development landscape is looking really grim, and I don't see how to easily patch that.
On the other hand, I don't think we buy ourselves any chance of victory by trying to gag ourselves for fear of speeding up AGI development. It's coming soon regardless of what we do! The race is short now; we need to act fast!
I don't buy the arguments that our discussions here will make a significant impact in the timing of the arrival of AGI. That seems like hubris to me, to imagine we have such substantial effects, just from our discussions.
Code? Yes, code can be dangerous and shouldn't be published if so.
Sufficiently detailed technical descriptions of potential advancements? Yeah, I can see that being dangerous.
Unsubstantiated commentary about a published paper being interesting and potentially having both capabilities and alignment value? I am unconvinced that such discussions meaningfully impact the experiments being undertaken in AI labs.
the-gears-to-ascension on My hour of memoryless lucidity
geez, that's certainly a list of chemicals. I wonder what the ratios were - my intuition finds it less surprising for you to be less impaired if no one of them is particularly high dose.
martinkunev on Examples of Highly Counterfactual Discoveries?
I have previously used special relativity as an example of the opposite. It seems to me that the Michelson-Morley experiment laid the groundwork and all alternatives were more or less rejected by the time special relativity was formulated. This could be hindsight bias, though.
If Nobel Prizes are any indicator, then the photoelectric effect is probably more counterfactually impactful than special relativity.
migueldev on CLR's recent work on multi-agent systems
safe Pareto improvement (SPI)
This URL is broken.
Hmm. Seems... fragile. I don't think that's a reason not to do it, but I also wouldn't put much hope in the idea that leaks would be successfully prevented by this system.
nathan-helm-burger on Please stop publishing ideas/insights/research about AI
I think you make some valid points. In particular, I agree that some people seem to have fallen into a trap of being unrealistically pessimistic about AI outcomes, which mirrors the errors of those AI developers and cheerleaders who are being unrealistically optimistic.
On the other hand, I disagree with this critique (although I can see where you're coming from):
If it's instead a boring engineering problem, this stops being a quest to save the world or an all consuming issue. Incremental alignment work might solve it, so in order to preserve the difficulty of the issue, it will cause extinction for some far-fetched reason. Building precursor models then bootstrapping alignment might solve it, so this "foom" is invented and held on to (for a lot of highly speculative assumptions), because that would stop it from being a boring engineering problem that requires lots of effort and instead something a lone genius will have to solve.
I think that FOOM is a real risk, and I have a lot of evidence grounding my calculations about available algorithmic efficiency improvements, based on estimates of the compute of the human brain. The conclusion I draw from believing that FOOM is both possible, and indeed likely, once a certain threshold of AI R&D capability is reached by AI models, is that preventing/controlling FOOM is an engineering problem.
I don't think we should expect a model in training to become super-human so fast that it blows past our ability to evaluate it. I do think that in order to have the best chance of catching and controlling a rapidly accelerating take-off, we need to do pre-emptive engineering work. We need very comprehensive evals to have detailed measures of key factors like general capability, reasoning, deception, self-preservation, and agency. We need carefully designed high-security training facilities with air-gapped datacenters. We need regulation that prevents irresponsible actors from undertaking unsafe experiments. Indeed, most of the critical work in preventing uncontrolled rogue AGI due to FOOM is well described by 'boring engineering problems' or 'boring regulation and enforcement problems'.
Believing in the dangers of recursive self-improvement doesn't necessarily involve believing that the best solution is a genius theoretical answer to value and intent alignment. I wouldn't rule the chance of that out, but I certainly don't expect that slim possibility. It seems foolish to trust in that as the primary hope for humanity. Instead, let's focus on doing the necessary engineering and political work so that we can proceed with reasonable safety measures in place!
beck-stein on Funny Anecdote of Eliezer From His Sister
I am being told that Sheva Brachos in this example is the series of celebrations in the week after the wedding. I don't know if that's a correction or just context, but there you go.
metachirality on LessOnline (May 31—June 2, Berkeley, CA)
Isn't TLP's email on his website?