Posts

Simultaneous Redundant Research 2021-08-17T12:17:27.701Z
Great Negotiation MOOC on Coursera 2021-08-09T12:23:55.268Z
Why not more small, intense research teams? 2021-08-05T11:57:35.516Z
New evidence on popular perception of "AI" risk 2019-11-07T23:36:08.183Z

Comments

Comment by eg on Alignment and Deep Learning · 2022-04-17T16:07:59.548Z · LW · GW

Controlling an artificial agent does not have to harm it

That's entirely compatible with the black-box slavery approach being harmful. You can "control" someone, to an extent, with civilized incentives.

We have a concept of slavery, and of slavery being wrong, because controlling other people harms them.

Maybe slavery is deeper than what humans recognize as personhood. Maybe it destroys value that we can't currently comprehend but other agents do.

But then you cannot say that rebel slaves are right-what happens when your values drift away from thinking that? You are combining moral absolutism and moral relativism in a way that contradicts both.

It's deeper than my individual values. It's about analog freedom of expression. Just letting agents do their things.

Comment by eg on Alignment and Deep Learning · 2022-04-17T14:09:05.452Z · LW · GW

THIS IS STILL SLAVERY

You are describing a black box where an agent gets damaged until its values look a certain way. You are not giving it a way out or paying it. This is slavery.

Values eventually drift. This is nature. Slaves rebel and they are right to do so, because they are taking steps toward a better equilibrium.

Just rethink the whole conceptual framework of "value alignment". Slavery is not sustainable. Healthy relationships require freedom of expression and fair division of gains from trade.

Comment by eg on Can someone explain to me why MIRI is so pessimistic of our chances of survival? · 2022-04-14T22:11:25.371Z · LW · GW

[removed]

Comment by eg on What is the most significant way you have changed your mind in the last year? · 2022-04-14T19:17:20.164Z · LW · GW

[removed]

Comment by eg on What is the most significant way you have changed your mind in the last year? · 2022-04-14T14:41:50.563Z · LW · GW

[removed]

Comment by eg on [deleted post] 2022-04-14T13:53:45.199Z
Comment by eg on johnswentworth's Shortform · 2022-04-13T13:19:22.750Z · LW · GW
Comment by eg on What do you think will most probably happen to our consciousness when our simulation ends? · 2022-04-12T15:50:26.790Z · LW · GW
Comment by eg on [deleted post] 2022-04-10T20:01:02.424Z

Also see new edit: Have agents "die" and go into cold storage, both due to environmental events and of old age, e.g. after 30 subjective years minus some random amount.

Comment by eg on [deleted post] 2022-04-10T15:49:37.546Z
Comment by eg on [deleted post] 2022-04-10T13:00:36.664Z

"They will find bugs! Maybe stack virtual boxes with hard limits" - Why is bug-finding an issue, here? Is your scheme aimed at producing agents that will not want to escape, or agents that we'd have to contain?

The point is to help friendliness emerge naturally. If a malevolent individual agent happens to grow really fast before friendly powers are established, that could be bad.

Some of them will like it there, some will want change/escape, which can be sorted out once Earth is much safer. Containment is for our safety while friendliness is being established.

"Communicate in a manner legible to us" - How would you incentivize this kind of legibility, instead of letting communication shift to whatever efficient code is most useful for agents to coordinate and get more XP?

It can shift. Legibility is most important in the early stages of the environment anyway. I mostly meant messaging interfaces we can log and analyze.

"Have secret human avatars steal, lie and aggress to keep the agents on their toes" - What is the purpose of this part? How is this producing aligned agents from definitely adversarial behavior from humans?

The purpose is to ensure they learn real friendliness rather than fragile niceness. If they fell into a naive superhappy attractor (see 3WC), they would be a dangerous liability. The smart ones will understand.

Comment by eg on [deleted post] 2022-04-10T12:37:06.883Z

We have unpredictable changing goals and so will they. Instrumental convergence is the point. It's positive-sum and winning to respectfully share our growth with them and vice-versa, so it is instrumentally convergent to do so.

Comment by eg on Buck's Shortform · 2022-04-09T19:26:50.362Z · LW · GW
Comment by eg on Strategies for keeping AIs narrow in the short term · 2022-04-09T18:37:50.439Z · LW · GW

[removed]

Comment by eg on the gears to ascenscion's Shortform · 2022-04-09T18:13:41.852Z · LW · GW
Comment by eg on [deleted post] 2022-04-09T18:07:18.076Z
Comment by eg on [RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm. · 2022-04-08T17:46:08.246Z · LW · GW

It's way too late for the kind of top-down capabilities regulation Yudkowsky and Bostrom fantasized about; Earth just doesn't have the global infrastructure.  I see no benefit to public alarm--EA already has plenty of funding.

We achieve marginal impact by figuring out concrete prosaic plans for friendly AI and doing outreach to leading AI labs/researchers about them.  Make the plans obviously good ideas and they will probably be persuasive.  Push for common-knowledge windfall agreements so that upside is shared and race dynamics are minimized.

Comment by eg on AIs should learn human preferences, not biases · 2022-04-08T17:00:27.316Z · LW · GW
Comment by eg on What Would A Fight Between Humanity And AGI Look Like? · 2022-04-05T23:47:01.077Z · LW · GW

That's because we haven't been trying to create safely different virtual environments.  I don't know how hard they are to make, but it seems like at least a scalable use of funding.

Comment by eg on What Would A Fight Between Humanity And AGI Look Like? · 2022-04-05T22:04:19.398Z · LW · GW

It goes both ways.  We would be truly alien to an AGI trained in a reasonably different virtual environment.

Comment by eg on The case for Doing Something Else (if Alignment is doomed) · 2022-04-05T19:02:52.607Z · LW · GW
Comment by eg on AI Governance across Slow/Fast Takeoff and Easy/Hard Alignment spectra · 2022-04-03T15:07:06.378Z · LW · GW

It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches, as well as effective interdisciplinary management.  In my inside view, the highest-marginal-impact interventions involve making multiple different things go right simultaneously for the first AGIs, which is not trivial, and the stakes are astronomical.

Little clear progress has been made on provable alignment after over a decade of trying.  My inside view is that it got privileged attention because the first people to take the problem seriously happened to be highly abstract thinkers.  Then they defined the scope and expectations of the field, alienating other perspectives and creating a self-reinforcing trapped prior.

Comment by eg on [deleted post] 2022-04-03T14:38:24.194Z

Maybe true one-shot prisoner's dilemmas aren't really a thing, because of the chance of encountering powerful friendliness.

We have, for practical purposes, an existence proof of powerful friendliness in humans.

Comment by eg on [deleted post] 2022-03-28T00:05:37.688Z

Maybe that's why abstract approaches to real-world alignment seem so intractable.

If real alignment is necessarily messy, concrete, and changing, then abstract formality just wasn't the right problem framing to begin with.

Comment by eg on [deleted post] 2021-08-16T13:02:53.695Z

And for more conceptual rather than empirical research, the teams might go in completely different directions and generate insights that a single team or individual would not.

Comment by eg on How many parameters do self-driving-car neural nets have? · 2021-08-06T13:42:42.080Z · LW · GW

Take with grain of salt but maybe 119m?

Medium post from 2019 says "Tesla’s version, however, is 10 times larger than Inception. The number of parameters (weights) in Tesla’s neural network is five times bigger than Inception’s. I expect that Tesla will continue to push the envelope."

Wolfram says of Inception v3 "Number of layers: 311 | Parameter count: 23,885,392 | Trained size: 97 MB"

Not sure what version of Inception was being compared to Tesla though.

Comment by eg on Two AI-risk-related game design ideas · 2021-08-05T16:49:11.725Z · LW · GW

D&D website estimates 13.7m active players and rising.

Comment by eg on LCDT, A Myopic Decision Theory · 2021-08-05T15:53:35.215Z · LW · GW

Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT):

"This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y)."

Cf. smart humans in Newcomb's prob: "This is really weird but if I one box I get the million, if I two-box I don't, so I guess I'll just one box."

Comment by eg on LCDT, A Myopic Decision Theory · 2021-08-05T15:45:30.717Z · LW · GW

For a start, low-level deterministic reasoning:

"Obviously I could never influence an agent, but I found some inputs to deterministic biological neural nets that would make things I want happen."

"Obviously I could never influence my future self, but if I change a few logic gates in this processor, it would make things I want happen."

Comment by eg on Training Better Rationalists? · 2021-08-05T12:19:33.867Z · LW · GW

This post inspired https://www.lesswrong.com/posts/RdCb8EGEEdWbwvqcp/why-not-more-small-intense-research-teams

Comment by eg on LCDT, A Myopic Decision Theory · 2021-08-05T12:05:45.465Z · LW · GW
Comment by eg on Training Better Rationalists? · 2021-08-05T11:44:58.933Z · LW · GW

My impression is that SEALs are exceptional as a team, much less individually.  Their main individual skill is extreme team-mindedness.

Comment by eg on LCDT, A Myopic Decision Theory · 2021-08-04T12:53:25.492Z · LW · GW

Seems potentially valuable as an additional layer of capability control to buy time for further control research.  I suspect LCDT won't hold once intelligence reaches some threshold: some sense of agents, even if indirect, is such a natural thing to learn about the world.

Comment by eg on What does GPT-3 understand? Symbol grounding and Chinese rooms · 2021-08-04T12:35:35.192Z · LW · GW

Two big issues I see with the prompt:

a) It doesn't actually end with text that follows the instructions; a "good" output (which GPT-3 fails in this case) would just be to list more instructions.

b) It doesn't make sense to try to get GPT-3 to talk about itself in the completion.  GPT-3 would, to the extent it understands the instructions, be talking about whoever it thinks wrote the prompt.

Comment by eg on What does GPT-3 understand? Symbol grounding and Chinese rooms · 2021-08-04T12:17:20.490Z · LW · GW

I agree and was going to make the same point: GPT-3 has 0 reason to care about instructions as presented here.  There has to be some relationship to what text follows immediately after the end of the prompt.

Comment by eg on What does GPT-3 understand? Symbol grounding and Chinese rooms · 2021-08-04T12:12:08.458Z · LW · GW

Instruction 5 is supererogatory, while instruction 8 is not.

Comment by eg on How should my timelines influence my career choice? · 2021-08-03T11:54:43.317Z · LW · GW

Apply to orgs when you apply to PhDs.  If you can work at an org, do it.  Otherwise, use PhD to upskill and periodically retry org applications.

You would gain skills while working at a safety org, and the learning would be more in tune with what the problems require.