Review of "Learning Normativity: A Research Agenda" 2021-06-06T13:33:28.371Z
Review of "Fun with +12 OOMs of Compute" 2021-03-28T14:55:36.984Z
Learning from counterfactuals 2020-11-25T23:07:43.935Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
Resources for AI Alignment Cartography 2020-04-04T14:20:10.851Z
SSC Meetups Everywhere: Toulouse 2019-09-10T19:17:34.732Z
Layers of Expertise and the Curse of Curiosity 2019-02-12T23:41:45.980Z
Willpower duality 2017-01-20T09:56:50.441Z
Open thread, Oct. 31 - Nov. 6, 2016 2016-10-31T21:24:05.923Z


Comment by Gyrodiot on Open Thread: June 2023 (Inline Reacts!) · 2023-06-14T13:43:13.210Z · LW · GW

The Mindcrime tag might be relevant here! More specific than both concepts you mentioned, though. Which posts discussing them were you alluding to? Might be an opportunity to create an extra tag.

(also, yes, this is an Open Thread, your comment is in the right place)

Comment by Gyrodiot on Is AI Safety dropping the ball on privacy? · 2023-03-19T10:45:50.953Z · LW · GW

Strongly upvoted for the clear write-up, thank you for that, and engagement with a potentially neglected issue.

Following your post I'd distinguish two issues:

(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally: your data is just there for the taking, stored in not-so-well-protected databases; cross-referencing is easier at higher capability levels; singling you out and fine-tuning a behavioral model on you in particular isn't hard;

(b) Lack of data privacy enabling a powerful future agent to build that generic behavioral model of humans from the thousands/millions of well-documented examples from people who aren't particularly bothered by privacy, from the same databases as above, plus simply (semi-)public social media records.

From your deception examples we already have strong evidence that (b) is possible. LLM capabilities will get better, and it will get worse when [redacted plausible scenario because my infohazard policies are ringing].

If (b) comes to pass, I would argue that the marginal effort needed to prevent (a) would only be useful to prevent certain whole coordinated groups of people (who should already be infosec-aware) from being manipulated. Rephrased: there are already a ton of epistemic failures all over the place, but maybe there can be pockets of sanity linked to critical assets.

I may be missing something as well. Also seconding the Seed webtoon recommendation.

Comment by Gyrodiot on Selection Theorems: A Program For Understanding Agents · 2023-01-29T21:40:16.622Z · LW · GW

Quick review of the review, this could indeed make a very good top-level post.

Comment by Gyrodiot on Updates on FLI's Value Aligment Map? · 2022-09-24T07:25:09.071Z · LW · GW

No need to apologize, I'm usually late as well!

I don't think there is a great answer to "What is the most comprehensive repository of resources on the work being done in AI Safety?"

There is no great answer, but I am compelled to list some of the few I know of (that I wanted to update my Resources post with):

  • Vael Gates's transcripts, which attempt to cover multiple views but, by the nature of conversations, aren't very legible;
  • The Stampy project to build a comprehensive AGI safety FAQ, and to go beyond questions only, they do need motivated people;
  • Issa Rice's AI Watch, which is definitely stuck in a corner of the Internet, if I didn't work with Issa I would never have discovered it, lots of data about orgs, people and labs, not much context.

Other mapping resources involve not the work being done but arguments and scenarios, as an example there's Lukas Trötzmüller's excellent argument compilation, but that wouldn't exactly help someone get into the field faster.

Just in case you don't know about it there's the AI alignment field-building tag on LW, which mentions an initiative run by plex, who also coordinates Stampy.

I'd be interested in reviewing stuff, yes, time permitting!

Comment by Gyrodiot on Updates on FLI's Value Aligment Map? · 2022-09-18T05:42:50.845Z · LW · GW

Answers in order: there is none, there were, there are none yet.

(Context starts, feel free to skip, this is the first time I can share this story)

After posting this, I was contacted by Richard Mallah, who (if memory serves right) created the map, compiled the references and wrote most of the text in 2017, to help with the next iteration of the map. The goal was to build a Body of Knowledge for AI Safety, including AGI topics but also more current-capabilities ML Safety methods.

This was going to happen in conjunction with the contributions of many academic & industry stakeholders, under the umbrella of CLAIS (Consortium on the Landscape of AI Safety), mentioned here.

There were design documents for the interactivity of the resource, and I volunteered. Back in 2020, I had severely overestimated both my web development skills and my ability to work during a lockdown; I never published a prototype interface, and for unrelated reasons the CLAIS project... wound down.

(End of context)

I do not remember Richard mentioning a review of the map contents, apart from the feedback he received back when he wrote them. The map has been a bit tucked in a corner of the Internet for a while now.

The plans to update/expand it failed as far as I can tell. There is no new version and I'm not aware of any new plans to create one. I stopped working on this in April 2021.

There is no current map with this level of interactivity and visualization, but there have been a number of initiatives trying to be more comprehensive and up-to-date!

Comment by Gyrodiot on AllAmericanBreakfast's Shortform · 2022-09-01T10:07:00.277Z · LW · GW

I second this, and expansions of these ideas.

Comment by Gyrodiot on The Alignment Problem · 2022-07-15T07:37:31.076Z · LW · GW

Thank you, that is clearer!

Comment by Gyrodiot on The Alignment Problem · 2022-07-12T09:08:39.514Z · LW · GW

But let's suppose that the first team of people who build a superintelligence first decide not to turn the machine on and immediately surrender our future to it. Suppose they recognize the danger and decide not to press "run" until they have solved alignment.

The section ends here but... isn't there a paragraph missing? I was expecting the standard continuation along the lines of "Will the second team make the same decision, once they reach the same capability? Will the third, or the fourth?" and so on.

Comment by Gyrodiot on Paradigms of AI alignment: components and enablers · 2022-06-04T12:26:14.953Z · LW · GW

Thank you for this post, I find this distinction very useful and would like to see more of it. Has the talk been recorded, by any chance (or will you give it again)?

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-28T19:44:59.194Z · LW · GW

Thank you, that was my understanding. Looking forward to the second competition! And good luck sorting out all the submissions for this one.

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-28T08:22:29.760Z · LW · GW

[Meta comment]

The deadline is past, should we keep the submissions coming or is it too late? Some of the best arguments I could find elsewhere are rather long, in the vein of the Superintelligence FAQ. I did not want to copy-paste chunks of it and the arguments stand better as part of a longer format.

Anyway, signalling that the lack of money incentive will not stop me from trying to generate more compelling arguments... but I'd rather do it in French instead of posting here (I'm currently working on some video scripts on AI alignment, there's not enough French content of that type).

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-28T00:27:15.532Z · LW · GW

(Policymakers) We have a good idea of what makes bridges safe, through physics, materials science and rigorous testing. We can anticipate the conditions they'll operate in.

The very point of powerful AI systems is to operate in complex environments better than we can anticipate. Computer science can offer no guarantees if we don't even know what to check. Safety measures aren't catching up quickly enough.

We are somehow tolerating the mistakes of current AI systems. Nothing's ready for the next scale-up.

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T23:27:25.342Z · LW · GW

(ML researchers) We still don't have a robust solution to specification gaming: powerful agents find ways to get high reward, but not in the way you'd want. Sure, you can tweak your objective, add rules, but this doesn't solve the core problem, that your agent doesn't seek what you want, only a rough operational translation.

What would a high-fidelity translation look like? How would you create a system that doesn't try to game you?
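To make the point concrete, here is a deliberately tiny toy of my own (not from the thread): the designer intends "reach the flag", but the operational objective rewards distance covered. An exhaustive search over short action sequences happily optimizes the proxy, pacing back and forth instead of ever reaching the flag.

```python
# Toy specification gaming: the proxy reward (distance covered) diverges
# from the intended goal (end up at the flag). All names here are my own
# illustration, not from the original discussion.
from itertools import product

FLAG = 2  # intended goal position on a 1-D track

def proxy_reward(actions):
    """Operational objective: total distance covered, |step| summed."""
    return sum(abs(a) for a in actions)

def reaches_flag(actions):
    """Intended objective: does the agent end up at the flag?"""
    pos = 0
    for a in actions:
        pos += a
    return pos == FLAG

# Exhaustive search over 4-step policies with steps in {-1, 0, 1}.
best = max(product((-1, 0, 1), repeat=4), key=proxy_reward)
# `best` maximizes the proxy (4 units of distance) yet never reaches
# the flag, while the "intended" policy (1, 1, 0, 0) scores lower.
```

The search is doing exactly what it was told; the divergence lives entirely in the objective, which is the core of the comment above.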

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T22:43:44.894Z · LW · GW

(Policymakers) There is outrage right now about AI systems amplifying discrimination and polarizing discourse. Consider that this was discovered after they were widely deployed. We still don't know how to make them fair. This isn't even much of a priority.

Those are the visible, current failures. Given current trajectories and lack of foresight of AI research, more severe failures will happen in more critical situations, without us knowing how to prevent them. With better priorities, this need not happen.

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T21:54:51.388Z · LW · GW

(Tech execs) "Don’t ask if artificial intelligence is good or fair, ask how it shifts power". As a corollary, if your AI system is powerful enough to bypass human intervention, it surely won't be fair, nor good.

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T21:47:05.449Z · LW · GW

(ML researchers) Most policies are unsafe in a large enough search space; have you designed yours well, or are you optimizing through a minefield?

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T21:35:14.578Z · LW · GW

(Policymakers) AI systems are very much unlike humans. AI research isn't trying to replicate the human brain; the goal is, however, to be better than humans at certain tasks. For the AI industry, better means cheaper, faster, more precise, more reliable. A plane flies faster than birds, we don't care if it needs more fuel. Some properties are important (here, speed), some aren't (here, consumption).

When developing current AI systems, we're focusing on speed and precision, and we don't care about unintended outcomes. This isn't an issue for most systems: a plane autopilot isn't taking actions a human pilot couldn't; a human is always there.

However, this constant supervision is expensive and slow. We'd like our machines to be autonomous and quick. They perform well on the "important" things, so why not give them more power? Except, here, we're creating powerful, faster machines that will reliably do things we didn't have time to think about. We made them to be faster than us, so we won't have time to react to unintended consequences.

This complacency will lead us to unexpected outcomes. The more powerful the systems, the worse they may be.

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T21:00:34.950Z · LW · GW

(Tech execs) Tax optimization is indeed optimization under the constraints of the tax code. People aren't just stumbling on loopholes, they're actually seeking them, not for the thrill of it, but because money is a strong incentive.

Consider now AI systems, built to maximize a given indicator, seeking whatever strategy is best, following your rules. They will get very creative with them, not for the thrill of it, but because it wins.

Good faith rules and heuristics are no match for adverse optimization.

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T20:47:56.991Z · LW · GW

(ML researchers) Powerful agents are able to search through a wide range of actions. The more efficient the search, the better the actions, the higher the rewards. So we are building agents that are searching in bigger and bigger spaces.

For a classic pathfinding algorithm, some paths are suboptimal, but all of them are safe, because they follow the map. For a self-driving car, some paths are suboptimal, but some are unsafe. There is no guarantee that the optimal path is safe, because we really don't know how to tell what is safe or not, yet.

A more efficient search isn't a safer search!
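That contrast can be sketched in a few lines, on a hypothetical grid of my own devising: breadth-first search returns a shortest path, but since the objective never mentions safety, the optimal path can run straight through cells marked unsafe.

```python
# Toy illustration (mine, not from the thread): BFS optimizes path length
# only. "U" cells are traversable but unsafe; the planner cannot avoid
# them because safety is not part of its objective.
from collections import deque

GRID = [
    "SX..",
    "U.X.",
    "U...",
    "...G",
]  # S start, G goal, X wall, U unsafe-but-traversable, . free

def bfs_shortest_path(grid):
    rows, cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "S")
    goal = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "G")
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] != "X" and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(path + [(nr, nc)])
    return None  # no path exists

path = bfs_shortest_path(GRID)
unsafe_cells = [(r, c) for (r, c) in path if GRID[r][c] == "U"]
```

On this grid every route to the goal passes through an unsafe cell, so the "optimal" path is guaranteed unsafe; a better search algorithm would only find it faster.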

Comment by Gyrodiot on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T20:31:45.196Z · LW · GW

(Policymakers) The goals and rules we're putting into machines are law to them. What we're doing right now is making them really good at following the letter of this law, but not the spirit.

Whatever we really mean by those rules, is lost on the machine. Our ethics don't translate well. Therein lies the danger: competent, obedient, blind, just following the rules.

Comment by Gyrodiot on Ruling Out Everything Else · 2022-05-20T13:29:23.453Z · LW · GW

Thank you for curating this, I had missed this one and it does provide a useful model of trying to point to particular concepts.

Comment by Gyrodiot on AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk - Request for Participation (*Closed*) · 2022-04-28T00:04:02.999Z · LW · GW

Hi! Thank you for this project, I'll attempt to fill the survey.

My apologies if you already encountered the following extra sources I think are relevant to this post:

Comment by Gyrodiot on [deleted post] 2022-04-09T22:37:27.147Z

Hi! Thank you for this outline. I would like some extra details on the following points:

  • "They will find bugs! Maybe stack virtual boxes with hard limits" - Why is bug-finding an issue, here? Is your scheme aimed at producing agents that will not want to escape, or agents that we'd have to contain?
  • "Communicate in a manner legible to us" - How would you incentivize this kind of legibility, instead of letting communication shift to whatever efficient code is most useful for agents to coordinate and get more XP?
  • "Have secret human avatars steal, lie and aggress to keep the agents on their toes" - What is the purpose of this part? How is this producing aligned agents from definitely adversarial behavior from humans?
Comment by Gyrodiot on AMA Conjecture, A New Alignment Startup · 2022-04-09T16:21:13.695Z · LW · GW

Congratulations on your launch!

Like Michaël Trazzi in the other post, I'm interested in the kind of products you'll develop, but more specifically in how the for-profit part interacts with both the conceptual research part and the incubator part. Are you expecting the latter two to yield new products as they make progress? Do these activities have different enough near-term goals that they mostly just coexist within Conjecture?

(also, looking forward to the pluralism sequence, this sounds great)

Comment by Gyrodiot on A method of writing content easily with little anxiety · 2022-04-09T08:46:20.359Z · LW · GW

Thank you for this, I resonate with this a lot. I have written an essay about this process, a while ago: Always go full autocomplete. One of its conclusions:

It cannot be trained by expecting perfection from the start. It's trained by going full autocomplete, and reflecting on the result, not by dreaming up what the result could be. Now I wrote all that, I have evidence that it works.

Comment by Gyrodiot on Late 2021 MIRI Conversations: AMA / Discussion · 2022-03-05T10:51:17.819Z · LW · GW

The compression idea evokes Kaj Sotala's summary/analysis of the AI-Foom Debate (which I found quite useful at the time). I support the idea, especially given it has taken a while for the participants to settle on things cruxy enough to discuss and so on. Though I would also be interested in "look, these two disagree on that, but look at all the very fundamental things about AI alignment they agree on".

Comment by Gyrodiot on Late 2021 MIRI Conversations: AMA / Discussion · 2022-03-02T22:33:14.676Z · LW · GW

I finished reading all the conversations a few hours ago. I have no follow-up questions (except maybe "now what?"), I'm still updating from all those words.

One excerpt in particular, from the latest post, jumped out at me (from Eliezer Yudkowsky, emphasis mine):

This is not aimed particularly at you, but I hope the reader may understand something of why Eliezer Yudkowsky goes about sounding so gloomy all the time about other people's prospects for noticing what will kill them, by themselves, without Eliezer constantly hovering over their shoulder every minute prompting them with almost all of the answer.

The past years of reading about alignment have left me with an intense initial distrust of any alignment research agenda. Maybe it's ordinary paranoia, maybe something more. I've not come up with any new ideas myself, and I'm not particularly confident in my ability to find flaws in someone else's proposal (what if I'm not smart enough to understand them properly? What if I make things even more confused and waste everyone's time?)

After thousands and thousands of lengthy conversations where it takes everyone ages to understand where threat models disagree, why some avenue of research is promising or not, and what is behind words (there was a whimper in my mind when the meaning/usage of corrigibility was discussed, as if this whole time experts had been talking past each other)...

... after all that, I get this strong urge to create something like Arbital to explain everything. Or maybe something simpler like Stampy. I don't know if it would help much, the confusion is just very frustrating. When I'm facilitating discussions, trying to bring more people into the field, I insist on how not-settled many posts are, the kind of failure modes you have to watch out for.

Also this gives me an extra push to try harder, publish more things, ask more questions, because I'm getting more desperate to make progress. So, thank you for publishing this sequence.

Comment by Gyrodiot on [deleted post] 2021-11-18T20:51:59.780Z

So, assuming an unaligned agent here.

If your agent isn't aware that its compute cycles are limited (i.e. the compute constraint is part of the math problem), then you have three cases: (1a) the agent doesn't hit the limit with its standard search, you're in luck; (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely; (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work. In practice, if you're in case 1b and it apparently fails safely, you have an incentive to just increase the limit.

If your agent is indeed aware of the constraint, then it has an incentive to remove it, or increase the limit by other means. Three cases here again: (2a) identical to 1a, you're in luck; (2b) the limit is low enough that strategic action to remove the constraint is impossible, the agent fails "safely"; (2c) the agent finds a way to remove the constraint, and you're in very unsafe territory.

Two observations from there: first, ideally you'd want your agent to operate safely even if given unbounded cycles, that's the Omni Test. Second, there's indeed an alignment concept for agents that just try to solve the problem without long-term planning, that's Myopia (and defining it formally is... hard).
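Case (1b) can be sketched in a few lines (a toy of my own, assuming the "math problem" is a brute-force search with a hard cycle budget baked in): the solver either succeeds within budget or returns nothing, failing safely, with no channel to ask for more compute.

```python
# Toy bounded search: one "cycle" is spent per candidate checked.
# When the budget runs out, the search fails closed instead of
# continuing. All names are my own illustration.
from itertools import product

def bounded_search(predicate, alphabet, max_len, budget):
    """Enumerate candidate strings; stop when the budget is exhausted."""
    spent = 0
    for length in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=length):
            if spent >= budget:
                return None, spent  # budget exhausted: fail, but safely
            spent += 1
            if predicate("".join(candidate)):
                return "".join(candidate), spent
    return None, spent  # search space exhausted

# Easy instance, found well within budget (case 1a).
hit, used = bounded_search(lambda s: s == "ab", "abc", 3, budget=1000)
# Hard instance relative to budget: always fails, safely (case 1b).
miss, used2 = bounded_search(lambda s: s == "ccc", "abc", 3, budget=5)
```

The unsafe territory of (2c) starts exactly when the budget check above is something the agent can model and route around, which this toy cannot.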

Comment by Gyrodiot on [deleted post] 2021-11-18T16:13:51.640Z

I am confused by the problem statement. What you're asking for is a generic tool, something that doesn't need information about the world to be created, but that I can then feed information about the real world and it will become very useful.

My problem is that the real world is rich, and feeding the tool with all relevant information will be expensive, and the more complicated the math problem is, the more safety issues you get.

I cannot rely on "don't worry if the Task AI is not aligned, we'll just feed it harmless problems": the risk comes from what the AI will do to get to the solution. If the problem is hard, and you want to defer the search to a tool powerful enough that you have to choose your inputs carefully or catastrophe happens, you don't want to build that tool.

Comment by Gyrodiot on Unteachable Excellence · 2021-11-18T08:54:04.390Z · LW · GW

“Knowledge,” said the Alchemist, “is harder to transmit than anyone appreciates. One can write down the structure of a certain arch, or the tactical considerations behind a certain strategy. But above those are higher skills, skills we cannot name or appreciate. Caesar could glance at a battlefield and know precisely which lines were reliable and which were about to break. Vitruvius could see a great basilica in his mind’s eye, every wall and column snapping into place. We call this wisdom. It is not unteachable, but neither can it be taught. Do you understand?”


Quoted from Ars Longa, Vita Brevis.

Comment by Gyrodiot on What are the mutual benefits of AGI-human collaboration that would otherwise be unobtainable? · 2021-11-17T21:15:36.146Z · LW · GW

I second Charlie Steiner's questions, and add my own: why collaboration? A nice property of an (aligned) AGI would be that we could defer activities to it... I would even say that the full extent of "do what we want" at superhuman level would encompass pretty much everything we care about (assuming, again, alignment).

Comment by Gyrodiot on Super intelligent AIs that don't require alignment · 2021-11-17T21:04:31.199Z · LW · GW

Hi! Thank you for writing this and suggesting solutions. I have a number of points to discuss. Apologies in advance for all the references to Arbital, it's a really nice resource.

The AI will hack the system and produce outputs that it's not theoretically meant to be able to produce at all.

In the first paragraphs following this, you describe this first kind of misalignment as an engineering problem, where you try to guarantee that the instructions that are run on the hardware correspond exactly to the code you are running; being robust from hardware tampering.

I argue that this is actually a subset of the second kind of misalignment. You may have solved the initial engineering problem that at the start the hardware does what the software says, but the agent's own hardware is part of the world, and so can plausibly be influenced by whatever the agent outputs.

You can attempt to specifically bar the agent from taking actions that target its hardware; that is not a hardware problem, but your second kind of misalignment. For any sufficiently advanced agent, which may find cleverer strategies than the cleverest hacker, no hardware is safe.

Plus, the agent's hardware may have more parts than you expect as long as it can interact with the outside world. We still have a long way to go before being confident about that part.

Of course the problem with this oracle is that it's far too inefficient. On every single run we can get at most 1 bit of information, but for that one bit of information we're running a superhuman artificial intelligence. By the time it becomes practical, ordinary superhuman AIs will have been developed by someone else and destroyed the world.

There are other problems, for instance: how can you be sure that the agent hasn't figured out how to game the automated theorem prover to validate its proofs? Your conclusion seems to be that if we manage to make it safe enough, it will become impractical enough. But if you try to get more than one bit of information, you run into other issues.

This satisfies our second requirement - we can verify the AIs solution, so we can tell if it's lying. There's also some FNP problems which satisfy the first requirement - there's only one right answer. For example, finding the prime factors of an integer.

Here the verification process is no longer an automated process, it's us. You correctly point out that most useful problems have various possible solutions, and the more information we feed the agent, the more likely it will be able to find some solution that exploits our flaws and... start a nuclear war, in your example.

I am confused by your setup, which seems to be trying to make it harder for the agent to harm us, when it shouldn't even be trying to harm us in the first place.

In other words, I'm asking: is there a hidden assumption that, in the process of solving FNP problems, the agent will need to explore dangerous plans?

A superintelligent FNP problem solver would be a huge boon towards building AIs that provably had properties which are useful for alignment. Maybe it's possible to reduce the question "build an aligned AI" to an FNP problem, and even if not, some sub-parts of that problem definitely should be reducible.

I would say that building a safe superintelligent FNP solver requires solving AI alignment in the first place. A less powerful FNP solver could maybe help with sub-parts of the problem... which ones?

Comment by Gyrodiot on [Review] Edge of Tomorrow (2014) · 2021-09-08T17:24:58.523Z · LW · GW

If I'm correct and you're talking about

you might want to add spoiler tags.

Comment by Gyrodiot on Looking Deeper at Deconfusion · 2021-06-16T00:58:23.287Z · LW · GW

I'm taking the liberty of pointing to Adam's DBLP page.

Comment by Gyrodiot on Why We Launched LessWrong.SubStack · 2021-04-01T10:56:13.969Z · LW · GW

All my hopes for this new subscription model! The use of NFTs for posts will, without a doubt, ensure that quality writing remains forever in the Blockchain (it's like the Cloud, but with better structure). Typos included.

Is there a plan to invest in old posts' NFTs that will be minted from the archive? I figure Habryka already holds them all, and selling vintage Sequences NFT to the highest bidder could be a nice addition to LessWrong's finances (imagine the added value of having a complete set of posts!)

Also, in the event that this model doesn't pan out, will the exclusive posts be released for free? It would be an excruciating loss for the community to have those insights sealed off.

Comment by Gyrodiot on Babble Challenge: 50 Ways to Overcome Impostor Syndrome · 2021-03-20T13:06:28.275Z · LW · GW

My familiarity with the topic gives me enough confidence to join this challenge!

  1. Write down your own criticism so it no longer feels fresh
  2. Have your criticism read aloud to you by someone else
  3. Argue back to this criticism
  4. Write down your counter-arguments so they stick
  5. Document your own progress
  6. Get testimonials and references even when you don't "need" them
  7. Praise the competence of other people without adding self-deprecation
  8. Same as above but in their vicinity so they'll feel compelled to praise you back
  9. Teach the basics of your field to newcomers
  10. Teach the basics of your field to experts from other fields
  11. Write down the basics of your field, for yourself
  12. Ask someone else to make your beverage of choice
  13. Ask them to tell you "you deserve it" when they're giving it to you
  14. If your instinct is to reply "no I don't", consider swapping the roles
  15. Drink your beverage, because it feels nice
  16. Build stuff that cannot possibly be built by chance alone
  17. Stare outside the window, wondering if anybody cares about you
  18. Consider a world where everyone is as insecure as you
  19. Ask friends about their insecurities
  20. Consider you're too stupid to drink a glass of water, then drink some water
  21. Meditate on the difference between map and territory
  22. Write instructions for the non-impostor version of you
  23. Write instructions for whoever replaces you when people find out you're an impostor
  24. Validate those instructions with other experts, passing it off as project planning
  25. Follow the instructions to keep the masquerade on
  26. Refine the instructions since they're "obviously" not perfect
  27. Publish the whole thing here, get loads of karma
  28. Document everything you don't know for reference
  29. Publish the thing as a list of open problems
  30. Criticize harshly other people's work to see how they take it
  31. Make amends by letting them criticize you
  32. Use all this bitterness to create a legendary academic rivalry
  33. Consider "impostor" as a cheap rhetorical attack that doesn't hold up
  34. Become very good at explaining why other people are better than you
  35. Publish the whole thing as in-depth reporting of the life of scientists
  36. Focus on your deadline, time doesn't care if you're an impostor or not
  37. Make yourself lunch, balance on one foot, solve a sudoku puzzle
  38. Meditate on the fact you actually can do several complex things well
  39. Consider that competence is not about knowing exactly how one does things
  40. Have motivational pictures near you and argue how they don't apply to you
  41. Consider the absurdity of arguing with pictures
  42. Do interesting things instead, not because you have to, but to evade the absurdity
  43. Practice the "I have no idea what I'm doing, but no one does" stance
  44. Ask people why they think they know how they do things
  45. If they start experiencing impostor syndrome as well, support them
  46. Join a club of impostors, to learn from better impostors than you
  47. Write an apology letter to everyone you think you've duped
  48. Simulate the outrage of anyone reading this letter
  49. Cut ties with everyone who would actually treat you badly after reading
  50. Sleep well, eat well, exercise, brush your teeth, take care of yourself
Comment by Gyrodiot on Google’s Ethical AI team and AI Safety · 2021-02-20T23:25:05.599Z · LW · GW

I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.

I would argue that the specific politics inherent in these events are exactly why I don't want to approach them. From the outside, the mix of corporate politics, reputation management, culture war (even the boring part), all of which belong in the giant near-opaque system that is Google, is a distraction from the underlying (indeed important) AI governance problems.

For that particular series of events, I already got all the governance-relevant information I needed from the paper that apparently made the dominoes fall. I don't want my attention to get caught in the whirlwind. It's too messy (and still is after months). It's too shiny. It's not tractable for me. It would be an opportunity cost. So I take a deep breath and avert my eyes.

Comment by Gyrodiot on Suggestions of posts on the AF to review · 2021-02-16T21:27:46.887Z · LW · GW

My gratitude for the already posted suggestions (keep them coming!) - I'm looking forward to working on the reviews. My personal motivation resonates a lot with the "help people navigate the field" part; in-depth reviews are a precious resource for this task.

Comment by Gyrodiot on some random parenting ideas · 2021-02-13T21:11:48.910Z · LW · GW

This is one of the rare times I can in good faith use the prefix "as a parent...", so thank you for the opportunity.

So, as a parent, lots of good ideas here. Some I couldn't implement in time, some that are very dependent on living conditions (finding space for the trampoline is a bit difficult at the moment), some that are nice reminders (swamp water, bad indeed), some that are too early (because they can't read yet)...

... but most importantly, some that genuinely blindsided me, because I found myself agreeing with them, and they were outside my thought process! The one-Brilliant-problem a day one, the let-them-eat-more-cookies, mainly.

I appreciate, in particular, the breadth of the ideas. Thanks for sharing, even if you don't practice what you preach, you'll be able to get feedback.

Comment by Gyrodiot on Last day of voting for the 2019 review! · 2021-01-25T23:51:23.624Z · LW · GW

After several nudges (which I'm grateful for, in hindsight), my votes are in.

Comment by Gyrodiot on Luna Lovegood and the Chamber of Secrets - Part 1 · 2020-11-26T10:49:04.274Z · LW · GW

This is very nice. I subscribed for the upcoming parts (there will be more, I suppose?)

Comment by Gyrodiot on Learning from counterfactuals · 2020-11-26T10:37:38.293Z · LW · GW

I think not mixing up the referents is the hard part. One can properly learn from fictional territory when one can clearly see in which ways it's a good representation of reality, and where it isn't.

I may learn from an action movie the value of grit and what it feels like to have principles, but I wouldn't trust one on gun safety or CPR.

It's not common for fiction to be self-consistent and still preserve drama. Acceptable breaks from reality will happen, and sure, sometimes you may have a hard SF universe where the alternate reality is very lawful and the plot arises from the logical consequences of these laws (this often happens in rationalfic), but more often than not things happen "because it serves the plot".

My point is, yes, I agree, one should be confused only by a lack of self-consistency, fiction or not. Yet, given the vast amount of fiction that is set in something close to real Earth, by the time you're skilled enough to tell apart what's transferable and what isn't, you've already done most of the learning.

That's not counting the meta-skill of detecting inconsistencies, which is indeed extremely useful, fiction or not, though I'm still unclear on where exactly one learns it.

Comment by Gyrodiot on Why those who care about catastrophic and existential risk should care about autonomous weapons · 2020-11-12T14:59:26.645Z · LW · GW

Thank you for this clear and well-argued piece.

From my reading, I consider three main features of AWSs in order to evaluate the risk they present:

  • arms race avoidance: I agree that the proliferation of AWSs is a good test bed for international coordination on safety, which extends to the widespread implementation of safe powerful AI systems in general. I'd say this extends to AGI, where we would need all (or at least the first, or only some, depending on takeoff speeds) such deployed systems to conform to safety standards.
  • leverage: I agree that AWSs would have much greater damage/casualties per cost, or per human operator. I have a question regarding persistent autonomous weapons which, much like landmines, do not require human operators at all once deployed: what, in that case, would be the limiting component of their operation? Ammo, energy supply?
  • value alignment: the relevance of this AI safety problem to the discussion would depend, in my opinion, on what exactly is included in the OODA loop of AWSs. Would weapon systems be able to act in ways that enable their continued operation without frequent human input? Would they have means other than weapons to influence their environment? If they don't, is the worst-case damage they can do capped at the destruction capabilities they have at launch?

I would be interested in a further investigation of the risks brought by various kinds of autonomy, expected time between human command and impact, etc.

Comment by Gyrodiot on What are Examples of Great Distillers? · 2020-11-12T14:25:34.317Z · LW · GW

To clarify the question, would a good distiller be one (or more) of:

  • a good textbook writer? or state-of-the-art review writer?
  • a good blog post writer on a particular academic topic?
  • a good science communicator or teacher, through books, videos, tweets, whatever?

Based on the level of articles in Distill I wouldn't expect producers of introductory material to fit your definition, but if advanced material counts, I'd nominate Adrian Colyer for Computer Science (I'll put this in a proper answer with extra names based on your reply).

Comment by Gyrodiot on The (Unofficial) Less Wrong Comment Challenge · 2020-11-11T20:15:35.926Z · LW · GW

I was indeed wondering about it as I just read your first comment :D

For extra convenience you could even comment again with your alt account (wait, which is the main? Which is the alt? Does it matter?)

Comment by Gyrodiot on The (Unofficial) Less Wrong Comment Challenge · 2020-11-11T20:12:39.809Z · LW · GW

The original comment seems to have been edited into a sharper statement (thanks, D0TheMath); I hope it's enough to clear things up.

I agree this qualifier pattern is harmful in the context of collective action problems, where mutual trust and commitment have to be more firmly established. I don't believe we're in that context, hence my comment.

Comment by Gyrodiot on The (Unofficial) Less Wrong Comment Challenge · 2020-11-11T19:24:49.522Z · LW · GW

I interpret the quoted statement as "I am willing to make an effort that I don't usually do, by commenting more, based on your assessment of the importance of giving feedback", assuming good faith.

There's uncertainty, of course, as to whether it will actually turn out to be important. "I can try" suggests they will try even if they don't know, and we won't know if they will succeed until they try.

Yes, you can interpret the statement in a way that is uncharitable with respect to their goodwill, but that is not, in my opinion, conducive to healthy comment sections in general.

Comment by Gyrodiot on The (Unofficial) Less Wrong Comment Challenge · 2020-11-11T14:26:05.987Z · LW · GW

We discussed the topic of feedback with Adam. I approve of this challenge and will attempt to comment on at least half of all new posts from today to the end of November, possibly renewing the commitment if it works well.

I've been meaning to get out of mostly-lurking mode for months now, and this is as good an opportunity as it gets.

I also want to mention the effect of "this comment could be a post", which can help people "upgrade" from commenting to shortform or longform, if they feel (like me) that there's some quality bar to clear before feeling comfortable posting more and getting their ideas out there (hello, self-confidence issues).

You won't get feedback if you don't post somewhere anyway, and that could start with comments!

Comment by Gyrodiot on Why I’m Writing A Book · 2020-11-11T11:05:51.077Z · LW · GW

I have to admit I've read some of your essays and found them very interesting, yet found the prospect of diving into the rest daunting enough to put the idea somewhere on my to-read pile.

I applaud your book writing and will gladly read the final version, as I'll perceive it as a more coherent chunk of content to go through, instead of a collection of posts, even if the quality of the writing is high in both cases. The medium itself, to me, has its importance.

It's also easier to recommend « this excellent book by Samo Burja » than « this excellent collection of 10/20/50+ pieces by Samo Burja ».

(Awkward sidenote: I wish I could enthusiastically say I will read your draft and give you feedback, but I can't promise much on that front, my apologies)

Comment by Gyrodiot on The Wiki is Dead, Long Live the Wiki! [help wanted] · 2020-09-14T18:57:02.769Z · LW · GW

Thank you for the import.

Once again, the Progress Bar shall advance. It will probably be slower this time. No matter: I shall contribute.