Question: MIRI Corrigibility Agenda 2019-03-13T19:38:05.729Z · score: 16 (7 votes)


Comment by algon33 on Numeracy neglect - A personal postmortem · 2020-09-29T17:56:17.394Z · score: 1 (1 votes) · LW · GW

Not really? The axioms (for hyperreals) aren't much different from those of the reals. Yes, it's true that you need some strange constructions to justify that the algebra works as you'd expect. But many proofs in analysis become intuitive with this setup, which surely aids pedagogy. Admittedly, you need to learn standard analysis anyway, since its tools are used in so many more areas. But I'd hardly call it ugly.

Comment by algon33 on If there were an interactive software teaching Yudkowskian rationality, what concepts would you want to see it teach? · 2020-09-04T17:06:57.375Z · score: 1 (1 votes) · LW · GW

Recall that memories are pathway-dependent, i.e. you can remember an "idea" when given verbal cues but not visual ones, or when given cues in the form of "complete this sentence" versus "complete this totally different sentence expressing the same concept". If you memorise a sentence and can recall it in any relevant context, I'd say you've basically learnt it. But just putting it into SRS on its own won't do that. That's why SuperMemo has such a long list of rules and heuristics on how to use SRS effectively.

Comment by algon33 on [AN #115]: AI safety research problems in the AI-GA framework · 2020-09-04T12:12:43.441Z · score: 1 (1 votes) · LW · GW

Thanks. Two questions:

Do the staff and faculty have a similar diversity of opinions?

Is messaging the right way to contact your peers here?

Comment by algon33 on If there were an interactive software teaching Yudkowskian rationality, what concepts would you want to see it teach? · 2020-09-03T21:20:35.481Z · score: 3 (3 votes) · LW · GW

Here's the rewritten version; thanks for the feedback.

Having an Anki deck is kind of useless in my view. When you encounter a new idea, you need to integrate it with the rest of your thoughts; for a technique, you must integrate it with your unconscious. But often there's a tendency to just go "oh, that's useful" and do nothing with it. Putting it into spaced repetition software to view later won't accomplish anything, since you're basically memorising the teacher's password. Now suppose you take the idea, think about it for a bit, and maybe put it into your own words. That's better, both in terms of results and in terms of using Anki as you're supposed to.

But there are two issues here. One, you haven't integrated the Anki cards with the rest of your thoughts. Two, Anki is not designed such that the act of integrating is the natural thing to do. Just memorising is the path of least resistance, which a person with poor instrumental rationality will take. So the problem with using Anki for proper learning is that you are trying to teach instrumental rationality via a method that itself requires instrumental rationality. Note it's even worse for teaching good research and creative habits, which require yet more instrumental rationality. No, you need a system which fundamentally encourages those good habits. Incremental reading is a little better, if you already have good reading habits which you can use to bootstrap your way to other forms of instrumental rationality.

Now go to paragraph two of the original comment.

P.S. Just be thankful you didn't read the first draft.

Comment by algon33 on If there were an interactive software teaching Yudkowskian rationality, what concepts would you want to see it teach? · 2020-09-03T18:36:46.053Z · score: 2 (2 votes) · LW · GW

Having an Anki deck is kind of useless in my view, as engaging with the ideas is not the path of least resistance. There's a tendency to just go "oh, that's useful" and do nothing with it, because Anki/SuperMemo are about memorisation. Using them for learning, or creating, is possible with the right mental habits. But those habits are exactly what an irrational person lacks, and exactly what you want to instill! No, you need a system which fundamentally encourages those good habits.

Which is why I'm bearish about including cards that tell you to drill certain topics into Anki, since the act of drilling is itself a good mental habit that many lack. Something like a curated selection of problems that require a certain aspect of rationality, spaced out to aid retention, would be a good start.

Unfortunately, there's a trade-off between making the drills thorough and reducing overhead on the designer's part. If you're thinking about an empirically excellent, "no cut corners" implementation of teaching total newbs mental models, I'd suggest looking at DARPA's Digital Tutor. As to how you'd replicate such a thing, the field of research described here seems a good place to start.

Comment by algon33 on [AN #115]: AI safety research problems in the AI-GA framework · 2020-09-03T17:36:54.709Z · score: 1 (1 votes) · LW · GW

Active IRD doesn't have anything to do with corrigibility; I guess my mind just switched off when I was writing that. Anyway, how diverse are CHAI's views on corrigibility? Could you tell me who I should talk to? If I'm understanding you rightly, I've already read all the published stuff on it, and I want to make sure that all the perspectives on this topic are covered.

Comment by algon33 on [AN #115]: AI safety research problems in the AI-GA framework · 2020-09-03T15:06:52.219Z · score: 1 (1 votes) · LW · GW

Hey Rohin, I'm writing a review of everything that's been written on corrigibility so far. Do "The Off-Switch Game", "Active Inverse Reward Design", "Should Robots Be Obedient", and "Incorrigibility in CIRL", as well as your reply in the newsletter, represent CHAI's current views on the subject? If not, which papers contain them?

Comment by algon33 on interpreting GPT: the logit lens · 2020-09-01T20:19:30.416Z · score: 2 (2 votes) · LW · GW
IIRC, this also shows a discontinuous flip at the bottom followed by slower change.

Maybe edit the post so you include this? I know I was wondering about this too.

Comment by algon33 on Zibbaldone With It All · 2020-08-31T18:18:09.689Z · score: 1 (1 votes) · LW · GW

Before or after what? If it's a passage in a book, or an article you wrote, I agree that's enough. But what about a nebulous concept you struggled to put into words? Or an idea which seemed to have surprising links to other thoughts, which you didn't pursue at the time? If you write all this stuff down explicitly, then fine. If not, and your writing style is like mine, then it seems better to link to other cards and leave it to your future self to figure it out.

Plus, links provide the system extra information with which it can auto-suggest other relevant ideas that you weren't even aware you were considering.

Comment by algon33 on Zibbaldone With It All · 2020-08-31T17:14:18.222Z · score: 1 (1 votes) · LW · GW

I started writing a blog post in response, but that seems a bit much for a comment. Suffice to say, I agree that anti-spaced repetition is a good idea. However, it throws away the context of the notes you made, and only shows them to you after your mind has totally forgotten about them. And as I wrote, those seem to be major factors in the value of the Zettlekasten method!

Comment by algon33 on Zibbaldone With It All · 2020-08-31T17:08:23.792Z · score: 1 (1 votes) · LW · GW

Yeah, I had some ideas concerning how to keep track of Zettlekasten, as well as the right way to display graphs. Reinforcing the network is definitely a worthwhile idea. The entire point is to suggest good links, but also to give you the freedom to traverse your graph. Re the hyperlinks: I agree about the worry of biases. But more than that, it seems the network should not automate link suggestion without leaving the option to create links yourself. As you say, the worth of the Zettlekasten method is largely in instilling virtuous mental habits. What you suggested seems like it could instil laziness in the user.

Comment by algon33 on Zibbaldone With It All · 2020-08-31T17:01:02.250Z · score: 1 (1 votes) · LW · GW

Do you do this in a piecemeal way, or do you assign a few days to re-organising your thoughts when you learn some important new principle?

Comment by algon33 on Zibbaldone With It All · 2020-08-28T14:18:19.798Z · score: 6 (4 votes) · LW · GW

Epistemic status: unsure

I have a hypothesis about why Zettlekasten provide diminishing returns over time. A corollary is that others should find even less value in your Zettles. This ties into some of your points and shows what is missing from the Zibbaldone; I also have some suggestions on how to correct the flaw.

One of the key benefits of the Zettlekasten is that the way you link cards reflects your psyche's understanding of the ideas. Of course, other note-taking systems have this advantage too, but it isn't baked into them like it is with Zettlekasten.

Traversing the Zettlekasten lets you approximate your past state of mind when working on a problem, which lets you dredge up whatever your subconscious has come up with on the topic. The seemingly random organisation of yesterday's Zettles helps this along a little by providing a glimpse into your past self. So when you wake up in the morning and look over yesterday's cards, a flood of relevant thoughts arises. When considering where to place them, you glance through your Zettlekasten, and the Zettles your mind was working on bring new thoughts to the forefront. Often it will feel trivial to combine and play with all these new ideas, generating even more thoughts.

Unfortunately, past a certain size your Zettlekasten contains too many cards for your brain to process at once, and too many potential states of mind you could have been in. So we should expect a gradual reduction in the value of the system, and even less value to others, who have a different understanding of the ideas. Something similar is true for other note-taking systems; they just have a steeper decline in value.

How does this square with people's reports that switching between different note-taking systems provides the same early returns they got with the old one? My guess is that the reduced scope of your notes, plus the context shift, is what does it. Your brain realises it no longer has to keep track of all those ideas and can focus on a few relatively simple ones, which makes the low-hanging fruit all the easier to grab.

Can we fix these issues? Maybe. And I think you'd have to go digital to do it. Consider spaced repetition. A memory's strength decays roughly exponentially with time, and when you're reminded of a memory, it is strengthened. Spaced repetition takes advantage of the fact that there are optimal timings at which to strengthen a memory. By analogy, we might say there are optimal times to dredge up ideas from your mind, and likewise there may be optimal timings at which to link zettles. Perhaps these timings depend on the "distance" between the zettles.
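To make the analogy concrete, here's a minimal sketch of such a spacing schedule, assuming a simple exponential forgetting curve. The stability and growth numbers are illustrative, not empirically tuned, and none of this comes from any particular SRS implementation:

```python
import math

def recall_probability(days_elapsed: float, stability: float) -> float:
    """Exponential forgetting curve: P(recall) = exp(-t / stability)."""
    return math.exp(-days_elapsed / stability)

def next_interval(stability: float, target: float = 0.9) -> float:
    """Days until recall probability decays to the target threshold."""
    return -stability * math.log(target)

def review_schedule(stability: float, growth: float, n: int) -> list[float]:
    """Each successful review multiplies stability by `growth`,
    so the gaps between reviews grow geometrically."""
    intervals = []
    for _ in range(n):
        intervals.append(next_interval(stability))
        stability *= growth
    return intervals
```

The analogous system for zettles would replace "recall probability" with something like "probability this idea is still active in your mind", with the distance between zettles feeding into the stability parameter.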

A digital system could provide all this. A useful format would be the zettle under consideration, surrounded by a graph of the zettles you should consider linking it to. When you move to an adjacent zettle, you can see all the zettles it links to, which provides relevant context.
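A toy sketch of that display format, assuming the link structure is stored as an adjacency map and the "surrounding graph" is everything within a fixed number of hops (the names and the radius notion are my own illustration):

```python
from collections import deque

def neighbourhood(links: dict[str, set[str]], start: str, radius: int) -> set[str]:
    """Breadth-first search out to `radius` hops: the cluster of
    zettles to display around the card under consideration."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # don't expand past the display radius
        for neighbour in links.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen
```

Showing `neighbourhood(links, current_card, 2)` would give exactly the "zettle plus the zettles it links to, plus their links" view described above.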

Comment by algon33 on Do you vote based on what you think total karma should be? · 2020-08-25T20:31:48.565Z · score: 5 (3 votes) · LW · GW

When I bother to vote, I do take total karma (TK) into account when upvoting. Karma serves a signalling purpose, but only when abs(TK) is large. If I see a post with +50 karma, I have quite high expectations of it. If it exceeds those expectations, and I remember voting is a thing, I will upvote it. Since I almost never downvote, I can't say how much TK affects that.

Comment by algon33 on The two-layer model of human values, and problems with synthesizing preferences · 2020-08-25T19:26:55.153Z · score: 1 (1 votes) · LW · GW

How does this relate to the whole "no-self" thing? Is the character becoming aware of the player there?

Comment by algon33 on Epistemic Comparison: First Principles Land vs. Mimesis Land · 2020-08-22T16:30:51.686Z · score: 9 (3 votes) · LW · GW
If there is a mistake deep in the belief of someone

Are they not ideal Bayesians? Also, do they update based off other people's priors? It could be interesting to make them all ultrafinitists.

Mimesis land is confusing from the outside. I'm not sure how they could avoid stumbling upon "correct" forms of manipulating beliefs, if they persist for long enough and there are large enough stochastic shocks to the community's beliefs. If they also copied successful people in the past, I feel like this would be even more likely. Unless they happen to be the equivalent of Chinese rooms: just an archive of if-else clauses.

Anyway, thank you for introducing this delightful style of thought experiments.

Comment by algon33 on Question: MIRI Corrigibility Agenda · 2020-08-20T16:34:36.823Z · score: 1 (1 votes) · LW · GW
Dutch custom prevents me from recommending my own recent paper in any case

This phrase and its implications are perfect examples of problems in corrigibility. Was that intentional? If so, bravo. Your paper looks interesting, but I think I'll read the blog post first. I want a break from reading heavy papers. I wonder if the researchers would be OK with my drawing on their blog posts in the review. Would you mind?

Thanks for recommending "Reward tampering", it is much appreciated. I'll get on it after synthesising what I've read so far. Otherwise, I don't think I'll learn much.

Comment by algon33 on Question: MIRI Corrigibility Agenda · 2020-08-20T14:39:46.996Z · score: 1 (1 votes) · LW · GW

Hey, thanks for writing all of that. My current goal is to do an up-to-date literature review on corrigibility, so that was a most helpful comment. I'll definitely look over your blog, since some of these papers are quite dense. Out of the papers you recommended, is there one that stands out? Bear in mind that I've read Stuart's and MIRI's papers already.

Comment by algon33 on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-08-18T23:52:26.041Z · score: 1 (1 votes) · LW · GW

Fair enough. Thanks for the recommendations. :)

Comment by algon33 on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-08-17T17:54:20.927Z · score: 1 (1 votes) · LW · GW

This post deserves a strong upvote. Since you've done the review, would you mind answering a reference request? What papers/blog posts represent Paul's current views on corrigibility?

Comment by algon33 on Alignment By Default · 2020-08-15T00:15:42.848Z · score: 3 (2 votes) · LW · GW

Based off what you've said in the comments, I'm guessing you'd say the various forms of corrigibility are natural abstractions. Would you say we can use the strategy you outline here to get "corrigibility by default"?

Regarding iterations, the common objection is that we're introducing optimisation pressure. So we should expect the usual alignment issues anyway. Under your theory, is this not an issue because of the sparsity of natural abstractions near human values?

Comment by algon33 on Alignment By Default · 2020-08-13T12:57:23.027Z · score: 7 (4 votes) · LW · GW

This came out of the discussion you had with John Maxwell, right? Does he think this is a good presentation of his proposal?

How do we know that the unsupervised learner won't have learnt a large number of other embeddings closer to the proxy? If it has, then why should we expect human values to do well?

Some rough thoughts on the data-type issue. Depending on what types the unsupervised learner provides to the supervised one, it may not be able to reach the proxy type, by virtue of issues with NN learning processes.

Recall that data types can be viewed as homotopic spaces, and the construction of new types as generating new spaces from old ones, e.g. tangent spaces or path spaces. We can view a neural net as a type corresponding to a particular homotopic space. But getting neural nets to learn certain functions is hard. For example, a function which is zero except on two subspaces A and B, where it takes different values on each, with A and B shaped like interlocked rings. In other words, a non-linear classification problem. So plausibly, neural nets have trouble constructing certain types from others. Maybe this depends on architecture or learning algorithm, maybe not.
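A concrete version of that example (my own construction, just to pin down the geometry): two linked unit circles in 3D, which no hyperplane can separate, so a linear classifier is stuck at chance:

```python
import numpy as np

def interlocked_rings(n: int = 200, seed: int = 0):
    """Sample n points from each of two linked unit circles in R^3.
    Ring A lies in the xy-plane at the origin; ring B lies in the
    xz-plane, shifted along x so it threads through ring A."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    ring_a = np.stack([np.cos(t), np.sin(t), np.zeros(n)], axis=1)
    ring_b = np.stack([1.0 + np.cos(t), np.zeros(n), np.sin(t)], axis=1)
    X = np.concatenate([ring_a, ring_b])
    y = np.concatenate([np.zeros(n), np.ones(n)])  # class labels
    return X, y
```

A small net with nonlinear activations can fit this, but how easily depends on architecture and training details, which is the point being made above.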

If the proxy and human values have very different types, it may be that the supervised learner won't be able to get from one type to the other. Supposing the unsupervised learner presents it with types "reachable" from human values, then the proxy which optimises performance on the dataset is just unavailable to the system, even though it's relatively simple in comparison.

Because of this, checking which simple homotopies neural nets can move between would be useful. Depending on the results, we could use this as an argument that unsupervised NNs will never embed the human-values type, because we've found it has some simple properties the net won't be able to construct de novo. Unless we do something like feed the unsupervised learner human biases, or start with an EM and modify it.

Comment by algon33 on "Go west, young man!" - Preferences in (imperfect) maps · 2020-08-04T10:49:08.624Z · score: 1 (1 votes) · LW · GW

Alright, here's the link for Friday:

Thanks for replying.

Comment by algon33 on "Go west, young man!" - Preferences in (imperfect) maps · 2020-08-01T11:43:24.157Z · score: 1 (1 votes) · LW · GW

Hangouts I suppose. It just works. Would next weekend be OK for you?

Edit: I've scheduled a meeting for 12pm UK time on Saturday. Tell me if that works for you.

Comment by algon33 on "Go west, young man!" - Preferences in (imperfect) maps · 2020-07-31T20:22:48.014Z · score: 3 (2 votes) · LW · GW

Sometimes the cluster in the map that a preference is pointing at involves another preference, which provides a natural resolution mechanism. What happens when there are two such preferences, I'm unsure; I suppose it depends on how your map changes. In that case, I think that to make purity coherent you should start off with some "simple" map and various "simple" changes to the map. Making purity coherent relative to your full map is both computationally hard and empathetically hard.

Side-note: It would be interesting to see which resolution mechanisms produce the most varied shifts in preferences for boundedly rational agents with complex utility functions.

Side-note^2: Stuart, I'm writing a review of all the work done on corrigibility. Would you mind if I asked you some questions on your contributions?

Comment by algon33 on Godel in second-order logic? · 2020-07-26T23:03:55.028Z · score: 3 (2 votes) · LW · GW

Second-order logic can also arithmetise sentences, and also has fixed points, so the usual proofs of the first incompleteness theorem carry over. But there's an easier way to see this: there can't be any computable procedure to check whether a second-order sentence is valid, because if there were, we could check whether PA → φ is valid for any sentence φ, and thereby decide Peano Arithmetic, and therefore the halting problem.
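A sketch of that reduction in standard notation (my rendering, assuming full second-order semantics, under which the finite second-order axiomatisation PA₂ of arithmetic is categorical): for any arithmetic sentence φ,

```latex
\mathbb{N} \models \varphi
\quad\Longleftrightarrow\quad
\models \mathrm{PA}_2 \rightarrow \varphi
```

so a decision procedure for second-order validity would decide true arithmetic, and hence the halting problem.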

Comment by algon33 on Using books to prime behavior · 2020-07-25T12:25:30.529Z · score: 6 (5 votes) · LW · GW

You can use them for practicing techniques: have cards which say "use X technique today". You need to actually do that, rather than spend one minute thinking about it, which is surprisingly hard. I suspect it works much better if you have some system to guide you in generating new ideas, e.g. Zettlekasten. I suspect it could be even better if the method was incorporated into the software itself. Maybe create links between cards as well, and have some repetitions where you explore the graph surrounding a card?

I'm also unsure if the spaced repetition timings are optimal for drilling techniques. Does anyone know the relevant literature?