Insights from "All of Statistics": Statistical Inference 2021-04-08T17:49:16.270Z
Insights from "All of Statistics": Probability 2021-04-08T17:48:10.972Z
FC final: Can Factored Cognition schemes scale? 2021-01-24T22:18:55.892Z
Three types of Evidence 2021-01-19T17:25:20.605Z
Book Review: On Intelligence by Jeff Hawkins (and Sandra Blakeslee) 2020-12-29T19:48:04.435Z
Intuition 2020-12-20T21:49:29.947Z
Clarifying Factored Cognition 2020-12-13T20:02:38.100Z
Traversing a Cognition Space 2020-12-07T18:32:21.070Z
Idealized Factored Cognition 2020-11-30T18:49:47.034Z
Preface to the Sequence on Factored Cognition 2020-11-30T18:49:26.171Z
Hiding Complexity 2020-11-20T16:35:25.498Z
A guide to Iterated Amplification & Debate 2020-11-15T17:14:55.175Z
Information Charts 2020-11-13T16:12:27.969Z
Do you vote based on what you think total karma should be? 2020-08-24T13:37:52.987Z
Existential Risk is a single category 2020-08-09T17:47:08.452Z
Inner Alignment: Explain like I'm 12 Edition 2020-08-01T15:24:33.799Z
Rafael Harth's Shortform 2020-07-22T12:58:12.316Z
The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) 2020-07-21T12:14:32.824Z
UML IV: Linear Predictors 2020-07-08T19:06:05.269Z
How to evaluate (50%) predictions 2020-04-10T17:12:02.867Z
UML final 2020-03-08T20:43:58.897Z
UML XIII: Online Learning and Clustering 2020-03-01T18:32:03.584Z
What to make of Aubrey de Grey's prediction? 2020-02-28T19:25:18.027Z
UML XII: Dimensionality Reduction 2020-02-23T19:44:23.956Z
UML XI: Nearest Neighbor Schemes 2020-02-16T20:30:14.112Z
A Simple Introduction to Neural Networks 2020-02-09T22:02:38.940Z
UML IX: Kernels and Boosting 2020-02-02T21:51:25.114Z
UML VIII: Linear Predictors (2) 2020-01-26T20:09:28.305Z
UML VII: Meta-Learning 2020-01-19T18:23:09.689Z
UML VI: Stochastic Gradient Descent 2020-01-12T21:59:25.606Z
UML V: Convex Learning Problems 2020-01-05T19:47:44.265Z
Excitement vs childishness 2020-01-03T13:47:44.964Z
Understanding Machine Learning (III) 2019-12-25T18:55:55.715Z
Understanding Machine Learning (II) 2019-12-22T18:28:07.158Z
Understanding Machine Learning (I) 2019-12-20T18:22:53.505Z
Insights from the randomness/ignorance model are genuine 2019-11-13T16:18:55.544Z
The randomness/ignorance model solves many anthropic problems 2019-11-11T17:02:33.496Z
Reference Classes for Randomness 2019-11-09T14:41:04.157Z
Randomness vs. Ignorance 2019-11-07T18:51:55.706Z
We tend to forget complicated things 2019-10-20T20:05:28.325Z
Insights from Linear Algebra Done Right 2019-07-13T18:24:50.753Z
Insights from Munkres' Topology 2019-03-17T16:52:46.256Z
Signaling-based observations of (other) students 2018-05-27T18:12:07.066Z
A possible solution to the Fermi Paradox 2018-05-05T14:56:03.143Z
The master skill of matching map and territory 2018-03-27T12:06:53.377Z
Intuition should be applied at the lowest possible level 2018-02-27T22:58:42.000Z
Consider Reconsidering Pascal's Mugging 2018-01-03T00:03:32.358Z


Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-09-19T15:42:23.360Z · LW · GW

Keeping stock of and communicating what you haven't understood is an underrated skill/habit. It's very annoying to talk to someone and think they've understood something, only to realize much later that they haven't. It also makes conversations much less productive.

It's probably more of a habit than a skill. There certainly are some contexts where the right thing to do is pretend that you've understood everything even though you haven't. But on net, people do it way too much, and I'm not sure to what extent they're fooling themselves.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-09-04T17:17:48.389Z · LW · GW

Is that really a relevant phenomenon? Many of the beliefs I was thinking about (say your opinion on immigration) don't affect real life choices at all, or at least not in a way that provides feedback on whether the belief was true.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-09-04T17:14:46.653Z · LW · GW

Yeah, and this may get at another reason why the proposal doesn't seem right to me. There's no doubt that most people would be better calibrated if they adopted it, but 52% and 48% are the same for the average person, so it's completely impractical.

If anything, the proposal should be 'if you don't think you're particularly smart, your position on almost every controversial topic should be "I have no idea"'. Which still might not be good advice because there is disproportionate overlap between the set of people likely to take the advice and the set of people for whom it doesn't apply.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-09-04T17:03:38.151Z · LW · GW

Thanks for making that question explicit! That's not my position at all. I think many people who read Inadequate Equilibria are, in fact, among the top 0.1% of people when it comes to forming accurate beliefs. (If you buy into the rationality project at all, then this is much easier than being among the 0.1% most intelligent people.) As such, they can outperform most people and be justified in having reasonably confident beliefs.

This is also how I remember EY's argument. He was saying that we shouldn't apply modesty --because-- it is possible to know better than the vast majority of people.

A very relevant observation here is that there is real convergence happening among those people. If I take the set of my ~8 favorite public intellectuals, they tend to agree with close to zero exceptions on many of [the issues that I consider not that hard even though tons of people disagree about them]. Even among LW surveys, we had answers that are very different from the population mean.

Anyway, I don't think this is in any conflict with my original point. If you ask the average person with super confident beliefs, I'm pretty sure they are not likely to have an explicit belief of being among the top 0.1% when it comes to forming accurate beliefs (and of course, they aren't), and there's your inconsistency.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-09-03T19:13:46.125Z · LW · GW

Super unoriginal observation, but I've only now found a concise way of putting this:

What's weird about the vast majority of people is that they (a) would never claim to be among the 0.1% smartest people of the world, but (b) behave as though they are among the best 0.1% of the world when it comes to forming accurate beliefs, as expressed by their confidence in their beliefs. (Since otherwise being highly confident in something that lots of smart people disagree with is illogical.)

Someone (Tyler Cowen?) said that most people ought to assign much lower confidences to their beliefs, like 52% instead of 99% or whatever. While this is upstream of the same observation, it has never sat right with me. I think it's because I wouldn't diagnose the problem as overconfidence but as [not realizing or ignoring] the implication from "I'm confident" to "I must be way better than almost everyone else at this process".

Comment by Rafael Harth (sil-ver) on Flirting with postmodernism · 2021-08-29T19:29:55.925Z · LW · GW

I don't have a good answer for how people do this, at least not without thinking about it for a while. But I think I know several people who are quite good at it in practice, so I dispute that 'you can give a philosophical explanation of how this works' is a relevant rebuttal.

Comment by Rafael Harth (sil-ver) on Flirting with postmodernism · 2021-08-28T20:00:00.922Z · LW · GW

Instead of denying objective truth, can't you get the same benefit by making your claims precise enough that they only talk about one system?

Comment by Rafael Harth (sil-ver) on Open and Welcome Thread – August 2021 · 2021-08-24T18:08:23.049Z · LW · GW

I mean something like, "a result that would constitute a sizeable Bayesian update to a perfectly rational but uninformed agent". Think of someone who has never heard much about those vaccine thingies going from 50/50 to 75/25, that range.

Comment by Rafael Harth (sil-ver) on Open and Welcome Thread – August 2021 · 2021-08-23T21:12:32.262Z · LW · GW

Our World in Data offers a free download of their big Covid-19 dataset. It's got data on lots of things including cases, deaths, and vaccines (full list of columns here), and all that by country and date -- i.e., each row corresponds to one (country, date) pair, with date ranging from 2020-02-24 to 2021-08-20 for each country in steps of one day.

Is there any not-ultra-complicated way to demonstrate vaccine effectiveness from this dataset? I.e., is there any way to measure the effect such that you would be confident predicting the direction ahead of time? (E.g., something like, for date Z, plot all countries by X and Y and measure the correlation, but you can make it reasonably more complicated than this by controlling for a handful of variables or something.)
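A minimal sketch of the simplest version of this check, assuming the rows have already been loaded as dictionaries. The column names `vaccinated_pct` and `deaths_per_million` and every number in `TOY_ROWS` are made up for illustration; they are not taken from the Our World in Data file, whose real column names would have to be looked up.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_on_date(rows, date, x_col, y_col):
    """Correlate two columns across countries for one fixed date."""
    picked = [
        r for r in rows
        if r["date"] == date
        and r.get(x_col) is not None
        and r.get(y_col) is not None
    ]
    return pearson([r[x_col] for r in picked], [r[y_col] for r in picked])


# Hypothetical toy rows, one per (country, date) pair. NOT real data.
TOY_ROWS = [
    {"country": "A", "date": "2021-08-01", "vaccinated_pct": 10.0, "deaths_per_million": 8.0},
    {"country": "B", "date": "2021-08-01", "vaccinated_pct": 30.0, "deaths_per_million": 7.0},
    {"country": "C", "date": "2021-08-01", "vaccinated_pct": 50.0, "deaths_per_million": 3.0},
    {"country": "D", "date": "2021-08-01", "vaccinated_pct": 70.0, "deaths_per_million": 2.0},
    {"country": "A", "date": "2021-08-02", "vaccinated_pct": 11.0, "deaths_per_million": 9.0},
]

r = correlation_on_date(TOY_ROWS, "2021-08-01", "vaccinated_pct", "deaths_per_million")
print(f"correlation on 2021-08-01: {r:.2f}")
```

A raw cross-country correlation like this ignores confounders such as age structure or testing rates, which is where controlling for a handful of variables would come in.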

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-08-16T16:10:47.257Z · LW · GW

I was thinking about Java and Python. The fact that you can just use lambdas first occurred to me at some point in between writing this and seeing your answer. I don't know why it wasn't obvious.
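For what it's worth, Python also allows a plain `def` inside another `def`, so a lambda isn't strictly needed there. A minimal sketch, with made-up function names:

```python
def mean_and_spread(values):
    """Return the mean and the mean absolute deviation of `values`."""

    def _average(xs):
        # Helper that exists only inside mean_and_spread's scope.
        return sum(xs) / len(xs)

    center = _average(values)
    spread = _average([abs(v - center) for v in values])
    return center, spread


center, spread = mean_and_spread([1, 2, 3])
print(center, spread)
```

Calling `_average` from module level raises a `NameError`, which is the visibility restriction asked about.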

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-08-15T22:00:43.367Z · LW · GW

Is there a reason why most languages don't have Ada's hierarchical functions? Making a function only visible inside of another function is something I want to do all the time but can't.

Comment by Rafael Harth (sil-ver) on How should my timelines influence my career choice? · 2021-08-03T18:42:52.905Z · LW · GW

Something I've been wondering while reading this is to what extent it makes sense to privilege longer timelines not based on their probability of being correct but on the expected difference you can make. I.e., you may take the pessimistic view that in a 15-year-to-AGI-world, the project of achieving alignment via the standard path is so doomed that your impact is dominated by worlds with longer timelines, even if they are less probable.

(I'm absolutely not suggesting that you shouldn't become an engineer, this really is just a thought.)

Comment by Rafael Harth (sil-ver) on Is the argument that AI is an xrisk valid? · 2021-07-21T16:06:24.210Z · LW · GW

I think that's a decent argument about what models we should build, but not an argument that AI isn't dangerous.

Comment by Rafael Harth (sil-ver) on Is the argument that AI is an xrisk valid? · 2021-07-21T14:52:45.237Z · LW · GW

Not necessarily, since AIs can be WBEs or otherwise anthropomorphic. An AI with an explicitly coded goal is possible, but not the only kind.

While I think this is 100% true, it's somewhat misleading as a counter-argument. The single-goal architecture is one model of AI that we understand, and a lot of arguments focus on how that goes wrong. You can certainly build a different AI, but that comes at the price of opening yourself up to a whole different set of failure modes. And (as far as I can see), it's also not what the literature is up to right now.

Comment by Rafael Harth (sil-ver) on Is the argument that AI is an xrisk valid? · 2021-07-19T20:14:37.830Z · LW · GW

Reading this, I feel somewhat obligated to provide a different take. I am very much a moral realist, and my story for why the quoted passage isn't a good argument is very different from yours. I guess I mostly want to object to the idea that [believing AI is dangerous] is predicated on moral relativism.

Here is my take. I dispute the premise:

In the proposed picture of singularity claim & orthogonality thesis, some thoughts are supposed to be accessible to the system, but others are not. For example:

I'll grant that most of the items on the inaccessible list are, in fact, probably accessible to an ASI, but this doesn't violate the orthogonality thesis. The Orthogonality thesis states that a system can have any combination of intelligence and goals, not that it can have any combination of intelligence and beliefs about ethics.

Thus, let's grant that an AI with a paperclip-like utility function can figure out #6-#10. So what? How is [knowing that creating paperclips is morally wrong] going to make it behave differently?

You (meaning the author of the paper) may now object that we could program an AI to do what is morally right. I agree that this is possible. However:

(1) I am virtually certain that any configuration of maximal utility doesn't include humans, so this does nothing to alleviate x-risks. Also, even if you subscribe to this goal, the political problem (i.e., convincing AI people to implement it) sounds impossible.

(2) We don't know how to formalize 'do what is morally right'.

(3) If you do black box search for a model that optimizes for what is morally right, this still leaves you with the entire inner alignment problem, which is arguably the hardest part of the alignment problem anyway.

Unlike you (now meaning Steve), I wouldn't even claim that letting an AI figure out moral truths is a bad approach, but it certainly doesn't solve the problem outright.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-07-19T12:58:49.780Z · LW · GW

Instead of explaining something to a rubber duck, why not explain it via an extensive comment? Maybe this isn't practical for projects with multiple people, but if it's personal code, writing it down seems better as a way to force rigor from yourself, and it's an investment into a possible future in which you have to understand the code once again.

Comment by Rafael Harth (sil-ver) on How to Sleep Better · 2021-07-17T07:44:06.247Z · LW · GW

Yeah, I use wax earplugs every night. The downside is that they're gross and they sometimes cause itches, but they always go away if I scratch a bit. The upside is that you cut out a large portion of unexpected sounds, which to me way outweighs the problems.

Comment by Rafael Harth (sil-ver) on Chess and cheap ways to check day to day variance in cognition · 2021-07-07T22:24:01.708Z · LW · GW

I don't think this is in conflict with what I said. Blundering is itself a matter of luck. You can have both players play sufficiently recklessly that they risk 1-move blunders, but then one person gets lucky in that they never make one. I assume you're familiar with the situation where you make a move that's only not a blunder because of something you didn't think about while making the move.

Also, I don't know how literally you meant your post, but I don't think it's true that you can get there by 'just' avoiding one-move blunders. If your positional play is sufficiently worse than your opponent's, you should lose even if your opponent blunders a minor piece at some point and you don't. I think it's more like, most people around 1500 are somewhat even in positional skill, and you can separate yourself from them by avoiding blunders.

(Out of curiosity (or perhaps because I'll challenge you), what time format are you 1550 in?)

Comment by Rafael Harth (sil-ver) on Chess and cheap ways to check day to day variance in cognition · 2021-07-07T19:28:08.908Z · LW · GW

In addition to variance of opponents' skill, there is also a substantial luck factor. Any search algorithm that can't search the entire tree has uncertainty about how good moves are, which effectively translates into luck.

I think there's a good chance your observations are still valid (also because you have intuition about how well you play on top of results), but it is a possible factor.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-07-05T11:08:35.738Z · LW · GW

The IQ objection is a really good one that hadn't occurred to me at all. Although I'd have estimated less than half as large of a difference.

On maintaining order, it's worth pointing out that insofar as this is the relative strength of the highschool teacher, it probably doesn't have much to do with what the teacher learned from the literature.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-07-03T19:50:52.671Z · LW · GW

I did mean both. Comparing just tutoring to just regular school would be pretty unfair.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-07-03T19:20:56.607Z · LW · GW

Do you mean that smart untrained people would teach an average high school class better than a trained teacher?


"the same" in math or physics is about learning the topic, or learning to teach the topic.

It's mostly like applying the knowledge somewhere. Suppose you have to solve a real problem that requires knowing physics.

Of course you can also read the literature, but my post was about when it's possible to do better without having done so.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-07-03T11:21:47.304Z · LW · GW

It seems to me that many smart people could ignore the existing literature on pedagogy entirely and outperform most people who have obtained a formal degree in the area (like highschool teachers), just by relying on their personal models. Conversely, I'd wager that no-one could do the same in physics, and (depending on how 'outperforming' is measured) no-one or almost no-one could do it in math.

I would assume most people on this site have thought about this kind of stuff, but I don't recall seeing many posts about it, and I don't recall anyone sharing their estimates for where different fields place on this spectrum.

There is some discussion for specific cases like prediction markets, covid models, and economics. And now that I'm writing this, I guess Inadequate Equilibria is a lot about answering this question, but it's only about the abstract level, i.e., how do you judge the competence of a field, not about concrete results. Which I'll totally grant is the more important part, but I still feel like comparing rankings of fields on this spectrum could be valuable (and certainly interesting).

Comment by Rafael Harth (sil-ver) on Irrational Modesty · 2021-06-23T14:01:32.263Z · LW · GW

And why would you refuse to seek justified status?

One reason would be that seeking status will lead to you having less of it, which I strongly think is true insofar as 'seeking' means 'having it as the driving motivation'. Think about how many of the high-status people in the rationalist sphere are relatively status-blind. If we draw from fiction, note that this is true for Harry in HPMOR, too.

It's also not always true that high status people are worth imitating or listening to, but I would agree if you just meant on average.

Comment by Rafael Harth (sil-ver) on Less Wrong needs footnotes · 2021-06-22T12:27:08.155Z · LW · GW

You can do this if you write with the markdown editor.[1] You can activate it in the user settings.

  1. Like this! ↩︎

Comment by Rafael Harth (sil-ver) on Covid vaccine safety: how correct are these allegations? · 2021-06-19T11:28:27.703Z · LW · GW

This is the video, right? You could link to that instead of the removed youtube link.

Comment by Rafael Harth (sil-ver) on Covid vaccine safety: how correct are these allegations? · 2021-06-19T11:22:36.641Z · LW · GW

This is relevant because I read that "FDA requires healthcare providers to report any death after COVID-19 vaccination to VAERS"

How does this square with OpenVaers's claim that only about 1% of injuries are reported?

Without knowing the reporting rate, it's difficult to interpret the data. If we take the 1% and 5869 numbers at face value, it implies that the vaccines killed about 587,000 people, whereas if we assume a 100% reporting rate, it looks like they're an amazing preventer of unrelated causes of death. Is there any reasonable way to estimate what % to use?
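To make the sensitivity to the reporting rate concrete, here is a small sketch of the arithmetic. Only the 5869 figure and the 1% and 100% rates come from the comment; the intermediate rates are arbitrary illustration values, and `implied_total` is a hypothetical helper name.

```python
def implied_total(reported, reporting_rate):
    """Total events implied by `reported` cases if only a fraction is reported."""
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be in (0, 1]")
    return reported / reporting_rate


REPORTED_DEATHS = 5869  # figure quoted in the comment

for rate in (0.01, 0.10, 0.50, 1.00):  # 0.10 and 0.50 are arbitrary midpoints
    total = round(implied_total(REPORTED_DEATHS, rate))
    print(f"reporting rate {rate:>4.0%}: about {total:,} implied deaths")
```

The implied total scales inversely with the assumed rate, so the two extremes differ by a factor of 100.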

Comment by Rafael Harth (sil-ver) on Open problem: how can we quantify player alignment in 2x2 normal-form games? · 2021-06-18T08:41:22.202Z · LW · GW

I'll take a shot at this. Let $A$ and $B$ be the sets of actions of Alice and Bob. Let $o_n$ (where 'n' means 'nice') be the function that orders $B$ by how good the choices are for Alice, assuming that Alice gets to choose second. Similarly, let $o_s$ (where 's' means 'selfish') be the function that orders $B$ by how good the choices are for Bob, assuming that Alice gets to choose second. Choose some function $f$ measuring similarity between two orderings of a finite set (should range over $[-1, 1]$); the alignment of Bob with Alice is then $f(o_n, o_s)$.

Example: in the prisoner's dilemma, $B = \{\text{cooperate}, \text{defect}\}$, and $o_n$ orders $\text{cooperate} > \text{defect}$ whereas $o_s$ orders $\text{defect} > \text{cooperate}$. Hence $f(o_n, o_s)$ should be $-1$, i.e., Bob is maximally unaligned with Alice. Note that this makes it different from Mykhailo's answer which gives alignment $0$, i.e., medium aligned rather than maximally unaligned.

This seems like an improvement over correlation since it's not symmetrical. In the game where Alice and Bob both get to choose numbers and Alice's utility function outputs whereas Bob's outputs , Bob would be perfectly aligned with Alice (his and both order ) but Alice perfectly unaligned with Bob (her orders but her orders ).

I believe this metric meets criteria 1,3,4 you listed. It could be changed to be sensitive to players' decision theories by changing (for alignment from Bob to Alice) to be the order output by Bob's decision theory, but I think that would be a mistake. Suppose I build an AI that is more powerful than myself, and the game is such that we can both decide to steal some of the other's stuff. If the AI does this, it leads to -10 utils for me and +2 for it (otherwise 0/0); if I do it, it leads to -100 utils for me because the AI kills me in response (otherwise 0/0). This game is trivial: the AI will take my stuff and I'll do nothing. Also, the AI is maximally unaligned with me. Now suppose I become as powerful as the AI and my 'take AI's stuff' becomes -10 for AI, +2 for me. This makes the game a prisoner's dilemma. If we both run UDT or FDT, we would now cooperate. If is the ordering of the AI's decision theory, this would mean the AI is now aligned with me, which is odd since the only thing that changed is me getting more powerful. With the original proposal, the AI is still maximally unaligned with me. More abstractly, game theory assumes your actions have influence on the other player's rewards (else the game is trivial), so if you cooperate for game-theoretical reasons, this doesn't seem to capture what we mean by alignment.
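The prisoner's dilemma example above can be sketched in code. This is only one reading of the proposal: it assumes Alice, choosing second, best-responds for her own payoff; it uses a Kendall-tau-style pairwise agreement score as the similarity function (the comment leaves that function open); and the payoff numbers 3/0/5/1 are the usual textbook values, not taken from the comment.

```python
from itertools import combinations

# payoffs[(alice_action, bob_action)] = (alice_utility, bob_utility)
# Standard illustrative prisoner's dilemma payoffs (an assumption).
PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def sign(x):
    return (x > 0) - (x < 0)


def scores_for_bob_actions(payoffs):
    """Score each Bob action for Alice ('nice') and for Bob ('selfish')."""
    alice_actions = sorted({a for a, _ in payoffs})
    bob_actions = sorted({b for _, b in payoffs})
    nice, selfish = {}, {}
    for b in bob_actions:
        # Assumption: Alice moves second and maximizes her own utility.
        best_a = max(alice_actions, key=lambda a: payoffs[(a, b)][0])
        nice[b], selfish[b] = payoffs[(best_a, b)]
    return nice, selfish


def alignment_of_bob_with_alice(payoffs):
    """Pairwise agreement of the two orderings, ranging over [-1, 1]."""
    nice, selfish = scores_for_bob_actions(payoffs)
    pairs = list(combinations(nice, 2))  # needs at least two Bob actions
    agreement = sum(
        sign(nice[x] - nice[y]) * sign(selfish[x] - selfish[y])
        for x, y in pairs
    )
    return agreement / len(pairs)


pd_alignment = alignment_of_bob_with_alice(PRISONERS_DILEMMA)
print(pd_alignment)
```

With these payoffs the 'nice' ordering prefers cooperate and the 'selfish' ordering prefers defect, so the score is -1, matching the "maximally unaligned" verdict in the text.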

Comment by Rafael Harth (sil-ver) on The dumbest kid in the world (joke) · 2021-06-07T19:50:04.559Z · LW · GW

You only have two votes right now, but they counted for -10, so probably 2 strong downvotes. You can see the number of votes by hovering your mouse over the number.

Comment by Rafael Harth (sil-ver) on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-07T14:22:56.432Z · LW · GW

This seems like a very surprising claim to me. You can make money on stocks by knowing things above pure chance. Do you really think that for all stocks?

Comment by Rafael Harth (sil-ver) on Often, enemies really are innately evil. · 2021-06-07T13:28:38.399Z · LW · GW

I don't believe a significant percentage of people is innately evil, and at the end of part I of this post, I don't think you've given me significant evidence to change my mind. The study is not convincing, and not because of effect size -- people could have misunderstood the game or just pressed the red button for fun since we're talking about cents. I would have predicted few people to press the red button if the payouts were significant (thinking at least 100$ difference); I genuinely don't know what I would have predicted for the game as-is.

there are many, many, MANY more pieces of evidence from (almost) every internet troll, bully, and rapist, and many other criminals too.

I mean, rape has a pretty obvious advantage for the rapist. "Troll" is so overloaded that I think you'd have to define it before I can consider it seriously for anything. Bullying is the most convincing case, but my model of bullies, especially if they're young, isn't that they're innately evil. If I remember correctly, I have participated in bullying a couple of times before thinking about it and deciding that it's morally indefensible. I imagine most bullies are similar except that they skipped the part where they think about it, or that they have thought about it, maybe decided to stop, but then proceeded anyway because the instinct was too strong.

Anyway, this is quite speculative, but my point is that I don't think you're making a strong case for your leading claim. I realize that this may come across as nitpicking details in a mountain of obvious evidence, but that's often just how it feels when someone doubts what you consider an obvious truth.

There's also an issue that we may have different ideas of what 'innately evil' means.

Comment by Rafael Harth (sil-ver) on The dumbest kid in the world (joke) · 2021-06-06T08:03:40.283Z · LW · GW

If you just cut everything from "Later" in the third-to-last paragraph onward, smart readers would probably still get it but it would be less obvious.

Comment by Rafael Harth (sil-ver) on What is the most effective way to donate to AGI XRisk mitigation? · 2021-05-30T14:44:13.507Z · LW · GW

You may be interested in Larks' AI Alignment charity reviews. The only organization I would add is the Qualia Research Institute, which is my personal speculative pick for the highest impact organization, even though they don't do alignment research. (They're trying to develop a mathematical theory of consciousness and qualia.)

Comment by Rafael Harth (sil-ver) on Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers · 2021-05-20T15:01:24.463Z · LW · GW

Thanks a bunch for summarizing your thoughts; this is helpful.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-05-18T17:05:16.966Z · LW · GW

This paper is amazing. I don't think I've ever seen such a scathing critique in an academic context as is presented here.

There is now a vast and confusing literature on some combination of interpretability and explainability. Much literature on explainability confounds it with interpretability/comprehensibility, thus obscuring the arguments, detracting from their precision, and failing to convey the relative importance and use-cases of the two topics in practice. Some of the literature discusses topics in such generality that its lessons have little bearing on any specific problem. Some of it aims to design taxonomies that miss vast topics within interpretable ML. Some of it provides definitions that we disagree with. Some of it even provides guidance that could perpetuate bad practice. Most of it assumes that one would explain a black box without consideration of whether there is an interpretable model of the same accuracy.


XAI surveys have (thus far) universally failed to acknowledge the important point that interpretability begets accuracy when considering the full data science process, and not the other way around. [...]


In this survey, we do not aim to provide yet another dull taxonomy of “explainability” terminology. The ideas of interpretable ML can be stated in just one sentence: [...]

As far as I can tell, this is all pretty on point. (And I know I've conflated explainability and interpretability before.)

I think I like this because it makes me update downward on how restricted you actually are in what you can publish, as soon as you have some reasonable amount of reputation. I used to find the idea of diving into the publishing world paralyzing because you have to adhere to the process, but nowadays that seems like much less of a big deal.

Comment by Rafael Harth (sil-ver) on Let's Rename Ourselves The "Metacognitive Movement" · 2021-04-24T07:43:30.905Z · LW · GW

"Metacognition" is defined as "thinking about thinking." That's exactly what we do.

I think it's an ok description of what we do in terms of epistemic rationality. I'm not so sure it captures the instrumental part. The biggest impact that joining this community had on my life was that I started really taking actions to further my goals.

Comment by Rafael Harth (sil-ver) on Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers · 2021-04-12T09:42:13.079Z · LW · GW

there are books on the topic

Does anyone know if this book is any good? I'm planning to get more familiar with interpretability research, and 'read a book' has just appeared in my set of options.

Comment by Rafael Harth (sil-ver) on Some blindspots in rationality and effective altruism · 2021-03-20T17:39:31.099Z · LW · GW

I think the culprit is 'overturned'. That makes it sound like their counterarguments were a done deal or something. I'll reword that to 'rebutted and reframed in finer detail'.

Yeah, I think overturned is the word I took issue with. How about 'disputed'? That seems to be the term that remains agnostic about whether there is something wrong with the original argument or not.

Perhaps, your impression from your circle is different from mine in terms of what proportion of AIS researchers prioritise work on the fast takeoff scenario?

My impression is that gradual takeoff has gone from a minority to a majority position on LessWrong, primarily due to Paul Christiano, but not an overwhelming majority. (I don't know how it differs among Alignment Researchers.)

I believe the only data I've seen on this was in a thread where people were asked to make predictions about AI stuff, including takeoff speed and timelines, using the new interactive prediction feature. (I can't find this post -- maybe someone else remembers what it was called?) I believe that was roughly compatible with the sizeable minority summary, but I could be wrong.

Comment by Rafael Harth (sil-ver) on Some blindspots in rationality and effective altruism · 2021-03-19T21:06:26.825Z · LW · GW
  • Eliezer Yudkowsky's portrayal of a single self-recursively improving AGI (later overturned by some applied ML researchers)

I've found myself doubting this claim, so I've read the post in question. As far as I can tell, it's a reasonable summary of the fast takeoff position that many people still hold today. If all you meant to say was that there was disagreement, then fine -- but saying 'later overturned' makes it sound like there is consensus, not that people still have the same disagreement they had 13 years ago. (And your characterization in the paragraph I'll quote below also gives that impression.)

In hindsight, judgements read as simplistic and naive in similar repeating ways (relying on one metric, study, or paradigm and failing to factor in mean reversion or model error there; fixating on the individual and ignoring societal interactions; assuming validity across contexts):

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-02-15T19:56:34.847Z · LW · GW

Here is a construction of : We have that is the inverse of . Moreover, is the inverse of . [...]

Yeah, that's conclusive. Well done! I guess you can't divide by zero after all ;)

I think the main mistake I've made here is to assume that inverses are unique without questioning it, which of course doesn't make sense at all if I don't yet know that the structure is a field.

My hunch is that any bidirectional sum of integer powers of x which we can actually construct is "artificially complicated" and it can be rewritten as a one-directional sum of integer powers of x. So, this would mean that your number system is what you get when you take the union of Laurent series going in the positive and negative directions, where bidirectional coordinate representations are far from unique. Would be delighted to hear a justification of this or a counterexample.

So, I guess one possibility is that, if we let be the equivalence class of all elements that are in this structure, the resulting set of classes is isomorphic to the Laurent numbers. But another possibility could be that it all collapses into a single class -- right? At least I don't yet see a reason why that can't be the case (though I haven't given it much thought). You've just proven that some elements equal zero, perhaps it's possible to prove it for all elements.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-02-14T08:31:24.557Z · LW · GW

You've understood correctly minus one important detail:

The structure you describe (where we want elements and their inverses to have finite support)

Not elements and their inverses! Elements or their inverses. I've shown the example of to demonstrate that you quickly get infinite inverses, and you've come up with an abstract argument why finite inverses won't cut it:

To show that nothing else works, let and be any two nonzero sums of finitely many integer powers of (so like ). Then, the leading term (product of the highest power terms of and ) will be some nonzero thing. But also, the smallest term (product of the lower power terms of and ) will be some nonzero thing. Moreover, we can't get either of these to cancel out. So, the product can never be equal to . (Unless both are monomials.)
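The argument about the highest and lowest terms can be checked on a concrete pair. Here is a minimal sketch, assuming an element with finite support is stored as a Python dict mapping integer exponents to nonzero coefficients; the representation and the names `mul`, `a`, `b` are illustrative and not from the thread:

```python
from collections import defaultdict

def mul(f, g):
    """Multiply two finite-support elements given as {exponent: coefficient}."""
    h = defaultdict(float)
    for i, a_i in f.items():
        for j, b_j in g.items():
            h[i + j] += a_i * b_j  # exponents add, coefficients multiply
    # Drop coefficients that cancelled to zero
    return {k: v for k, v in h.items() if v != 0}

# Two hypothetical non-monomial elements (exponents may be negative)
a = {-1: 2.0, 3: 1.0}
b = {0: 1.0, 2: -5.0}
p = mul(a, b)

# The lowest and highest exponents of the product are the sums of the
# lowest and highest exponents of the factors; neither term can cancel.
lowest, highest = min(p), max(p)
```

Since `lowest` and `highest` differ whenever at least one factor is not a monomial, the product has at least two terms and cannot equal the multiplicative identity `{0: 1.0}`.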

In particular, your example of has the inverse . Perhaps a better way to describe this set is 'all you can build in finitely many steps using addition, inverse, and multiplication, starting from only elements with finite support'. Perhaps you can construct infinite-but-periodical elements with infinite-but-periodical inverses; if so, those would be in the field as well (if it's a field).

If you can construct , it would not be a field. But constructing this may be impossible.

I'm currently completely unsure if the resulting structure is a field. If you get a bunch of finite elements, take their infinite-but-periodical inverses, and multiply those inverses, the resulting number again has a finite inverse due to the argument I've shown in the previous comment. But if you use addition on one of them, things may go wrong.

A larger structure to take would be formal Laurent series in x. These are sums of finitely many negative powers of x and arbitrarily many positive powers of x. This set is closed under multiplicative inverses.

Thanks; this is quite similar -- although not identical.

Comment by Rafael Harth (sil-ver) on Rafael Harth's Shortform · 2021-02-11T17:27:00.746Z · LW · GW

Edit: this structure is not a field as proved by just_browsing.

Here is a wacky idea I've had forever.

There are a bunch of areas in math where you get expressions of the form and they resolve to some number, but it's not always the same number. I've heard some people say that "can be any number". Can we formalize this? The formalism would have to include as something different than , so that if you divide the first by 0, you get 4, but the second gets 3.

Here is a way to turn this into what may be a field or ring. Each element is a function , where a function of the form reads as . Addition is component-wise (; this makes sense), i.e., , and multiplication is, well, , so we get the rule

This becomes a problem once elements with infinite support are considered, i.e., functions that are nonzero at infinitely many values, since then the sum may not converge. But it's well defined for numbers with finite support. This is all similar to how polynomials are handled formally, except that polynomials only go in one direction (i.e., they're functions from rather than ), and that also solves the non-convergence problem. Even if infinite polynomials are allowed, multiplication is well-defined since for any , there are only finitely many pairs of natural numbers such that .
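For elements with finite support, both operations are easy to write down. The following is a minimal sketch, assuming multiplication is the same convolution rule used for polynomials, only with integer rather than natural-number indices. The dict representation, the function names, and the choice to encode the formal symbol 0 at index 1 are my assumptions for illustration:

```python
from collections import defaultdict

def add(f, g):
    """Component-wise sum of two elements given as {index: coefficient}."""
    h = defaultdict(float)
    for k, v in f.items():
        h[k] += v
    for k, v in g.items():
        h[k] += v
    return {k: v for k, v in h.items() if v != 0}

def mul(f, g):
    """Convolution product: the coefficient at n sums f[i] * g[j] over i + j == n."""
    h = defaultdict(float)
    for i, a in f.items():
        for j, b in g.items():
            h[i + j] += a * b
    return {k: v for k, v in h.items() if v != 0}

# Assumed encoding: index 1 stands for the formal symbol 0,
# index -1 for its formal inverse, and index 0 for ordinary real numbers.
four_times_zero = {1: 4.0}
three_times_zero = {1: 3.0}
one_over_zero = {-1: 1.0}

first = mul(four_times_zero, one_over_zero)
second = mul(three_times_zero, one_over_zero)
```

Under this encoding, `first` and `second` are the ordinary numbers 4 and 3, which is the behavior asked for above. The sums are finite because both supports are finite, so convergence is not an issue here.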

The additively neutral element in this setting is and the multiplicatively neutral element is . Additive inverses are easy; . The interesting part is multiplicative inverses. Of course, there is no inverse of , so we still can't divide by the 'real' zero. But I believe all elements with finite support do have a multiplicative inverse (there should be a straightforward inductive proof for this). Interestingly, those inverses are not finite anymore, but they are periodical. For example, the inverse of is just , but the inverse of is actually

I think this becomes a field with well-defined operations if one considers only the elements with finite support and elements with inverses of finite support. (The product of two elements-whose-inverses-have-finite-support should itself have an inverse of finite support because ). I wonder if this structure has been studied somewhere... probably without anyone thinking of the interpretation considered here.

Comment by Rafael Harth (sil-ver) on Open & Welcome Thread – February 2021 · 2021-02-06T14:29:54.855Z · LW · GW

There are a bunch of sequences, like the value learning sequence, that have structured formatting in the sequence overview (the page the link goes to), so something like Headline, a bunch of posts, headline, a bunch of more posts.

How is this done? When I go into the sequence editor, I only see one text field where I can write something which then appears in front of the list of posts.

Comment by Rafael Harth (sil-ver) on The GameStop Situation: Simplified · 2021-01-29T20:24:56.554Z · LW · GW

This post is similar to the one Eliezer Yudkowsky wrote.

Comment by Rafael Harth (sil-ver) on Preface to the Sequence on Factored Cognition · 2021-01-26T19:45:52.179Z · LW · GW

Cool, thanks.

Comment by Rafael Harth (sil-ver) on Qualia Research Institute: History & 2021 Strategy · 2021-01-26T10:38:55.258Z · LW · GW

While QRI is only occasionally talked about on LessWrong, I personally continue to think that they're doing the most exciting research that exists today, provided you take a utilitarian perspective. I've donated to MIRI in the past, in part because their work seems highly non-replaceable. I still stand by that reason, but it applies even more to QRI. Even if there is only a small chance that formalizing consciousness is both possible and practically feasible, the potential upside seems enormous. Success in formalizing suffering wouldn't solve AI alignment (for several reasons, one of them being Inner Optimizers), but I imagine it would be extremely helpful. There is nothing approximating a consensus on the related philosophical problems in the community, and positions on those issues seem to have a significant causal influence on what research is being pursued.

It helps that I share most if not all of the essential philosophical intuitions that motivate QRI's research. On the other hand, research should be asymmetrical with regard to what's true. In the world where moral realism is false and suffering isn't objective or doesn't have structure, beliefs to the contrary (which many people in the community hold today) could lead to bad alignment-related decisions. In that case, any attempts to quantify suffering would inevitably fail, and that would itself be relevant evidence.

Comment by Rafael Harth (sil-ver) on Preface to the Sequence on Factored Cognition · 2021-01-26T09:24:23.317Z · LW · GW

Re personal opinion: what is your take on the feasibility of human experiments? It seems like your model is compatible with IDA working out even though no-one can ever demonstrate something like 'solve the hardest exercise in a textbook' using participants with limited time who haven't read the book.

Comment by Rafael Harth (sil-ver) on Preface to the Sequence on Factored Cognition · 2021-01-26T09:19:54.512Z · LW · GW

This is an accurate summary, minus one detail:

The judge decides the winner by evaluating whether the final statement is true or not.

"True or not" makes it sound symmetrical, but the choice is between 'very confident that it's true' and 'anything else'. Something like '80% confident' goes into the second category.

One thing I would like to see added is that I come out moderately optimistic about Debate. It's not too difficult for me to imagine the counterfactual world where I think about FC and find reasons to be pessimistic about Debate, so I take the fact that I didn't as non-zero evidence.

Comment by Rafael Harth (sil-ver) on Why I'm excited about Debate · 2021-01-17T10:24:57.647Z · LW · GW

I think the Go example really gets to the heart of why I think Debate doesn't cut it.

Your comment is an argument against using Debate to settle moral questions. However, what if Debate is trained on physics and/or math questions, with the eventual goal of asking "what is a provably secure alignment proposal?"

Comment by sil-ver on [deleted post] 2021-01-17T10:20:44.414Z

Before offering an "X is really about Y" signaling explanation, it's important to falsify the "X is about X" hypothesis first. Once that's done, signaling explanations require, at minimum:

  1. An action or decision by the receiver that the sender is trying to motivate.
  2. (2.1) An explanation for why the receiver is listening for signals in the first place, and (2.2) why the sender is trying to communicate them.
  3. A language that the sender has reason to think the receiver will understand and believe as the sender intended.
  4. A physical mechanism for sending and receiving the signal.

(Added numbers for reference.)

I think 1, 2.1, and 3 are all wrong, in that none of them are required for a signaling hypothesis to be plausible. I believe you're assuming that signaling is effective and/or rational, but this is a mistake. Signaling was optimized to be effective in the ancestral environment, so there's no reason why it should still be effective today. As far as I can tell, it generally is not.

As an example, consider men wearing solid shoes in the summer despite finding those uncomfortable. There is no action this is trying to motivate, and there is no reason to expect the receiver is listening -- in fact, there is often good reason to expect that they are not listening (in many contexts, people really don't care about your shoes). Nonetheless, I think conformity signaling is the correct explanation for this behavior.

The pilot example is problematic because in this case, signaling is part of a high-level plan. This is a non-central example. Most of the time, signaling is motivated by evolutionary instincts, like the fear of standing out. In the case of religion, I think this is most of the story. Those instincts can then translate into high-level behavior like going to church, but the high-level behavior is not the beginning of the causal chain.