LessWrong 2.0 Reader
Yes, that was my first guess as well. Increased income from employment is most strongly associated with major changes, such as promotion to a new position with changed (and usually increased) responsibilities, or leaving one job and starting work somewhere else that pays more.
It seems plausible that these are not the sorts of changes that women are likely to seek out at the same rate when planning to devote a lot of time in the very near future to being a first-time parent. Some may, but all? Seems unlikely. Men seem more likely to continue to pursue such opportunities at a similar rate due to gender differences in child-rearing roles.
annasalamon on Stephen Fowler's Shortform
I don't know the answer, but it would be fun to have a twitter comment with a zillion likes asking Sam Altman this question. Maybe someone should make one?
jblack on LLMs could be as conscious as human emulations, potentially
I don't expect this to "cash out" at all, which is rather the point.
The only really surprising part would be that we had any way at all to determine for certain whether some other system is conscious. That is, I'd assign very similar (and high) levels of surprisal to both "ems are definitely conscious" and "ems are definitely not conscious", but the ratio between them would be nowhere near "what the fuck" level.
As it stands, I can determine that I am conscious but I do not know how or why I am conscious. I have only a sample size of 1, and no way to access a larger sample. I cannot determine that you are conscious. I can't even determine for certain when or whether I was conscious in the past, and there are some time periods for which I am very uncertain. I have hypotheses regarding all of these uncertainties, but there are no prospects of checking whether they're actually correct.
So given that, why would I be "what the fuck" surprised if some of my currently favoured hypotheses such as "ems will be conscious" were actually false? I don't have anywhere near the degree of evidence required to justify that level of prior confidence. I am quite certain that you don't either. I would be very surprised if other active fleshy humans weren't conscious, but still not "what the fuck" surprised.
o-o on "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"Additionally, the AI might think it's in an alignment simulation and just leave the humans as is or even nominally address their needs. This might be mentioned in the linked post, but I want to highlight it. Since we already do very low fidelity alignment simulations by training deceptive models, there is reason to think this.
thomas-kwa on Do you believe in hundred dollar bills lying on the ground? Consider humming
My prior is that solutions contain on the order of 1% active ingredients, and of the things on the Enovid ingredients list, citric acid and NaNO2 are probably the reagents that create NO [1], which happens at a 5.5:1 mass ratio. 0.11 ppm*hr as an integral over time already means the solution is only around 0.01% NO by mass [1], which is 0.055% reagents by mass, probably a bit more because yield is not 100%. This is a bit low but believable. If the concentration were really only 0.88 ppm and dissipated quickly, it would be extremely dilute, which seems unlikely. This is some evidence for the integral interpretation over the instantaneous 0.88 ppm interpretation; not very strong evidence, but I mostly believe it because it seems more logical and also dimensionally correct. [2]
[1] https://chatgpt.com/share/e95fcaa3-4062-4805-80c3-7f1b18b12db2
[2] If you multiply 0.11 ppm*hr by 8 hours, you get 0.88 ppm*hr^2, which doesn't make sense.
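A quick sketch of the arithmetic above, for anyone who wants to rerun it. Both inputs are the comment's own estimates (the ~0.01% NO-by-mass figure under the integral interpretation, and the 5.5:1 reagent-to-NO mass ratio), not independently verified:

```python
# Back-of-the-envelope check of the mass-fraction arithmetic in the comment.
# Inputs are the comment's estimates, not independently verified.
no_mass_fraction = 0.0001        # ~0.01% NO by mass (integral interpretation)
reagent_to_no_mass_ratio = 5.5   # mass of citric acid + NaNO2 per unit mass of NO produced

reagent_mass_fraction = no_mass_fraction * reagent_to_no_mass_ratio
print(f"reagents: ~{reagent_mass_fraction:.3%} of the solution by mass")  # ~0.055%
# Compare with the ~1% prior for active ingredients: a bit low, but believable,
# and somewhat higher in practice since the yield is not 100%.
```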
emrik-1 on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
It's a reasonable concern to have, but I've spoken enough with him to know that he's not out of touch with reality. I do think he's out of sync with social reality, however, and as a result I also think this post is badly written and the anecdotes unwisely overemphasized. His willingness to step out of social reality in order to stay grounded in what's real, though, is exactly one of the main traits that make me hopeful about him.
I have another friend who's bipolar and has manic episodes. My ex-step-father also had rapid-cycling BP, so I know a bit about what it looks like when somebody's manic.[1] They have larger-than-usual gaps in their ability to notice their effects on other people, and it's obvious in conversation with them. When I was in a 3-person conversation with Johannes, he was highly attuned to the emotions and wellbeing of others, so I have no reason to think he has obvious mania-like blindspots here.
But when you start tuning yourself hard to reality, you usually end up weird in a way that's distinct from the weirdness associated with mania. Onlookers who don't know the difference may fail to distinguish the underlying causes, however. ("Weirdness" is a larger cluster than "normality", but people mostly practice distinguishing between samples of normality, so weirdness all looks the same to them.)
[1] I was also evaluated for it after an outlier depressive episode in 2021, so I got to see the diagnostic process up close. Turns out I just have recurring depressions, and I'm not bipolar.
I read the paper, and overall it's an interesting framework. One thing I am somewhat unconvinced about (likely because I have misunderstood something) is its utility, given the dependence on the world model. If we prove guarantees assuming a world model, but don't know what happens if the real world deviates from that model, then we have a problem. Ideally, perhaps, we want a guarantee akin to those proved in learning theory: for example, that the error will be small for any data distribution, as long as the distribution remains the same during training and testing.
But perhaps I have misunderstood what's meant by a world model and maybe it's simply the set of precise assumptions under which the guarantees have been proved. For example, in the learning theory setup, maybe the world model is the assumption that the training and test distributions are the same, as opposed to a description of the data distribution.
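For concreteness, the sort of distribution-free guarantee I have in mind is the standard finite-class PAC bound (notation mine, purely illustrative, not taken from the paper): with probability at least $1-\delta$ over a training sample $S \sim D^m$,

$$\operatorname{err}_D(h) \;\le\; \widehat{\operatorname{err}}_S(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}} \quad \text{for all } h \in \mathcal{H},$$

and this holds simultaneously for every distribution $D$; the only "world model" it assumes is that the same $D$ generates both the training and the test data.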
seth-herd on Instruction-following AGI is easier and more likely than value aligned AGI
I read your linked shortform thread. I agreed with most of your arguments against some common AGI takeover arguments. I agree that they won't coordinate against us and won't have "collective grudges" against us.
But I don't think the arguments for continued stability are very thorough, either. I think we just don't know how it will play out. And I think there's a reason to be concerned that takeover will be rational for AGIs, where it's not for humans.
The central difference in logic is the capacity for self-improvement. In your post, you addressed self-improvement by linking a Christiano piece on slow takeoff. But he noted at the start that he wasn't arguing against self-improvement, only that the pace of self-improvement would be more modest. The potential implications for the balance of power in the world remain.
Humans are all locked to a similar level of cognitive and physical capabilities. That has implications for game theory where all of the competitors are humans. Cooperation often makes more sense for humans. But the same isn't necessarily true of AGI. Their cognitive and physical capacities can potentially be expanded on. So it's (very loosely) like the difference between game theory in chess, and chess where one of the moves is to add new capabilities to your pieces. We can't learn much about the new game from theory of the old, particularly if we don't even know all of the capabilities that a player might add to their pieces.
More concretely: it may be quite rational for a human controlling an AGI to tell it to try to self-improve and develop new capacities, strategies and technologies to potentially take over the world. With a first-mover advantage, such a takeover might be entirely possible. Its capacities might remain ahead of the rest of the world's AI/AGIs if they hadn't started to aggressively self-improve and develop the capacities to win conflicts. This would be particularly true if the aggressor AGI was willing to cause global catastrophe (e.g., EMPs, bringing down power grids).
The assumption of a stable balance of power in the face of competitors that can improve their capacities in dramatic ways seems unlikely to be true by default, and at the least, worthy of close inspection. Yet I'm afraid it's the default assumption for many.
Your shortform post is more on-topic for this part of the discussion, so I'm copying this comment there and will continue there if you want. It's worth more posts; I hope to write one myself if time allows.
Edit: It looks like there's an extensive discussion there, including my points here, so I won't bother copying this over. As far as I could tell, neither you nor anyone else had really addressed the destabilizing effect of potential AGI self-improvement. So I continue to think that a massively multipolar AGI scenario probably results fairly quickly in conflict and potential catastrophe.
eggsyntax on Language Models Model Us
I'm aware of the paper because of the impact it had. I might personally not have chosen to draw their attention to the issue, since the main effect seems to be making some research significantly more difficult, and I haven't heard of any attempts to deliberately exfiltrate weights that this would be preventing.
bec-hawk on Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Noting that while Sam describes the provision as being “about potential equity cancellation”, the actual wording says “shall be cancelled”, not “may be cancelled”, as per this tweet from Kelsey Piper: https://x.com/KelseyTuoc/status/1791584341669396560