LessWrong 2.0 Reader
Yep, that paper has been on my list for a while, but I have thus far been unable to penetrate the formalisms that the Causal Incentive Group uses. This paper in particular also seems to have some fairly limiting assumptions in the theorem.
gwern on Language Models Model Us
I can be deanonymized in other ways more easily.
I write these as warnings to other people who might think that it is still adequate to simply use a pseudonym, write exclusively in text, and avoid the obvious OPSEC mistakes, and that you can therefore safely write under multiple names. It is not adequate, because within a few years you will have already lost.
Regrettable as it is, if you wish to write anything online which might invite persecution over the next few years or lead activists to try to dox you - if you are, say, blowing a whistle at a sophisticated megacorp company with the most punitive NDAs & equity policies in the industry - you would be well-advised to start laundering your writings through an LLM yesterday, despite the deplorable effects on style. Truesight will only get keener and flense away more of the security by obscurity we so take for granted, because "attacks only get better".
quetzal_rainbow on robo's Shortform
I am talking about belief state in ~2015, because everyone was already skeptical about the policy approach at that time.
seth-herd on What's the risk that AI tortures us all?
The keyword for discussions of this topic is s-risk, for suffering risk.
People don't generally think it's too likely, but as with other topics in alignment, there is reasonable disagreement.
One source of s-risk is curiosity as a core drive in AI. Depending on how you define curiosity, that could lead to AGIs trying to "learn everything about humans", including how we'd react to really awful situations (as well as, presumably, really great situations).
So I think it's possible but unlikely. You can find more discussion by searching for s-risks.
charlie-steiner on What's the risk that AI tortures us all?
20% maybe? I'm feeling optimistic today.
habryka4 on Stephen Fowler's Shortform
I don't think this is true. Nonprofits can aim to amass large amounts of wealth; they just aren't allowed to distribute that wealth to shareholders. A good chunk of obviously very wealthy and powerful companies are nonprofits.
carl-feynman on Alexander Gietelink Oldenziel's Shortform
When I brought up sample inefficiency, I was supporting Mr. Helm-Burger's statement that "there's huge algorithmic gains in …training efficiency (less data, less compute) … waiting to be discovered". You're right, of course, that a reduction in training data will not necessarily reduce the amount of computation needed. But once again, that's the way to bet.
bogdan-ionut-cirstea on Success without dignity: a nearcasting story of avoiding catastrophe by luck
I think interpretability looks like a particularly promising area for "automated research" - AIs might grind through large numbers of analyses relatively quickly and reach a conclusion about the thought process of some larger, more sophisticated system.
Arguably, this is already starting to happen (very early, with obviously-non-x-risky systems) with interpretability LM agents like in FIND and MAIA.
kaj_sotala on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
Yes, I think it was irrational not to clean up the glass. That is the point I want to make. I don't think it is virtuous to have failed in this way at all. What I want to say is: "Look, I am running into failure modes because I want to work so much."
Ah! I completely missed that, that changes my interpretation significantly. Thank you for the clarification, now I'm less worried for you since it no longer sounds like you have a blindspot around it.
Not running into these failure modes is important, but these failure modes where you are working too much are much easier to handle than the failure mode of "I can't get myself to put in at least 50 hours of work per week consistently."
While I do think that is true, I am probably very bad in general at optimizing for my own happiness. But the thing is, while I was working so hard during AISC, I was very happy most of the time. The same when I made these games. Most of the time I did these things because I deeply wanted to.
It sounds right that these failure modes are easier to handle than the failure mode of not being able to do much work.
Though working too much can lead to the failure mode of "I can't get myself to put in work consistently". I'd be cautious in that it's possible to feel like you really enjoy your work... and then burn out anyway! I've heard several people report this happening to them. The way I model that is something like... there are some parts of the person that are obsessed with the work, and become really happy about being able to completely focus on the obsession. But meanwhile, that single-minded focus can lead to the person's other needs not being met, and eventually those unmet needs add up and cause a collapse.
I don't know how much you need to be worried about that, but it's at least good to be aware of.
jenniferrm on Scientific Notation Options
There is a bit of a tradeoff if the notation aims to transmit the idea of measurement error.
I would read "700e6" as saying that there were three digits of presumed accuracy in the measurement, and "50e3" as claiming only two digits of confidence in the precision.
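To make that reading concrete, here is a minimal Python sketch of how e-notation can carry a claimed number of significant digits (the helper name `to_sig_e` is mine, not from the thread):

```python
def to_sig_e(value: float, sig: int) -> str:
    """Format `value` in e-notation showing `sig` significant digits."""
    # The format-spec precision counts digits after the decimal point,
    # so `sig` significant digits means a precision of sig - 1.
    return f"{value:.{sig - 1}e}"

# Three digits of claimed accuracy vs. two:
print(to_sig_e(700e6, 3))  # 7.00e+08
print(to_sig_e(50e3, 2))   # 5.0e+04
```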
If I knew that both were actually a measurement with a mere one part in ten of accuracy, and I was going to bodge the numeric representation for verbal convenience like this, it would give my soul a twinge of pain.
Also, if I'm gonna bodge my symbols to show how sloppy I'm being, like in text, I'd probably write 50k and 700M (pronounced "fifty kay" and "seven hundred million" respectively).
Then I'd generally expect people to expect me to be so sloppy with this that it doesn't even matter whether I meant 5*10^3 or 5*2^10 (as if I hadn't looked anything up precisely). In practice I would have meant roughly "both or either of these and I can't be arsed to check right now; we're just talking, not making spreadsheets or writing code or cutting material yet".
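For what it's worth, the two readings of a "k" suffix (decimal 10^3 versus binary 2^10) differ by only 2.4%, which is one reason the sloppiness rarely matters in conversation. A quick sketch:

```python
# Decimal vs. binary readings of "50k" -- they differ by a factor of 1.024.
decimal_k = 50 * 10**3   # 50_000
binary_k  = 50 * 2**10   # 51_200
print(decimal_k, binary_k, binary_k / decimal_k)  # 50000 51200 1.024
```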