dalmert

Posts

OpenAI Superalignment: Weak-to-strong generalization 2023-12-14T19:47:24.347Z

Interview with Paul Christiano: How We Prevent the AI’s from Killing us 2023-04-27T14:39:49.571Z

Personal predictions for decisions: seeking insights 2023-02-15T06:45:20.298Z

Comments

Comment by Dalmert on Any-Benefit Mindset and Any-Reason Reasoning · 2025-03-16T04:51:38.531Z · LW · GW

I think this is a nice write-up, let me add some nuance in two directions:

Indeed these are quick-and-dirty heuristics that can be subpar, but you may or may not be surprised just how often decisions don't reach even this bar. In my work, when we are about to make a decision, I sometimes explicitly have to ask: do we have even a single reason to pick the option that we were about to pick over one or more others? And I find myself saying that (one of) those other options actually have reason(s) for us to pick them--I didn't bring up the question for nothing after all.

In these cases I could argue that we upgraded from no-reason deciding to at least any-reason deciding. (If we even did, in some contexts it's not unheard of that the answer to the above is something along the lines of "I cannot name any reasons but I still want to pick the first option.")

This is how we can cross from lower sophistication to the middle. However, there are perils in going ever higher: once we have identified at least one set of opposing reasons, we cross into a regime that can be immensely costly: how to weigh reasons against each other, especially when people disagree. And I'd argue that people in general are quite bad at doing this, which is why it can take up a lot of resources and still produce fairly arbitrary results.

Of course, all of this has to be balanced against how important the decision even is and how much effort, if any, should be expended on it. And I think humans are quite bad at judging this too, but we do approximate it somewhat, at least with large variance.

Thank you for naming these patterns!

Comment by Dalmert on Open Thread Winter 2024/2025 · 2024-12-31T00:51:37.111Z · LW · GW

I have found it! This was the one:

https://www.lesswrong.com/posts/qvNrmTqywWqYY8rsP/solutions-to-problems-with-bayesianism

Seems to have seen better reception at: https://forum.effectivealtruism.org/posts/3z9acGc5sspAdKenr/solutions-to-problems-with-bayesianism

The winning search strategy was quite interesting as well I think:

I took the history of roughly all the LW articles I have ever read. I had easy access to all the titles and URLs, but not the article contents. I fed them one by one into a 7B LLM, asking it to rate how likely it was, based on the title alone, that the unseen article content matched what I described above, as vague as that memory may be. Then I looked at the highest-ranking candidates, and they were duds. I did the same thing with a 70B model, et voilà, the solution was indeed near the top.

Now I just need to re-read it to see if it was worth dredging up. I guess when a problem starts to itch, it's hard to resist solving it.
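The ranking loop described above can be sketched roughly as follows. This is only an illustration: `rank_titles` and `keyword_overlap_score` are hypothetical names, and the keyword-overlap scorer is a trivial stand-in for the LLM rating call, whose model, prompt, and API are not specified in the comment.

```python
from typing import Callable, List, Tuple


def rank_titles(
    titles: List[str],
    description: str,
    score: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Score every title against the description and sort best-first."""
    scored = [(score(title, description), title) for title in titles]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def keyword_overlap_score(title: str, description: str) -> float:
    """Stand-in scorer (assumption): fraction of title words found in the description.

    In the setup described above, this is where the LLM would be asked to
    rate how likely the unseen article matches the remembered description.
    """
    title_words = set(title.lower().split())
    description_words = set(description.lower().split())
    if not title_words:
        return 0.0
    return len(title_words & description_words) / len(title_words)


titles = [
    "Reflections on Less Online",
    "Solutions to problems with Bayesianism",
    "Sum-threshold attacks",
]
description = "extending probability predictions and calibration, problems with bayesianism"
ranked = rank_titles(titles, description, keyword_overlap_score)
for rating, title in ranked:
    print(f"{rating:.2f}  {title}")
```

Passing the scorer in as a function keeps the ranking step unchanged when a weaker rater is swapped for a stronger one, which mirrors the 7B to 70B switch.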

Comment by Dalmert on Open Thread Winter 2024/2025 · 2024-12-30T07:47:42.266Z · LW · GW

Hey, can anyone help me find this LW (likely but could be diaspora) article, especially if you might have read it too?

My vague memory: It was talking about (among other things?) some potential ways of extending point-estimate probability predictions and calibration curves, for example in a situation where making a prediction in one way affects what the outcome will be, i.e. if there is a mind-reader/accurate-simulator involved that bases its actions on your prediction. In this case, a two-dimensional probability estimate might be more appropriate: if 40% is predicted for event A, event B will have a probability of 60%. If 70% for event A, then 80% for event B, and so on, a mapping potentially defined continuously over the whole range. (Event A and event B might be the same.) IIRC the article contained 2D charts where curves and rectangles were drawn for illustration.
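To make the idea of such a mapping concrete, here is a minimal sketch that treats the two example pairs above as points on a curve from "probability predicted for event A" to "resulting probability of event B". The linear interpolation, the clamping outside the known range, and the name `outcome_probability` are all assumptions for illustration, not something taken from the article being searched for.

```python
from bisect import bisect_right
from typing import List, Tuple


def outcome_probability(predicted: float, points: List[Tuple[float, float]]) -> float:
    """Interpolate the outcome probability for a given predicted probability.

    `points` is a list of (predicted, outcome) pairs sorted by predicted value.
    Values outside the known range are clamped to the nearest endpoint (assumption).
    """
    xs = [x for x, _ in points]
    if predicted <= xs[0]:
        return points[0][1]
    if predicted >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, predicted)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (predicted - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# The two pairs mentioned in the comment: 40% -> 60% and 70% -> 80%.
mapping = [(0.4, 0.6), (0.7, 0.8)]
for p in (0.4, 0.55, 0.7):
    print(p, round(outcome_probability(p, mapping), 3))
```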

IIRC it didn't have too many upvotes, more like around low-dozen, or at most low-hundred.

Searches I've tried so far: Google, Exa, Gemini 1.5 with Deep Research, Perplexity, OpenAI GPT-4o with Search.

p.s. if you are also unable to put enough time into finding it, do you have any ideas how it could be found?

Comment by Dalmert on Hire (or Become) a Thinking Assistant · 2024-12-23T11:24:37.125Z · LW · GW

I'm interested in variants of this from both sides. Feel free to shoot me a DM and let's see if we can set something up.

I haven't had a good label to put on things like this, but I've gravitated towards similar ways of working over the last 10-20 years, and I've very often found very good performance-boosting effects, especially where compatibility and trust could be achieved.

Comment by Dalmert on Reflections on Less Online · 2024-07-07T11:06:29.763Z · LW · GW

If anyone reading this feels like they missed out, or this sparked their curiosity, or they are bummed that they might have to wait 11 months for a chance at something similar, or they feel that so many cool things happen in North America and so few in Europe (all preceding "or"s are inclusive), then I can heartily recommend that you come to LessWrong Community Weekend 2024 [Applications Open] in Berlin in about 2 months, over the weekend of 13 September. Applications are open as of now.

I've attended it a couple of times so far, and I quite liked it. Reading this article, it seemed very similar, and I began to wonder if LWCW was a big inspiration for LessOnline, or if they had a common source of inspiration. So I do mean to emphasize what I wrote in the first paragraph: if you think you might like something as described here, then I strongly encourage you to come!

(If someone attended both then maybe they can weigh in even more authoritatively whether my impression is accurate or if more nuance would be beneficial.)

Comment by Dalmert on Sum-threshold attacks · 2023-09-14T00:38:50.772Z · LW · GW

In a not-too-fast and therefore requisitely stealthy ASI takeover scenario, if the intelligence explosion is not too steep, this could be a main meta-method by which the system gains increasing influence and power while fully remaining under the radar and avoiding detection until it is reasonably sure that it can no longer be opposed. This could be happening without anyone knowing or maybe even being able to know. Frightening.

Comment by Dalmert on AI: Practical Advice for the Worried · 2023-03-17T10:50:07.221Z · LW · GW

The employees of the RAND corporation, in charge of nuclear strategic planning, famously did not contribute to their retirement accounts because they did not expect to live long enough to need them.

Any sources for this? I tried searching around, to no avail so far, which is surprising if this is indeed famously known.

Comment by Dalmert on Personal predictions for decisions: seeking insights · 2023-02-21T18:51:14.276Z · LW · GW

I expect that until I find a satisfactory resolution to this topic, I might come back to it a few times, and potentially keep a bit of a log here of what I find in case it does add up to something. So far this is one of the things I found:

https://www.lesswrong.com/posts/JnDEAmNhSpBRpjD8L/resolutions-to-the-challenge-of-resolving-forecasts

This seems very relevant to a part of what I was pondering, but I'm not sure yet how actionable the takeaways are.

Comment by Dalmert on Medlife Crisis: "Why Do People Keep Falling For Things That Don't Work?" · 2023-02-21T07:53:33.488Z · LW · GW

I strong-upvoted this, but I fear you won't see a lot of traction on this forum for this idea.

I have a vague understanding of why, but I don't think I heard compelling enough reasons from other LWers yet. If someone has some, I'd be happy to read them or be pointed towards them.

I value empiricism highly, i.e. putting ideas into action to be tested against the universe; but I think I've read EY state somewhere that a superintelligence would need to perform very few or even zero experiments to find out a lot (or even most? all?) true things about our universe that we humans need painstaking effort and experiments for.

Please don't consider this very vague recollection as anywhere close to a steelman.

I think this was motivated by how many bits of information can be taken in even with human-like senses, and how a single bit of information can halve a set of hypotheses. Where I have not yet seen sufficient motivation for this argument: this can indeed be true for very valuable bits of information, but are we assuming that any entity will easily be able to receive those very valuable bits? Surely a lot of bits are redundant and give no novel information, and some bits are very costly to attain. Sometimes you are lucky if you can so much as eliminate a single potential hypothesis, and even that is costly and requires interacting with the universe instead of just passively observing it.
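A toy sketch of the distinction drawn above, under the assumption of eight equally likely hypotheses labelled 0 to 7: an observation that splits the set evenly halves it, while a redundant observation that every hypothesis already predicts eliminates nothing. The function and variable names are made up for illustration.

```python
import math
from typing import Callable, List


def remaining_after(hypotheses: List[int], observation: Callable[[int], bool]) -> List[int]:
    """Keep only the hypotheses consistent with the observation."""
    return [h for h in hypotheses if observation(h)]


def is_even(h: int) -> bool:
    """Informative observation: consistent with exactly half of the hypotheses."""
    return h % 2 == 0


def is_below_eight(h: int) -> bool:
    """Redundant observation: consistent with every hypothesis, so it rules out none."""
    return h < 8


hypotheses = list(range(8))
after_informative = remaining_after(hypotheses, is_even)
after_redundant = remaining_after(hypotheses, is_below_eight)

# Best case: each bit halves the set, so 8 hypotheses need at least 3 bits.
min_bits = math.ceil(math.log2(len(hypotheses)))

print(len(after_informative), len(after_redundant), min_bits)
```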

But let's hear it from others!

(I'm not sure if this spectrum of positions has any accepted names, maybe rationalist vs empiricist?)
