LessWrong 2.0 Reader
For me, these topics seem extremely contextual and variable with the situation and specifics of the tradeoff in the moment. For many of them, I do somewhat frequently explore consciously what it might feel like (and for cheap ones, try out) to make a different tradeoff, but those experiments don't generalize well.
I suspect that for the impactful ones (heavily repeated or large), your first two bullet points don't apply - feedback is delayed from the decision, and if harmful, it will be significant.
Still, it's VERY GOOD to be reminded that these decisions are mostly made by type-1 thinking, out of habit or instinct (aka deep/early learning) that deserves reconsideration from time to time.
lost-futures on AI #60: Oh the HumanityThe Devin mishap is a reminder of how tricky it often is for the general public to gauge what is and isn't currently possible for AI. A lot of people, myself included, assumed the claimed performance was legitimate. No doubt many AI startups like Devin are waiting for the rising tide of improving foundation models to make their ideas feasible. I wonder how many are engaging in similar deceptive marketing tactics, or will do so in the future.
romeostevensit on Raemon's ShortformCRPGs with a lot of open world dynamics might work, where the goal is for the person to identify the most important experiments to run in a limited time window in order to min-max certain stats.
gunnar_zarncke on What's up with all the non-Mormons? Weirdly specific universalities across LLMsIf I haven't overlooked the explanation (I have read only part of it and skimmed the rest), my guess for the non-membership definition of the empty string would be all the SQL and programming queries where "" stands for matching all elements (or sometimes matching none). The small round things are a riddle for me too.
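The "empty string matches everything" behavior the comment gestures at shows up in many languages, not just SQL. A small illustration in Python (my own example, not from the comment): the empty string is a substring of every string, and an empty regex pattern matches at every position.

```python
import re

# "" is a substring of every string, so membership tests always succeed
print("" in "hello")           # True
print("hello".startswith(""))  # True

# An empty regex pattern yields a zero-width match at every position
print(re.findall("", "ab"))    # ['', '', ''] - matches at positions 0, 1, 2
```

This is the sense in which `""` can act as a universal matcher, which would make it a plausible "member of everything" rather than a member of nothing.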
owencb on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizesI feel awkward about trying to offer examples because (1) I'm often bad at that when on the spot, and (2) I don't want people to over-index on particular ones I give. I'd be happy to offer thoughts on putative examples, if you wanted (while being clear that the judges will all ultimately assess things as seem best to them).
Will probably respond to emails on entries (which might be to decline to comment on aspects of it).
ricraz on What is the best way to talk about probabilities you expect to change with evidence/experiments?The thing that distinguishes the coin case from the wind case is how hard it is to gather additional information, not how much more information could be gathered in principle. In theory you could run all sorts of simulations that would give you informative data about an individual flip of the coin, it's just that it would be really hard to do so/very few people are able to do so. I don't think the entropy of the posterior captures this dynamic.
owencb on Express interest in an "FHI of the West"I don't really disagree with anything you're saying here, and am left with confusion about what your confusion is about (it seemed like you were offering these as examples of disagreement?).
razied on What is the best way to talk about probabilities you expect to change with evidence/experiments?Wait, why doesn't the entropy of your posterior distribution capture this effect? In the basic example where we get to see samples from a bernoulli process, the posterior is a beta distribution that gets ever sharper around the truth. If you compute the entropy of the posterior, you might say something like "I'm unlikely to change my mind about this, my posterior only has 0.2 bits to go until zero entropy". That's already a quantity which estimates how much future evidence will influence your beliefs.
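A quick numerical sketch of the Bernoulli example (my own illustration, assuming a uniform Beta(1, 1) prior; note that the posterior's differential entropy keeps decreasing below zero as it sharpens, rather than stopping at zero):

```python
import math

def beta_logpdf(x, a, b):
    """Log-density of a Beta(a, b) distribution at x in (0, 1)."""
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def beta_entropy(a, b, n_grid=100_000):
    """Differential entropy (in nats) of Beta(a, b), via midpoint integration."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h
        lp = beta_logpdf(x, a, b)
        total -= math.exp(lp) * lp * h
    return total

# As Bernoulli samples accumulate (70% successes observed here), the
# posterior Beta(1 + k, 1 + n - k) sharpens and its entropy falls:
for n, k in [(0, 0), (10, 7), (100, 70), (1000, 700)]:
    print(n, round(beta_entropy(1 + k, 1 + n - k), 3))
```

The printed entropies decrease monotonically with sample size, which is the sense in which the posterior's entropy tracks how much remaining evidence could still move your beliefs.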
sam-svenningsen on Inducing Unprompted Misalignment in LLMsThanks, yes, I think that is a reasonable summary.
Intentionally, there is still some handholding, with the bad behavior present, to make the "bad" behavior more obvious. I try to make those caveats in the post. Sorry if I didn't make enough, particularly in the intro.
I still thought the title was appropriate since
So, I am interested in the question of: "when some types of 'bad behavior' get reinforced, how does this generalize?"
I am too. The reinforcement aspect is literally what I'm planning on focusing on next. Thanks for the feedback.
aysja on Express interest in an "FHI of the West"Huh, I feel confused. I suppose we just have different impressions. Like, I would say that Oliver is exceedingly good at cutting through the bullshit. E.g., I consider his reasoning around shutting down the Lightcone offices to be of this type, in that it felt like a very straightforward document of important considerations, some of which I imagine were socially and/or politically costly to make. One way to say that is that I think Oliver is very high integrity, and I think this helps with bullshit detection: it's easier to see how things don't cut to the core unless you deeply care about the core yourself. In any case, I think this skill carries over to object-level research, e.g., he often seems, to me, to ask cutting-to-the-core questions there, too. I also think he's great at argumentation: legible reasoning, identifying the important cruxes in conversations, etc., all of which makes it easier to tell the bullshit from the not.
I do not think of Oliver as being afraid to be disagreeable, and ime he gets to the heart of things quite quickly, so much so that I found him quite startling to interact with when we first met. And although I have some disagreements over Oliver's past walled-garden taste, from my perspective it's getting better, and I am increasingly excited about him being at the helm of a project such as this. Not sure what to say about his beacon-ness, but I do think that many people respect Oliver, Lightcone, and rationality culture more generally; I wouldn't be that surprised if there were an initial group of independent researcher types who were down and excited for this project as is.