LessWrong 2.0 Reader

Recent comments

rosiecam on Which skincare products are evidence-based?

Nice!! I don't know much about that moisturizer but the rest looks good to me

rosiecam on Which skincare products are evidence-based?

Seems like the evidence is overwhelmingly in favor of sunscreen. The studies I've seen against it generally don't address the obvious confounder: people who wear sunscreen more also tend to have a lifestyle that involves being in the sun a lot more.
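
To make the confounding point concrete, here is a minimal sketch with made-up numbers (the counts, the two exposure levels, and the function names are all assumptions for illustration, not data from any study). Within each sun-exposure level sunscreen halves the risk, yet the pooled comparison makes sunscreen users look worse off, because most users are in the high-exposure group.

```python
# Hypothetical counts chosen only to illustrate confounding; not real data.
# Key: (sun exposure, uses sunscreen) -> (number of people, skin-damage cases)
strata = {
    ("high", True): (800, 80),   # 10% risk
    ("high", False): (200, 40),  # 20% risk
    ("low", True): (200, 4),     # 2% risk
    ("low", False): (800, 32),   # 4% risk
}


def risk(groups):
    """Pooled risk (cases / people) over a list of (people, cases) pairs."""
    people = sum(n for n, _ in groups)
    cases = sum(c for _, c in groups)
    return cases / people


def risk_ratio(exposure=None):
    """Risk among sunscreen users divided by risk among non-users.

    With exposure=None all strata are pooled (the naive comparison);
    otherwise only the given exposure level is used (the stratified one).
    """
    users = [v for (e, s), v in strata.items() if s and exposure in (None, e)]
    non_users = [v for (e, s), v in strata.items() if not s and exposure in (None, e)]
    return risk(users) / risk(non_users)


crude = risk_ratio()              # ignores sun exposure
high = risk_ratio("high")         # only high-exposure people
low = risk_ratio("low")           # only low-exposure people

print(f"crude risk ratio: {crude:.2f}")
print(f"high-exposure risk ratio: {high:.2f}")
print(f"low-exposure risk ratio: {low:.2f}")
```

With these numbers the crude ratio comes out above 1 (about 1.17) while both stratified ratios are 0.5, which is the pattern a study would show if it failed to adjust for time spent in the sun.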

rosiecam on Which skincare products are evidence-based?

I used to get breakouts maybe like once a month, sometimes with really stubborn/painful zits that would take quite a long time to disappear. Now I basically never get breakouts, I think I've had like 2 small zits since I started and they have disappeared quickly. I have not had any big painful ones.
My fine lines have been reduced, and my skin looks and feels smoother and softer.
I had some redness/discoloration in some areas, which has been reduced a lot; it no longer needs to be covered with makeup.

Dermatica prompts you to send them photos every few months so they can check how your skin is reacting, but it's also convenient because you can look back and see the improvement.

logan-zoellner on an effective ai safety initiative

It's not trying to address present harms, it's trying to address future harms, which are the important ones.

A real AI system that kills literally everyone will do so by gaining power/resources over a period of time. Most likely it will do so the same way existing bad-agents accumulate power and resources.

Unless you're explicitly committing to the Diamondoid bacteria thing, stopping hacking is stopping AI from taking over the world.

logan-zoellner on an effective ai safety initiative

Point taken. "$$$" was not the correct framing (if we're specifically talking about the Gwern story). I will edit to say "it accumulates 'resources'".

The Gwern story has faster takeoff than I would expect (especially if we're talking a ~GPT4.5 autoGPT agent), but the focus on money vs just hacking stuff is not the point of my essay.

bogdan-ionut-cirstea on Mechanistically Eliciting Latent Behaviors in Language Models

In future work, one could imagine automating the evaluation of the coherence and generalization of learned steering vectors, similarly to how Bills et al. (2023) automate interpretability of neurons in language models. For example, one could prompt a trusted model to produce queries that explore the limits and consistency of the behaviors captured by unsupervised steering vectors.

Probably even better to use interpretability agents (e.g. MAIA, AIA) for this, especially since they can do (iterative) hypothesis testing.

fabien-roger on Fabien's Shortform

I also listened to How to Measure Anything in Cybersecurity Risk 2nd Edition by the same author. It had a huge amount of overlapping content with The Failure of Risk Management (and the non-overlapping parts were quite dry), but I still learned a few things:

Executives of big companies now care a lot about cybersecurity (e.g. citing it as one of the main threats they have to face), which wasn't true in ~2010.
Evaluation of cybersecurity risk is not at all synonymous with red teaming. This book is entirely about risk assessment in cyber and doesn't speak about red teaming at all. Rather, it focuses on reference class forecasting, comparison with other incidents in the industry, trying to estimate the damages if there is a breach, ... It only captures information from red teaming indirectly via expert interviews.

I'd like to find a good resource that explains how red teaming (including intrusion tests, bug bounties, ...) can fit into a quantitative risk assessment.
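
For readers unfamiliar with what a quantitative risk assessment of this kind can look like, here is a minimal Monte Carlo sketch. It is a generic illustration, not code from the book: the two risks, their annual probabilities, and the 90% loss intervals are invented, and the function names are hypothetical. Each risk is described by an expert-estimated chance of occurring in a year and a 90% confidence interval on the loss, modeled as lognormal. Red teaming results would only enter a model like this indirectly, by shifting the probabilities or intervals the experts give.

```python
import math
import random


def lognormal_params(lower, upper):
    """Turn a 90% confidence interval on loss into lognormal mu and sigma.

    A 90% interval spans about +/- 1.645 standard deviations in log space.
    """
    mu = (math.log(lower) + math.log(upper)) / 2
    sigma = (math.log(upper) - math.log(lower)) / (2 * 1.645)
    return mu, sigma


def simulate_annual_losses(risks, trials=20_000, seed=0):
    """Simulate total yearly loss; risks is a list of (probability, lower, upper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = []
    for _ in range(trials):
        total = 0.0
        for probability, lower, upper in risks:
            if rng.random() < probability:  # does the event happen this year?
                mu, sigma = lognormal_params(lower, upper)
                total += rng.lognormvariate(mu, sigma)
        totals.append(total)
    return totals


def exceedance_probability(totals, threshold):
    """Fraction of simulated years whose total loss exceeds the threshold."""
    return sum(t > threshold for t in totals) / len(totals)


# Hypothetical inputs: (annual probability, 90% CI lower bound, upper bound) in dollars.
risks = [
    (0.10, 50_000, 2_000_000),      # e.g. a data breach
    (0.02, 500_000, 20_000_000),    # e.g. a prolonged outage
]

totals = simulate_annual_losses(risks)
for threshold in (0, 1_000_000, 10_000_000):
    p = exceedance_probability(totals, threshold)
    print(f"P(annual loss > ${threshold:,}) ~ {p:.3f}")
```

The output is a loss exceedance curve in miniature: the probability of losing more than each threshold in a year, which can then be compared against a risk tolerance.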

the-gears-to-ascension on Biorisk is an Unhelpful Analogy for AI Risk

"In other words, AI risk looks at least as bad as bio risk, but in many ways much worse."

Agree, but I think trying to place these things in a semantically meaningful hand-designed multidimensional space of factors is probably a useful exercise, along with computer security. Your axes of comparison are an interesting starting point.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

I wonder how much near-term interpretability [V]LM agents (e.g. MAIA, AIA) might help with finding better probes and better steering vectors (e.g. by iteratively testing counterfactual hypotheses against potentially spurious features, a major challenge for Contrast-consistent search (CCS)).

This seems plausible since MAIA can already find spurious features, and feature interpretability [V]LM agents could run much longer hypothesis-iteration cycles (compared to current [V]LM agents and perhaps even to human researchers).

cousin_it on Accidental Electronic Instrument

Crosstalk is definitely a problem; e-drums and pads have it too. But are you sure the tradeoff is inescapable? Here's a thought experiment: imagine the tines sit on separate pads, or on the same pad but far from each other. (Or physically close, but sitting on long rods or something, so that the distance through the connecting material is large.) Then damping and crosstalk can be small at the same time. So maybe you can reduce damping without increasing crosstalk, by changing the instrument's shape or materials.
