LessWrong 2.0 Reader



Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)
Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)
AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)
Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)
How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)
Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)
Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)
[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)
The Sequences on YouTube
Neil (neil-warren) · 2024-01-07T01:44:39.663Z · comments (9)
Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)
Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)
[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)
SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)
Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)
[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)
[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)
Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (7)
[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)
Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)
AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)
Decent plan prize winner & highlights
lukehmiles (lcmgcd) · 2024-01-19T23:30:34.242Z · comments (2)
[question] How to Model the Future of Open-Source LLMs?
Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z · answers+comments (9)
A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)
Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)
[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)
An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)
The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)
Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)
[link] In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)
[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)
Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)
[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)
[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)
Decent plan prize announcement (1 paragraph, $1k)
lukehmiles (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)
Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)
To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)
If a little is good, is more better?
DanielFilan · 2023-11-04T07:10:05.943Z · comments (16)
[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)
[link] Report: Evaluating an AI Chip Registration Policy
Deric Cheng (deric-cheng) · 2024-04-12T04:39:45.671Z · comments (0)
[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)
[link] Let's Design A School, Part 2.2 School as Education - The Curriculum (General)
Sable · 2024-05-07T19:22:21.730Z · comments (3)
Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)