LessWrong 2.0 Reader

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

The Geometric Importance of Side Payments
StrivingForLegibility · 2024-08-07T01:38:04.635Z · comments (4)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

[link] Michael Streamlines on Buddhism
Chris_Leong · 2024-08-09T04:44:52.126Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

A Case for Conscious Significance rather than Free Will.
James Stephen Brown (james-brown) · 2024-10-25T23:20:30.834Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (8)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

Making Beliefs Pay Rent
Screwtape · 2024-07-28T17:59:52.101Z · comments (2)

Relativity Theory for What the Future 'You' Is and Isn't
FlorianH (florian-habermacher) · 2024-07-29T02:01:17.736Z · comments (48)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (4)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

[link] Species as Canonical Referents of Super-Organisms
Yudhister Kumar (randomwalks) · 2024-10-18T07:49:52.944Z · comments (8)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

Utilitarianism and the replaceability of desires and attachments
MichaelStJules · 2024-07-27T01:57:42.419Z · comments (2)

Lenses of Control
WillPetillo · 2024-10-22T07:51:06.355Z · comments (0)

Broadly human level, cognitively complete AGI
p.b. · 2024-08-06T09:26:13.220Z · comments (0)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

[question] Practical advice for secure virtual communication post easy AI voice-cloning?
hmys (the-cactus) · 2024-08-09T17:32:33.458Z · answers+comments (5)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

Recent comments

raemon on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

fwiw, I think it'd be helpful if this post had the transcript posted as part of the main post body.

raemon on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

I think I actually agree with Lincoln here and think he was saying a different thing than your comment here seems to be oriented around.

I don't think Lincoln's comment had much to do with assuming there was a shadow EA cabal that was aligned with your values. He said "your words are having an impact."

Words having impacts just does actually take time. I updated from stuff Ben Hoffman said, but it did take 3-4 years or something for the update to fully happen (for me in particular), and when I did ~finish updating the amount I was going to update, it wasn't exactly the way Ben Hoffman wanted. In the first 3 years, it's not like I can show Ben Hoffman "I am ready for your approval", or even that I've concretely updated any particular way, because it was a slow messy process and it wasn't like I knew for sure how close to his camp I was going to land.

But, it wouldn't have been true to say "his critiques dropped like a stone through water". (Habryka has said they also affected him, and this seems generally to have actually reverberated a lot).

I don't know whether or not your critiques have landed, but I think it is too soon to judge.

erich_grunewald on Lab governance reading list

Specific examples might include criticisms of RSPs, Kelsey’s coverage of the OpenAI NDA stuff, alleged instances of labs or lab CEOs misleading the public/policymakers, and perspectives from folks like Tegmark and Leahy (who generally see a lot of lab governance as safety-washing and probably have less trust in lab CEOs than the median AIS person).

Isn't much of that criticism also a form of lab governance? I've always understood the field of "lab governance" as something like "analysing and suggesting improvements for practices, policies, and organisational structures in AI labs". By that definition, many critiques of RSPs would count as lab governance, as could the coverage of OpenAI's NDAs. But arguments of the sort "labs aren't responsive to outside analyses/suggestions, dooming such analyses/suggestions" would indeed be criticisms of lab governance as a field or activity.

(ETA: Actually, I suppose there's no reason why a piece of X research cannot critique X (the field it's a part of). So my whole comment may be superfluous. But eh, maybe it's worth pointing out that the stuff you propose adding can also be seen as a natural part of the field.)

habryka4 on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

I think feedback loops are good, but how is that incompatible with taking AI seriously? At this point, even if you want to work on things with tighter feedback loops, AI seems like the central game in town (probably by developing technology that leverages it, while thinking carefully about the indirect effects of that, or at the very least, by being in touch with how it will affect whatever other problem you are trying to solve, since it will probably affect all of them).

green_leaf on A Logical Proof for the Emergence and Substrate Independence of Sentience

so the idea is that you can describe the brain by treating each neuron as a little black box about which you just know its input/output behavior, and then describe the interactions between those little black boxes. Then, assuming you can implement the input/output behavior of your black boxes with a different substrate (i.e., an artificial neuron)

This is guaranteed, because the universe (and any of its subsets) is computable (that means a classical computer can run software that acts the same way).

green_leaf on A Logical Proof for the Emergence and Substrate Independence of Sentience

And there are orders of magnitude more detail going on in my body (and even just in my brain) than I perceive, let alone that I communicate.

There are no sentient details going on that you wouldn't perceive.

It doesn't matter if you communicate something; the important part is that you are capable of communicating it, which means that it changes your input/output pattern (if it didn't, you wouldn't be capable of communicating it even in principle).

Circular arguments that "something is discussed, therefore that thing exists"

This isn't the argument in the OP (even though, when reading quickly, I can see how someone could get that impression).

elityre on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

We currently have too many people taking it all the way into AI town.

I reject the implication that AI town is the last stop on the crazy train.

cubefox on Sodium's Shortform

Wiktionary entry

zero-contradictions on Why Academia is Mostly Not Truth-Seeking

I didn't come up with the title for the essay, but I re-titled this LW post, so thank you for your suggestion. In hindsight, I'll agree that my comment came off as condescending to some extent, so I edited that as well. Regardless of the essay's title, the essay's contents raise serious questions about whether academia is intellectually honest.

I've thought about expanding my sequel essay even further to more precisely quantify and evaluate the research in each academic field, but I ended up not doing this since it would probably take me a week or longer to further detail everything. Another problem was that even if I finished it, people could always say that I failed to evaluate this or that, since there are tens of thousands of papers out there. Another issue is that not everybody agrees on what counts as "fake", as I mentioned in the sequel essay. So even if someone quantified all academic research as best as they can, it's not possible for them to make an overall assessment that a majority of people would agree with.

For these reasons, I don't think it's productive to quantify whether most academic research is true or false or high-quality or low-quality, which would explain why the author didn't do so. I think it's more productive to analyze how academia and the academic research process work and what kind of output such a system is likely to produce. From everything that I've seen across a multitude of fields, my overall impression is that most academic research tends to be low-quality. Blithering Genius's analysis and my own analysis both conclude that that's probably the case for most academic research.

erich_grunewald on The Hopium Wars: the AGI Entente Delusion

Yes, this seems right to me. The OP says

The key point I will make is that, from a game-theoretic point of view, this race is not an arms race but a suicide race. In an arms race, the winner ends up better off than the loser, whereas in a suicide race, both parties lose massively if either one crosses the finish line.

But from a game-theoretic perspective, it can still make sense for the US to aggressively pursue AGI, even if one believes there's a substantial risk of an AGI takeover in the case of a race, especially if the US acts in its own self interest. Even with this simple model, the optimal strategy would depend on how likely AGI takeover is, how bad China getting controllable AGI first would be from the point of view of the US, and how likely China is to also not race if the US does not race. In particular, if the US is highly confident that China will aggressively pursue AGI even if the US chooses to not race, then the optimal strategy for the US could be to race even if AGI takeover is highly likely.

So really I think some key cruxes here are:

How likely is AGI (or its descendants) to take over?
How likely is China to aggressively pursue AGI if the US chooses not to race?

And vice versa for China. But the OP doesn't really make any headway on those.
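
To make the two cruxes concrete, here is a minimal sketch of the "simple model" as an expected-utility comparison from the US point of view. Everything in it is an assumption for illustration and not something the OP or this comment specifies: the class name `RaceModel`, the payoff numbers, the simplification that China always races if the US does, and the simplification that the takeover probability is the same no matter who builds AGI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RaceModel:
    """Toy one-shot model of the US decision. All values are hypothetical."""
    p_takeover: float            # chance of AGI takeover once anyone builds AGI
    p_china_races_anyway: float  # chance China pursues AGI if the US holds back
    p_us_first: float            # chance the US gets there first if both race
    u_takeover: float            # US payoff if AGI takes over
    u_us_first: float            # US payoff if the US gets controllable AGI first
    u_china_first: float         # US payoff if China gets controllable AGI first
    u_status_quo: float          # US payoff if nobody builds AGI

    def eu_race(self) -> float:
        # Assumption: if the US races, China races too, so AGI gets built.
        no_takeover = (self.p_us_first * self.u_us_first
                       + (1 - self.p_us_first) * self.u_china_first)
        return (self.p_takeover * self.u_takeover
                + (1 - self.p_takeover) * no_takeover)

    def eu_hold_back(self) -> float:
        # If China races anyway, it builds AGI alone; otherwise nothing changes.
        china_alone = (self.p_takeover * self.u_takeover
                       + (1 - self.p_takeover) * self.u_china_first)
        return (self.p_china_races_anyway * china_alone
                + (1 - self.p_china_races_anyway) * self.u_status_quo)

    def best_action(self) -> str:
        return "race" if self.eu_race() > self.eu_hold_back() else "hold back"


# Hypothetical numbers: takeover is very likely and very bad.
base = RaceModel(
    p_takeover=0.9, p_china_races_anyway=1.0, p_us_first=0.5,
    u_takeover=-100.0, u_us_first=10.0, u_china_first=-10.0, u_status_quo=0.0,
)
china_certain = base.best_action()
china_never = replace(base, p_china_races_anyway=0.0).best_action()
```

With these made-up numbers, `china_certain` comes out as "race" and `china_never` as "hold back": when China is certain to race anyway, racing is better for the US whenever it prefers getting controllable AGI first, however likely takeover is. The second crux therefore does as much work in this toy model as the first.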

Additionally, I think there are a bunch of complicating details that also end up mattering, for example:

To what extent can two rival countries cooperate while simultaneously competing? The US and the Soviets did cooperate on multiple occasions, while engaged in intense geopolitical competition. That could matter if one thinks racing is bad because it makes cooperation harder (as opposed to being bad because it brings AGI faster).
How (if at all) does the magnitude of the leader's lead over the follower change the probability of AGI takeover (i.e., does the leader need "room to manoeuvre" to develop AGI safely)?
Is the likelihood of AGI takeover lower when AGI is developed in some given country than in some other given country (all else equal)?
Is some sort of coordination more likely in worlds where there's a larger gap between racing nations (e.g., because the leader has more leverage over the follower, or because a close follower is less willing to accept a deal)?
And adding to that, obviously constructs like "the US" and "China" are simplifications too, and the details around who actually makes and influences decisions could end up mattering a lot.

It seems to me all these things could matter when determining the optimal US strategy, but I don't see them addressed in the OP.