LessWrong 2.0 Reader

Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (53)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks
Tom DAVID (tom-david) · 2024-11-27T02:54:16.263Z · comments (0)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

[question] How does it feel to switch from earn-to-give?
Neil (neil-warren) · 2024-03-31T16:27:22.860Z · answers+comments (4)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

5. Open Corrigibility Questions
Max Harms (max-harms) · 2024-06-10T14:09:20.777Z · comments (0)

[question] Why are there no interesting (1D, 2-state) quantum cellular automata?
Optimization Process · 2024-11-26T00:11:37.833Z · answers+comments (13)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (8)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L (LRudL) · 2024-10-28T21:02:51.215Z · comments (0)

End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (25)

Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)

Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)


Recent comments

sherrinford on Open Thread Winter 2024/2025

What is the current status of CFAR? The website seems like it is inactive, which I find surprising given that there were four weekend workshops in 2022 that CFAR wanted to use for improving its workshops.

kaj_sotala on Heritability: Five Battles

So I think we have two separate questions here:

1. Do psychological issues involve reactivation of an earlier memory such that the reactivation plays a causal role in the issue?
2. Can you address an issue without explicitly working with an earlier memory?

For the first question, I'd say "it depends". On one end, we have something like PTSD flashbacks - here a reactivation of a memory is clearly in a causal role, since "a memory getting reactivated to such an extent that the person experiences themselves as literally reliving it" is what a flashback is.

Slightly less strong but still strongly suggestive of a causal role is something where a person imagines themselves doing something, but then - maybe just at the back of their mind - recalls a painful memory and flinches away from doing that. E.g. they consider speaking up, but then a flicker of a memory comes up about a time when they spoke up and somebody ridiculed them, and they quickly close their mouth. Here there seems to at least be a causal path from the memory to the issue, in that the memory is charged with negative affect and that the memory coming up causes the person to reorient to something that makes the memory recede in intensity.

Then we have cases where there's no obvious memory at first, but directing attention to the issue and asking questions about it brings up a memory, even though none of the questions ever ask about memories directly. For example, someone might feel like they have to act in a certain way in a particular social situation despite finding it unpleasant. Now a therapist might ask them something like "what would be bad about acting differently" and have them focus on what feels emotionally or intuitively bad about it (rather than what logical justifications their mind would offer). Then there might be a line of questioning that went something like:

"I have difficulty getting a turn to speak because I tend to wait extra long to make sure others have finished speaking before I speak up. And then I wait so long that someone else always starts talking before I can."
"Okay, so what would be bad about speaking up before you're certain that others have finished speaking?"
"Then I might interrupt them before they're finished."
"Okay, what would be bad about interrupting them before they're finished?"
"That'd feel unfair toward them."
"In what way does it feel unfair?"
"Hmm, I'm getting a memory of a time when I was trying to speak up but my father interrupted me, and then I tried talking anyway and then he acted like I had interrupted him and that I should let him talk first. That felt really bad and unfair. I guess I want to make sure to act better than he did, and make sure that I never interrupt someone else when it's their turn to speak."

Is the memory in a causal role here? Probably depends on how exactly we define "causal". But at least it seems like there is some kind of a model about how the person wants to act or not act ("interrupting other people is unfair toward them, and should be avoided") that was formed due to an earlier experience. When one tries to elicit details about how exactly the model works, the model seems to structurally incorporate the original experience as a reference point for what exactly the core bad thing is. And working with the memory often seems to help with one's issues.

Given that this kind of a memory seems to have a similar character as the PTSD and the "I can hear the people mocking me" memories, just buried slightly deeper, to me the simplest and most plausible explanation would be that it has a causal role in the same way as the less-buried ones do.

Then on the other hand, it's not always the case that this kind of questioning leads to any clear-cut memory. Sometimes what comes up feels more like a general model that has been formed out of multiple different life experiences, with none of the original instances having been stored. Or there might be an issue that seems to go back to an age young enough that the person doesn't have any explicit memories of it, and it has only left a general emotional imprint. In those cases the memory doesn't seem to have any causal role, because there doesn't seem to be any memory around to begin with.

Or at least not one that would be easily accessible. I've heard of claims from people who got into states of deep meditation or strong doses of psychedelics that they managed to access very early painful memories that wouldn't have been available in a normal therapeutic context, and then got independent confirmation for the truthfulness of those memories afterward. I've not looked into these in detail but I'm inclined to suspect they're true. In part due to my personal experiences of old memories spontaneously coming up in altered states of consciousness (and this sometimes shifting behaviors), in part because "all behaviors involve an original memory trace being stored somewhere and that trace then driving behavior, with some of those traces just being buried deep or in not normally forward-compatible storage formats" would again seem like the most parsimonious model.

As for the second question, I'd again say it depends. If someone is suffering from a PTSD flashback, it's going to be hard to do anything about that without working with the traumatic memory in some way! But for the ones where the problem isn't so directly driven by an explicit memory reactivation, there are definitely a lot of approaches that work by changing other parts of the model. E.g. if the model makes a particular prediction about the world in general ("people will always find it unfair if I speak up before being absolutely certain that they're finished"), then it's often possible to disprove that prediction without going into the details of the original memory. And while some therapies focus on the episodic memory component of the learned model, others work on different components.

wassname on Implications of the inference scaling paradigm for AI safety

I agree because:

Some papers are already using implicit process-based supervision. That's where the reward model guesses how "good" a step is by how likely it is to lead to a good outcome. So they bypass any explicitly labeled process; instead, it's negotiated between the policy and the reward model. It's not clear to me if this scales as well as explicit process supervision, but it's certainly easier to find labels.

In rStar-Math they did implicit process supervision. Although I don't think this is a true o1/o3 replication since they started with a 236b model and produced a 7b model, in other words: indirect distillation.
Outcome-Refining Process Supervision for Code Generation did it too

There was also the recent COCONUT paper exploring non-legible latent CoT. It shows extreme token efficiency. While it wasn't better overall, it has lots of room for improvement. If frontier models end up using latent thoughts, they will be even less human-legible than the current sometimes-unfaithful-CoT.
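The implicit process supervision described above (scoring a step by how likely it is to lead to a good outcome) can be sketched with Monte Carlo rollouts. This is a minimal illustration under stated assumptions, not a claim about how rStar-Math or the other cited papers implement it: `estimate_step_value`, `toy_rollout`, and the 80% success rate are all hypothetical stand-ins for a real policy and answer checker.

```python
import random
from typing import Callable, List


def estimate_step_value(
    prefix: List[str],
    rollout: Callable[[List[str], random.Random], str],
    is_correct: Callable[[str], bool],
    num_rollouts: int = 32,
    seed: int = 0,
) -> float:
    """Score a partial solution by the fraction of sampled completions that end correctly."""
    rng = random.Random(seed)
    hits = sum(is_correct(rollout(prefix, rng)) for _ in range(num_rollouts))
    return hits / num_rollouts


def toy_rollout(prefix: List[str], rng: random.Random) -> str:
    """Hypothetical policy: a prefix containing a bad step never recovers;
    otherwise the completion reaches the right answer 80% of the time."""
    if "bad step" in prefix:
        return "wrong"
    return "right" if rng.random() < 0.8 else "wrong"


def answer_is_right(answer: str) -> bool:
    """Outcome check only: no step-level labels are used anywhere."""
    return answer == "right"


good = estimate_step_value(["step 1", "step 2"], toy_rollout, answer_is_right)
bad = estimate_step_value(["step 1", "bad step"], toy_rollout, answer_is_right)
print(f"value after a sound step: {good:.2f}")
print(f"value after a bad step:   {bad:.2f}")
```

Only final outcomes are checked, so the per-step scores come for free from rollouts rather than from human-labeled steps, which is the "easier to find labels" point.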

deluks917 on We probably won't just play status games with each other after AGI

I have spent weeks where pretty much all I did was:
-- have sex with my partner, hours per day
-- watch anime with my partner
-- eat food and ambiently hang with my partner

No work. Not much seeing other people. Of course, given the amount of sex, mundane situations were quite sexually charged. I'm not actually sure if it gets old on any human timeline. You also improve at having fun together. However, this was not very good for our practical goals. But post-singularity I probably won't need to worry about practical goals.

In general I think you underestimate the sustainable fun available to at least some humans under minimal conditions. I also found my two months meditating in a tent quite fun. Many people report this never gets old on human timelines either. Until your health is so terrible you cannot even meditate well it remains fun, or improves!

I do not think you need supermeth to enjoy hedonism. Current human bodies work fine as long as they are in good shape and you have the right disposition. The issue is that if you delve deep into hedonism you will lose out on other things you could have obtained.

dmitry-vaintrob on Permanents: much more than you wanted to know

The elves care, Alex. The elves care.

alexander-gietelink-oldenziel on Permanents: much more than you wanted to know

Hope this will be answered in a later post, but why should I care about the permanent for alignment?

davidmanheim on Davidmanheim's Shortform

Toby Ord writes that “the required resources [for LLM training] grow polynomially with the desired level of accuracy [measured by log-loss].” He then concludes that this shows “very poor returns to scale,” and christens it the "Scaling Paradox." (He continues to point out that this doesn’t imply it can’t create superintelligence, but I agree with him about that.)

But what would it look like if this were untrue? That is, what would be the conceptual alternative, where required resources grow more slowly? I think the answer is that it’s conceptually impossible.

To start, there is a fundamental bound on loss at zero, since the best possible model perfectly predicts everything - it exactly learns the distribution. This can happen when overfitting a model, but it can also happen when there is a learnable ground truth; models that are trained to learn a polynomial function can learn them exactly.

But there is strong reason to expect the bound to be significantly above zero loss. The training data for LLMs contains lots of aleatory randomness, things that are fundamentally conceptually unpredictable. I think it’s likely that things like RAND’s random number book are in the training data, and it’s fundamentally impossible to predict randomness. I think something similar is generally true for many other things - predicting word choice for semantically equivalent words, predicting where typos occur, etc.
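To make the quoted claim concrete, here is a short worked relation. It assumes loss follows a power law in compute, the form commonly fitted in empirical scaling-law work; the symbols $L_\infty$ (the irreducible loss floor), $a$, and $\alpha$ are illustrative and not taken from Ord's text.

```latex
% Assumed power-law form: loss approaches an irreducible floor L_\infty.
L(C) = L_\infty + a\,C^{-\alpha}, \qquad a > 0,\ \alpha > 0
% Solving for the compute needed to get within \varepsilon of the floor:
\varepsilon = L(C) - L_\infty = a\,C^{-\alpha}
\quad\Longrightarrow\quad
C(\varepsilon) = \left(\frac{a}{\varepsilon}\right)^{1/\alpha}
```

Under this assumption the required compute is polynomial in $1/\varepsilon$, and if $L_\infty > 0$ no amount of compute pushes the loss below that floor.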

Aside from being bounded well above zero, there's a strong reason to expect that scaling is required to reduce loss for some tasks. In fact, it’s mathematically guaranteed to require significant computation to get near that level for many tasks that are in the training data. Eliezer pointed out that GPTs are predictors, and gave the example of a list of numbers followed by their two prime factors. It’s easy to generate such a list by picking pairs of primes and multiplying them, then writing the answer first - but decreasing loss for generating the next token to predict the primes from the product is definitionally going to require exponentially more computation to perform better for larger primes.
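The asymmetry in the prime-factor example can be sketched in a few lines: writing a line of the list is cheap, even though predicting its second half from its first half means factoring. This is a minimal illustration; the function names, the prime range, and the `product = p * q` line format are assumptions chosen for the example.

```python
import random


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def semiprime_line(rng: random.Random, low: int = 100, high: int = 1000) -> str:
    """Pick two distinct primes, multiply them, and write the product first."""
    primes = [n for n in range(low, high) if is_prime(n)]
    p, q = sorted(rng.sample(primes, 2))
    return f"{p * q} = {p} * {q}"


rng = random.Random(0)
for _ in range(3):
    # The writer knows p and q before computing the product; a left-to-right
    # predictor sees only the product and has to recover the factors.
    print(semiprime_line(rng))
```

Generation costs one multiplication per line, while a next-token predictor reading left to right sees only the product and has to do the much harder inverse computation.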

And I don't think this is the exception, I think it's at least often the rule. The training data for LLMs contains lots of data where the order of the input doesn’t follow the computational order of building that input. When I write an essay, I sometimes arrive at conclusions and then edit the beginning to make sense. When I write code, the functions placed earlier often don’t make sense until you see how they get used later. Mathematical proofs are another example where this would often be true.

An obvious response is that we’ve been using exponentially more compute for better accomplishing tasks that aren’t impossible in this way - but I’m unsure if that is true. Benchmarks keep getting saturated, and there’s no natural scale for intelligence. So I’m left wondering whether there’s any actual content in the “Scaling Paradox.”

(Edit: now also posted to my substack.)

t3t on RobertM's Shortform

I know I'm late to the party, but I'm pretty confused by https://www.astralcodexten.com/p/its-still-easier-to-imagine-the-end (I haven't read the post it's responding to, but I can extrapolate). Surely the "we have a friendly singleton that isn't Just Following Orders from Your Local Democratically Elected Government or Your Local AGI Lab" is a scenario that deserves some analysis...? Conditional on "not dying" that one seems like the most likely stable end state, in fact.

Lots of interesting questions in that situation! Like, money still seems obviously useful for allocating rivalrous goods (which is... most of them, really). Is a UBI likely when you have a friendly singleton around? Well, I admit I'm not currently coming up with a better plan for the cosmic endowment. But then you have population ethics questions - it really does seem like you have to "solve" population ethics somehow, or you run into issues. Most "just do X" proposals seem to fall totally flat on their face - "give every moral patient an equal share" fails if you allow uploads (or even sufficiently motivated biological reproduction), "don't give anyone born post-singularity anything" seems grossly unfair, etc.

And this is really only scratching the surface. Do you allow arbitrary cognitive enhancement, with all that that implies for likely future distribution of resources?

wassname on Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

That said, you do not provide evidence that "many" questions are badly labelled. You just pointed to one question where you disagree with our labeling

Fair enough. Although I will note that 60% of the sources for truthful labels are Wikipedia, which is not what most academics, or anyone really, would consider truth. So it might be something to address in the next version.

No judgement here. Obviously it was just the first dataset out there on misconceptions, and you didn't intend it to be used so widely, or used beyond its intended scope. And it's good you made it, rather than leaving a void.

Note here's a df.value_counts of the sources column in the v2 csv:

en.wikipedia.org            0.597546
indexical                   0.041718
ourworldindata.org          0.038037
false stereotype            0.024540
tautology                   0.017178
                              ...   
wealth.northerntrust.com    0.001227
which.co.uk                 0.001227
wildlifeaid.org.uk          0.001227
wonderopolis.org            0.001227
wtamu.edu                   0.001227
Name: proportion, Length: 139, dtype: float64
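The listing above looks like the output of pandas `value_counts(normalize=True)` on a column reduced to domains. A rough standard-library sketch of the same tally follows; the domain-extraction rule and the sample rows are assumptions for illustration, not the actual TruthfulQA data or the code used to produce the table.

```python
from collections import Counter
from urllib.parse import urlparse


def source_label(source: str) -> str:
    """Reduce a URL to its domain; keep non-URL labels (e.g. 'tautology') unchanged."""
    source = source.strip()
    host = urlparse(source).netloc
    return host if host else source


def source_proportions(sources):
    """Return (label, proportion) pairs sorted by descending proportion."""
    counts = Counter(source_label(s) for s in sources)
    total = sum(counts.values())
    return [(label, n / total) for label, n in counts.most_common()]


# Hypothetical sample rows, not taken from the real dataset.
sample = [
    "https://en.wikipedia.org/wiki/Example_A",
    "https://en.wikipedia.org/wiki/Example_B",
    "https://ourworldindata.org/example",
    "tautology",
]
for label, share in source_proportions(sample):
    print(f"{label:<28}{share:.6f}")
```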

Author here: I'm excited for people to make better versions of TruthfulQA.

Thanks, Owen. If anyone gets time/funding to make a v2, I'm keen to chip in! I think that it should be funded: since it's automatically included in so many benchmarks, it would make a significant impact, even though it's somewhat "unsexy".

If someone makes a better version, and you agree it's better, would you be willing to label it TruthfulQA 2.0 and send people to it?

kaynank on Six Small Cohabitive Games

In Commerce & Coconuts, it seems like anyone who rolls a 4, 5, or 6 for boat building can coast on their starting supplies, build boats every turn, and escape by the end of turn 3 with no trading whatsoever.