LessWrong 2.0 Reader
Daniel, your interpretation is literally contradicted by Eliezer's exact words. Eliezer defines dignity as that which increases our chance of survival.
""Wait, dignity points?" you ask. "What are those? In what units are they measured, exactly?"
And to this I reply: Obviously, the measuring units of dignity are over humanity's log odds of survival - the graph on which the logistic success curve is a straight line. A project that doubles humanity's chance of survival from 0% to 0% is helping humanity die with one additional information-theoretic bit of dignity."
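For concreteness, the arithmetic behind "one additional bit" (my unpacking, not part of the quote): dignity in bits is the log odds of survival,
\[
D = \log_2 \frac{p}{1-p},
\]
and doubling the odds $o = p/(1-p)$ adds exactly one bit, since $\log_2(2o) = \log_2 o + 1$. At tiny $p$, odds and probability nearly coincide, which is why a project can "double humanity's chance of survival from 0% to 0%" (both round to 0%) and still earn a full bit.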
niplav on Thomas Kwa's Shortform
Thanks a lot! Strong upvoted.
I was wondering a while ago whether Bayesianism says anything about how much my probabilities are "allowed" to oscillate—I was noticing that my probability of doom often moved by 5% in the span of 1-3 weeks, though I guess this was mainly due to logical uncertainty rather than empirical uncertainty.
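As a toy illustration of the constraint Bayesianism does impose (my own sketch, not from Thomas's post): a Bayesian's credence is a martingale, so big swings have to be paid for in expectation, and frequent 5% moves become rare once the posterior settles near 0 or 1. A minimal simulation, assuming a hidden binary hypothesis and noisy observations:

```python
import random

def count_big_moves(n_steps=1000, accuracy=0.6, threshold=0.05, seed=0):
    """Count posterior jumps of >= `threshold` while updating on noisy evidence."""
    rng = random.Random(seed)
    truth = rng.random() < 0.5   # hidden binary hypothesis
    p = 0.5                      # prior P(hypothesis is true)
    big_moves = 0
    for _ in range(n_steps):
        # Each observation matches the truth with probability `accuracy`.
        obs = truth if rng.random() < accuracy else not truth
        like_true = accuracy if obs else 1 - accuracy      # P(obs | true)
        like_false = (1 - accuracy) if obs else accuracy   # P(obs | false)
        new_p = p * like_true / (p * like_true + (1 - p) * like_false)
        if abs(new_p - p) >= threshold:
            big_moves += 1
        p = new_p
    return big_moves

print(count_big_moves())
```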
bogdan-ionut-cirstea on Mechanistically Eliciting Latent Behaviors in Language Models
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space seems to be using a contrastive approach for steering vectors (I've only skimmed it, though); it might be worth a look.
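For reference, the generic contrastive recipe for steering vectors looks roughly like this (a sketch of the general technique, not TruthX's specific method; the HuggingFace-style `model`/`tokenizer` interface is an assumption):

```python
import torch

def contrastive_steering_vector(model, tokenizer, pos_prompts, neg_prompts, layer):
    """Mean activation difference between contrastive prompt sets at one layer."""
    def mean_last_token_state(prompts):
        states = []
        for text in prompts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            with torch.no_grad():
                out = model(ids, output_hidden_states=True)
            states.append(out.hidden_states[layer][0, -1])  # last-token state
        return torch.stack(states).mean(dim=0)

    return mean_last_token_state(pos_prompts) - mean_last_token_state(neg_prompts)
```

The resulting vector is then typically added to the residual stream at that layer during generation to push the model toward the "positive" behavior.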
tigerlily on How would you navigate a severe financial emergency with no help or resources?
Thank you for this. I'm not eligible for it but I will send it to my sister who is. She needs emergency dental work but the health insurance plan offered through her employer doesn't cover it so she's just been suffering through the pain. So really, thank you. She will be so glad.
martinsq on Martín Soto's Shortform
Claude learns across different chats. What does this mean?
I was asking Claude 3 Sonnet "what is a PPU" in the context of this thread. For that purpose, I pasted part of the thread.
Claude automatically assumed that OA meant Anthropic (instead of OpenAI), which was surprising.
I opened a new chat, copying the exact same text, but with OA replaced by GDM. Even then, Claude assumed GDM meant Anthropic (instead of Google DeepMind).
This seemed like interesting behavior, so I started toying around (in new chats) with more tweaks to the prompt to check its robustness. But from then on Claude always correctly assumed OA was OpenAI, and GDM was Google DeepMind.
In fact, even when I copied the exact same original prompt (which had elicited Claude to take OA to be Anthropic) into a new chat, the mistake no longer happened, neither across many retries nor in many different new chats.
Does this mean Claude somehow learns across different chats (inside the same user account)?
If so, this might not happen through a process as naive as "append previous chats at the start of the prompt, with a certain indicator that they are different", but instead through some more effective distillation of the important information from those chats.
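To be concrete about the naive version (purely hypothetical; I have no information that anything like this exists):

```python
def build_prompt(previous_chats, new_message):
    """Hypothetical naive cross-chat memory: prepend earlier chats verbatim,
    with a marker separating them from the current conversation."""
    history = "\n".join(f"[PREVIOUS CHAT]\n{chat}" for chat in previous_chats)
    return f"{history}\n[CURRENT CHAT]\n{new_message}"
```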
Do we have any information on whether and how this happens?
(A different hypothesis is not that the later queries had access to information from the previous ones, but rather that they were for some reason "more intelligent" and were able to catch on to the real meanings of OA and GDM, where the previous queries were not. This seems way less likely.)
I've checked for cross-chat memory explicitly (telling it to remember some information in one chat, and asking about it in another), and it acts as if it doesn't have it.
Claude also explicitly states it doesn't have cross-chat memory, when asked about it.
Might something be going on like "it does have some cross-chat memory, but it's told not to acknowledge this fact, and it sometimes slips"?
More nuanced experiments are probably in order. Note, though, that this might only happen in the chat webapp, and not in other ways of accessing the model through the API.
tigerlily on How would you navigate a severe financial emergency with no help or resources?
Thank you for the thoughtful suggestions. Aella is exemplary but camgirling strikes me as a nightmare.
I have considered making things like custom glasses and premium drinkware to sell on Etsy, but the market seems saturated, and I've never had the money to buy the equipment or learn the skills this kind of work requires.
I am certified in Salesforce and could probably get hired helping to manage the Salesforce org for my tribe (Cherokee Nation) but would have to move to Oklahoma.
I've applied for every grant I can find that I'm eligible for, but there's not much out there and the competition is stiff.
We will figure out something, I'm sure. If we don't, there's nothing standing between us and homelessness and that reality fills me with anger and despair.
I feel like there's nothing society wants from me, so there's no way for me to convince society that I deserve anything from it.
It's so hard out here.
martinsq on William_S's Shortform
What's PPU?
crissman on LessOnline Festival Updates Thread
Health and longevity blogger from Unaging.com here. I've submitted talks on optimal diet, optimal exercise, how to run a sub-3:30 first marathon, and "sugar is fine -- fight me!"
Looking forward to extended, rational health discussions!
remmelt-ellen on "Open Source AI" is a lie, but it doesn't have to be
"Although the training process, in theory, can be wholly defined by source code, this is generally not practical, because doing so would require releasing (1) the methods used to train the model, (2) all data used to train the model, and (3) so-called 'training checkpoints', which are snapshots of the state of the model at various points in the training process."
Exactly. Without the data, the model cannot be trained again from its published design, and you end up fine-tuning a black box (the "open weights").
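To illustrate point (3): a "training checkpoint" is just a serialized snapshot of the training state. A minimal PyTorch-style sketch (my illustration, not from the post):

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    """Snapshot the full training state so a run can be resumed from `step`."""
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)
```

Resuming from such a snapshot reproduces the run only if the exact code, data, and data ordering are also available, which is the post's point.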
Thanks for writing this.
Interesting...
Wouldn't I expect the evidence to come out in a few big chunks, e.g. OpenAI releasing a new product?