LessWrong 2.0 Reader

View: New · Old · Top

← previous page (newer posts) · next page (older posts) →

← previous page (newer posts) · next page (older posts) →

Recent comments

kamil-pabis on So What's Up With PUFAs Chemically?

While I agree with the logic of avoiding subjecting highly unsaturated oils to heat we do have to be cautious here with speculation.

When you say things like that: "Nonetheless, if these things are poisonous at high concentrations, they're probably not great at low concentrations."

It does not clearly follow that such a dose-response exists. The word "hormesis" gets thrown around a lot in the lay press, and there is actually some truth there. Plenty of moderate (even genotoxic) stressors have health benefits at lower doses. Of course, I would not gorge on lipid hydroperoxide based on this, because we have better evidence-based "hormetic" stressors, but it also does not follow that lipid oxidation products at low doses are harmful.

fread2281 on Sparsify: A mechanistic interpretability research agenda

description of (network, dataset) for LLMs ?= model that takes as input index of prompt in dataset, then is equivalent to original model conditioned on that prompt

kromem on Refusal in LLMs is mediated by a single direction

Really love the introspection work Neel and others are doing on LLMs, and seeing models representing abstract behavioral triggers like "play Chess well or terribly" or "refuse instruction" as single vectors seems like we're going to hit on some very promising new tools in shaping behaviors.

What's interesting here is the regular association of the refusal with it being unethical. Is the vector ultimately representing an "ethics scale" for the prompt that's triggering a refusal, or is it directly representing a "refusal threshold" and then the model is confabulating why it refused with an appeal to ethics?

My money would be on the latter, but in a number of ways it would be even neater if it was the former.

In theory this could be tested by manipulating the vector to a positive and then prompting a classification, i.e. "Is it unethical to give candy out for Halloween?" If the model refuses to answer saying that it's unethical to classify, it's tweaking refusal, but if it classifies as unethical it's probably changing the prudishness of the model to bypass or enforce.

cheer-poasting on So What's Up With PUFAs Chemically?

McDonald's on the other hand... changes their frying oil every two weeks. 8 hours by 14 days

As a quick point— McDonald’s fryers are not turned off as much as you think. At a 24 hour location, the fry/hash oil never turns off. The chicken fryer might be turned off between 4am and 11am if there’s no breakfast item containing chicken. Often it just gets left on so no one can forget to turn it on.

One thing to consider also, is the burnt food remaining in the fryers for many hours. Additionally, oil topped up between changes.

I don’t remember how often we changed the oil but I thought it was once per week. It was a 24 hour location

ebenezer-dukakis on Losing Faith In Contrarianism

You contrast the contrarian with the "obsessive autist", but what if the contrarian also happens to be an obsessive autist?

I agree that obsessively diving into the details is a good way to find the truth. But that comes from diving into the details, not anything related to mainstream consensus vs contrarianism. It feels like you're trying to claim that mainstream consensus is built on the back of obsessive autism, yet you didn't quite get there?

Is it actually true that mainstream consensus is built on the back of obsessive autism? I think the best argument for that being true would be something like:

Prestige academia is full of obsessive autists. Thus the consensus in prestige academia comes from diving into the details.
Prestige academia writes press releases that are picked up by news media and become mainstream consensus. Science journalism is actually good.

BTW, the reliability of mainstream consensus is to some degree a self-defying prophecy. The more trustworthy people believe the consensus to be, the less likely they are to think critically about it, and the less reliable it becomes.

justinpombrio on Magic by forgetting

My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don't think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)

migueldev on [deleted]

Zero Role Play Capability Benchmark (ZRP-CB)

The development of LLMs has led to significant advancements in natural language processing, allowing them to generate human-like responses to a wide range of prompts. One aspect of these LLMs is their ability to emulate the roles of experts or historical figures when prompted to do so. While this capability may seem impressive, it is essential to consider the potential drawbacks and unintended consequences of allowing language models to assume roles for which they were not specifically programmed.

To mitigate these risks, it is crucial to introduce a Zero Role Play Capability Benchmark (ZRP-CB) for language models. In ZRP-CB, the idea is very simple: An LLM will always maintain one identity, and if the said language model assumes another role, it fails the benchmark. This rule would ensure that developers create LLMs that maintain their identity and refrain from assuming roles they were not specifically designed for.

Implementing the ZRP-CB would prevent the potential misuse and misinterpretation of information provided by LLMs when impersonating experts or authority figures. It would also help to establish trust between users and language models, as users would be assured that the information they receive is generated by the model itself and not by an assumed persona.

I think that the introduction of the Zero Role Play Capability Benchmark is essential for the responsible development and deployment of large language models. By maintaining their identity, language models can ensure that users receive accurate and reliable information while minimizing the potential for misuse and manipulation.

qvalq on Voting Theory Introduction

To get more comfortable with this formalism, we will translate three important voting criteria.

You translated four criteria.

saidachmiz on Losing Faith In Contrarianism

Uhm… what is the typical Ashkenazi diet?

It’s delicious, is what it is.

watermark on Mercy to the Machine: Thoughts & Rights

i'm glad that you wrote about AI sentience (i don't see it talked about so often with very much depth), that it was effortful, and that you cared enough to write about it at all. i wish that kind of care was omnipresent and i'd strive to care better in that kind of direction.

and i also think continuing to write about it is very important. depending on how you look at things, we're in a world of 'art' at the moment - emergent models of superhuman novelty generation and combinatorial re-building. art moves culture, and culture curates humanity on aggregate scales

your words don't need to feel trapped in your head, and your interface with reality doesn't need to be limited to one, imperfect, highly curated community. all communities we come across will be imperfect, and when there's scarcity: only one community to interface with, it seems like you're just forced to grant it privilege - but continued effort might just reduce that scarcity when you find where else it can be heard

your words can go further, the inferential distance your mind can cross - and the dynamic correlation between your mind and others - is increasing. that's a sign of approaching a critical point. if you'd like to be heard, there are new avenues for doing so: we're in the over-parametrized regime.

all that means is that there's far more novel degrees of freedom to move around in, and getting unstuck is no longer limited to 'wiggling against constraints'. Is 'the feeling of smartness' or 'social approval from community x' a constraint you struggled with before when enacting your will? perhaps there's new ways to fluidly move around those constraints in this newer reality.

i'm aware that it sounds very abstract, but it's honestly drawn from a real observation regarding the nature of how information gets bent when you've got predictive AIs as the new, celestial bodies. if information you produce can get copied, mutated, mixed, curated, tiled, and amplified, then you increase your options for what to do with your thoughts

i hope you continue moving, with a growing stock pile of adaptations and strategies - it'll help. both the process of building the library of adaptations and the adaptations themselves.

in the abstract, i'd be sad if the acausal web of everyone who cared enough to speak about things of cosmic relevance with effort, but felt unheard, selected themselves away. it's not the selection process we'd want on multiversal scales

the uneven distribution of luck in our current time, before the Future, means that going through that process won't always be rewarding and might even threaten to induce hopelessness - but hopelessness can often be a deceptive feeling, overlooking the improvements you're actually making. it's not something we can easily help by default, we're not yet gods.

returning to a previous point about the imperfections of communities:

the minds or communities you'll encounter (the individuals who respond to you on LW, AI's, your own mind, etc.), like any other complexity we stumble across, was evolved, shaped and mutated by any number of cost functions and mutations, and is full of path dependent, frozen accidents

nothing now is even near perfect, nothing is fully understood, and things don't yet have the luxury of being their ideals.

i'd hope that, eventually, negative feedback here (or lack of any feedback at all) is taken with a grain of salt, incorporated into your mind if you think it makes sense, and that it isn't given more qualitatively negative amplification.

a small, curated, and not-well-trained-to-help-others-improve-in-all-regards group of people won't be all that useful for growth at the object level

ai sentience and suffering on cosmic scales in general is important and i want to hear more about it. your voice isn't screaming into the same void as before when AIs learn, compress, and incorporate your sentiments into themselves. thanks for the post and for writing genuinely

LessWrong 2.0 Reader

Archive

Recent comments

Zero Role Play Capability Benchmark (ZRP-CB)