LessWrong 2.0 Reader

AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

AMA: Earning to Give
jefftk (jkaufman) · 2023-11-07T16:20:10.972Z · comments (8)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Self-Blinded L-Theanine RCT
niplav · 2023-10-31T15:24:57.717Z · comments (12)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (5)

[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)

[link] The Long-Term Future Fund is looking for a full-time fund chair
Linch · 2023-10-05T22:18:53.720Z · comments (0)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:20.031Z · comments (20)

AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[question] Intelligence Enhancement (Monthly Thread) 13 Oct 2023
Nicholas / Heather Kross (NicholasKross) · 2023-10-13T17:28:37.490Z · answers+comments (40)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

[link] The Good Balsamic Vinegar
jenn (pixx) · 2024-01-26T19:30:57.435Z · comments (4)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (5)

Will 2024 be very hot? Should we be worried?
A.H. (AlfredHarwood) · 2023-12-29T11:22:50.200Z · comments (12)

[link] how birds sense magnetic fields
bhauth · 2024-06-27T18:59:35.075Z · comments (4)

[link] Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke · 2024-06-22T07:53:38.989Z · comments (0)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (12)

The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (22)

Polysemantic Attention Head in a 4-Layer Transformer
Jett (jett) · 2023-11-09T16:16:35.132Z · comments (0)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)

OpenAI-Microsoft partnership
Zach Stein-Perlman · 2023-10-03T20:01:44.795Z · comments (19)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

Measuring Structure Development in Algorithmic Transformers
Micurie (micurie) · 2024-08-22T08:38:02.140Z · comments (4)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

The Assumed Intent Bias
silentbob · 2023-11-05T16:28:03.282Z · comments (13)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)

[link] A model of research skill
L Rudolf L (LRudL) · 2024-01-08T00:13:12.755Z · comments (6)

Recent comments

michael-cohn on Slave Morality: A place for every man and every man in his place

This description does a good job of providing two kinds of evocative theme but I think it doesn't draw out the connections or distinctions that need to be clarified when people are interacting with is vs ought, or perhaps with cosmos vs. society. When describing everyday life as a physical object in the universe, I think a rock-bottom existentialism is obviously right: The universe does not owe you anything. God isn't going to punish your oppressors or reward you for being good, and God also isn't going to punish you if you get what you want by being arrogant and doing things that are shocking. If that's master morality, then fine.

But when we form a society, we can choose to make a world where people who follow the rules are rewarded. We can describe obligations for people to participate and belong, and actively put energy into the system to buffer those people from random shocks. I can see how that reads as slave morality from the perspective of someone who's an unreflective participant in it, but I think that's also an unsatisfying way to describe an active project to defy the nature of the universe and, through work, create a world more to our liking.

So in sum I think your description here needs to clarify when this kind of be-good-and-you'll-get-a-cookie living is bad simply because it's inaccurate (God is not going to give you a cookie) and when you think it's wrong because, even if people establish an enclave in which rules are enforced and people are taken care of, that diminishes the human spirit or is actively harmful to its participants.

It might also just be a matter of hierarchical perception. To use a smaller scale example:

On a day-to-day level I do my chores around the house even when I don't feel like it, because if I do what I promised then my housemates will also do what they promised, and I'll get the benefits of a functional place to live. That's a quid-pro-quo with an external system but I wouldn't call that slave morality.
If my housemates arbitrarily stopped doing their chores, I'd go and remind them that we'd all agreed to do our chores. That's expecting others to fulfill their obligations, but I wouldn't call that slave morality.
If they simply refused to do their chores and nothing I did could move them, and my response to that was to rend my garments and cry out to God that this was cosmically wrong and I refused to go on living, that would be slave morality. But if my reaction was to leave and find another place to live where people do their damn chores, I'd call that being a master.

niark on How I started believing religion might actually matter for rationality and moral philosophy

You feel your limbs weird, and what? What implication are you trying to point to? How should your experience or insight contribute to rationality and moral philosophy?

The reasoning is quite basic actually.

You believe, for decades, that X happening would make absolutely no rational sense.
X happens.
You are shocked. You realize your rationality was lacking.
You didn't think your rationality could be lacking in this way.
This meta-fact is important for rationality.

Yes, we have inherent biases but [...] they are not resolved by merely being aware of them

Except if the biases are "fixable"*. Suppose they are. Then you need to work on them. But to do so, it's pretty logical that you need to be aware of them first. The emphasis on the awareness ensues.

*somehow, partially, and with a lot of effort.

I hope it's somehow clearer!

cole-wyeth on Work with me on agent foundations: independent fellowship

I have been thinking about extending the AIXI framework from reward to more general utility functions and working out some of the math. I would be happy to chat if that's something you're interested in. I am already supported by the LTFF (for work on embedded agency), so I can't apply to the job offer currently. But maybe I can suggest some independent researchers who might be interested.

thomascederborg on The case for more Alignment Target Analysis (ATA)

I interpret your comment as a prediction regarding where new alignment target proposals will come from. Is this correct?

I also have a couple of questions about the linked text:

How do you define the difference between explaining something and trying to change someone's mind? Consider the case where Bob is asking a factual question. An objectively correct straightforward answer would radically change Bob's entire system of morality, in ways that the AI can predict. A slightly obfuscated answer would result in far less dramatic changes. But those changes would be in a completely different direction (compared to the straightforward answer). Refusing to answer, while being honest about the reason for refusal, would send Bob into a tailspin. How certain are you that you can find a definition of Acceptable Forms of Explanation that holds up in a large number of messy situations along these lines? See also this. [LW(p) · GW(p)]

And if you cannot define such things in a solid way, how do you plan to define "benefit humanity"? PCEV was an effort to define "benefit humanity". And PCEV has been found to suffer from at least one difficult-to-notice problem [LW · GW]. How certain are you that you can find a definition of "benefit humanity" that does not suffer from some difficult-to-notice problem?

PS:

Speculation regarding where novel alignment target proposals are likely to come from is very welcome. It is a prediction of things that will probably be fairly observable fairly soon. And it is directly relevant to my work. So I am always happy to hear this type of speculation.

thomascederborg on The case for more Alignment Target Analysis (ATA)

Let's reason from the assumption that you are completely right. Specifically, let's assume that every possible Sovereign AI Project (SAIP) would make things worse in expectation. And let's assume that there exists a feasible Better Long Term Solution (BLTS).

In this scenario ATA would still only be a useful tool for reducing the probability of one subset of SAIPs (even if all SAIPs are bad, some designers might be unresponsive to arguments, some flaws might not be realistically findable, etc). But it seems to me that ATA would be one complementary tool for reducing the overall probability of SAIP. And this tool would not be easy to replace with other methods. ATA could convince the designers of a specific SAIP that their particular project should be abandoned. If ATA results in the description of necessary features, then it might even help a (member of a) design team see that it would be bad if a secret project were to successfully hit a completely novel, unpublished, alignment target (for example along the lines of this necessary Membrane formalism feature [LW · GW]).

ATA would also be a project where people can collaborate despite almost opposite viewpoints on the desirability of SAIP. Consider Bob who mostly just wants to get some SAIP implemented as fast as possible. But Bob still recognizes the unlikely possibility of dangerous alignment targets with hidden flaws (but he does not think that this risk is anywhere near large enough to justify waiting to launch a SAIP). You and Bob clearly have very different viewpoints regarding how the world should deal with AI. But there is actually nothing preventing you and Bob from cooperating on a risk reduction focused ATA project.

This type of diversity of perspectives might actually be very productive for such a project. You are not trying to build a bridge on a deadline. You are not trying to win an election. You do not have to be on the same page to get things done. You are trying to make novel conceptual progress, looking for a flaw of an unknown type.

Basically: reducing the probability of outcomes along the lines of the outcome implied by PCEV [LW · GW] is useful according to a wide range of viewpoints regarding how the world should deal with AI. (There is nothing unusual about this general state of affairs. Consider for example Dave and Gregg who are on opposite sides of a vicious political trench war over the issue of pandemic lockdowns. There is nothing on the object level that prevents them from collaborating on a vaccine research effort. So this feature is certainly not unique. But I still wanted to highlight the fact that a risk mitigation focused ATA project does have this feature.)

thomascederborg on The case for more Alignment Target Analysis (ATA)

I think I see your point. Attempting to design a good alignment target could lead to developing intuitions that would be useful for ATA. A project trying to design an alignment target might result in people learning skills that allow them to notice flaws in alignment targets proposed by others. Such projects can therefore contribute to the type of risk mitigation that I think is lacking. I think that this is true. But I do not think that such projects can be a substitute for an ATA project with a risk mitigation focus.

Regarding Orthogonal:

It is difficult for me to estimate how much effort Orthogonal spends on different types of work. But it seems to me that your published results are mostly about methods for hitting alignment targets. This also seems to me to be the case for your research goals. If you are successful, it seems to me that your methods could be used to hit almost any alignment target (subject to constraints related to finding individuals that want to hit specific alignment targets).

I appreciate you engaging on this, and I would be very interested in hearing more about how the work done by Orthogonal could contribute to the type of risk mitigation effort discussed in the post. I would, for example, be very happy to have a voice chat with you about this.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

I suspect current approaches probably significantly or even drastically under-elicit automated ML research capabilities.

I'd guess the average cost of producing a decent ML paper is at least $10k (in the West, at least) and probably closer to hundreds of thousands of dollars.

In contrast, Sakana's AI scientist cost on average $15/paper and $0.50/review. PaperQA2, which claims superhuman performance at some scientific Q&A and lit review tasks, costs something like $4/query. Other papers with claims of human-range performance on ideation or reviewing also probably have costs of <$10/idea or review.

Even the auto ML R&D benchmarks from METR or UK AISI don't give me at all the vibes of coming anywhere near close enough to e.g. what a 100-person team at OpenAI could accomplish in 1 year, if they tried really hard to automate ML.

A fairer comparison would probably be to actually try hard at building the kind of scaffold which could use ~$10k in inference costs productively. I suspect the resulting agent would probably not do much better than with $100 of inference, but it seems hard to be confident. And it seems harder still to be confident about what will happen even in just 3 years' time, given that pretraining compute seems like it will probably grow about 10x/year and that there might be stronger pushes towards automated ML.

This seems pretty bad both w.r.t. underestimating the probability of shorter timelines and faster takeoffs, and in more specific ways too. E.g. we could be underestimating by a lot the risks of open-weights Llama-3 (or 4 soon) given all the potential under-elicitation.
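As a rough illustration of the cost gap described in this comment, the quoted figures can be turned into a ratio. This is only a back-of-the-envelope sketch: the human-paper costs are the commenter's own guesses, the upper figure is taken as exactly 100k dollars for simplicity (an assumption), and the variable and function names are hypothetical.

```python
# Back-of-the-envelope comparison using the figures quoted in the comment.
# The human-paper costs are the commenter's guesses, not measured data.
human_paper_cost_low = 10_000     # commenter's lower guess, in dollars
human_paper_cost_high = 100_000   # upper guess, taken as exactly 100k (assumption)
ai_scientist_paper_cost = 15      # quoted average cost per paper for Sakana's AI scientist


def cost_ratio(human_cost: float, automated_cost: float) -> float:
    """Return how many times more expensive the human-produced paper is."""
    return human_cost / automated_cost


low = cost_ratio(human_paper_cost_low, ai_scientist_paper_cost)
high = cost_ratio(human_paper_cost_high, ai_scientist_paper_cost)
print(f"Human paper costs roughly {low:.0f}x to {high:.0f}x more")
```

On these assumptions the gap is about three to four orders of magnitude, which is the spending difference the comment points to when it suggests current approaches under-elicit capabilities.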

tailcalled on tailcalled's Shortform

One of the big open questions that the LDSL sequence hasn't addressed yet is, what starts all the lognormals and why are they so commensurate with each other. So far, the best answer I've been able to come up with is a thermodynamic approach (hence my various recent comments about thermodynamics). The lognormals all originate as emanations from the sun, which is obviously a higher power. They then split up and recombine in various complicated ways.

As for destiny: The sun throws in a lot of free energy, which can be developed in various ways, increasing entropy along the way. But some developments don't work very well, e.g. self-sabotaging (fire), degenerating (parasitism leading to capabilities becoming vestigial), or otherwise getting "stuck". But it's not all developments that get stuck, some developments lead to continuous progress (sunlight -> cells -> eukaryotes -> animals -> mammals -> humans -> society -> capitalism -> ?).

This continuous progress is not just accidental, but rather an intrinsic part of the possibility landscape. For instance, eyes have evolved in parallel to very similar structures, and even modern cameras have a lot in common with eyes. There are basically some developments that intrinsically unblock lots of derived developments while preferentially unblocking developments that defend themselves over developments that sabotage themselves. Thus as entropy increases, such developments will intrinsically be favored by the universe. That's destiny.

Critically, getting people to change many small behaviors in accordance with long explanations contradicts destiny because it is all about homogenizing things and adding additional constraints whereas destiny is all about differentiating things and releasing constraints.

irenictruth on How I started believing religion might actually matter for rationality and moral philosophy

The next post is Secular interpretations of core perennialist claims [LW · GW]. Zhukeepa should edit the main text to explicitly link to it rather than just mentioning that it exists. (Or people could upvote this comment so it's at the top. I don't object to more good karma.)

lao-mein on A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)

Interesting that GPT-4o is so bad at math and tokenizes large numbers like this. I wonder if adding commas would improve performance?
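One way to try the comma idea would be to preprocess prompts so that long integers carry thousands separators before they reach the tokenizer. The sketch below is a minimal, standard-library-only illustration; the function name `add_thousands_separators` and the five-digit threshold are assumptions made here, and whether this actually improves model performance is untested.

```python
import re

# Match runs of 5+ digits that do not follow a digit or a decimal point,
# so fractional parts like the "14159265" in "3.14159265" are left alone.
_LONG_INT = re.compile(r"(?<![.\d])\d{5,}")


def _group_digits(match: re.Match) -> str:
    """Split a digit run into groups of three from the right, joined by commas."""
    digits = match.group()
    head = len(digits) % 3 or 3  # length of the leading group (1 to 3 digits)
    parts = [digits[:head]]
    parts += [digits[i:i + 3] for i in range(head, len(digits), 3)]
    return ",".join(parts)


def add_thousands_separators(text: str) -> str:
    """Return text with commas inserted into every integer of 5+ digits."""
    return _LONG_INT.sub(_group_digits, text)


prompt = "What is 1234567 + 89012?"
print(add_thousands_separators(prompt))
```

Comparing a model's accuracy on the original and the rewritten prompts would give a direct, if crude, answer to the question raised in the comment.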