LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)

Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (8)

[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)

[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)

Decent plan prize winner & highlights
lukehmiles (lcmgcd) · 2024-01-19T23:30:34.242Z · comments (2)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)

Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)

[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)

[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)

Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)

[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (21)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

Foresight Institute: 2023 Progress & 2024 Plans for funding beneficial technology development
Allison Duettmann (allison-duettmann) · 2023-11-22T22:09:16.956Z · comments (1)

[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)

Scientific Method
Andrij “Androniq” Ghorbunov (andrij-androniq-ghorbunov) · 2024-02-18T21:06:45.228Z · comments (4)

[link] Extinction Risks from AI: Invisible to Science?
VojtaKovarik · 2024-02-21T18:07:33.986Z · comments (7)

Building Trust in Strategic Settings
StrivingForLegibility · 2023-12-28T22:12:24.024Z · comments (0)

Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)

Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)

Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)

[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (76)

[link] The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit · 2024-06-24T10:05:55.101Z · comments (0)

[link] Cellular respiration as a steam engine
dkl9 · 2024-02-25T20:17:38.788Z · comments (1)

[link] The absence of self-rejection is self-acceptance
Chipmonk · 2023-12-21T21:54:52.116Z · comments (1)

A conceptual precursor to today's language machines [Shannon]
Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · comments (6)

[link] Alignment work in anomalous worlds
Tamsin Leake (carado-1) · 2023-12-16T19:34:26.202Z · comments (4)

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages
Ethan Edwards · 2024-04-04T13:18:54.909Z · comments (2)

Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese (paul-colognese) · 2024-03-04T16:52:52.568Z · comments (3)

[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)

[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)

UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)

[link] Clickbait Soapboxing
DaystarEld · 2024-03-13T14:09:29.890Z · comments (15)

A brief review of China's AI industry and regulations
Elliot_Mckernon (elliot) · 2024-03-14T12:19:00.775Z · comments (0)

[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

Become a PIBBSS Research Affiliate
Nora_Ammann · 2023-10-10T07:41:02.037Z · comments (6)

[link] Compensating for Life Biases
Jonathan Moregård (JonathanMoregard) · 2024-01-09T14:39:14.229Z · comments (6)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

towards_keeperhood on Tuning your Cognitive Strategies

This is neat.

Did you write all that or who did?

sdm on What are the best arguments for/against AIs being "slightly 'nice'"?

One underlying idea comes from how AI mis-generalization is intended to work. If superintelligent AI systems are misaligned, does this misalignment look like an inaccurate generalization from what their overseers wanted or a deceptive goal that's entirely unrelated to anything their overseers intended to train. Levels 1-4 vs levels 5+, [LW · GW] more or less.

If the AI that's misaligned ends up 'egregiously' misaligned and doesn't care at all about anything human valuable, as Eliezer thinks, it places zero terminal value on human welfare and only trade, threats or compromise would get it to be nice.

If it's super-intelligent and you aren't, none of those apply. Hence, nothing is left for humans. If the AI is misaligned in any other way that's not an entirely arbitrary value system, then it values human survival at least a bit.

kerrigan on Understanding and avoiding value drift

Could there not be AI value drift in our favor, from a paperclipper AI to a moral realist AI?

jiro on In Praise of the Beatitudes

But some of the writing in those old sacred texts is actually really good.

The Halo Effect (pun not intended) is very strong here.

I seldom see people who say thibnngs like this also say that the writing in the Koran is actually really good, for some reason. And not because there's some difference in how good the writing in the Koran and in the Bible is.

adele-lopez-1 on A Path out of Insufficient Views

The "one weird trick" to getting the right answers is to discard all stuck, fixed points. Discard all priors and posteriors. Discard all aliefs and beliefs. Discard worldview after worldview. Discard perspective. Discard unity. Discard separation. Discard conceptuality. Discard map, discard territory. Discard past, present, and future. Discard a sense of you. Discard a sense of world. Discard dichotomy and trichotomy. Discard vague senses of wishy-washy flip floppiness. Discard something vs nothing. Discard one vs all. Discard symbols, discard signs, discard waves, discard particles.
All of these things are Ignorance. Discard Ignorance.

Is this the same principle as "non-attachment"?

jenniferrm on How I started believing religion might actually matter for rationality and moral philosophy

I'm not sure about the rest of it, but this caught my eye:

if moral realism was true, and one of the key roles of religion was to free people from trapped priors so they could recognize these universal moral truths, then at least during the founding of religions, we should see some evidence of higher moral standards before they invariably mutate into institutions devoid of moral truths.

I had a similar thought, and was trying to figure out if I could find a single good person to formally and efficiently coordinate with in a non-trivial pre-existing institution full of "safely good and sane people".

I'm still searching. If anyone has a solid lead on this, please DM me, maybe?

Something you might expect is that many such "hypothetically existing hypothetically good people" would be willing to die slightly earlier for a good enough cause (especially late in life when their life expectancy is low, and especially for very high stakes issues where a lot of leverage is possible) but they wouldn't waste lives, because waste is ceteris paribus bad, and so... so... what about martyrs who are also leaders?

This line of thinking is how I learned about Martin The Confessor, the last Pope to ever die for his beliefs.

Since 655 AD is much much earlier than 2024 AD, it would seem that Catholicism no longer "has the sauce" so to speak?

Also, slightly relatedly, I'm more glad that I otherwise might be that in this timeline the bullet missed Trump. In other very nearby timelines I'm pretty sure the whole idea of using physical courage to detect morally good leadership in a morally good group would be much more controversial than the principle is here, now, in this timeline, where no one has trapped priors about it that are being actively pumped full of energy by the media, with the creation of new social traumas, and so on...

...not that elected secular leaders of mere nation states would have any obvious formal duties to specifically be the person to benevolently serve literally all good beings as a focal point.

To get that formula to basically work, in a way that it kinda seems to work with US elections, since many US Presidents are assassinated in ways they could probably predict were possible (modulo this currently only working within the intrinsically "partial" nature of US elections, since these are merely elections for the leader of a single nation state that faces many other hostile nation states in a hobbesian world of eternal war (at least eternal war... so far! [? · GW]) ) I think one might need to hold global elections?

And... But... And this... this seems sorta do-able?!? Weirdly so!

We have the internet now. We have translation software to translate all the political statements into all the languages. We have internet money that could be used to donate to something that was worth donating to.

Why not create a "United Persons Alliance" (to play the "House of Representatives" to the UN's "Senate"?) and find out what the UPA's "Donation Weighted Condorcet Prime Minister" has to say?

I kinda can't figure out why no one has tried it yet.

Maybe it is because, logically speaking, moral realism MIGHT be true and also maybe all humans are objectively bad?

If a lot of people knew for sure that "moral realism is true but humans are universally fallen" then it might explain why we almost never "produce and maintain legibly just institutions".

Under the premises entertained here so far, IF such institutions were attempted anyway, and the attempt had security holes [? · GW], THEN those security holes would be predictably abused and it would be predictably regretted by anyone who spent money setting it up, or trusted such a thing.

So maybe it is just that "moral realism is true, humans are bad, and designing secure systems is hard and humans are also smart enough to never try to summon a real justice system"?

Maybe.

jonas-hallgren on A Path out of Insufficient Views

This does seem kind of correct to me?

Maybe you could see the fixed points that OP is pointing towards as priors in the search process for frames.

Like, your search is determined by your priors which are learnt through your upbringing. The problem is that they're often maladaptive and misleading. Therefore, working through these priors and generating new ones is a bit like relearning from overfitting or similar.

Another nice thing about meditation is that it sharpens your mind's perception which makes your new priors better. It also makes you less dependent on attractor states you could have gotten into from before since you become less emotionally dependent on past behaviour. (there's obviously more complexity here) (I'm referring to dependent origination for you meditators out there)

It's like pruning the bad data from your dataset and retraining your model, you're basically guaranteed to find better ontologies from that (or that's the hope at least).

kerrigan on The alignment stability problem

Both quotes are from your above post. Apologies for confusion.

jonas-hallgren on A Path out of Insufficient Views

I'm currently in the process of releasing more of my fixed points through meditation and man is it a weird process. It is very fascinating and that fundamental openness to moving between views seems more prevalent. I'm not sure that I fully agree with you on the all-in part but cudos for trying!

I think it probably makes sense to spend earlier years doing this cognition training and then using that within specific frames to gather the bits of information that you need to solve problems.

Frames are still useful to gather bits of information through so don't poopoo the mind!

Otherwise, it was very interesting to hear about your journey!

richard_kennaway on Non-human centric view of existence

Well, why want anything? Why not just be dead? Peace of mind [LW · GW] guaranteed for ever.