LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)

UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)

Building Trust in Strategic Settings
StrivingForLegibility · 2023-12-28T22:12:24.024Z · comments (0)

[link] Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results
Iknownothing · 2024-01-15T19:37:07.984Z · comments (0)

Even if we lose, we win
Morphism (pi-rogers) · 2024-01-15T02:15:43.447Z · comments (17)

Foresight Institute: 2023 Progress & 2024 Plans for funding beneficial technology development
Allison Duettmann (allison-duettmann) · 2023-11-22T22:09:16.956Z · comments (1)

[link] Alignment work in anomalous worlds
Tamsin Leake (carado-1) · 2023-12-16T19:34:26.202Z · comments (4)

A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)

[link] Extinction Risks from AI: Invisible to Science?
VojtaKovarik · 2024-02-21T18:07:33.986Z · comments (7)

Paper Summary: The Koha Code - A Biological Theory of Memory
jakej (jake-jenks) · 2023-12-30T22:37:13.865Z · comments (2)

My Alignment "Plan": Avoid Strong Optimisation and Align Economy
VojtaKovarik · 2024-01-31T17:03:34.778Z · comments (9)

[link] Compensating for Life Biases
Jonathan Moregård (JonathanMoregard) · 2024-01-09T14:39:14.229Z · comments (6)

[link] AI Alignment [Progress] this Week (11/05/2023)
Logan Zoellner (logan-zoellner) · 2023-11-07T13:26:21.995Z · comments (0)

5 psychological reasons for dismissing x-risks from AGI
Igor Ivanov (igor-ivanov) · 2023-10-26T17:21:48.580Z · comments (6)

A bet on critical periods in neural networks
kave · 2023-11-06T23:21:17.279Z · comments (1)

[link] The absence of self-rejection is self-acceptance
Chipmonk · 2023-12-21T21:54:52.116Z · comments (1)

Technology path dependence and evaluating expertise
bhauth · 2024-01-05T19:21:23.302Z · comments (2)

[link] Eric Schmidt on recursive self-improvement
nikola (nikolaisalreadytaken) · 2023-11-05T19:05:15.416Z · comments (3)

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages
Ethan Edwards · 2024-04-04T13:18:54.909Z · comments (2)

[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)

aintelope project update
Gunnar_Zarncke · 2024-02-08T18:32:00.000Z · comments (2)

2. Premise two: Some cases of value change are (il)legitimate
Nora_Ammann · 2023-10-26T14:36:53.511Z · comments (7)

Utility is not the selection target
tailcalled · 2023-11-04T22:48:20.713Z · comments (1)

[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)

[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (76)

Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)

An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)

A conceptual precursor to today's language machines [Shannon]
Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · comments (6)

Weeping Agents
pleiotroth · 2024-06-06T12:18:54.978Z · comments (2)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (3)

[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

[link] Tokyo AI Safety 2025: Call For Papers
Blaine (blaine-rogers) · 2024-10-21T08:43:38.467Z · comments (0)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)

[link] The unreasonable effectiveness of plasmid sequencing as a service
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-08T02:02:55.352Z · comments (0)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

Rashomon - A newsbetting site
ideasthete · 2024-10-15T18:15:02.476Z · comments (8)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

Who is Harry Potter? Some predictions.
Donald Hobson (donald-hobson) · 2023-10-24T16:14:17.860Z · comments (7)

Extinction-level Goodhart's Law as a Property of the Environment
VojtaKovarik · 2024-02-21T17:56:02.052Z · comments (0)

Bent or Blunt Hoods?
jefftk (jkaufman) · 2024-01-09T20:10:11.545Z · comments (0)

On excluding dangerous information from training
ShayBenMoshe (shay-ben-moshe) · 2023-11-17T11:14:54.847Z · comments (5)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

satchlj on Arithmetic is an underrated world-modeling technology

Calculations on Hydroelectric Energy Storage

For those interested in the numbers on pumped hydroelectric storage, we can get more energy by increasing 'head' or the distance that the weight falls, from 6 meters to up to 500 meters for some of the largest projects (and we could in theory go bigger).

Let's pick a more reasonable number like 60 meters:

MASS/house = 15 kWh/house / (9.8 m/s² × 60 m) = 91,836 kg/house = 91 m^3/house

Let's say we have a dam with ~20 meters of water level fluctuation (drawdown). Then that's 5 m^2 per house of surface area.

As a sanity check, Bath County Pumped Storage Station in VA stores about 24000 MWh/ 30 KWh/house = 800,000 houses worth of energy.

800,000 houses * 5 m^2 = 4 km^2

The Bath County reservoir is about 1km^2 so we're in the right range here (the reservoir has a little more drawdown and a way bigger head).

charlie-steiner on Resolving von Neumann-Morgenstern Inconsistent Preferences

Nice, especially the second half.

To me, this post makes it seem like the natural place to push forward is to try and restrict incoherent policies over lotteries until they're just barely well-behaved - either on the behavior end (continuity etc) or the algorithmic end (compte budgets etc).

lorxus on D&D Sci Coliseum: Arena of Data

I'm going to start by attacking this a little on my own before I even look much at what other people have done.

Some initial observations from the SQL+Python practice this gave me a good excuse to do:

Adelon looks to have rough matchups against Elf Monks. Which we don't have. They are however soft to even level 3-4 challengers sometimes. Maybe Monks and/or Fencers have an edge on Warriors?
Bauchard seems to have particularly strong matchups against other Knights, so we don't send Velaya there. They seem a little soft to Monks and to Dwarf Ninjas and especially to Knights, so maybe Zelaya? Boots should help here.
Cadagal has precious few defeats, but one of them might be to a level 2(!) Human Warrior with fancy +3 Gauntlets. Though it seems like there's a lot of combats where some Cadagal-like fighter has +4 Boots instead? Not sure if that's the same guy.
- And on that note, the max level is 7, and the max bonus for Boots and Gauntlets both is +4.
- Max Boots (+4) is always on a level 7 Elf Ninja with +3 Gauntlets (but disappears altogether most of the way through the dataset).
- Max Gauntlets (+4) is on either a level 7 Dwarf Monk who upgraded from +1 Boots to +3 Boots halfway through, or else there's two of them. Thankfully we're not facing them.
Deepwrack poses problems. They have just as few defeats, and one of them even contradicts the ordering I derived below! Ninjas are meant to lose to Monks. Maybe the speed matters a lot in that case?
It looks like a strict advantage in level or gear - holding all else constant - means you win every time. If everything is totally identical, you win about half the time. (Which seems obvious but worth checking.)
Looking through upsets - bouts where the classes are different, the losing fighter had at least 2 levels on the winner, and the loser's gear was no better than the winner's - we generally see that:
- Fencers beat Monks and Rangers and lose to Knights, Ninjas, and Warriors
- Knights beat Fencers and Ninjas, tie(???) with Monks and Warriors, and lose (weakly) to Rangers
- Monks beat Ninjas, Rangers, and maybe Warriors, tie (?) with Knights, and lose to Fencers
- Ninjas beat Fencers and (weakly) Rangers, and lose to Knights, Monks, and Warriors
- Rangers beat Knights (weakly), Ninjas, and Warriors, tie with Fencers, and lose to Monks
- Warriors beat Fencers, Ninjas, tie(?) with Knights, and lose to Rangers and maybe Monks

So my current best guess (pending understanding which gear is best for which class/race) is:

Willow v Adelon, Varina v Bauchard, Xerxes v Cadagal, Yalathinel v Deepwrack.

If I had to guess what gear to give to who: Warrior v Knight is a rough matchup, so Varina's going to need the help; the rest of my assignments are based thus far on ~vibes~ for whether speed or power will matter more for the class. Thus:

Willow gets +2 Boots and +1 Gauntlets, Varina gets +4 Boots and +3 Gauntlets, Xerxes gets +1 Boots and +2 Gauntlets, and Yalathinel gets +3 Boots.

Some theories I need to test:

Race affects how good you are at a class. Elves might be best at rangering, say.
Race and/or class affect how much benefit you get out of boots and/or gauntlets. Being a warrior might mean you get full benefit from gauntlets but none from boots.
Color might affect how well classes do. Ninjas wearing red might win way less often.
- The color does not actually seem to affect ninjas all that much if at all - 6963 vs 6762 wins. Could still be a tiebreaker?
- Color doesn't affect things much overall either: 40136 vs 39961 wins.
There's some rank-ordering of class+race+level matchups, maybe an additive one.
- Alternatively there could be some nontransitive thing going on with tiebreaks sometimes from levels, races, and gear?
- On further reflection that totally seems to be what's going on here.
- Maybe there's something about the matchup ordering being sorted over (race, class)? D's loss (as a L6 Dwarf Monk) to a L4 Dwarf Ninja is... unexpected to say the least!

Wild speculation:

If you [use the +4 Boots in combat and beat Cadagal then they'll know you were] responsible [for] ????? ?????? [Boots from his/her/the] House. [You will gain its] lasting enmity, [and] [people? will?] ???????? ???? ??? ???? ?? ??? ???? ??????? ?? ?? ????? ?? ????????? ?? ??? [upon] your honor [if] ????????? ???? ?? ???? ??? ???? ??? ??? friendship ???? ?? ??? ???? ?? ??? ?????? ??????? ?? ?? ?????.
- So maybe we're OK to use the +4 Boots as long as it's not against Cadagal?
- No idea how to even guess at what's going on in that second sentence apart from "bad things will happen and everyone will hate you, you dirty thief".

declan-molony on Conversational Signposts—An Antidote to Dull Social Interactions

Sure! Here are two of my favorites.

(1) From Leil Lowndes' book:

Don't ask what they do. In the US in my experience, the most common question upon meeting someone is "what do you do?" But the problem with this is that while 65% of Americans are satisfied with their jobs, only 20% of Americans are passionate about their work. From Lowndes:

If you instead ask, "How do you enjoy spending most of your time?" It allows people to mention their job or their hobbies. And homemakers are no longer embarrassed to say, "I'm just a mom" to the question of "what do you do?"

(2) From Dale Carnegie's book:

Never disagree and say "you're wrong". I am a naturally disagreeable person. Learning about this technique hasn't made me more agreeable, I just express my disagreement differently now. From Carnegie:

Never announce, "I am going to prove so-and-so to you." That's bad. That's tantamount to saying: "I am smarter than you are and am going to make you change your mind."
"We sometimes find ourselves changing our minds without any resistance; but if we are told we are wrong, we resent the imputation and harden our hearts. We are heedless in the formation of our beliefs, but find ourselves filled with a passion for them when anyone proposes to rob us of their companionship. It is obviously not the ideas themselves that are dear to us, but our self-esteem which is threatened."—James Harvey Robinson

I've adopted a more indirect way of challenging people's beliefs. Rather than stating my disagreement, I tend to ask questions (à la the Socratic Method) to get to the root of somebody's belief. Sometimes they'll notice contradictions in their own arguments without me having to point them out.

elizabeth-1 on Elizabeth's Shortform

Some related posts:

one example [LW(p) · GW(p)] among many of a long runway letting me make more moral choices
ongoing twitter thread on frying pan agency

cubefox on Conversational Signposts—An Antidote to Dull Social Interactions

Ah. So I guess you should neither ask just questions, nor just always talk about yourself, but rather balance both.

Praising questions is an interesting tip to ease-in more introverted people. This reminds me of another strategy a sociable colleague of mine used on shy people: He made playful jokes about his conversation partner ("teasing"), encouraging them to be bold and "hit back" with something, and then laugh about their joke. Which instantly empowered them. This probably works best when both are men. Making self-deprecating jokes is a safer option, though not as effective.

evolutionbydesign on Advice on Communicating Concisely

Thank you for your response!

There were 3 situations today where the second case arose:

I asked whether GTP was less stable than ATP in AP Bio (I wanted to understand how it was possible that GTP + ADP -> GDP + ATP, and if this was the case, why ATP was used at all). My teacher thought I was asking about it from an evolutionary standpoint, and started explaining the evolution of the system.
I have a tendency to speak in short bursts of content in which I think I often underestimate inferential distances. (My internal monologue normally consists of vaguely used phrases + a sense of what I mean, which I think this carries over into my speech).
I'm realizing that I find it difficult to phrase sentences correctly. I rewrite almost every sentence I type many times over because I'm not sure about grammar and word arrangement (I spent a minute thinking about whether I should wrap "many times over" in a pair of commas at the start of this sentence).

johnswentworth on Some Rules for an Algebra of Bayes Nets

Here's a new Bookkeeping Theorem, which unifies all of the Bookkeeping Rules mentioned (but mostly not proven) in the post, as well as all possible other Bookkeeping Rules.

If all distributions which factor over Bayes net also factor over Bayes net $G_{2}$ , then all distributions which approximately factor over $G_{1}$ also approximately factor over $G_{2}$ . Quantitatively:

$D_{K L} (P [X] | | \prod_{i} P [X_{i} | X_{p a^{1} (i)}]) \geq D_{K L} (P [X] | | \prod_{i} P [X_{i} | X_{p a^{2} (i)}])$

where $p a^{j} (i)$ indicates parents of variable $i$ in $G_{j}$ .

Proof: Define the distribution $Q [X] := \prod_{i} P [X_{i} | X_{p a^{1} (i)}]$ . Since $Q [X]$ exactly factors over $G_{1}$ , it also exactly factors over $G_{2}$ : $Q [X] = \prod_{i} Q [X_{i} | X_{p a^{2} (i)}]$ . So

$D_{K L} (P [X] | | \prod_{i} P [X_{i} | X_{p a^{1} (i)}]) = D_{K L} (P [X] | | Q [X])$

$= D_{K L} (P [X] | | \prod_{i} Q [X_{i} | X_{p a^{2} (i)}])$

Then by the factorization transfer rule (from the post):

$\geq D_{K L} (P [X] | | \prod_{i} P [X_{i} | X_{p a^{2} (i)}])$

which completes the proof.

bogdan-ionut-cirstea on The case for unlearning that removes information from LLM weights

Information you should probably remove from the weights

Perhaps it might also be useful to remove information which might reduce the likelihood that 'A TAI which kills all humans might also doom itself' (especially in short timelines/nearcast scenarios).

raemon on Arithmetic is an underrated world-modeling technology

Curated. This post hit me exactly at a moment where I was trying to become a more "numerate person" (i.e. gain some skills where it was easier to convert problems into 'things with numbers' and then solve the problems with simple math).

I liked the worked examples of the post, and I appreciated the emphasis on why it's important to keep track of units (which I had roughly known, but seeing how it played out in the examples drove the point home more).