LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (22)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

christian-z-r on D&D Sci Coliseum: Arena of Data

A thanks a lot. I was actually working through the earlier scenarios, I just missed that I new one had popped up. Subscribed now, then I will hopefully notice the next one.

Also, my approach didn't work this time, I ended up trying with a way too complicated model. I really like how the actual answer to this one worked.

avturchin on avturchin's Shortform

Lifehack: If you're attacked by a group of stray dogs, pretend to throw a stone at them. Each dog will think you're throwing the stone at it and will run away. This has worked for me twice.

cousin_it on The Alignment Trap: AI Safety as Path to Power

Yeah, this is my main risk scenario. But I think it's more correct to talk about imbalance of power, not concentration of power. Maybe there will be one AI dictator, or one human+AI dictator, or many AIs, or many human+AI companies; but anyway most humans will end up at the bottom of a huge power differential. If history teaches us anything, this is a very dangerous prospect.

It seems the only good path is aligning AI to the interests of most people, not just its creators. But there's no commercial or military incentive to do that, so I don't see it happening, at least not by default.

james-chua on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

author on Binder et al. 2024 here. Thanks for reading our paper and suggesting the experiment!

To summarize the suggested experiment:

Train a model to be calibrated on whether it gets an answer correcct.
Modify the model (e.g. activation steering). This changes the model's performance on whether it gets an answer correct.
Check if the modified model is still well calibrated.

This could work and I'm excited about it.

One failure mode is that the modification makes the model very dumb in all instances. Then its easy to be well calibrated on all these instances -- just assume the model is dumb. An alternative is to make the model do better on some instances (by finetuning?), and check if the model is still calibrated on those too.

remmelt-ellen on Why Stop AI is barricading OpenAI

Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument.

For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html

The claims made will feel unfamiliar and the reasoning paths too. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that.

tropicalfruit on Dating Roundup #1: This is Why You’re Single

Same. It would take incredible effort to find one person I reasonably connect with each year.

So much of this is just location. I've met 100s of people over the last few years. Nearly all either over 40 with kids, or those kids. I've connected with many, maybe 10%, on a pretty good level. That doesn't help with dating at all.

I just really, really don't want it to be the case that he only answer is: move to NY, SF, or Seattle, becuase I really like it here.

tailcalled on Three Notions of "Power"

However, though dominance is hard-coded, it seems like something of a simple evolved hack to avoid costly fights among relatively low-cognitive-capability agents; it does not seem like the sort of thing which more capable agents (like e.g. future AI, or even future more-intelligent humans) would rely on very heavily.

This seems exactly reversed to me. It seems to me that since dominance underlies defense, law, taxes and public expenditure, it will stay crucial even with more intelligent agents. Conversely, as intelligence becomes "too cheap to meter", "getting what you want" will become less bottlenecked on relevant insights, as those insights are always available.

green_leaf on Habryka's Shortform Feed

I use Google Chrome on Ubuntu Budgie and it does look to me like both the font and the font size changed.

saidachmiz on Habryka's Shortform Feed

Well, let’s see. Calibri is a humanist sans; Gill Sans is technically also humanist, but more more geometric in design. Geometric sans fonts tend to be less readable when used for body text.

Gill Sans has a lower x-height than Calibri. That (obviously) is the cause of all the “the new font looks smaller” comments.

(A side-by-side comparison of the fonts, for anyone curious, although note that this is Gill Sans MT Pro, not Gill Sans Nova, so the weight [i.e., stroke thickness] will be a bit different than the version that LW now uses.)

Now, as far as font rendering goes… I just looked at the site on my Windows box (adjusting the font stack CSS value to see Gill Sans Nova again, since I see you guys tweaked it to give Calibri priority)… yikes. Yeah, that’s not rendering well at all. Definitely more blurry than Calibri. Maybe something to do with the hinting, I don’t know. (Not really surprising, since Calibri was designed from the beginning to look good on Windows.) And I’ve got a hi-DPI monitor on my Windows machine…

Interestingly, the older version of Gill Sans (seen in the demo on my wiki, linked above) doesn’t have this problem; it renders crisply on Windows. (Note that this is not the flawed, broken-kerning version of the font that comes with Macs!)

I also notice that the comment font size is set to… 15.08px. Seems weird? Bumping it up to 16px improves things a bit, although it’s still not amazing.

If you can switch to the older (but not broken) version of Gill Sans, that’d be my recommendation.

If you can’t… then one option might be to check out one of the many similar fonts to see if perhaps one of them renders better on Windows while still having matching metrics.

habryka4 on Habryka's Shortform Feed

Sure, I was just responding to this literal quote:

Couldn't you please just set the comment font to the same as the post font?