LessWrong 2.0 Reader

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)

UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)

A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)

An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

Weeping Agents
pleiotroth · 2024-06-06T12:18:54.978Z · comments (2)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages
Ethan Edwards · 2024-04-04T13:18:54.909Z · comments (2)

[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (1)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

[LDSL#2] Latent variable models, network models, and linear diffusion of sparse lognormals
tailcalled · 2024-08-09T19:57:56.122Z · comments (2)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

[link] [EA xpost] The Rationale-Shaped Hole At The Heart Of Forecasting
dschwarz · 2024-04-02T17:40:44.278Z · comments (2)

The Garden of Eden
Alexander Turok · 2024-07-22T16:07:42.509Z · comments (2)

[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

Luna Lovegood and the Chamber of Secrets, Part 1
Kongo Landwalker (kongo-landwalker) · 2024-05-26T22:17:17.137Z · comments (0)

[link] Masculinity—A Case For Courage
James Stephen Brown (james-brown) · 2024-06-04T00:04:48.411Z · comments (0)

AI #77: A Few Upgrades
Zvi · 2024-08-20T00:20:09.717Z · comments (3)

Blessed information, garbage information, cursed information
tailcalled · 2024-04-18T16:56:17.370Z · comments (8)

Disentangling Competence and Intelligence
Robert Kralisch (nonmali-1) · 2024-04-29T00:12:50.779Z · comments (7)

[link] Is There Really a Child Penalty in the Long Run?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-17T11:56:22.892Z · comments (6)

[link] The unreasonable effectiveness of plasmid sequencing as a service
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-08T02:02:55.352Z · comments (0)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

[link] Tokyo AI Safety 2025: Call For Papers
Blaine (blaine-rogers) · 2024-10-21T08:43:38.467Z · comments (0)

Less Anti-Dakka
Mateusz Bagiński (mateusz-baginski) · 2024-05-31T09:07:10.450Z · comments (5)

[link] Libs vs Frameworks, Middle-Level Regularities vs Theories
adamShimi · 2024-07-04T19:01:59.440Z · comments (0)

Rashomon - A newsbetting site
ideasthete · 2024-10-15T18:15:02.476Z · comments (8)

[question] Money Pump Arguments assume Memoryless Agents. Isn't this Unrealistic?
Dalcy (Darcy) · 2024-08-16T04:16:23.159Z · answers+comments (6)

Text Posts from the Kids Group: 2019
jefftk (jkaufman) · 2024-06-23T13:20:01.495Z · comments (0)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

[link] Managing Emotional Potential Energy
adamShimi · 2024-07-10T18:20:45.640Z · comments (4)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.
sevdeawesome · 2024-07-01T05:50:49.498Z · comments (0)

[link] A Defense of Peer Review
Niko_McCarty (niko-2) · 2024-10-22T16:16:49.982Z · comments (1)

[link] [Talk transcript] What “structure” is and why it matters
Alex_Altair · 2024-07-25T15:49:00.844Z · comments (0)

AXRP Episode 34 - AI Evaluations with Beth Barnes
DanielFilan · 2024-07-28T03:30:07.192Z · comments (0)

On excluding dangerous information from training
ShayBenMoshe (shay-ben-moshe) · 2023-11-17T11:14:54.847Z · comments (5)

D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup
aphyer · 2024-01-18T03:06:39.344Z · comments (5)

Trying to align humans with inclusive genetic fitness
peterbarnett · 2024-01-11T00:13:29.487Z · comments (5)

[link] Increasing IQ by 10 Points is Possible
George3d6 · 2024-03-19T20:48:41.277Z · comments (50)

Tend to your clarity, not your confusion
Severin T. Seehrich (sts) · 2024-03-11T15:09:24.099Z · comments (1)

[question] How much fraud is there in academia?
ChristianKl · 2023-11-16T11:50:41.544Z · answers+comments (10)

[question] Should people build productizations of open source AI models?
lc · 2023-11-02T01:26:47.516Z · answers+comments (0)


Recent comments

christian-z-r on D&D Sci Coliseum: Arena of Data

Ah, thanks a lot. I was actually working through the earlier scenarios; I just missed that a new one had popped up. Subscribed now, so hopefully I will notice the next one.

Also, my approach didn't work this time: I ended up trying a way too complicated model. I really like how the actual answer to this one worked.

avturchin on avturchin's Shortform

Lifehack: If you're attacked by a group of stray dogs, pretend to throw a stone at them. Each dog will think you're throwing the stone at it and will run away. This has worked for me twice.

cousin_it on The Alignment Trap: AI Safety as Path to Power

Yeah, this is my main risk scenario. But I think it's more correct to talk about imbalance of power, not concentration of power. Maybe there will be one AI dictator, or one human+AI dictator, or many AIs, or many human+AI companies; but anyway most humans will end up at the bottom of a huge power differential. If history teaches us anything, this is a very dangerous prospect.

It seems the only good path is aligning AI to the interests of most people, not just its creators. But there's no commercial or military incentive to do that, so I don't see it happening, at least not by default.

james-chua on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Author on Binder et al. 2024 here. Thanks for reading our paper and suggesting the experiment!

To summarize the suggested experiment:

Train a model to be calibrated on whether it gets an answer correct.
Modify the model (e.g. activation steering). This changes the model's performance on whether it gets an answer correct.
Check if the modified model is still well calibrated.

This could work and I'm excited about it.

One failure mode is that the modification makes the model very dumb in all instances. Then it's easy to be well calibrated on all these instances -- just assume the model is dumb. An alternative is to make the model do better on some instances (by finetuning?), and check if the model is still calibrated on those too.
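
The comment above does not say how "well calibrated" would be measured. One common choice is expected calibration error (ECE), and the check in the last step could be sketched roughly as follows. This is only an illustration: the function name, the equal-width binning, and the toy inputs are assumptions, not something taken from Binder et al. 2024.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Size-weighted mean gap between accuracy and mean confidence per bin.

    confidences: the model's stated probabilities of being correct, in [0, 1]
    correct:     booleans, whether each answer was actually correct
    (Hypothetical helper for illustration; equal-width bins are an assumption.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Failure mode from the comment: a model that is always wrong and
# always reports zero confidence is trivially "well calibrated".
dumb_model_ece = expected_calibration_error([0.0] * 4, [False] * 4)

# An overconfident model: says 0.9 but is right only half the time.
overconfident_ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, False, False]
)
```

Under this metric the uniformly dumb model scores a perfect 0.0, which is why comparing ECE before and after the modification is only informative if the modification changes performance unevenly across instances.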

remmelt-ellen on Why Stop AI is barricading OpenAI

Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument.

For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html

The claims made will feel unfamiliar and the reasoning paths too. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that.

tropicalfruit on Dating Roundup #1: This is Why You’re Single

Same. It would take incredible effort to find one person I reasonably connect with each year.

So much of this is just location. I've met 100s of people over the last few years. Nearly all either over 40 with kids, or those kids. I've connected with many, maybe 10%, on a pretty good level. That doesn't help with dating at all.

I just really, really don't want it to be the case that the only answer is: move to NY, SF, or Seattle, because I really like it here.

tailcalled on Three Notions of "Power"

However, though dominance is hard-coded, it seems like something of a simple evolved hack to avoid costly fights among relatively low-cognitive-capability agents; it does not seem like the sort of thing which more capable agents (like e.g. future AI, or even future more-intelligent humans) would rely on very heavily.

This seems exactly reversed to me. It seems to me that since dominance underlies defense, law, taxes and public expenditure, it will stay crucial even with more intelligent agents. Conversely, as intelligence becomes "too cheap to meter", "getting what you want" will become less bottlenecked on relevant insights, as those insights are always available.

green_leaf on Habryka's Shortform Feed

I use Google Chrome on Ubuntu Budgie and it does look to me like both the font and the font size changed.

saidachmiz on Habryka's Shortform Feed

Well, let’s see. Calibri is a humanist sans; Gill Sans is technically also humanist, but more geometric in design. Geometric sans fonts tend to be less readable when used for body text.

Gill Sans has a lower x-height than Calibri. That (obviously) is the cause of all the “the new font looks smaller” comments.

(A side-by-side comparison of the fonts, for anyone curious, although note that this is Gill Sans MT Pro, not Gill Sans Nova, so the weight [i.e., stroke thickness] will be a bit different than the version that LW now uses.)

Now, as far as font rendering goes… I just looked at the site on my Windows box (adjusting the font stack CSS value to see Gill Sans Nova again, since I see you guys tweaked it to give Calibri priority)… yikes. Yeah, that’s not rendering well at all. Definitely more blurry than Calibri. Maybe something to do with the hinting, I don’t know. (Not really surprising, since Calibri was designed from the beginning to look good on Windows.) And I’ve got a hi-DPI monitor on my Windows machine…

Interestingly, the older version of Gill Sans (seen in the demo on my wiki, linked above) doesn’t have this problem; it renders crisply on Windows. (Note that this is not the flawed, broken-kerning version of the font that comes with Macs!)

I also notice that the comment font size is set to… 15.08px. Seems weird? Bumping it up to 16px improves things a bit, although it’s still not amazing.

If you can switch to the older (but not broken) version of Gill Sans, that’d be my recommendation.

If you can’t… then one option might be to check out one of the many similar fonts to see if perhaps one of them renders better on Windows while still having matching metrics.

habryka4 on Habryka's Shortform Feed

Sure, I was just responding to this literal quote:

Couldn't you please just set the comment font to the same as the post font?