LessWrong 2.0 Reader

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (64)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (2)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

AXRP Episode 34 - AI Evaluations with Beth Barnes
DanielFilan · 2024-07-28T03:30:07.192Z · comments (0)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

Rashomon - A newsbetting site
ideasthete · 2024-10-15T18:15:02.476Z · comments (8)

[LDSL#2] Latent variable models, network models, and linear diffusion of sparse lognormals
tailcalled · 2024-08-09T19:57:56.122Z · comments (2)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

[link] [Talk transcript] What “structure” is and why it matters
Alex_Altair · 2024-07-25T15:49:00.844Z · comments (0)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

Recent comments

leogao on A Rocket–Interpretability Analogy

I think there are a whole bunch of inputs that determine a company's success. Research direction, management culture, engineering culture, product direction, etc. To be a really successful startup you often just need to have exceptional vision on one or a small number of these inputs, possibly even just once or twice. I'd guess it's exceedingly rare for a company to have leaders with consistently great vision across all the inputs that go into a company. Everything else will constantly revert towards local incentives.

steve2152 on Against empathy-by-default

Honest question: Suppose that my friends and other people whom I like and respect and trust all believe that genocide is very bad. I find myself (subconsciously) motivated to fit in with them, and I wind up adopting their belief that genocide is very bad. And then I take corresponding actions, by writing letters to politicians urging military intervention in Myanmar.

In your view, would that count as “selfish” because I “selfishly” benefit from ideologically fitting in with my friends and trusted leaders? Or would it count as “altruistic” because I am now moved by the suffering of some ethnic group across the world that I’ve never met and can’t even pronounce?

elityre on Bitter lessons about lucid dreaming

I don't have much information about your case, but I'd make a 1-to-1 bet that if you got up and wrote down your dreams first thing in the morning every morning, especially if you're woken up by an alarm for the first 3 times, you'd start remembering your dreams. Just jot down whatever you remember, however vague or indistinct, up to and including "literally nothing. The last thing I remember is going to bed last night."

I rarely remember my own dreams, but in periods of my life when I've kept a dream journal, I easily remembered them.

david-johnston on A brief theory of why we think things are good or bad

For what it's worth, one idea I had as a result of our discussion was this:

We form lots of beliefs as a result of motivated reasoning
These beliefs are amenable to revision due to evidence, reason or (maybe) social pressure
Those beliefs that are largely resilient to these challenges are "moral foundations"

So philosophers like "pain is bad" as a moral foundation because we want to believe it + it is hard to challenge with evidence or reason. Laypeople probably have lots of foundational moral beliefs that don't stand up as well to evidence or reason, but (perhaps) are equally attributable to motivated reasoning.

Social pressure is a bit iffy to include because I think lots of people relate to beliefs that they adopted because of social pressure as moral foundations, and believing something because you're under pressure to do so is an instance of motivated reasoning.

I don't think this is a response to your objections, but I'm leaving it here in case it interests you.

thane-ruthenis on The Mask Comes Off: At What Price?

In a transformed-except-corporate-ownership-stays-the-same world, I don't see any reason such lottery winners' portion wouldn't increase asymptotically toward 100 percent, with nobody else getting anything at all.

Well yeah, exactly.

Even without an overtly revolutionary restructuring, I kind of doubt "OpenAI owns everything" would fly. Maybe corporate ownership would stay exactly the same, but there'd be a 99.999995 percent tax rate.

Taxes enforced by whom?

akash-wasil on What AI companies should do: Some rough ideas

Some ideas relating to comms/policy:

- Communicate your models of AI risk to policymakers
  - Help policymakers understand emergency scenarios (especially misalignment scenarios) and how to prepare for them
  - Use your lobbying/policy teams primarily to raise awareness about AGI and help policymakers prepare for potential AGI-related global security risks.
- Develop simple/clear frameworks that describe which dangerous capabilities you are tracking (I think OpenAI's preparedness framework is a good example, particularly RE simplicity/clarity/readability.)
- Advocate for increased transparency into frontier AI development through measures like stronger reporting requirements, whistleblower mechanisms, embedded auditors/resident inspectors, etc.
- Publicly discuss threat models (kudos to DeepMind)
- Engage in public discussions/debates with people like Hinton, Bengio, Hendrycks, Kokotajlo, etc.
- Encourage employees to engage in such discussions/debates, share their threat models, etc.
- Make capability forecasts public (predictions for when models would have XYZ capabilities)
- Communicate under what circumstances you think major government involvement would be necessary (e.g., nationalization, "CERN for AI" setups).

evolutionbydesign on Advice on Communicating Concisely

Actually, both.

I started an AI club at my high school last year, and I've been (slowly) trying to teach other students the basics of deep learning. They generally come out of my 15-to-20-minute-long explanations confused, rather than understanding.
This too (I don't have a specific example in mind - I'll see if any pop up during school tomorrow)

I normally think what I'm saying is clear, but the result is that others don't understand what I mean when I finish saying it - which causes me to tack on hasty clarifications of my intentions / ideas.

cubefox on Alexander Gietelink Oldenziel's Shortform

Yeah. I think the technical term for that would be cringe.

gwern on Alexander Gietelink Oldenziel's Shortform

Sunglasses can be too cool for most people to be able to wear in the absence of a good reason. Tom Cruise can go around wearing sunglasses any time he wants, and it'll look cool on him, because he's Tom Cruise. If we tried that, we would look like dorks because we're not cool enough to pull it off and it would backfire on us. (Maybe our mothers would think we looked cool.) This could be said of many things: Tom Cruise or Kanye West or fashionable celebrities like them can go around wearing a fedora and trench coat and it'll look cool and they'll pull it off; but if anyone else tries it...

jackson-wagner on A Rocket–Interpretability Analogy

Satellites were also plausibly a very important military technology. Since the 1960s, some applications have panned out, while others haven't. Some of the things that have worked out:

GPS satellites were designed by the air force in the 1980s for guiding precision weapons like JDAMs, and only later incidentally became integral to the world economy. They still do a great job guiding JDAMs, powering the style of "precision warfare" that has given the USA a decisive military advantage ever since 1991's first Iraq war.
Spy satellites were very important for gathering information on enemy superpowers, tracking army movements, etc. They were especially good for helping both nations feel more confident that their counterpart was complying with arms agreements about the number of missile silos, etc. The Cuban Missile Crisis was kicked off by U-2 spy-plane flights photographing partially-assembled missiles in Cuba. For a while, planes and satellites were both in contention as the most useful spy-photography tool, but eventually even the U-2's successor, the incredible SR-71 Blackbird, lost out to the greater utility of spy satellites.
Systems for instantly detecting the characteristic gamma-ray flashes of nuclear detonations that go off anywhere in the world (I think such systems are included on GPS satellites), and giving early warning by tracking ballistic missile launches during their boost phase (the Soviet version of this system famously misfired and almost caused a nuclear war in 1983, which was fortunately forestalled by one Lieutenant Colonel Stanislav Petrov) are obviously a critical part of nuclear deterrence / nuclear war-fighting.

Some of the stuff that hasn't:

The air force initially had dreams of sending soldiers into orbit, maybe even operating a military base on the moon, but could never figure out a good use for this. The Soviets even test-fired a machine-gun built into one of their Salyut space stations: "Due to the potential shaking of the station, in-orbit tests of the weapon with cosmonauts in the station were ruled out. The gun was fixed to the station in such a way that the only way to aim would have been to change the orientation of the entire station. Following the last crewed mission to the station, the gun was commanded by the ground to be fired; some sources say it was fired to depletion".
Despite some effort in the 1980s, we were unable to figure out how to make "Star Wars" missile defence systems work anywhere near well enough to defend us against a full-scale nuclear attack.
Fortunately we've never found out if in-orbit nuclear weapons, including fractional orbit bombardment weapons, are any use, because they were banned by the Outer Space Treaty. But nowadays maybe Russia is developing a modern space-based nuclear weapon as a tool to destroy satellites in low-earth orbit.

Overall, lots of NASA activities that developed satellite / spacecraft technology seem like they had a dual-use effect advancing various military capabilities. So it wasn't just the missiles. Of course, in retrospect, the entire human-spaceflight component of the Apollo program (spacesuits, life support systems, etc.) turned out to be pretty useless from a military perspective. But even that wouldn't have been clear at the time!