LessWrong 2.0 Reader

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (11)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)

Survey on the acceleration risks of our new RFPs to study LLM capabilities
Ajeya Cotra (ajeya-cotra) · 2023-11-10T23:59:52.515Z · comments (1)

flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)

Reprograming the Mind: Meditation as a Tool for Cognitive Optimization
Jonas Hallgren · 2024-01-11T12:03:41.763Z · comments (3)

[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)

On the 2nd CWT with Jonathan Haidt
Zvi · 2024-04-05T17:30:05.223Z · comments (3)

[link] Goodhart's Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 2023-11-25T00:53:26.841Z · comments (2)

A short dialogue on comparability of values
cousin_it · 2023-12-20T14:08:29.650Z · comments (7)

EA Infrastructure Fund's Plan to Focus on Principles-First EA
Linch · 2023-12-06T03:24:55.844Z · comments (0)

[link] [Linkpost] Concept Alignment as a Prerequisite for Value Alignment
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-11-04T17:34:36.563Z · comments (0)

AISC Project: Modelling Trajectories of Language Models
NickyP (Nicky) · 2023-11-13T14:33:56.407Z · comments (0)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

When and why should you use the Kelly criterion?
Garrett Baker (D0TheMath) · 2023-11-05T23:26:38.952Z · comments (25)

[link] Link Collection: Impact Markets
Saul Munn (saul-munn) · 2023-12-26T09:01:48.815Z · comments (0)

The economy is mostly newbs (strat predictions)
lukehmiles (lcmgcd) · 2024-02-01T19:15:49.420Z · comments (6)

Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)

Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)

Appraising aggregativism and utilitarianism
Cleo Nardo (strawberry calm) · 2024-06-21T23:10:37.014Z · comments (10)

NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

Weak vs Quantitative Extinction-level Goodhart's Law
VojtaKovarik · 2024-02-21T17:38:15.375Z · comments (1)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

[question] What Software Should Exist?
Tomás B. (Bjartur Tómas) · 2024-01-19T21:43:50.112Z · answers+comments (27)

How to develop a photographic memory 2/3
PhilosophicalSoul (LiamLaw) · 2023-12-30T20:18:14.255Z · comments (7)

Probably Not a Ghost Story
George Ingebretsen (george-ingebretsen) · 2024-06-12T22:55:26.264Z · comments (4)

My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

Recent comments

andy_mckenzie on Science advances one funeral at a time

The examples you provided don't actually support the "one funeral at a time" narrative in your title. Take Barbara McClintock's jumping genes or Barry Marshall's H. pylori discovery -- in both cases, many scientists changed their views based on compelling evidence while very much alive. There are plenty of other examples of this. For example, the acceptance of prions as disease agents, the role of microbiomes in health, dark energy, and mitochondria's bacterial origins all show how consensus can shift rapidly once a sufficient amount of evidence has accumulated. Scientists change their minds all. the. time.

This is not to say that there are not fads or incorrect beliefs in science -- of course there are. And sometimes it can take years or decades for them to be overturned. But the "funeral" framing in particular is not only historically inaccurate but also promotes a harmful view that death is necessary for progress. What we actually see in these examples is that scientific views change when sufficient evidence accumulates and a sufficient number of people are convinced, regardless of generational turnover. Suggesting we need scientists to die rather than be convinced by evidence is both incorrect and ethically fraught. I am saddened to see it here and therefore strong downvoted this post.

edmund-nelson on Prediction markets and Taxes

Yeah that's fair, I'm just so used to American odds for gambling that I mentally use them all the time for these sorts of things.

Probably should have used good old fashioned odds instead.

The reason casinos show something like "Yankees +110, Red Sox -120" is so you can easily see the casino's rake or something.
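As a rough worked example of how the rake shows up in those two numbers, here is a minimal sketch that converts each American line to an implied probability and sums them; whatever the sum exceeds 100% by is the book's margin. The function names `implied_probability` and `overround` are illustrative, not from any particular library.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American (moneyline) odds to the implied win probability."""
    if american_odds > 0:
        # Underdog line: risk 100 to win `american_odds`.
        return 100 / (american_odds + 100)
    # Favorite line: risk `-american_odds` to win 100.
    return -american_odds / (-american_odds + 100)


def overround(*lines: int) -> float:
    """Sum of implied probabilities minus 1; the excess is the book's margin."""
    return sum(implied_probability(line) for line in lines) - 1.0


yankees = implied_probability(110)    # 100 / 210
red_sox = implied_probability(-120)   # 120 / 220
margin = overround(110, -120)
print(f"Yankees {yankees:.4f}, Red Sox {red_sox:.4f}, margin {margin:.4f}")
# prints: Yankees 0.4762, Red Sox 0.5455, margin 0.0216
```

So the two implied probabilities add up to about 102.2%, and the extra 2.2% or so is the rake on this line.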

t3t on dirk's Shortform

I'm pretty sure Ryan is rejecting the claim that the people hiring for the roles in question are worse-than-average at detecting illegible talent.

t3t on dirk's Shortform

Depends on what you mean by "resume building", but I don't think this is true for "need to do a bunch of AI safety work for free" or similar. E.g. for technical research, many people that have gone through MATS and then been hired at or founded their own safety orgs have no prior experience doing anything that looks like AI safety research, and some don't even have much in the way of ML backgrounds. Many people switch directly out of industry careers into doing e.g. ops or software work that isn't technical research. Policy might seem a bit trickier but I know several people who did not spend anything like years doing resume building before finding policy roles or starting their own policy orgs and getting funding. (Though I think policy might actually be the most "straightforward" to break into, since all you need to do to demonstrate competence is publish a sufficiently good written artifact; admittedly this is mostly for starting your own thing. If you want to get hired at a "larger" policy org resume building might matter more.)

matthew4244 on Chapter 69: Self Actualization, Pt 4

Good, I was feeling bad for Hermione. +1.

nathan-helm-burger on dirk's Shortform

But legibility is a separate issue. If there are people who would potentially be good safety researchers, but they get turned away by recruiters because they don't have a legibly impressive resume, then the companies end up lacking employees they would do well with.

So, companies could be less constrained on people if they were more thorough in evaluating people on more than shallow easily-legible qualities.

Spending more money on this recruitment evaluation would thus help alleviate lack of good researchers. So money is tied into person-shortage in this additional way.

elizabeth-1 on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

see also: https://www.lesswrong.com/posts/Wiz4eKi5fsomRsMbx/change-my-mind-veganism-entails-trade-offs-and-health-is-one

lucas-teixeira on Toward Safety Case Inspired Basic Research

Re "big science": I'm not familiar with the term, so I'm not sure what the exact question being asked is. I am much more optimistic in the worlds where we have large scale coordination amongst expert communities. If the question is about the relationship between governments, firms and academia, I'm still developing my gears around this. Jade Leung's thesis seems to have an interesting model but I have yet to dig very deep into it.

curiousmeta on Information vs Assurance

And this is how talking is anchored in Costly Signaling.

(Note that "I dunno, probably around 9 pm." is still an assurance, though of a different kind: You're assuring that 9 pm is an honest estimate. If it turns out you make such statements up at random, it will cost you.)

And that's why talking can convey information at all.

dmitry-vaintrob on Science advances one funeral at a time

It's neat to remember stories like this, but I want to note that this shouldn't necessarily update scientists to criticize novel work less. If an immune system doesn't sometimes overreact, it's not doing its job right, and for every story like this there are multiple other stories of genuinely false exciting-sounding ideas that got shut down by experts (for instance I learned about Shechtman from the Constant podcast, where his story was juxtaposed with that of genuine quacks). Looking back at my experience of excited claims that were generally dismissed by more skeptical experts in fields I was following, the majority of them (for instance the superluminal neutrino, the room-temperature superconductor, various hype about potentially proving the Riemann hypothesis by well-established mathematicians) have been false.

I think there is a separate phenomenon (which was the explanation for the study about funerals), that older high-status scientists in funding-hungry fields will often continue to get funding and set priorities after they have stopped working on genuinely exciting stuff -- whether because of age, because of age-related conservatism bias, or simply because their area of expertise has become too well-developed to generate new ideas. In my experience in math and physics, from inside the field, this phenomenon generally does not look like a consensus that only the established people know what's going on (as in most of the stories here), but rather like either a quiet consensus that so-and-so famous person is starting to go crazy, or the normal disagreement between more conservative and more innovation-minded people about the value of a new idea. For example, the most exciting development in my professional life as a mathematician was Jacob Lurie's development of "higher category theory", a revolution that allowed algebraists to seamlessly use tools from topology. There were many haters of this theory (many very young), but there was enough of a diffuse understanding that this is exciting and potentially revolutionary that his ideas did percolate and end up converting many of the haters (similarly with Grothendieck and schemes). Note that here I think math avoids the worst aspects of these dynamics because it doesn't require funding and is less competitive.

The upshot here is that I think it's valuable to try to resolve the issue of good ideas being shot down by traditionalists, but the solution might not be to "adopt lower standards for criticizing new / surprising ideas" but rather something more like pulling the rope sideways and looking for better standards that do better at separating promising innovation from hype.