LessWrong 2.0 Reader

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)

Monthly Roundup #17: April 2024
Zvi · 2024-04-15T12:10:03.126Z · comments (4)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (8)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (5)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 2023-12-16T05:39:10.558Z · comments (5)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

[link] [Closed] Agent Foundations track in MATS
Vanessa Kosoy (vanessa-kosoy) · 2023-10-31T08:12:50.482Z · comments (1)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

AMA: Earning to Give
jefftk (jkaufman) · 2023-11-07T16:20:10.972Z · comments (8)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

Zvi's Manifold Markets House Rules
Zvi · 2023-11-13T00:28:02.147Z · comments (6)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (6)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

Self-Blinded L-Theanine RCT
niplav · 2023-10-31T15:24:57.717Z · comments (12)

[link] Open Phil releases RFPs on LLM Benchmarks and Forecasting
LawrenceC (LawChan) · 2023-11-11T03:01:09.526Z · comments (0)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:20.031Z · comments (20)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

Recent comments

linda-linsefors on AI Safety Camp 10

You can find their preferred contact info in each document in the Team section.

mikkel-wilson on The hostile telepaths problem

For the record, this sentence popped into my head while reading this: "Wait, but what if I'm Omega-V, and [Valentine] is a two boxer?"

abramdemski on Three Notions of "Power"

It seems to me that the importance and interaction of these different types of power in the future depends a lot on our choices now, ie, what kind of future we shape. Hierarchies could get smashed in one way or another, making John's prediction correct, or we could engineer a future that evolves from the present more smoothly, in which case you'd be correct.

sarahconstantin on sarahconstantin's Shortform

links 10/30/2024: https://roamresearch.com/#/app/srcpublic/page/10-30-2024

https://pmc.ncbi.nlm.nih.gov/articles/PMC10136898/ FRET is a biosensor modality.
- "FRET is a non-radiative transfer of energy from an excited donor fluorophore molecule to a nearby acceptor fluorophore molecule...When the biomolecule of interest is present, it can cause a change in the distance between the donor and acceptor, leading to a change in the efficiency of FRET and a corresponding change in the fluorescence intensity of the acceptor. This change in fluorescence can be used to detect and quantify the biomolecule of interest."
- advantages:
  - real-time
  - non-destructive
  - sensitive to very low concentrations (picomolar and nanomolar)
  - highly specific because it detects conformational changes in biological molecules
- this article is from a not-great journal and the author clearly does not have English as a first language... at some point i will need a more reputable source, this was from googling FRET quickly
https://www.astralcodexten.com/p/the-case-against-proposition-36 Clara Collier gives the narrow, evidence-based case that shorter jail sentences didn't cause California's property crime wave or drug overdose death epidemic, and longer jail sentences won't fix those problems
- I'm pretty convinced but I don't follow this topic in great detail
metastatic malignant peripheral nerve sheath tumor is pretty bad -- median survival is only 8 months after metastases are detected. but one M.O. that seems to help in several case studies is "sequence the tumor, find a mutation, use a drug that's approved for other cancer types with the same mutation."
- PD-L1 overexpression? use a PD-1 inhibitor! checkpoint immunotherapy stays winning.
- BRAF V600E mutation? try a BRAF inhibitor!
  - https://jnccn.org/view/journals/jnccn/11/12/article-p1466.xml vemurafenib
- other Raf stuff: maybe sorafenib?
  - https://www.tandfonline.com/doi/abs/10.4161/cbt.7.6.5932
- shit that doesn't work:
  - sirolimus https://onlinelibrary.wiley.com/doi/full/10.1155/2020/5784876
- chemo is...not great but better than nothing. some partial responses, no complete responses, survival extended by maybe a few months. mostly it seems best to have doxorubicin in the mix.
  - https://onlinelibrary.wiley.com/doi/full/10.1155/2017/8685638
  - https://ascopubs.org/doi/abs/10.1200/JCO.2024.42.16_suppl.11583
  - https://www.sciencedirect.com/science/article/pii/S0923753419377907
  - https://ascopubs.org/doi/abs/10.1200/jco.2010.28.15_suppl.e20512
  - https://onlinelibrary.wiley.com/doi/full/10.1155/2011/705345 ok here's a complete response to chemo + surgery. it can ever happen.
  - https://ar.iiarjournals.org/content/40/3/1619.short case of long-term survival after keeping chemotherapy going a *really long time* at gradually decreasing dose and widening inter-treatment interval.
- https://onlinelibrary.wiley.com/doi/abs/10.1002/ijc.33201 pazopanib, an angiogenesis inhibitor, similarly has a low response rate but can extend survival a bit
https://proof-scaling-meeting.vercel.app/ formal verification conference
https://chalmermagne.substack.com/p/death-by-a-thousand-roundtables what it's actually like to work in UK policy. sounds dismal.
https://www.972mag.com/lavender-ai-israeli-army-gaza/ AI bombing. critical perspective on Israel.
https://goingon.org/ a timeline-based, "citizen journalism" news site.
https://statistics.berkeley.edu/about/news/steinhardt-announces-co-founding-transluce-non-profit-ai-research-lab AI interpretability nonprofit, Jacob Steinhardt
- mech-interp seems like straightforwardly real and good work from a variety of perspectives on AI. helps with many risk scenarios including some x-risk scenarios; helps make the technology stronger & more reliable, which is good for the industry in the long run.
https://blog.benjaminreinhardt.com/young-people-technical-training this is straightforwardly true, yes, you should learn technical stuff.
https://www.washingtonpost.com/opinions/2024/10/28/jeff-bezos-washington-post-trust/ Jeff Bezos on why the Washington Post isn't endorsing a Presidential candidate. this is a solidly written persuasive essay; it seemed legit to me, but I could be persuaded otherwise.

romeostevensit on The hostile telepaths problem

It's worth noting that many therapists break therapeutic alliance for ideological or liability reasons, and this is one of the reasons that self-therapy, peer therapy, LLMs, and workbooks can sometimes be better.

anthonyc on Occupational Licensing Roundup #1

The obvious follow up is indeed ‘why only military’? License portability should be universal.

I assume the short answer is: This is a group that is small enough not to credibly threaten anyone's wages, sympathetic/virtuous enough that people feel uncomfortable or expect to pay a price for arguing against them, and not actually in control of where they live. If nothing else it opens up surface area for other attack vectors, once people realize with data that those dastardly New Mexican plumbers can also safely fix toilets in Denver.

On music therapists - something like 15-20 states require this, yes. My wife is a therapeutic musician, not a music therapist (trained to play live music for patients in clinical settings, not interactive, not part of an ongoing treatment plan); there's not a whole lot of overlap despite the similar names. There have been times when MTs have tried to get legislation passed to prevent TMs from practicing.

Personally I'm amazed that this hasn't come under fire over the decades by more litigation under the Interstate Commerce and Full Faith and Credit clauses. Somehow it applies to driver's licenses and vehicle registrations but not (again in CO because I have the list open) landscape architects and... boxers? Really? Especially for things like telemedicine, where it's illegal for my therapist to talk to me on the phone or send me a text if I'm not physically in their state. Theoretically, if they're part of a larger practice, it's illegal for their own office to talk to them if they're traveling. I'm honestly baffled how this would stand up to real legal/constitutional scrutiny. Say I'm traveling in NH and call my doctor's office in MA to ask them to refill a prescription at a pharmacy near me, and the office calls my doctor, who happens to be at a conference in CT, to confirm what they should do. On what grounds could NH and CT claim this is accomplishing anything whatsoever? As the patient I can move (most) prescriptions around between pharmacies and states whenever I want, and I can see doctors in whatever state I want. As a doctor, you can see patients from whatever state you want as long as they travel to you and not the other way around.

romeostevensit on The hostile telepaths problem

Agree with the approach, with the caveat that some people in group 2 are naive cooperators and therefore second-order defectors, since they are suckers for group 1. E.g., the person who will tell the truth to the Nazis out of mistaken theories of ethics or just behavioral conditioning.

romeostevensit on The hostile telepaths problem

I was reading this earlier and it dovetails very well with this post. Framing defending yourself against hostile people and processes as primarily selfish itself serves the hostile.

steve2152 on The Alignment Trap: AI Safety as Path to Power

When we develop mechanisms to control AI systems, we are essentially creating tools that could be used by any sufficiently powerful entity - whether that's a government, corporation, or other organization. The very features that make an AI system "safe" in terms of human control could make it a more effective instrument of power consolidation.

…And if we fail to develop such mechanisms, AI systems will still be an “instrument of power consolidation”, but the power being consolidated will be the AI’s own power, right?

I mean, 90% of this article—the discussion of offense-defense balance, and limits on human power and coordination—applies equally to “humans using AI to get power” versus “AI getting power for its own purposes”, right?

E.g. out-of-control misaligned AI is still an “enabler of coherent entities”, because it can coordinate with copies of itself.

I guess you’re not explicitly arguing against “open publication of safety advances” but just raising a point of consideration? Anyway, a more balanced discussion of the pros and cons of “open publication of safety advances” would include:

- Is “humans using AI to get power” less bad versus more bad than “AI getting power for its own purposes”? (I lean towards “probably less bad but it sure depends on the humans and the AI”)
- If AI obedience is an unsolved technical problem to such-and-such degree, to what extent does that lead to people not developing ever-more-powerful AI anyway? (I lean towards “not much”, cf. Meta / LeCun today, or the entire history of AI)
- Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so how much earlier and does it matter? (I lean towards “not necessarily true in the first place, and if true, probably not by much, and it’s not all that important”)

It’s probably a question that needs to be considered on a case-by-case basis anyway. ¯\_(ツ)_/¯

daphne_w on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

That's probably the only "military secret" that really matters.

The soldiers guarding the outer wall and the Citadel treasurer that pays their overtime wages would beg to differ.