LessWrong 2.0 Reader

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (33)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (6)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

E.T. Jaynes Probability Theory: The logic of Science I
Jan Christian Refsgaard (jan-christian-refsgaard) · 2023-12-27T23:47:52.579Z · comments (20)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

Recent comments

kave on Are Your Enemies Innately Evil?

Yes, though I'm not confident.

d0themath on The Median Researcher Problem

I don’t see how this is any evidence against John’s point.

Presumably the reason you need such crushingly obvious results, ones that can be seen regardless of the validity of your statistical tool, before the field can move on is that you need to convince the median researchers.

The sharp researchers have predictions about where the field is going based on statistical evidence and mathematical reasoning, and can presumably be convinced of the ultimate state far before the median researcher. They work toward proving or disproving their hypotheses and then, once it's clear to them, make the case stupidly obvious for the lowest common denominator in the room. And I expect this is where most of the real conceptual progress lies.

Even in the world where, as you claim, this is a marginal effect, if we could speed up any given advance in academic biomedicine by a year, that would be an incredible achievement! Many people who could’ve been saved may die in that year because the median researcher wasted time (assuming the year saved carries over to clinical medicine).

steve2152 on [Intuitive self-models] 3. The Homunculus

Hmm, I still might not be following, but I’ll write something anyway. :)

Take some “concept” in your world-model, operationalized as a particular cluster C of neurons in some part of your cortex that tend to activate together.

How might we figure out what C “means”?

One part of the answer is entirely within the cortex world-model: C has particular relationships to other things in the cortex world-model, which in turn have relationships to still other things etc. Clusters of neurons related to “bird” have some connection to clusters of neurons related to “flying”. That by itself might already be enough to pin down the “meanings” of different things, just because there’s so much structure there, and we can try to match it up with structures in the world, by analogy with unsupervised machine translation. But if not…

The other part of the answer is about how the cortex world-model relates to the real world. Maybe C directly predicts some particular pattern in low-level sensory inputs. Maybe C directly activates some particular pattern in motor output. Or maybe the connection is less direct—a certain abstract pattern in the space of abstract patterns in the space of abstract patterns in the space of low-level sensory inputs, or whatever. If we look at naturalistic visual inputs that directly or indirectly trigger C, and they’re disproportionately pictures of clocks, then that’s some evidence that C “means” clock.

So, how about “cold”? Our body has a couple of relevant sensors: peripheral nerves that express TRPM8 (“cold and menthol receptor 1”), hypothalamus neurons that detect blood temperature via TRPV1, etc. (I’m not an expert on the details.) As usual, these sensory signals are processed in two areas in parallel. In the hypothalamus & brainstem (“Steering Subsystem”), they trigger innate reactions like shivering, unpleasant feelings / desire to warm up, and so on. And in the cortex, they’re treated as just so many more channels of unlabeled input data that the world-model needs to predict.

In the course of predicting them well, the world-model invents some slightly-higher-level concept (or family of closely-interlinked concepts) that we call “cold”. And it notices and memorizes predictively-useful relationships between this new “cold” concept and other things in the world-model, e.g. shivering and ice.

I don’t think there’s more to the concept “cold” than the sum total of its associations with every other concept, with sensory input, and with motor output. And we can explain those latter associations via the structure of the world and body in conjunction with a learning algorithm running throughout your life experience.

You can sorta write code for a relevant part of what's happening in the mind when e.g. the freezing emotion/sensation is triggered.

I like to draw the distinction between understanding learning algorithms and understanding trained models. The former is kinda like what you learn in an ML course (gradient descent, training data, etc.), the latter is kinda like what you learn in a mechanistic interpretability paper. I don’t think it’s realistic to “write code” for the “cold” concept, because I think it (like all concepts) emerges at the trained model level. It emerges from a learning algorithm, training environment, loss function, etc.

Of course, we can chat about the trained model level to some extent. Why is “cold” associated with shivering? Because in the training environment of life experience, those two things have tended to go together, such that each provides nonzero Bayesian evidence that the other should be active, or will be soon. Ditto with the connection between cold and ice cream, and everything else. So we can chat about it, but it would take forever to directly write code for all those things. Hence the learning algorithm. Does that help?
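To make the “nonzero Bayesian evidence” point concrete, here is a toy sketch, not a model of the cortex: it treats each moment of experience as a hypothetical set of active concepts and estimates, from co-occurrence counts alone, how much one concept being active raises or lowers the odds of another. The function name, the episode data, and the add-one smoothing are all illustrative assumptions. Nobody writes the “cold”/“shivering” association by hand; the counting procedure (a stand-in for a learning algorithm) produces it from the data.

```python
# Toy illustration only (not a model of cortex): estimate how much one
# "concept" being active is evidence for another, purely from co-occurrence.


def likelihood_ratio(episodes, a, b, alpha=1.0):
    """Estimate P(b active | a active) / P(b active | a inactive).

    `episodes` is a list of sets of active concepts. Add-`alpha`
    (Laplace) smoothing keeps the estimate finite when a count is zero.
    """
    n_a = sum(a in ep for ep in episodes)
    n_not_a = len(episodes) - n_a
    n_a_and_b = sum(a in ep and b in ep for ep in episodes)
    n_not_a_and_b = sum(a not in ep and b in ep for ep in episodes)
    p_b_given_a = (n_a_and_b + alpha) / (n_a + 2 * alpha)
    p_b_given_not_a = (n_not_a_and_b + alpha) / (n_not_a + 2 * alpha)
    return p_b_given_a / p_b_given_not_a


# Hypothetical "life experience": each set is one moment's active concepts.
EPISODES = [
    {"cold", "shivering", "ice"},
    {"cold", "shivering"},
    {"cold", "ice"},
    {"warm", "sun"},
    {"warm"},
    {"warm", "ice"},
]

print(likelihood_ratio(EPISODES, "cold", "shivering"))  # > 1: evidence for
print(likelihood_ratio(EPISODES, "cold", "sun"))        # < 1: evidence against
```

With these made-up episodes, “cold” makes “shivering” about three times as likely and “sun” about half as likely, which is the sense in which the associations live at the trained-model level rather than in the code.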

john-huang on Advisors for Smaller Major Donors?

Here's my solution to your problem. Small major donors should collectively organize together and make decisions democratically.

I would therefore expand the donor lottery into a democratic committee. Instead of selecting only 1 participant, select ~10 participants, similar to jury duty. With more participants, we get more diverse opinions and a more representative sample (yes, 10 is a terrible sample, but it's way better than 1; if the number of members in the pool increases, the sample size should be increased). More people also facilitate better deliberative discussion and information sharing.

The rationale of a lottocratically selected committee is also different from a donor lottery. Lottocratic committees have democratic legitimacy (often called "sortition"). They are created similarly to how jury pools are created, with similar democratic credentials.

As the sample size of the committee increases, it becomes more and more legitimate as a representative statistical sample of the donor membership.

The tradeoff is cost. A lottocratic body of 10 is 10 times more costly than a body of 1. But it's also much more efficient than individual action. Imagine 50 people are in your pool. A lottocratic body of 10 reduces cognitive load by 5x. In my opinion, the body of 10 will also make better decisions than a single temporary dictator.

Reasons why collective decisions are often better include:

The practice of deliberative democracy - Deliberation can produce better informed results.
Division of labor - Effective committees can organize research tasks and divide up cognitive labor to enhance decision making.
Condorcet's Jury Theorem - A greater number of participants increases decision accuracy, provided each participant is more likely than not to be right and they judge independently.
Median voter theorem - A greater number of participants makes it possible for decisions to tend towards the median preferences of the pool.
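Condorcet's Jury Theorem can be checked numerically. The sketch below assumes a binary decision, voters who judge independently, and a single per-voter accuracy p; the values 0.6 and 0.4 and the committee sizes are hypothetical, and odd sizes (11 rather than 10) are used only to avoid ties. Under these assumptions majority accuracy rises with n when p > 0.5 and falls when p < 0.5, so the theorem supports larger committees only if members are individually better than chance and not simply echoing each other.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a simple majority of n independent voters is right,
    when each voter is right with probability p. Odd n avoids ties."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    needed = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


# Hypothetical per-voter accuracies: better than chance (0.6) and worse (0.4).
for n in (1, 3, 11, 51):
    print(n, round(majority_accuracy(n, 0.6), 3), round(majority_accuracy(n, 0.4), 3))
```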

A single winner, in contrast, invites chaos into charitable selections and is unrepresentative of the whole.

steve2152 on Alexander Gietelink Oldenziel's Shortform

I disagree with “He seems to have no inside information.” He presented himself as having no inside information, but that’s presumably how he would have presented himself regardless of whether he had inside information or not. It’s not like he needed to convince others that he knows what he’s doing, like how in the stock market you want to buy then pump then sell. This is different—it’s a market that’s about to resolve. The smart play from his perspective would be to aggressively trash-talk his own competence, to lower the price in case he wants to buy more.

zach-stein-perlman on Three Sketches of ASL-4 Safety Case Components

My quick reactions:

Interp: no inside view; my impression is that my friends are pessimistic about such safety cases in the next few years
Control: hooray for Anthropic finally writing something about control; some details seem good; some details seem odd ("contexts that were specifically whitelisted"); I think the naive pre-deployment honeypots path doesn't work because a scheming model wins with the defect-rarely strategy (related: https://www.lesswrong.com/posts/vYWDEtJ2b8tkbG7Rv/behavioral-red-teaming-is-unlikely-to-produce-clear-strong) [very sophisticated or expensive honeypotting setups could work but the post doesn't seem to be imagining that or noticing the issue]; that's important and the current language in the post is concerning
Incentives: I don't really understand, or I'm dubious, or I expect that A3.1 is very hard to demonstrate
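The defect-rarely worry comes down to arithmetic. A minimal sketch, assuming the model cannot distinguish honeypots from real deployment and defects independently with a small fixed probability per opportunity; all numbers are hypothetical. With a rate of 1 in 10,000, a thousand honeypots catch the model less than 10% of the time, while a million deployment episodes make at least one defection nearly certain.

```python
def p_at_least_one_defection(q: float, n: int) -> float:
    """Chance that a model defecting independently with probability q per
    opportunity defects at least once across n opportunities."""
    return 1.0 - (1.0 - q) ** n


# Hypothetical numbers, chosen only to show the asymmetry.
Q = 1e-4                  # per-opportunity defection rate of a "defect rarely" policy
N_HONEYPOTS = 1_000       # pre-deployment honeypot episodes
N_DEPLOYMENT = 1_000_000  # real deployment episodes

caught = p_at_least_one_defection(Q, N_HONEYPOTS)
defects_later = p_at_least_one_defection(Q, N_DEPLOYMENT)
print(f"caught in testing: {caught:.3f}, defects in deployment: {defects_later:.6f}")
```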

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Mindmeld

In theory, AIs can transmit information far faster and more directly than humans can. They can directly send weight/activation vectors to one another. The most important variables for whether entities (cells, organisms, polities, companies, ideologies, empires, etc.) stay individuals or amalgamate into a superorganism are communication bandwidth & copy fidelity.
Both of these differ by many orders of magnitude between humans and AIs. At some point, mere communication becomes a literal melding of minds. It then seems quite plausible that AIs will tend to mindmeld if left alone.

The information rate of human speech is around 39 bits per second, regardless of the language being spoken or how fast or slow it is spoken. This is roughly twice the speed of Morse code.
Some say that the rate of 39 bits per second is the optimal rate for people to convey information. Others suggest that the rate is limited by how quickly the brain can process or produce information. For example, one study found that people can generally understand audio recordings that are sped up to 120%.
While the information rate of speech is relatively constant, information density per syllable and speaking rate vary. For example, the information density of Basque is 5 bits per syllable, while that of Vietnamese is 8 bits per syllable.

Current state of the art fibre optic cables can transmit up to 10 terabits a second.

That's probably a wild overestimate for AI communication, though. More relevant bottlenecks are limits on processing information [plausibly more in the megabits range] and limits on the transferability of activation vectors (though training could improve this).
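A back-of-the-envelope comparison of the rates quoted above, in orders of magnitude. The 39 bit/s, 10 Tbit/s, and bits-per-syllable figures come from the comment; the 1 Mbit/s value is a hypothetical stand-in for the low end of the “megabits range” guess. The second helper spells out what a roughly constant information rate implies: a denser language has to be spoken with fewer syllables per second.

```python
from math import log10

# Figures quoted in the comment.
SPEECH_BPS = 39.0  # human speech, bits per second
FIBRE_BPS = 10e12  # state-of-the-art fibre, 10 terabits per second
# Hypothetical: the low end of the "megabits range" guess for AI processing.
AI_PROCESSING_BPS = 1e6

BITS_PER_SYLLABLE = {"Basque": 5.0, "Vietnamese": 8.0}


def orders_of_magnitude(fast_bps: float, slow_bps: float) -> float:
    """log10 of the ratio between two information rates."""
    return log10(fast_bps / slow_bps)


def implied_syllables_per_second(bits_per_syllable: float, bps: float = SPEECH_BPS) -> float:
    """If the information rate is roughly constant, denser languages are spoken more slowly."""
    return bps / bits_per_syllable


print(f"fibre vs speech: {orders_of_magnitude(FIBRE_BPS, SPEECH_BPS):.1f} orders of magnitude")
print(f"megabit-scale AI vs speech: {orders_of_magnitude(AI_PROCESSING_BPS, SPEECH_BPS):.1f}")
for lang, density in BITS_PER_SYLLABLE.items():
    print(lang, round(implied_syllables_per_second(density), 2), "syllables/s")
```

On these figures, fibre is about 11.4 orders of magnitude faster than speech and a megabit-scale channel about 4.4, so even the conservative estimate leaves a gap of more than ten-thousandfold.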

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

>> 'a massive transfer of wealth from "sharps" '.

no. That's exactly the point.

1. there might not be any real sharps (= traders who have access to real, private, arbitrageable information and consistently take risk-neutral bets on it) in this market at all.

This is because a) this might simply be a noisy, high-entropy source that is inherently difficult to predict, hence there is little arbitrageable information, and/or b) sharps have not been sufficiently incenticiz

2. The transfer of wealth is actually disappointing because Theo the French Whale moved the price so much.

For an understanding of what the trading decisions of a verifiable sharp look like, one should take a look at Jim Simons' Medallion Fund. They do enormous hidden information collection and run mysterious computer models, but at the end of the day they take a large number of very hedged, tiny-edge positions.

***************************************************

You are misunderstanding my argument (and most of the LW commentariat with you). I might note that I made my statement before the election result and clearly said 'win or lose' but it seems that even on LW people think winning on a noisy N=1 sample is proof of rationality.
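The point that winning on N=1 proves little can be quantified with a likelihood ratio. The sketch below compares two hypotheses about a trader, a “sharp” who wins 55% of even-odds bets and a noise trader who wins 50%, starting from 50/50 prior odds; these rates, the prior, and the independence of bets are all hypothetical simplifications, and a single election bet need not be at even odds. One win moves the posterior only from 0.5 to about 0.52, while a long record of many small edges, the Medallion-style pattern, is what produces strong evidence.

```python
from math import exp, log


def _logistic(x: float) -> float:
    """Numerically stable 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)


def posterior_sharp(wins: int, losses: int, prior: float = 0.5,
                    p_sharp: float = 0.55, p_noise: float = 0.5) -> float:
    """Posterior probability that a trader is a 'sharp' (win rate p_sharp)
    rather than a noise trader (win rate p_noise), after independent bets.
    Works in log-odds so long records do not overflow."""
    log_lr = wins * log(p_sharp / p_noise) + losses * log((1 - p_sharp) / (1 - p_noise))
    log_post_odds = log(prior / (1 - prior)) + log_lr
    return _logistic(log_post_odds)


print(round(posterior_sharp(1, 0), 3))      # one win: barely moves from 0.5
print(round(posterior_sharp(600, 400), 6))  # a long record: strong evidence
```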

jesseclifton on Winning isn't enough

mildly disapprove of words like "a widely-used strategy"

The text says “A widely-used strategy for arguing for norms of rationality involves avoiding dominated strategies”, which is true* and something we thought would be familiar to everyone who is interested in these topics. For example, see the discussion of Dutch book arguments in the SEP entry on Bayesianism and all of the LessWrong discussion on money pump/dominance/sure loss arguments (e.g., see all of the references in and comments on this post). But fair enough, it would have been better to include citations.

"we often encounter claims"

We did include (potential) examples in this case. Also, similarly to the above, I would think that encountering claims like “we ought to use some heuristic because it has worked well in the past” is commonplace among readers, so we didn't see the need to provide extensive evidence.

*Granted, we are using “dominated strategy” in the wide sense of “strategy that you are certain is worse than something else”, which glosses over technical points like the distinction between dominated strategy and sure loss.
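For readers who have not met the sure-loss version of the argument, here is a minimal Dutch book sketch. It assumes an agent who will pay its credence times the stake for a ticket on any outcome; the credences are hypothetical. If its credences in A and not-A sum to more than 1, buying both tickets loses money in every possible world. This illustrates sure loss specifically, the narrow case that the footnote distinguishes from the wider sense of “dominated strategy” used in the post.

```python
def net_payoffs(tickets, outcomes, stake=1.0):
    """Net payoff in each possible world for an agent that buys every ticket
    in `tickets`. Each ticket is (outcome, price) and pays `stake` if that
    outcome occurs; the agent prices a ticket at its credence times the stake."""
    cost = sum(price for _, price in tickets)
    return {world: sum(stake for outcome, _ in tickets if outcome == world) - cost
            for world in outcomes}


OUTCOMES = ["A", "not A"]

# Incoherent credences (they sum to 1.2): the agent loses in every world.
incoherent = net_payoffs([("A", 0.6), ("not A", 0.6)], OUTCOMES)
# Coherent credences (they sum to 1.0): no guaranteed loss.
coherent = net_payoffs([("A", 0.6), ("not A", 0.4)], OUTCOMES)
print(incoherent)
print(coherent)
```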

sarahconstantin on sarahconstantin's Shortform

links 11/6/2024: https://roamresearch.com/#/app/srcpublic/page/11-06-2024

https://angrystaffofficer.com/2018/09/19/if-the-hoth-crash-was-an-air-force-investigation/
- this taught me the phrase "mishap pilot"
https://pmc.ncbi.nlm.nih.gov/articles/PMC5656536/ this is measles virus used against relapsed multiple myeloma; one complete response out of 32 patients.
- https://www.nature.com/articles/s41375-020-0828-7.pdf the one patient with the CR had strong T-cell responses to measles virus proteins. suggests that when this works it's via immune response.
- https://ajronline.org/doi/pdf/10.2214/AJR.09.3672 it works on mouse pancreatic cancer
- https://ascopubs.org/doi/abs/10.1200/JCO.2022.40.6_suppl.509 seems to be able to treat bladder cancer?
- https://pmc.ncbi.nlm.nih.gov/articles/PMC3018921/ blocks medulloblastoma growth in mice
- https://en.wikipedia.org/wiki/CD46 the receptor for measles virus is also frequently expressed by cancer cells
https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2023.1095219/full targeting CDKs in sarcomas -- there are some clinical trials happening
- https://ascopubs.org/doi/abs/10.1200/PO.24.00219 palbociclib: one partial response out of 42 sarcoma patients
- https://aacrjournals.org/clincancerres/article/29/17/3484/728559 suggestive in-vitro/animal evidence
targeting FGFRs in advanced solid tumors with FGFR mutations/overexpression: https://dial.uclouvain.be/pr/boreal/object/boreal%3A285422/datastream/PDF_01/view
- 3% complete response, 25% partial response with erdafitinib
- https://pmc.ncbi.nlm.nih.gov/articles/PMC8231807/ FGFR inhibitors are typically toxic
MPNSTs cluster into two distinct types of genomic alteration with different drug vulnerabilities https://www.nature.com/articles/s41467-023-38432-6.pdf
targeting MDM2 in advanced solid tumors: there's a trial. https://clinicaltrials.gov/study/NCT03611868
- https://ascopubs.org/doi/10.1200/JCO.2022.40.16_suppl.9517 2 complete responses in melanoma, 1 PR each in liposarcoma, urothelial, and NSCLC, but none in MPNST.
- http://www.annclinlabsci.org/content/46/6/627.full it's being explored as a target in cancer
https://www.sciencedirect.com/science/article/abs/pii/S0959804920313228 21% partial response in soft tissue sarcoma to an XPO1 inhibitor + chemo
- review article on XPO1 inhibition https://www.nature.com/articles/s41571-020-00442-4
https://www.centauri-dreams.org/2024/11/05/vegas-puzzling-disk/ the star Vega looks like it has a disc but no planets
https://www.nature.com/articles/s41598-024-58899-7 CD74 in cancer is an indicator for M1 macrophage infiltration, across cancer types
https://bibliome.ai/ is a resource for looking up specific genome variants and their references in the literature and open-access databases.
- when i click through to references they're often inaccurate (they are claimed to reference a variant that they do not, in fact, contain) but tbh this is also true of Google Search and Google Scholar when it comes to rare variants.
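Several of the linked results rest on very small counts, so a rough sense of their uncertainty may help. A sketch, assuming patients respond independently and that the counts are as summarized in these notes; the Wilson score interval is one standard choice for small binomial samples, not something the linked papers necessarily report. A single responder out of 32 or 42 patients is compatible with a true response rate anywhere from under 1% to more than 12%.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Response counts from the notes.
for label, hits, total in [
    ("measles virus, complete responses in myeloma", 1, 32),
    ("palbociclib, partial responses in sarcoma", 1, 42),
]:
    lo, hi = wilson_interval(hits, total)
    print(f"{label}: {hits}/{total} -> {lo:.1%} to {hi:.1%}")
```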