LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

What does a Gambler's Verity world look like?
ErioirE (erioire) · 2024-07-25T22:03:56.447Z · comments (6)

Species as Canonical Referents of Super-Organisms
Yudhister Kumar (randomwalks) · 2024-10-18T07:49:52.944Z · comments (6)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

[link] Against AI As An Existential Risk
Noah Birnbaum (daniel-birnbaum) · 2024-07-30T19:10:41.156Z · comments (13)

[link] Solutions to problems with Bayesianism
B Jacobs (Bob Jacobs) · 2024-07-31T14:18:27.910Z · comments (0)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

[question] Request for AI risk quotes, especially around speed, large impacts and black boxes
Nathan Young · 2024-08-02T17:49:48.898Z · answers+comments (0)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

[link] Demography and Destiny
Zero Contradictions · 2024-07-21T20:34:07.176Z · comments (11)

[link] Labelling, Variables, and In-Context Learning in Llama2
Joshua Penman (joshua-penman) · 2024-08-03T19:36:34.721Z · comments (0)

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality
Wynn Walker · 2024-08-04T18:49:30.892Z · comments (0)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom
Ghdz (gal-hadad) · 2024-08-05T18:27:20.709Z · comments (2)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)

Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (1)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)

[link] Should we abstain from voting? (In nondeterministic elections)
B Jacobs (Bob Jacobs) · 2024-10-02T10:07:43.167Z · comments (5)

Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

Biasing VLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

[link] The AI regulator’s toolbox: A list of concrete AI governance practices
Adam Jones (domdomegg) · 2024-08-10T21:15:09.265Z · comments (1)

The Carnot Engine of Economics
StrivingForLegibility · 2024-08-09T15:59:40.458Z · comments (0)

Toy Models of Superposition: what about BitNets?
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-08-08T16:29:02.054Z · comments (1)

[question] If the DoJ goes through with the Google breakup,where does Deepmind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)

Differential knowledge interconnection
Roman Leventov · 2024-10-12T12:52:36.267Z · comments (0)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

roger-dearnaley on Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

I expect (1) and (2) to go mostly fine (though see footnote^[6]), and for (3) to completely fail. For example, suppose has a bunch of features for things like “true according to smart humans.” How will we distinguish those from features for “true”? I think there’s a good chance that this approach will reduce the problem of discriminating $C_{g}$ vs. $C_{b}$ to an equally-difficult problem of discriminating desired vs. undesired features.

However, if we found that the classifier was using a feature for "How smart is the human asking the question?" to decide what answer to give (as opposed to how to then phrase it), that would be a red flag.

gavriel-kleinwaks on If far-UV is so great, why isn't it everywhere?

Why there isn't more ground-up interest: it's expensive and people can't easily tell if it's worth the cost. Also anything where UV is touching you has to overcome people's safety concerns.

Good question on the large effects are easily measured thing--has to do with the distinction between: 1) in what environments you are cleaning the air, 2) how much you are cleaning the air there, 3) how much pathogen people inhale, and 4) how much pathogen is required to actually make people sick. It's not just a far-UV problem, it'd be a problem for any air cleaner, it's just that far-UV is especially expensive to install and especially faces negative "UV" associations.

Far-UV has a large effect on airborne pathogen concentration, and that large effect is in fact easy to measure in a chamber! But once you add it to a room where people are moving around and talking to each other, how much pathogen are they actually inhaling? Is the air in the room well-mixed? Is the far-UV reaching the infectious air before people inhale it? Even if they inhale air that has living pathogens, are they getting sick? If they get sick, did they get sick from that room or from a different environment that they were in previously? Study endpoints matter a lot.

Being able to understand intervention efficacy especially becomes a problem if a disease is largely spread via superspreaders/had high variance in infection sources. COVID, at least early in the pandemic, had very high variance, whereas eg seasonal flu doesn't usually. Therefore, if your study intervention is installed in public spaces, it's possible for it not to show much effect on seasonal flu but a large effect on a disease with early-COVID-like dynamics, which means you have to wait for that COVID-like disease to come along to see the effect--but that would be worth it; high-variance diseases are very concerning!

Another way of saying all this is that it's not the case that the effect will be super hard to measure given enough people and time, it's that the effect is hard to measure given that you need a lot of people in your study to account for the distinctions listed above, and/or you need a highly controlled environment, and that's just expensive.

seth-herd on Chris_Leong's Shortform

We know well enough what people mean by "world" - the stuff they care about. The fact that physics keeps on happening if humanity is snuffed out is no comfort at all to me or to most humans.

Arguing epistemology is not going to prevent a nuclear apocalypse or us being wiped out by the new intelligent species we are inventing. The fact that you don't know what's happening on the other side of the world has no bearing on existential dangers facing those people. That's what I mean by saving the world, and I expect what the author meant. This is a different thing than just helping people by your own values and estimates.

I very much agree that pithy mysterious statements for others to argue over is not a good use of the quick takes here.

johnswentworth on Why I’m not a Bayesian

Is the complaint that you can't do predicate calculus on the probabilities? Because I can certainly use predicate calculus all I want on the expressions within the probabilities.

And if that is the complaint, then my question is: why do we want to do predicate calculus on the probabilities? Like, what would be one concrete application in which we'd want to do that? (Self-reference and things in that cluster would be the obvious use-case, I'm mostly curious if there's any other use-case.)

seth-herd on Chris_Leong's Shortform

I have no interest in honor if it's celebrated on a field of the dead. Virtue ethics is fine, as long as it's not an excuse to not figure out what needs doing and how it's going to get done.

Doing ones own part and trusting that the other parts are done by anonymous unknown others is a very silly coordination strategy. We need plans that amount to success, not just everyone doing whatever sounds nice to them.

Edit: I very much agree that saving the world is a team sport. Perhaps it's relevant that successful teams always do some planning and coordinating.

christiankl on Why I’m not a Bayesian

the thing a logician would call a "logic" or possibly a "logic augmented with some probabilities"

The main point of the article is that once you add probabilities you can't do predicate calculus anymore. It's a mathematical operation that's not defined for the entities that you get when you do your augmentation.

seth-herd on When is reward ever the optimization target?

Interesting. There's certainly a lot going on in there, and some of it very likely is at least vague models of future word occurrences (and corresponding events). The definition of model-based gets pretty murky outside of classic RL, so it's probably best to just directly discuss what model properties give rise to what behavior, e.g. optimizing for reward.

Model-free systems can produce goal-directed behavior. The do this if they have seen some relevant behavior that achieves a given goal, and their input or some internal representation includes the current goal, and they can generalize well enough to apply what they've experienced to the current context. (This is by the neuroscience definition of habitual vs goal-directed: behavior changes to follow the current goal, usually hungry, thirsty or not).

So if they're strong enough generalizers, I think even a model-free system actually optimizes for reward.

I think the claim should be stronger: for a smart enough RL system, reward is the optimization target.

sherrinford on Open Thread Fall 2024

So you think that looking up "random idiots" helps me find "arguments related to or a more detailed discussion about this disagreement"?

ablue on Chris_Leong's Shortform

You want to help? Figure out what kind of incremental changes you can begin to introduce in any of them, in order to begin extinguishing the sort of problems you've now elevated to the rank of "saving-worthy" in your own head. Note that, in all likelihood, by extinguishing one you will merrily introduce a whole bunch of others - something you won't get to discover until much later one. Yet that is, realistically, what you can actually go on to accomplish.

I read this paragraph as saying ~the same thing as the original post in a different tone

jenniferrm on Monthly Roundup #23: October 2024

It is true that there are some favorable properties that many systems other than the best system has compared to FPTP.

I like methods that are cloneproof and which can't be spoofed by irrelevant alternatives, and if there is ONLY a choice between "something mediocre" and "something mediocre with one less negative feature" then I guess I'll be in favor of hill climbing since "some mysterious force" somehow prevents "us" from doing the best thing.

However, I think cloning and independence are "nice to haves" whereas the condorcet criterion is probably a "need to have"

((The biggest design fear I have is actually the "participation criterion". One of the very very few virtues of FPTP is that it at least satisfies the criterion where someone showing up and "wasting their vote on a third party" doesn't cause their least preferred candidate to jump ahead of a more preferred candidate. But something similar can happen in every method I know of that reliably selects the Condorcet Winner when one exists :-(

Mathematically, I've begun to worry that maybe I should try to prove that Condorcet and Participation simply cannot both be satisfied at the same time?

Pragmatically, I'm not sure what it looks like to "attack people's will to vote" (or troll sad people into voting in ways that harm their interests and have the sad people fight back righteously by insisting that they shouldn't vote, because voting really will net harm their interests).

One can hope that people will simply "want to vote" because it make civic sense, but it actually looks like a huge number of humans are biased to feel like a peasant, and to have a desire to be ruled? Or something? And maybe you can just make it "against the law to not vote" (like in Australia) but maybe that won't solve the problems that could hypothetically "sociologically arise" from losing the participation criterion in ways that might be hard to foresee.))

In general, I think people should advocate for the BEST thing. The BEST thing I currently know of for picking an elected civilian commander in chief is "Ranked Pairs tabulation over Preference Ballots (with a law that requires everyone to vote during the two day Voting Holiday)".