LessWrong 2.0 Reader

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot_Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

A thought about the constraints of debtlessness in online communities
mako yass (MakoYass) · 2023-10-07T21:26:44.480Z · comments (23)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (10)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

[link] Prediction markets covered in the NYT podcast “Hard Fork”
Austin Chen (austin-chen) · 2023-10-13T18:43:29.644Z · comments (6)

Degeneracies are sticky for SGD
Guillaume Corlouer (Tancrede) · 2024-06-16T21:19:53.362Z · comments (1)

Noticing Panic
Cole Wyeth (Amyr) · 2024-02-05T03:45:51.794Z · comments (8)

Coalitional agency
Richard_Ngo (ricraz) · 2024-07-22T00:09:51.525Z · comments (6)

[link] Meditations on Mot
Richard_Ngo (ricraz) · 2023-12-04T00:19:19.522Z · comments (11)

FixDT
abramdemski · 2023-11-30T21:57:11.950Z · comments (14)

some thoughts on LessOnline
Raemon · 2024-05-08T23:17:41.372Z · comments (5)

AI Is Not Software
Davidmanheim · 2024-01-02T07:58:04.992Z · comments (29)

Phallocentricity in GPT-J's bizarre stratified ontology
mwatkins · 2024-02-17T00:16:15.649Z · comments (37)

Evidence against Learned Search in a Chess-Playing Neural Network
p.b. · 2024-09-13T11:59:55.634Z · comments (3)

AI Alignment Research Engineer Accelerator (ARENA): call for applicants
CallumMcDougall (TheMcDouglas) · 2023-11-07T09:43:41.606Z · comments (0)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

Experiment on repeating choices
KatjaGrace · 2024-04-19T04:20:03.992Z · comments (1)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

Calculating Natural Latents via Resampling
johnswentworth · 2024-06-06T00:37:42.127Z · comments (4)

Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt · 2024-09-16T16:07:01.119Z · comments (7)

A quick investigation of AI pro-AI bias
Fabien Roger (Fabien) · 2024-01-19T23:26:32.663Z · comments (1)

... Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth · 2024-08-26T16:38:53.924Z · comments (12)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (39)

[link] AI Safety Hub Serbia Official Opening
DusanDNesic · 2023-10-28T17:03:34.607Z · comments (0)

A gentle introduction to mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:06:16.778Z · comments (0)

AI things that are perhaps as important as human-controlled AI
Chi Nguyen · 2024-03-03T18:07:24.291Z · comments (4)

[link] Building intuition with spaced repetition systems
Jacob G-W (g-w1) · 2024-05-12T15:49:04.860Z · comments (6)

Why I no longer identify as transhumanist
Kaj_Sotala · 2024-02-03T12:00:04.389Z · comments (33)

Why Care About Natural Latents?
johnswentworth · 2024-05-09T23:14:30.626Z · comments (3)

The Best of Don’t Worry About the Vase
Zvi · 2023-12-13T12:50:02.510Z · comments (4)

Conditional prediction markets are evidential, not causal
philh · 2024-02-07T21:52:47.476Z · comments (10)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (2)

How do you actually obtain and report a likelihood function for scientific research?
Peter Berggren (peter-berggren) · 2024-02-11T17:42:49.956Z · comments (4)

[link] Datasets that change the odds you exist
dynomight · 2024-06-29T18:45:14.385Z · comments (4)

Recent comments

cubefox on Another argument against utility-centric alignment paradigms

Admittedly, I can't judge all the technical details. But I notice that neither So8res nor Wentworth have engaged with the EJT post (neither directly in the comments nor in their posts you have linked), despite being published later. And EJT's engagement [LW(p) · GW(p)] with the Wentworth post didn't elicit much of a reaction either. So from an outside view, the viability of coherence arguments seems questionable.

yurii-burak-1 on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

As someone who agrees with Stalin on, well, most questions (with a few exceptions, some of them quite important, though it's not that important to explain them here), what could I say?

There are newer papers on IQ and race, and a lot of critique of that one.
A single person's performance on IQ tests can vary by 10+ points. https://www.reuters.com/article/business/healthcare-pharmaceuticals/study-finds-poverty-reduces-brain-power-idUSL6N0GU1PO/
My dad scored around 170, and considers IQ tests pointless.
I agree with your point about the male-looking face.
It's pointless to pretend that LGBT+ is not an organisation with an ideology. By the way, their ideology is too backward and reactionary relative to what WE plan regarding sex and family.
Yes, posing the question of whether IQ depends on race is bad. Human intelligence depends on how it is developed and used over a person's whole life. By posing such questions, you begin to act in ways that create evidence for them (a self-fulfilling prophecy, aha), and thus go against the basic principles of democracy.
Yes, "high IQ jews" goes into the same dumpster.

niplav on Nathan Young's Shortform

Relevant: When pooling forecasts, use the geometric mean of odds [? · GW].
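
As a rough illustration of the pooling rule named above, here is a minimal Python sketch that takes the geometric mean of odds and converts the result back to a probability. The function name `pool_geometric_odds` is made up for this example, and the sketch assumes every forecast is strictly between 0 and 1.

```python
import math


def pool_geometric_odds(probs):
    """Pool probabilities via the geometric mean of their odds (illustrative helper)."""
    if not probs:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p >= 1.0 for p in probs):
        raise ValueError("probabilities must be strictly between 0 and 1")
    # The mean of the log-odds is the log of the geometric mean of the odds.
    mean_log_odds = sum(math.log(p / (1.0 - p)) for p in probs) / len(probs)
    # Convert the pooled odds back into a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-mean_log_odds))


if __name__ == "__main__":
    print(pool_geometric_odds([0.2, 0.8]))
```

One property of this rule is symmetry: pooling the complements of the forecasts gives the complement of the pooled forecast, which is not true of a geometric mean taken directly on probabilities.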

jeremy-gillen on Another argument against utility-centric alignment paradigms

That post is clickbait. It only argues that the incompleteness money pump doesn't work. The reasons the incompleteness money pump does work are well summarized at a high level here [LW · GW] or here [LW · GW] (more specifically here [LW(p) · GW(p)] and here [LW(p) · GW(p)], if we get into details).

nathan-young on Nathan Young's Shortform

What is the best way to take the average of three probabilities in the context below?

There is information about a public figure
Three people read this information and estimate the public figure's P(doom)
(It's not actually P(doom), but it's their probability of something.)
How do I then turn those three probabilities into a single one?

Thoughts.

I currently think the answer is something like: for probabilities a, b, c, the group median is 2^((log2(a) + log2(b) + log2(c))/3). This feels like a way to average the bits that each person gets from the text.

I could just take the geometric or arithmetic mean, but somehow that seems off to me. I guess I might write my intuitions for those here for correction.

Arithmetic mean (a + b + c)/3. So this feels like uncertain probabilities will dominate certain ones. eg (.0000001 + .25)/2 = approx .125 which is the same as if the first person was either significantly more confident or significantly less. It seems bad to me for the final probability to be uncorrelated with very confident probabilities if the probabilities are far apart.

On the other hand in terms of EV calculations, perhaps you want to consider the world where some event is .25 much more than where it is .0000001. I don't know. Is the correct frame possible worlds or the information each person brings to the table?

Geometric mean (a * b * c)^(1/3). I dunno, sort of seems like a midpoint.

Okay so I then did some thinking. Ha! Whoops.

While trying to think intuitively about what the geometric mean was, I noticed that 2^((log2(a) + log2(b) + log2(c))/3) = 2^(log2(abc)/3) = 2^(log2((abc)^(1/3))) = (abc)^(1/3). So the information mean I thought seemed right is the geometric mean. I feel a bit embarrassed, but also happy to have tried to work it out.

This still doesn't tell me whether the arithmetic worlds intuition or the geometric information interpretation is correct.

Any correction or models appreciated.
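
The identity worked out in this comment can be checked numerically. The following Python sketch is only an illustration, with made-up function names: it computes the arithmetic mean, the geometric mean, and the "average the bits" mean for the pair of probabilities used in the example (.0000001 and .25).

```python
import math


def arithmetic_mean(probs):
    """Plain average of the probabilities."""
    return sum(probs) / len(probs)


def geometric_mean(probs):
    """n-th root of the product of the probabilities."""
    return math.prod(probs) ** (1.0 / len(probs))


def mean_of_bits(probs):
    """2 ** (average of log2(p)): the 'information mean' from the comment."""
    return 2.0 ** (sum(math.log2(p) for p in probs) / len(probs))


if __name__ == "__main__":
    pair = [0.0000001, 0.25]
    print("arithmetic:", arithmetic_mean(pair))
    print("geometric: ", geometric_mean(pair))
    print("bits mean: ", mean_of_bits(pair))
```

For this pair the arithmetic mean stays close to .125, while the geometric mean and the bits mean agree with each other (up to floating-point error) at roughly 0.00016, so the very confident estimate pulls the pooled value down much more strongly.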

cousin_it on What are the best arguments for/against AIs being "slightly 'nice'"?

I think there isn't much hope in this direction. Most AI resources will probably be spent on competition between AIs, and AIs will self-modify to remove wasteful spending. It's not enough to have a weak value that favors us, if there's a stronger value that paves over us. We're teaching AI based on human behavior and with a goal of chasing money, but people chasing money often harm other people, so why would AI be nicer than that. It's all just wishful thinking.

anon_28102 on Search 5000 books, speed up your research and personal growth

Please consider this comment [LW(p) · GW(p)].

anon_28102 on Search 5000 books, speed up your research and personal growth

This is the useful part of the code:

curl --header "Content-Type: application/json" --request POST --data '{"searchString": "How to improve the communication skills in Japanese?", "N": 5}' "http://[2600:1f18:17c:2d43:338d:2669:3fa5:82f8]:3000/search"

No execution of script. Returns libgen MD5 and epub CFI

Error may be due to Wifi NAT (use mobile internet) or no IPv6 support
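
For readers who prefer Python to curl, the same request could be sketched with the standard library as below. The URL, the `searchString` and `N` fields, and the header are copied from the curl command above. The function names are hypothetical, and since the response format is not documented here, the sketch just returns the raw response text.

```python
import json
import urllib.request

# Endpoint copied verbatim from the curl command above (IPv6 literal, port 3000).
SEARCH_URL = "http://[2600:1f18:17c:2d43:338d:2669:3fa5:82f8]:3000/search"


def build_search_request(search_string, n=5, url=SEARCH_URL):
    """Build (but do not send) the POST request equivalent to the curl call."""
    payload = json.dumps({"searchString": search_string, "N": n}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(search_string, n=5, timeout=30):
    """Send the request and return the raw response body as text.

    Assumption: the body is UTF-8 text; its structure is not specified here.
    """
    request = build_search_request(search_string, n)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `search("How to improve the communication skills in Japanese?")` needs a network with IPv6 support, as noted above; `build_search_request` can be inspected offline.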

rohinmshah on Estimating Tail Risk in Neural Networks

Fwiw the thing that scares me is the combination of three things:

1. Producing an accurate estimate requires "understanding" the input
2. Our standard for the quality of estimates will be very high
3. The approach is purely algorithmic (rather than e.g. using the flexible intelligence of LLMs / humans)

You presumably agree with (1) and (3). For (2), it seems like there are a lot of correlations you need to take into account for this to produce estimates that aren't wildly off:

Underestimates of risk: These are cases where the presumption of independence neglects correlations between things that happen for catastrophe-causing events. For example, maybe a model is planning to defect when RSA-2048 is factored, and it has N different consistency checks it runs to make sure we aren't fooling it about whether it has been factored. The method has to see that there's a common cause for all the consistency checks to return 1, otherwise it will underestimate risk by a factor exponential in N. (And tbc I think even separately from more exotic examples like RSA-2048, it will be relatively common for this pattern to arise with networks, e.g. it happens any time that in reality there is a common cause that produces many pieces of evidence, and then the network aggregates those pieces of evidence to infer the common cause.)
Overestimates of risk: These are cases where the presumption of independence neglects correlations that rule out catastrophe, and so lead to overestimates. For example, the model might be willing to make up fake instructions for creating bioweapons to help with scifi writing, but never provide real instructions. So the method needs to capture the "understanding" of which instructions are fake vs real.
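
The "factor exponential in N" in the underestimate case can be made concrete with a toy calculation. The Python sketch below is only an illustration under a strong simplifying assumption that is not in the comment: each of the N consistency checks returns 1 with the same marginal probability q, and in reality all of them simply copy one shared cause. The function names and the value of q are hypothetical.

```python
def independent_estimate(q, n):
    """Probability that all n checks return 1 if they are treated as independent."""
    return q ** n


def common_cause_probability(q, n):
    """True probability in the toy model: all n checks copy one shared cause,
    so they all return 1 exactly when that cause occurs (probability q)."""
    return q


def underestimate_factor(q, n):
    """How many times too small the independence-based estimate is."""
    return common_cause_probability(q, n) / independent_estimate(q, n)


if __name__ == "__main__":
    q = 0.5  # hypothetical marginal probability of a single check returning 1
    for n in (1, 2, 4, 8, 16):
        print(n, underestimate_factor(q, n))
```

In this toy model the factor is (1/q)^(N-1), so each additional check multiplies the error by 1/q.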

I agree this isn't a proof of impossibility, since a purely algorithmic approach (SGD) produced the "understanding" in the first place, so in theory a purely algorithmic approach could still capture all that understanding to produce accurate estimates. But it does seem heuristically like you should assign a fairly low probability that this pans out.