LessWrong 2.0 Reader

"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)

Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)

Interpreting the Learning of Deceit
RogerDearnaley (roger-d-1) · 2023-12-18T08:12:39.682Z · comments (14)

Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)

Protestants Trading Acausally
Martin Sustrik (sustrik) · 2024-04-01T14:46:26.374Z · comments (4)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

[link] GDP per capita in 2050
Hauke Hillebrandt (hauke-hillebrandt) · 2024-05-06T15:14:30.934Z · comments (8)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

Results from the Turing Seminar hackathon
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-07T14:50:38.377Z · comments (1)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

[link] My MATS Summer 2023 experience
James Chua (james-chua) · 2024-03-20T11:26:14.944Z · comments (0)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

[question] How does it feel to switch from earn-to-give?
Neil (neil-warren) · 2024-03-31T16:27:22.860Z · answers+comments (4)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)

Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)

Non-myopia stories
lberglund (brglnd) · 2023-11-13T17:52:31.933Z · comments (10)

Please Understand
samhealy · 2024-04-01T12:33:20.459Z · comments (11)

[link] What fuels your ambition?
Cissy · 2024-01-31T18:30:53.274Z · comments (1)

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans
Thane Ruthenis · 2023-12-17T20:28:57.854Z · comments (7)

Deception Chess: Game #2
Zane · 2023-11-29T02:43:22.375Z · comments (17)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)

Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)

Recent comments

elizabeth-1 on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

There’s a lot here and if my existing writing didn’t answer your questions, I’m not optimistic another comment will help. Instead, how about we find something to bet on? It’s difficult to identify something both cruxy and measurable, but here are two ideas:

I see a pattern of:
1. CEA takes some action with the best of intentions
2. It takes a few years for the toll to come out, but eventually there’s a negative consensus on it.
3. A representative of CEA agrees the negative consensus is deserved, but since it occurred under old leadership, doesn’t think anyone should draw conclusions about new leadership from it.
4. CEA announces new program with the best of intentions.

So I would bet that within 3 years, a CEA representative will repudiate a major project occurring under Zach’s watch.

I would also bet on more posts similar to Bad Omens in Current Community Building or University Groups Need Fixing coming out in a few years, talking about 2024 recruiting.

cubefox on Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion

Arguably, "basic logical principles" are those that are true in natural language. Otherwise nothing stops us from considering absurd logical systems where "true and true" is false, or the like. Likewise, "one plus one is two" seems to be a "basic mathematical principle" in natural language. Any axiomatization which produces "one plus one is three" can be dismissed on grounds of contradicting the meanings of terms like "one" or "plus" in natural language.

The trouble with set theory is that, unlike logic or arithmetic, it often doesn't involve strong intuitions from natural language. Sets are a fairly artificial concept compared to natural language collections (empty sets, for example, can produce arbitrary nestings), especially when it comes to infinite sets.

screwtape on 2024 Unofficial LW Community Census, Request for Comments

So, I think Fight 1 is funny, but it is kind of high context, involving reading two somewhat long stories. (Planecrash in particular is past a million words long!) I'd considered "Who would win in a fight, Eliezer Yudkowsky or Scott Alexander? ["Eliezer", "Scott", "Wait, what's this? It's Aella with a steel chair!"]" and "Who is the rightful caliph? ["Eliezer Yudkowsky","Scott Alexander", "Wait, what's this? It's Robin Hanson with a steel chair!"]" but feel a bit weird about including real people.

I think they're just as funny though, and far more people will understand them, so maybe I should switch. Anyone have convincing thoughts here?

programcrafter on 2024 Unofficial LW Community Census, Request for Comments

It would be nice to see at least three questions that demonstrate how a person extracts evidence from others' words, how much time and emotion they could spend if they needed to communicate a point precisely, etc.

I'll have to sleep on that, actually. Will return tomorrow, presumably with more concrete ideas.

jan-betley on 2024 Unofficial LW Community Census, Request for Comments

I don't think there's a lot of value in distinguishing 3000 and 1,000,000 and probably for any aggregate you'll want to show this will just be "later than 2200" or something like that. But yes this way they can't make a statement that this will be 1,000,000 which is some downside.

I'm not a big fan of looking at the neighbors to decide whether this is a missing answer or high estimate (it's OK to not want to answer this one question). So some N/A or -1 should be ok.

(Just to make it clear, I'm not saying this is an important problem)

programcrafter on What can we learn from insecure domains?

99.9% of all cryptocurrency projects are complete scams (conservative estimate).

On first skim, I agree with the estimate as stated and would post a limit order for either side. I'd also like to note that "crypto in general is terrible" instead of "all crypto is terrible", as there have been applications developed that do not allow you to lose all funds without explicit acknowledgement.

Similarly, Cyber Security is terrible. Basically every computer on the internet is infected with multiple types of malware.

It is presumably terrible (or, 30%, a result of availability bias), and I've observed bugs happen because a functionality upgrade did not consider its interaction with all the other code. However, I disagree that every computer is infected; probably you meant that every computer is under a constant stream of attack attempts?

The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small. As a matter of IT, it would be really nice to have systems which don't logically fail at all, but that requires good education and pressure-resistance skills for software developers.

screwtape on 2024 Unofficial LW Community Census, Request for Comments

I have no opinion on the difference and ChatGPT agrees with you, so sure, changed to "eighty percent of the benefit."

raemon on JargonBot Beta Test

The most important thing is "There is a small number of individuals who are paying attention, who you can argue with, and if you don't like what they're doing, I encourage you to write blogposts or comments complaining about it. And if your arguments make sense to me/us, we might change our mind. If they don't make sense, but there seems to be some consensus that the arguments are true, we might lose the Mandate of Heaven or something."

I will personally be using my best judgment to guide my decisionmaking. Habryka is the one actually making final calls about what gets shipped to the site; insofar as I update that we're doing a wrong thing, I'll argue about it.

It happening at all already constitutes “going wrong”.

This particular sort of comment doesn't particularly move me. I'm more likely to be moved by "I predict that if AI is used in such and such a way it'll have such and such effects, and those effects are bad." I won't necessarily believe that automatically, but I might update on it if it's argued well or seems intuitively obvious once it's pointed out.

I'll be generally tracking a lot of potential negative effects and if it seems like it's turning out "the effects were more likely" or "the effects were worse than I thought", I'll try to update swiftly.

screwtape on 2024 Unofficial LW Community Census, Request for Comments

Thanks for the year catch.

I could check their expected price of bitcoin, but that feels like more weight than I want to put on bitcoin; it's already a little bit overlapping with the S&P question. What I'd like to replace it with is something that 1. will have a definitive answer by next summer, 2. people have enough context to understand the question, and 3. isn't obvious.

The questions are not checking for social skills. I am not sure how I'd do that on an online survey that's going to be self reported, and if you have thoughts about that I'm kind of curious? What percentage of the survey being about social skills would be sufficient? (I'm heavily into meetups and in-person gatherings for LessWrong events, so I might be one of the more receptive audiences for this line of argument!)

programcrafter on What TMS is like

I think TMS doesn't rewrite anything, but instead activates neural circuits in another pattern? Then the new pattern is not depressed, and the brain can notice that (on either a conscious or subconscious level) and make appropriate changes to neural connections.

Basically, I believe that whatever resulting patterns (including "other parts of you changed into something non-native and alien") you dis-endorse, are "committed" with significantly lower probability.