LessWrong 2.0 Reader




[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)

flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

Probably Not a Ghost Story
George Ingebretsen (george-ingebretsen) · 2024-06-12T22:55:26.264Z · comments (4)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (22)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

The Sequences on YouTube
Neil (neil-warren) · 2024-01-07T01:44:39.663Z · comments (9)

Bayesian inference without priors
DanielFilan · 2024-04-24T23:50:08.312Z · comments (8)

Essaying Other Plans
Screwtape · 2024-03-06T22:59:06.240Z · comments (4)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

[question] Thoughts on Francois Chollet's belief that LLMs are far away from AGI?
O O (o-o) · 2024-06-14T06:32:48.170Z · answers+comments (17)

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)

The Limitations of GPT-4
p.b. · 2023-11-24T15:30:30.933Z · comments (12)

Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)

Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)

Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)

AI debate: test yourself against chess 'AIs'
Richard Willis · 2023-11-22T14:58:10.847Z · comments (35)

Am I going insane or is the quality of education at top universities shockingly low?
ChrisRumanov (pseudonymous-ai) · 2023-11-20T03:53:30.056Z · comments (30)

[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)

[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)

How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)

Facebook is Paying Me to Post
jefftk (jkaufman) · 2023-11-14T19:10:07.303Z · comments (5)

[link] How to Upload a Mind (In Three Not-So-Easy Steps)
aggliu · 2023-11-13T18:13:32.893Z · comments (0)

Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)

[question] How are you preparing for the possibility of an AI bust?
Nate Showell · 2024-06-23T19:13:45.247Z · answers+comments (16)

Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)

[link] Forecasting future gains due to post-training enhancements
elifland · 2024-03-08T02:11:57.228Z · comments (2)



Recent comments

gilch on Open Thread Fall 2024

Not sure I understand what you mean by that. The Universe seems to follow relatively simple deterministic laws. That doesn't mean you can use quantum field theory to predict the weather. But chaotic systems can be modeled as statistical ensembles. Temperature is a meaningful measurement even if we can't calculate the motion of all the individual gas molecules.

If you're referring to human irrationality in particular, we can study cognitive bias, which is how human reasoning diverges from that of idealized agents in certain systematic ways. This is a topic of interest both at the individual level, in psychology, and at the level of statistical ensembles, in economics.

buck on Thomas Kwa's Shortform

In terms of developing better misalignment risk countermeasures, I think the most important questions are probably:

How to evaluate whether models should be trusted or untrusted: currently I don't have a good answer and this is bottlenecking the efforts to write concrete control proposals.
How AI control should interact with AI security tools inside labs.

More generally:

How can we get more evidence on whether scheming is plausible?
How scary is underelicitation? How much should the results about password-locked models or arguments about being able to generate small numbers of high-quality labels or demonstrations affect this?

benito on The Median Researcher Problem

Curated. I think this model is pretty useful and well-compressed, and I'm glad to be able to concisely link to it.

The policy implications are still very much open to debate, both here on LessWrong and in other ecosystems in the world.

hastings-greer on Both-Sidesism—When Fair & Balanced Goes Wrong

I have observed a transition. 12 years ago, the left-right split was based on many loosely correlated factors and strategic/inertial effects, creating bizarre situations like a near-perfect correlation between opinions on gay marriage and privatization of Social Security. I think at that time you could reason much better if you could recognize that the separation between left and right was not natural. I at least have a ton of cached arguments from this era because it became such a familiar dynamic.

Nowadays, I don't think this old schema really applies, especially among the actual elected officials and party leadership. The effective left-right split is mono-factor: you are right exactly in proportion to your personal loyalty to one Donald J. Trump, resulting in bizarre situations like Dick Cheney being classified as "Left."

elityre on Eli's shortform feed

That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better

The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world is that startups are an engine (maybe even the engine) that drives progress towards a future that's better than the past, by creating new products that add value to people's lives.

I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda.

It's been 8 years since Match bought and ruined OkCupid, and the fact that no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC is a huge black mark against the whole SV ideology of technology changing the world for the better.

Finding a partner is such a huge, real pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet Earth, instead of doing something else that (arguably) would have been more profitable. At minimum, someone could have forgone venture funding and built this as a cashflow business.

It's true that this is a market that depends on economies of scale, because the quality of your product is proportional to the size of your matching pool. But I don't buy that this is insurmountable. Just like with any startup, you start by serving a niche market really well, and then expand outward from there. (The first niche I would try for is by building an amazing match-making experience for female grad students at a particular top university. If you create a great experience for the women, the men will come, and I'd rather build an initial product for relatively smart customers. But there are dozens of niches one could try for.)

But it seems like no one tried to recreate OkC, much less create something better, until the Manifold team built manifold.love (currently in maintenance mode)? Not that no one succeeded: to my knowledge, no one else even tried. Possibly Luna counts, but I've heard through the grapevine that they spent substantial effort running giant parties, compared to actually developing and launching their product, from which I infer that they were not very serious. I've been looking for good dating apps. I think if a serious founder were trying seriously, I would have heard about it.

Thousands of founders a year, and no one?!

That's such a massive failure, for almost a decade, that it suggests to me that the SV ideology of building things that make people's lives better is broadly propaganda. The best founders might be relentlessly resourceful, but only a tiny fraction of them seem to be motivated by creating value for the world, or this low-hanging fruit wouldn't have been left hanging for so long.

This is of course in addition to the long list of big tech companies who exploit their network-effect monopoly power to extract value from their users (often creating negative societal externalities in the process), more than creating value for them. But it's a weaker update that there are some tech companies that do ethically dubious stuff, compared to the stronger update that there was no startup that took on this obvious, underserved, human problem.

My guess is that the tech world is a silo of competence (because competence is financially rewarded), but operates from an ideology with major distortions and blind spots that are disconnected from commonsense reasoning about what's Good, e.g. following profit incentives, and excitement about doing big things (independent of whether those big things have humane or inhumane impacts), off a cliff.

ryan_b on The Median Researcher Problem

To me, "memetic" normally reads as something like "has a high propensity to become a meme" or "is meme-like". I had no trouble interpreting the post on this basis.

I push back against trying to hew closely to usages from the field of genetics. Fundamentally, I feel like that is not what talking about memes is for; it was an analogy from the start, not meant for the same level of rigor. Further, memes and how meme-like things are get talked about much more widely than genetics, so insofar as we privilege usage considerations, I claim that switching to a usage matching genetics would require more inferential work from readers overall, because the population of readers conversant with genetics is smaller.

I also feel like the value of speaking in terms of memes in the post is that the replication crisis is largely the fault of non-rigorous treatment; that is to say, in many fields the statistical analysis parts really were/are more of a meme inside the field than a rigorous practice. People just read the analysis sections of other people's published papers and write something shaped like that, replicability be damned.

buck on Anthropic: Three Sketches of ASL-4 Safety Case Components

I don't think this makes any sense. How are you hoping to get the model to attack except by telling it that it's in a control evaluation and you want it to attack? It seems that you are definitely going to have to handle the sandbagging.

james-stephen-brown on Both-Sidesism—When Fair & Balanced Goes Wrong

Hi notfnofn, thanks again for the well considered comment, and for responding to my edited response. I think you've made good points which have revealed clarifications I could have made within the post.

Okay Trump is president now. Hoping that things go well regardless.

Me too. And over the next couple of months we'll see whether the right-wing and online media's concern that Harris is an equal threat to democracy holds up. Because if she is an equal threat, we shouldn't expect to see a peaceful transfer of power, like when Trump lost. In fact, she has already graciously conceded, as would be expected of any political candidate except Trump, who is held to a different standard (he has continued to lie about the result of the 2020 election and requires his followers and compatriots to do the same). Obviously no one seriously expects Harris to lead an insurrection on the Capitol, but people have been convinced that both sides are equally dangerous, giving them a permission structure to vote for Trump.

It's not necessarily that it was the worst issue, but the easiest target.

First of all, I am part of the majority that believes trans women shouldn't be competing in the women's category in sport. It's dangerous, and undermines the integrity of the category due to the natural physical advantages of being born male, particularly at the extremes.

But my point is, as you say, "it's not necessarily the worst issue", whereas the promise to "root out" the "enemy within" is literally the worst issue. The radical left want to fight for the rights of trans people in all areas and, unfortunately, I believe, have overstepped in terms of sport, an entirely optional recreational activity of little to no consequence in my opinion. This is an issue that is adjudicated largely independently of government by international sports bodies, and I hope that over time a fair and consistent ruling will prevail.

Rooting out the enemy within, on the other hand, is not even considered radical on the right, it's said out loud by a mainstream candidate with popular support. This is how far the centre has shifted.

I'm a bit concerned that you referred to cancel culture as "accountability culture"

I think this is a fair use of both-sidesism, if I'm going to use the loaded term 'cancel culture', I'm going to qualify that this is opposed by others who see this as 'accountability culture'. I'm a believer in the free market of ideas, and my support for this principle doesn't stop when a group of less powerful people collectively use their ideas to combat powerful individuals, I also think companies should be able to act so as to protect themselves from public backlash—I largely believe in free markets in general. Where there are instances of top-down cancelling, which as you mention happens on both sides, I'm opposed to this, and would happily call this cancel culture without qualification. But in my experience that's a small proportion of what people call "cancel culture".

Do you not see this as a false equivalence?—Yes...

Great.

Are you comparing the opinions of US politicians on the left with US politicians on the right?

I'm comparing activists on the left with activists on the right. Both the Democratic and Republican parties profess strong pro-Israel support.

How seriously have you investigated the claim that "Harris's plan is based on what many top economists think is best" and not "Economists find Harris' plan overall better than Trump's, despite its many weaknesses"? Have you controlled for the likelihood that they have other reasons to prefer Kamala to Trump?

The first I'd heard of this was in the debate, as a claim of Harris' that Goldman Sachs and the Wharton School supported her plan, and that 16 Nobel laureates had said that Trump's plan would invite a recession and increase the deficit. This demonstrated her respect for those experts. Trump wasn't able to make any similar claims. Since then I have tried to understand more about tariffs, looking to the Wall Street Journal and its explanation of tariffs, Trump's own interview with John Micklethwait, where he claimed the room full of economists didn't understand tariffs, and this interview with The Economist's editor-in-chief Zanny Minton Beddoes, where she underscores the strength of Harris's plan relative to Trump's.

These are all respected, relatively right-leaning sources who all agree with Harris, and whose opinions are respected by Harris, as opposed to Trump, who has shown disdain for the opinion of the majority of these experts in favour of his own expertise, borne out of his experience going bankrupt 6 times. I expect that when developing their plans, this same respect for expertise was also at play. So, I think I've investigated this claim seriously enough to have a fair opinion on it.

I'd like to thank you again for this response. I believe you've raised important clarifications that I will consider making in the text itself. As you might know, this is cross-posted from my blog, and the blog is actually a series of continually edited webpages comprising a growing philosophical framework; I will likely attempt to make it more evergreen by relying less on a current event. Posting here helps guide my editing process by getting critical feedback from smart people like yourself, so I appreciate your time and efforts.

towards_keeperhood on What are the primary drivers that caused selection pressure for intelligence in humans?

My prior intuitive guess would be that H1 seems quite a decent chunk more likely than H2 or H3.

Actually I changed my mind.

Why I thought this before: H1 seems like a potential runaway-process and is clearly about individual selection which has stronger effects than group selection (and it was mentioned in HPMoR).

Why I don't think this anymore:

It would also be an incredible coincidence if intelligence mostly evolved because of social dynamics but happened to be useful for all sorts of other survival techniques that hunter-gatherers use. See e.g. Scott Alexander's book review of "The Secret of Our Success".
If there were only individual benefits to intelligence but it was not very useful otherwise, then over long timelines group selection would actually select against smarter humans, because their neurons would use up more metabolic energy.

However, there's a possibly very big piece of evidence for H3: Humans are both the smartest land animals and have the best interface for using tools, and that would seem like a suspicious coincidence.

I think this is not a coincidence but rather that tool use let humans fall into an attractor basin where payoffs of intelligence were more significant.

sharmake-farah on Thomas Kwa's Shortform

I'd say one important question is whether the AI control strategy works out as they hope.

I agree with Bogdan that making adequate safety cases for automated safety research is probably one of the most important technical problems to answer. Conditional on the automated AI safety research direction working out, it could eclipse basically all safety research done prior to the automation, and this might hold even if LWers really had basically perfect epistemics given what's possible for humans and picked closer-to-optimal directions, since labor is a huge bottleneck and automation allows for much tighter feedback loops of progress, for the reasons Tamay Besiroglu identified:

https://x.com/tamaybes/status/1851743632161935824

https://x.com/tamaybes/status/1848457491736133744