LessWrong 2.0 Reader

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks
Tom DAVID (tom-david) · 2024-11-27T02:54:16.263Z · comments (0)

Compute and size limits on AI are the actual danger
Shmi (shminux) · 2024-11-23T21:29:37.433Z · comments (5)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (5)

A Sober Look at Steering Vectors for LLMs
Joschka Braun (joschka-braun) · 2024-11-23T17:30:00.745Z · comments (0)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

ARENA 4.0 Impact Report
Chloe Li (chloe-li-1) · 2024-11-27T20:51:54.844Z · comments (0)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

Winning isn't enough
JesseClifton · 2024-11-05T11:37:39.486Z · comments (14)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

[question] Which things were you surprised to learn are metaphors?
Gordon Seidoh Worley (gworley) · 2024-11-22T03:46:02.845Z · answers+comments (17)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[question] Any real toeholds for making practical decisions regarding AI safety?
lemonhope (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

Causal inference for the home gardener
braces · 2024-11-27T17:55:52.629Z · comments (1)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Sahil · 2024-11-07T05:27:20.276Z · comments (1)

Recent comments

bart-bussmann on Visible Thoughts Project and Bounty Announcement

Three years later, and we actually got LLMs with visible thoughts, such as Deepseek, QwQ, and (although partially hidden from the user) o1-preview.

> I (Nate) find it plausible that there are capabilities advances to be had from training language models on thought-annotated dungeon runs.

Good call!

t14n on Raemon's Shortform

Skill ceilings across humanity are quite high. I think of super-genius chess players, Terry Tao, etc.

A particular individual's skill ceiling is relatively low (compared to these maximally gifted individuals). Sure, everyone can be better at listening, but there's a high non-zero chance you have some sort of condition or life experience that makes it more difficult to develop it (hearing disability, physical/mental illness, trauma, an environment of people who are actually not great at communicating themselves, etc).

I'm reminded of what Samo Burja calls the "Completeness Hypothesis":

> It is the idea that having all of the important contributing pieces makes a given effect much, much larger than having most of the pieces. Having 100% of the pieces of a car produces a very different effect than having 90% of the pieces. The four important pieces for producing mastery in a domain are good feedback mechanisms, extreme motivation, the right equipment, and sufficient time. According to the Completeness Hypothesis, people that stably have all four of these pieces will have orders-of-magnitude greater skill than people that have only two or three of the components.

This is not a fatalistic recommendation to NOT invest in skill development. Quite the opposite.

I recommend Dan Luu's essay "95%-ile isn't that good".

Most people come nowhere near their individual skill ceiling because they lack the four things that Burja lists. As Luu points out, most people don't care that much about developing their skills. People do not bother to find good feedback loops, cultivate the motivation, or carve out sufficient time to develop skills. Certain skills may be limited by resources (equipment), but there are hacks that can lead to skill development at a sub-optimal rate (e.g. calisthenics vs. weight training for building muscle mass: maybe you can't afford a gym membership, but push-ups are free).

As @sunwillrise mentioned, there are diminishing returns for developing a skill. The gap from the 0th to the 80th percentile is actually quite narrow. Getting from the 80th to the 98th percentile requires work but is doable for most people, and (in my experience) you probably start to experience diminishing returns somewhere in this range.

Results at the 98th percentile and above are reserved for those who have long-term, stable environments in which to cultivate the skill, or for the extremely talented.

notfnofn on A very strange probability paradox

Jumping in here: the whole point of the paragraph right after defining "A" and "B" was to ensure we were all on the same page. I also don't understand what you mean by:

> Most ordinary people will assume it means that all the rolls were even

and much else of what you've written. I tell you I will roll a die until I get two 6s and let you know how many odd numbers I rolled in the process. I then do so secretly and tell you there were 0 odd rolls. All rolls are even. You can now make a probability distribution on the number of rolls I made, and compute its expectation.
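To make the setup concrete, here is a small Python sketch. It assumes a fair six-sided die and reads "two 6s" as two 6s in total, not necessarily consecutive. An all-even sequence of length n that stops at its second 6 has (n-1)·2^(n-2) arrangements, each with probability 6^(-n), so P(N = n | all even) is proportional to (n-1)(1/3)^n: a negative binomial with success probability 2/3, whose mean is 3. The sketch computes that conditional expectation from the (truncated) series and checks it by rejection sampling, discarding any run that contains an odd roll.

```python
import random


def exact_conditional_mean(max_n: int = 200) -> float:
    """E[N | every roll even], N = number of rolls until the second 6 (not necessarily consecutive).

    An all-even sequence of length n that ends on its second 6 has its first 6 in one of
    the n - 1 earlier positions and a 2 or 4 everywhere else: (n - 1) * 2**(n - 2) sequences,
    each with probability 6**-n. The infinite series is truncated at max_n.
    """
    weights = {n: (n - 1) * 2 ** (n - 2) / 6 ** n for n in range(2, max_n + 1)}
    p_all_even = sum(weights.values())  # 1/16 in the limit
    return sum(n * w for n, w in weights.items()) / p_all_even


def simulate_conditional_mean(trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check by rejection sampling: throw away every run that contains an odd roll."""
    rng = random.Random(seed)
    kept_lengths = []
    for _ in range(trials):
        sixes = rolls = 0
        while sixes < 2:
            roll = rng.randint(1, 6)
            rolls += 1
            if roll % 2 == 1:
                break  # an odd roll: this run does not satisfy the condition
            if roll == 6:
                sixes += 1
        else:
            kept_lengths.append(rolls)  # reached the second 6 with no odd roll
    return sum(kept_lengths) / len(kept_lengths)
```

Both functions should return values close to 3; only about 1 run in 16 survives the rejection step, which is why the simulation uses many trials.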

donatas-luciunas on Alignment is not intelligent

> philosophy is prone to various kinds of mistakes, such as anthropomorphization

Yes, a common mistake, but not mine. I prove the orthogonality thesis to be wrong using pure logic.

> For example, I don't think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.

LessWrong and I would probably disagree with you; the consensus is that AI will optimize itself.

> I am not really interested in debating this

OK, thanks. I believe that my concern is very important; is there anyone you could put me in touch with so I could make sure it is not overlooked? I could pay.

viliam on Alignment is not intelligent

> That's probably the root cause for our disagreement. My findings are on a very high philosophical level (the fact-value distinction) and you seem to try to interpret them on a very low level (code). I think this gap prevents us from finding consensus.

Great point!

In defense of my position... well, I am going to skip the part about "the AI will ultimately be written in code", because it could be some kind of inscrutable code like the huge matrices of weights in LLMs, so for all practical purposes the result may resemble philosophy-as-usual more than code-as-usual...

Instead I will say that philosophy is prone to various kinds of mistakes, such as anthropomorphization: judging an inhuman system (such as AI) by attributing human traits to it (even if there is no technical reason why it should have them). For example, I don't think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.

Thanks for the video.

Sorry, I am not really interested in debating this, and definitely not on the philosophical level; that is exhausting and not really enjoyable to me. I guess we have figured out the root causes of our disagreement, and I would leave it here.

darrenreynolds on A very strange probability paradox

I'm not sure about the off-topic rules here, but how about this:

Why are some of the drinks so expensive, given that all of them are mostly water?

Sometimes we use the phrase "given that" to mean "considering that". Here, we do not mean that some of the drinks are not mostly water and we are just not talking about them. We mean that literally all the drinks are mostly water.

nat-martin on How to use bright light to improve your life.

So glad to hear!

viliam on Noosphere89's Shortform

Huh, I just realized there are two different meanings/goals of moderation/censorship, and it is too easy to conflate them if you don't pay attention.

One is the kind where you don't want the users of your system to e.g. organize a crime. The other is where you don't want discussions to be disrupted, e.g. by trolls.

Superficially, they seem like the same thing: you have moderators, they make the rules, and give bans to people who break them. But now this seems mostly coincidental to me: you have some technical tools, so you use them for both purposes, because that's all you have. However, from the perspective of the people who want to organize a crime, those who try to prevent them are the disruptive trolls.

I guess, my point is that when we try to think about how to improve the moderation, we may need to think about these purposes as potential opposites. Things that make it easier to ban trolls may also make it easier to organize the crime. Which is why people may simultaneously be attracted to Substack or Telegram, and also horrified by what happens at Substack or Telegram.

Maybe there is a more general lesson for society, unrelated to tech. If you allow people to organize bottom-up, you can get a lot of good things, but you will also get groups dedicated to doing bad things. Western countries seem to optimize for bottom-up organizations: companies, non-profits, charities, churches, etc. The Soviet Union used to optimize for top-down control: everything was controlled by the state, and any personal initiative was viewed as suspicious and potentially disruptive. As a result, the Soviet Union collapsed economically, but the West got its anti-vaxxers and flat-Earthers and everything. During the Cold War, the USA was good at pushing the Soviet economic buttons. These days, Russia is good at pushing the Western free speech buttons.

Huh, maybe the analogies go deeper. The Soviet Union was surprisingly tolerant of petty crime (people stealing from each other, not from the state). There were some ideological excuses, the petty criminals being technically part of the proletariat. But from the practical perspective, the more people worry about being potential victims of crime, the less attention they pay to organizing a revolution; they may actually wish for more state power, as a protection. So there was an unspoken alliance between the ruling class and the undesirables at the bottom, against everyone in between. And perhaps similarly, big platforms such as Facebook or Twitter seem to have an unspoken alliance with trolls; their shared goal is to maximize user engagement. By reacting to trolls, you don't only make the trolls happy, you also make Zuck happy, because you have spent more time on Facebook, and more ads were displayed to you. It would be naive to expect Facebook to make the discussions better; even if they knew how to do that, they would not have the incentive. They actually want to hit exactly the level of badness where most people are frustrated but won't leave yet.

Finding the technical solution against trolls isn't that difficult; you basically need invite-only clubs. The things that the members write could be public or private; the important part is that in order to become a member, you need to get some kind of approval first. This can be implemented in various ways: a member sends you an invitation link by e-mail, or a moderator needs to approve your account before you can post. A weaker version of this is the approach Less Wrong uses: anyone can join, but new accounts are fragile and can be downvoted out of existence by the existing members, if necessary. (This works well against individual accounts created infrequently. It wouldn't work against a hundred people joining at the same time and mass-upvoting each other. But I assume that the moderators have a red button that could simply disable creating new accounts for a while until the chaos is sorted out.)
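A minimal sketch of the invite-link variant, assuming a single server that tracks members and outstanding invitations. The class and method names are hypothetical; the point is only that a new account can be created solely by redeeming a single-use token minted by an existing member.

```python
import secrets


class InviteOnlyClub:
    """Hypothetical sketch: new accounts exist only by redeeming an invite from a current member."""

    def __init__(self, founders):
        self.members = set(founders)
        self.invited_by = {}   # newcomer -> member who vouched for them
        self._pending = {}     # unredeemed token -> issuing member

    def create_invite(self, inviter: str) -> str:
        if inviter not in self.members:
            raise PermissionError(f"{inviter!r} is not a member")
        token = secrets.token_urlsafe(16)  # would be embedded in the e-mailed invitation link
        self._pending[token] = inviter
        return token

    def join(self, token: str, newcomer: str) -> bool:
        inviter = self._pending.pop(token, None)  # pop makes every token single-use
        if inviter is None:
            return False
        self.members.add(newcomer)
        self.invited_by[newcomer] = inviter
        return True
```

The moderator-approval variant has the same shape, except that only moderators may call `create_invite`.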

But when you look at the offline analogy, these things are usually called "old boy networks", and some people think they should be disrupted. Whether you agree with that or not probably depends on your value judgment about the network versus the people who are trying to get inside. Do you support the rights of new people to join the groups they want to join, or the rights of the existing members to keep out the people they want to keep out? One person's "trolls" are another person's "diverse voices that deserve to be heard".

So there are two lines of conflict: the established groups versus potential disruptors, and the established groups versus the owners of the system. The owners of the system may want some groups to stop existing, or to change so much that from the perspective of the current members they become different groups under the same name. Offline, the owner of the system could be a dictator, or could be a democratically elected government; I am not proposing a false equivalence here, just saying that from the perspective of group survival, both can be seen as the strong hand crushing the community. Online, the owners are the administrators. And it is a design choice whether "the owners crushing the community, should they choose to" is made easy or difficult. If it is easy, it will make the groups feel uneasy, especially once the crushing of other groups starts. If it is difficult, at least politically if not technically (e.g. Substack or Telegram advertising themselves as uncensored spaces), we should not be surprised if some really bad things come out of there, because that is the system working exactly as designed.

In case of Less Wrong, we are a separate island, where the owners of the system are simultaneously the moderators of the group, so this level of conflict is removed. But such solutions are very expensive; we are lucky to have enough people with high tech skills and a lot of money available if the group really wants it. For most groups this is not an option; they need to build their community on someone else's land, and sometimes the owners evict them, or increase the rent (by pushing more ads on them).

If you are a free speech absolutist, or if you believe that the world is not fragile, the right way seems kinda obvious: you need an open protocol for decentralized communication with digital signatures. And you should also provide a few reference implementations that are easy to use: a website, a smartphone app, and maybe a desktop app.

At the bottom layer, you have users who provide content on demand; the content is digitally signed and can be cached and further distributed by third parties. A "user" could be a person, a pseudonym, or a technical user. (For example, if you tried to implement Facebook or Reddit on top of this protocol, its "users" would be the actual users, the groups/subreddits, and the website itself.) This layer would be content-agnostic; it would provide any kind of content for a given URI, just like you can send anything using an e-mail attachment, HTTP GET, or a torrent. Because the content is digitally signed, third parties (mostly servers, but also peer-to-peer for smaller amounts of data) can cache it and distribute it further. In practice, most people wouldn't host their own servers, so they would publish on a website hosted on a server, or using an application that would most likely upload the content to some server. (Analogously to e-mail, which can be written in an app and sent by SMTP, or written directly in some web mail.) The system would automatically support downloading your own content, so you could e.g. publish using a website, then change your mind, install a desktop app, download all your content from the website (just like anyone who reads your content could do), and then delete your account on the website and continue publishing using the app. Or move to another website, create an account, and then upload the content from your desktop app. Or skip the desktop app entirely; create a new web account, and import everything from your old web account.
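The property this layer relies on is that anyone can cache and redistribute signed content, and any reader can check it against the author's public key without trusting the cache. As a self-contained illustration using only the Python standard library, the sketch below uses Lamport one-time signatures over SHA-256. That choice is swapped in for illustration and is not something the comment specifies: a Lamport key pair must sign only one item, so a real implementation would use a reusable scheme such as Ed25519 from a cryptography library. The `cache` argument stands in for a hypothetical third-party server.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(digest: bytes):
    # The 256 bits of a SHA-256 digest, most significant bit first.
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Lamport one-time key pair: 256 pairs of random secrets, and their hashes as the public key."""
    secret_key = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public_key = [(_h(zero), _h(one)) for zero, one in secret_key]
    return secret_key, public_key


def sign(secret_key, content: bytes):
    # For each bit of H(content), reveal the secret from that position's pair. One use per key!
    return [secret_key[i][bit] for i, bit in enumerate(_bits(_h(content)))]


def verify(public_key, content: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    bits = _bits(_h(content))
    return all(_h(s) == public_key[i][bit] for i, (s, bit) in enumerate(zip(signature, bits)))


def fetch_verified(cache, uri, public_key) -> bytes:
    """Read from an untrusted cache; accept the content only if the author's signature checks out."""
    content, signature = cache[uri]
    if not verify(public_key, content, signature):
        raise ValueError(f"signature check failed for {uri}")
    return content
```

Because verification needs only the public key, it does not matter whether the bytes came from the author's own server, a big hosting site, or a peer.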

The next layer is versioning; we need some way to say "I want the latest version of this user's 'index.html' file". Also, some way to send direct messages between users (not just humans, but also technical users).
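One possible shape for the versioning layer, as a hypothetical sketch: each author publishes small records mapping a path such as `index.html` to a content hash plus a version number that only increases, and a reader who collects records from several caches keeps the highest version. The record fields are assumptions; in a real system each record would also carry the author's signature so that a cache cannot forge a newer version, which is omitted here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VersionRecord:
    user: str          # hypothetical: e.g. a fingerprint of the user's public key
    path: str          # e.g. "index.html"
    version: int       # counter the author increments on every update
    content_hash: str  # hash of the signed content this version points to


def latest(records: Iterable[VersionRecord], user: str, path: str) -> Optional[VersionRecord]:
    """Return the newest record for (user, path) among records gathered from any number of caches."""
    candidates = [r for r in records if r.user == user and r.path == path]
    return max(candidates, key=lambda r: r.version, default=None)
```

A stale cache can only make a reader miss an update, not roll them back, as long as at least one cache they ask has the newest record.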

The next layer is about organizing the content. The system can already represent your tweets as tiny plain-text files, your photos as bitmap files, etc. Now you need to put it all together and add some resource descriptors, like XML or JSON files that say "this is a tweet, it consists of this text and this image or video, and was written at this date and time" or "this is a list of links to tweets, ordered chronologically, containing items 1-100 out of 5678 total" or "this is a blog post, with this title, its contents are in this HTML file". To support groups, you also need resource descriptors that say "this is a group description: name, list of members, list of tweets". Now make the reference applications that support all of this, with optional encryption, and you basically have Telegram, but decentralized. Yay freedom; but also expect this system to be used for all kinds of horrible crimes. :(
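To illustrate, here are the three kinds of descriptor mentioned above, built as JSON from Python. Every field name is hypothetical; the comment only says that the descriptors could be XML or JSON files.

```python
import json


def tweet_descriptor(text, media_uris, written_at):
    # "This is a tweet, it consists of this text and this image or video, written at this time."
    return {"type": "tweet", "text": text, "media": list(media_uris), "written_at": written_at}


def list_page_descriptor(item_uris, start, total):
    # One page of a chronological list; start is 1-based and the range is inclusive.
    item_uris = list(item_uris)
    return {
        "type": "list",
        "order": "chronological",
        "first": start,
        "last": start + len(item_uris) - 1,
        "total": total,
        "items": item_uris,
    }


def group_descriptor(name, member_ids, tweet_list_uri):
    # A group: its name, its members, and a pointer to the list of its tweets.
    return {"type": "group", "name": name, "members": sorted(member_ids), "tweets": tweet_list_uri}


page = list_page_descriptor([f"tweets/{i}" for i in range(1, 101)], start=1, total=5678)
print(json.dumps({k: page[k] for k in ("type", "first", "last", "total")}))
# {"type": "list", "first": 1, "last": 100, "total": 5678}
```

Since the descriptors point to other content by URI, each one can itself be signed and cached like any other file.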

celarix on What epsilon do you subtract from "certainty" in your own probability estimates?

My opinion is that whatever value of epsilon you pick should be low enough that it never happens once in your life. "I flipped a coin but it doesn't actually exist" should never happen. Maybe it would happen if you lived for millions of years, but in a normal human lifespan, never once.
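A rough way to turn "never once in your life" into a number, under stated assumptions: treat each "certain" judgment as independent with failure probability epsilon, and pick a hypothetical number of such judgments per day. With n judgments, the chance of at least one failure is 1 - (1 - epsilon)^n, so keeping that below a chosen p requires epsilon <= 1 - (1 - p)^(1/n). Setting epsilon to 1/n is not enough: that still leaves roughly a 63% chance of at least one failure.

```python
import math


def max_epsilon(judgments_per_day: float, years: float, allowed_chance: float) -> float:
    """Largest per-judgment failure probability epsilon such that, over the whole period,
    the chance that at least one 'certain' judgment turns out wrong is at most allowed_chance.

    Assumes independent judgments: solve 1 - (1 - eps)**n <= p for eps.
    """
    n = judgments_per_day * 365.25 * years
    # eps = 1 - (1 - p)**(1/n), written with log1p/expm1 so tiny values keep their precision.
    return -math.expm1(math.log1p(-allowed_chance) / n)
```

For example, with a hypothetical 1,000 judgments a day for 80 years (about 29 million in total) and a 1% allowed chance, `max_epsilon(1000, 80, 0.01)` comes out at roughly 3.4e-10.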

anders-lindstroem on Dave Kasten's AGI-by-2027 vignette

Yes, the soon-to-be-here "human-level" AGI people talk about is, for all intents and purposes, ASI. Show me one person who is at the highest expert level on thousands of subjects, who has all of human knowledge memorized, and who can draw the most complex inferences from that knowledge across multiple domains in seconds.