LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Educational CAI: Aligning a Language Model with Pedagogical Theories
Bharath Puranam (bharath-puranam) · 2024-11-01T18:55:26.993Z · comments (1)

Differential knowledge interconnection
Roman Leventov · 2024-10-12T12:52:36.267Z · comments (0)

[question] If the DoJ goes through with the Google breakup,where does Deepmind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)

Scattered thoughts on what it means for an LLM to believe
TheManxLoiner · 2024-11-06T22:10:29.429Z · comments (4)

Project Adequate: Seeking Cofounders/Funders
Lorec · 2024-11-17T03:12:12.995Z · comments (7)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate
Liron · 2024-11-09T23:39:30.039Z · comments (0)

Agency overhang as a proxy for Sharp left turn
Eris (anton-zheltoukhov) · 2024-11-07T12:14:24.333Z · comments (0)

Bellevue Library Meetup - Nov 23
Cedar (xida-ren) · 2024-11-09T23:05:02.452Z · comments (3)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

AI Compute governance: Verifying AI chip location
Farhan · 2024-10-12T17:36:45.942Z · comments (0)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

Theories With Mentalistic Atoms Are As Validly Called Theories As Theories With Only Non-Mentalistic Atoms
Lorec · 2024-11-12T06:45:26.039Z · comments (5)

[link] Formalize the Hashiness Model of AGI Uncontainability
Remmelt (remmelt-ellen) · 2024-11-09T16:10:05.032Z · comments (0)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

Biasing VLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

If I care about measure, choices have additional burden (+AI generated LW-comments)
avturchin · 2024-11-15T10:27:15.212Z · comments (11)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

Tbilisi Georgia - ACX Meetups Everywhere Fall 2024
Dmitrii (dmitrii) · 2024-08-29T18:36:43.223Z · comments (4)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)

Likelihood calculation with duobels
Martin Gerdes (martin-gerdes) · 2024-10-01T16:21:01.268Z · comments (0)

Visualizing small Attention-only Transformers
WCargo (Wcargo) · 2024-11-19T09:37:42.213Z · comments (0)

Ultralearning in 80 days
aproteinengine · 2024-11-26T00:01:23.679Z · comments (7)

Bellevue-Redmond USA - ACX Meetups Everywhere Fall 2024
Cedar (xida-ren) · 2024-08-29T18:43:57.014Z · comments (8)

Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)

[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (6)

Germany-wide ACX Meetup
Fernand0 · 2024-11-17T10:08:54.584Z · comments (0)

[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)

[question] What are the primary drivers that caused selection pressure for intelligence in humans?
Towards_Keeperhood (Simon Skade) · 2024-11-07T09:40:20.275Z · answers+comments (15)

[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)

[link] 2024 Election Forecasting Contest
mike20731 · 2024-10-05T20:43:16.203Z · comments (0)

Effects of Non-Uniform Sparsity on Superposition in Toy Models
Shreyans Jain (shreyans-jain) · 2024-11-14T16:59:43.234Z · comments (3)

What are Emotions?
Myles H (zarsou9) · 2024-11-15T04:20:27.388Z · comments (13)

On AI Detectors Regarding College Applications
Kaustubh Kislay (kaustubh-kislay) · 2024-11-27T20:25:48.151Z · comments (0)

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke (antonio-clarke) · 2024-09-29T18:48:23.308Z · comments (0)

Tokyo (日本語) Japan - ACX Meetups Everywhere Fall 2024
Emi (emi-2) · 2024-08-29T18:35:28.013Z · comments (0)

Distillation Of DeepSeek-Prover V1.5
IvanLin (matthewshing) · 2024-10-15T18:53:11.199Z · comments (1)

[link] A Logical Proof for the Emergence and Substrate Independence of Sentience
rife (edgar-muniz) · 2024-10-24T21:08:09.398Z · comments (31)

[link] Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams · 2024-10-22T16:35:13.999Z · comments (0)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions
glykokalyx · 2024-11-10T22:34:58.956Z · comments (0)

[question] How do you follow AI (safety) news?
PeterH · 2024-09-24T13:58:48.916Z · answers+comments (2)

A better “Statement on AI Risk?”
Knight Lee (Max Lee) · 2024-11-25T04:50:29.399Z · comments (4)

Some Comments on Recent AI Safety Developments
testingthewaters · 2024-11-09T16:44:58.936Z · comments (0)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

notfnofn on A very strange probability paradox

Jumping in here: the whole point of the paragraph right after defining "A" and "B" was to ensure we were all on the same page. I also don't understand what you mean by:

Most ordinary people will assume it means that all the rolls were even

and much else of what you've written. I tell you I will roll a die until I get two 6s and let you know how many odds I rolled in the process. I then do so secretly and tell you there were 0 odds. All rolls are even. You can now make a probability distribution on the number of rolls I made, and compute its expectation.

donatas-luciunas on Alignment is not intelligent

philosophy is prone to various kinds of mistakes, such as anthropomorphization

Yes, common mistake, but not mine. I prove orthogonality thesis to be wrong using pure logic.

For example, I don't think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.

Me and LessWrong would probably disagree with you, consensus is that AI will optimize [? · GW] itself.

I am not really interested in debating this

OK, thanks. I believe that my concern is very important, is there anyone you could put in me in touch with so I could make sure it is not overlooked? I could pay.

viliam on Alignment is not intelligent

That's probably the root cause for our disagreement. My findings are on a very high philosophical level (fact value distinction) and you seem to try to interpret them on very low level (code). I think this gap prevent us from finding consensus.

Great point!

In defense of my position... well, I am going to skip the part about "the AI will ultimately be written in code", because it could be some kind of inscrutable code like the huge matrices of weights in LLMs, so for all practical purposes the result may resemble philosophy-as-usual more than code-as-usual...

Instead I will says that philosophy is prone to various kinds of mistakes, such as anthropomorphization: judging an inhuman system (such as AI) by attributing it human traits (even if there is no technical reason why it should have them). For example, I don't think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.

Thanks for the video.

Sorry, I am not really interested in debating this, and definitely not on the philosophical level; that is exhausting and not really enjoyable to me. I guess we have figure out the root causes of our disagreement, and I would leave it here.

darrenreynolds on A very strange probability paradox

I'm not sure about the off-topic rules here, but how about this:

Why are some of the drinks so expensive, given that all of them are mostly water?

Sometimes we use the phrase "given that" to mean, "considering that". Here, we do not mean, some of the drinks are not mostly water but we are not talking about them. We mean that literally all the drinks are mostly water.

nat-martin on How to use bright light to improve your life.

So glad to hear!

viliam on Noosphere89's Shortform

Huh, I just realized there are two different meanings/goals of moderation/censorship, and it is too easy to conflate them if you don't pay attention.

One is the kind where you don't want the users of your system to e.g. organize a crime. The other is where you want discussions to be disrupted e.g. by trolls.

Superficially, they seem like the same thing: you have moderators, they make the rules, and give bans to people who break them. But now this seems mostly coincidental to me: you have some technical tools, so you use them for both purposes, because that's all you have. However, from the perspective of the people who want to organize a crime, those who try to prevent them are the disruptive trolls.

I guess, my point is that when we try to think about how to improve the moderation, we may need to think about these purposes as potential opposites. Things that make it easier to ban trolls may also make it easier to organize the crime. Which is why people may simultaneously be attracted to Substack or Telegram, and also horrified by what happens at Substack or Telegram.

Maybe there is a more general lesson for the society, unrelated to tech. If you allow people to organize bottom-up, you can get a lot of good things, but you will also get groups dedicated to doing bad things. Western countries seem to optimize for the bottom-up organizations: companies, non-profits, charities, churches, etc. Soviet Union used to optimize for top-down control: everything was controlled by the state, any personal initiative was viewed as suspicious and potentially disruptive. As a result, Soviet Union collapsed economically, but the West got its anti-vaxers and flat-Eathers and everything. During the Cold War, USA was good at pushing the Soviet economical buttons. These days, Russia is good at pushing the Western free speech buttons.

Huh, maybe the analogies go deeper. Soviet Union was surprisingly tolerant of petty crime (people stealing from each other, not from the state). There were some ideological excuses, the petty criminals being technically part of the proletariat. But from the practical perspective, the more people worry about being potential victims of crime, the less attention they pay to organizing a revolution; they may actually wish for more state power, as a protection. So there was an unspoken alliance between the ruling class and the undesirables at the bottom, against everyone in between. And perhaps similarly, big platforms such as Facebook or Twitter seem to have an unspoken alliance with trolls; their shared goal is to maximize user engagement. By reacting to trolls, you don't only make the trolls happy, you also make Zuck happy, because you have spent more time on Facebook, and more ads were displayed to you. It would be naive to expect Facebook to make the discussions better; if they knew how to do that, they do not have the incentive; they actually want to hit exactly the level of badness where most people are frustrated but won't leave yet.

Finding the technical solution against trolls isn't that difficult; you basically need invite-only clubs. The things that the members write could be public or private; the important part is that in order to become a member, you need to get some kind of approval first. This can be implemented in various ways: a member needs to send you an invitation link by an e-mail, a moderator needs to approve your account before you can post. A weaker version of this is the way Less Wrong uses: anyone can join, but the new accounts are fragile and can be downvoted out of existence by the existing members, if necessary. (Works well against individual accounts created infrequently. Wouldn't work against hundred people joining at the same time and mass-upvoting each other. But I assume that the moderators have a red button that could simply disable creating new accounts for a while until the chaos is sorted out.)

But when you look at the offline analogy, these things are usually called "old boy networks", and some people think they should be disrupted. Whether you agree with that or not, probably depends on your value judgment about the network versus the people who are trying to get inside. Do you support the rights of new people to join the groups they want to join, or the rights of the existing members to keep out the people they want to keep out? One person's "trolls" are other person's "diverse voices that deserve to be heard".

So there are two lines of conflict: the established groups versus potential disruptors, and the established groups versus the owners of the system. The owners of the system may want some groups to stop existing, or to change so much that from the perspective of the current members they become different groups under the same name. Offline, the owner of the system could be a dictator, or could be a democratically elected government; I am not proposing a false equivalence here, just saying that from the perspective of the group survival, both can be seen as the strong hand crushing the community. Online, the owners are the administrators. And it is a design choice whether "the owners crushing the community, should they choose so" is made easy or difficult. If it is easy, it will make the groups feel uneasy, especially once the crushing of other groups start. If it is difficult, at least politically if not technically (e.g. Substack or Telegram advertising themselves as the uncensored spaces), we should not be surprised if some really bad things come out of there, because that is the system working exactly as designed.

In case of Less Wrong, we are a separate island, where the owners of the system are simultaneously the moderators of the group, so this level of conflict is removed. But such solutions are very expensive; we are lucky to have enough people with high tech skills and a lot of money available if the group really wants it. For most groups this is not an option; they need to build their community on someone else's land, and sometimes the owners evict them, or increase the rent (by pushing more ads on them).

If you are a free speech absolutist, or if you believe that the world is not fragile, the right way seems kinda obvious: you need an open protocol for decentralized communication with digital signatures. And you should also provide a few reference implementations that are easy to use: a website, a smartphone app, and maybe a desktop app.

At the bottom layer, you have users who provide content on demand; the content is digitally signed and can be cached and further distributed by third parties. A "user" could be a person, a pseudonym, or a technical user. (For example, if you tried to implement Facebook or Reddit on top of this protocol, its "users" would be the actual users, and the groups/subreddits, and the website itself.) This layer would be content-agnostic; it would provide any kind of content for given URI, just like you can send anything using an e-mail attachment, HTTP GET, or a torrent. The content would be digitally signed, so that the third parties (mostly servers, but also peer-to-peer for smaller amounts of data) can cache it and further distribute. In practice, most people wouldn't host their own servers, so they would publish by on a website that is hosted on a server, or using their application which would most likely upload it to some server. (Analogically to e-mail, which can be written in an app and sent by SMTP, or written directly in some web mail.) The system would automatically support downloading your own content, so you could e.g. publish using a website, then change your mind, install a desktop app, download all your content from the website (just like anyone who reads your content could do), and then delete your account on the website and continue publishing using the app. Or move to another website, create an account, and then upload the content from your desktop app. Or skip the desktop app entirely; create a new web account, and import everything from your old web account.

The next layer is versioning; we need some way to say "I want the latest version of this user's 'index.html' file". Also, some way to send direct messages between users (not just humans, but also technical users).

The next layer is about organizing the content. The system can already represent your tweets as tiny plain-text files, your photos as bitmap files, etc. Now you need to put it all together and add some resource descriptors, like XML or JSON files that say "this is a tweet, it consists of this text and this image or video, and was written at this date and time" or "this is a list of links to tweets, ordered chronologically, containing items 1-100 out of 5678 total" or "this is a blog post, with this title, its contents are in this HTML file". To support groups, you also need resource descriptors that say "this is a group description: name, list of members, list of tweets". Now make the reference applications that support all of this, with optional encryption, and you basically have Telegram, but decentralized. Yay freedom; but also expect this system to be used for all kinds of horrible crimes. :(

celarix on What epsilon do you subtract from "certainty" in your own probability estimates?

My opinion is that whatever value of epsilon you pick should be low enough such that it never happens once in your life. "I flipped a coin but it doesn't actually exist" should never happen. Maybe it would happen if you lived for millions of years, but in a normal human lifespan, never once.

anders-lindstroem on Dave Kasten's AGI-by-2027 vignette

Yes, the soon-to-be-here "human level" AGI people talk about is for all intent and purposes ASI. Show me one person who is at the highest expert level on thousands of subjects and that have the content of all human knowledge memorized and can draw the most complex inferences on that knowledge across multiple domains in seconds.

chris_leong on New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters

If it works, maybe it isn't slop?

alexander-gietelink-oldenziel on John Fisher's Shortform

I was about to delete my message because I was afraid it was a bit much but then the likes started streaming in and god knows how much of a sloot i am for internet validation.