LessWrong 2.0 Reader

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

[link] Launching the Respiratory Outlook 2024/25 Forecasting Series
ChristianWilliams · 2024-07-17T19:51:05.380Z · comments (0)

Biasing LLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

Establishing a Connection (Ch 21-22)
a littoral wizard · 2024-08-06T19:21:33.054Z · comments (1)

Freedom and Privacy of Thought Architectures
JohnBuridan · 2024-07-20T21:43:11.419Z · comments (2)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)

The Carnot Engine of Economics
StrivingForLegibility · 2024-08-09T15:59:40.458Z · comments (0)

Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)

[link] Risk Overview of AI in Bio Research
J Bostock (Jemist) · 2024-07-15T00:04:41.818Z · comments (0)

[question] Can UBI overcome inflation and rent seeking?
Gordon Seidoh Worley (gworley) · 2024-08-01T00:13:51.693Z · answers+comments (34)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

Reasoning is not search - a chess example
p.b. · 2024-08-06T09:29:40.451Z · comments (3)

Effective Empathy
Thac0 · 2024-07-11T15:14:22.430Z · comments (1)

[question] What are the strategic implications if aliens and Earth civilizations produce similar utilities?
Maxime Riché (maxime-riche) · 2024-08-06T21:16:21.719Z · answers+comments (1)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

Tokyo (日本語) Japan - ACX Meetups Everywhere Fall 2024
Emi (emi-2) · 2024-08-29T18:35:28.013Z · comments (0)

Can Current LLMs be Trusted To Produce Paperclips Safely?
Rohit Chatterjee (rohit-c) · 2024-08-19T17:17:07.530Z · comments (0)

[link] So You've Learned To Teleport by Tom Scott
landscape_kiwi · 2024-07-17T18:04:37.272Z · comments (0)

Interest poll: A time-waster blocker for desktop Linux programs
nahoj · 2024-08-22T20:44:04.479Z · comments (5)

[link] Clopen sandwiches
dkl9 · 2024-07-14T13:07:58.345Z · comments (0)

Likelihood calculation with duobels
Martin Gerdes (martin-gerdes) · 2024-10-01T16:21:01.268Z · comments (0)

The great Enigma in the sky: The universe as an encryption machine
Alex_Shleizer · 2024-08-14T13:21:58.713Z · comments (1)

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

Crafting Polysemantic Transformer Benchmarks with Known Circuits
Evan Anders (evan-anders) · 2024-08-23T22:03:15.288Z · comments (0)

Want to work on US emerging tech policy? Consider the Horizon Fellowship.
Elika · 2024-07-31T18:12:14.247Z · comments (0)

[question] How do you finish your tasks faster?
Cipolla · 2024-08-21T20:01:41.306Z · answers+comments (2)

Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)

Tbilisi Georgia - ACX Meetups Everywhere Fall 2024
Dmitrii (dmitrii) · 2024-08-29T18:36:43.223Z · comments (4)

[link] Game Theory and Society
Zero Contradictions · 2024-08-05T04:27:37.275Z · comments (0)

Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (0)

On predictability, chaos and AIs that don't game our goals
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-07-15T17:16:32.766Z · comments (8)

[question] How do you follow AI (safety) news?
PeterH · 2024-09-24T13:58:48.916Z · answers+comments (2)

Bellevue-Redmond USA - ACX Meetups Everywhere Fall 2024
Cedar (xida-ren) · 2024-08-29T18:43:57.014Z · comments (6)

Madrid - ACX Meetups Everywhere Fall 2024
Pablo Villalobos (pvs) · 2024-08-05T18:36:55.136Z · comments (0)

[question] Is there a Schelling point for group house room listings?
NoSignalNoNoise (AspiringRationalist) · 2024-07-23T03:03:29.639Z · answers+comments (0)

[link] Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys (karolis-ramanauskas) · 2024-07-18T17:02:06.179Z · comments (0)

Auckland New Zealand - ACX Meetups Everywhere Fall 2024
Mark Gilmour (mark-gilmour) · 2024-08-29T18:35:31.852Z · comments (0)

[question] Calibration training for 'percentile rankings'?
david reinstein (david-reinstein) · 2024-09-14T21:51:55.705Z · answers+comments (0)

Recent comments

quetzal_rainbow on DanielFilan's Shortform Feed

Not only "good", but "obedient", "non-deceptive", "minimal impact", "behaviorist" and don't even talk about "mindcrime".

cubefox on DanielFilan's Shortform Feed

In this sense agent foundations research seems similar to research on normative ethics.

ape-in-the-coat on Alignment by default: the simulation hypothesis

It’s certainly possible for simulations to differ from reality, but they seem less useful the more divergent from reality they are.

Depends on what the simulation is being used for, which you also can't deduce from inside of it.

Maybe the simulation could be for pure entertainment (more like a video game), but you should ascribe a relatively low prior to that IMO.

Why? This statement requires some justification.

I'd expect a decent chunk of high fidelity simulations made by humans to be made for entertainment, maybe even absolute majority, if we take into account how we've been using similar technologies so far.

It does if you simultaneously think your creator will eternally reward you for doing so, and/or eternally punish you for failing to.

Not at all. You still have to evaluate this offer using your own mind and values. You can't sidestep this process by simply assuming that Creator's will by definition is the purpose of your life, and therefore you have no choice but to obey.

niplav on AI #84: Better Than a Podcast

Building a superintelligence under current conditions will turn out fine.
No one will build a superintelligence under anything like current conditions.
We must prevent at almost all costs anyone building superintelligence soon.

I don't think this is a valid trilemma: Between fine and worth preventing at "almost all costs" there is a pretty large gap. I think "fine" was intended to mean "we don't all die" or something as bad as that.

anders-lindstroem on Anders Lindström's Shortform

That is a very interesting perspective and mindset! In that scenario, do you think you will focus on creating value by solving technical problems, or on "softer" problems that are more centered on human wellbeing?

anders-lindstroem on Anders Lindström's Shortform

Thanks for your input. I really like that you pointed out that AI is just one of many things that could go wrong; perhaps people like me and others are so caught up in the p(doom) buzz that we don't see all the other stuff.

But I wonder one thing about your Plan B, which seems rational: what if a lot of people have entry-level care work as their back-up? How will you stave off that competition? Or do you think it's a matter of avoiding loss aversion and getting out of your Plan A game early, rather than lingering (if some pre-stated KPI of yours goes above or below a certain threshold), to grab one of those positions?

leogao on leogao's Shortform

people love to find patterns in things. sometimes this manifests as mysticism- trying to find patterns where they don't exist, insisting that things are not coincidences when they totally just are. i think a weaker version of this kind of thinking shows up a lot in e.g. literature too- events occur not because of the bubbling randomness of reality, but rather carry symbolic significance for the plot. things don't just randomly happen without deeper meaning.

some people are much more likely to think in this way than others. rationalists are very far along the spectrum in the "things just kinda happen randomly a lot, they don't have to be meaningful" direction.

there are some obvious cognitive bias explanations for why people would see meaning/patterns in things. most notably, it's comforting to feel like we understand things. the idea of the world being deeply random and things just happening for no good reason is scary.

but i claim that there is something else going on here. I think an inclination towards finding latent meaning is actually quite applicable when thinking about people. people's actions are often driven by unconscious drives, and so tend to be quite strongly correlated with those drives. in fact, unconscious thoughts are often the true drivers, and the conscious thoughts are just the rationalization. but from the inside, it doesn't feel that way; from the inside it feels like having free will, and everything that is not a result of conscious thought is random or coincidental. this is a property that is not nearly as true of technical pursuits, so it's very reasonable to expect a different kind of reasoning to be ideal.

not only is this useful for modelling other people, but it's even more useful for modelling yourself. things only come to your attention if your unconscious brain decides to bring them to your attention. so even though something happening to you may be a coincidence, whether you focus on it or forget about it tells you a lot about what your unconscious brain is thinking. from the inside, this feels like things that should obviously be coincidence nonetheless having some meaning behind them. even the noticing of a hypothesis for the coincidence is itself a signal from your unconscious brain.

I don't quite know what the right balance is. on the one hand, it's easy to become completely untethered from reality by taking this kind of thing too seriously and becoming superstitious. on the other hand, this also seems like an important way of thinking about the world that is easy for people like me (and probably lots of people on LW) to underappreciate.

linda-linsefors on [Intuitive self-models] 3. The Homunculus

The way I understand it, the homunculus is part of self. So if you put the wanting in the homunculus, it's also inside self. I don't know about you, but my self concept has more than wanting. To be fair, the homunculus concept is also a bit richer than wanting (I think?) but less encompassing than the full self (I think?).

rom on Cryonics is free

I don't know.

If the aldehyde preservation method is as good as traditional cryopreservation, then this looks like a pretty glaring market inefficiency—someone should be able to swoop in and undercut the established cryo companies.

I just don't know enough about the object level arguments to say much with confidence, but I'm a bit skeptical such a gap in the market exists.

lelapin on Dialogue introduction to Singular Learning Theory

I don't strongly disagree but do weakly disagree on some points so I guess I'll answer

Re first- if you buy into automated alignment work by human level AGI, then trying to align ASI now seems less worth it. The strongest counterargument to this I see is that "human level AGI" is impossible to get with our current understanding, as it will be superhuman in some things and weirdly bad at others.

Re second- disagreements might be nitpicking on "few other approaches" vs "few currently pursued approaches". There are probably a bunch of things that would allow fundamental understanding if they panned out (various agent foundations agendas, provably safe AI agendas like davidad's), though one can argue they won't apply to deep learning or are less promising to explore than SLT.