LessWrong 2.0 Reader

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

Grounding self-reference paradoxes in reality
Fiora from Rosebloom · 2024-09-29T05:50:30.559Z · comments (3)

Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

[question] Can UBI overcome inflation and rent seeking?
Gordon Seidoh Worley (gworley) · 2024-08-01T00:13:51.693Z · answers+comments (34)

Ethical Deception: Should AI Ever Lie?
Jason Reid (jason-reid) · 2024-08-02T17:53:38.744Z · comments (2)

Toy Models of Superposition: what about BitNets?
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-08-08T16:29:02.054Z · comments (1)

The Carnot Engine of Economics
StrivingForLegibility · 2024-08-09T15:59:40.458Z · comments (0)

[link] The AI regulator’s toolbox: A list of concrete AI governance practices
Adam Jones (domdomegg) · 2024-08-10T21:15:09.265Z · comments (1)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)

Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

[link] Should we abstain from voting? (In nondeterministic elections)
B Jacobs (Bob Jacobs) · 2024-10-02T10:07:43.167Z · comments (5)

Biasing VLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)

[question] If the DoJ goes through with the Google breakup, where does Deepmind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)

Differential knowledge interconnection
Roman Leventov · 2024-10-12T12:52:36.267Z · comments (0)

[link] How long should political (and other) terms be?
ohmurphy · 2024-10-14T21:38:43.050Z · comments (0)

[question] Artificial V/S Organoid Intelligence
10xyz (10xyz-coder) · 2024-10-23T14:31:46.385Z · answers+comments (0)

[question] A Different Perspective on Rationality - Would This Be Valuable?
Gabriel Brito (gabriel-brito) · 2024-10-26T18:47:46.416Z · answers+comments (2)

Can Current LLMs be Trusted To Produce Paperclips Safely?
Rohit Chatterjee (rohit-c) · 2024-08-19T17:17:07.530Z · comments (0)

Likelihood calculation with duobels
Martin Gerdes (martin-gerdes) · 2024-10-01T16:21:01.268Z · comments (0)

Methodology: Contagious Beliefs
James Stephen Brown (james-brown) · 2024-10-19T03:58:17.966Z · comments (0)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)

[link] Game Theory and Society
Zero Contradictions · 2024-08-05T04:27:37.275Z · comments (0)

Tbilisi Georgia - ACX Meetups Everywhere Fall 2024
Dmitrii (dmitrii) · 2024-08-29T18:36:43.223Z · comments (4)

[link] A Logical Proof for the Emergence and Substrate Independence of Sentience
rife (edgar-muniz) · 2024-10-24T21:08:09.398Z · comments (31)

[link] The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind
Roko · 2024-10-16T01:24:51.102Z · comments (18)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Bellevue-Redmond USA - ACX Meetups Everywhere Fall 2024
Cedar (xida-ren) · 2024-08-29T18:43:57.014Z · comments (8)

[question] How do you follow AI (safety) news?
PeterH · 2024-09-24T13:58:48.916Z · answers+comments (2)

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke (antonio-clarke) · 2024-09-29T18:48:23.308Z · comments (0)


Recent comments

tailcalled on Three Notions of "Power"

Slavery was abolished and remains abolished through dominance:

first, by being outlawed by the Northern US and Great Britain, which drew strong economic benefit from higher labor prices because they industrialized earlier for geographic reasons;
second, by leveraging state dominance during the Great Depression to demand massive increases in the quality and quantity of production, making it feasible to maintain a non-slave-holding society without excess labor being left to starve;
third, through endless policies that use state violence and reduce the total fertility rate as a side effect.

Throughout most of history, there has been excess labor, which pushes the value of work down close to the cost of subsistence. This is sustainable only because landowners see natural fluctuations in their production and therefore want to keep people around even when it doesn't make short-term economic sense. This naturally creates serfdom and indentured servitude.

It's only really prisoners of war (e.g. African-American chattel slaves) who are slaves due to dominance; ordinary slavery is just poor bargaining power.

tapatakt on I turned decision theory problems into memes about trolleys

Death in Damascus is easy, but boring.

Doomsday Argument is not a decision theory problem... but it can be turned into one... I think the trolley version would be too big and complicated, though.

Obviously, only problems with discrete choices can be expressed as Trolley problems.

quetzal_rainbow on I turned decision theory problems into memes about trolleys

Wow, XOR-trolley really made me think

going-durden on Three Notions of "Power"

Dominance underlies the things that can be done most efficiently with dominance. The moment dominance is no longer the most efficient force, it collapses, because in the vast majority of cases, dominating others takes a lot of time, energy and effort. This is actually how and why slavery (pretty much the most powerful example of dominance) was abolished: it started to make less economic sense than Bargaining (paid employment of freemen) and just Getting Things Done (through better tools and ultimately machines), so even its most ardent supporters became dispirited.

templarrr on Occupational Licensing Roundup #1

Two things to note.

First - I feel like putting every occupation in the same pile and deciding whether you are for or against licensing isn't helpful. I personally don't need a licensed lawnmower, but I would very much prefer a licensed doctor. The cost of a mistake differs a lot between the two occupations and could serve as a threshold for which jobs should require a license.

Second - there should be a difference between doing a thing to yourself (an argument can even be made that we shouldn't have any limits here), doing things for free for your friends/relatives with their full knowledge of your skill level and experience (most non-life-threatening things can probably be allowed here), and selling your craft for money.

cubefox on What TMS is like

Interesting, I really hope TMS gains more acceptance. By the way, according to studies, ECT (the precursor of TMS) is even more effective, though it does have more side effects, due to the required anesthesia, and it is gatekept even more strongly. In my youth I suffered from depression for several years, and all of this likely would have been avoidable with a few ECT sessions (TMS wasn't a thing yet), if it wasn't for the medical system's irrational bias in favor of exclusively using SSRIs and CBT. I think this happens because most medical staff have no idea how terrible depression can be, so they don't get the sense of urgency they'd get from more visible diseases.

vgillioz on vgillioz's Shortform

Pre-registering bcbe94590cf03bb7fbcb1ef04e0f779b1e39037d247ebb83b82ee18d5bcbd59f
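For context, a pre-registration like this is usually a hash commitment: the author publishes only the digest now and reveals the text later, so anyone can check that the text was fixed in advance. A minimal sketch follows. The digest above is 64 hex characters, which matches the length of SHA-256 output, but the comment does not say which hash function or input encoding was used, so both are assumptions here. The names `commitment` and `verify` and the optional `salt` are hypothetical.

```python
import hashlib


def commitment(text: str, salt: str = "") -> str:
    """Return the hex SHA-256 digest of salt + text (UTF-8 encoded).

    SHA-256 and the salt-prefix layout are assumptions for illustration.
    """
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


def verify(text: str, digest: str, salt: str = "") -> bool:
    """Check that a revealed text (and salt) matches a published digest."""
    return commitment(text, salt) == digest.lower()


if __name__ == "__main__":
    # Hypothetical prediction text, not the one behind the digest above.
    published = commitment("My prediction: X happens by 2025.", salt="random-nonce")
    print(published)
    print(verify("My prediction: X happens by 2025.", published, salt="random-nonce"))
```

A salt matters when the committed text is short or guessable, since otherwise a reader could brute-force candidate texts against the digest before the reveal.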

going-durden on When do "brains beat brawn" in Chess? An experiment

A related thought: an intelligence can only work on the information that it has, regardless of its veracity, and it can only work on information that actually exists.

My hunch is that the plan of "AI bootstraps itself to superintelligence, then superpower, then wipes out humanity" relies on it having access to information that is either too well hidden to divine through sheer calculation and infogathering, regardless of its intelligence (e.g. the location of all the military bunkers and nuclear submarines humanity has), or simply does not exist (e.g. future human strategic choices based on coin flips).

Most AI Apocalypse scenarios depend not only on the AI being superhumanly smart, but also on it being inexplicably omniscient about things that nobody could plausibly be omniscient about.

lone17 on Refusal in LLMs is mediated by a single direction

Thanks for the insight on the locality check experiment.

For inducing refusal, I used the code from the demo notebook provided in your post. It doesn't have a section on inducing refusal, but I just invert the difference-in-means vector and set the intervention layer to the single layer where said vector was extracted. I believe this has the same effect as what you described, which is to apply the intervention to every token at a single layer. Will check out your repo to see if I missed something. Thank you for the discussion.
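The intervention described in this comment can be sketched roughly as below. This is a toy illustration, not the demo notebook's code: plain lists of floats stand in for one layer's activations, the function names are hypothetical, and the sign convention (which prompt set is subtracted from which) is an assumption. A negative `scale` inverts the direction.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_in_means(set_a, set_b):
    """Mean activation over set_a minus mean activation over set_b."""
    return [a - b for a, b in zip(mean_vector(set_a), mean_vector(set_b))]


def add_direction(layer_activations, direction, scale=1.0):
    """Add scale * direction to every token position of a single layer.

    layer_activations: one vector per token position (toy stand-in).
    """
    return [
        [x + scale * d for x, d in zip(token, direction)]
        for token in layer_activations
    ]


if __name__ == "__main__":
    # Toy 2-dimensional activations for two prompt sets (made-up numbers).
    set_a = [[1.0, 2.0], [3.0, 4.0]]
    set_b = [[0.0, 0.0], [2.0, 2.0]]
    direction = difference_in_means(set_a, set_b)
    tokens = [[0.0, 0.0], [1.0, 1.0]]
    print(add_direction(tokens, direction))        # add the direction
    print(add_direction(tokens, direction, -1.0))  # inverted direction
```

In a real model the same addition would be applied through a hook on the chosen layer's output, leaving all other layers untouched.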

going-durden on Bitter lessons about lucid dreaming

This might not always be beneficial. Lucid dreaming also means you remember much more from your dreams, which can extend the lifespan of your recurring nightmares. Not to mention, if you dream lucidly, your consciousness is not resting, and intrusive thoughts will pile up.