LessWrong 2.0 Reader

Switching to a 4GB SD
jefftk (jkaufman) · 2024-09-23T11:20:05.432Z · comments (1)

Can startups be impactful in AI safety?
Esben Kran (esben-kran) · 2024-09-13T19:00:33.306Z · comments (0)

[question] Does life actually locally *increase* entropy?
tailcalled · 2024-09-16T20:30:33.148Z · answers+comments (27)

Keyboard Gremlins
jefftk (jkaufman) · 2024-09-20T02:30:07.140Z · comments (0)

[link] How harmful is music, really?
dkl9 · 2024-09-17T14:53:25.426Z · comments (6)

A Policy Proposal
phdead · 2024-09-29T20:45:34.745Z · comments (4)

Contextual Constitutional AI
aksh-n · 2024-09-28T23:24:43.529Z · comments (0)

[link] When to join a respectability cascade
B Jacobs (Bob Jacobs) · 2024-09-24T07:54:16.051Z · comments (1)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[question] I want a good multi-LLM API-powered chatbot
rotatingpaguro · 2024-09-08T09:40:52.736Z · answers+comments (3)

The Other Existential Crisis
James Stephen Brown (james-brown) · 2024-09-21T01:16:38.011Z · comments (24)

Just How Good Are Modern Chess Computers?
nem · 2024-09-19T18:57:21.254Z · comments (1)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

Electric Mandola
jefftk (jkaufman) · 2024-09-21T13:40:04.772Z · comments (0)

Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)

Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)

[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (9)

[question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal
doomyeser · 2024-09-11T18:07:19.385Z · answers+comments (9)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

Will AI and Humanity Go to War?
Simon Goldstein (simon-goldstein) · 2024-10-01T06:35:22.374Z · comments (4)

[link] AISafety.info: What are Inductive Biases?
Algon · 2024-09-19T17:26:24.581Z · comments (4)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

[link] Comparing Forecasting Track Records for AI Benchmarking and Beyond
ChristianWilliams · 2024-09-25T21:01:15.975Z · comments (0)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (7)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Grounding self-reference paradoxes in reality
Fiora from Rosebloom · 2024-09-29T05:50:30.559Z · comments (3)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

Recent comments

taras-kutsyk on Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?

Thanks! We'll take a closer look at these when we decide to extend our results for more models.

taras-kutsyk on Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?

Let me make sure I understand your idea correctly:

1. We use a separate single-layer model (analogous to the SAE encoder) to predict the SAE feature activations.
2. We train this model on the SAE activations of the finetuned model (assuming that the SAE wasn't finetuned on the finetuned model activations?).
3. We then use this model to determine "what direction most closely maps to the activation pattern across input sequences, and how well it maps".

I'm most unsure about the 2nd step - how we train this feature-activation model. If we train it on the base SAE activations in the finetuned model, I'm afraid we'll just train it on extremely noisy data, because feature activations essentially do not mean the same thing, unless your SAE has been finetuned to appropriately reconstruct the finetuned model activations. (And if we finetune it, we might just as well use the SAE and feature-universality techniques I outlined without needing a separate model).
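
The noise concern can be illustrated with a minimal sketch. It is not the method proposed in the thread: it swaps the single-layer encoder-style model for a plain linear probe fit by gradient descent, and every name and number below (`activations`, `feature_direction`, the noise scale) is a hypothetical stand-in on synthetic data. It fits one direction to clean targets and one to heavily noised targets, then reports how closely each recovered direction matches the true one and how well it maps.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def fit_direction(xs, ys, lr=0.1, steps=300):
    """Fit w by gradient descent on the mean squared error of dot(w, x) vs y."""
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = dot(w, x) - y
            for k in range(dim):
                grad[k] += 2.0 * err * x[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def r_squared(xs, ys, w):
    """Fraction of target variance explained by the fitted direction."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - dot(w, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
dim = 4
# Hypothetical stand-ins for finetuned-model activations and one feature direction.
activations = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
feature_direction = [1.0, -0.5, 0.0, 2.0]

# Clean targets: the feature means the same thing in the model we train on.
clean_targets = [dot(feature_direction, x) for x in activations]
# Noisy targets: an assumed proxy for base-SAE activations read off a finetuned model.
noisy_targets = [y + rng.gauss(0.0, 5.0) for y in clean_targets]

w_clean = fit_direction(activations, clean_targets)
w_noisy = fit_direction(activations, noisy_targets)

print("clean: cosine", cosine(w_clean, feature_direction),
      "R^2", r_squared(activations, clean_targets, w_clean))
print("noisy: cosine", cosine(w_noisy, feature_direction),
      "R^2", r_squared(activations, noisy_targets, w_noisy))
```

In this toy setup the "how well it maps" score drops sharply once the targets are noisy, even though a direction is still returned, which is the failure mode the comment worries about.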

bohaska on Open Thread Fall 2024

If spaced repetition is the most efficient way of remembering information, why do people who learn a musical instrument practice every day instead of adhering to a spaced repetition schedule?

pazzaz on Any Trump Supporters Want to Dialogue?

I think you are missing something. The lawsuits were fine, though maybe a little silly, as most of them were thrown out for lack of standing. I'm thinking more of the "fake elector plot", where Trump pressured Mike Pence to certify fake electors on Jan 6 (as Pence said: "choose between [Trump] and the constitution"). I think trying to execute that plan was wrong, because if they had succeeded, Trump would have stolen the election.

And Trump may not have supported everything the J6 rioters did, but he was the reason that they were there. He said that the election was stolen. He said that it "allows for the termination of all rules, regulations, and articles, even those found in the Constitution". On Jan 6 he called on them to pressure Mike Pence and other lawmakers to go through with his plan: to steal the election.

deepthoughtlife on Any Trump Supporters Want to Dialogue?

I don't think people believe that asking the legal system to rule on whether the laws were properly followed is somehow disqualifying, so unless I am mistaken about what they are claiming, it didn't happen in any meaningful way.

The media has intentionally misrepresented this. He believed there was cheating from the other side, and said so. He used the normal methods to complain about that, and the normal lawsuits about it to get the court to rule on the matter. It's all very normal. When the courts decided to not consider the matter (which was itself improper since they generally did that without considering the actual merits of the cases, generally claiming that it was somehow moot because the election was already over) he did nothing and just let his opponent become president (while continuing to vociferously complain).

Both Al Gore and Hillary Clinton complained about the result roughly as much as Trump did. (Since I think both of them are terrible, that is a negative comparison, and I dislike that Trump matched them, but it isn't disqualifying.) You could actually argue it was his job to make these lawsuits (to see that the federal election was properly executed). Coming up with who the electors would be if the lawsuit changed the results is normal (and not at all new). There was no attempt to go outside the legal system.

In all likelihood there was a nonzero amount of cheating (but we don't actually know if it was favoring Biden, Republicans can cheat too) but I doubt there was any major conspiracy of it. I expect there is always some cheating by both sides, and we should try to reduce it, though I have no opinion on exactly how much or little there is. There were enough anomalies that investigating it would have made sense, if only to prevent them in the future.

The J6 riots were just normal riots, on a small scale, that Trump didn't support at all and were nothing approaching a coup at all. Congress was not in any real danger, and there were no plans to take over the government. While I strongly disapprove of riots, this has been dramatically overplayed for purely partisan purposes by people who want to tar him with it. The people who support actual riots for political purposes are his opponents, not him.

gunnar_zarncke on Consciousness As Recursive Reflections

Does LessWrong need link posts for astralcodexten?

Not in general, no.

Aren't LessWrong readers already pretty aware of Scott's substack?

I would be surprised if the overlap is > 50%

I'm linkposting it because I think this fits into a larger pattern of understanding cognition that will play an important role in AI safety and AI ethics.

crissman on Crissman's Shortform

Thanks for the comments. You're right that "will not extend your life" is too strong. I revised it to "is unlikely to significantly extend your life." Given the impact of other factors on longevity (strength training: 25%, aerobic exercise: 37%, walking 12k steps: 65%, 20g nuts daily: 15%), I do feel the reduction in all-cause mortality from weight loss shouldn't be the top priority.

pazzaz on Any Trump Supporters Want to Dialogue?

People often say one of the reasons they won't vote for Trump is his attempts to overturn the results of the 2020 election. What is your view on that?

deepthoughtlife on Any Trump Supporters Want to Dialogue?

I'm a lifelong independent centrist who leans clearly Republican in voting despite my nature. I actually mostly read Dem or Dem leaning sources for the majority of my political reading (though it isn't super skewed and I do make sure to seek out both sides). I would definitely vote for a Democrat that seemed like a good candidate with good policies (and have at the state level). I believe it is my duty as an American citizen to know a lot more about politics than I would prefer. (I kind of hate politics.)
I really wished there was a valid third option in 2016, but unfortunately I couldn't even find a third party candidate that seemed better than Trump even then. Hillary was a truly abysmal candidate. That isn't actually enough to get me to vote for someone, and I would rather do a protest vote than vote for someone that would be a bad choice. In the end, I only decided to vote for Trump two weeks before the election instead of a protest vote.
Due to preexisting animus, I probably would have ignored Trump's actual words and deeds on the 2016 campaign trail if the media hadn't constantly lied about them, but the things he said were both much truer than claimed, and actually made a lot of sense. The media has never stopped lying about Trump since then, but they just shredded their credibility with me and many others. I don't recall the sources, but unless I am remembering incorrectly (which is always a possibility), this lying campaign against Trump was directly suggested through op-eds in major newspapers, and then implemented. You can probably make good points against Trump, but no one seems to actually complain about things that are both true and actually bad? (I am very pedantic about truth, and find that I'm not interested in listening to people who twist things and then pretend they are true.)
I deeply want to change and improve things, and chafe at the idea of being restricted to some old formula for life but I've been forced to realize that conservatism is necessary. I am very much a centrist in terms of ideology, but in the current state of America, that means being very open to conservative ways and values. Understanding why what you are changing is the way it is, and being careful with the changes are both very necessary; since most things are actually pretty well tuned, incautious changes usually make things much worse.
The extremely obvious reason why people support Trump is because he was a good and effective president (in comparison to other presidents, which is a low bar, I know). President is a very difficult job where people mostly screw up, and much of it is ceremonial, but Trump had a large number of successes compared to what I expected going in. The state of the country was clearly improved by his actions, and would be again. The country and world situation got much worse under his successor, Joe Biden (in a way that also mirrored the failures of Trump's predecessor).
I've had a strong personal distaste for Trump for decades, but he was either the best president in my lifetime or very near. I'm loyal to America, so I've been forced to upgrade my opinion of him dramatically. I still wouldn't like to hang out with him, and wouldn't encourage others to do so, but that isn't the point of selecting a president. It's for the good of the country.
He's actually a centrist; there is a reason he was comfortable as a Democrat before, and as a Republican now. None of his personal positions are extreme, and though he's willing to work with people more extreme than him, and it tends to be to the right, the only reason he worked primarily with Republicans is because the Democrats were busy trying to score political points instead of advancing their policies. Since Trump actually wants things, people can work with him if they choose to, and they don't have to do anything extreme to do so.
His opponents are pretty unimpressive at best. Kamala Harris was a terrible state politician (I'm Californian and saw what she did in my state); either corrupt as hell or incompetent as hell (likely both), though I don't have particular evidence at hand of it. She was a completely ineffectual VP. Her VP choice is deeply unimpressive. She obviously helped cover up Biden's decline into clearly being an unfit president. Her ideas are either stolen from Trump's campaign or incredibly harmful. Almost no one actually expects her to be a good president? Just a few months ago even the Democrats thought she was completely incompetent, and she was only selected for contingent reasons that had nothing to do with her quality as a candidate.
Additionally, the Democrats have gone too far, and the Republicans need to be given another turn. As an independent, I wouldn't want a particular party to grow too powerful, and the Democrats are in a much stronger overall position in terms of controlling the country outside of politics. If Trump wins, the Democrats will still be in a strong position and have a lot of control over the next four years, while if he loses, the Democrats might get away with going completely overboard like they have been trying to.
Some short points: He's not the most accurate speaker, and really doesn't care to be (which rubs me the wrong way), but he means what he says, and actually tries to follow through. I actually think he lies less than the standard politician, which is, I admit, mostly an indictment of his fellow politicians. He's the only president I know of to reduce regulation. He appointed very capable judges whose legal reasoning seems pretty good (even when I disagree with them). He was willing to work with the opposition, but had genuine goals for the good of the country rather than just being political. I know what I'm getting with him at this point (Trump is who he is). Trump will be term-limited, so the next election would be an even fight between the Dems and Reps.
If you want to respond to my post, I'm open to pushback, though I don't know how much I would say since I am a long-term lurker who rarely comments (mostly in bursts). I would prefer responding to things much shorter than my post here, I admit. I'm not happy with how long this is, but I tend to be long-winded on each point in a discussion and have to consciously dial it back. Since each part of my reply would likely be long, it would be difficult to respond to something this length personally. I would prefer to talk in general, rather than get bogged down in details that are not actually important to how people actually view the situation. That said, important details are obviously important to talk about.

jonathan-kutasov on Interpretability of SAE Features Representing Check in ChessGPT

Thanks for the suggestion! This sounds pretty cool and I think would be worth trying.

One thing that might make this a bit tricky is finding the right subset of the data to feed into Claude. Each feature only fires very rarely so it can be easy to fool yourself into thinking that you found a good classifier when you haven’t.

For example, many of the features we found only fire when they see check. However, many cases of check don’t activate the feature. The problem we ran into is that check is such an infrequent occurrence that you can only get a good number of samples showing check by taking a ton of examples overall, or by upweighting the check class in your sampling.

So if we show Claude all the examples where a feature fired and then some equal number of randomly chosen examples where it didn’t, chances are that just using “is in check” will be a great classifier. I think we can get around this by prompting Claude to find as many restrictions as possible, but it is an interesting issue that might come up.
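
The sampling pitfall above can be sketched on synthetic data. Everything here is an assumption for illustration, not our actual dataset: the `Position` type, the check frequency (1 in 50 positions), and the rule that the feature fires on every fourth check. The sketch scores the trivial "is in check" classifier on a balanced sample of firing plus random non-firing positions, and then on check positions only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    in_check: bool
    feature_fired: bool


def make_positions(n_total=10_000, check_every=50, fire_every_nth_check=4):
    """Hypothetical data: check is rare, and the feature fires on only some checks."""
    positions = []
    checks_seen = 0
    for i in range(n_total):
        in_check = i % check_every == 0
        fired = in_check and checks_seen % fire_every_nth_check == 0
        if in_check:
            checks_seen += 1
        positions.append(Position(in_check, fired))
    return positions


def accuracy(sample, predict):
    """Fraction of positions where the prediction matches whether the feature fired."""
    return sum(predict(p) == p.feature_fired for p in sample) / len(sample)


def predicts_check(p):
    # The trivial classifier: guess that the feature fires whenever the king is in check.
    return p.in_check


positions = make_positions()
fired = [p for p in positions if p.feature_fired]
not_fired = [p for p in positions if not p.feature_fired]

# Balanced sample: every firing example plus an equal number of random non-firing ones.
rng = random.Random(0)
balanced_sample = fired + rng.sample(not_fired, len(fired))

# Harder sample: only positions in check, where the feature's extra restrictions matter.
check_only = [p for p in positions if p.in_check]

balanced_accuracy = accuracy(balanced_sample, predicts_check)
check_only_accuracy = accuracy(check_only, predicts_check)
print("balanced sample accuracy:", balanced_accuracy)
print("check-only accuracy:", check_only_accuracy)
```

By construction the feature fires on a quarter of the check positions, so the trivial classifier scores 0.25 on the check-only sample while looking far better on the balanced one. Upweighting the check class in the evaluation set is what exposes the gap.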