LessWrong 2.0 Reader
I think there are approximately zero people actively trying to take actions which, according to their own world model, are likely to lead to the destruction of the world. As such, I think it's probably helpful on the margin to publish stuff of the form "model internals are surprisingly interpretable, and if you want to know if your language model is plotting to overthrow humanity there will probably be tells, here's where you might want to look". More generally "you can and should get better at figuring out what's going on inside models, rather than treating them as black boxes" is probably a good norm to have.
I could see the argument against, for example if you think "LLMs are a dead end on the path to AGI, so the only impact of improvements to their robustness is increasing their usefulness at helping to design the recursively self-improving GOFAI that will ultimately end up taking over the world" or "there exists some group of alignment researchers that is on track to solve both capabilities and alignment such that they can take over the world and prevent anyone else from ending it" or even "people who think about alignment are likely to have unusually strong insights about capabilities, relative to people who think mostly about capabilities".
I'm not aware of any arguments that alignment researchers specifically should refrain from publishing that don't have some pretty specific upstream assumptions like the above though.
elizabeth-1 on Elizabeth's Shortform
All of the problems you list seem harder with repeated within-person trials.
redman on How would you navigate a severe financial emergency with no help or resources?
Aella has written a bunch on camgirling, including questions to ask yourself about suitability. The advice is probably applicable to twitch streaming or tiktok video creation too.
Content or product creation online and sales has never been easier, but it's hard work with no guarantee of payoff.
There is a lot written/on youtube about retail arbitrage, if you have stores nearby you might be able to do that.
Fully remote entry-level call center/sales jobs are pretty much always hiring, though they're pretty demanding. Staffing agencies can potentially set you up; be ready to do things like convince elderly people to give to a charity.
Longer term, professional certifications in healthcare or IT can usually make a big difference in someone's life.
I'm guessing the funding environment for entrepreneurs isn't great right now, but something is always happening somewhere.
Amazon Mechanical Turk used to be a decent way of doing boring work for a little money.
Free money from the government is a thing, but services for people who don't have kids are few and far between.
Starting from zero today is hard, but the best thing you can do is get out there and start trying stuff. You don't have any opportunity cost for trying things, which isn't true for a lot of people. You can go into the unknown knowing that the alternative (your present situation) is complete crap.
Good luck!
sharmake-farah on Please stop publishing ideas/insights/research about AI
Privacy of communities isn't a solvable problem in general: as soon as your community is large enough to compete with the adversary, it's large and conspicuous enough that the adversary will pay attention to it, send in spies, and extract leaks.
I disagree with this in theory as a long-term concern, but in practice, yes: the methods for giving communities privacy haven't been implemented or tested at all. I agree with the general sentiment that protecting secrets isn't worth the steep drawbacks of privacy, which unfortunately makes me dislike the post for the strength of its recommendations.
So while I could in theory disagree with you, in practice right now I mostly have to agree with the comment that there will not be such an infrastructure for private alignment ideas.
Also to touch on something here that isn't too relevant and could be considered a tangent:
If your acceptable lower limit for basically anything is zero, you won't be allowed to do anything, really anything.
This is why perfectionism is such a bad thing, and why you need to be able to accept that failure happens. You cannot have 0 failures IRL.
chakshu-mira on Ophiology (or, how the Mamba architecture works)

    ## Discretize B ##
    #    [B,N]     [E->N]  [B,E]
    B = layer.W_B(x[b,l])  # no bias
Shouldn't this be x[:,l] instead of x[b,l]?
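The shape mismatch the commenter is pointing at can be checked with a small sketch. The dimension names (batch B, sequence length L, embedding E, state N) follow the annotations in the quoted code; the concrete sizes and the use of a plain matrix in place of `layer.W_B` are assumptions for illustration:

```python
import numpy as np

Bsz, L, E, N = 4, 16, 32, 8        # assumed batch, seq len, embed, state dims
x = np.random.randn(Bsz, L, E)     # input activations
W_B = np.random.randn(E, N)        # stand-in for layer.W_B (E -> N, no bias)

l = 0
# x[b, l] selects a single batch element: shape [E], so the result is [N]
single = x[0, l] @ W_B
assert single.shape == (N,)

# x[:, l] keeps the batch dimension: shape [B, E], so the result is [B, N],
# which is what the "[B,N]" annotation on the quoted line claims
batched = x[:, l] @ W_B
assert batched.shape == (Bsz, N)
```

Only the `x[:, l]` form produces the `[B, N]` output the comment annotation describes, which supports the commenter's suggested fix.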
mako-yass on Please stop publishing ideas/insights/research about AI
There never will be an infrastructure for this.
I should be less resolute about this. It would kind of be my job to look for a design that could do it.
One thing we've never seen is a system where read receipts are tracked and analyzed at the global level, and where read permissions are suspended and alerts sent to admins if an account is doing too many unjustified reads.
This would prevent a small number of spies from extracting a large number of documents.
I suppose we could implement that today.
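The policy described above could be sketched roughly as follows. Everything here is hypothetical: the class name, the daily read limit, and the admin-alert hook are placeholders, not part of any existing system:

```python
from collections import defaultdict


class ReadAuditor:
    """Tracks per-account document reads and suspends accounts that read
    far more documents than they can justify (hypothetical policy sketch)."""

    def __init__(self, daily_limit=20):
        self.daily_limit = daily_limit   # assumed threshold for one day
        self.reads = defaultdict(set)    # account -> doc ids read today
        self.suspended = set()

    def record_read(self, account, doc_id, justified=False):
        if account in self.suspended:
            return "denied"              # permissions already revoked
        if not justified:
            self.reads[account].add(doc_id)
        if len(self.reads[account]) > self.daily_limit:
            self.suspended.add(account)
            self.alert_admins(account)
            return "suspended"
        return "ok"

    def alert_admins(self, account):
        # stand-in for a real notification channel (email, pager, etc.)
        print(f"ALERT: {account} exceeded the unjustified-read limit")
```

With a limit of 2, a third unjustified read triggers suspension and every later read is denied, which is the property that would stop a small number of spies from bulk-extracting documents.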
If there are actual crimes going on, I'd imagine the police should be called.
If a student is genuinely acting in bad faith - attending a class and ruining it for their peers - then they should be removed from the class and sent to a counselor/social worker.
Otherwise, "disruptive" is a difficult thing to pin down when there's no actual instruction to interrupt.
linda-linsefors on LessWrong Community Weekend 2024 [Applications Open]
The EA SummerCamp takes place the next weekend
I've not been to any of these, but would like to. Is there any info up yet for this year's EA SummerCamp?
strictly weaker
Add "than Condorcet" to this sentence, since it's only implied but not said.
review-bot on the QACI alignment plan: table of contents
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?