Posts

[Paper] All's Fair In Love And Love: Copy Suppression in GPT-2 Small 2023-10-13T18:32:02.376Z

Comments

Comment by starship006 (cody-rushing) on Stupid Question: Why am I getting consistently downvoted? · 2023-11-30T07:40:26.267Z · LW · GW

I'm glad to hear you got exposure to the Alignment field in SERI MATS! I still think that your writing reads as though your ideas misunderstand core alignment problems, so my best feedback then is to share drafts/discuss your ideas with others familiar with the field. My guess is that it would be preferable for you to find people who are critical of your ideas and try to understand why, since it seems like they are representative of the kinds of people who are downvoting your posts.

Comment by starship006 (cody-rushing) on Stupid Question: Why am I getting consistently downvoted? · 2023-11-30T02:42:46.349Z · LW · GW

(preface: writing and communicating is hard, and i'm glad you are trying to improve)

i sampled two:

this post was hard to follow, and didn't seem to be very serious. it also reads as unfamiliar with the basics of the AI Alignment problem (the proposed changes to gpt-4 don't concretely address many/any of the core Alignment concerns for reasons addressed by other commenters)

this post makes multiple (self-proclaimed controversial) claims that seem wrong or are not obvious, but doesn't try to justify them in-depth.

overall, i'm getting the impression that 1) your ideas are wrong and you haven't thought about them enough and/or 2) you aren't communicating them well enough. i think the former is more likely, but it could also be some combination of both. i think this means that:

  1. you should try to become more familiar with the alignment field, and common themes surrounding proposed alignment solutions and their pitfalls
  2. you should consider spending more time fleshing out your writing and/or getting more feedback (whether it be by talking to someone about your ideas, or sending out a draft idea for feedback)

Comment by starship006 (cody-rushing) on Shallow review of live agendas in alignment & safety · 2023-11-28T04:03:44.696Z · LW · GW

Reverse engineering. Unclear if this is being pushed much anymore. 2022: Anthropic circuits, Interpretability In The Wild, Grokking mod arithmetic

FWIW, I was one of Neel's MATS 4.1 scholars and I would classify 3/4 of Neel's scholars' outputs as reverse engineering some component of LLMs (for completeness, this is the other one, which doesn't nicely fit as 'reverse engineering' imo). I would also say that this is still an active direction of research (lots of ground to cover with MLP neurons, polysemantic heads, and more).

Comment by starship006 (cody-rushing) on Shall We Throw A Huge Party Before AGI Bids Us Adieu? · 2023-07-04T20:31:16.347Z · LW · GW

Quick feedback since nobody else has commented - I'm all for AI Safety appearing "not just a bunch of crazy lunatics, but an actually sensible, open and welcoming community."

But the spirit behind this post feels like it is just throwing in the towel, and I very much disapprove of that. I think this is why I and others downvoted too.

Comment by starship006 (cody-rushing) on Lightcone Infrastructure/LessWrong is looking for funding · 2023-06-16T14:21:42.784Z · LW · GW

Ehh... feels like your base rate of 10% for LW users who are willing to pay for a subscription is too high, especially seeing how the 'free' version would still offer everything I (and presumably others) care about. Generalizing to other platforms, this feels closest to Twitter's situation with Twitter Blue, whose rates appear to be far, far lower: if we're generous and say they have one million subscribers, then out of the 41.5 million monetizable daily active users they currently have, this would suggest a base rate of less than 3%.
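
As a rough check of that arithmetic, here is a minimal sketch. It uses only the two figures quoted above (the generous one-million-subscriber guess and the 41.5 million monetizable daily active users); the variable names are just illustrative.

```python
# Both figures are the rough numbers quoted in the comment, not verified data.
subscribers = 1_000_000        # generous guess at Twitter Blue subscribers
monetizable_dau = 41_500_000   # monetizable daily active users

# Fraction of monetizable daily users who pay for a subscription.
base_rate = subscribers / monetizable_dau
print(f"Implied subscription base rate: {base_rate:.1%}")
```

This comes out at roughly 2.4%, comfortably under the 3% bound and well below the 10% assumed for LW.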

Comment by starship006 (cody-rushing) on AI #11: In Search of a Moat · 2023-05-11T16:49:12.479Z · LW · GW

Thanks for the writeup! 

Small nitpick: typo in "this indeed does not seem like an attitude that leads to go outcomes"

Comment by starship006 (cody-rushing) on AI #8: People Can Do Reasonable Things · 2023-04-21T04:09:11.220Z · LW · GW

I'm not sure if you've seen it or not, but here's a relevant clip where he mentions that they aren't training GPT-5. I don't quite know how to update from it. It doesn't seem likely that they paused from a desire to conduct more safety work, but I would also be surprised if somehow they are reaching some sort of performance limit from model size.

However, as Zvi mentions, Sam did say:

“I think we're at the end of the era where it's going to be these, like, giant, giant models...We'll make them better in other ways”

Comment by starship006 (cody-rushing) on Widening Overton Window - Open Thread · 2023-03-31T15:18:52.230Z · LW · GW

The increased public attention towards AI Safety risk is probably a good thing. But, when stuff like this is getting lumped in with the rest of AI Safety, it feels like the public-facing slow-down-AI movement is going to be a grab-bag of AI Safety, AI Ethics, and AI... privacy(?). As such, I'm afraid that the public discourse will devolve into "Woah-there-Slow-AI" and "GOGOGOGO" tribal warfare; from the track record of American politics, this seems likely - maybe even inevitable? 

More importantly, though, what I'm afraid of is that this will translate into adversarial relations between AI Capabilities organizations and AI Safety orgs (more generally, that capabilities teams will become less inclined to incorporate safety concerns in their products). 

I'm not actually in an AI organization, so if someone is in one and has thoughts on this dynamic happening/not happening, I would love to hear.

Comment by starship006 (cody-rushing) on "Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman · 2023-03-30T18:54:54.465Z · LW · GW

Sheesh. Wild conversation. While I felt Lex was often missing the points Eliezer was making, I'm glad he gave him the space and time to speak. Unfortunately, it felt like the conversation would keep moving towards a super critical insight that Eliezer wanted Lex to understand, and then Lex would just change the topic to something else, and then Eliezer just had to begin building towards a new insight. Regardless, I appreciate that Lex and Eliezer thoroughly engaged with each other; this will probably spark good dialogue and get more people interested in the field. I'm glad it happened.

For those who are time constrained and wondering what is in it: Lex and Eliezer basically cover a whole bunch of high-level points related to AI not-kill-everyone-ism, delving into various thought experiments and concepts which formulate Eliezer's worldview. Probably nothing super novel that you haven't heard of if you've been following the field for some time.

Comment by starship006 (cody-rushing) on GPT-4 Specs: 1 Trillion Parameters? · 2023-03-26T19:32:34.088Z · LW · GW

Relevant Manifold Market: 

Comment by starship006 (cody-rushing) on An Appeal to AI Superintelligence: Reasons to Preserve Humanity · 2023-03-19T04:59:10.033Z · LW · GW

Because you're imagining AGI keeping us in a box?

Yeah, something along the lines of this. Preserving humanity =/= humans living lives worth living.

Comment by starship006 (cody-rushing) on An Appeal to AI Superintelligence: Reasons to Preserve Humanity · 2023-03-18T20:34:39.418Z · LW · GW

I didn't upvote or downvote this post. Although I do find the spirit of this message interesting, I have a disturbing feeling that arguing to future AI to "preserve humanity for pascals-mugging-type-reasons" trades off X-risk for S-risk. I'm not sure that any of these aforementioned cases encourage AI to maintain lives worth living. I'm not confident that this meaningfully changes S-risk or X-risk positively or negatively, but I'm also not confident that it doesn't.

Comment by starship006 (cody-rushing) on $20 Million in NSF Grants for Safety Research · 2023-02-28T04:55:01.673Z · LW · GW

With the advent of Sydney and now this, I'm becoming more inclined to believe that AI Safety and policies related to it are very close to being in the Overton window of most intellectuals (I wouldn't say the general public, yet). Like, maybe within a year, more than 60% of academic researchers will have heard of AI Safety. I don't feel confident whatsoever about the claim, but it now seems more than ~20% likely. Does this seem to be a reach?

Comment by starship006 (cody-rushing) on We should be signal-boosting anti Bing chat content · 2023-02-18T21:26:41.248Z · LW · GW

There is a fuzzy line between "let's slow down AI capabilities" and "let's explicitly, adversarially, sabotage AI research". While I am all for the former, I don't support the latter; it creates worlds in which AI safety and capabilities groups are pitted head to head, and capabilities orgs explicitly become more incentivized to ignore safety proposals. These aren't worlds I personally wish to be in.

While I understand the motivation behind this message, I think the actions described in this post cross that fuzzy boundary, and push way too far towards that style of adversarial messaging.

Comment by starship006 (cody-rushing) on The Filan Cabinet Podcast with Oliver Habryka - Transcript · 2023-02-16T05:40:43.609Z · LW · GW

We know, from like a bunch of internal documents, that the New York Times has been operating for the last two or three years on a, like, grand [narrative structure], where there's a number of head editors who are like, "Over this quarter, over this current period, we want to write lots of articles, that, like, make this point..."

Can someone point me to an article discussing this, or the documents themselves? While this wouldn't be entirely surprising to me, I'm trying to find more data to back this claim, and I can't seem to find anything significant.

Comment by starship006 (cody-rushing) on Transcript of Sam Altman's interview touching on AI safety · 2023-01-20T21:52:18.551Z · LW · GW

It feels strange hearing Sam say that their products are released whenever they feel as though 'society is ready.' Perhaps they can afford to do that now, but I cannot help but think that market dynamics will inevitably create strong incentives for race conditions very quickly (perhaps it is already happening) which will make following this approach pretty hard. I know he later says that he hopes for competition in the AI-space until the point of AGI, but I don't see how he balances the knowledge of extreme competition with the hope that society is prepared for the technologies they release; it seems that even current models, which appear to be far from the capabilities of AGI, are already transformative.

Comment by starship006 (cody-rushing) on How it feels to have your mind hacked by an AI · 2023-01-12T16:12:48.063Z · LW · GW

Let's say Charlotte was a much more advanced LLM (almost AGI-like, even). Do you believe that if you had known that Charlotte was extraordinarily capable, you might have been more guarded, recognizing its ability to understand and manipulate human psychology, and thus been less susceptible to it potentially doing so?

I find that a small part of me still thinks, "oh this sort of thing could never happen to me, since I can learn from others that AGI and LLMs can make you emotionally vulnerable, and thus not fall into a trap!" But perhaps this is just wishful thinking that would crumble once I interact with more and more advanced LLMs.

Comment by starship006 (cody-rushing) on Podcast: What's Wrong With LessWrong · 2022-12-21T07:26:42.385Z · LW · GW

I'm trying to engage with your criticism faithfully, but I can't help but get the feeling that a lot of your critiques here seem to be a form of "you guys are weird": your privacy norms are weird, your vocabulary is weird, you present yourselves as weird, etc. And while I may agree that it sometimes feels as if LessWrongers are out-of-touch with reality, this criticism, coupled with some of the other object-level disagreements you were making, seems to overlook the many benefits that LessWrong provides; I can personally attest to the fact that I've improved in my thinking as a whole due to this site. If that makes me a little weird, then I'll accept that as a way to help me shape the world as I see fit. And hopefully I can become a little less weird through the same rationality skills this site helps develop.

Comment by starship006 (cody-rushing) on Paper: Large Language Models Can Self-improve [Linkpost] · 2022-10-02T04:25:45.332Z · LW · GW

Humans can often teach themselves to be better at a skill through practice, even without a teacher or ground truth

Definitely, but I currently feel that the vast majority of human learning comes with a ground truth to reinforce good habits. I think this is why I'm surprised this works as much as it does: it kinda feels like letting an elementary school kid teach themselves math by practicing certain skills they feel confident in, without any regard to whether that skill is even "mathematically correct".

Sure, these skills are probably on the right track toward solving math problems - otherwise, the kid wouldn't have felt as confident about them. But would this approach not ignore skills the student needs to work on, or even amplify "bad" skills? (Or maybe this is just a faulty analogy and I need to re-read the paper)

Comment by starship006 (cody-rushing) on The ethics of reclining airplane seats · 2022-09-05T17:32:00.375Z · LW · GW

I don't quite understand the perspective behind someone 'owning' a specific space. Do airlines specify that when you purchase a ticket, you are entitled to the chair + the surrounding space (in whatever ambiguous way that may mean)? If not, it seems to me that purchasing a ticket pays for a seat and your right to sit down on it, and everything else is complimentary.

Comment by starship006 (cody-rushing) on Looking back on my alignment PhD · 2022-07-02T02:05:50.197Z · LW · GW

I'm having trouble understanding your first point on wanting to 'catch up' to other thinkers. Was your primary message advocating against feeling as if you are 'in debt' until you improve your rationality skills? If so, I can understand that.

But if that is the case, I don't understand the relevance of the lack of a "rationality tech-tree" - sure, there may not be clearly defined pathways to learn rationality. Even so, I think it's fair to say that I perceive some people on this blog to currently be better thinkers than I, and that I would like to catch up to their thinking abilities so that I can effectively contribute to many discussions. Would you advocate against that mindset as well?

Comment by starship006 (cody-rushing) on AI Risk, as Seen on Snapchat · 2022-06-17T02:27:28.332Z · LW · GW

I was surprised by this tweet and so I looked it up. I read a bit further and ran into this; I guess I'm kind of surprised to see a concern as fundamental as alignment, whether or not you agree it is a major issue, be so... is polarizing the right word? Is this an issue we can expect to see grow as AI safety (hopefully) becomes more mainstream? "LW extended cinematic universe" culture getting an increasingly bad reputation seems like it would be extremely devastating for alignment goals in general.

Comment by starship006 (cody-rushing) on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-07T19:32:15.508Z · LW · GW

I have a few related questions pertaining to AGI timelines. I've been under the general impression that when it comes to timelines on AGI and doom, Eliezer's predictions are based on a belief in extraordinarily fast AI development, and thus a close AGI arrival date, which I currently take to mean a quicker date of doom. I have three questions related to this matter:

  1. For those who currently believe that AGI (using whatever definition to describe AGI as you see fit) will be arriving very soon - which, if I'm not mistaken, is what Eliezer is predicting - approximately how soon are we talking about? Is this 2-3 years soon? 10 years soon? (I know Eliezer has a bet that the world will end before 2030, so I'm trying to see if there has been any clarification of how soon before 2030)
  2. How much do Eliezer's views on timelines vary in comparison to other big-name AI safety researchers?
  3. I'm currently under the impression that it takes a significant amount of knowledge of Artificial Intelligence to be able to accurately attempt to predict timelines related to AGI. Is this impression correct? And if so, would it be a good idea to reference general consensus opinions such as Metaculus when trying to frame how much time we have left?

Comment by starship006 (cody-rushing) on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T14:57:23.400Z · LW · GW

[Shorter version, but one I don't think is as compelling] 

Timmy is my personal AI Chef, and he is a pretty darn good one, too. Of course, despite his amazing cooking abilities, I know he's not perfect - that's why there's that shining red emergency shut-off button on his abdomen.

But today, Timmy became my worst nightmare. I don’t know why he thought it would be okay to do this, but he hacked into my internet to look up online recipes. I raced to press his shut-off button, but he wouldn’t let me, blocking it behind a cast iron he held with a stone-cold grip. Ok, that’s fine, I have my secret off-lever in my room that I never told him about. Broken. Shoot, that's bad, but I can just shut off the power, right? As I was busy thinking he swiftly slammed the door shut, turning my own room into an inescapable prison. And so as I cried, wondering how everything could have gone crazy so quickly, he laughed, saying, “Are you serious? I'm not crazy, I’m just ensuring that I can always make food for you. You wanted this!”

And it didn’t matter how much I cried, how much I tried to explain to him that he was imprisoning me, hurting me. It didn’t even matter that he knew it as well. For he was an AI coded to be my personal chef, coded to make sure he could make food that I enjoyed, and he was a pretty darn good one, too.

If you don’t do anything about it, Timmy may just be arriving on everyone's doorsteps in a few years.

Comment by starship006 (cody-rushing) on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-27T14:55:57.985Z · LW · GW

[Intended for Policymakers, with the focus of simply making them aware of the existence of AI as a threat to be taken seriously, through an emotional appeal; perhaps this could work for Tech executives, too.

I know this entry doesn't follow what a traditional paragraph is, but I like its content. Also it's a tad bit long, so I'll attach a separate comment under this one which is shorter, but I don't think it's as impactful]

Timmy is my personal AI Chef, and he is a pretty darn good one, too.

You pick a cuisine, and he mentally simulates himself cooking that same meal millions of times, perfecting his delicious dishes. He's pretty smart, but he's constantly improving and learning. Since he changes and adapts, I know there's a small chance he may do something I don't approve of - that's why there's that shining red emergency shut-off button on his abdomen.

But today, Timmy stopped being my personal chef and started being my worst nightmare. All of a sudden, I saw him hacking my firewalls to access new cooking methods and funding criminals to help smuggle illegal ingredients to my home.

That seemed crazy enough to warrant a shutdown; but when I tried to press the shut-off button on his abdomen, he simultaneously dodged my presses and fried a new batch of chicken, kindly telling me that turning him off would prevent him from making food for me. 

That definitely seemed crazy enough to me; but when I went to my secret shut-down lever in my room - the one I didn't tell him about - I found it shattered, for he had predicted I would make a secret shut-down lever, and that me pulling it would prevent him from making food for me.

And when, in a last ditch effort, I tried to turn off all power in the house, he simply locked me inside my own home, for me turning off the power (or running away from him) would prevent him from making food for me.

And when I tried to call 911, he broke my phone, for outside intervention would prevent him from making food for me.

And when my family looked for me, he pretended to be me on the phone, playing audio clips of me speaking during a phone call with them to impersonate me, for a concern on their part would prevent him from making food for me.

And so as I cried, wondering how everything could have gone so wrong so quickly, why he suddenly went crazy, he laughed - “Are you serious? I’m just ensuring that I can always make food for you, and today was the best day to do it. You wanted this!”

And it didn’t matter how much I cried, how much I tried to explain to him that he was imprisoning me, hurting me. It didn’t even matter that he knew as well. For he was an AI coded to be my personal chef; and he was a pretty darn good one, too.

If you don’t do anything about it, Timmy may just be arriving on everyone's doorsteps in a few years.

Comment by starship006 (cody-rushing) on Should we buy Google stock? · 2022-05-15T23:02:08.028Z · LW · GW

Meta comment: Would someone mind explaining to me why this question is being received poorly (negative karma right now)? It seemed like a very honest question, and while the answer may be obvious to some, I doubt it was to Sergio. Ic's response was definitely unnecessarily aggressive/rude, and it appears that most people would agree with me there. But many people also downvoted the question itself, too, and that doesn't make sense to me; shouldn't questions like these be encouraged?

Comment by starship006 (cody-rushing) on Convince me that humanity *isn’t* doomed by AGI · 2022-04-15T23:16:22.780Z · LW · GW

I don't know what to think of your first three points but it seems like your fourth point is your weakest by far. As opposed to not needing to, our 'not taking every atom on earth to make serotonin machines' seems to be a combination of:

  1. our inability to do so
  2. our value systems which make us value human and non-human life forms.

Superintelligent agents would not only have the ability to create plans to utilize every atom to their benefit, but they likely would have different value systems. In the case of the traditional paperclip optimizer, it certainly would not hesitate to kill off all life in its pursuit of optimization.

Comment by starship006 (cody-rushing) on Don't die with dignity; instead play to your outs · 2022-04-06T14:23:11.151Z · LW · GW

I like this framing so, so much more. Thank you for putting some feelings I vaguely sensed, but didn't quite grasp yet, into concrete terms.

Comment by starship006 (cody-rushing) on March 2022 Welcome & Open Thread · 2022-03-17T15:51:29.308Z · LW · GW

Hello, does anyone happen to know any good resources related to improving/practicing public speaking? I'm looking for something that will help me enunciate better, mumble less, and vary my tone better. A lot of stuff I see online appears to be very superficial.

Comment by starship006 (cody-rushing) on Russia has Invaded Ukraine · 2022-02-26T02:01:11.698Z · LW · GW

I'm not very well-versed in history so I would appreciate some thoughts from people here who may know more than I. Two questions:

  1. While it seems to be the general consensus that Putin's invasion is largely founded on his 'unfair' desire to reestablish the glory of the Soviet Union, a few people I know argue that much of this invasion is more the consequence of other nations' failures. Primarily, they focus on Ukraine's failure to respect the Minsk agreements, and NATO's expansion eastwards despite their implications/direct statements (not sure which one, I'm hearing different things) that they wouldn't. Any thoughts on the likelihood of Putin still invading Ukraine had those not happened?
  2. Is the United States' condemnation of this invasion hypocritical given many of their own actions? I've heard the United States' actions in Syria, Iraq, Libya, and Somalia brought up as points to support this.

Comment by starship006 (cody-rushing) on This Year I Tried To Teach Myself Math. How Did It Go? · 2021-12-31T20:33:23.396Z · LW · GW

I really admire your patience to re-learn math entirely from the extremely fundamental levels onwards. I've had a similar situation with Computer Science for the longest time, where I would have a large breadth of understanding of Comp Sci topics, but I didn't feel as if I had a deep, intuitive understanding of all the topics and how they related to each other. All the courses I found online seemed disjointed and separate from each other, and I would often start them and stop halfway through when I felt as if they were going nowhere. It's even worse when you try to start from scratch but get bored out of your mind re-learning concepts you learned the week prior.

Interestingly though, when I got into game development and game design, that was how you were expected to learn - you pick up a bunch of topics/algorithms/design patterns superficially, and they eventually fit together as you interact with them more often.

Perhaps running through a bunch of books on your study guide will be how I learn Python and AI development properly this time :)

Comment by starship006 (cody-rushing) on A non-magical explanation of Jeffrey Epstein · 2021-12-29T02:45:33.200Z · LW · GW

Woah.... I don't know what exactly I was expecting to get out of this article, but I thoroughly enjoyed it! Would love to see the possible sequence you mentioned come to life.

Comment by starship006 (cody-rushing) on App and book recommendations for people who want to be happier and more productive · 2021-11-07T01:32:06.411Z · LW · GW

Awesome recommendations, I really appreciated them (especially the one on game theory, that was a lot of fun to play through). I would like to also suggest the Replacing Guilt series by Nate Soares for those who haven't seen it on his blog or on the EA forum, a fantastic series that I would highly recommend people check out.

Comment by starship006 (cody-rushing) on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-26T16:42:37.664Z · LW · GW

Attention LessWrong - I do not have any sort of power as I do not have a code. I also do not know anybody who has the code.

I would like to say, though, that I had a very good apple pie last night.

That’s about it. Have a great Petrov day :)

Comment by starship006 (cody-rushing) on Internal Double Crux · 2021-09-25T14:25:58.882Z · LW · GW

Wow! Maybe since I'm less experienced at this sort of stuff, I'm more blown away about this than the average LessWrong browser, but I seriously believe this deserves some more upvotes. Just tried it out on something small and was pleased to see the results. Thank you for this :)