Comments

Comment by Nevin Wetherill (nevin-wetherill) on Open Thread Spring 2024 · 2024-05-10T03:52:21.381Z · LW · GW

Thanks anyway :)

Also, yeah, makes sense. Hopefully this isn't a horribly misplaced thread taking up people's daily scrolling bandwidth with no commensurate payoff.

Maybe I'll just say something here to cash out my impression of the "first post" intro-message in question: its language has been valuable to my mindset while writing a post so far.

Although, I think I got a mildly misleading first impression about how serious the filter was. The first draft for a post I half-finished was a fictional explanatory dialogue involving a lot of extended metaphors... After reading that, I had the mental image of getting banned immediately with a message like "oh, c'mon, did you even read the prompt?"

Still, that partially-mistaken mental frame made me go read more documentation on the editor and take a more serious approach to planning a post. It was a bit like a very mild temperature-drop shock to read "this is like a university application."

I grok the intent, and I'm glad the community has these sorta norms. It seems likely to help my personal growth agenda on some dimensions.

Comment by Nevin Wetherill (nevin-wetherill) on Open Thread Spring 2024 · 2024-05-09T20:57:04.928Z · LW · GW

Thanks! :)

Yeah, I don't know if it's worth it to make it more accessible. I may have just failed a Google + "keyword in quotation marks" search, or failed to notice a link when searching via LessWrong's search feature.

Actually, an easy fix would just be for Google to improve their search tools, so that I can locate any public webpage, no matter how specific, just by ranting at my phone.

Anyway, thanks as well to Ben for tagging those mod-staff people.

Comment by Nevin Wetherill (nevin-wetherill) on Open Thread Spring 2024 · 2024-05-09T18:59:09.849Z · LW · GW

Hey, I'm new to LessWrong and working on a post. However, at some point the guidelines which pop up at the top of a fresh account's "new post" screen went away, and I cannot find the same language in the New Users Guide or elsewhere on the site.

Does anyone have a link to this? I recall a list of suggestions like "make the post object-level," "treat it as a submission for a university," "do not write a poetic/literary post until you've already gotten a couple object-level posts on your record."

It seems like a minor oversight if certain moderation guidelines/tips and tricks become impossible to find once you've already saved a draft or posted a comment.

I am not terribly worried about running headfirst into a moderation filter, as I can barely manage to write a comment that isn't the highest-effort explanation I can come up with - but I do want that specific piece of text for reference, and now it appears to have evaporated into the shadow realm.

Am I just missing a link that would appear if I searched something else?

(Edit: also, sorry if this is the wrong place for this, I would've tried the "intercom" feature, but I am currently on the mobile version of the site, and that feature appears to be entirely missing there - and yes, I checked my settings to make sure it wasn't "hidden")

Comment by Nevin Wetherill (nevin-wetherill) on How do top AI labs vet architecture/algorithm changes? · 2024-05-08T23:35:05.066Z · LW · GW

Thanks! It's no problem :)

Agreed that the interview is worth watching in full for those interested in the topic. I don't think it answers your question in full detail, unless I've forgotten something they said - but it is evidence.

(Edit: Dwarkesh also posts full transcripts of his interviews to his website. They aren't obviously machine-transcribed or anything, more like what you'd expect from a transcribed interview in a news publication. You'll lose some body language/tone details from the video interview, but may be worth it for some people, since most can probably read the whole thing in less time than just watching the interview at normal speed.)

Comment by Nevin Wetherill (nevin-wetherill) on How do top AI labs vet architecture/algorithm changes? · 2024-05-08T21:52:12.853Z · LW · GW

I am not an AI researcher, nor do I have direct access to any AI research processes. So, instead of submitting an answer, I am writing this in the comment section.

I have one definite, easily shareable observation. I drew a lot of inferences from it, which I will separate out so that the reader can condition their world-model on their own interpretations of whatever pieces of evidence - if any - are unshared.

The observation is this interview, in one particular segment, with the part that seems most relevant to me occurring around roughly the 40:15 timestamp.

So, in this segment Dwarkesh asks Sholto Douglas, a researcher at Google DeepMind, a sub-question in a discussion about how researchers see the feasibility of "The Intelligence Explosion."

The intent of this question seems to be to get an object-level description of the workflow of an AI researcher, in order to inform the meta-question of "how is AI going to increase the rate of AI research."

A potentially important additional detail: the other person at the table is Trenton Bricken, a "member of Technical Staff on the Mechanistic Interpretability team at Anthropic" (description according to his website).

Sholto alludes to the fact that the bulk of his work at the time of this interview is not directly relevant to the question, so he seems to be answering for the more generic case of an AI researcher.

Sholto's description of his work, excerpted from the "About" section of his blog hosted on GitHub:

I’m currently going after end to end learning for robotic manipulation because of the impact it could have on the real world, and the surface area of the problem in contact with understanding how to make agents learn and reason like we do.

I’m currently exploring whether self-supervised learning on play data and clever usage of language to align robot and human video in the same trajectory space can build models which provide a sufficient base that they can be swiftly fine-tuned to any manipulation task.

In the past, I’ve looked into hierarchial RL, energy models for planning, and seeing if we can learn a representation of visual inputs where optimal paths are by definition the shortest path through the transformed space.

In this segment of the podcast, Sholto talks about "scaling laws inference" - seemingly alluding to the fact that researchers have some compute budget to run experiments, and that there are agreed-upon desiderata for the metrics of these experiments, which can be used when selecting features for programs that will then be given much larger training runs.
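To make that phrase concrete for myself, here is a minimal sketch - entirely my own illustration, not anything described in the interview; every function, candidate name, and number here is made up - of what "fit small-run results to a scaling curve and extrapolate to decide which candidate earns a big run" might look like:

```python
# Hypothetical sketch: fit a power-law-plus-constant scaling curve to a few
# small training runs for each candidate change, then extrapolate to the
# compute budget of a real run to compare projected losses.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, a, b, c):
    # loss ~ a * compute^(-b) + c, a common functional form for such fits
    return a * np.power(compute, -b) + c

# made-up small-run results: (compute in arbitrary units, eval loss)
candidates = {
    "baseline":        ([1, 2, 4, 8, 16], [3.10, 2.85, 2.66, 2.52, 2.41]),
    "candidate_tweak": ([1, 2, 4, 8, 16], [3.05, 2.78, 2.57, 2.41, 2.29]),
}

target_compute = 1024  # the budget a full-scale run would get

for name, (compute, loss) in candidates.items():
    params, _ = curve_fit(scaling_law, compute, loss, p0=[1.0, 0.3, 2.0], maxfev=10_000)
    projected = scaling_law(target_compute, *params)
    print(f"{name}: projected loss at {target_compute} units of compute = {projected:.3f}")
```

The real selection process presumably weighs many metrics plus a lot of judgment, but this is the flavor of extrapolation I take "scaling laws inference" to be pointing at.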

How do the researchers get this compute budget? Do all researchers have some compute resources available beyond just their personal workstation hardware? What does the process look like for spinning up a small-scale training run and reporting its results?

I am unsure, but from context will draw some guesses.

Sholto mentions, in providing further context in this segment:

A lot of good research comes from working backwards from the actual problems you want to solve.

He continues to give a few sentences that seem to gesture at a part of this internal process:

There's a couple of grand problems in making the models better that you identify as issues and then work on "how can I change things to achieve this?" When you scale you also run into a bunch of things and you want to fix behaviors and issues at scale.

This seems to imply that a part of this process is receiving some 'mission' or set of 'missions' (my words, not theirs - you could say quests or tasks or assignments) - and then some group(s) of researchers propose and run small-scale tests of solutions to those.

Does this involve taking snapshots of these models at the scale where "behaviors or issues" appear and branching them to run shorter, lower compute, continuations of training/reinforcement learning?

Presumably this list of "grand problems" includes items like:

  • hallucinations
  • failures in reasoning on specific tasks
  • learning patterns which do not generalize well in new domains (learning by 'rote' instead of learning simpler underlying patterns which can be invoked usefully in a different distribution)

Possibly the "behaviors and issues" which occur "when you scale" include:

  • unforeseen differences between observed metrics and projected metrics
  • persistent failures to achieve lower loss on certain sections of the training data
  • tokens or sequences of tokens which cause degenerate behavior (not the human type) across some number of different contexts

Sholto continues:

Concretely, the barrier is a little bit of software engineering, having a code base that's large and capable enough that it can support many people doing research at the same time often makes it complex. If you're doing everything by yourself, your iteration pace is going to be much faster.

Actually operating with other people raises the complexity a lot, for natural reasons familiar to every software engineer and also the inherent running. Running and launching those experiments is easy but there's inherent slowdowns induced by that. So you often want to be parallelizing multiple different streams. You can't be totally focused on one thing necessarily. You might not have fast enough feedback cycles. And then intuiting what went wrong is actually really hard.

This seems to imply that these AI labs have put their finger on the problem that doing work in large teams/titled sub-projects introduces a lot of friction. This could be Sholto's take on the ideal way to run an AI lab, possibly informed by AI labs not actually working this way - but I presume Google DeepMind, at least, has a culture where they attempt to prevent individual researchers from grumbling a lot about organizational stuff slowing down their projects. It seems to me that Sholto is right about it being much faster to do more in "parallel" - where individual researchers can work on these sub-problems without having to organize a meeting, submit paperwork, and write memos to 3 other teams to get access to relevant pieces of their work.

The trio goes on to discuss the meta-level question, and the sections relevant to "what does AI research look like" return to being as diffuse as you might expect in a conversation where two of the three participants are AI researchers and the topics all orbit AI research.

One other particular quote may be relevant to people drawing inferences - Dwarkesh asks:

That's interesting to think about because at least the compute part is not bottlenecked on more intelligence, it's just bottlenecked on Sam's $7 trillion or whatever, right? If I gave you 10x the [TPUs] to run your experiments, how much more effective a researcher are you?

Sholto:

I think the Gemini program would probably be maybe five times faster with 10 times more compute or something like that.

Dwarkesh:

So that's pretty good. Elasticity of 0.5. Wait, that's insane.

Sholto:

I think more compute would just directly convert into progress.
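(A quick aside on the arithmetic, my own gloss rather than anything said in the interview: if 10x compute buys a 5x speedup, the log-log elasticity would be

$$\frac{\Delta \ln(\text{progress rate})}{\Delta \ln(\text{compute})} = \frac{\ln 5}{\ln 10} \approx 0.7,$$

while Dwarkesh's 0.5 is the plain ratio of the multipliers, 5/10. Either way, the striking claim is that extra compute converts into research progress at a high rate.)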

Dwarkesh goes on to ask why labs aren't reallocating some of the compute they have from running large runs/serving clients to doing experiments if this is such a massive bottleneck.

Sholto replies:

So one of the strategic decisions that every pre-training team has to make is exactly what amount of compute do you allocate to different training runs, to your research program versus scaling the last best thing that you landed on. They're all trying to arrive at an optimal point here. One of the reasons why you need to still keep training big models is that you get information there that you don't get otherwise. So scale has all these emergent properties which you want to understand better.

Remember what I said before about not being sure what's going to fall off the curve. If you keep doing research in this regime and keep on getting more and more compute efficient, you may have actually gone off the path to actually eventually scale. So you need to constantly be investing in doing big runs too, at the frontier of what you sort of expect to work.

What does this actual breakdown look like within DeepMind? Well, obviously Sholto doesn't give us details about that. If you get actual first-hand details about the allocation of compute budgets from this question, I'd be rather surprised...

Well, actually, not terribly surprised. These are modern AI labs, not Eliezer's fantasy-football AI lab from Six Dimensions Of Operational Adequacy. Someone may just DM you with a more detailed breakdown of what stuff looks like on the inside. I doubt anyone will answer publicly in a way that could be tied back to them - that would probably breach a bunch of clauses in a bunch of contracts and get them into actual serious trouble.

What do I infer from this?

Well, first, you can watch the interview and pick up the rhythm. When I've done that, I get the impression that there are some relatively independent researchers who work under the umbrella of departments which have some amount of compute budgeted to them. It seems likely to me that this compute is not budgeted as strictly as something like timeslots on orbital telescopes - such that an individual researcher can have a brilliant idea one day and just go try it, using some very small fraction of their organization's compute for a short period of time. Above a certain threshold of experiment size, though, I think you probably have to have a strong case and make that case to those involved in compute-budgeting in order to get the compute-time for experiments of that scale.

Does that level of friction around the compute available to individual researchers account for the "0.5 elasticity" Dwarkesh and Sholto were talking about? I'm not sure. Plausibly there is no "do whatever you want with this" compute budget for individual researchers beyond what they have plugged into their individual workstations. This would surprise me, I think? It seems like a dumb decision if you take the picture Sholto was sketching of how progress gets made at face value. Still, it seems like a characteristic dumb decision of large organizations - where they try really hard to have every resource expenditure accounted for ahead of time, such that intangibles like "ability to just go try stuff" get squashed by considerations like "are we utilizing all of our resources with maximum efficiency?"

Hopefully this interview and my analysis are helpful in answering this question. I could probably discuss more, but I've noticed this comment is already rather long, and my brain is telling me that further writing will likely just be meandering and hand-waving.

If more content relevant to this discussion can be mined from this interview, perhaps others will be able to iterate on my attempt and help flesh out the parts that seem easy to update our models on.

Comment by Nevin Wetherill (nevin-wetherill) on How do top AI labs vet architecture/algorithm changes? · 2024-05-08T20:27:57.660Z · LW · GW
Comment by Nevin Wetherill (nevin-wetherill) on Introducing AI-Powered Audiobooks of Rational Fiction Classics · 2024-05-06T20:40:34.540Z · LW · GW

(edit: formatting on this appears to have gone all to hell and idk how to fix it! Uh oh!)

(edit2: maybe fixed? I broke out my commentary into a second section instead of doing a spoiler section between each item on the list.)

(edit3: appears fixed for me)

Yep, I can do that legwork!

I'll add some commentary, but I'll "spoiler" it in case people don't wanna see my takes ahead of forming their own, or just general "don't spoil (your take on some of) the intended payoffs" stuff.

  1. https://www.projectlawful.com/replies/1743791#reply-1743791

  2. https://www.projectlawful.com/posts/6334 (Contains infohazards for people with certain psychologies; do not twist yourself into a weird and uncomfortable condition contemplating "Greater Reality" - notice confusion about it quickly and refocus on ideas for which you can more easily update your expectations of future experience within the universe you appear to be getting evidence about. "Sanity checks" may be important. The ability to say to yourself "this is a waste of time/effort to think about right now" may also be important.) (This is a section of Planecrash where a lot of the plot-relevant events have already taken place and are discussed, so MAJOR SPOILERS.) (This is the section that the "Negative Coalition" tweet came from.)

  3. https://www.projectlawful.com/posts/5826

  4. https://www.projectlawful.com/replies/1778998#reply-1778998

  5. https://www.projectlawful.com/replies/1743437#reply-1743437

  6. https://www.projectlawful.com/replies/1786657#reply-1786657

  7. https://www.projectlawful.com/replies/1771895#reply-1771895

  1. "No rescuer hath the rescuer. No Lord hath the champion, no mother and no father, only nothingness above." What is the right way to try to become good at the things Eliezer is good at? Why does naive imitation fail? There is a theme here, one which has corners that appear all over Eliezer's work - see Final Words for another thing I'd call a corner of this idea. What is the rest? How does the whole picture fit together? Welp. I started with writing a conversation in the form of Godel Escher Bach, or A Semitechnical Introduction to Solomonoff Induction, where a version of me was having a conversation with an internal model of Eliezer I named "Exiezer" - and used that to work my way through connecting all of those ideas in an extended metaphor about learning to craft handaxes. I may do a LessWrong post including it, if I can tie it to an sufficiently high-quality object-level discussion on education and self improvement.

  2. This is a section titled "the meeting of their minds" where Keltham and Carissa go full "secluded setting, radical honesty, total mindset dump." I think it is one of the most densely interesting parts of the book, and I think it represents a few techniques more people should try. "How do you know how smart you really are?" Well, have you ever tried writing a character smarter than you think you are doing something that requires more intelligence than you feel like you have? What would happen if you attempted that? Well, you can have all the time in the world to plan out every little detail, check over your work, list alternatives, study relevant examples/material, etc. This section has the feeling of people actually attempting to run the race they've been practicing for, using the crispest versions of the techniques they've been iterating on. Additionally, "have you ever attempted to 'meet minds' with someone? What sort of skills would you want to single out to practice? What sort of setting seems like it'd work for that?" This section shows two people working through a really serious conflict. It's a place where their values have come seriously into conflict, and yet, to get more of what they both want, they have to figure out how to cooperate. Also, they've both ended up pretty seriously damaged, and they have things they need to untangle.

  3. This is a section called "to earth with science" and... Well, how useful it is depends on how much it's going to be useful to you to think more critically about the academic/scientific institutions we have on this planet. It's very much Eliezer doing a pseudo-rant about what's broken here that echoes the tone of something like Inadequate Equilibria. The major takeaway would be something like the takeaways you get from a piece of accurate satire - the lèse majesté which shatters some of the memes handed down to you by the wiser-than-thou people who grimly say "we know it's not perfect, but it's the best we have" and expect you not to have follow-up questions about that type of assertion.

  4. This is my favorite section from "to hell with science." The entire post is a great lecture about the philosophy and practice of science, but this part in particular touches on a concept I expect to come up in more detail later regarding AIs and agency. One of the cruxes of this whole AI debate is whether you can separate out "intelligence" and "agency" - and this part provides an explanation for why that whole idea is something of a failure to conceptualize these things correctly.

  5. This is Keltham lecturing on responsibility, the design of institutions, and how to critique systems from the lens of someone like a computer programmer. This is where you get some of the juiciest takeaways about Civilization as Eliezer envisions it. The "basic sanity check" of "who is the one person responsible for this" & requisite exception handling is particularly actionable, IMO.

  6. "Learn when/where you can take quick steps and plant your feet on solid ground." There's something about feedback loops here, and the right way to start getting good at something. May not be terribly useful to a lot of people, but it stood out as a prescription for people who want to learn something. Invent a method, try to cheat, take a weird shortcut, guess. Then, check whether your results actually work. Don't go straight for "doing things properly" if you don't have to.

  7. Keltham on how to arrive at Civilization from first principles. This is one of the best lectures in the whole series, from my perspective. I like the way it's framed as a thought-experiment that I could on-board and play with in spare moments.

Hopefully some of these are interesting and useful to you, Mir, as well as to others here. There's a ton of other stuff, so I may write a follow-up with more later on if I have more time.

Comment by Nevin Wetherill (nevin-wetherill) on Why I'm not doing PauseAI · 2024-05-05T20:15:49.735Z · LW · GW

I'm pretty sure GPT-N won't be able to do it, assuming they follow the same paradigm.

I am curious if you would like to expand on this intuition? I do not share it, and it seems like one potential crux.

I would hope that saying a handful of words about synthetic data would be sufficient to move your imagination into a less certain condition regarding this assertion - but I am tempted to try something else first.

Is this actually important to your argument? I do not see how it would end up factoring into this problem, except by more quickly obviating the advances made in understanding and steering LLM behavior. What difference does it make to the question of "stop" if, instead of LLMs in a GPT wrapper, the thing that can in fact solve that task in Blender is some RNN-generating/refining action-token optimizer?

"LLMs can't do X" doesn't mean X is going to take another 50 years. The field is red hot right now. In many ways new approaches to architecture are vastly easier to iterate on than new bio-sciences work, and those move blazingly fast compared to things like high energy/nuclear/particle physics experiments - and even those sometimes outpace regulatory bodies' abilities to assess and ensure safety. The first nuclear pile got built under some bleachers on a campus.

Even if you're fully in Gary Marcus's camp on criticism of the capabilities of LLMs, his prescriptions for fixing it don't rule out another approach qualitatively similar to transformers that isn't any better for making alignment easy. There's a gap in abstract conceptualization here, where we can - apparently - make things which represent useful algorithms while not having a solid grasp on the mechanics and abstract properties of those algorithms. The upshot of pausing is that we enter into a period of time where our mastery becomes deeper and broader while the challenges we are using it to address remain crisp, constrained, and well within a highly conservative safety-margin.

How is it obvious that we are far away in time? Certain emergency options, like centralized compute resources under international monitoring, are going to be on long critical paths. If someone has a brilliant idea for [Self-Censored, To Avoid Being Called Dumb & Having That Be Actually True] and that thing destroys the world before you have all AI training happening in monitored data centers with some totally info-screened black-box fail-safes, then you end up not having a ton of "opportunity cost" compared to the counterfactual where you prevented the world from getting paper-clipped because you were willing, in that counterfactual, to ever tell anyone "no, stop" with the force of law behind it.

Seriously,

by stopping AI progress, we lose all the good stuff that AI would lead to

... that's one side of the cost-benefit analysis over counterfactuals. Hesitance over losing even many billions of dollars in profits should not stop us from preventing the end of the world.

"The average return from the urn is irrelevant if you're not allowed to play anymore!" (quote @ 1:08:10, paraphrasing Nassim Taleb)

not having any reference AI to base our safety work on

Seems like another possible crux. This seems to imply that either there has been literally no progress on real alignment up to this point, or you are making a claim about the marginal returns on alignment work before having scary-good systems.

Like, the world I think I see is one where alignment has been sorely underfunded, but even prior to the ML revolution there was good alignment de-confusion work that got done. The entire conceptual framing of "alignment," resources like Arbital's pre-2022 catalogue, "Concrete Problems in AI Safety," and a bunch of other things all seem like incremental progress towards making a world in which one could attempt to build an AI framework -> AGI instantiation -> ASI direct-causal-descendant and have that endeavor not essentially multiply human values by 0 on almost every dimension in the long run.

Why can't we continue this after liquid nitrogen gets poured onto ML until the whole thing freezes and shatters into people bickering about lost investments? Would we expect a better ratio of good/bad outcomes on our lottery prize wheel in 50 years after we've solved the "AI Pause Button Problem" and "Generalized Frameworks for Robust Corrigibility" and "Otherizing/Satisficing/Safe Maximization?" There seems to be a lot of blueprinting, rocket equation, Newtonian mechanics, astrophysics type work we can do even if people can't make 10 billion dollars 5 years from now selling GPT-6 powered products.

It's not that easy for an unassisted AI to do harm - especially existentially significant harm.

I am somewhat baffled by this intuition.

I suspect what's going on here is that the more harm something is proposed to be capable of, the less likely people think that it is.

Say you're driving fast down a highway. What do you think in the split second after seeing a garbage truck pull out in front of you while you are traveling towards it at >150 km/h of relative velocity? Say your brain could generate in that moment a totally reflectively coherent probability distribution over expected outcomes. Does the distribution put the most probability mass on the scenarios with the least harm and the least probability mass on the scenarios with the most harm? "Ah, it's fine," you think, "it would be weird if this killed me instantly, less weird if I merely had a spinal injury, and even less weird if I simply broke my nose and bruised my sternum."

The underlying mechanism - the actual causal processes involved in determining the future arrangements of atoms or the amount of reality fluid in possible Everett Branch futures grouped by similarity in features - that's what you have to pay attention to. What you find difficult to plan for, or what you observe humans having difficulty planning for, does not mean you can map that same difficulty curve onto AI. AlphaZero did not experience the process of getting better at the games it played in the same way humanity experienced that process. It did not have to spend 26 IRL years learning the game painstakingly from traditions established over hundreds of IRL years - it did not have to struggle to sleep well and eat healthy and remain clean from vices and motivated in order to stay on task and perform at its peak capacity. It didn't even need to solve the problem perfectly - like representing "life and death" robustly - in order to in reality beat the top humans and most (or all, modulo the controversy over Stockfish being tested in a suboptimal condition) of the top human engines.

It doesn't seem trivial, for a certain value of the word "trivial." Still, I don't see how this consideration gives anyone much confidence that it is qualitatively "really tough" the way getting a rocket carrying humans to Mars is tough - where you don't one day just get the right lines of code into a machine and suddenly the cipher descrambles in 30 seconds, when before it wouldn't happen no matter how many random humans you had try to guess the code or how many hours other naively written programs spent attempting to brute-force it.

Sometimes you just hit enter, kick a snowball at the top of a mountain, and 1s and 0s click away, and an avalanche comes down in a rush upon the schoolhouse 2 km below your skiing trail. The badness of the outcome didn't matter one bit to its probability of occurring in the real-world conditions in which it occurred. The outcome depended merely on the actual properties of the physical universe, and what effects descend from which causes. See Beyond The Reach of God for an excellent extended meditation on this reality.

Comment by Nevin Wetherill (nevin-wetherill) on Introducing AI-Powered Audiobooks of Rational Fiction Classics · 2024-05-05T01:03:41.248Z · LW · GW

I am not sure if this has been well enough discussed elsewhere regarding Project Lawful, but it is worth reading despite a fairly high value-of-an-hour multiplied by the huge time commitment - and the specifics of how it is written add many more elements to the "pros" side of the general pros-and-cons considerations of reading fiction.

It is also probably worth reading even if you've got a low tolerance for sexual themes - as long as that isn't so low that you'd feel injured by having to read that sorta thing.

If you've ever wondered why Eliezer describes himself as a decision theorist, this is the work that I'd say will help you understand what that concept looks like in his worldview.

I read it first in the Glowfic format, and since enough time had passed since finishing it when I found the Askwho AI audiobook version, I also started listening to that.

It was taken off of one of the sites hosting it for TOS reasons, and so I've since been following it update-to-update on Spotify.

Takeaways from both formats:

Glowfic is still superior if you have the internal motivation circuits for reading books in text. The format includes reference images for the characters in different poses/expressions to follow along with the role playing. The text often includes equations, lists of numbers, or things written on whiteboards which are hard to follow in pure audio format. There are also in-line external links for references made in the work - including things like background music to play during certain scenes.

(I recommend listening to the music anytime you see a link to a song.)

This being said, Askwho's AI audiobook is the best member of its format I've seen so far. If you have never listened to another AI-voiced audiobook, I'd almost recommend not starting with this one, because you risk not appreciating it as much as it deserves, and simultaneously you will ruin your chances of being able to happily listen to other audiobooks done with AI. This is, of course, a joke. I do recommend listening to it even if it's the first AI audiobook you'll ever listen to - it deserves being given a shot, even by someone skeptical of the concept.

I think a good compromise position with the audio version is to listen to chapters with lecture content while keeping the glowfic open in another tab, in "100 posts per page" mode, on the page containing the rough start-to-end transcript for that episode. Some of the discussion you will likely be able to follow in working memory while staring at a waiting room wall, but good luck on the heavily mathematical stuff. If you're driving and get to heavy math, it'd probably also be a good idea to have that section open on your phone so you can scroll through those parts again 10 minutes later while you're waiting for your friend to meet you out in the parking lot.

TL;DR - IMO Project Lawful is worth reading for basically everyone, despite its length and other tiny flinches from content/genre/format. The Glowfic format has major benefits, but Askwho did an extraordinarily good job at making the AI-voiced format work. You should probably have the glowfic open somewhere alongside the audiobook, since some things are going to be lost if you're trying to do it purely as an audiobook.

Comment by Nevin Wetherill (nevin-wetherill) on If you are assuming Software works well you are dead · 2024-05-04T17:26:10.168Z · LW · GW

I have been contemplating Connor Leahy's Cyborgism and what it would mean for us to improve human workflows enough that aligning AGI looks less like:

Sisyphus attempting to roll a 20 tonne version of The One Ring To Rule Them All into the caldera of Mordor while blindfolded and occasionally having to bypass vertical slopes made out of impossibility proofs that have been discussed by only 3 total mathematicians ever in the history of our species - all before Sauron destroys the world after waking up from a restless nap of an unknown length.

I think this is what you meant by "make the ultimate <programming language/text editor/window manager/file system/virtual collaborative environment/interface to GPT/...>"

Intuitively, the level I'm picturing is:

A suite of tools that can be booted up from a single icon on the home screen of a computer which then allows anyone who has decent taste in software to create essentially any program they can imagine up to a level of polish that people can't poke holes in even if you give a million reviewers 10 years of free time.

Can something at this level be accomplished?

Well, what does coding look like currently? It seems to look like a bunch of people with dark circles under their eyes reading long strings of characters in something basically equivalent to an advanced text editor, with a bunch of additional little windows of libraries and graphics and tools.

This is not a domain where human intelligence performs with as much ease as in other domains like spearfishing or bushcraft.

If you want to build Cyborgs, I am pretty sure where you start is by focusing on building software that isn't god-machines, throwing out the old book of tacit knowledge, and starting over with something that makes each step as intuitive as possible. You probably also focus way more on quality over quantity/speed.

So, plaintext instructions on what kind of software you want to build, or a code repository and a plaintext list of modifications? Like, start with an elevator pitch, see the raw/AI-generated material, then critique it in a step-by-step, organized fashion where debugging/feature-analysis checklists are generated and scored on whether they include everything you would have thought of, plus valid things you didn't think of.
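To gesture at the shape of the loop I have in mind, here is a minimal sketch - all names hypothetical, the generation and critique steps stubbed out, nothing here is an existing tool - of that "elevator pitch -> generated draft -> generated checklist -> scored critique" workflow:

```python
# Hypothetical sketch of the critique loop described above. In a real tool,
# generate_draft and generate_checklist would call whatever model or suite
# you trust; here they are stubs so the scaffolding is runnable.
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    description: str
    caught_by_human: bool  # would the human reviewer have raised this too?
    valid: bool            # does the concern actually hold up?

def generate_draft(elevator_pitch: str) -> str:
    # Stand-in for "see the raw/AI-generated material."
    return f"// draft implementation for: {elevator_pitch}\n"

def generate_checklist(draft: str) -> list[ChecklistItem]:
    # Stand-in for generating a debugging/feature-analysis checklist.
    return [
        ChecklistItem("handles empty input", caught_by_human=True, valid=True),
        ChecklistItem("race condition on concurrent saves", caught_by_human=False, valid=True),
        ChecklistItem("worries about an API the draft never uses", caught_by_human=False, valid=False),
    ]

def score_checklist(items: list[ChecklistItem]) -> dict:
    # Score on the two axes named above. A real tool would compare against
    # the human's own list; here we just count overlap and valid surprises.
    valid_items = [i for i in items if i.valid]
    return {
        "overlap_with_human_concerns": sum(i.caught_by_human for i in valid_items),
        "valid_surprises": sum(not i.caught_by_human for i in valid_items),
    }

if __name__ == "__main__":
    draft = generate_draft("a habit tracker that never nags")
    print(score_checklist(generate_checklist(draft)))
```

The scaffolding itself is trivial; all of the interesting difficulty lives in how good the checklist generation and scoring can actually be made.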

I think the point in this post is valid, though it's a bit more in the realm of "angsty shower-thought" than a ready-to-import heuristic for analysing the gap between competence-in-craft and unleashed-power-levels.

There is a bit of a bootstrapping problem with Cyborgism. I don't think you get the One Programming Suite To Rule Them All by plugging in a bunch of different LLMs fine-tuned to do one part of the process really well - then packaging the whole thing up and running it on 6 gaming GPUs. That is the level of super-software that seems in reach, and it just seems doomed to be full of really hard-to-perceive holes, like a super high-dimensional block of Swiss cheese.

Does that even get better if we squeeze the weights of LLMs to get LLeMon juice:

A Python script that does useful parts of the cool smart stuff LLMs do without all the black-box fuzzy embedding-spaces/vectors + filter the juice for pulp/seeds (flaws in the specific decoded algorithm that could cause errors via accidental or deliberate adversarial examples) + sweeten it (make the results make sense/be understandable to humans)

... then plug a bunch of that LLeMonade into a programming suite such that the whole thing works with decently competent human programmer(s) to reliably make stuff that actually just works & alert people ahead of time of the exhaustive set of actual issues/edge cases/gaps in capability of a program?

This problem does seem difficult - and probably the whole endeavor just actually won't work well enough IRL, but it seems worth trying?

Like, what does it look like to throw everything and the kitchen sink at Alignment? It probably looks at least a little like the Apollo program, and if you're doing NASA stuff properly, then you end up making some revolutionary products for everyone else.

I think those products - the random externalities of a healthy Alignment field - look more like tools that work simply and reliably, rather than the giant messy LLM hairballs AI labs keep coughing up and dumping on people.

Maybe all of this helps flesh out and make more useful the flinchy heuristic of "consumer software works terribly -> ... -> AI destroys the future."

Alignment as a field goes out ahead of the giant rolling gooey hairball of spaghetti-code Doom - untangles it and weaves it into a beautiful textile - or we are all dead.