Posts

Unnatural abstractions 2024-08-10T22:31:42.949Z
Aprillion (Peter Hozák)'s Shortform 2024-04-10T11:29:01.973Z
The Usefulness Paradigm 2022-12-26T13:23:58.722Z
Why square errors? 2022-11-26T13:40:37.318Z

Comments

Comment by Aprillion on Darwinian Traps and Existential Risks · 2024-08-26T11:55:04.894Z · LW · GW

Evolution isn't just a biological process; it's a universal optimization algorithm that applies to any type of entity

Since you don't talk about the other 3 forces of biological evolution, or about the "time evolution" concept in physics...

And since the examples seem to focus on directional selection (and not on other types of selection), and also only on short-term effect illustrations, while in fact natural selection explains most aspects of biological evolution - it's the strongest long-term force, not the weakest one (anti-cancer mechanisms and why viruses don't usually kill their host are also well explained by natural selection even if not listed as examples here; evolution by natural selection is the thing that explains ALL of those billions of years of biology in the real world - including cooperation, not just competition)...

Would it be fair to say that you use "evolution" only by analogy, not trying to build a rigorous causal relationship between what we know of biology and what we observe in sociology? There is no theory of the business cycle because of allele frequency, right?!?

Comment by Aprillion on Limitations on Formal Verification for AI Safety · 2024-08-24T09:01:18.419Z · LW · GW

If anyone here might enjoy a dystopian fiction about a world where the formal proofs will work pretty well, I wrote Unnatural abstractions

Comment by Aprillion on Unnatural abstractions · 2024-08-11T13:44:56.277Z · LW · GW

Thank you for the engagement, but "to and fro" is a real expression, not a typo (and I'm keeping it)... it's used slightly unorthodoxly here, but it sounded right to my ear, so it survived editing ¯\_(ツ)_/¯

Comment by Aprillion on Unnatural abstractions · 2024-08-10T22:50:40.153Z · LW · GW

I tried to use the technobabble in a way that's usefully wrong, so please also let me know if someone gets inspired by this short story.

I am not making predictions about the future, only commenting on the present - if you notice any factual error from that point of view, feel free to speak up, but as far as the doominess spectrum goes, it's supposed to be both too dystopian and too optimistic at the same time.

And if someone wants to fix a typo or a grammo, I'd welcome a pull request (but no commas shall be harmed in the process). 🙏

Comment by Aprillion on Inspired by: Failures in Kindness · 2024-07-28T08:21:39.476Z · LW · GW

Let me practice the volatile kindness here ... as a European, do I understand it correctly that this advice is targeted at a US audience? Or am I the only person to whom it sounds a bit fake?

Comment by Aprillion on Scalable oversight as a quantitative rather than qualitative problem · 2024-07-07T11:41:15.051Z · LW · GW

How I personally understand what it could mean to "understand an action:"

Having observed action A1 and having a bunch of (finite state machine-ish) models, each with a list of states that could lead to action A1, more accurate candidate model => more understanding. (and meta-level uncertainty about which model is right => less understanding)

Model 1            Model 2
S11 -> 50% A1      S21 -> 99% A1
    -> 50% A2          ->  1% A2

S12 -> 10% A1      S22 ->  1% A1
    -> 90% A3          -> 99% A2
                   
                   S23 -> 100% A3
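
A minimal runnable sketch of that intuition (my own toy formalization; the state names and the uniform priors are assumptions, mirroring the diagram above): score each candidate model by how well it predicts the observed action, and treat the leftover uncertainty over models as "less understanding".

# Toy sketch: "understanding" A1 = how sharply the observation pins down a model,
# assuming a uniform prior over each model's states and over the models themselves.
from math import log2

model_1 = {"S11": {"A1": 0.50, "A2": 0.50},
           "S12": {"A1": 0.10, "A3": 0.90}}
model_2 = {"S21": {"A1": 0.99, "A2": 0.01},
           "S22": {"A1": 0.01, "A2": 0.99},
           "S23": {"A3": 1.00}}

def likelihood(model, action):
    """P(action | model), averaging over the model's states (uniform prior)."""
    return sum(dist.get(action, 0.0) for dist in model.values()) / len(model)

def posterior(models, action):
    """P(model | action) by Bayes rule, uniform prior over models."""
    likes = {name: likelihood(m, action) for name, m in models.items()}
    total = sum(likes.values())
    return {name: like / total for name, like in likes.items()}

post = posterior({"Model 1": model_1, "Model 2": model_2}, "A1")
meta_uncertainty = -sum(p * log2(p) for p in post.values() if p > 0)
print(post)                                                    # which model explains A1 better
print(f"meta-level uncertainty: {meta_uncertainty:.2f} bits")  # high => less understanding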
Comment by Aprillion on LLM Generality is a Timeline Crux · 2024-06-25T10:06:36.942Z · LW · GW

Thanks for the clarification. I don't share the intuition that this will prove harder than other hard software engineering challenges in non-AI areas, which weren't solved in months but were solved in years rather than decades - but other than "the broad baseline is more significant than the narrow evidence for me," I don't have anything more concrete to share.

A note until fixed: Chollet also discusses 'unhobbling' -> Aschenbrenner also discusses 'unhobbling'

Comment by Aprillion on LLM Generality is a Timeline Crux · 2024-06-24T15:04:26.744Z · LW · GW

I agree with "Why does this matter" and with the "if ... then ..." structure of the argument.

But I don't see where you get such a high probability (>5%) of scaffolding not working... I mean, whatever ends up working can be retroactively called "scaffolding", even if it falls in the "one more major breakthrough" category - and I expect those were already accounted for in the unhobbling predictions.

a year ago many expected scaffolds like AutoGPT and BabyAGI to result in effective LLM-based agents

Do we know the base rate for how many years after the initial marketing hype of a new software technology we should expect "effective" solutions? What is the usual promise-to-delivery story for SF startups / corporate presentations around VR, the metaverse, crypto, ride sharing, apartment sharing, cybersecurity, industrial process automation, self-driving..? How much hope should we take from the communication so far that the problem is hard to solve - did we expect, before AutoGPT and BabyAGI, that the first people to share their first attempt would be successful?

Comment by Aprillion on LLM Generality is a Timeline Crux · 2024-06-24T13:42:51.412Z · LW · GW

Aschenbrenner argues that we should expect current systems to reach human-level given further scaling

In https://situational-awareness.ai/from-gpt-4-to-agi/#Unhobbling, "scaffolding" is explicitly named as a thing being worked on, so I take it that progress in scaffolding is already included in the estimate. Nothing about that estimate is "just scaling".

And AFAICT neither Chollet nor Knoop made any claims in the sense that "scaffolding outside of LLMs won't be done in the next 2 years" => what am I missing that is the source of hope for longer timelines, please?

Comment by Aprillion on My AI Model Delta Compared To Christiano · 2024-06-16T15:48:11.181Z · LW · GW

It’s a failure of ease of verification: because I don’t know what to pay attention to, I can’t easily notice the ways in which the product is bad.

Is there an opposite of the "failure of ease of verification" that would add up to 100% if you categorized the whole of reality into one of these 2 categories? Say, in a simulation, if you attributed every piece of computation to one of the following 2 categories, how much of the world could be "explained by" each category?

  • make sure stuff "works at all and is easy to verify whether it works at all"
  • stuff that works must be "potentially better in ways that are hard to verify"

Examples:

  • when you press the "K" key on your keyboard for 1000 times, it will launch nuclear missiles ~0 times and the K key will "be pressed" ~999 times
  • when your monitor shows you the pixels for a glyph of the letter "K" 1000 times, it will represent the planet Jupiter ~0 times and "there will be" the letter K ~999 times
  • in each page in your stack of books, the character U+0000 is visible ~0 times and the letter A, say ~123 times
  • tupperware was your own purchase and not gifted by a family member? I mean, for which exact feature would you pay how much more?!?
  • you can tell whether a water bottle contains potable water and not sulfuric acid
  • carpet, desk, and chair haven't spontaneously combusted (yet?)
  • the refrigerator doesn't produce any black holes
  • (flip-flops are evil and I don't want to jinx any sinks at this time)


 

Comment by Aprillion on Humming is not a free $100 bill · 2024-06-08T07:36:00.185Z · LW · GW

This leaves humming in search of a use case.

we can still hum to music, hum in (dis)agreement, hum in puzzlement, and hum the "that's interesting" sound ... without a single regard to NO or viruses, just for fun!

Comment by Aprillion on The case for stopping AI safety research · 2024-06-04T10:32:42.464Z · LW · GW

I agree with the premises (except "this is somewhat obvious to most" 🤷).

On the other hand, stopping AI safety research sounds like a proposal to go from option 1 to option 2:

  1. many people develop capabilities, some of them care about safety
  2. many people develop capabilities, none of them care about safety
Comment by Aprillion on Value Claims (In Particular) Are Usually Bullshit · 2024-06-02T14:42:32.586Z · LW · GW

half of the human genome consists of dead transposons

The "dead" part is a value judgement, right? Parts of DNA are not objectively more or less alive.

It can be a claim that some parts of DNA are "not good for you, the mind" ... well, I rather enjoy my color vision and RNA regulation, and I'm sure bacteria enjoy their antibiotic resistance.

Or maybe it's a claim that we already know everything there is to know about the phenomena called "dead transposons", there is nothing more to find out by studying the topic, so we shouldn't finance that area of research.

Is there such a thing as a claim that is not a value claim?

Is "value claims are usually bullshit" a value claim? Does the mental model pick out bullshit more reliably than to label as value claim from what you want to be bullshit? Is there a mental model behind both, thus explaining the correlation? Do I have model close enough to John's so it can be useful to me too? How do I find out?

Comment by Aprillion on Level up your spreadsheeting · 2024-05-26T15:26:27.945Z · LW · GW

Know some fancier formulas like left/mid/right, concatenate, hyperlink

Wait, I thought basic fancier formulas are like =index(.., match(.., .., 0)) 

I guess https://dev.to/aprillion/self-join-in-sheets-sql-python-and-javascript-2km4 might be a nice toy example if someone wants to practice the lessons from the companion piece 😹

Comment by Aprillion on Duct Tape security · 2024-05-12T11:24:08.409Z · LW · GW

It's duct tapes all the way down!

Comment by Aprillion on Duct Tape security · 2024-05-12T11:09:58.781Z · LW · GW

Bad: "Screw #8463 needs to be reinforced."

The best: "Book a service appointment, ask them to replace screw #8463, do a general check-up, and report all findings to the central database for all those statistical analyses that inform recalls and design improvements."

Comment by Aprillion on Dyslucksia · 2024-05-12T10:48:11.453Z · LW · GW

Oh, I should probably mention that my weakness is that I cannot remember the stuff well while reading out loud (especially when I focus on pronunciation for the benefit of listeners)... My workaround is to make pauses - it seems the stuff is in working memory and my subconscious can process it if I give it a short moment, and then I can think about it consciously too, but if I read a whole page out loud, I would have trouble even trying to summarize the content.

Similarly, a common trick for remembering names is to repeat the name out loud... that doesn't seem to improve recall for me very much; I can hear someone's name a lot of times and repeating it to myself doesn't seem to help. Perhaps seeing it written while hearing it might be better, but I'm not sure... By far the best method is when I want to write them a message and I have to scroll around until I see their picture; after that I seem to remember names just fine 😹

Comment by Aprillion on Dyslucksia · 2024-05-11T08:37:39.355Z · LW · GW

Yeah, I myself subvocalize absolutely everything and I am still horrified when I sometimes try any "fast" reading techniques - those drain all of the enjoyment out of reading for me, as if instead of characters in a story I would imagine them as p-zombies.

For non-fiction, visual-only reading cuts connections to my previous knowledge (as if the text was a wave function entangled to the rest of the universe and by observing every sentence in isolation, I would collapse it to just "one sentence" without further meaning).

I never move my lips or tongue though, I just do the voices (obviously, not just my voice ... imagine reading Dennett without Dennett's delivery, isn't that half of the experience gone? how do other people enjoy reading with most of the beauty missing?).

It's faster than physical speech for me too, usually the same speed as verbal thinking.

Comment by Aprillion on Ironing Out the Squiggles · 2024-05-03T16:16:53.615Z · LW · GW

ah, but booby traps in coding puzzles can be deliberate... one might even say that it can feel "rewarding" when we train ourselves on these "adversarial" examples

the phenomenon of programmers introducing similar bugs in similar situations might be fascinating, but I wouldn't expect a clear answer to the question "Is this true?" without slightly more precise definitions of:

  • "same" bug
  • same "bug"
  • "hastily" cobbled-together programs
  • hastily "cobbled-together" programs ...
Comment by Aprillion on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-18T14:12:57.483Z · LW · GW

To me as a programmer and not a mathematician, the distinction doesn't make practical intuitive sense.

If we can create 3 functions f, g, h so that they "do the same thing" like f(a, b, c) == g(a)(b)(c) == average(h(a), h(b), h(c)), it seems to me that cross-entropy can "do the same thing" as some particular objective function that would explicitly mention multiple future tokens.
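
A toy version of what I mean by "do the same thing" (the function bodies are of course made up for illustration):

# Three differently-shaped functions computing the same value:
def f(a, b, c):                  # one "global" objective over all arguments
    return (a + b + c) / 3

def g(a):                        # a curried version, applied one argument at a time
    return lambda b: lambda c: (a + b + c) / 3

def h(x):                        # a "local" per-element function...
    return x

def average(*xs):                # ...aggregated afterwards
    return sum(xs) / len(xs)

assert f(1, 2, 3) == g(1)(2)(3) == average(h(1), h(2), h(3)) == 2.0

The shapes look different (global vs curried vs per-element plus aggregation), but they compute the same quantity - which is the sense in which I'd expect cross-entropy to be able to "do the same thing" as an explicitly multi-token objective.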

My intuition is that cross-entropy-powered "local accuracy" can approximate "global accuracy" well enough in practice that I should expect better global reasoning from larger model sizes, faster compute, algorithmic improvements, and better data.

Implications of this intuition might be:

  • myopia is a quantity, not a quality; a model can be incentivized to be more or less myopic, but I don't expect it will be proven possible to enforce it "in the limit"
  • instruct training on longer conversations ought to produce "better" overall conversations if the model simulates that it's "in the middle" of a conversation, and follow-up questions are better compared to giving a final answer "when close to the end of this kind of conversation"

What nuance should I consider to understand the distinction better?

Comment by Aprillion on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-17T07:35:25.472Z · LW · GW

transformer is only trained explicitly on next token prediction!

I find myself understanding language/multimodal transformer capabilities better when I think about the whole document (up to context length) as a mini-batch for calculating the gradient in transformer (pre-)training, so I imagine it is minimizing the document-global prediction error; it wasn't trained to optimize just single next-token accuracy...
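
A rough sketch of that picture (toy numpy, random "logits" standing in for a real model; the shapes and names are my assumptions about the usual setup): the training loss is the average next-token cross-entropy over every position in the document, so one gradient step "sees" the document-global error rather than a single token.

# Toy illustration: the per-document loss is the mean of per-position next-token losses.
import numpy as np

rng = np.random.default_rng(0)
vocab, doc_len = 50, 128
token_ids = rng.integers(0, vocab, size=doc_len)        # a toy "document"
logits = rng.normal(size=(doc_len - 1, vocab))          # stand-in model outputs for positions 0..n-2

log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))   # log-softmax
per_position_loss = -log_probs[np.arange(doc_len - 1), token_ids[1:]]    # cross-entropy at each position
document_loss = per_position_loss.mean()                # what the gradient step actually minimizes
print(f"document-global loss: {document_loss:.3f}")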

Comment by Aprillion on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-17T07:02:35.196Z · LW · GW

Can you help me understand a minor labeling convention that puzzles me? I can see how we can label  from the Z1R process as  in MSP because we observe 11 to get there, but why  is labeled as  after observing either 100 or 00, please?

Comment by Aprillion on Aprillion (Peter Hozák)'s Shortform · 2024-04-10T11:29:02.135Z · LW · GW

Pushing writing ideas to external memory for my less burned out future self:

  • agent foundations need path-dependent notion of rationality

    • economic world of average expected values / amortized big O if f(x) can be negative or you start very high
    • vs min-maxing / worst case / risk-averse scenarios if there is a bottom (death) - a toy contrast of the two decision rules is sketched after this list
    • pareto recipes
  • alignment is a capability

    • they might sound different in the limit, but the difference disappears in practice (even close to the limit? 🤔)
  • in a universe with infinite Everett branches, I was born in the subset that wasn't destroyed by nuclear winter during the cold war - no matter how unlikely it was that humanity didn't destroy itself (they could have done that in most worlds and I wasn't born in such a world, I live in the one where Petrov heard the Geiger counter beep in some particular pattern that made him more suspicious or something... something something anthropic principle)

    • similarly, people alive in 100 years will find themselves in a world where AGI didn't destroy the world, no matter what the odds are - as long as there is at least 1 world with non-zero probability (something something Born rule ... only if any decision along the way is a wave function, not if all decisions are classical and the uncertainty comes from subjective ignorance)
    • if you took quantum risks in the past, you now live only in the branches where you are still alive and didn't die (but you could be in pain or whatever)
    • if you personally take a quantum risk now, your future self will find itself only in a subset of the futures, but your loved ones will experience all your possible futures, including the branches where you die ... and you will experience everything until you actually die (something something s-risk vs x-risk)
    • if humanity finds itself in unlikely branches where we didn't kill our collective selves in the past, does that bring any hope for the future?
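
A toy contrast of the two decision rules from the first bullet (all payoff numbers made up): expected-value maximization and worst-case (risk-averse) maximization pick different actions as soon as one outcome is an unrecoverable bottom.

# Made-up payoffs; anything below 0 is the unrecoverable "bottom" (death).
payoffs = {
    "risky_bet": [100, 100, -50],   # great on average, one branch falls below the bottom
    "safe_path": [10, 10, 10],
}

def expected_value(outcomes):
    return sum(outcomes) / len(outcomes)

def worst_case(outcomes):
    return min(outcomes)

best_by_ev = max(payoffs, key=lambda a: expected_value(payoffs[a]))      # "risky_bet"
best_by_worst_case = max(payoffs, key=lambda a: worst_case(payoffs[a]))  # "safe_path"
print(best_by_ev, best_by_worst_case)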
Comment by Aprillion on Natural Latents: The Concepts · 2024-03-24T13:44:40.468Z · LW · GW

Now, suppose Carol knows the plan and is watching all this unfold. She wants to make predictions about Bob’s picture, and doesn’t want to remember irrelevant details about Alice’s picture. Then it seems intuitively “natural” for Carol to just remember where all the green lines are (i.e. the message M), since that’s “all and only” the information relevant to Bob’s picture.


(Writing before I read the rest of the article): I believe Carol would "naturally" expect that Alice and Bob share more mutual information than she does with Bob herself (even if they weren't "old friends", they both "decided to undertake an art project" while she "wanted to make predictions"), thus she would weigh the costs of remembering more than just the green lines against the expected prediction improvement given her time constraints, lost opportunities, ... - I imagine she could complete the purple lines on her own, and then remember some "diff" of the most surprising differences...

Also, not all of the green lines would be equally important, so a "natural latent" would be some short messages in "tokens of remembering", not necessarily corresponding to the mathematical abstraction encoded by the 2 tokens of English "green lines" => Carol doesn't need to be able to draw the green lines from her memory if that memory was optimized to predict purple lines.

If the purpose was to draw the green lines, I would be happy to call that memory "green lines" (and in that, I would assume to share a prior between me and the reader that I would describe as: "to remember green lines" usually means "to remember steps how to draw similar lines on another paper" ... also, similarity could be judged by other humans ... also, not to be confused with a very different concept, "to remember an array of pixel coordinates", that can also be compressed into the words "green lines", but I don't expect people will be confused about the context, so I don't have to say it now, just keep it in mind if someone squints their eyes just so, which would provoke me to clarify).

Comment by Aprillion on It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood · 2023-11-13T18:37:16.820Z · LW · GW

yeah, I got a similar impression that this line of reasoning doesn't add up...

we interpret other humans as feeling something when we see their reactions

we interpret other eukaryotes as feeling something when we see their reactions 🤷

Comment by Aprillion on The Brain as a Universal Learning Machine · 2023-10-25T11:20:08.695Z · LW · GW

(there are a couple of circuit diagrams of the whole brain on the web, but this is the best.  From this site.)

could you update the 404 image, please? (link to the site still works for now, just the image is gone)

Comment by Aprillion on Features and Adversaries in MemoryDT · 2023-10-22T11:21:08.747Z · LW · GW

S5


What is S5, please?

Comment by Aprillion on Are humans misaligned with evolution? · 2023-10-20T07:27:40.139Z · LW · GW

I agree with what you say. My only peeve is that the concept of IGF is presented as a fact from the science of biology, while it's used as a confused mess of 2 very different concepts.

Both talk about evolution, but inclusive fitness is a model of how we used to think about evolution before we knew about genes. If we model biological evolution on the genetic level, we don't have any need for additional parameters on the individual organism level; natural selection and the other 3 forces in evolution explain the observed phenomena without a need to talk about individuals on top of genetic explanations.

Thus the concept of IF is only a good metaphor when talking approximately about optimization processes, not when trying to go into details. I am saying that going with the metaphor too far will result in confusing discussions.

Comment by Aprillion on Are humans misaligned with evolution? · 2023-10-19T14:50:20.472Z · LW · GW

humans don't actually try to maximize their own IGF


Aah, but humans don't have IGF. Humans have https://en.wikipedia.org/wiki/Inclusive_fitness, while genes have allele frequency https://en.wikipedia.org/wiki/Gene-centered_view_of_evolution ..

Inclusive genetic fitness is a non-standard name for the latter view of biology as communicated by Yudkowsky - as a property of genes, not a property of humans.

The fact that bio-robots created by human genes don't internally want to maximize the genes' IGF should be a non-controversial point of view. The human genes successfully make a lot of copies of themselves without any need whatsoever to encode their own goal into the bio-robots.

I don't understand why anyone would talk about IGF as if genes ought to want for the bio-robots to care about IGF, that cannot possibly be the most optimal thing that genes should "want" to do (if I understand examples from Yudkowsky correctly, he doesn't believe that either, he uses this as an obvious example that there is nothing about optimization processes that would favor inner alignment) - genes "care" about genetic success, they don't care about what the bio-robots ought to believe at all 🤷

Comment by Aprillion on Sum-threshold attacks · 2023-09-14T15:26:59.591Z · LW · GW

Some successful 19th century experiments used 0.2°C/minute and 0.002°C/second.

Have you found the actual 19th century paper?

The oldest quote about it that I found is from https://www.abc.net.au/science/articles/2010/12/07/3085614.htm

Or perhaps the story began with E.M. Scripture in 1897, who wrote the book, The New Psychology. He cited earlier German research: "…a live frog can actually be boiled without a movement if the water is heated slowly enough; in one experiment the temperature was raised at the rate of 0.002°C per second, and the frog was found dead at the end of two hours without having moved."

Well, the time of two hours works out to a temperature rise of 18°C. And, the numbers don't seem right.

First, if the water boiled, that means a final temperature of 100°C. In that case, the frog would have to be put into water at 82°C (18°C lower).

Surely, the frog would have died immediately in water at 82°C. 
Comment by Aprillion on Sum-threshold attacks · 2023-09-13T13:02:23.429Z · LW · GW

I'm not sure what to call this sort of thing. Is there a preexisting name?

sounds like https://en.wikipedia.org/wiki/Emergence to me 🤔 (not 100% overlap and also not the most useful concept, but a very similar shaky pointer in concept space between what is described here and what has been observed as the phenomenon called Emergence)

Comment by Aprillion on Sum-threshold attacks · 2023-09-13T12:55:02.580Z · LW · GW

Thanks to Gaurav Sett for reminding me of the boiling frog.

I would like to see some mention that this is a pop culture reference / urban myth, not  something actual frogs might do.

To quote https://en.wikipedia.org/wiki/Boiling_frog, "the premise is false".

Comment by Aprillion on ACX Meetups Everywhere List · 2023-08-27T12:34:52.408Z · LW · GW

PSA: This is the old page pointing to the 2022 meetup month events, chances are you got here in year 2023 (at the time of writing this comment) while there was a bug on the homepage of lesswrong.com with a map and popup link pointing here...

https://www.lesswrong.com/posts/ynpC7oXhXxGPNuCgH/acx-meetups-everywhere-2023-times-and-places seems to be the right one 🤞

Comment by Aprillion on $500 Bounty/Prize Problem: Channel Capacity Using "Insensitive" Functions · 2023-05-27T12:25:33.669Z · LW · GW

sampled uniformly and independently

 

🤔 I don't believe this definition fits the "apple" example - uniform samples from a concept space of "apple or not apple" would NEVER™ contain any positive example (almost everything is "not apple")... or what assumption am I missing that would make the relative target volume more than ~zero (for high n)?

Bob will observe a highly optimized set of Y, carefully selected by Alice, so the corresponding inputs will be Vastly correlated and interdependent at least for the positive examples (centroid first, dynamically selected for error-correction later 🤷‍♀️), not at all selected by Nature, right?
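
A quick simulation of the "NEVER™" intuition (toy predicate, made-up numbers): a concept that constrains even a modest number of independent uniform bits has ~zero relative volume, so uniform samples essentially never contain a positive example.

# Toy "apple" concept over n uniform random bits: the first k bits must all be 1,
# so its relative volume is 2**-k and uniform sampling ~never produces a positive example.
import random

def is_apple(x, k=30):
    return all(x[:k])

n, samples = 100, 10_000
hits = sum(is_apple([random.getrandbits(1) for _ in range(n)]) for _ in range(samples))
print(f"positive examples in {samples} uniform samples: {hits}")   # expected ~ samples * 2**-30 ≈ 0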

Comment by Aprillion on Coordination by common knowledge to prevent uncontrollable AI · 2023-05-15T07:52:09.887Z · LW · GW

A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up.

Ah, the condition for the reality of money is much weaker though - you only have to believe that you will be able to find "someone" who believes they can find someone for whom money will be worth something, no need to involve "everyone" in one's reasoning.

Inflation is much more complicated of course, but in essence, you only have to believe that other people believe that money is losing value and will buy the same thing from you at a higher price to be incentivized to increase prices; you don't have to believe that you yourself will be able to buy less from your suppliers - increasing the price for higher profits is a totally valid reason for doing so.

This is also a kind of "coordination by common knowledge", but the parties involved don't have to share the same "knowledge" per se - consumers might believe "prices are higher because of inflation" while retailers might believe "we can make prices higher because people believe in inflation"...

Not sure myself whether searching for coordination by common knowledge incentivizes deceptive alignment "by default" (having an exponentially larger basin) or if some reachable policy can incentivize true alignment 🤷

Comment by Aprillion on AI #1: Sydney and Bing · 2023-03-21T08:36:31.244Z · LW · GW

yes, it takes millions to advance, but companies are pouring BILLIONS into this, and number 3 can earn its own money and create its own companies/DAOs/some new networks of cooperation if it wanted, without humans realizing ... have you seen any GDP per year charts whatsoever, why would you think we are anywhere close to saturation of money? have you seen any emergent capabilities from LLMs in the last year, why do you think we are anywhere close to saturation of capabilities per million dollars? Alpaca-like improvements are somehow a one-off miracle and things are not getting cheaper and better and more efficient in the future somehow?

it could totally happen, but what I don't see is why you are so sure it will happen by default - are you extrapolating some trend from non-public data, or just overly optimistic that 1+1 from previous trends is less than 2 in the future, totally unlike the compound effects in AI advancement in the last year?

Comment by Aprillion on AI #1: Sydney and Bing · 2023-03-20T12:02:39.065Z · LW · GW

Thanks for sharing your point of view. I tried to give myself a few days, but I'm afraid I still don't understand where you see the magic barrier for the transition from 3 to 4 to happen outside of the realm of human control.

Comment by Aprillion on AI #1: Sydney and Bing · 2023-03-14T14:52:53.027Z · LW · GW

are you thinking about sub-human-level AGIs? the standard definition of AGI involves it being better than most humans at most of the tasks humans can do

the first human hackers were not trained on "take over my data center" either, but humans can behave out of distribution and so will the AGI that is better than humans at behaving out of distribution

the argument about AIs that generalize to many tasks but are not "actually dangerous yet" is about speeding up the creation of the actually dangerous AGIs, and it's the speeding up that is dangerous - not that AI Safety researchers believe that those "weak AGIs" created from large LLMs would actually be capable of killing everyone immediately on their own

if you believe "weak AGIs" won't speed up the creation of "dangerous AGIs", can you spell out why, please?

Comment by Aprillion on Why Not Just... Build Weak AI Tools For AI Alignment Research? · 2023-03-05T12:18:02.833Z · LW · GW

first-hand idea of what kinds of things even produce progress

I'd rather share second-hand ideas about what progress looks like, based on a write-up from someone with deep knowledge of multiple research directions, than spend the next 5 years forming my own idiosyncratic first-hand empathic intuitions.

It's not like Agent Foundations are 3 cm / 5 dB / 7 dimensions of more progress than Circuits, but if there is no standardized quantity of progress, then why ought we believe that 1000 people making 1000 different tools now is worse than those people doing research first before attempting to help with non-research skills?

Comment by Aprillion on Aspiring AI safety researchers should ~argmax over AGI timelines · 2023-03-05T09:25:20.208Z · LW · GW

if everyone followed the argmax approach I laid out here. Are there any ways they might do something you think is predictably wrong?

 

While teamwork seems to be assumed in the article, I believe it's worth spelling out explicitly that argmaxing for the plan with the highest marginal impact might mean joining and/or building a team where the team effort will make the most impact, not optimizing for the highest individual contribution.

Spending time to explain why a previous research direction failed might help 100 other groups learn from our mistake, so it could be more impactful than pursuing the next shiny idea.

We don't want to optimize for the naive feeling of individual marginal impact, we want to keep in mind the actual goal is to make an Aligned AGI.

Comment by Aprillion on AI alignment researchers don't (seem to) stack · 2023-02-23T16:32:42.489Z · LW · GW

I agree with the explicitly presented evidence and reasoning steps, but one implied prior/assumption seems to me so obscenely wrong (compared to my understanding of social reality) that I have to explain myself before making a recommendation. The following statement:

“stacking” means something like, quadrupling the size of your team of highly skilled alignment researchers lets you finish the job in ~1/4 of the time

implies a possibility that an approximately inverse-linear relationship between the number of people and time could exist (in multidisciplinary software project management in particular and/or in general for most collective human endeavors). The model of Nate that I have in my mind believes that reasonable readers ought to believe that:

  • as a prior, it's reasonable to expect more people will finish a complex task in less time than fewer people would, unless we have explicit reasons to predict otherwise
  • Brooks's law is a funny way to describe delayed projects with hindsight, not a powerful predictor based on literally every single software project humankind ever pursued

I am making a claim about the social norm that it's socially OK to assume other people can believe in linear scalability, not a claim about whether other people actually believe that 4x the people will actually finish in 1/4 of the time by default.

Individually, we are well calibrated to throw a TypeError at the cliche counterexamples to the linear scalability assumption like "a pregnant woman delivers one baby in 9 months, how many ...".

And professional managers tend to have an accurate model of the applicability of this assumption; individually they all know how to create the kind of work environment that may open up the possibility of time improvements (blindly quadrupling the size of your team can bring the project to a halt or even reverse the original objective; more usually it will increase the expected time because you need to lower other risks, and you have to work very hard for a hope of a 50% decrease in time - they are paid to believe in the correct model of scalability, even in cases when they are incentivized to profess more optimistic beliefs in public).

Let's say 1000 people can build a nuclear power plant within some time unit. Literally no one will believe that one person will build it a thousand times slower or that a million people will build it a thousand times faster.

I think it should not be socially acceptable to say things that imply that other people can assume that others might believe in linear scalability for unprecedentedly large, complex software projects. No one should believe that only one person can build Aligned AGI or that a million people can build it a thousand times faster than 1000 people. Einstein and Newton were not working "together", even if one needed the other to make any progress whatsoever - the nonlinearity of "solving gravity" is so qualitatively obvious, no one would even think about it in terms of doubling team size or halving time. That should be the default, a TypeError law of scalability.

If there is no linear scalability by default, Alignment is not an exception to other scalability laws. Building unaligned AGI, designing faster GPUs, physically constructing server farms, or building web apps ... none of those are linearly scalable; it's always hard management work to make a collective human task faster when adding people to a project.

 

Why is this a crux for me? I believe the incorrect assumption leads to rationally-wrong emotions in situations like these:

Also, I've tried a few different ways of getting researchers to "stack" (i.e., of getting multiple people capable of leading research, all leading research in the same direction, in a way that significantly shortens the amount of serial time required), and have failed at this.

Let me talk to you (the centroid of my models of various AI researchers, but not any one person in particular): You are a good AI researcher and, statistically speaking, you should not expect yourself to also be an equally good project manager. You understand maths and, statistically speaking, you should not expect yourself to also be equally good at the social skills needed to coordinate groups of people. Failing at a lot of initial attempts to coordinate teams should be the default expectation - not one or two attempts and then you will nail it. You should expect to fail more often than the people who are getting the best money in the world for aligning groups of people towards a common goal. If those people who made themselves successful in management initially failed 10 times before they became billionaires, you should expect to fail more times than that.

 

Recommendation

You can either dilute your time by learning both technical and social / management skills or you can find other experts to help you and delegate the coordination task. You cannot solve Alignment alone, you cannot solve Alignment without learning, and you cannot learn more than one skill at a time.

The surviving worlds look like 1000 independent alignment ideas, each pursued by 100 different small teams. Some of the teams figure out how to share knowledge with some of the other teams, connect one or two ideas, and merge teams iff they figure out explicit steps for how merging would shorten the time.

We don't need to "stack", we need to increase the odds of a positive black swan.

Yudkowsky, Christiano, and the person who has the skills to start figuring out the missing piece to unify their ideas are at least 10,000 different people.

Comment by Aprillion on AI alignment researchers don't (seem to) stack · 2023-02-23T12:39:45.842Z · LW · GW

Building a tunnel from 2 sides is the same thing even if those 2 sides don't see each other initially. I believe some, but not all, approaches will end up seeing each other; it's not a bad sign if we are not there yet.

Since we don't seem to have time to build 2 "tunnels" (independent solutions to alignment), a bad sign would be if we could prove all of the approaches are incompatible with each other, which I hope is not the case.

Comment by Aprillion on AGI in sight: our look at the game board · 2023-02-19T15:32:58.015Z · LW · GW

Staying at the meta level: if AGI weren't going to be created "by the ML field", would you still believe the problems on your list cannot possibly be solved within 6-ish months if companies threw $1b at each of those problems?

Even if competing groups of humans, augmented by the AI capabilities existing "soon", were trying to solve those problems with combined tools from inside and outside the ML field, the foreseeable optimization pressure is not enough for those foreseeable collective agents to solve the known-known and known-unknown problems that you can imagine?

Comment by Aprillion on AGI in sight: our look at the game board · 2023-02-19T15:08:18.367Z · LW · GW

No idea about original reasons, but I can imagine a projected chain of reasoning:

  • there is a finite number of conjunctive obstacles
  • if a single person can only think of a subset of obstacles, they will try to solve those obstacles first, making slow(-ish) progress as they discover more obstacles over time
  • if a group shares their lists, each individual will become aware of more obstacles and will be able to solve more of them at once, potentially making faster progress
Comment by Aprillion on The Usefulness Paradigm · 2022-12-26T21:49:16.845Z · LW · GW

To be continued in the form of a science fiction story Unnatural Abstractions.

Comment by Aprillion on The Usefulness Paradigm · 2022-12-26T21:44:11.069Z · LW · GW

See the Humor tag ¯\_(ツ)_/¯

Comment by Aprillion on Verification Is Not Easier Than Generation In General · 2022-12-25T14:35:48.731Z · LW · GW

TypeError: Comparing different "solutions".

How do I know that I generated a program that halts?

a) I can prove to myself that my program halts => the solution consists of both the program and the proof => the verification problem is a sub-problem of the generation problem.

b) I followed a trusted process that is guaranteed to produce valid solutions => the solution consists of both the program and the history that generated the proof of the process => if the history is not shared between the 2 problems, then you redefined "verification problem" to include generation of all of the necessary history, and that seems to me like a particularly useless point of view (for the discussion of P vs NP, not useless in general).

In the latter point of view, you could say:

Predicate: given a set of numbers, is the first the sum of the latter 2?

Generation problem: provide an example true solution: "30 and prime factors of 221"

Verification problem: verify that 30 is the sum of prime factors of 221

WTF does that have to say about P vs NP? ¯\_(ツ)_/¯
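
Spelling the toy example out in code (a sketch, not anyone's canonical definition): the "verifier" that is handed only the bare claim has to redo the factoring, i.e. it re-generates the part of the solution that was left out.

# Verifying "30 is the sum of the prime factors of 221" without being handed the
# factorization forces the verifier to redo the generation work (the factoring).
def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def verify_claim(total, n):
    return sum(prime_factors(n)) == total     # the factoring happens here anyway

print(prime_factors(221))      # [13, 17]  <- the "generation" work
print(verify_claim(30, 221))   # True      <- "verification" repeats that work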

Comment by Aprillion on [deleted post] 2022-11-30T11:18:23.845Z

I mostly fixed the page by removing quotes from links (edited as markdown in VS Code, 42 links were like []("...") and 64 quotes were double-escaped \") ... feel free to double check (I also sent feedback to moderators, maybe they want to check for similar problems on other pages on DB level)

Comment by Aprillion on Why square errors? · 2022-11-27T09:19:37.799Z · LW · GW

see your β there? you assume that people remember to "control for bias" before they apply tools that assume Gaussian error

that is indeed what I should have remembered about the implications of "we can often assume an approximately normal distribution" from my statistics course ~15 years ago, but then I saw people complaining about sensitivity to outliers in 1 direction and I failed to make the connection until I dug deeper into my reasoning

Comment by Aprillion on Why square errors? · 2022-11-26T14:38:24.207Z · LW · GW

[EDITED]: good point, no idea what they meant by "uniform" distribution; the realization for me was the connection that I can often assume errors are normally distributed, thus L2 is often the obvious choice
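
A small illustration of that connection (toy data, made-up numbers): minimizing squared error gives the mean, which is the maximum-likelihood location estimate under Gaussian noise, and a single outlier in one direction drags it much further than it drags the L1 minimizer (the median).

# L2 minimizer = mean (Gaussian MLE for location), L1 minimizer = median;
# one extreme value in one direction moves the mean a lot, the median barely at all.
import numpy as np

rng = np.random.default_rng(42)
clean = rng.normal(loc=5.0, scale=1.0, size=100)
with_outlier = np.append(clean, 500.0)

def l2_minimizer(x):   # argmin_m of sum((x - m)**2)
    return x.mean()

def l1_minimizer(x):   # argmin_m of sum(abs(x - m))
    return np.median(x)

print(l2_minimizer(clean), l1_minimizer(clean))                # both ~5
print(l2_minimizer(with_outlier), l1_minimizer(with_outlier))  # mean jumps to ~10, median stays ~5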