Posts

Epistemic Motif of Abstract-Concrete Cycles & Domain Expansion 2023-10-10T03:28:43.356Z
Least-problematic Resource for learning RL? 2023-07-18T16:30:48.535Z
Gearing Up for Long Timelines in a Hard World 2023-07-14T06:11:05.153Z
Dalcy's Shortform 2022-12-14T18:45:28.852Z

Comments

Comment by Dalcy (Darcy) on Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer · 2024-04-23T20:33:06.061Z · LW · GW

re: second diagram in the "Bayesian Belief States For A Hidden Markov Model" section, shouldn't the transition probabilities for the top left model be 85/7.5/7.5 instead of 90/5/5?

Comment by Dalcy (Darcy) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-17T02:19:49.662Z · LW · GW

What shape does compmech predict under a generation setting, and do you expect it, rather than the fractal shape, to show up under, say, a GAN loss? If so, and if the shapes are sufficiently distinct from the controls run to make sure the fractals aren't just a visualization artifact, that would be further evidence in favor of the applicability of compmech in this setup.

Comment by Dalcy (Darcy) on What does Eliezer Yudkowsky think of the meaning of life now? · 2024-04-11T19:02:14.534Z · LW · GW

If after all that it still sounds completely wack, check the date. Anything from before like 2003 or so is me as a kid, where "kid" is defined as "didn't find out about heuristics and biases yet", and sure at that age I was young enough to proclaim AI timelines or whatevs.

https://twitter.com/ESYudkowsky/status/1650180666951352320

Comment by Dalcy (Darcy) on "Fractal Strategy" workshop report · 2024-04-07T00:58:27.993Z · LW · GW

btw there's no input box for the "How much would you pay for each of these?" question.

Comment by Dalcy (Darcy) on LessWrong's (first) album: I Have Been A Good Bing · 2024-04-04T15:16:54.178Z · LW · GW

although I've practiced opening those emotional channels a bit, so this is a less uncommon experience for me than for most

i'm curious, what did you do to open those emotional channels?

Comment by Dalcy (Darcy) on Natural Abstractions: Key claims, Theorems, and Critiques · 2024-03-16T16:55:24.866Z · LW · GW

Out of the set of all possible variables one might use to describe a system, most of them cannot be used on their own to reliably predict forward time evolution because they depend on the many other variables in a non-Markovian way. But hydro variables have closed equations of motion, which can be deterministic or stochastic but at the least are Markovian.

This idea sounds very similar to this—it definitely seems extendable beyond the context of physics:

We argue that they are both; more specifically, that the set of macrostates forms the unique maximal partition of phase space which 1) is consistent with our observations (a subjective fact about our ability to observe the system) and 2) obeys a Markov process (an objective fact about the system's dynamics).

Comment by Dalcy (Darcy) on MIRI 2024 Mission and Strategy Update · 2024-01-06T22:03:15.761Z · LW · GW

I don't see any feasible way that gene editing or 'mind uploading' could work within the next few decades. Gene editing for intelligence seems unfeasible because human intelligence is a massively polygenic trait, influenced by thousands to tens of thousands of quantitative trait loci.

I think the authors in the post referenced above agree with this premise and still consider human intelligence augmentation via polygenic editing to be feasible within the next few decades! I think their technical claims hold up, so personally I'd be very excited to see MIRI pivot towards supporting their general direction. I'd be interested to hear your opinions on their post.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-12-24T02:33:05.425Z · LW · GW

I am curious how often asymptotic results that are proven using features of the problem that seem basically practically irrelevant turn out to be relevant in practice.

Like, I understand that there are many asymptotic results (e.g., the free energy principle in SLT) that are useful in practice, but i feel like there's something sus about similar results from information theory or complexity theory, where the way they prove certain bounds (or inclusion relationships, for complexity theory) seems totally detached from practicality?

  • the joint source-channel coding theorem is often cited as the reason we can treat compression and adding redundancy (channel coding) as separate problems, but when you actually look at the proof it only establishes possibility (proven in terms of insanely long codes), so it's not at all trivial that this equivalence holds in the context of practical code engineering (rough statement after this list)
  • complexity theory talks about stuff like quantifying some property over all possible boolean circuits of a given size, which seems to me to be focusing on a feature of the problem so utterly irrelevant to real programs that I'm suspicious it can say meaningful things about what we see in practice
    • as an aside, does the P vs NP distinction even matter in practice? we just ... seem to have very good approximations to NP problems from algorithms that exploit the structures specific to the problem and to the domains where we want things to be fast; and as long as complexity methods don't take those fine problem-specific structures into account, i don't see how they would characterize such well-approximated problems using complexity classes.
    • Wigderson's book had a short section on average-case complexity which I hoped would be this kind of result, but I'm unimpressed (the problem doesn't sound easier - now how do you specify the natural distribution??)
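(Rough statement of the theorem, for reference: a source with entropy rate $H$ can be communicated reliably over a channel with capacity $C$ using separate source coding and channel coding whenever $H < C$, and cannot be communicated reliably at all when $H > C$; the achievability half is proven with block codes whose length goes to infinity, which is exactly the practically-irrelevant-seeming feature I mean.)

$$H < C \;\Longrightarrow\; \text{reliable transmission via separate source + channel codes, as block length } n \to \infty$$
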
Comment by Dalcy (Darcy) on Self-Embedded Agent's Shortform · 2023-10-21T07:34:23.824Z · LW · GW

Found an example in the wild with mutual information! These equivalent definitions of mutual information undergo concept splintering as you go beyond just 2 variables:

    • interpretation: relative entropy b/w joint and product of marginals
    • interpretation: joint entropy minus all unshared info
      • ... become bound information

... each with different properties (eg co-information is a bit too sensitive because just a single pair being independent reduces the whole thing to 0, total-correlation seems to overcount a bit, etc) and so with different uses (eg bound information is interesting for time-series).
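For concreteness, a sketch of two of the standard multivariate generalizations (my paraphrase of textbook definitions, both of which reduce to $I(X_1;X_2)$ when $n=2$): the relative-entropy interpretation generalizes to total correlation, and the joint-entropy-minus-unshared-info interpretation generalizes to dual total correlation.

$$\mathrm{TC}(X_1,\dots,X_n) = D_{\mathrm{KL}}\!\Big(p(x_1,\dots,x_n)\,\Big\|\,\textstyle\prod_i p(x_i)\Big) = \sum_i H(X_i) - H(X_1,\dots,X_n)$$

$$\mathrm{DTC}(X_1,\dots,X_n) = H(X_1,\dots,X_n) - \sum_i H(X_i \mid X_{\setminus i})$$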

Comment by Dalcy (Darcy) on [Cross-post]The theoretical computational limit of the Solar System is 1.47x10^49 bits per second. · 2023-10-17T16:22:36.381Z · LW · GW

The limit's probably much higher with sub-Landauer thermodynamic efficiency.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-09-23T00:02:56.351Z · LW · GW

'Symmetry' implies 'redundant coordinate' implies 'cyclic coordinates in your Lagrangian / Hamiltonian' implies 'conservation of conjugate momentum'

And because the action principle (where the true system trajectory extremizes your action, i.e. integral of Lagrangian) works in various dynamical systems, the above argument works in non-physical dynamical systems.

Thus conserved quantities usually exist in a given dynamical system.
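Spelling out the middle implication (standard Euler–Lagrange manipulation, nothing beyond the textbook argument): if $q_i$ is cyclic, i.e. $\partial L / \partial q_i = 0$, then

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_i} - \frac{\partial L}{\partial q_i} = 0 \;\Longrightarrow\; \frac{d}{dt}\frac{\partial L}{\partial \dot q_i} = 0,$$

so the conjugate momentum $p_i := \partial L / \partial \dot q_i$ is constant along the true trajectory.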

mmm, but why does the action principle hold in such a wide variety of systems though? (like how you get entropy by postulating something to be maximized in an equilibrium setting)

Comment by Dalcy (Darcy) on 6 non-obvious mental health issues specific to AI safety · 2023-08-20T03:24:42.376Z · LW · GW

Bella is meeting a psychotherapist, but they treat her fear as something irrational. This doesn't help, and only makes Bella more anxious. She feels like even her therapist doesn't understand her.

How would one find a therapist in their local area who's aware of what's going on in EA/rat circles, such that they wouldn't treat statements about, say, x-risks as schizophrenic/paranoid?

Comment by Dalcy (Darcy) on Feedbackloop-first Rationality · 2023-08-08T02:23:45.856Z · LW · GW

I am very interested in this, especially in the context of alignment research and solving not-yet-understood problems in general. Since I have no strong commitments this month (and was going to do something similar to this anyways), I will try this every day for the next two weeks and report back on how it goes (writing this comment as a commitment mechanism!)

Have a large group of people attempt to practice problems from each domain, randomizing the order that they each tackle the problems in. (The ideal version of this takes a few months)

...

As part of each problem, they do meta-reflection on "how to think better", aiming specifically to extract general insights and intuitions. They check what processes seemed to actually lead to the answer, even when they switch to a new domain they haven't studied before.

Within this upper-level feedback loop (at the scale of whole problems, taking hours or days), I'm guessing a lower-level loop would involve something like cognitive strategy tuning to get real-time feedback as you're solving the problems?

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-08-06T23:33:00.144Z · LW · GW

I had something like locality in mind when writing this shortform, the context being: [I'm in my room -> I notice itch -> I realize there's a mosquito somewhere in my room -> I deliberately pursue and kill the mosquito that I wouldn't have known existed without the itch]

But, again, this probably wouldn't amount to much selection pressure, partially because the vast majority of the mosquito population exists in places where such locality doesn't hold, i.e. in open environments.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-08-05T13:43:50.272Z · LW · GW

Makes sense. I think we're using the term with different scopes. By "DL paradigm" I meant to encompass the kinds of things you mentioned (RL-directing-SS-target (active learning), online learning, different architectures, etc), because they really seem like "engineering challenges" to me (despite covering a broad space of algorithms), in the sense that capabilities researchers already seem to be working on & scaling them without facing any apparent blockers to further progress, i.e. without needing "fundamental breakthroughs"—by which I mean paradigm shifts away from DL, like, idk, symbolic learning.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-08-05T13:31:49.056Z · LW · GW

But the evolutionary timescale at which mosquitos can adapt to avoid detection must be faster than that of humans adapting to find mosquitos itchy! Or so I thought - my current boring guess is that (1) the mechanisms by which the human body detects foreign particles are fairly "broad", (2) the adaptations mosquitos would need to evade them are not-way-too-simple, and (3) we just haven't applied enough selection pressure to make such a change happen.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-08-03T10:34:27.026Z · LW · GW

To me, the fact that the human brain basically implements SSL+RL is very very strong evidence that the current DL paradigm (with a bit of "engineering" effort, but nothing like fundamental breakthroughs) will kinda just keep scaling until we reach point-of-no-return. Does this broadly look correct to people here? Would really appreciate other perspectives.

Comment by Dalcy (Darcy) on Big picture of phasic dopamine · 2023-08-03T10:18:04.500Z · LW · GW

What are the errors in this essay? As I'm reading through the Brain-like AGI sequence I keep seeing this post being referenced (but this post says I should instead read the sequence!)

I would really like to have a single reference post of yours that contains the core ideas about phasic dopamine, rather than the reference being the sequence posts (which are heavily dependent on a bunch of previous posts; also, Posts 5 and 6 feel more high-level than this one?)

Comment by Dalcy (Darcy) on Least-problematic Resource for learning RL? · 2023-08-01T21:25:28.885Z · LW · GW

Answering my own question, review / survey articles like https://arxiv.org/abs/1811.12560 seem like a pretty good intro.

Comment by Dalcy (Darcy) on DragonGod's Shortform · 2023-07-25T20:38:51.809Z · LW · GW

The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-07-20T18:00:09.697Z · LW · GW

Mildly surprised how some verbs/connectives barely play any role in conversations, even in technical ones. I just tried directed babbling with someone, and (I think?) I learned quite a lot about Israel-Pakistan relations with almost no stress coming from eg needing to make my sentences grammatically correct.

Example of (a small part of) my attempt to summarize my understanding of how Jews migrated in/out of Jerusalem over the course of history:

They here *hand gesture on air*, enslaved out, they back, kicked out, and boom, they everywhere.

(audience nods, given common knowledge re: gestures, meaning of "they," etc)

Comment by Dalcy (Darcy) on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-13T12:48:23.555Z · LW · GW

Related - "There are always many ways through the garden of forking paths, and something needs only one path to happen."

Comment by Dalcy (Darcy) on OpenAI Launches Superalignment Taskforce · 2023-07-12T05:52:03.886Z · LW · GW

Also, davidad's Open Agency Architecture is a very concrete example of what such a non-antisocial pivotal act that respects the preferences of various human representatives would look like (i.e. a pivotal process).

Perhaps not realistically feasible in its current form, yes, but davidad's proposal suggests that there might exist such a process, and we just have to keep searching for it.

Comment by Dalcy (Darcy) on OpenAI Launches Superalignment Taskforce · 2023-07-12T05:39:38.381Z · LW · GW

Agree that current AI paradigm can be used to make significant progress in alignment research if used correctly. I'm thinking something like Cyborgism; leaving most of the "agency" to humans and leveraging prosaic models to boost researcher productivity which, being highly specialized in scope, wouldn't involve dangerous consequentialist cognition in the trained systems.

However, the problem is that this isn't what OpenAI is doing - iiuc, they're planning to build a full-on automated researcher that does alignment research end-to-end, which is what orthonormal was pointing out as dangerous, since that kind of system's cognition would involve dangerous stuff.

So, leaving aside the problems with other alternatives like pivotal act for now, it doesn't seem like your points are necessarily inconsistent with orthonormal's view that OpenAI's plans (at least in its current form) seem dangerous.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-07-11T03:08:04.002Z · LW · GW

Complaint with Pugh's real analysis textbook: He doesn't even define the limit of a function properly?!

It's implicitly defined together with the definition of continuity, where the condition is $|x - a| < \delta$, but in Chapter 3 when defining differentiability he implicitly switches the condition to $0 < |x - a| < \delta$ without even mentioning it (nor the requirement that $a$ now needs to be an accumulation point!). While Pugh's book has its own benefits, coming from a Terry Tao analysis textbook background, this is absurd!
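(For reference, the two standard conditions in question:

$$\text{continuity at } a:\;\; \forall \varepsilon>0\;\exists \delta>0\;\forall x:\;\; |x-a|<\delta \Rightarrow |f(x)-f(a)|<\varepsilon$$

$$\lim_{x\to a} f(x)=L:\;\; \forall \varepsilon>0\;\exists \delta>0\;\forall x:\;\; 0<|x-a|<\delta \Rightarrow |f(x)-L|<\varepsilon,$$

the latter needing $a$ to be an accumulation point of the domain for the limit to be unique.)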

(though to be fair Terry Tao has the exact same issue in Book 2, where his definition of function continuity via limits in metric spaces precedes his definition of the limit in general ... the only redeeming factor is that it's defined rigorously in Book 1, in the limited context of $\mathbb{R}$)

*sigh* I guess we're still pretty far from reaching the Pareto Frontier of textbook quality, at least in real analysis.

... Speaking of Pareto Frontiers, would anyone say there is such a textbook that is close to that frontier, at least in a different subject? Would love to read one of those.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-07-06T19:34:32.290Z · LW · GW

Any advice on reducing neck and shoulder pain while studying? For me that's my biggest blocker to being able to focus longer (especially for math, where I have to look down at my notes/book for a long period of time). I'm considering stuff like getting a standing desk or doing regular back/shoulder exercises. Would like to hear what everyone else's setups are.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-07-01T16:35:20.097Z · LW · GW

Update: huh, nonstandard analysis is really cool. Not only are things much more intuitive (by using infinitesimals from the hyperreals instead of the epsilon-delta formulation for everything), but by the transfer principle all first-order statements are equivalent between standard and nonstandard analysis!
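For example (a standard consequence, for standard $f$ and standard $a$):

$$f \text{ is continuous at } a \iff \forall x \in {}^*\mathbb{R}:\; x \approx a \,\Rightarrow\, {}^*\!f(x) \approx {}^*\!f(a),$$

where $\approx$ means "differs by an infinitesimal".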

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-23T17:51:40.072Z · LW · GW

Man, deviation arguments are so cool:

  • what are macrostates? Variables which are required to make your thermodynamics theory work! If they don't, add more macrostates!
  • nonequilibrium? Define it as systems that don't admit a thermodynamic description!
  • inductive biases? Define it as the amount of correction needed for a system to obey Bayesian updating, i.e. correction terms in the exponent of the Gibbs measure!
  • coarse graining? Define the coarse-grained variables to keep the dynamics as close as possible to that of the micro-dynamics!
  • or in a similar spirit - does your biological system deviate from expected utility theory? Well, there's discovery (and money) to be made!

It's easy to get confused and think the circularity is a problem ("how can you define thermodynamics in terms of equilibria, when equilibria are defined using thermodynamics?"), but it's all about carving nature at the right joints—and a sign that you made the right carving is that the corrections needed aren't too numerous, and they all seem "natural" (and of course, all of this while letting you make nontrivial predictions; that's what matters at the end of the day).

Then, it's often the case that those corrections also turn out to be meaningful and natural quantities of interest.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-23T09:07:34.787Z · LW · GW

I used to try out near-random search on ideaspace, where I made a quick app that spat out 3~5 random words from a dictionary of interesting words/concepts that I curated, and I spent 5 minutes every day thinking very hard on whether anything interesting came out of those combinations.

Of course I knew random search on exponential space was futile, but I got a couple cool invention ideas (most of which turned out to already exist), like:

  • infinite indoor rockclimbing: attach rocks to a vertical treadmill, and now you have an infinite indoor rock climbing wall (which is also safe from falling)! maybe add some fancy mechanism to add variations to the rocks + a VR headgear, I guess.
  • clever crypto mechanism design (in the spirit of CO2 Coin) to incentivize crowdsourcing of age-reduction molecule design animal trials from the public. (I know what you're thinking)

You can probably do this smarter now if you wanted, with eg better GPT models.
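A minimal sketch of the word-spitting part of the app (the curated list `interesting_words.txt` is hypothetical, i.e. whatever you assemble yourself):

```python
import random

# Sample 3~5 random words/concepts from a curated list and print them
# as a daily prompt for combinatorial idea generation.
with open("interesting_words.txt") as f:
    words = [line.strip() for line in f if line.strip()]

print(random.sample(words, k=random.randint(3, 5)))
```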

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-22T05:01:51.822Z · LW · GW

algebraic geometry in infinite dimensions (algebraic geometric ... functional analysis?!) surely sounds like a challenge, damn.

Comment by Dalcy (Darcy) on "textbooks are all you need" · 2023-06-21T22:59:05.823Z · LW · GW

gwern's take on a similar paper (TinyStories), in case anyone was wondering. Notable part for me:

...

Now, what would be really interesting is if they could go beyond the in-domain tasks and show something like meta-learning. That's supposed to be driven by the distribution and variety of Internet-scale datasets, and thus should not be elicited by densely sampling a domain like this.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-17T11:19:04.456Z · LW · GW

I wonder if something like the following would make it possible to study textbooks more efficiently using LLMs:

  • Feed the entire textbook to the LLM and have it produce a list of summaries that increase in granularity and length, covering all the material in the textbook, just at different depths (eg proofs omitted, further elaboration on high-level perspectives, etc)
  • The student starts from the highest-level summary, and gradually moves to the more granular materials.

When I study textbooks, I spend a significant amount of time improving my mental autocompletion, like familiarizing myself with the terminology, which words or proof styles usually come up in which contexts, etc. Doing this seems to significantly improve my ability to read eg long proofs, since I can ignore all the pesky details (which I can trust my mental autocompletion to fill in later if needed) and allocate my effort to getting a high-level view of the proof.

Textbooks don't really admit this style of learning, because students don't have prior knowledge of all the concept-dependencies of a new subject they're learning, and thus are forced to start at the lowest level and work their way up to the high-level perspective.

Perhaps LLMs will let us reverse this direction, instead going from the highest to the lowest.
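A minimal sketch of the summaries-at-increasing-depth idea, assuming some `complete(prompt)` wrapper around whatever LLM API you use (hypothetical; chunking for context-length limits is omitted):

```python
def complete(prompt: str) -> str:
    # Stand-in for your LLM API call of choice.
    raise NotImplementedError

def layered_summaries(textbook: str) -> list[str]:
    """Summaries of the whole textbook at increasing depth, coarsest first."""
    depths = [
        "a one-paragraph overview of the main ideas",
        "a few pages of key definitions and theorem statements, proofs omitted",
        "a chapter-by-chapter outline with proof sketches",
    ]
    return [
        complete(f"Summarize this textbook as {depth}:\n\n{textbook}")
        for depth in depths
    ]

# The student reads the coarsest summary first, then descends toward
# the full text as the terminology becomes familiar.
```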

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-16T11:41:03.131Z · LW · GW

What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-11T13:31:38.779Z · LW · GW

There's still some pressure, though. If the bites were permanently not itchy, then I may have not noticed that the mosquitos were in my room in the first place, and consequently would less likely pursue them directly. I guess that's just not enough.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-11T12:34:30.409Z · LW · GW

Why haven't mosquitos evolved to be less itchy? Is there just not enough selection pressure posed by humans yet? (yes probably) Or are they evolving towards that direction? (they of course already evolved towards being less itchy while biting, but not enough to make that lack-of-itch permanent)

this is a request for help i've been trying and failing to catch this one for god knows how long plz halp

tbh would be somewhat content coexisting with them (at the level of houseflies) as long as they evolved the itch and high-pitch noise away, modulo disease risk considerations.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-10T07:51:56.781Z · LW · GW

Having lived ~19 years, I can distinctly remember around 5~6 times when I explicitly noticed myself experiencing totally new qualia with my inner monologue going "oh wow! I didn't know this dimension of qualia was a thing." Examples:

  • hard-to-explain sense that my mind is expanding horizontally with fractal cube-like structures (think bismuth) forming around it and my subjective experience gliding along its surface which lasted for ~5 minutes after taking zolpidem for the first time to sleep (2 days ago)
  • getting drunk for the first time (half a year ago)
  • feeling absolutely euphoric after having a cool math insight (a year ago)
  • ...

Reminds me of myself around a decade ago, completely incapable of understanding why my uncle smoked, being "huh? The smoke isn't even sweet, why would you want to do that?" Now that I have [addiction-to-X] as a clear dimension of qualia/experience solidified in myself, I can better model their subjective experiences although I've never smoked myself. Reminds me of the SSC classic.

Also, one observation is that it feels like the rate at which I acquire these is getting faster, probably because of increased self-awareness + increased option space as I reach adulthood (like being able to drink).

Anyways, I think it’s really cool, and can’t wait for more.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-06-09T22:35:54.879Z · LW · GW

i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost had a literal breakdown trying to install Minecraft recently (and eventually failed). God.

Comment by Dalcy (Darcy) on Portia's Shortform · 2023-06-01T11:42:56.317Z · LW · GW

This shortform just reminded me to buy a CO2 sensor and, holy shit, turns out my room is at ~1500ppm.

While it's too soon to say for sure, this may actually be the underlying reason for a bunch of problems I noticed myself having primarily in my room (insomnia, inability to focus or read, high irritability, etc).

Although I always suspected bad air quality, it really is something to actually see the number with your own eyes, wow. Thank you so, so much for posting about this!!

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-05-30T06:26:53.438Z · LW · GW

One of the rare insightful lessons from high school: Don't set your AC to the minimum temperature even if it's really hot, just set it to where you want it to be.

It's not like the air released gets colder with a lower target temperature, because most ACs (according to my teacher, I haven't checked lol) are just simple control systems that turn themselves on/off around the target temperature, meaning the time it takes to reach a certain temperature X is independent of the target temperature (as long as it's lower than X)

... which is embarrassingly obvious in hindsight.
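A toy sketch of that argument (my own toy model of an on/off AC plus a Newton's-law-of-cooling heat leak, not a real AC spec): the time to first reach, say, 25°C comes out the same for every target at or below 25°C, because the unit is simply on the whole way down.

```python
def time_to_reach(target, start=35.0, goal=25.0, outside=35.0,
                  cool_rate=0.5, leak=0.02, dt=0.01):
    """Time (arbitrary units) until the room first hits `goal`, for a bang-bang AC set to `target`."""
    temp, t = start, 0.0
    while temp > goal:
        ac_on = temp > target  # thermostat: on above target, off at or below it
        temp += (leak * (outside - temp) - (cool_rate if ac_on else 0.0)) * dt
        t += dt
    return t

for target in (16, 20, 24):
    print(target, round(time_to_reach(target), 1))  # same time for every target <= 25
```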

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-05-30T03:29:23.523Z · LW · GW

God, I wish real analysis was at least half as elegant as any other math subject — way too many pathological examples that I couldn't care less about. I've heard some good things about constructivism though; hopefully analysis is done better there.

Comment by Dalcy (Darcy) on Davidad's Bold Plan for Alignment: An In-Depth Explanation · 2023-05-12T02:59:57.679Z · LW · GW

I think the point of having an explicit human-legible world model / simulation is to make desiderata formally verifiable, which I don't think would be possible with a blackbox system (like an LLM w/ wrappers).

Comment by Dalcy (Darcy) on Why do we care about agency for alignment? · 2023-05-11T02:40:04.349Z · LW · GW

Also important to note:

The phenomenon you call by names like "goals" or "agency" is one possible shadow of the deep structure of optimization - roughly, preimaging outcomes onto choices by reversing a complicated transformation.

 - @esyudkowsky

i.e. if we were to pin down something we actually care about, that'd be "a system exhibiting consequentialism", because those are the kinds of systems that will end up shaping our lightcone and more. Consequentialism is convergent in an optimization process, i.e. it is the "deep structure of optimization". Terms like "goals" or "agency" are shadows of consequentialism, finite approximations of this deep structure.

And by virtue of being finite approximations (eg they're embedded), these "agents" have a bunch of convergent properties that make it easier for us to reason about the "deep structure" itself, like eg modularity, having a world-model, etc (check johnswentworth's comment).

Edit: Also the following quote

it is relatively unimportant to understand agency for its own sake or intelligence for its own sake or optimization for its own sake. Instead we should remember that these are frames for understanding these patterns that exert influence over the future

Comment by Dalcy (Darcy) on Cognitive Emulation: A Naive AI Safety Proposal · 2023-04-14T23:46:00.755Z · LW · GW

re: reducing magic and putting bounds, I'm reminded of Cleo Nardo's Hodge Podge Alignment proposal.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-04-09T13:24:39.750Z · LW · GW

moments of microscopic fun encountered while studying/researching:

  • Quantum mechanics calls vectors and their duals 'kets' and 'bras' because ... bra-c-ket. What can I say? I like it. But where did the letter 'c' go, Dirac?
  • Defining Cauchy sequences and limits in real analysis: it's really cool how you "bootstrap" the definition of Cauchy sequences / limits on the reals using the definition of Cauchy sequences / limits on the rationals. Basically (a sketch of the key definitions follows this list):
    • (1) define Cauchy sequence on rationals
    • (2) use it to define limit (on rationals) using rational-Cauchy
    • (3) use it to define reals
    • (4) use it to define Cauchy sequence on reals
    • (5) show it's consistent with Cauchy sequence on rationals in both directions
      • a. rationals are embedded in reals hence the real-Cauchy definition subsumes rational-Cauchy definition
      • b. you can always find a positive rational smaller than any given positive real, hence a sequence being rational-Cauchy means it is also real-Cauchy
    • (6) define limit (on reals)
    • (7) show it's consistent with limit on rationals
    • (8) ... and that they're equivalent to real-Cauchy
    • (9) proceed to ignore the distinction b/w real-Cauchy/limit and their rational counterpart. Slick!
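A minimal sketch of the definitions being bootstrapped (the standard construction, my notation):

$$(q_n)\subset\mathbb{Q} \text{ is Cauchy} \iff \forall \varepsilon\in\mathbb{Q}_{>0}\;\exists N\;\forall m,n\ge N:\; |q_m-q_n|<\varepsilon$$

$$\mathbb{R} := \{\text{Cauchy sequences in } \mathbb{Q}\}/\sim, \qquad (q_n)\sim(r_n) \iff q_n - r_n \to 0$$

$$(x_n)\subset\mathbb{R} \text{ is Cauchy} \iff \forall \varepsilon\in\mathbb{R}_{>0}\;\exists N\;\forall m,n\ge N:\; |x_m-x_n|<\varepsilon$$

and step (5b) goes through because every real $\varepsilon > 0$ has a rational $q$ with $0 < q < \varepsilon$.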

(will probably keep updating this in the replies)

Comment by Dalcy (Darcy) on Why Not Just Outsource Alignment Research To An AI? · 2023-03-15T20:53:38.405Z · LW · GW

That means the problem is inherently unsolvable by iteration. "See what goes wrong and fix it" auto-fails if The Client cannot tell that anything is wrong.

Not at all meant to be a general solution to this problem, but I think a specific case where we could turn this into something iterable is by using historical examples of scientific breakthroughs - consider past breakthroughs on a problem where the solution (in hindsight) is overdetermined, train the AI on data filtered by date, and have The Client evaluate the AI solely based on how closely the AI approaches that overdetermined answer.

As a specific example: imagine feeding the AI historical context that led up to the development of information theory, and checking if the AI converges onto something isomorphic to what Shannon found (training with information cutoff, of course). Information theory surely seems like The Over-determined Solution for tackling the sorts of problems that it was motivated by, and so the job of the client/evaluator is much easier.

Of course this is probably still too difficult in practice (eg not enough high-quality historical data of breakthroughs, evaluation & data-curation still demanding great expertise, hope of "... and now our AI should generalize to genuinely novel problems!" not cashing out, scope of this specific example being too limited, etc).

But the situation for this specific example sounds somewhat better than that laid out in this post, i.e. The Client themselves needing the expertise to evaluate non-hindsight based supposed Alignment breakthroughs & having to operate on completely novel intellectual territory.

Comment by Dalcy (Darcy) on The Waluigi Effect (mega-post) · 2023-03-03T11:42:23.079Z · LW · GW

Therefore, the longer you interact with the LLM, eventually the LLM will have collapsed into a waluigi. All the LLM needs is a single line of dialogue to trigger the collapse.

Hm, what if we do the opposite? i.e. Prompt chatbob starting as a pro-croissant simulacrum, and then proceed to collapse the superposition into the anti-croissant simulacrum using a single line of dialogue; behold, we have created a stable Luigi!

I can see how this is more difficult for desirable traits rather than their opposite because fiction usually has the structure of an antagonist appearing after the protagonist (who holds our values), rarely the opposite.

(leaving this comment halfway through - you could've mentioned this later in the post)

Comment by Dalcy (Darcy) on Abstraction As Symmetry and Other Thoughts · 2023-02-28T21:49:29.262Z · LW · GW

The actual theorem is specific to classical mechanics, but a similar principle seems to hold generally.

Interesting, would you mind elaborating on this further?

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-02-11T17:51:25.411Z · LW · GW

Just noticing that the negation of a statement exists is enough to make meaningful updates.

e.g. I used to (implicitly) think "Chatbot Romance is weird" without having evaluated anything in-depth about the subject (and consequently didn't have any strong opinions about it)—probably as a result of some underlying cached belief. 

But after seeing this post, just reading the title was enough to make me go (1) "Oh! I just realized it is perfectly possible to argue in favor of Chatbot Romance ... my belief on this subject must be a cached belief!" (2) hence is probably by-default biased towards something like the consensus opinion, and (3) so I should update away from my current direction, even without reading the post.

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-01-13T10:04:16.007Z · LW · GW

(Note: This was a post, but in retrospect was probably better to be posted as a shortform)

(Epistemic Status: 20-minute worth of thinking, haven't done any builder/breaker on this yet although I plan to, and would welcome any attempts in the comment)

  1. Have an algorithmic task whose input/output pairs could (in reasonable algorithmic complexity) be generated using a highly specific combination of modular components (e.g., basic arithmetic, combinations of random NN module outputs, etc).
  2. Train a small transformer (or anything, really) on the input/output pairs.
  3. Take a large transformer that takes in the activations/weights and outputs a computational graph.
  4. Train that large transformer over the small transformer, across a diverse set of such algorithmic tasks (probably automatically generated) with varying complexity. Now you have a general tool that takes in a set of high-dimensional matrices and backs out a simple computational graph, great! Let's call it Inspector.
  5. Apply the Inspector to real models and see if it recovers anything we might expect (like induction heads).
  6. To go a step further, apply the Inspector to itself. Maybe we might back-out a human implementable general solution for mechanistic interpretability! (Or, at least let us build a better intuition towards the solution.)

(This probably won't work, or at least isn't as simple as described above. Again, welcome any builder/breaker attempts!)

Comment by Dalcy (Darcy) on Dalcy's Shortform · 2023-01-11T19:08:47.242Z · LW · GW

There were various notions/frames of optimization floating around, and I tried my best to distill them:

  • Eliezer's Measuring Optimization Power on unlikelihood of outcome + agent preference ordering
  • Alex Flint's The ground of optimization on robustness of system-as-a-whole evolution
  • Selection vs Control as distinguishing different types of "space of possibilities"
    • Selection as having that space explicitly given & selectable numerous times by the agent
    • Control as having that space only given in terms of counterfactuals, and the agent can access it only once.
    • These distinctions correlate with the type of algorithm being used & its internal structure, where Selection uses more search-like process using maps, while Control may just use explicit formula ... although it may very well use internal maps to Select on counterfactual outcomes!
      • In other words, the Selection vs Control may very well be viewed as a different cluster of Analysis. Example:
        • If we decide to focus our Analysis of "space of possibilities" on eg "Real life outcome," then a guided missile is always Control.
        • But if we decide to focus on "space of internal representations of possibilities," then a guided missile that uses an internal map to search over becomes Selection.
  • "Internal Optimization" vs "External Optimization"
    • Similar to Selection vs Control, but the analysis focuses more on internal structure:
      • Why? Motivated by the fact that, as with the guided missile example, Control systems can be viewed as Selection systems depending on perspective
      • ... hence, better to focus on internal structures where it's much less ambiguous.
    • IO: Internal search + selection
    • EO: Flint's definition of "optimizing system"
      • IO is included in EO, if we assume accurate map-to-environment correspondence.
    • To me, this doesn't really get at what the internals of actually-control-like systems look like, which are presumably a subset of EO - IO.
  • Search-in-Territory vs Search-in-Map
    • Greater emphasis on internal structure—specifically, "maps."
    • Maps are capital investments, allowing you to optimize despite not knowing exactly what to optimize for (by compressing info)

I have several thoughts on these framings, but one trouble is the excessive usage of words to represent "clusters", i.e. terms that group a bunch of correlated variables. Selection vs Control, for example, doesn't have a clear definition/criterion but rather points at a number of correlated things, like internal structure, search, maps, control-like behavior, etc.

Sure, deconfusing and pointing out clusters is useful, because clusters imply correlations and correlations perhaps imply hidden structure + relationships—but I think the costs from cluster-representing words doing hidden inference are much greater than the benefits, and it would be better to explicitly lay out the features of the cluster that one is referring to instead of just using the name of the cluster.

This is similar to the trouble I had with "wrapper-minds," which is yet another example of a cluster pointing at a bunch of correlated variables, and people using the same term to mean different things.

Anyways, I still feel totally confused about optimization—and while these clusters/frames are useful, I think thinking in terms of them would create even more confusion within myself. It's probably better to take the useful individual parts within the clusters and start deconfusing from the ground up, using those as the building blocks.