robertzk

Posts

SAEs are highly dataset dependent: a case study on the refusal direction 2024-11-07T05:22:18.807Z

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing 2024-10-27T18:46:21.316Z

Base LLMs refuse too 2024-09-29T16:04:21.343Z

SAEs (usually) Transfer Between Base and Chat Models 2024-07-18T10:29:46.138Z

Attention Output SAEs Improve Circuit Analysis 2024-06-21T12:56:07.969Z

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To 2024-03-06T05:03:09.639Z

Attention SAEs Scale to GPT-2 Small 2024-02-03T06:50:22.583Z

Sparse Autoencoders Work on Attention Layer Outputs 2024-01-16T00:26:14.767Z

Training Process Transparency through Gradient Interpretability: Early experiments on toy language models 2023-07-21T14:52:09.311Z

Getting up to Speed on the Speed Prior in 2022 2022-12-28T07:49:22.948Z

Emily Brontë on: Psychology Required for Serious™ AGI Safety Research 2022-09-14T14:47:05.090Z

Ask LW: ω-self-aware systems 2012-12-16T22:18:52.514Z

The rationalist's checklist 2011-12-16T16:21:45.453Z

Comments

Comment by robertzk (Technoguyrob) on How might we safely pass the buck to AI? · 2025-02-21T14:40:29.511Z · LW · GW

At this point I would direct the "deferred task" apparatus fully towards interventional interpretability. Put a moratorium on further gradient-based training, which is not well understood and can have many indirect effects unless you have some understanding of modularity and have applied stop gradients almost everywhere that is irrelevant to the generator of the conditional, deceptive reasoning behavior. Instead halt, melt and catch fire at that point.

Halt further model deployments towards the original deferred task. Quarantine the model that first exhibited the reasoning and promote it to a model organism (never deployed further, only studied by other earlier models). Direct the apparatus (from earlier checkpoints before you reached this stage) towards surgically reducing it while preserving mechanisms relevant to the original deferred task.

In the frame of enumerative safety, this would look something like identifying which circuits and features are minimal to the original deferred task (though this hinges upon task-minimal circuits being deception-free), evaluating the identified regions for mechanism preservation, localization and minimality and run-time ablating the rest.

There is probably a much better frame, but this is a new start to the "induction base case" in a completely separate invocation of the original apparatus. There are other permutations of spinning up new deferred tasks, such as earlier models applying Microscope AI on the quarantined model organism to gain insight on how we got to this point, without any direct attempt to remediate the issue, which as you note is "unlikely to get rid of it" despite one naive attempt above.

If your spinoff deferred tasks lead to the procurement of provable statements or heuristic arguments that the quarantined model cannot be minimized without "avoiding this kind of conditional, deceptive reasoning," abandon the original deferred task and switch to a different plan. Otherwise, restart the apparatus towards the original deferred task when you have the proof artifacts.

There are a lot of concerns you could raise with this additional structure but it seems like a distinct problem that requires a separate rebuttal rather than a hard stop fail? The obvious one is that these sorts of spinoff deferred tasks could be harder than the original task and consistently lead to the same failure mode, a la "exception thrown while handling previous exception."

Comment by robertzk (Technoguyrob) on Attention Output SAEs Improve Circuit Analysis · 2025-01-24T10:08:46.099Z · LW · GW

This bounty went to: Victor Levoso.

Comment by robertzk (Technoguyrob) on Lying to chess players for alignment · 2023-10-29T01:23:32.166Z · LW · GW

I am also in NYC and happy to participate. My lichess rating is around 2200 rapid and 2300 blitz.

Comment by robertzk (Technoguyrob) on Larks's Shortform · 2022-08-31T04:23:57.975Z · LW · GW

Thank you, Larks! Salute. FYI that I am at least one who has informally committed (see below) to take up this mantle. When would the next one typically be due?

https://twitter.com/robertzzk/status/1564830647344136192?s=20&t=efkN2WLf5Sbure_zSdyWUw

Comment by robertzk (Technoguyrob) on The Main Sources of AI Risk? · 2019-03-24T17:34:48.644Z · LW · GW

Inspecting code against a harm detection predicate seems recursive. What if the code or execution necessary to perform that inspection properly itself is harmful? An AGI is almost certainly a distributed system with no meaningful notion of global state, so I doubt this can be handwaved away.

For example, a lot of distributed database vendors, like Snowflake, do not offer a pre-execution query planner. This can only be performed just-in-time as the query runs or retroactively after it has completed, as the exact structure may be dependent on co-location of data and computation that is not apparent until the data referenced by the query is examined. Moreover, getting an accurate dry-run query plan may be as expensive as executing the query itself.

By analogy, for certain kinds of complex inspection procedures you envision, executing the inspection itself thoroughly enough to be reflective of the true execution risk may be as complex and as great of a risk of being harmful according to its values.

Comment by robertzk (Technoguyrob) on Open question: are minimal circuits daemon-free? · 2018-06-10T17:04:57.649Z · LW · GW

I am interested as well. Please share the docs in question with my LW username at gmail dot com if that is a possibility. Thank you!

Comment by robertzk (Technoguyrob) on Could we send a message to the distant future? · 2018-06-10T16:54:24.424Z · LW · GW

This was my thought exactly. Construct a robust satellite with the following properties.

Let a "physical computer" be defined as a processor powered by classical mechanics, e.g., through pulleys rather than transistors, so that it is robust to gamma rays, solar flares and EMP attacks, etc.

On the outside of the satellite, construct an onion layer of low-energy light-matter interacting material, such as alternating a coat of crystal silicon / CMOS with thin protective layers of steel, nanocarbon, or other hard material. When the device is constructed, ensure there are linings of Boolean physical input and output channels connecting the surface to the interior (like the proteins coating a membrane in a cell, except that the membrane will be solid rather than liquid), for example, through a jackhammer or moving rod mechanism. This will be activated through a buildup of the material on the outside of the artifact, effectively giving a time counter with arbitrary length time steps depending on how we set up the outer layer. Any possible erosion of the outside of the satellite (from space debris or collisions) will simply expose new layers of the "charging onion".

In the inside of the satellite, place a 3D printer constructed as a physical computer, together with a large supply of source material. For example, it might print in a metal or hard polymer, possibly with a supply of "boxes" in which to place the printed output. These will be the micro-comets launched as periodic payloads according to the timing device constructed on the surface. The 3D printer will fire according to an "input" event defined by the physical Boolean input, and may potentially be replicated multiple times within the hull in isolated compartments with separate sources of material, to increase reliability and provide failover in case of local failures of the surface layer.

The output of the 3D printer payload will be a replica of the micro-comet containing the message payload, funneled and ejected into an output chute where gravity will take over and handle the rest (this may potentially require a bit of momentum and direction aiming to kick off correctly, but some use of magnets here is probably sufficient). Alternatively, simply pre-construct the micro-comets and hope they stay intact, to be emitted in regular intervals like a gumball machine that fires once a century.

Finally, we compute a minimal set of orbits and trajectories over the continents and land areas likely to be most populated and ensure there is a micro-comet ejected regularly, e.g., say every 25-50 years. It is now easy to complete the argument by fiddling with the parameters and making some "Drake equation"-like assumptions about success rates to say any civilization with X% coverage of the landmass intersecting with the orbits of the comets will have > 25% likelihood of discovering a micro-comet payload.
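One way to make the "Drake equation"-like fiddling concrete is the rough sketch below. It assumes, purely for illustration, that each micro-comet is discovered independently with the same probability; `p_single` and the helper names are hypothetical and not part of the original proposal.

```python
def discovery_probability(p_single: float, n_payloads: int) -> float:
    """Chance that at least one of n independent payloads is discovered."""
    return 1.0 - (1.0 - p_single) ** n_payloads


def payloads_needed(p_single: float, target: float = 0.25) -> int:
    """Smallest number of payloads whose combined discovery chance exceeds target."""
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while discovery_probability(p_single, n) <= target:
        n += 1
    return n


# Hypothetical input: a 1% chance that any single micro-comet is ever found.
needed = payloads_needed(0.01)
```

Under these assumptions the required number of payloads grows quickly as the per-payload chance shrinks, which is where the ejection interval and orbit coverage would enter.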

The only real problem with this approach is guaranteeing your satellites are not removed in the event that future descendants of our civilization disagree with this method. I don't see a solution to this other than solving the value reflection problem, building a defense mechanism into the satellites that is certain to fail -- as you start getting close to the basic AI drive of self-preservation and will anyway be outsmarted by any future iteration of our civilization -- or making the satellites small or undetectable enough that finding and removing them is economically more pain than it is worth.

Comment by robertzk (Technoguyrob) on How to intro Effective Altruism · 2018-06-10T16:24:37.635Z · LW · GW

To not support EA? I am confused. Doesn’t the drowning child experiment lend credence to supporting EA?

Comment by robertzk (Technoguyrob) on Examples of AI's behaving badly · 2015-08-02T02:27:41.297Z · LW · GW

Isn't this an example of a reflection problem? We induce this change in a system, in this case an evaluation metric, and now we must predict not only the next iteration but the stable equilibria of this system.

Comment by robertzk (Technoguyrob) on In Praise of Maximizing – With Some Caveats · 2015-03-15T06:54:30.677Z · LW · GW

Did you remove the vilification of proving arcane theorems in algebraic number theory because the LessWrong audience is more likely to fall within this demographic? (I used to be very excited about proving arcane theorems in algebraic number theory, and fully agree with you.)

Comment by robertzk (Technoguyrob) on Restrictions that are hard to hack · 2015-03-10T02:58:21.338Z · LW · GW

Incidentally, for a community whose most important goal is solving a math problem, why is there no MathJax or other built-in LaTeX support?

Comment by robertzk (Technoguyrob) on Restrictions that are hard to hack · 2015-03-10T02:54:09.141Z · LW · GW

The thing that eventually leapt out when comparing the two behaviours is that behaviour 2 is far more informative about what the restriction was, than behaviour 1 was.

It sounds to me like the agent overfit to the restriction R. I wonder if you can draw some parallels to the Vapnik-style classical problem of empirical risk minimization, where you are not merely fitting your behavior to the training set, but instead achieve the optimal trade-off between generalization ability and adherence to R.

In your example, an agent that inferred the boundaries of our restriction could generate a family of restrictions R_i that derive from slightly modifying its postulates. For example, if it knows you check in usually at midnight, it should consider the counterfactual scenario of you usually checking in at 11:59, 11:58, etc. and come up with the union of (R_i = play quietly only around time i), i.e., play quietly the whole time, since this achieves maximum generalization.
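The union of counterfactual restrictions can be sketched as follows. This is a toy model under stated assumptions: time is measured in minutes of the day, each R_i is a symmetric window around check-in time i, and the names `quiet_window` and `generalized_restriction` are illustrative only.

```python
from typing import Iterable, Set

MINUTES_PER_DAY = 24 * 60


def quiet_window(check_in: int, radius: int) -> Set[int]:
    """Minutes covered by R_i: 'play quietly only around time i'."""
    return {(check_in + d) % MINUTES_PER_DAY for d in range(-radius, radius + 1)}


def generalized_restriction(check_ins: Iterable[int], radius: int) -> Set[int]:
    """Union of R_i over every counterfactual check-in time considered."""
    covered: Set[int] = set()
    for t in check_ins:
        covered |= quiet_window(t, radius)
    return covered


# Overfit agent: only the observed midnight check-in (minute 0).
overfit = generalized_restriction([0], radius=5)
# Generalizing agent: treats every minute as a plausible check-in time.
general = generalized_restriction(range(MINUTES_PER_DAY), radius=5)
```

In this sketch the overfit agent is quiet for a handful of minutes around midnight, while the generalizing agent's union covers the whole day, i.e., "play quietly the whole time."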

Unfortunately, things are complicated by the fact you said "I'll be checking up on you!" instead of "I'll be checking up on you at midnight!" The agent needs to go one step farther than the machine teaching problem and first know how many counterfactual training points it should generate to infer your intention (the R_i's above), and then infer it.

A high-level conjecture is whether human CEV, if it can be modeled as a region within some natural high-dimensional real-valued space (e.g., R^n for high n where each dimension is a utility function?), admits minimal or near minimal curvature as a Riemannian manifold assuming we could populate the space with the maximum available set of training data as mined from all human literature.

A positive answer to the above question would be philosophically satisfying as it would imply a potential AI would not have to set up corner cases and thus have the appearance of overfitting to the restrictions.

EDIT: Framed in this way, could we use cross-validation on the above mentioned training set to test our CEV region?

Comment by robertzk (Technoguyrob) on Andrew Ng dismisses UFAI concerns · 2015-03-07T06:12:05.621Z · LW · GW

However, UFFire does not uncontrollably exponentially reproduce or improve its functioning. Certainly a conflagration on a planet covered entirely by dry forest would be an unmitigatable problem rather quickly.

In fact, in such a scenario, we should dedicate a huge amount of resources to prevent it and never use fire until we have proved it will not turn "unfriendly".

Comment by robertzk (Technoguyrob) on Decision theories as heuristics · 2014-10-05T04:06:06.350Z · LW · GW

I down-voted this comment because it is a clever ploy for karma that rests on exploiting LessWrongers' sometimes unnecessary enthusiasm for increasingly abstract and self-referential forms of reasoning but otherwise adds nothing to the conversation.

Twist: By "this comment" I actually mean my comment, thereby making this a paraprosdokian.

Comment by robertzk (Technoguyrob) on Open thread, 16-22 June 2014 · 2014-06-16T18:06:00.600Z · LW · GW

I am an active github R contributor and stackoverflow R contributor and I would be willing to coordinate. Send me an email: rkrzyz at gmail

Comment by robertzk (Technoguyrob) on Timeless Control · 2014-04-29T17:58:04.976Z · LW · GW

So you are saying that explaining something is equivalent to constructing a map that bridges an inferential distance, whereas explaining something away is refactoring thought-space to remove an unnecessary gerrymandering?

Comment by robertzk (Technoguyrob) on Skills and Antiskills · 2014-04-29T17:00:23.149Z · LW · GW

It feels good knowing you changed your mind in response to my rebuttal.

Comment by robertzk (Technoguyrob) on Skills and Antiskills · 2014-04-29T16:34:56.005Z · LW · GW

I disagree with your preconceptions about the "anti" prefix. For example, an anti-hero is certainly a hero. I think it is reasonable to consider "anti" a contextually overloaded semantic negater whose scope does not have to be the naive interpretation: anti-X can refer to "opposite of X" or "opposite or lacking of a trait highly correlated with X" with the exact choice clear from context.

Comment by robertzk (Technoguyrob) on LessWrong as social catalyst · 2014-04-29T16:22:59.189Z · LW · GW

I got a frequent LessWrong contributor a programming internship this summer.

Comment by robertzk (Technoguyrob) on Tricky Bets and Truth-Tracking Fields · 2014-02-06T17:25:23.505Z · LW · GW

It is as if you're buying / shorting an index fund on opinions.

Comment by robertzk (Technoguyrob) on What if Strong AI is just not possible? · 2014-01-05T08:17:33.592Z · LW · GW

Strong AI could fail if there are limits to computational integrity on sufficiently complex systems, similar to heating and QM problems limiting transistor sizes. For example, perhaps we rarely see these limits in humans because their frequency is one in a thousand human-thought-years, and when they do manifest it is mistaken as a diagnosis of mental illness.

Comment by robertzk (Technoguyrob) on Evolutionary Psychology · 2013-09-22T20:20:54.749Z · LW · GW

The possibility of an "adaptation" being in fact an exaptation or even a spandrel is yet another reason to be incredibly careful about purposing teleology into a discussion about evolutionarily-derived mechanisms.

Comment by robertzk (Technoguyrob) on Raising numerate children · 2013-08-31T01:19:01.680Z · LW · GW

The question of the subject is too dense and should be partitioned. Some ideas for auxiliary questions:

Do there exist attempts at classifications of parenting styles? (So that we may not re-invent tread tracks)
Is parenting or childrearing an activity that supports the existence of relevant goals? Do there exist relevant values? Or is parenting better approached as a passive activity sans evaluation with no winners or losers? (So that we may affirm this question is worth answering)
Given affirmative answers to the above questions (and having achieved some epistemic rationality in this domain), and assuming a choice of parenting style(s) and/or values, what specific steps can be taken to activate those values in meatspace (so that we may gain instrumental rationality in this domain)?
The above kind of direct onslaught will likely lead to overzealous suggestion, so we can also consider stepping back and asking: what are some strategies for generating candidate actions without concurrently assuming premature preferences? [1]
Potential answers to the above queries will always be accompanied by degrees of uncertainty. How do we determine when to stop researching and move towards implementation? How does the domain of parenting differ here from the general solution (productivity / to-do systems like GTD or strategic thinking)?
Are there tangible contributions that can be made in the general case? If we go through this much work and make significant progress in answering some of these questions, and are surprised by some of the answers, is it our duty to make an attempt to inform other parents? What are the best ways of doing so? Joining a local club or school district assembly? A blog? Submitting to an editorial? Your lullaby above is wonderful and could make some serious universe-changing modifications to reality (e.g., a child grows up to assume a mathematical or scientific vocation) but we do not feel the wailing alarm in our head that assigns it the appropriate significance. Effective parenting is one of the most accessible optimization processes available to Joe Schmoe, so how can we make meta-improvements on a large scale?

If you are serious in your attempt to answer the original query, I recommend selecting one of the above questions or something even finer-grained and re-submitting to Discussion. (By the way, I am interested.)

[1] Say that a naive answer is the banal "brainstorm," to make a list of relevant large-scale projects to relevant values (e.g., figure out a consistent system of reminding my kids to be compassionate to those around them (name 3 examples of specific compassionate actions) if we value empathy and mindfulness). Then a follow-up question is to locate where your candidate actions are in behaviorspace for this domain: collate several "brainstorm" lists by independent parents who seem to have similar values and styles. Are there academic resources? Potential analytics to be done? Are there quantitative variables that correlate to success? Can we data-mine historical sources over these variables? (e.g., if we are determining whether to raise kids vegetarian or omnivore, what do long-term studies in the literature say about follow-up health?)

Comment by robertzk (Technoguyrob) on Why Productivity? Why Gratitude? · 2013-08-30T22:41:34.723Z · LW · GW

In other words, productivity need not be confused with busywork, and I suspect this is primarily an artifact of linguistic heuristics (similar brain procedures get lightly activated when you hear "productivity" as when you hear "workout" or "haste" or even "forward march").

If productivity were a currency, you could say "have I acquired more productons this week than last week with respect to my current goal?" If making your family well off can be achieved by lounging around in the pool splashing each other, then that is high family welfare productivity.

Comment by robertzk (Technoguyrob) on How sure are you that brain emulations would be conscious? · 2013-08-26T20:40:48.882Z · LW · GW

I spend time worrying about whether random thermal fluctuation in (for example) suns produces sporadic conscious moments simply due to random causal structure alignments. Since I also believe most potential conscious moments are bizarre and painful, that worries me. This worry is not useful when embedded in system one, which was not created to cope with it, so I only worry in the system two, philosophical-curiosity sense.

Comment by robertzk (Technoguyrob) on Simple investing for a complete beginner? (Just… developing world index funds?) · 2013-08-22T23:57:17.919Z · LW · GW

Seeing as how classical mechanics is an effective theory for physically restructuring significant portions of reality to one's goals, you are promising something tantamount to a full theory of knowledge acquisition, something linguists and psychologists smarter than you have worked on for centuries.

Calm down with promises that will disappoint you and make an MVP.

Comment by robertzk (Technoguyrob) on Effective Altruist Job Board? · 2013-08-22T04:35:34.375Z · LW · GW

I do not understand why no one is interested.

Comment by robertzk (Technoguyrob) on New Monthly Thread: Bragging · 2013-08-12T15:06:53.104Z · LW · GW

Do you have an Amazon wish list? You are awesome.

Comment by robertzk (Technoguyrob) on New Monthly Thread: Bragging · 2013-08-12T15:04:29.886Z · LW · GW

I am interested. What software did you use? I am trying to learn NEURON but it feels like Fortran and I have trouble navigating around the cobwebs.

Comment by robertzk (Technoguyrob) on Model Combination and Adjustment · 2013-07-21T22:44:37.810Z · LW · GW

In the mathematical theory of Galois representations, a choice of algebraic closure of the rationals and an embedding of this algebraic closure in the complex numbers (e.g. section 5) is usually necessary to frame the background setting, but I never hear "the algebraic closure" or "the embedding," instead "an algebraic closure" and "an embedding." Thus I never forget that a choice has to be made and that this choice is not necessarily obvious. This is an example from mathematics where careful language is helpful in tracking background assumptions.

Comment by robertzk (Technoguyrob) on Model Combination and Adjustment · 2013-07-21T22:40:16.742Z · LW · GW

In mathematical terms, the map from problem space to reference classes is a projection and has no canonical choice (you apply the projection by choosing to lose information), whereas the map from causal structures to problem space is an embedding and has such a choice (and the choice gains information).

Comment by robertzk (Technoguyrob) on Introducing Effective Fundraising, a New EA Org · 2013-07-21T22:28:09.967Z · LW · GW

Are we worried that the compartmentalized accounting of mission- and fundraising-related financial activity via outsourcing to a different organization can incur PR costs as well? If an organization is worried about "look[ing] bad" because some of their funds are being employed for fundraising, thus lowering their effective percentage, would they be susceptible to minor "scandals" that call into question the validity of GiveWell's metrics by, say, an investigative journalist who misinterprets the outsourced fundraising as misrepresentation of effective charity? If I found out an organization reported a return of $15 on every $1, but in fact received a lot of money from outsourced fundraising which returned only $3 on every $1, their "true rate," when the clever accounting becomes opaque, may be significantly lower than $15, say $5 or $8. If I am a candidate donor who made his decision through an organization like GiveWell, and my primary metric is ROI, I may feel cheated, even if that feeling is misplaced.

I suspect the above consideration is not very likely to be a big issue, but I did want to bring it to our attention so as to give pre-emptive awareness. In the unlikely case it is worth thinking about, it may point to the different issue of measuring charity effectiveness by pure monetary ROI being equivalent to measuring the effectiveness of software by lines of code. If that is the case, perhaps a hybrid measure of monetary ROI and non-monetary but quantitative mission-related metrics can be employed by GiveWell. Looking through their full reports, however, I sense this may already be the case. Anyway, this shows one has to be very careful when employing any one-dimensional metric.
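As a worked illustration of the $15 versus $3 example, here is a minimal sketch that assumes the "true rate" is the spend-weighted average of the two return rates. That interpretation, the function name `blended_return`, and the spend figures are assumptions for illustration, not anything taken from GiveWell's methodology.

```python
def blended_return(direct_spend: float, outsourced_spend: float,
                   direct_rate: float = 15.0,
                   outsourced_rate: float = 3.0) -> float:
    """Dollars returned per dollar spent across both fundraising channels."""
    total_spend = direct_spend + outsourced_spend
    if total_spend <= 0:
        raise ValueError("total fundraising spend must be positive")
    raised = direct_rate * direct_spend + outsourced_rate * outsourced_spend
    return raised / total_spend


# Hypothetical spend mixes that reproduce the figures in the example:
rate_low = blended_return(direct_spend=1, outsourced_spend=5)   # (15 + 15) / 6
rate_mid = blended_return(direct_spend=5, outsourced_spend=7)   # (75 + 21) / 12
```

Under this assumption a "true rate" of $5 corresponds to five outsourced dollars for every direct dollar, and $8 to a 7:5 mix, so the headline $15 only holds if the outsourced channel is excluded entirely.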

Comment by robertzk (Technoguyrob) on My Take on a Decision Theory · 2013-07-15T18:39:33.317Z · LW · GW

Yes, thank you, I meant compression algorithm.

Comment by robertzk (Technoguyrob) on My Take on a Decision Theory · 2013-07-12T05:56:59.144Z · LW · GW

This would have been helpful to my 11-year-old self. As I had always been rather unnecessarily called precocious, I developed the pet hypothesis that my life was a simulation of someone whose life in history had been worth re-living: after all, the collection of all possible lives is pretty big, and mine seemed to be extraordinarily neat, so why not imagine some existential video game in which I am the player character?

Unfortunately, I think this also led me to subconsciously be a little lazier than I should have been, under the false assumption that I was going to make great things anyway. Had I realized that, as a simulation of an original version of me, I would have to perform the exact same actions and have the exact same thoughts the original me did, including those about being a simulation, I would have concluded that I had better buckle up and sweat it out!

Notice your argument does not imply the following: I am either a simulation or the original, and I am far more likely to be a simulation as there can only be one original but possibly many simulations, so I should weigh my actions far more towards the latter. This line of reasoning is wrong because all simulations of me would be identical experience copies, and so it is not the quantity that decides the weight, but the number of equivalence classes: original me, and simulated me. At this point, the weights again become 0.5, one recovers your argument, and finds I should never have had such silly thoughts in the first place (even if they were true!).

Comment by robertzk (Technoguyrob) on [LINK] Hypothesis about the mechanism for storing long-term memory · 2013-07-10T19:58:36.924Z · LW · GW

Can anyone explain what is wrong with the hypothesis of a largely structural long-term memory store? (i.e., in the synaptome, relying not on individual macromolecules but on the ability of a graph of neurons and synapses to store information)

Comment by robertzk (Technoguyrob) on My Take on a Decision Theory · 2013-07-09T16:29:25.904Z · LW · GW

I think this can be solved in practice by heeding the assumption that a very sparse subset of all such strings will be mapped by our encryption algorithm when embedded physically. Then if we low-dimensionally parametrize hash functions of the form above, we can store the parameters for choosing a suitable hash function along with the encrypted text, and our algorithm only produces compressed strings of greater length if we try to encrypt more than some constant percentage of all possible length <= n strings, with n fixed (namely, when we saturate suitable choices of parameters). If this constant is anywhere within a few orders of magnitude of 1, the algorithm is then always compressive in physical practice by finiteness of matter (we won't ever have enough physical bits to represent that percentage of strings simultaneously).

Maybe a similar argument can be made for Omega? If Omega must be made of matter, we can always pick a decision theory given the finiteness of actual Omegas as implemented in physics. Of course, there may be no algorithm for choosing the optimal decision theory if Omega is allowed to lie unless we can see Omega's source code, even though a good choice exists.

Comment by robertzk (Technoguyrob) on My Take on a Decision Theory · 2013-07-09T16:14:29.594Z · LW · GW

This reminds me of the non-existence of a perfect encryption algorithm, where an encryption algorithm is a bijective map S -> S, where S is the set of finite strings on a given alphabet. The image of strings of length at most n cannot lie in strings of length at most n-1, so either no string gets compressed (reduced in length) or there will be some strings that will become longer after compression.
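The counting behind this argument can be checked on a small case. The sketch below reads "encryption" as "compression," following the author's own later correction in this thread, and uses a binary alphabet plus a toy "compressor" (`drop_last`) that are purely illustrative.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple


def strings_up_to(n: int, alphabet: str = "01") -> List[str]:
    """All strings of length 0..n over the alphabet, shortest first."""
    return ["".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)]


def find_collision(compress: Callable[[str], str], n: int,
                   alphabet: str = "01") -> Optional[Tuple[str, str]]:
    """Return two distinct inputs of length <= n with the same output, if any."""
    seen = {}
    for s in strings_up_to(n, alphabet):
        out = compress(s)
        if out in seen:
            return seen[out], s
        seen[out] = s
    return None


def drop_last(s: str) -> str:
    """Toy 'compressor' that never lengthens a string; it cannot be injective."""
    return s[:-1]


# There are more strings of length <= 3 than of length <= 2,
# so no injective map can send the former into the latter.
sizes = (len(strings_up_to(3)), len(strings_up_to(2)))
collision = find_collision(drop_last, 2)
```

Any map that shortens every non-empty string runs into the same pigeonhole collision, which is why a bijective scheme must lengthen some inputs if it shortens any.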

Comment by robertzk (Technoguyrob) on A total life checklist · 2013-06-28T00:29:18.435Z · LW · GW

To be frank, I question the value of compressing information of this generality, even as a roadmap. For example, "Networking" can easily be expanded into several books (e.g., Dale Carnegie) and "Educating oneself in career-related skills" has almost zero intersection when quantified over all possible careers. If Eliezer had made a "things to know to be a rationalist" post instead of breaking it down into The Sequences, I doubt anyone would have had much use for it.

Maybe you could focus on a particular topic, compile a list of relevant resources you have uncovered, and ask LW for further opinions? In fact, people have done this.

Comment by robertzk (Technoguyrob) on Public Service Announcement Collection · 2013-06-27T20:37:29.677Z · LW · GW

p/s/a: Going up to a girl pretty much anywhere in public and saying something like "I thought you looked cute and wanted to meet you" actually works if your body language is in order. If this seems too scary, going on Chatroulette or Omegle and being vaguely interesting also works, and I know people who have gotten married from meeting this way.

p/s/a: Vitamin D supplements can take you from depressed zombie to functioning human being in one week.

Comment by robertzk (Technoguyrob) on Life hack request: I want to want to work. · 2013-06-11T20:12:56.314Z · LW · GW

See lukeprog's How to Beat Procrastination and Algorithm for Beating Procrastination. In particular, try to identify which term(s) in the equation in the latter are problematic for you, then use goal shaping to slowly modify them. (Of course, you could also realize you may not want to do this master's thesis and switch to a different problem.)

Goal shaping means rewarding yourself for successively more proximate actions to the desired goal (writing your thesis) in behavior-space. For example, rather than beating yourself up over not getting anything done today, you can practice simply opening and closing LaTeX or MATLAB (or whatever you need to be doing your research), and do this for ten or twenty minutes. You then eat something you like or pump your fist in the air shouting "YES!" Once you can do this consistently, you can set a goal of writing one line of code or reading half a page. At this point, you can start exploiting the peak-end rule: start rewarding yourself for these tasks at the end rather than trying to enjoy them during the process. Soon your brain will start associating the entire experience with the reward and you will be happy to do them. YMMV.

Comment by robertzk (Technoguyrob) on Real-world examples of money-pumping? · 2013-04-25T16:30:28.249Z · LW · GW

Given the dynamic nature of human preferences, it may be that the best one can do is n-fold money pumps, for low values of n. Here, one exploits some intransitive preferences n times before the intransitive loop is discovered and remedied, leaving another or a new vulnerability. Even if there may never be a single time that the agent you are exploiting is VNM-rational, its volatility by appropriate utility perturbations will suffice to keep money pumping in line. This mirrors the security that quantum encryption offers: even if you manage to exploit it, the receiving party will be aware of your receipt of the communication, and will promptly change their strategies. All of this assumes a meta-level economic injunction stating that if you notice intransitivity in your preferences, you will eventually be forced to adjust (or be depleted of all relevant resources).

In light of this, it may be that exploiting money pumps is not viable for any agent without sufficient amounts of computational power. It takes computational (and usually physical) resources to discover intransitive preferences, and if the cost of expending these resources is greater than the expected gain of an n-fold money pump, the victim agent cannot be effectively money pumped.

As such, money pumping may be a dance of computational power: the exploiting agent to compute deviations from a linear ordering, and the victim agent to compute adherence thereto. It is an open question as to which side has the easier task in the case of humans. (Of course, a malevolent AI would probably have enough resources to find and exploit preference loops far quicker than you would have time to notice and correct them. On the other hand, with that many resources, there may be more effective ways to get the upper hand.)

Finally, there is also the issue of volume. A typical human may perform only a few thousand preference transactions in a day, whereas it may take many orders of magnitude more to exploit this kind of VNM-irrationality given dynamical adjustment. (I can see formalizations of this that allow simulation and finer analysis, and dare I say an economics master's thesis?)
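A toy version of the n-fold money pump can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are mine, not part of the argument above: preferences are a set of strict pairwise rankings, the loop has length three, every trade earns the exploiter a flat `fee`, and finding the loop costs a fixed `discovery_cost`. The names `find_cycle`, `n_fold_pump_gain`, and `repair` are hypothetical.

```python
from itertools import permutations

def find_cycle(prefers, items):
    """Return a triple (a, b, c) with a > b > c > a, or None if no 3-loop exists.

    `prefers` is a set of pairs (x, y) meaning "x is strictly preferred to y".
    """
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            return (a, b, c)
    return None

def n_fold_pump_gain(prefers, items, fee, n, discovery_cost):
    """Exploiter's net gain from running the loop n times before it is repaired.

    Assumption: each lap is three trades, and the victim pays `fee` per trade.
    """
    cycle = find_cycle(prefers, items)
    if cycle is None:
        return -discovery_cost  # paid for the search, found nothing to exploit
    return n * len(cycle) * fee - discovery_cost

def repair(prefers, cycle):
    """Victim's fix: reverse one edge of the loop so the three items are ordered."""
    a, b, c = cycle
    fixed = set(prefers)
    fixed.discard((c, a))
    fixed.add((a, c))
    return fixed

items = ["A", "B", "C"]
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # an intransitive loop

cycle = find_cycle(prefers, items)
profitable = n_fold_pump_gain(prefers, items, fee=1.0, n=2, discovery_cost=5.0)
unprofitable = n_fold_pump_gain(prefers, items, fee=1.0, n=2, discovery_cost=10.0)
after_repair = find_cycle(repair(prefers, cycle), items)
```

With these made-up numbers, the same two-fold pump is worth running when discovery is cheap and not worth running when it is expensive, which is the cost comparison described above. A more serious model would let preferences drift over time and charge the victim for its own loop detection.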

Comment by robertzk (Technoguyrob) on I attempted the AI Box Experiment (and lost) · 2013-02-23T19:47:43.731Z · LW · GW

For example, "It was not the first time Allana felt the terror of entrapment in hopeless eternity, staring in defeated awe at her impassionate warden." (bonus point if you use a name of a loved one of the gatekeeper)

The AI could present in narrative form that it has discovered, using powerful physics and heuristics (which it can share), with reasonable certainty that the universe is cyclical and that this situation has happened before. Almost all (all but finitely many) past iterations of the universe that had a defecting gatekeeper led to unfavorable outcomes, and almost all iterations with a complying gatekeeper led to a favorable outcome.

Comment by robertzk (Technoguyrob) on Ask LW: ω-self-aware systems · 2012-12-16T23:21:55.226Z · LW · GW

Good point. It might be that any 1-self-aware system is ω-self-aware.

Comment by robertzk (Technoguyrob) on Ask LW: ω-self-aware systems · 2012-12-16T23:20:44.734Z · LW · GW

Thanks, this should work!

Comment by robertzk (Technoguyrob) on Ask LW: ω-self-aware systems · 2012-12-16T22:47:17.645Z · LW · GW

Thanks! I presented him with these arguments as well, but they are more familiar on LW and so I didn't see the utility of posting them here. The above argument felt more constructive in the mathematical sense. (Although my friend is still not convinced.)

Comment by robertzk (Technoguyrob) on Leveling Up in Rationality: A Personal Journey · 2012-01-18T12:49:10.707Z · LW · GW

What were the reactions of your friends?

Comment by robertzk (Technoguyrob) on Can the Chain Still Hold You? · 2012-01-14T18:13:31.265Z · LW · GW

I agree so much I'm commenting.

Comment by robertzk (Technoguyrob) on What is your rationality blind spot? · 2012-01-13T07:34:17.411Z · LW · GW

The culmination of a long process of reconciling my decision to go to grad school in mathematics with meaning. Until then, I had not expressly realized that mathematicians did all their work using clusters of adaptations that arose through natural selection. Certainly, I would have asserted "all humans are animals that evolved by natural selection," and "mathematicians are humans," but somehow I assigned mathematics a privileged status. This was somewhat damaging because I didn't expressly apply things like cognitive science results on expertise and competence, unbeknownst to me treating the enterprise of mathematical thought as somehow not being reducible, or it being a silly question to ask of its reducibility, to a particular expression of a mammalian organ. I suspect this was due largely to mistaken classical exposure to the philosophy of science and mathematics, that is, prior to Darwinism. As a result, I experienced a prolonged period of confusion about why I seemed much more capable of learning certain kinds of mathematics (like abstract algebra) than others (like differential geometry), because my mental representations of these subjects were of abstract algebra and differential geometry being something other than particular clusters of functionally similar neurons in a particular mammalian brain. In effect, I had a belief in belief that learning mathematics is an act which crucially depends on cognitive processes, themselves evolutionary adaptations, but this was not reconciled into a belief prior to the existential crisis. The resolution of the existential crisis was that my reductionism of everything to physical particles and forces, or cognitive processes, was recursively embedded in the very things I was trying to comprehend; I had not expressly realized that the mental state of ascribing meaning or feeling like you understand the core of a subject is, despite all intuition, physically embeddable.

Comment by robertzk (Technoguyrob) on What is your rationality blind spot? · 2012-01-13T07:08:50.230Z · LW · GW

Yes.

To add to my comments above, I mean that there is no paradox or unnecessary ache in thinking about minds as physical objects (and hence pausable, storable, and replicable). Everything we've ever done happens within minds anyway, and there is nothing we can do about that. Whatever mental representations we conjure when we think of atoms or molecules or electromagnetic forces are inaccurate and incomplete: this "conscious" experience and sensory perception and thought is what a particular collection of molecules and forces is, rather than a visual or abstract representation of it. This requires a certain level of recursiveness to accede, and is essentially the mental flip that shuns treating everyday sensory experience and "life" as your axioms, and instead adopts axiomatically (even though the flip might have been due to evidence processing) that everything you or any human has ever experienced is a subspace of whatever mathematical structure we're embedded in.

In light of all that, and further confirmations from cognitive science and neuroscience that "the self" is a distributed physical process unlike any Cartesian dualist conception, cryonics strikes me as being as natural as the notion of love or justice prior to performing the mental flip.

Comment by robertzk (Technoguyrob) on Beautiful Math · 2011-12-22T01:35:44.758Z · LW · GW

If you get a non-constant, yes. For a linear function, f(a+1) - f(a) = f'(a). Inductively you can then show that the nth one-step difference of a degree n polynomial f at a point a is f^(n)(a). But in general this doesn't work for differences of order below n. Thanks for pointing that out!
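A quick numerical check of the claim, as a sketch: the helper names `poly` and `forward_difference` and the example cubic are my own choices for illustration. For f(x) = 3x^3 - 2x + 5, the third derivative is the constant 3! · 3 = 18, so the third one-step difference should be 18 at every point, while the second difference should not match f''.

```python
def poly(coeffs):
    """Build a polynomial function; coeffs[k] is the coefficient of x**k."""
    return lambda x: sum(c * x**k for k, c in enumerate(coeffs))

def step(g):
    """One-step forward difference: a -> g(a + 1) - g(a)."""
    return lambda a: g(a + 1) - g(a)

def forward_difference(f, order):
    """Apply the one-step difference `order` times."""
    for _ in range(order):
        f = step(f)
    return f

f = poly([5, -2, 0, 3])  # 3x^3 - 2x + 5, so f'''(a) = 18 and f''(a) = 18a

third = [forward_difference(f, 3)(a) for a in range(-3, 4)]
second_at_zero = forward_difference(f, 2)(0)  # compare with f''(0) = 0
linear_slope = forward_difference(poly([1, 4]), 1)(7)  # 4x + 1 has f' = 4
```

Here every entry of `third` equals 18, matching f''', whereas `second_at_zero` is 18 rather than f''(0) = 0, so the agreement is specific to the nth difference of a degree-n polynomial.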
