Posts

What is Ontology? 2024-02-12T23:01:35.632Z
Choosing a book on causality 2024-02-07T21:16:08.885Z
Would you have a baby in 2024? 2023-12-25T01:52:04.358Z
How useful is Corrigibility? 2023-09-12T00:05:41.995Z
Disincentivizing deception in mesa optimizers with Model Tampering 2023-07-11T00:44:48.089Z

Comments

Comment by martinkunev on Value Impact · 2024-03-28T00:21:41.984Z · LW · GW

It seems to me that objective impact stems from convergent instrumental goals - self-preservation, resource acquisition, etc.

Comment by martinkunev on Do not delete your misaligned AGI. · 2024-03-25T23:36:03.355Z · LW · GW

A while back I was thinking about a kind of opposite approach. If we train many agents and delete most of them immediately, they may try to get as much reward as possible before being deleted. Agents that would otherwise be deceptive may then prefer to reveal their preferences. There are many IFs to this idea but I'm wondering whether it makes any sense.

Comment by martinkunev on Anxiety vs. Depression · 2024-03-19T11:12:00.466Z · LW · GW

Both gravity and inertia are determined by mass. Both are explained by spacetime curvature in general relativity. Was this an intentional part of the metaphor?

Comment by martinkunev on Goal-Completeness is like Turing-Completeness for AGI · 2024-02-19T17:30:30.030Z · LW · GW

I find the ideas you discuss interesting, but they leave me with more questions. I agree that we are moving toward a more generic AI that we can use for all kinds of tasks.

I have trouble understanding the goal-completeness concept. I'd reiterate @Razied's point. You mention "steers the future very slowly", so there is an implicit concept of "speed of steering". I don't find the Turing machine analogy helpful in inferring an analogous conclusion because I don't know what that conclusion is.

You're making a qualitative distinction between humans (goal-complete agents) and other animals (non-goal-complete agents). I don't understand what you mean by that distinction. I find the idea of goal-completeness interesting to explore but quite fuzzy at this point.

Comment by martinkunev on Goal-Completeness is like Turing-Completeness for AGI · 2024-02-18T21:47:53.254Z · LW · GW

The Turing machine enumeration analogy doesn't work because the machine needs to halt.

Optimization is conceptually different from computation in that there is no single correct output.

What would humans not being goal-complete look like? What arguments are there for humans being goal-complete?

Comment by martinkunev on Testing The Natural Abstraction Hypothesis: Project Intro · 2024-02-16T02:41:10.372Z · LW · GW

I'm wondering whether useful insights can come from studying animals (or even humans from different cultures) - e.g. do fish and dolphins form the same abstractions; do bats "see" using echolocation?

Comment by martinkunev on [deleted post] 2024-02-12T02:11:49.213Z

I hope the next parts don't get delayed due to akrasia :)

Comment by martinkunev on Generalizing From One Example · 2024-02-10T23:49:59.902Z · LW · GW

My guess was 0.8 cheat, 0.2 steal (they just happen to add up to 1 by accident).

Comment by martinkunev on Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis · 2024-02-05T02:19:27.928Z · LW · GW

Max Tegmark presented similar ideas in a TED talk (without much detail). I'm wondering if he and Davidad are in touch.

Comment by martinkunev on Schelling fences on slippery slopes · 2024-02-02T02:18:55.172Z · LW · GW

The ban on Holocaust denial undermines the concept of free speech - there is no agreed-upon Schelling point, and arguments start. Many people don't really understand the concept of free speech because the example they see is actually a counterexample.

Not everyone is totally okay with it; I certainly am not.

Comment by martinkunev on [deleted post] 2024-01-31T20:46:47.677Z

Maybe "irrational" is not the right word here. The point I'm trying to make is that human preferences are not over world states.

When discussing preference rationality, arguments often consider only preferences over states of the world, while ignoring transitions between those states. For example, a person in San Francisco may drive to San Jose, then to Oakland, and then back to San Francisco simply because they enjoy moving around. Cyclic transitions between states are not necessarily something that needs fixing.

Comment by martinkunev on Yes Requires the Possibility of No · 2024-01-25T01:27:11.325Z · LW · GW

When a teacher is wondering whether to skip explaining concept-X, they should ask "who is familiar with concept-X" and not "who is not familiar with concept-X".

Comment by martinkunev on Iteration Fixed Point Exercises · 2024-01-23T13:53:47.001Z · LW · GW

For question 2

you haven't proven f is continuous

For question 3 you say

  is a contraction map because  is differentiable and ... 

I would think proving this is part of what is asked for.

Comment by martinkunev on Inner and outer alignment decompose one hard problem into two extremely hard problems · 2024-01-16T20:58:15.078Z · LW · GW

@TurnTrout You wrote "if I don’t, within about a year’s time, have empirically verified loss-as-chisel insights which wouldn’t have happened without that frame..."

More than a year later, what do you think?

Comment by martinkunev on The alignment stability problem · 2024-01-06T00:02:38.349Z · LW · GW

I tend to agree, but I believe most non-aligned behavior is due to scarcity. It's hard to get into the heads of people like Stalin, but I believe that if everybody had a very realistic virtual reality where they could do all the things they'd do in real life, they might be much less motivated to enter into conflict with other humans.

Comment by martinkunev on An Orthodox Case Against Utility Functions · 2023-12-28T01:03:27.498Z · LW · GW

 should have some sort of representation which allows us to feed it into a Turing machine -- let's say it's an infinite bit-string which...


Why do we assume the representation is infinite? Do we assume the environment in which the agent operates is infinite?

Comment by martinkunev on Would you have a baby in 2024? · 2023-12-26T01:18:33.339Z · LW · GW

For example, US-China conflict is fueled in part by the AI race dynamics.

Comment by martinkunev on Would you have a baby in 2024? · 2023-12-26T01:15:46.270Z · LW · GW

I didn't provide any evidence because I didn't make any claim (about timelines or otherwise). I'm trying to form my views by asking on LessWrong, and I get something like "You have no right to ask this".

I quoted Yudkowsky because he asks a related question (whether you agree with his assessment or not).

 

"I'm not convinced timelines should be relevant to having kids"

Thanks, this looks more like an answer.

Comment by martinkunev on Can you control the past? · 2023-12-13T00:28:55.621Z · LW · GW

In the Yankees vs. Red Sox example, the described oracle (predicting "win" or "lose") is not always possible. The behavior of the agent changes depending on the oracle's prediction, and there may be no possible prediction such that the agent behaves according to it.
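
A minimal illustration of the kind of agent that breaks this (my own toy sketch, with "win"/"lose" as the only outcomes):

```python
# A "contrarian" agent that does the opposite of whatever the oracle announces.
def agent(prediction: str) -> str:
    return "lose" if prediction == "win" else "win"

# Neither announcement is consistent with the behavior it causes,
# so there is no oracle prediction of this form for this agent.
for prediction in ("win", "lose"):
    outcome = agent(prediction)
    print(prediction, "->", outcome, "consistent" if outcome == prediction else "inconsistent")
```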

Comment by martinkunev on Logical Updatelessness as a Robust Delegation Problem · 2023-12-12T03:37:09.984Z · LW · GW

For whatever reason this is a duplicate of

https://www.lesswrong.com/posts/K5Qp7ioupgb7r73Ca/logical-updatelessness-as-a-subagent-alignment-problem

Comment by martinkunev on Introduction to Cartesian Frames · 2023-12-06T01:35:51.769Z · LW · GW

There seems to be a mistake in section 4.2.
Prevent(C5) is said to be the closure of {{ur}, {nr}, {us}, {ns}} under subsets.
It should be {{nr, us, ns}, {ur, us, ns}, {ur, nr, ns}, {ur, nr, us}}.

Comment by martinkunev on Watermarking considered overrated? · 2023-12-02T14:23:09.442Z · LW · GW

I consider watermarking a lost cause. Trying to imitate humans as well as possible conflicts with trying to distinguish AI-generated from human-generated output; the task is impossible in the limit. If somebody wants to avoid watermarking, they can always use an open-source model (e.g. to paraphrase the watermarked content).

Digitally signing content can be used to track the origin of that content (but not the tools used to create it). We could have something like a global distributed database indicating the origin of content, and everybody could decide what to trust based on that origin. This does not achieve what watermarking is trying to do, but I believe it is the best we can do.
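
A minimal sketch of the signing part, assuming the `cryptography` package and Ed25519 keys (my choices for illustration; any signature scheme would do):

```python
# pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The publisher signs the content with a key tied to their identity.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

content = b"article text, image bytes, video bytes, ..."
signature = private_key.sign(content)

# Anyone with the public key (e.g. looked up in a provenance registry)
# can verify that the content was signed by that key's owner.
try:
    public_key.verify(signature, content)
    print("origin verified")
except InvalidSignature:
    print("not signed by this key")
```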

As some people develop watermarking techniques, others look for ways to circumvent them. This is an arms-race-like dynamic which just leads to a waste of resources. People working on watermarking could probably contribute to something with actual benefit instead.

Comment by martinkunev on Cognitive Emulation: A Naive AI Safety Proposal · 2023-11-24T02:39:13.783Z · LW · GW

I'm unsure whether CoEms as described could actually help in solving alignment. It may be the case that advancing alignment requires enough cognitive capabilities to make the system dangerous (unless we have already solved alignment).

I doubt that a single human mind which runs on a computer is guaranteed to be safe - this mind would think orders of magnitude faster (speed superintelligence) and could copy itself. Maybe most humans would be safe. Maybe power corrupts.

Comment by martinkunev on There are no coherence theorems · 2023-11-12T02:57:41.581Z · LW · GW

In "A money-pump for Completeness" you say "by the transitivity of strict preference"
This only says that transitive preferences do not need to be complete which is weaker than preferences do not need to be complete.

Comment by martinkunev on There are no coherence theorems · 2023-11-12T02:51:20.466Z · LW · GW

"paying to avoid being given more options looks enough like being dominated that I'd want to keep the axiom of transitivity around"

Maybe off-topic, but paying to avoid being given more options is a common strategy in negotiation.

Comment by martinkunev on Deconfusing Direct vs Amortised Optimization · 2023-11-10T13:36:18.634Z · LW · GW

The amortized vs. direct distinction in humans seems related to System 1 vs. System 2 in Thinking, Fast and Slow.

 

"the implementation of powerful mesa-optimizers inside the network quite challenging"

I think it's quite likely that we'll see optimizers implemented outside the network in the style of AutoGPT (people can explicitly build direct optimizers on top of amortized ones).
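
A minimal sketch of the pattern I have in mind, where `propose` stands in for an amortized model and `score` for an explicit objective (both names are hypothetical placeholders):

```python
import random

def propose(task: str) -> str:
    # Stand-in for an amortized model (e.g. a language model sampling a candidate plan).
    return f"plan #{random.randint(0, 999)} for {task!r}"

def score(candidate: str) -> float:
    # Stand-in for an explicit objective that the outer loop optimizes.
    return random.random()

def direct_optimizer(task: str, n: int = 32) -> str:
    # Explicit best-of-n search wrapped around the amortized proposer:
    # the optimization happens in this outer loop, not inside the network.
    candidates = [propose(task) for _ in range(n)]
    return max(candidates, key=score)

print(direct_optimizer("book a trip"))
```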

Comment by martinkunev on Writing Causal Models Like We Write Programs · 2023-11-01T01:12:07.849Z · LW · GW

The letters uppercase I and lowercase l look the same. Maybe use 1 instead of uppercase I?

Comment by martinkunev on Utilitarianism Meets Egalitarianism · 2023-10-29T12:02:00.262Z · LW · GW

"However, when everyone gets expected utility 1, the expected logarithm of expected utility will have the same derivative as expected expected utility"


Can you clarify this sentence? What functions are we differentiating?

Comment by martinkunev on Siren worlds and the perils of over-optimised search · 2023-10-27T12:37:05.012Z · LW · GW

I'm wondering whether this framing (choosing between a set of candidate worlds) is the most productive. Does it make sense to use criteria like corrigibility, minimizing impact and preferring reversible actions (or do we have no reliable way to evaluate whether these hold)?

Comment by martinkunev on REPL's: a type signature for agents · 2023-10-24T23:56:38.509Z · LW · GW

A couple of typos:

(no sub X in print)    Env       := { Print  : S → A,   Eval  : S × Aₓ → S }

In the second image, in the bottom right, S^1_X should be S^1.

Comment by martinkunev on The Anthropic Trilemma · 2023-10-24T12:21:39.823Z · LW · GW

I'm just wondering what Britney Spears would say when she reads this.

Comment by martinkunev on Study Guide · 2023-10-15T13:19:42.559Z · LW · GW

The Games and Information book link is broken. It appears to be this book:
https://www.amazon.com/Games-Information-Introduction-Game-Theory/dp/1405136669/ref=sr_1_1?crid=2VDJZFMYT6YTR&keywords=Games+and+Information+rasmusen&qid=1697375946&sprefix=games+and+information+rasmuse%2Caps%2C178&sr=8-1

Comment by martinkunev on My impression of singular learning theory · 2023-10-01T23:20:38.624Z · LW · GW

To make this easier to parse on the first read, I would add that

N is the number of parameters of the NN and we assume each parameter is binary (instead of the usual float).

Comment by martinkunev on A very non-technical explanation of the basics of infra-Bayesianism · 2023-09-29T01:15:49.381Z · LW · GW

"the agent guesses the next bits randomly. It observes that it sometimes succeeds, something that wouldn't happen if Murphy was totally unconstrained"

Do we assume Murphy knows how the random numbers are generated? What justifies this?

Comment by martinkunev on The Case for Convexity · 2023-09-27T00:09:27.989Z · LW · GW

Arguably the notion of certainty is not applicable to the real world but only to idealized settings. This is also relevant.

Comment by martinkunev on Standard and Nonstandard Numbers · 2023-09-10T00:11:55.736Z · LW · GW

A couple of clarifications if somebody is as confused as me when first reading this.

In ZF we can quantify over sets because "set" is the name we use for the underlying objects (the set of natural numbers is an object in the theory). In Peano arithmetic the objects are numbers, so we can quantify over numbers but not over sets.

Predicates are more "powerful" than first-order formulas, so quantifying over predicates restricts the possible models more than having one axiom per formula does. Every formula defines a predicate, but not every predicate is definable by a formula (which predicates exist depends on the model), so an axiom instance per formula cannot capture all predicates.
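
For concreteness, the contrast I have in mind (standard formulations, not quoted from the post) is between the first-order induction schema, with one instance per formula, and the single second-order induction axiom quantifying over all predicates:

```latex
% First-order induction: an axiom schema, one instance per formula \varphi(x)
\bigl(\varphi(0) \land \forall n\,(\varphi(n) \rightarrow \varphi(n+1))\bigr) \rightarrow \forall n\,\varphi(n)

% Second-order induction: a single axiom quantifying over all predicates P
\forall P\,\Bigl(\bigl(P(0) \land \forall n\,(P(n) \rightarrow P(n+1))\bigr) \rightarrow \forall n\,P(n)\Bigr)
```

Only countably many predicates are definable by first-order formulas, which is why the schema leaves room for nonstandard models while the second-order axiom (with full semantics) pins down the standard numbers.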

Comment by martinkunev on Eliezer Yudkowsky Facts · 2023-09-05T22:11:08.378Z · LW · GW

Eliezer Yudkowsky once entered an empty Newcomb's box simply so he could get out when the box was opened.

or

When you one-box against Eliezer Yudkowsky on Newcomb's problem, you lose because he escapes from the box with the money.

Comment by martinkunev on Improvement on MIRI's Corrigibility · 2023-09-03T00:09:18.857Z · LW · GW

"Realistically, the function UN doesn't incentivize the agent to perform harmful actions."

I don't understand what that means and how it's relevant to the rest of the paragraph.

Comment by martinkunev on A shot at the diamond-alignment problem · 2023-08-31T23:41:21.091Z · LW · GW

It would be interesting to see if a similar approach can be applied to the strawberries problem (I haven't personally thought about this).

Comment by martinkunev on Godzilla Strategies · 2023-08-29T22:55:26.639Z · LW · GW

Referring to all forms of debate, oversight, etc. as "Godzilla strategies" is loaded language. Should we refrain from summoning Batman because we may end up summoning Godzilla by mistake? Ideally, we want to solve alignment without summoning anything. However, applying some humility, we should consider that the problem may be too difficult for human intelligence to solve.

Comment by martinkunev on ToL: Foundations · 2023-07-26T01:49:19.446Z · LW · GW

The image doesn't load.

The notation in Hume's Black Box seems inconsistent. When defining [e], e is an element of a world. When defining I, e is a set of worlds.

Comment by martinkunev on Which values are stable under ontology shifts? · 2023-07-15T01:59:26.914Z · LW · GW

In "Against Discount Rates" Eliezer characterizes discount rate as arising from monetary inflation, probabilistic catastrophes etc. I think in this light discount rate less than ONE (zero usually indicates you don't care at all about the future) makes sense.

Some human values are proxies for things which make sense in general intelligent systems - e.g. happiness is a proxy for learning, reproduction, etc.

Self-preservation can be seen as an instance of preservation of learned information (which is a reasonable value for any intelligent system). Indeed, if there were a medium superior to the human brain to which people could transfer the "contents" of their brain, I believe most would do it. It is not a coincidence that self-preservation generalizes this way. Otherwise elderly people would have been discarded from the tribe in the ancestral environment.

Comment by martinkunev on The Parable of the Dagger · 2023-07-08T00:25:21.019Z · LW · GW

Is the existence of such situations an argument for intuitionistic logic?

Comment by martinkunev on FAI and the Information Theory of Pleasure · 2023-06-22T00:36:03.187Z · LW · GW

"wireheading ... how evolution has addressed it in humans"

It hasn't - that's why people do drugs (including alcohol). What is stopping all humans from wireheading is that all currently available methods work only in the short term and have negative side effects. The ancestral environment didn't allow humankind to self-destruct by wireheading. Maybe peer pressure not to do drugs exists, but there is also peer pressure in the other direction.

Comment by martinkunev on Where can one learn deep intuitions about information theory? · 2023-06-20T01:24:28.637Z · LW · GW

Is it worth it to read "Information Theory: A Tutorial Introduction 2nd edition" (James V Stone)?

https://www.amazon.com/Information-Theory-Tutorial-Introduction-2nd/dp/1739672704/ref=sr_1_2

Comment by martinkunev on Why Do People Think Humans Are Stupid? · 2023-04-20T00:08:37.629Z · LW · GW

"There doesn't seem to be anything a sufficiently motivated and resourced intelligent human is incapable of grasping given enough time"

  - a human

 

If there is such a thing, what would a human observe?

Comment by martinkunev on Iterated Distillation and Amplification · 2023-04-02T23:30:28.429Z · LW · GW

"there is some threshold of general capability such that if someone is above this threshold, they can eventually solve any problem that an arbitrarily intelligent system could solve"

This is a very interesting assumption. Is there research or discussions on this?

Comment by martinkunev on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-31T17:30:01.712Z · LW · GW

"discovering that you're wrong about something should, in expectation, reduce your confidence in X"

This logic seems flawed. Suppose X is whether humans go extinct. You have an estimate of the distribution of X (for a Bernoulli process it would be some probability p). Take the joint distribution of X and the factors on which X depends (p is now a function of those factors). Your best estimate of p is the mean of the joint distribution, and the variance measures how uncertain you are about the factors. Discovering that you're wrong about something means becoming more uncertain about some of the factors. This would increase the variance of the joint distribution. I don't see any reason to expect the mean to move in any particular direction.
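
A quick toy check (my own illustration; the symmetric link from the factor to the probability is an assumption made only to keep the example simple):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_given_theta(theta):
    # Some link from an unknown factor theta to the probability of X.
    return 1 / (1 + np.exp(-theta))

# "Discovering you were wrong" modelled as a wider belief over theta with the same center.
narrow = rng.normal(loc=0.0, scale=0.5, size=1_000_000)
wide = rng.normal(loc=0.0, scale=2.0, size=1_000_000)

print(p_given_theta(narrow).mean())  # ~0.5
print(p_given_theta(wide).mean())    # ~0.5: more uncertainty about theta, same estimate of P(X)
```

With a particular asymmetric link the mean can shift, but the direction depends on the shape of the link, so there is no general reason to expect it to move one way.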

Or maybe I'm making a mistake. In any case, I'm not convinced.

Comment by martinkunev on What Are You Tracking In Your Head? · 2023-03-28T22:57:00.339Z · LW · GW

When outside, I'm usually tracking location and direction on a mental map. This doesn't seem like a big deal to me but in my experience few people do it. On some occasions I am able to tell which way we need to go while others are confused.

Comment by martinkunev on AI: Practical Advice for the Worried · 2023-03-02T23:44:52.254Z · LW · GW

Given that hardware advancements are very likely going to continue, delaying general AI would favor what Nick Bostrom calls a fast takeoff. This makes me uncertain as to whether delaying general AI is a good strategy.

I expected to read more about actively contributing to AI safety rather than about reactively adapting to whatever is happening.