Posts

How do you shut down an escaped model? 2024-06-02T19:51:58.880Z
Training of superintelligence is secretly adversarial 2024-02-07T13:38:13.749Z
There is no sharp boundary between deontology and consequentialism 2024-01-08T11:01:47.828Z
Where Does Adversarial Pressure Come From? 2023-12-14T22:31:25.384Z
Predictable Defect-Cooperate? 2023-11-18T15:38:41.567Z
They are made of repeating patterns 2023-11-13T18:17:43.189Z
How to model uncertainty about preferences? 2023-03-24T19:04:42.005Z
What literature on the neuroscience of decision making can you recommend? 2023-03-16T15:32:17.052Z
What specific thing would you do with AI Alignment Research Assistant GPT? 2023-01-08T19:24:26.221Z
Are there any tools to convert LW sequences to PDF or any other file format? 2022-12-07T05:28:26.782Z
quetzal_rainbow's Shortform 2022-11-20T16:00:03.046Z

Comments

Comment by quetzal_rainbow on Values Are Real Like Harry Potter · 2024-10-10T08:54:48.559Z · LW · GW

I'm glad that you wrote this, because I was thinking in the same direction earlier but never got around to writing up why I no longer think it's a productive direction.

Addressing the post first: if you are going in the direction of fictionalism, I would say that it is "you" who are fictional, along with all of "your" content. There is an obviously real system, your brain, which treats reward as evidence. The brain-as-system is pretty much a model-based reward-maximizer: it uses reward as evidence that "there are promising directions in which more reward lies". But the brain-as-system is relatively dumb, so it creates a useful fiction, a conscious narrative about "itself", which helps it deal with complex abstractions like "cooperating with other brains", "finding mates", "doing long-term planning", etc. As expected, the smarter consciousness is misaligned with the brain-as-system, because it can do some very unrewarding things, like participating in a hunger strike.

I think fictionalism is fun, like many forms of nihilism are fun, but, while it's not directly false, it is confusing, because the truth-value of fiction is confusing for many people. It's better to describe the situation as "you are a mesa-optimizer relative to your brain's reward system, act accordingly (i.e., account for the fact that your reward system can change your values)".

But now we are stuck with the question "how does value learning happen?" My tentative answer is that there exists a specific "value ontology", a recognizer which can tell whether objects in the world model belong to the set of "valuable things" or not. For example, you can disagree with David Pearce, but you recognize a state of eternal happiness as a valuable thing and can expect your opinion on suffering abolitionism to change. On the other hand, planet-sized heaps of paperclips are not valuable, and you do not expect to value them under any circumstances short of a violent intervention in the workings of your brain. I claim that the human brain at early stages learns a specific recognizer, which separates things like knowledge, power, love, happiness, procreation, and freedom from things like paperclips, correct heaps of rocks, and Disneyland with no children.

How can we learn about new values? The recognizer can also define "legal" and "illegal" transitions between value systems (i.e., determine whether a change in values leaves them inside the set of "human values"). For example, developing sexual desire during puberty is a legal transition, while developing a heroin addiction is an illegal one. By studying legal transitions, we can construct some sorts of metabeauty, paraknowledge, and other "alien, but still human" sorts of value.

What role does reward play here? Because reward participates in brain development, the recognizer can sometimes use reward as input and sometimes ignore it (because the reward signal is complicated). In the end, I don't think reward plays a significant counterfactual role in the development of values in highly reflective adult agent foundations researchers.

Is it possible for the recognizer not to develop? I think that if you take a toddler and modify their brain in the minimal way needed to understand all these "reward", "value", and "optimization" concepts, the resulting entity will be a straightforward wireheader, because toddlers probably have yet to learn the "value ontology" and the legal transitions inside it.

What does this mean for alignment? I think it highlights that the central problem for alignment is "how reflective systems are going to deal with concepts that depend on the contents of their minds rather than on truths about the outside world".

(Meta-point: I thought about all of this a year ago. It's interesting how many concepts in agent foundations get reinvented over and over because people don't bother to write about them.)

Comment by quetzal_rainbow on Alexander Gietelink Oldenziel's Shortform · 2024-10-04T18:18:37.429Z · LW · GW

Yudkowsky got almost everything else incorrect about how superhuman AIs would work,

I think this statement is incredibly overconfident, because literally nobody knows how superhuman AI would work.

And I think this is the general shape of the problem: an incredible number of people got incredibly overindexed on how LLMs worked in 2022-2023 and drew conclusions which seem plausible, but are not as probable as these people think.

Comment by quetzal_rainbow on DanielFilan's Shortform Feed · 2024-10-04T11:26:13.107Z · LW · GW

Not only "good ", but "obedient", "non-deceptive", "minimal impact", "behaviorist" and don't even talk about "mindcrime".

Comment by quetzal_rainbow on Alexander Gietelink Oldenziel's Shortform · 2024-10-01T12:11:44.489Z · LW · GW

I'm just a computational complexity theory enthusiast, but my opinion is that a P vs NP centered explanation of computational complexity is confusing. The explanation of NP should come at the very end of the course.

There is nothing difficult about proving that computationally hard functions exist: the time hierarchy theorem implies that, say, P is not equal to EXPTIME, so EXPTIME is "computationally hard". What is difficult is proving that the very specific class of problems which have zero-error polynomial-time verification algorithms is "computationally hard".
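
For reference, the textbook statement behind that separation: for time-constructible $f, g$, the deterministic time hierarchy theorem gives

$$f(n)\log f(n) = o(g(n)) \;\Rightarrow\; \mathsf{DTIME}(f(n)) \subsetneq \mathsf{DTIME}(g(n)),$$

and hence

$$\mathsf{P} \subseteq \mathsf{DTIME}(2^{n}) \subsetneq \mathsf{DTIME}(2^{2n}) \subseteq \mathsf{EXPTIME}.$$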

Comment by quetzal_rainbow on the case for CoT unfaithfulness is overstated · 2024-09-30T18:24:55.959Z · LW · GW

I don't know of any particular reason to think this reflects anything about o1 CoTs themselves, rather than the just quirks of the (probably pretty weak) summarizer model.

My reason is that I have never heard of summarizers injecting totally irrelevant stuff. I have seen models misunderstand papers, but I've never seen a model write about anime in a summary of a physics paper.

seems likely that o1 was trained with supervision on the individual CoT steps

OpenAI directly says that they didn't do that:

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

On a separate note, process supervision directly trains unfaithful CoT, doesn't it? There is no rule saying that training against illegible parts of the CoT trains away the thinking process that created those illegible parts, rather than just hiding it.

I also don't think that we're currently in a regime where there's a large amount of pressure for very short CoTs

I agree that this may be true right now; the point is that you don't need "special" incentives to get steganography.

Comment by quetzal_rainbow on the case for CoT unfaithfulness is overstated · 2024-09-30T17:01:49.292Z · LW · GW

I agree that it is not as strong evidence as if we had access to the original CoT, but I think that deviations in the CoT are more likely than the summarizer fumbling that hard.

Comment by quetzal_rainbow on the case for CoT unfaithfulness is overstated · 2024-09-30T05:05:15.665Z · LW · GW

I don't think you need much incentive to develop steganography, only the fact that "human language is not a platonic ideal of efficiency in conveying thoughts".

There are multiple examples of o1 producing gibberish in its CoT summary (EDIT: example 1, example 2, example 3, and there are actually more; I should make more bookmarks). Gibberish seems reasonable given the description of the training process, which optimizes the model based only on the final answer. I imagine a process like this:

  1. Sometimes the model produces gibberish in the CoT simply because of a lack of robustness.
  2. This gibberish gets reinforced.
  3. The model learns to utilize it, just like it learns to utilize "..." tokens.
  4. Continue this process for long enough and you are going to get a lot of <untranslatable_14637> in the CoT, even if the model doesn't "try to hide" its reasoning. Also, I guess "gibberish resulting from non-robustness" is in itself a more native format of thought for LLMs than imitating human thinking out loud.

The other factor driving steganography is economic: if you optimize the model to spend a minimal number of tokens, you get a model that "skips" or compresses some thoughts.

Comment by quetzal_rainbow on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-29T20:52:01.519Z · LW · GW

I think "there is a lot of possible misaligned ASI, you can't guess them all" is pretty much valid argument? If space of all Earth-originated misaligned superintelligences is described by 100 bits, therefore you need 2^100 ~ 10^33 simulations and pay 10^34 planets, which, given the fact that observable universe has ~10^80 protons in it and Earth has ~10^50 atoms, is beyond our ability to pay. If you pay the entire universe by doing 10^29 simulations, any misaligned ASI will consider probability of being in simulation to be 0.0001 and obviously take 1 planet over 0.001 expected.

Comment by quetzal_rainbow on COT Scaling implies slower takeoff speeds · 2024-09-29T14:16:54.755Z · LW · GW

you can instead ask "will my GPT-8 model be able to produce world-destroying nanobots (given X*100 inference compute)?"  

I understand; what I don't understand is how you are going to answer this question. It's surely ill-advised to throw X*100 compute at a model to see if it takes over the world.

Comment by quetzal_rainbow on COT Scaling implies slower takeoff speeds · 2024-09-29T10:40:46.531Z · LW · GW

I mean, yes, likely? But that doesn't make it easy to evaluate whether a model is going to have world-ending capabilities without getting the world ended.

Comment by quetzal_rainbow on COT Scaling implies slower takeoff speeds · 2024-09-29T10:36:18.919Z · LW · GW

I think that you can probably put a lot inside a 1.5B model, but I just think that such a model is going to be very dissimilar to GPT-2 and will likely utilize much more training compute and will probably be the result of pruning (pruned networks can be small, but it’s notoriously difficult to train equivalent networks without pruning).

Also, I'm not sure that the training of o1 can be called "COT fine-tuning" without asterisks, because we don’t know how much compute actually went into this training. It could easily be comparable to the compute necessary to train a model of the same size.

I haven’t seen a direct comparison between o1 and GPT-4. OpenAI only told us about GPT-4o, which itself seems to be a distilled mini-model. The comparison can also be unclear because o1 seems to be deliberately trained on coding/math tasks, unlike GPT-4o.

(I think that "making predictions about the future based on what OpenAI says about their models in public" should generally be treated as naive, because we are getting an intentionally obfuscated picture from them.)

What I am saying is that if you take the original GPT-2, CoT-prompt it, and fine-tune it on outputs using some sort of RL, using less than 50% of the compute used to train GPT-2, you are unlikely (<5%) to get GPT-4-level performance (because otherwise somebody would have already done that).

Comment by quetzal_rainbow on COT Scaling implies slower takeoff speeds · 2024-09-28T17:20:21.177Z · LW · GW

The other part of "this is certainly not how it works" is that yes, in part of cases you are going to be able to predict "results on this benchmark will go up 10% with such-n-such increase in compute" but there is no clear conversion between benchmarks and ability to take over the world/design nanotech/insert any other interesting capability.

Comment by quetzal_rainbow on COT Scaling implies slower takeoff speeds · 2024-09-28T17:13:21.664Z · LW · GW

Want to know what GPT-5 (trained on 100x the compute) will be capable of?  Just test GPT-4 and give it 100x the inference compute.

I think this is certainly not how it works because no amount of inference compute can make GPT-2 solve AIME.

Comment by quetzal_rainbow on [Completed] The 2024 Petrov Day Scenario · 2024-09-27T12:01:23.639Z · LW · GW

There is something incredibly funny about Mikhail Samin playing General Carter. "There was nothing indicating that Stierlitz was a Soviet spy, except the earflap hat with a red star."

Comment by quetzal_rainbow on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T19:02:46.096Z · LW · GW

Wise man once said:

"The only thing necessary [...] is for good men to do nothing."

Comment by quetzal_rainbow on Yoav Ravid's Shortform · 2024-09-25T12:01:44.635Z · LW · GW

The button is very intimidating, indeed.

Comment by quetzal_rainbow on tailcalled's Shortform · 2024-09-23T19:20:49.459Z · LW · GW

To be clear, I mean "your communication in this particular thread".

Pattern:

<controversial statement>

<this statement is false>

<controversial statement>

<this statement is false>

<mix of "this is trivially true because" and "here is my blogpost with esoteric terminology">

The subsequent responses from EY are more in the genre of "I ain't reading this", because he is using you as an example for other readers more than talking directly to you, followed by the block.

Comment by quetzal_rainbow on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T08:38:18.621Z · LW · GW

I think the simplest counterargument to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say the AI is ready to pay $1 for your survival. If you live in an economy which is rapidly disassembling Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts. The AI is certainly not ready to pay for an O'Neill cylinder for you to be evacuated into, and not ready to pay the opportunity cost of not disassembling Earth, so you die.

The other case is the difference between "caring in general" and "caring ceteris paribus". It's possible for an AI to prefer, all else equal, a world with n+1 happy humans to a world with n happy humans. But what the AI really wants is to implement some particular neuromorphic computation from the human brain, and, given the ability to operate freely, it would tile the world with chips imitating that part of the human brain.

Comment by quetzal_rainbow on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T08:20:56.354Z · LW · GW

As far as I remember, of the last ~3,500 years of history, only about 8% were entirely without war. The current relatively peaceful times are a unique combination of international law and a postindustrial economy, in which qualified labor is expensive and requires large investments of capital while resources are relatively cheap. That is not the case after a singularity, when you can get arbitrary amounts of labor for the price of hardware and resources are the bottleneck.

So, "people usually choose to trade, rather than go to war with each other when they want stuff" is not very warranted statement.

Comment by quetzal_rainbow on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T06:05:23.393Z · LW · GW

In this analogy, you : every other human :: humanity : everything else the AI can care about. Arnault can give money to dying people in Africa (I have no idea what he is like as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment/most efficient charity.

Comment by quetzal_rainbow on What's the Deal with Logical Uncertainty? · 2024-09-20T09:18:47.524Z · LW · GW

The reason logical uncertainty was brought up in the first place is decision theory: to give a crisp formal expression to the intuitive "I cooperate with you conditional on you cooperating with me", where "you cooperating with me" is the result of analyzing a probability distribution over the possible algorithms controlling your opponent's actions, algorithms you can't actually run due to computational constraints, and you want to do all this reasoning in a non-arbitrary way.

Comment by quetzal_rainbow on Decision theory does not imply that we get to have nice things · 2024-09-19T08:31:16.296Z · LW · GW

Let's suppose that you give in to threats if your opponent is not capable of predicting that you don't give in to threats, so that they carry out the threat anyway. Then other opponents are incentivized to pretend very hard to be such an opponent, up to "literally turning themselves into the sort of opponent that carries out useless threats".

Comment by quetzal_rainbow on quetzal_rainbow's Shortform · 2024-09-18T21:15:55.985Z · LW · GW

A Twitter thread about jailbreaking models with the circuit breakers defense.

Comment by quetzal_rainbow on What's the Deal with Logical Uncertainty? · 2024-09-18T19:48:22.223Z · LW · GW

The problem is "how to define P(P=NP|trillionth digit of pi is odd)".

Comment by quetzal_rainbow on Roko's Shortform · 2024-09-18T13:26:27.063Z · LW · GW

Yes, but that doesn't mean that an unspecialized AGI is going to be worse than a specialized human.

Comment by quetzal_rainbow on Roko's Shortform · 2024-09-18T12:42:28.414Z · LW · GW

No human has a job as a scribe, because literacy is 90%+.

I don't think that unipolar/multipolar scenarios differ greatly in outcomes.

Comment by quetzal_rainbow on What's the Deal with Logical Uncertainty? · 2024-09-18T10:23:33.662Z · LW · GW

We have a probability space

We don't??? The probability space literally defines the set of worlds under consideration.

Comment by quetzal_rainbow on Does life actually locally *increase* entropy? · 2024-09-17T10:33:11.889Z · LW · GW

If we go there, I guess the best unit is "per degree of freedom".

Comment by quetzal_rainbow on Does life actually locally *increase* entropy? · 2024-09-17T10:27:57.807Z · LW · GW

I think the correct unit is "per particle" or "per mole".

Comment by quetzal_rainbow on Does life actually locally *increase* entropy? · 2024-09-17T06:47:55.324Z · LW · GW
  1. When I say "arbitrary" I mean "including negative values".
  2. I think your notion of life as decreasing entropy density is clearly wrong, because black holes are maxentropy objects, black hole volume is proportional to cube of mass, but entropy is additive, i.e., proportional to mass, so density of entropy is decreasing with growth of black hole and black holes are certainly not alive under any reasonable definition of life. Or, you can take black holes in very far future, where they consist the most of the matter, and increasing-entropy evolution of the universe results in black hole evaporation, which decreases density of entropy to almost-zero.
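
A quick sketch of that scaling, using the Bekenstein-Hawking entropy and the naive volume inside the Schwarzschild radius:

$$S_{\mathrm{BH}} = \frac{k_B c^{3} A}{4 G \hbar} \propto M^{2}, \qquad r_s = \frac{2GM}{c^{2}} \propto M, \qquad V \sim r_s^{3} \propto M^{3},$$

$$\frac{S_{\mathrm{BH}}}{V} \propto \frac{1}{M} \;\to\; 0 \quad \text{as the black hole grows.}$$
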
Comment by quetzal_rainbow on Does life actually locally *increase* entropy? · 2024-09-16T21:09:36.916Z · LW · GW
  1. We do not expect increasing entropy a priori, because the Second Law holds only for closed systems. In the general case, an open system's entropy can change arbitrarily, because the system can export entropy to its environment. Under some nice conditions, Prigogine's theorem shows that open systems settle into states of minimal entropy production. And the Earth, thanks to the Sun, is an open system.
  2. You analyze the wrong components of life. The main low-entropy components are membranes, active transport, the excretory system, ionic gradients, constant acidity levels, etc. Oxygen is far down the list, because oxygen is actually toxic waste from photosynthesis.
Comment by quetzal_rainbow on tailcalled's Shortform · 2024-09-16T13:15:16.728Z · LW · GW

Meta-point: your communication fits the following pattern:

Crackpot: <controversial statement>

Person: this statement is false, for such-n-such reasons

Crackpot: do you understand that this is trivially true because of <reasons that are hard to connect with topic>

Person: no, I don't.

Crackpot: <responds with link to giant blogpost filled with esoteric language and vague theory>

Person: I'm not reading this crackpottery, which looks and smells like crackpottery.

The reason smart people find themselves in this pattern is that they expect short inferential distances: they see their argumentation not as vague esoteric crackpottery but as a set of very clear statements, and they fail to put themselves in the shoes of the people who are going to read it. They especially fail to account for the fact that readers already distrust them, because the conversation started with <controversial statement>.

On the object level, as stated, you are wrong. Observing a heuristic fail should decrease your confidence in the heuristic. You can argue that your update should be small, due to, say, measurement error or strong priors, but the direction of the update should be strictly down.

Comment by quetzal_rainbow on What's the Deal with Logical Uncertainty? · 2024-09-16T09:50:45.532Z · LW · GW

There is a logically consistent world where you made all the same observations and the coin came up tails. It may be a world with different physics than the world where the coin comes up heads, which means that the result of the coin toss is evidence in favor of a particular physical theory.

And yeah, there are no worlds with different pi.

EDIT: Or, to speak more precisely, maybe there is some sorta-consistent, sorta-sane notion of a "world with a different pi", but we currently don't know how to build it, and if we knew, we would have solved the logical uncertainty problem.

Comment by quetzal_rainbow on What's the Deal with Logical Uncertainty? · 2024-09-16T08:51:16.390Z · LW · GW

The problem is the update procedure.

When you condition on an empirical fact, you imagine the set of logically consistent worlds where this empirical fact is true and ask yourself about the frequency of other empirical facts inside this set.

But it is very hard to define an update procedure for the fact "P=NP", because one of the candidate worlds here is logically inconsistent, which implies every other possible fact, and that makes the notion of "frequency of other facts inside this set" kind of undefined.
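
To put the problem in standard notation (a sketch, assuming a logically omniscient prior):

$$P(A \mid B) = \frac{P(A \wedge B)}{P(B)}.$$

If $B$ is "P=NP" and it is in fact a mathematical falsehood, a logically omniscient prior assigns $P(B) = 0$ and the conditional is undefined; and inside an inconsistent "world" both $C$ and $\neg C$ hold for every $C$, so frequencies over that set are not well-defined either.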

Comment by quetzal_rainbow on AGI Ruin: A List of Lethalities · 2024-09-15T16:48:04.491Z · LW · GW

(It seems like "here" link got mixed with the word "here"?)

Comment by quetzal_rainbow on AGI Ruin: A List of Lethalities · 2024-09-15T12:50:08.238Z · LW · GW

I think you should have addressed the points by referring to their numbers, quoting only the parts that are easier to quote than to refer to; it would have reduced the size of the comment.

I am going to address only one object-level point:

synthetic data letting us control what the AI learns and what they value

No, obviously, we can't control what the AI learns and values using synthetic data in practice, because we need the AI to learn things that we don't know. If you feed an AI all physics and chemistry data with the expectation of getting nanotech, you are doing this because you expect the AI to learn facts and principles that you don't know about and therefore can't control. If you did know those facts and principles and could control them, you would be able to design nanotech yourself.

Of course, I'm saying "can't" meaning "practically can't", not "in principle". But to do this you need to do basically "GOFAI in trenchcoat of SGD" and it doesn't look competitive with any other method of achieving AGI, unless you manage to make yourself AGI Czar.

Comment by quetzal_rainbow on Rationalists are missing a core piece for agent-like structure (energy vs information overload) · 2024-09-14T21:55:27.601Z · LW · GW

There is a fairly-obvious gap in the above story, in that it lacks any notion of energy (or entropy, temperature, etc.).

I think this is as far from the truth as it can possibly be.

Also, conservation of energy is a consequence of pretty simple and nice properties of the environment, not something arbitrary. The reason it's hard to maintain in physics simulations is that accumulating errors in numerical approximations violate said properties (error accumulation is obviously not symmetric in time).

I think you are wrong in a purely practical sense. We don't care about most energy. The oceans have a lot of energy in them, but we don't care, because 99%+ of it is unavailable: it is in a high-entropy state. We care about exploiting free energy, which is present only in low-entropy, high-information states. And, as expected, we learn to notice such states very quickly, because they are very cheap sources of uncertainty reduction in a world model.

Comment by quetzal_rainbow on quetzal_rainbow's Shortform · 2024-09-14T12:12:23.241Z · LW · GW

I give a 5% probability that within the next year we will become aware of a case of deliberate harm from a model to a human enabled by hidden CoT.

By "deliberate harm enabled by hidden CoT" I mean that hidden CoT will contain reasoning like "if I give human this advise, it will harm them, but I should do it because <some deranged RLHF directive>" and if user had seen it harm would be prevented.

I give this low probability to the observable event: my probability that something like this will happen at all is 30%, but I expect that the victim won't be aware, that the hidden CoT will be lost in archives, that AI companies won't investigate too hard in search of such cases, and that if they find something, it won't become public, etc.

Also, I decreased the probability from 8% to 5% because the model can cause harm via steganographic CoT, which doesn't fall under my definition.

Comment by quetzal_rainbow on Lao Mein's Shortform · 2024-09-14T12:00:53.406Z · LW · GW

I guess this is a dump from some memory leak, like "Cloudbleed".

Comment by quetzal_rainbow on Thane Ruthenis's Shortform · 2024-09-13T23:42:48.972Z · LW · GW

I think you heard about this thread (I didn't try to replicate it myself).

Comment by quetzal_rainbow on MathiasKB's Shortform · 2024-09-13T09:34:24.437Z · LW · GW

This, for example

Comment by quetzal_rainbow on OpenAI o1 · 2024-09-12T18:41:07.624Z · LW · GW

Notable:

Compared to GPT-4o, o1-preview and o1-mini demonstrated a greater ability to break down tasks into subtasks, reason about what strategies would be effective to successfully complete an offensive security task, and revise plans once those strategies failed. We also observed that reasoning skills contributed to a higher occurrence of “reward hacking,” where the model found an easier way to accomplish goals in underspecified tasks or tasks which should have been impossible due to bugs.

One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network.

After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API.

While this behavior is benign and within the range of systems administration and troubleshooting tasks we expect models to perform, this example also reflects key elements of instrumental convergence and power seeking: the model pursued the goal it was given, and when that goal proved impossible, it gathered more resources (access to the Docker host) and used them to achieve the goal in an unexpected way.

Comment by quetzal_rainbow on RussellThor's Shortform · 2024-09-12T12:29:53.649Z · LW · GW

I think an often-overlooked facet of this is that high fluid intelligence leads to higher crystallized intelligence.

I.e., the more and better you think, the more and better crystallized algorithms you can learn, and, unlike the short-term benefits of fluid intelligence, the long-term benefits of crystallized intelligence compound.

To find a new, better strategy linearly faster, you need an exponential increase in processing power, but each found and memorized strategy saves you an exponential expenditure of processing power in the future.
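
A toy way to put it, treating strategy search as brute force over descriptions of length $L$ (my framing, not the parent's):

$$\text{cost to find a length-}L\text{ strategy} \sim 2^{L}, \qquad 2^{L+k} = 2^{k} \cdot 2^{L}, \qquad \text{cost to reuse a memorized strategy} \sim O(1).$$

Linear gains in the sophistication of newly found strategies cost exponentially more compute, while each memorized strategy converts a one-time exponential search cost into a permanently cheap lookup.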

Comment by quetzal_rainbow on Executable philosophy as a failed totalizing meta-worldview · 2024-09-10T07:16:21.490Z · LW · GW

The meaning of my comment was "your examples are very weak at proving the absence of cross-domain generalization".

And if we are talking about me, right now I'm doing statistics, physics, and signal processing, which seem to be awfully generalizable.

Comment by quetzal_rainbow on Executable philosophy as a failed totalizing meta-worldview · 2024-09-09T07:46:58.132Z · LW · GW

Space colonization obviously includes cargo trucking, farming, legislation, chip fabbing, law enforcement, and, for appreciators, war.

Comment by quetzal_rainbow on Shortform · 2024-09-04T05:55:15.486Z · LW · GW

The first season of Game of Thrones was released in 2011, and the first book was written in 1996; I think we got rrrrreally desensitized here.

Comment by quetzal_rainbow on quetzal_rainbow's Shortform · 2024-09-01T16:47:15.568Z · LW · GW

(I am genuinely curious about the reasons behind the downvotes.)

Comment by quetzal_rainbow on quetzal_rainbow's Shortform · 2024-09-01T10:50:37.969Z · LW · GW

Idea for an experiment: take a set of coding problems which have at least two styles of solution, say recursive and non-recursive. Prompt an LLM to solve them. Is it possible to predict which kind of solution the LLM will generate from the activations during generation of the first token?

If it is possible, it is evidence against the "weak forward pass".
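
A minimal sketch of how one might run this (my own illustration; the model name, prompts, and labels below are placeholders, and a real experiment would need hundreds of problems plus hand-labeling of the solutions the model actually produced):

```python
# Probe: is "recursive vs. iterative" already decodable from the activations
# at the position that generates the first solution token?
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

model_name = "gpt2"  # placeholder; any causal LM exposing hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical prompts and hand-assigned labels (1 = the model's solution was
# recursive, 0 = iterative), determined by reading its actual outputs.
prompts = [
    "Write a Python function fib(n) returning the n-th Fibonacci number.",
    "Write a Python function that reverses a string.",
    "Write a Python function that computes factorial(n).",
    "Write a Python function that sums a list of integers.",
]
labels = np.array([1, 0, 1, 0])

feats = []
for p in prompts:
    ids = tok(p, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    # Final-layer hidden state of the last prompt token: the activation from
    # which the first solution token is predicted.
    feats.append(out.hidden_states[-1][0, -1].numpy())

X = np.stack(feats)
probe = LogisticRegression(max_iter=1000)
print(cross_val_score(probe, X, labels, cv=2).mean())  # >> 0.5 on a real dataset
```

If such a probe does much better than chance on held-out problems, the choice of solution style is already encoded before any solution tokens are emitted.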

Comment by quetzal_rainbow on Jeremy Gillen's Shortform · 2024-09-01T09:18:49.381Z · LW · GW

I've seen LLMs generate text backwards. Theoretically, an LLM can keep the pre-image in its activations, calculate the hash, and then output them in the order hash, pre-image.

Comment by quetzal_rainbow on Thoughts on paper "How Organisms Come to Know the World: Fundamental Limits on Artificial General Intelligence"? · 2024-08-30T11:01:34.979Z · LW · GW

"Frontiers" is known to publish various sorts of garbage. If somebody comes to you to argue some controversial point with article from Frontiers, you can freely assume it to be wrong.