Posts

Veedrac's Shortform 2024-05-26T19:50:20.156Z
Post-history is written by the martyrs 2022-04-11T15:45:14.997Z
Optimality is the tiger, and agents are its teeth 2022-04-02T00:46:27.138Z
Moore's Law, AI, and the pace of progress 2021-12-11T03:02:24.558Z

Comments

Comment by Veedrac on When do "brains beat brawn" in Chess? An experiment · 2024-12-17T11:45:32.396Z · LW · GW

LeelaKnightOdds has convincing beaten both Awonder Liang and Anish Giri at 3+2 by large margins, and has an extremely strong record at 5+3 against people who have challenged it.

I think 15+0 and probably also 10+0 would be a relatively easy win for Magnus based on Awonder, a ~150 elo weaker player, taking two draws at 8+3 and a win and a draw at 10+5. At 5+3 I'm not sure because we have so little data at winnable time controls, but wouldn't expect an easy win for either player.

It's also certainly not the case that these few-months-old networks running a somewhat improper algorithm are the best we could build—it's known at minimum that this Leela is tactically weaker than normal and can drop endgame wins, even if humans rarely capitalize on that.

Comment by Veedrac on Optimality is the tiger, and agents are its teeth · 2024-11-28T02:54:49.458Z · LW · GW

Fundamentally, the story was about the failure cases of trying to make capable systems that don't share your values safe by preventing specific means by which its problem solving capabilities express themselves in scary ways. This is different to what you are getting at here, which is having those systems actually operationally share your values. A well aligned system, in the traditional ‘Friendly AI’ sense of alignment, simply won't make the choices that the one in the story did.

Comment by Veedrac on "Slow" takeoff is a terrible term for "maybe even faster takeoff, actually" · 2024-09-29T04:56:05.465Z · LW · GW

I was finding it a bit challenging to unpack what you're saying here. I think, after a reread, that you're using ‘slow’ and ‘fast’ in the way I would use ‘soon’ and ‘far away’ (aka. referring to the time it will occur from the present). Is this read about correct?

Comment by Veedrac on Yoav Ravid's Shortform · 2024-09-25T16:31:58.066Z · LW · GW

If ‘Opt into Petrov Day’ was aside something other than a big red ominous button, I would think the obvious answer is that it's a free choice and I'd be positively inclined towards it. Petrov Day is a good thing with good side effects, quite unlike launching nuclear weapons.

It is confusing to me that it is beside a big red ominous button. On the one hand, Petrov's story is about the value of caution. To quote a top comment from an older Petrov Day,

Petrov thought the message looked legit, but noticed there were clues that it wasn't.

On the other hand, risk-taking is good, opting in to good things is good, and if one is taking Petrov Day to mean ‘don't take risks if they look scary’ I think one is taking an almost diametrically wrong message from the story.

All that said, for now I am going to fall for my own criticism and not press the big red ominous button around Petrov Day.

Comment by Veedrac on the Giga Press was a mistake · 2024-08-21T19:53:07.439Z · LW · GW

I'm interested in having Musk-company articles on LessWrong if it can be done while preserving LessWrong norms. I'm a lot less interested in it if it means bringing in sarcasm, name calling, and ungrounded motive-speculation.

Comment by Veedrac on Universal Basic Income and Poverty · 2024-07-27T20:57:04.992Z · LW · GW

if, judging by looking at some economical numbers, poverty already doesn't exist for centuries, why do we feel so poor

Let's not forget that people who read LW, often highly intelligent and having well-paying jobs such as software development

This underlines what I find so incongruous about EY's argument. I think I genuinely felt richer as a child eating free school meals in the UK but going to a nice school and whose parents owned a house than I do as an obscenely-by-my-standards wealthy person in San Francisco. I'm hearing this elaborate theory to explain why social security doesn't work when I have lived through and seen in others clear evidence that it can and it does. If the question “why hasn't a factor-100 increase in productivity felt like a factor-100 increase in productivity?” was levied at my childhood specifically, my response is that actually it felt like exactly that.

By the standards of low earning households my childhood was probably pretty atypical and I don't mean to say there aren't major systemic issues, especially given the number of people locked into bad employment, people with lives destroyed by addiction, people who struggle to navigate economic systems, people trapped in abusive or ineffectual families, etc. etc. etc. I really don't want to present a case just based on my lived experience, even including those I know living various lives under government assistance. But equally I think someone's lived experience of being wealthy in San Francisco and seeing drug addicts on the street is also not seeing an unbiased take of what social security does for poverty.

Comment by Veedrac on Veedrac's Shortform · 2024-05-27T00:36:54.858Z · LW · GW

Eg. a moderately smart person asking it to do something else by trying a few prompts. We're getting better at this for very simple properties but I still consider it unsolved there.

Comment by Veedrac on Veedrac's Shortform · 2024-05-26T19:50:20.475Z · LW · GW

Reply to https://twitter.com/krishnanrohit/status/1794804152444580213, too long for twitter without a subscription so I threw it here, but do please treat it like a twitter comment.

rohit: Which part of [the traditional AI risk view] doesn't seem accounted for here? I admit AI safety is a 'big tent' but there's a reason they're congregated together.


You wrote in your list,

the LLMs might start even setting bad objectives, by errors of omission or commission. this is a consequence of their innards not being the same as people (either hallucinations or just not having world model or misunderstanding the world)

In the context of traditional AI risk views, this misses the argument. Roughly the concern is instead like so:

ASI is by definition very capable of doing things (aka. selecting for outcomes), in at least all the ways collections of humans can. It is both theoretically true and observably the case in reality that when things are selected for, a bunch of other things that aren't that are traded off, and that the stronger something is selected for, the more stuff ends up traded against, incidentally or not.

We should expect any ASI to have world-changing effects, and for those effects to trade off strongly against other things. There is a bunch of stuff we want that we don't want traded off (eg. being alive).

The first problem is that we don't know how to put any preferences into an AI such that it's robust to even trivial selection pressure, not in theory, not in practice on existing models, and certainly not in ways that would apply to arbitrary systems that indirectly contain ML models but aren't constrained by those models’ expressed preferences.

The second problem is that there are a bunch of instrumental goals (not eg. lying, but eg. continuing to have causal effect on the world) that are useful to almost all goals, and that are concrete examples of why an ASI would want to disempower humans. Aka. almost every thing that could plausibly be called an ASI will be effective at doing a thing, and the natural strategies for doing things involve not failing at them in easily-foreseeable ways.

Stuff like lying is not the key issue here. It often comes up because people say ‘why don’t we just ask the AI if it’s going to be bad’ and the answer is basically code for ‘you don’t seem to understand that we are talking about something that is trying to do a thing and is also good at it.’

Similarly for ‘we wouldn't even know why it chooses outcomes, or how it accomplishes them’ — these are problematic because they are yet another reason to rule out simple fixes, not because they are fundamental to the issue. Like, if you understand why a bridge falls down, you can make a targeted fix and solve that problem, and if you don’t know then probably it’s a lot harder. But you can know every line of code of Stockfish (pre-NNUE) and still not have a chance against it, because Stockfish is actively selecting for outcomes and it is better at selecting them than you.

“LLMs have already lied to us” from the traditional AI risk crowd is similarly not about LLM lying being intrinsically scary, it is a yell of “even here you have no idea what you are doing, even here you have these creations you cannot control, so how in the world do you expect any of this to work when the child is smarter than you and it’s actually trying to achieve something?”

Comment by Veedrac on "Deep Learning" Is Function Approximation · 2024-03-25T00:09:22.265Z · LW · GW
Comment by Veedrac on "Deep Learning" Is Function Approximation · 2024-03-22T08:24:33.793Z · LW · GW

It took me a good while reading this to figure out whether it was a deconstruction of tabooing words. I would have felt less so if the post didn't keep replacing terms with ones that are both no less charged and also no more descriptive of the underlying system, and then start drawing conclusions from the resulting terms' aesthetics.

With regards to Yudkowsky's takes, the key thing to keep in mind is that Yudkowsky started down his path by reasoning backwards from properties ASI would have, not from reasoning forward from a particular implementation strategy. The key reason to be concerned that outer optimization doesn't define inner optimization isn't a specific hypothesis about whether some specific strategy with neural networks will have inner optimizers, it's because ASI will by necessity involve active optimization on things, and we want our alignment techniques to have at least any reason to work in that regime at all.

Comment by Veedrac on artifex0's Shortform · 2024-03-14T02:45:10.541Z · LW · GW

There is no ‘the final token’ for weights not at the final layer.

Because that is where all the gradients flow from, and why the dog wags the tail.

Aggregations of things need not be of the same kind as their constituent things? This is a lot like calling an LLM an activation optimizer. While strictly in some sense true of the pieces that make up the training regime, it's also kind of a wild way to talk about things in the context of ascribing motivation to the resulting network.

I think maybe you're intending ‘next token prediction’ to mean something more like ‘represents the data distribution, as opposed to some metric on the output’, but if you are this seems like a rather unclear way of stating it.

Comment by Veedrac on artifex0's Shortform · 2024-03-13T19:09:44.036Z · LW · GW

You're at token i in a non-final layer. Which token's output are you optimizing for? i+1?

By construction a decoder-only transformer is agnostic over what future token it should be informative to within the context limit, except in the sense that it doesn't need to represent detail that will be more cheaply available from future tokens.

As a transformer is also unrolled in the context dimension, the architecture itself is effectively required to be generic both in what information it gathers and where that information is used. Bias towards next token prediction is not so much a consequence of reward in isolation, but of competitive advantage: at position i, the network has an advantage in predicting i+1 over the network at previous locations by having more recent tokens, and an advantage over the network at future tokens by virtue of still needing to predict token i+1. However, if a token is more predictive of some abstract future token than the next token precisely, say it's a name that might be referenced later, one would expect the dominant learnt effect to be non-myopically optimizing for later use in some timestamp-invariant way.

Comment by Veedrac on artifex0's Shortform · 2024-03-13T02:27:11.110Z · LW · GW

If they appear to care about predicting future tokens, (which they do because they are not myopic and they are imitating agents who do care about future states which will be encoded into future tokens), it is solely as a way to improve the next-token prediction.

I think you're just fundamentally misunderstanding the backwards pass in an autoregressive transformer here. Only a very tiny portion of the model is exclusively trained on next token prediction. Most of the model is trained on what might be called instead, say, conditioned future informativity.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2024-03-10T20:28:14.922Z · LW · GW

I greatly appreciate the effort in this reply, but I think it's increasingly unclear to me how to make efficient progress on our disagreements, so I'm going to hop.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2024-03-10T19:10:00.955Z · LW · GW

If you say “Indeed it's provable that you can't have a faster algorithm than those O(n^3) and O(n^4) approximations which cover all relevant edge cases accurately” I am quite likely to go on a digression where I try to figure out what proof you're pointing at and why you think it's a fundamental barrier, and it seems now that per a couple of your comments you don't believe it's a fundamental barrier, but at the same time it doesn't feel like any position has been moved, so I'm left rather foggy about where progress has been made.

I think it's very useful that you say

I'm not saying that AI can't develop useful heuristic approximations for the simulation of gemstone-based nano-mechanical machinery operating in ultra-high vacuum. I'm saying that it can't do so as a one-shot inference without any new experimental work

since this seems like a narrower place to scope our conversation. I read this to mean:

  1. You don't know of any in principle barrier to solving this problem,
  2. You believe the solution is underconstrained by available evidence.

I find the second point hard to believe, and don't really see anywhere you have evidenced it.

As a maybe-relevant aside to that, wrt.

You're saying that AI could take the garbage and by mere application of thought turn it into something useful. That's not in line with the actual history of the development of useful AI outputs.

I think you're talking of ‘mere application of thought’ like it's not the distinguishing feature humanity has. I don't care what's ‘in line with the actual history’ of AI, I care what a literal superintelligence could do, and this includes a bunch of possibilities like:

  • Making inhumanly close observation of all existing data,
  • Noticing new, inhumanly-complex regularities in said data,
  • Proving new simplifying regularities from theory,
  • Inventing new algorithms for heuristic simulation,
  • Finding restricted domains where easier regularities hold,
  • Bifurcating problem space and operating over each plausible set,
  • Sending an interesting email to a research lab to get choice high-ROI data.

We can ignore the last one for this conversation. I still don't understand why the others are deemed unreasonable ways of making progress on this task.

I appreciated the comments on time complexity but am skipping it because I don't expect at this point that it lies at the crux.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2024-03-10T02:36:30.895Z · LW · GW

Thanks, I appreciate the attempt to clarify. I do though think there's some fundamental disagreement about what we're arguing over here that's making it less productive than it could be. For example,

The fact that this has been an extremely active area of research for over 80 years with massive real-world implications, and we're no closer to finding such a simplified heuristic.

I think both:

  1. Lack of human progress doesn't necessarily mean the problem is intrinsically unsolvable by advanced AI. Humans often take a bunch of time before proving things.
  2. It seems not at all the case that algorithmic progress isn't happening, so it's hardly a given that we're no closer to a solution unless you first circularly assume that there's no solution to arrive at.

If you're starting out with an argument that we're not there yet, this makes me think more that there's some fundamental disagreement about how we should reason about ASI, more than your belief being backed by a justification that would be convincing to me had only I succeeded at eliciting it. Claiming that a thing is hard is at most a reason not to rule out that it's impossible. It's not a reason on its own to believe that it is impossible.

With regard to complexity,

  • I failed to understand the specific difference with protein folding. Protein folding is NP-hard, which is significantly harder than O(n³).
  • I failed to find the source for the claim that O(n³) or O(n⁴) are optimal. Actually I'm pretty confused how this is even a likely concept; surely if the O(n³) algorithm is widely useful then the O(n⁴) proof can't be that strong of a bound on practical usefulness? So why is this not true of the O(n³) proof as well?

It's maybe true that protein folding is easier to computationally verify solutions to, but first, can you prove this, and second, on what basis are you claiming that existing knowledge is necessarily insufficient to develop better heuristics than the ones we already have? The claim doesn't seem to complete to me.

It's magical thinking to assume that an AI will just one-shot this into existence.

Please note that I've not been making the claim that ASI could necessarily solve this problem. I have been making the claim that the arguments in this post don't usefully support the claim that it can't. It is true that largely on priors I expect it should be able to, but my priors also aren't particularly useful ones to this debate and I have tried to avoid making comments that are dependent on them.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2024-03-05T17:41:55.801Z · LW · GW

And what reason do you have for thinking it can't be usefully approximated in some sufficiently productive domain, that wouldn't also invalidly apply to protein folding? I think it's not useful to just restate that there exist reasons you know of, I'm aiming to actually elicit those arguments here.

Comment by Veedrac on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T02:30:51.901Z · LW · GW

Given Claude is not particularly censored in this regard (in the sense of refusing to discuss the subject), I expect the jailbreak here to only serve as priming.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2024-03-05T02:23:06.230Z · LW · GW

Well yes, nobody thinks that existing techniques suffice to build de-novo self-replicating nano machines, but that means it's not very informative to comment on the fallibility of this or that package or the time complexity of some currently known best approach without grounding in the necessity of that approach.

One has to argue instead based on the fundamental underlying shape of the problem, and saying accurate simulation is O(n⁷) is not particularly more informative to that than saying accurate protein folding is NP. I think if the claim is that you can't make directionally informative predictions via simulation for things meaningfully larger than helium then one is taking the argument beyond where it can be validly applied. If the claim is not that, it would be good to hear it clearly stated.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2024-03-04T10:20:33.894Z · LW · GW

Could you quote or else clearly reference a specific argument from the post you found convincing on that topic?

Comment by Veedrac on Supposing the 1bit LLM paper pans out · 2024-03-03T03:42:43.382Z · LW · GW

Communication overhead won't drop faster than linear.

Comment by Veedrac on Optimality is the tiger, and agents are its teeth · 2024-02-16T08:52:50.687Z · LW · GW

Which is equivalent to saying if you only care about a situation where none of your observations correlate with any of your other observations and none of your actions interact with any of your observations then your observations are valueless. Which is a true but empty statement, and doesn't meaningfully affect whether there is an optimality axis that it's possible to be better on.

Comment by Veedrac on Optimality is the tiger, and agents are its teeth · 2024-01-13T07:26:58.552Z · LW · GW

There is a useful generality axis and a useful optimality axis and you can meaningfully progress along both at the same time. If you think no free lunch theorems disprove this then you are confused about no free lunch theorems.

Comment by Veedrac on The Offense-Defense Balance Rarely Changes · 2023-12-10T21:56:19.438Z · LW · GW

Cybersecurity — this is balanced by being a market question. You invest resources until your wall makes attacks uneconomical, or otherwise economically survivable. This is balanced over time because the defense is ‘people spend time to think about how to write code that isn't wrong.’ In a world where cyber attacks were orders of magnitude more powerful, people would just spend more time making their code and computing infrastructure less wrong. This has happened in the past.

Deaths in conflicts — this is balanced by being a market question. People will only spend so much to protect their regime. This varies, as you see in the graph, by a good few orders of magnitude, but there's little historic reason to think that there should be a clean linear trend over time across the distant past, or that how much people value their regime is proportionate somehow to the absolute military technology of the opposing one.

Genome sequencing biorisk — 8ish years is not a long time to claim nobody is going to use it for highly damaging terror attacks? Most of this graph is irrelevant to that; unaffordable by 100x and unaffordable by 1000x still just equally resolve to it not happening. 8ish years at an affordable price is maybe at best enough for terrorists with limited resources to start catching on that they might want to pay attention.

Comment by Veedrac on Evaluating the historical value misspecification argument · 2023-10-06T00:48:29.948Z · LW · GW

This is a bad analogy. Phoning a human fails dominantly because humans are less smart than the ASI they would be trying to wrangle. Contra, Yudkowsky has even said that were you to bootstrap human intelligence directly, there is a nontrivial shot that the result is good. This difference is load bearing!

This does get to the heart of the disagreement, which I'm going to try to badly tap out on my phone.

The old, MIRI-style framing was essentially: we are going to build an AGI out of parts that are not intrinsically grounded in human values, but rather good abstract reasoning, during execution of which human values will be accurately deduced, and as this is after the point of construction, we hit the challenge of formally specifying what properties we want to preserve without being able to point to those runtime properties at specification.

The newer, contrasting framing is essentially: we are going to bulld an AGI out of parts that already have strong intrinsic, conceptual-level understanding of the values we want them to preserve, and being able to directly point at those values is actually needle-moving towards getting a good outcome. This is hard to do right now, with poor interpretability and steerability of these systems, but is nonetheless a relevant component of a potential solution.

Comment by Veedrac on Fifty Flips · 2023-10-02T06:20:07.940Z · LW · GW

I liked this! The game was plenty interesting and reasonably introduced. It's a fun twist on induction games with the addition of reasoning over uncertainty rather than exactly guessing a rule, though it does have the downside the relatively small number of samples can make the payoff dominated by randomness.

To offer one small piece of constructive advice on the execution, I did wish the flip history autoscrolled to the newest entry.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-01T18:55:16.127Z · LW · GW

I think I implicitly answered you elsewhere, though I'll add a more literal response to your question here.

On a personal level, none of this is relevant to AI risk. Yudkowsky's interest in it seems like more of a byproduct of his reading choices when he was young and impressionable than anything else, which is not reading I shared. Neither he nor I think this is necessary for xrisk scenarios, with me probably being on the more skeptical side, and me believing more in practical impediments that strongly encourage doing the simple things that work, eg. conventional biotech.

Due to this not being a crux and not having the same personal draw towards discussing it, I basically don't think about this when I think about modelling AI risk scenarios. I think about it when it comes up because it's technically interesting. If someone is reasoning about this because they do think it's a crux for their AI risk scenarios, and they came to me for advice, I'd suggest testing that crux before I suggested being more clever about de novo nanotech arguments.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-01T18:24:41.599Z · LW · GW

Rather than focusing on where I disagree with this, I want to emphasize the part where I said that I liked a lot of the discussion if I frame it in my head differently. I think if you opened the Introduction section with the second paragraph of this reply (“In my post I have explained”), rather than first quoting Yudkowsky, you'd set the right expectations going into it. The points you raise are genuinely interesting, and tons of people have worldviews that this would be much more convincing to than Yudkowsky's.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-01T18:09:08.702Z · LW · GW

apart from pointing out the actual physical difficulties in doing the thing

This excludes most of the potential good arguments! If you can show that large areas of the solution space seem physically unrealizable, that's an argument that potentially generalizes to ASI. For example, I think people can suggest good limits on how ASI could and couldn't traverse the galaxy, and trivially rule out threats like ‘the AI crashes the moon into Earth’, because of physical argument.

To hypothesize an argument of this sort that might be persuasive, at least to people able to verify such claims: ‘Synthesis of these chemicals is not energetically feasible at these scales because these bonds take $X energy to form, but it's only feasible to store $Y energy in available bonds. This limits you to a very narrow set of reactions which seems unable to produce the desired state. Thus larger devices are required, absent construction under an external power source.’ I think a similar argument could plausibly exist around object stickiness, though I don't have the chemistry knowledge to give a good framing for how that might look.

There aren't as many arguments once we exclude physical arguments. If you wanted to argue that it was plausibly physically realizable but that strong ASI wouldn't figure it out, I suppose some in-principle argument that it involves solving a computationally intractable challenge in leu of experiment might work, though that seems hard to argue in reality.

It's generally hard to use weaker claims to limit far ASI, because, being by definition qualitatively and quantitatively smarter than us, it can reason about things in ways that we can't. I'm happy to think there might exist important, practically-solvable-in-principle tasks that an ASI fails to solve, but it seems implausible for me to know ahead of time which tasks those are.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-01T01:41:52.892Z · LW · GW

I was claiming that titotal's post doesn't appear to give arguments that directly address whether or not Yudkowsky-style ASI can invent diamondoid nanotech. I don't understand the relevance to my comment. I agree that if you find titotal's argument persuasive then whether it is load bearing is relevant to AI risk concerns, but that's not what my comment is about.

FWIW Yudkowsky frequently says that this is not load bearing, and that much seems obviously true to me also.

Comment by Veedrac on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-30T21:36:53.929Z · LW · GW

I'm not sure how to put this, but while this post is framed as a response to AI risk concerns, those concerns are almost entirely ignored in favor of looking at how plausible it is for near-term human research to achieve it, and only at the end is it connected back to AI risk via a brief aside whose crux is basically that you don't think Yudkowsky-style ASI will exist.

I like a lot of the discussion if I frame it in my head to be about what it is actually arguing for. Taking it as given, it seems instead broadly non-sequiter, as the evidence given basically doesn't relate to resolving the disagreement.

Comment by Veedrac on Open Thread - July 2023 · 2023-08-06T04:38:59.537Z · LW · GW

Has there been any serious study of whether mirror life—life with opposite chemical chirality—poses an existential risk?

Comment by Veedrac on Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo · 2023-07-17T12:54:04.945Z · LW · GW

Note that we have built-in spoiler tags.

lorem ipsum dolor sit amet

Comment by Veedrac on Automatic Rate Limiting on LessWrong · 2023-06-24T16:44:50.189Z · LW · GW

Overall this seems like a great general method. I have some critical comments below on the exact implementation, but I want to emphasize upfront that I expect minimal problems in reality, given that, as you mention, LessWrong users are generally both gentle and accurate with downvotes. It is probably not worth making the system more complicated until you see problems in practice.

I don't think I like the -1/-5/-15 recent karma distinction. Consider someone who has reasonable positive reception to their comments, albeit on lower popularity posts, say an average of +2/comment. They then post a negatively received post on a popular topical thread; eg. say they take an unpopular side in a breaking controversy, or even just say something outright rude.

If they make two such comments, they would get -1 net karma in the last 20 posts at -18*2/2-1=-19 karma average of the two posts. They would get -5 at -21 karma average. This distinction seems pretty arbitrary, and the cool-off scale doesn't seem to map to anything.

One solution might be to smooth these numbers out to emphasize modal behavior, like taking the sum of roots instead (and shifting totals accordingly).

Similarly, let's say a user makes most of their posts at a rate of 2/day, again well received on average. They then have a deep conversation in a single thread over the course of a few days, pushing out most of the last 20 posts. Does it make sense to limit their other comments if this one conversation was poorly reviewed?

One possibility I see is for time aggregation to come into play a bit earlier.

Comment by Veedrac on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-10T18:49:53.957Z · LW · GW

This doesn't feel like a constructive way to engage with the zeitgeist here. Obviously Yudkowsky plus most people here disagree with you on this. As such, if you want to engage productively on this point, you should find a place better set up to discuss whether NNs uninformatively dead-end. Two such places are the open thread or a new post where you lay out your basic argument.

Comment by Veedrac on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-01T23:00:25.959Z · LW · GW

I think this is a pretty good and fair roundup, but I want to add as very lazy bit of personal context short of actually explaining my takes:

Both when I read the FOOM debate, and skimming over it again now, in my personal opinion Yudkowsky largely comes off better. Yudkowsky makes a few major mistakes that are clearly visible now, like being dismissive of dumb, scaled, connectionist architectures, but the arguments seem otherwise repairable. Contra, I do not know how to well defend Hanson's position.

I don't state this to claim a winner, and for sure there are people who read the arguments the other way, but only to suggest to the reader, if you have the time, consider taking a look and forming your own opinion.

Comment by Veedrac on Reacts now enabled on 100% of posts, though still just experimenting · 2023-05-29T00:40:05.712Z · LW · GW

I think signalling to someone that they've missed my intended point is most valuable if they are the kind of person to take it constructively, and if they are, I have no wish to be pointing fingers any more accusatorially than the minimum amount to bring that into focus.

I think a neutral reaction is still a plenty adequate signal in the case you mention and I value that it might do less social harm, whereas a harsher reaction is at least for me less universal as I will be disinclined to use it in prosocial interactions.

I'd be hesitant to label as "Critical" pointing out that someone has an invalid argument, and having it implicitly contrasted against "Positive" — it implies they're opposites or antithetical in some way, y'know?

Yes but the choice of word used to describe the category is not a crux. As I was imagining the interface in my head, the sections were not titled at all, and if they were titled I don't think I'd care about the choice.

Comment by Veedrac on Reacts now enabled on 100% of posts, though still just experimenting · 2023-05-28T22:42:12.645Z · LW · GW

Most of my comments on tone were meant to suggest better phrasings or, in the case of ‘Not what I meant’, iconography, not to suggest they were not valuable.

The specific issue with ‘Not what I meant’ is that the icon reads as ‘you missed’ and not ‘we missed’. Communication is a two-way street and the default react should be at least neutral and non-accusatory.

The section titles were meant very broadly and eg. you'd probably want to put both ‘Locally Valid’ and ‘Locally Invalid’ in that section next to each other even though the former is also Positive. To do some word association if it helps,

  • Positive — encouragement, endorsement, good vibes, congratulations
  • Critical — technical commentary, analysis, moderation, validity
  • Informational — personal actions, takes, directions, neutral

but also to be clear, I'm not wedded to this, and there are no doubt many ways to split it.

I clicked both "Disagree" and "Agree" on yours for partial agreement / mixed agreement, but that seems kind of unintuitive.

Partial agreement seems like a valuable icon to me!

Comment by Veedrac on Reacts now enabled on 100% of posts, though still just experimenting · 2023-05-28T15:14:13.385Z · LW · GW

Quick feedback,

  • The icons aren't all obviously interpretable to me
    • Not a crux — looks like a +, rec: ‘crux’ plus ‘🚫’
    • Please elaborate — unclear that it's a request, rec: ‘..?’
    • What's your prediction — rec: ‘
    • Examples, please — rec: ‘ex?’
    • Additional questions — rec: ‘??’
    • Obtuse — rec: remove
  • Some reactions seem tonally unpleasant:
    • Not what I meant — idea is good, but icon is accusatory
    • I already addressed this — icon is good, text could be ‘addressed elsewhere’
    • Muddled — maybe replace with ‘Unclear’?
    • Obtuse — maybe replace with ‘Too Long’?
    • Not worth getting into — feels like a meaner version of Not Planning to Respond
    • Note that I do like some critical reactions as-is, like Too Many Assumptions
  • There are too many to remember easily; perhaps
    • remove some partial redundancies, like Shrug + Seems Borderline?
    • add one-word summaries to the icon box, like [🕑 discussed 4]?
    • make it easier to see descriptions on mobile?
  • I think a top level grouping like this could make sense:
    • Positive ­— eg. Thanks, Important, Exciting, Clear
    • Critical — eg. Taboo, Harsh, Non Sequitur
    • Informational — eg. Will/Won't Reply Later, Agree to This, Shrug
  • There should be a Bikeshed emoji, for comments like this one
Comment by Veedrac on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-29T18:39:17.915Z · LW · GW

The only advantage of a CPU/GPU over an ASIC is that the CPU/GPU is programmable after device creation. If you know what calculation you want to perform you use an ASIC and avoid the enormous inefficiency of the CPU/GPU simulating the actual circuit you want to use

This has a kernel of truth but it is misleading. There are plenty of algorithms that don't naturally map to circuits, because a step of an algorithm in a circuit costs space, whereas a step of an algorithm in a programmable computer costs only those bits required to encode the task. The inefficiency of dynamic decode can be paid for with large enough algorithms. This is most obvious when considering large tasks on very small machines.

It is true that neither GPUs nor CPUs seem particularly pareto optimal for their broad set of tasks, versus a cleverer clean-sheet design, and it is also true that for any given task you could likely specialize a CPU or GPU design for it somewhat easily for at least marginal benefit, but I also think this is not the default way your comment would be interpreted.

Comment by Veedrac on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-29T04:52:48.678Z · LW · GW

A sanity check of a counterintuitive claim can be that the argument to the claim implies things that seem unjustifiable or false. It cannot be that the conclusion of the claim itself is unjustifiable or false, except inasmuch as you are willing to deny the possibility to be convinced of that claim by argument at all.

(To avoid confusion, this is not in response to the latter portion of your comment about general cognition.)

Comment by Veedrac on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-28T14:57:28.802Z · LW · GW

The section you were looking for is titled ‘Synapses’.

https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know#Synapses

Comment by Veedrac on GPTs are Predictors, not Imitators · 2023-04-10T00:04:55.793Z · LW · GW

They typically are uniform, but I think this feels like not the most useful place to be arguing minutia, unless you have a cruxy point underneath I'm not spotting. “The training process for LLMs can optimize for distributional correctness at the expense of sample plausibility, and are functionally different to processes like GANs in this regard” is a clarification with empirically relevant stakes, but I don't know what the stakes are for this digression.

Comment by Veedrac on GPTs are Predictors, not Imitators · 2023-04-09T23:23:21.090Z · LW · GW

The mathematical counterpoint is that this again only holds for sufficiently low entropy completions, which need not be the case, and if you want to make this argument against computronium suns you run into issues earlier than a reasonably defined problem statement does.

The practical counterpoint is that from the perspective of a simulator graded by simulation success, such an improvement might be marginally selected for, because epsilon is bigger than zero, but from the perspective of the actual predictive training dynamics, a policy with a success rate that low is ruthlessly selected against, and the actual policy of selecting the per-token base rate for the hash dominates, because epsilon is smaller than 1/64.

Comment by Veedrac on GPTs are Predictors, not Imitators · 2023-04-09T22:41:51.234Z · LW · GW

This is by construction: I am choosing a task for which one direction is tractable and the other is not. The existence of such tasks follows from standard cryptographic arguments, the specifics of the limiting case are less relevant.

If you want to extrapolate to models strong enough to beat SHA256, you have already conceded EY's point as this is a superhuman task at least relative to the generators of the training data, but anyway there will still exist similar tasks of equal or slightly longer length for which it will hold again because of basic cryptographic arguments, possibly using a different hashing scheme.

Note that this argument requires the text to have sufficiently high entropy for the hash to not be predictable a priori.

Comment by Veedrac on GPTs are Predictors, not Imitators · 2023-04-09T18:26:31.978Z · LW · GW

It is exactly because of the existence of GPT the predictive model, that sampling from GPT is considered simulation; I don't think there's any real tension in the ontology here.

EY gave a tension, or at least a way in which viewing Simulators as a semantic primitive, versus an approximate consequence of a predictive model, is misleading. I'll try to give it again from another angle.

To give the sort of claim worth objecting to, and I think is an easy trap to get caught on even though I don't think the original Simulators post was confused, here is a quote from that post: “GPT doesn’t seem to care which agent it simulates, nor if the scene ends and the agent is effectively destroyed.” Namely, the idea is that a GPT rollout is a stochastic sample of a text generating source, or possibly a set of them in superposition.

Consider again the task of predicting first a cryptographic hash and then the text which hashes to it, or rather the general class of algorithms for which the forward pass (hashing) is tractable for the network and the backwards pass (breaking the hash) is not, for which predicting cryptographic hashes is a limiting case.

If a model rollout was primarily trying to be a superposition of one or more coherent simulations, there is a computationally tractable approach to this task: internally sample a set of phrases, then compute their hashes, then narrow down the subset of sampled hashes as the hash is sampled from, then output the prior text.

Instead, a GPT model will produce a random semantically-meaningless hash and then sample unrelated text. Even if seeded from the algorithm above, backprop will select away from the superposition and towards the distributional, predictive model. This holds even in the case where the GPT has an entropy source that would allow it to be distributionally perfect when rolled out from the start! Backprop will still say no, your goal is prediction, not simulation. As EY says, this is not a GAN.

Again, I don't think the original Simulators post was necessarily confused about any of this, but I also agree with this post that the terminology is imprecise and the differences can be important.

Comment by Veedrac on How it feels to have your mind hacked by an AI · 2023-03-16T14:03:21.509Z · LW · GW

Ah, well it seems to me that this is mostly people being miscalibrated before GPT-3 hit them over the head about it (and to a lesser extent, even then). You should be roughly likely to update in either direction only in expectation over possible observations. Even if you are immensely calibrated, you should still also a priori expect to have shortening updates around releases and lengthening updates around non-releases, since both worlds have nonzero probability.

But if you'd appreciate a tale of over-expectations, my modal timeline gradually grew for a good while after this conversation with gwern (https://twitter.com/gwern/status/1319302204814217220), where I was thinking people were being slower about this than I expected and meta-updating towards the gwern position.

Alas, recent activity has convinced me my original model was right, it just had too small constant factors for ‘how much longer does stuff take in reality than it feels like it should take?’ Most of my timeline-shortening updates since GPT-3 have been like this: “whelp, I guess my modal models weren't wrong, there goes the tail probability I was hoping for.”

Another story would be my update toward alignment conservatism, mostly by updating on the importance of a few fundamental model properties, combined with some empirical evidence being non-pessimal. Pretraining has the powerful property that the model doesn't have influence over its reward, which avoids a bunch of reward hacking incentives, and I didn't update on that properly until I thought it through, though idk of anyone doing anything clever with the insight yet. Alas this is big on a log scale but small on an absolute one.

Comment by Veedrac on How it feels to have your mind hacked by an AI · 2023-03-16T05:59:23.988Z · LW · GW

I'll know how I want to judge it better after I have more data points. I have a page of questions I plan to ask at some point.

With regards to this update specifically, recall both that I thought you thought it would fail the intersection points question when I offered the bet, and that I specifically asked for a reduced-variance version of the bet. Those should tell you something about my probabilities going into this.

Comment by Veedrac on A mechanistic explanation for SolidGoldMagikarp-like tokens in GPT2 · 2023-02-27T11:49:33.445Z · LW · GW

BPEs are one of the simplest schemes for producing a large, roughly-fairly-weighted-by-frequency set of tokens that compresses arbitrary bytes drawn from a written language training dataset. That's about all you need to explain things in ML, typically.

Subword tokenization, the linguistically-guided pre-LLM approach, has a history but is comparatively complex, and I don't think it compresses as well for a given token budget even on fairly normal-looking text.

Comment by Veedrac on Optimality is the tiger, and agents are its teeth · 2023-01-29T05:37:04.793Z · LW · GW

I don't particularly care that people are running GPT-3 code (except inasmuch as it makes ML more profitable), and don't think it helps if we lose focus on what the actual ground-truth concerns are. I want to encourage analysis that gets at deeper similarities than this. 

GPT-3 code does not pose an existential risk, and members of the public couldn't stop it being an existential risk if it was by not using it to help run shell commands anyway, because, if nothing else, GPT-3, ChatGPT and Codex are all public. Beyond the fact GPT-3 is specifically not risky in this regard, it'd be a shame if people primarily took away ‘don't run code from neural networks’, rather than something more sensible like ‘the more powerful models get, the more relevant their nth-order consequences become’. The model in the story used code output because it's an especially convenient tool lying around, but it didn't have to, because there are lots of ways text can influence the world. Code is just particularly quick, accessible, precise, and predictable.