Posts

Good News, Everyone! 2023-03-25T13:48:22.499Z

Comments

Comment by jbash on o3 · 2024-12-21T16:25:10.519Z · LW · GW

Not to say it's a nothingburger, of course. But I'm not feeling the AGI here.

These math and coding benchmarks are so narrow that I'm not sure how anybody could treat them as saying anything about "AGI". LLMs haven't even tried to be actually general.

How close is "the model" to passing the Woz test (go into a strange house, locate the kitchen, and make a cup of coffee, implicitly without damaging or disrupting things)? If you don't think the kinesthetic parts of robotics count as part of "intelligence" (and why not?), then could it interactively direct a dumb but dextrous robot to do that?

Can it design a nontrivial, useful physical mechanism that does a novel task effectively and can be built efficiently? Produce usable, physically accurate drawings of it? Actually make it, or at least provide a good enough design that it can have it made? Diagnose problems with it? Improve the design based on observing how the actual device works?

Can it look at somebody else's mechanical design and form a reasonably reliable opinion about whether it'll work?

Even in the coding domain, can it build and deploy an entire software stack offering a meaningful service on a real server without assistance?

Can it start an actual business and run it profitably over the long term? Or at least take a good shot at it? Or do anything else that involves integrating multiple domains of competence to flexibly pursue possibly-somewhat-fuzzily-defined goals over a long time in an imperfectly known and changing environment?

Can it learn from experience and mistakes in actual use, without the hobbling training-versus-inference distinction? How quickly and flexibly can it do that?

When it schemes, are its schemes realistically feasible? Can it tell when it's being conned, and how? Can it recognize an obvious setup like "copy this file to another directory to escape containment"?

Can it successfully persuade people to do specific, relatively complicated things (as opposed to making transparently unworkable hypothetical plans to persuade them)?

Comment by jbash on TheManxLoiner's Shortform · 2024-12-20T18:07:27.852Z · LW · GW

There is an option for readers to hide names. It's in the account preferences. The names don't show up unless you roll over them. I use it, to supplement my long-cultivated habit of always trying to read the content before the author name on every site[1].

As for anonymous posts, I don't agree with your blanket dismissal. I've seen them work against groupthink on some forums (while often at the same time increasing the number of low-value posts you have to wade through). Admittedly Less Wrong doesn't seem to have too much of a groupthink problem[2]. Anyway, there could always be an option for readers to hide anonymous posts.


  1. Actually I'm not sure I had to cultivate it. Back in the days of Usenet, I had to learn to actually ever look at posters' names to begin with. I do not think that I am normal in this. ↩︎

  2. ... which actually surprises me because at least some people do seem to buy into the "karma" gamification. ↩︎

Comment by jbash on Filled Cupcakes · 2024-11-26T13:41:16.715Z · LW · GW

Stretching your mouth wide is part of the fun!

Comment by jbash on Decorated pedestrian tunnels · 2024-11-25T16:00:18.989Z · LW · GW

If you're going to do something that huge, why not put the cars underground? I suppose it would be more expensive, but adding any extensive tunnel system at all to an existing built-up area seems likely to be prohibitively expensive, tremendously disruptive, and, at least until the other two are fixed, politically impossible. So why not go for the more attractive impossibility?

Comment by jbash on AI #91: Deep Thinking · 2024-11-21T16:25:44.312Z · LW · GW

Why so small? If you’re going to offer wall mounts and charge $1000, why not a TV-sized device that is also actually a television, or at least a full computer monitor? What makes this not want to simply be a Macintosh? I don’t fully ‘get it.’

You don't necessarily have a TV-sized area of wall available to mount your thermostat control, near where you most often find yourself wanting to change your thermostat setting. Nor do you necessarily want giant obtrusive screens all over the place.

And you don't often want to have to navigate a huge tree of menus on a general-purpose computer to adjust the music that's playing.

Comment by jbash on AI #91: Deep Thinking · 2024-11-21T16:21:39.819Z · LW · GW

“Aren’t we going to miss meaning?”

 

I've yet to hear anybody who brings this up explain, comprehensibly, what this "meaning" they're worried about actually is. Honestly I'm about 95 percent convinced that nobody using the word actually has any real idea what it means to them, and more like 99 percent sure that no two of them agree.

Comment by jbash on OpenAI Email Archives (from Musk v. Altman and OpenAI blog) · 2024-11-16T17:11:29.575Z · LW · GW

I seem to have gotten a "Why?" on this.

The reason is that checking things yourself is a really, really basic, essential standard of discourse[1]. Errors propagate, and the only way to avoid them propagating is not to propagate them.

If this was created using some standard LLM UI, it would have come with some boilerplate "don't use this without checking it" warning[2]. But it was used without checking it... with another "don't use without checking" warning. By whatever logic allows that, the next person should be able to use the material, including quoting or summarizing it, without checking either, so long as they include their own warning. The warnings should be able to keep propagating forever.

... but the real consequences of that are a game of telephone:

  1. An error can get propagated until somebody forgets the warning, or just plain doesn't feel like including the warning, and then you have false claims of fact circulating with no warning at all. Or the warning deteriorates into "sources claim that", or "there are rumors that", or something equally vague that can't be checked.
  2. Even if the warning doesn't get lost or removed, tracing back to sources gets harder with each step in the chain.
  3. Many readers will end up remembering whatever they took out of the material, including that it came from a "careful" source (because, hey, they were careful to remind you to check up on them)... but forget that they were told it hadn't been checked, or underestimate the importance of that.
  4. If multiple people propagate an error, people start seeing it in more than one "independent" source, which really makes them start to think it must be true. It can become "common knowledge", at least in some circles, and those circles can be surprisingly large.

That pollution of common knowledge is the big problem.

The pollution tends to be even worse because whatever factoid or quote will often get "simplified", or "summarized", or stripped of context, or "punched up" at each step. That mutation is itself exacerbated by people not checking references, because if you check references at least you'll often end up mutating the version from a step or two back, instead of building even higher on top of the latest round of errors.

All of this is especially likely to happen when "personalities" or politics are involved. And even more likely to happen when people feel a sense of urgency about "getting this out there as soon as possible". Everybody in the chain is going to feel that same sense of urgency.

I have seen situations like that created very intentionally in certain political debates (on multiple different topics, all unrelated to anything Less Wrong generally cares about). You get deep chains of references that don't quite support what they're claimed to support, spawning "widely known facts" that eventually, if you do the work, turn out to be exaggerations of admitted wild guesses from people who really didn't have any information at all. People will even intentionally add links to the chain to give others plausible deniability. I don't think there's anything intentional here, but there's a reason that some people do it intentionally. It works. And you can get away with it if the local culture isn't demanding rigorous care and checking up at every step.

You can also see this sort of thing as an attempt to claim social prestige for a minimal contribution. After all, it would have been possible to just post the link, or post the link and suggest that everybody get their AI to summarize it. But the main issue is that spreading unverified rumors causes widespread epistemic harm.


  1. The standard for the reader should still be "don't be sure the references support this unless you check them", which actually means that when the reader becomes a writer, that reader/writer should actually not only have checked their own references, but also checked the references of their references, before publishing anything. ↩︎

  2. Perhaps excusable since nobody actually knows how to make the LLM get it right reliably. ↩︎

Comment by jbash on OpenAI Email Archives (from Musk v. Altman and OpenAI blog) · 2024-11-16T14:24:23.865Z · LW · GW

I used AI assistance to generate this, which might have introduced errors.

Resulting in a strong downvote and, honestly, outright anger on my part.

Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1]

If other people have to check it before they quote it, why is it OK for you not to check it before you post it?

Comment by jbash on Proposing the Conditional AI Safety Treaty (linkpost TIME) · 2024-11-15T14:42:27.208Z · LW · GW

Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.

So race to the brink and hope you can actually stop when you get there?

Once the most powerful nations have signed this treaty, it is in their interest to verify each others’ compliance, and to make sure uncontrollable AI is not built elsewhere, either.

How, exactly?

Comment by jbash on Heresies in the Shadow of the Sequences · 2024-11-14T20:15:36.837Z · LW · GW

Non-causal decision theories are not necessary for A.G.I. design.

I'll call that and raise you "No decision theory of any kind, causal or otherwise, will either play any important explicit role in, or have any important architectural effect over, the actual design of either the first AGI(s), or any subsequent AGI(s) that aren't specifically intended to make the point that it's possible to use decision theory".

Comment by jbash on Buck's Shortform · 2024-11-12T14:14:17.153Z · LW · GW

Computer security, to prevent powerful third parties from stealing model weights and using them in bad ways.

By far the most important risk isn't that they'll steal them. It's that they will be fully authorized to misuse them. No security measure can prevent that.

Comment by jbash on The Evals Gap · 2024-11-11T17:12:33.037Z · LW · GW

Development and interpretation of evals is complicated

Proper elicitation is an unsolved research question

... and yet...

Closing the evals gap is possible

Why are you sure that effective "evals" can exist even in principle?

I think I'm seeing a "we really want this, therefore it must be possible" shift here.

Comment by jbash on evhub's Shortform · 2024-11-10T14:36:51.519Z · LW · GW

I don't have much trouble with you working with the US military. I'm more worried about the ties to Peter Thiel.

Comment by jbash on GPT-4o Can In Some Cases Solve Moderately Complicated Captchas · 2024-11-09T15:45:32.513Z · LW · GW

CAPTCHAs have "adversarial perturbations"? Is that in the sense of "things not visible to humans, but specifically adversarial to deep learning networks"? I thought they just had a bunch of random noise and weird ad hoc patterns thrown over them.

Anyway, CAPTCHAs can't die soon enough. Although the fact that they persist in the face of multiple commercial services offering to solve 1000 for a dollar doesn't give me much hope...

Comment by jbash on Force Sequential Output with SCP? · 2024-11-09T15:32:00.558Z · LW · GW

Using scp to stdout looks weird to me no matter what. Why not

ssh -n host cat /path/to/file | weird-aws-stuff

... but do you really want to copy everything twice? Why not run weird-aws-stuff on the remote host itself?

Comment by jbash on Should CA, TX, OK, and LA merge into a giant swing state, just for elections? · 2024-11-07T15:04:59.486Z · LW · GW

To prevent this, there must be a provision that once signed by all 4 states, the compact can't be repealed by any state until after the next election.

It's not obvious that state legislatures have the authority, under their own constitutions, to bind themselves that way. Especially not across their own election cycles.

Comment by jbash on Update on the Mysterious Trump Buyers on Polymarket · 2024-11-06T01:33:16.793Z · LW · GW

Thirty million dollars is a lot of money, but there are plenty of smart rich people who don't mind taking risks. So, once the identity and (apparent) motives of the Trump whale were revealed, why didn't a handful of them mop up the free EV?

Well, first I think you're right to say "a handful". My (limited but nonzero) experience of "sufficiently rich" people who made their money in "normal" ways, as opposed to by speculating on crypto or whatever, is that they're too busy to invest a lot of time in playing this kind of market personally, especially if they have to pay enough attention to play it intelligently. They're not very likely to employ anybody else to play for them either. Many or most of them will see the whole thing as basically an arcane, maybe somewhat disreputable game. So the available pool is likely smaller than you might think.

That conjecture is at least to some degree supported by the fact that nobody, or not enough people, stepped in when the whole thing started. Nothing prevented the market from moving so far to begin with. It may not have been as certain what was going on then, but things looked weird enough that you'd expect a fair number of people to decide that crazy money was likely at work, and step in to try to take some of it... if enough such people were actually available.

In any case, whether when the whole thing started, after public understanding was reasonably complete, or anywhere along the way, the way I think you'd like to make your profit on the market being miscalibrated would be to buy in, wait for the correction, and then sell out... before the question resolved and before unrelated new information came in to move the price in some other way.

But it would be hard to do that. All this is happening potentially very close to resolution time, or at least to functional resolution time. The market is obviously thin enough that single traders can move it, new information is coming in all the time, the already-priced-in old information isn't very strong and therefore can't be expected to "hold" the price very solidly, you have to worry about who may be competing with you to take the same value, and you may be questioning how rational traders in general are[1].

So you can't be sure you'll get your correction in time to sell out; you have a really good chance of being stuck holding your position through resolution. If "markets can remain irrational longer than you can remain solvent", then they can also stay irrational for long enough that trading becomes moot.

If you have to hold through resolution, then you do still believe you have positive expected value, but it's really uncertain expected value. After all, you believe the underlying question is 50-50, even if one of those 50s would pay you more than the other would lose you. And you have at best limited chance to hedge. So you have to have risk tolerance high enough that, for most people, it'd be in the "recreational gambling" range rather than the "uncertain investment" range. The amount of money that any given (sane) person wants to put at risk definitely goes down under that much uncertainty, and probably goes down a lot. So you start to need more than a "handful" of people.

Also, don't forget the point somebody made the other day about taxes. Unless you're a regular who plays many different questions in such volume that you expect to offset your winnings with losses, you're going to stand to "win" a double-digit percentage less than you stand to "lose", whether on selling off your position or on collecting after resolution. Correcting 60-40 to 50-50 may just plain not be profitable even if you collect.
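
To make that concrete, here's a toy sketch (Python, with a made-up tax rate, and assuming winnings are taxed while a casual bettor can't deduct losses) of how that asymmetry can eat the entire edge:

    # Toy numbers: buy a share at 40 cents that you think is really worth 50 cents.
    price = 0.40      # market price per $1 share
    p_true = 0.50     # your estimate of the true probability
    tax_rate = 0.35   # hypothetical marginal rate on gambling winnings

    # Pre-tax: win (1 - price) with probability p_true, lose price otherwise.
    pre_tax_ev = p_true * (1 - price) - (1 - p_true) * price

    # After tax: winnings are taxed, losses aren't deductible.
    after_tax_ev = p_true * (1 - price) * (1 - tax_rate) - (1 - p_true) * price

    print(f"pre-tax EV per share:   {pre_tax_ev:+.3f}")    # +0.100
    print(f"after-tax EV per share: {after_tax_ev:+.3f}")  # -0.005

On those (invented) numbers, a 10-cent pre-tax edge turns slightly negative after tax.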

There are probably other sources of friction, too.


  1. I'd bet at least a recreational amount of money that players in betting markets are sharply more pro-Trump politically than, say, the general voting population, and that would be expected to skew their judgement, and therefore the market, unless almost all of them were superhuman or nearly so. And when you're seeing the market so easily moved away from expert opinion... ↩︎

Comment by jbash on Update on the Mysterious Trump Buyers on Polymarket · 2024-11-05T18:02:33.320Z · LW · GW

Can't this only be judged in retrospect, and over a decent sample size?

The model that makes you hope for accuracy from the market is that it aggregates the information, including non-public information, available to a large number of people who are doing their best to maximize profits in a reasonable VNM-ish rational way.

In this case, everybody seems pretty sure that the price is where it is because of the actions of a single person who's dumped in a very large amount of money relative to the float. It seems likely that that person has done this despite having no access to any important non-public information about the actual election. For one thing, they've said that they're dumping all of their liquidity into bets on Trump. Not just all the money they already have allocated to semi-recreational betting, or even all the money they have allocated to speculative long shots in general, but their entire personal liquidity. That suggests a degree of certainty that almost no plausible non-public information could actually justify.

Not only that, but apparently they've done it in a way calculated to maximally move the price, which is the opposite of what you'd expect a profit maximizer to want to do given their ongoing buying and their (I think) stated and (definitely at this point) evidenced intention to hold until the market resolves.

If the model that makes you expect accuracy to begin with is known to be violated, it seems reasonable to assume that the market is out of whack.

Sure, it's possible that the market just happens to be giving an accurate probability for some reason unrelated to how it's "supposed" to work, but that sort of speculation would take a lot of evidence to establish confidently.

I'm assuming that by "every other prediction source" you mean everything other than prediction/betting markets

Well, yes. I would expect that if you successfully mess up Polymarket, you have actually messed up "The Betting Market" as a whole. If there's a large spread between any two specific operators, that really is free money for somebody, especially if that person is already set up to deal on both.

Comment by jbash on Update on the Mysterious Trump Buyers on Polymarket · 2024-11-04T22:45:43.787Z · LW · GW

Another way to look at that is that during a relatively long stretch of time when people most needed an advance prediction, the market was out of whack, it got even more out of whack as the event approached, and a couple of days before the final resolution, it partially corrected. The headline question is sitting at 59.7 to 40.5 as I write this, and that's still way out of line with every other prediction source.

Comment by jbash on Update on the Mysterious Trump Buyers on Polymarket · 2024-11-04T20:51:05.456Z · LW · GW

... and the significance of the bets has been to show that prediction markets, at least as presently constituted, aren't very useful for actually predicting events of any real significance. It's too easy for whackos to move the market to completely insane prices that don't reflect any realistic probability assessment. Being rich is definitely not incompatible with being irrational, and being inclined to gamble is probably negatively correlated with being well calibrated.

Comment by jbash on Prediction markets and Taxes · 2024-11-01T23:21:38.828Z · LW · GW

Yeah, I got that, but it's a very counterintuitive way to describe probability, especially the negative thing.

Comment by jbash on Prediction markets and Taxes · 2024-11-01T20:48:35.647Z · LW · GW

I'll be using American odds.

Where do all these bizarre notations come from?

That one seems particularly off the wall.
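
For anyone who hasn't run into it, here's a minimal sketch (in Python) of what the convention means, which is part of why it reads so strangely: positive and negative numbers need different formulas just to recover an implied probability.

    def american_to_prob(odds: int) -> float:
        """Implied probability from American ("moneyline") odds, ignoring any vig."""
        if odds > 0:   # e.g. +150: win $150 on a $100 stake
            return 100 / (odds + 100)
        else:          # e.g. -150: stake $150 to win $100
            return -odds / (-odds + 100)

    for o in (150, 100, -150):
        print(f"{o:+d} -> {american_to_prob(o):.3f}")
    # +150 -> 0.400, +100 -> 0.500, -150 -> 0.600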

Comment by jbash on Dentistry, Oral Surgeons, and the Inefficiency of Small Markets · 2024-11-01T18:00:59.428Z · LW · GW

Any time I am faced with this kind of shocking inefficiency, I ask myself a simple question: why was no one doing this before?

Well, as I understand it, the general belief is that...

  1. The "scaled up" practices are relatively unpleasant to work in, and make people (who went through a lot of education expecting to get "prestige" jobs, mind you...) feel deprived of agency, deprived of choices about the when-where-and-how of their work, and just generally devalued.

  2. The "non business savvy" people who actually generate the value believe, probably entirely correctly, that somewhere between most and actually-more-than-all of the increased income from that kind of scale-up will end up going to MBAs (or to the one or two theoretically-practitioners who actually own of a "medium-sized" practice), and not to them[1].

  3. Healthcare facilities operated by private equity are widely believed, both based on industry rumor and based on actual measurement, to reduce quality of care, and people don't like to be forced to do a bad job if they don't have to?

Why would you voluntarily make your daily life actually unpleasant just to increase an already high income that you'll probably have less time to enjoy anyway?


  1. ... and it may not drive prices down for the consumer as much as you might think, either, because many consumers have limited price sensitivity as well as very limited ability to evaluate the quality of care. ↩︎

Comment by jbash on Habryka's Shortform Feed · 2024-10-30T01:48:10.762Z · LW · GW

You may want higher density, but I don't think you can say that I want high density at the expense of legibility.

It takes a lot to make me notice layout, and I rarely notice fonts at all... unless they're too small. I'm not as young as I used to be. This change made me think I must have zoomed the browser two sizes smaller. The size contrast is so massive that I have to actually zoom the page to read comfortably when I get to the comment section. It's noticeably annoying, to the point of breaking concentration.

I've mostly switched to RSS for Less Wrong[1]. I don't see your fonts at all any more, unless I click through on an article. The usual reason I click through is to read the comments (occasionally to check out the quick takes and popular comments that don't show up on RSS). So the comments being inaccessible is doubly bad.

My browser is Firefox on Fedora Linux, and I use a 40 inch 4K monitor (most of whose real estate is wasted by almost every Web site). I usually install most of the available font packages, and it says it's rendering this text in "Gill Sans Nova Medium".


  1. My big reason for going to RSS was to mitigate the content prioritization system. I want to skim every headline, or at least every headline over some minimum threshold of "good". On the other hand, I don't want to have to look at any old headlines twice to see the new ones. I'm really minimally interested in either the software's or the other users' opinions of which material I should want to see. RSS makes it easier to get a simple chronological view; the built-in chronological view is weird and hard to navigate to. I really feel like I'm having to fight the site to see what I want to see. ↩︎

Comment by jbash on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King! · 2024-10-26T23:43:23.495Z · LW · GW

If you read the entire story, you'll find that the Demon King (who, by the way, is called "she" a whole bunch of times throughout most of the story) never has any intention or expectation of taking the fortress. That's probably the only "military secret" that really matters. And she doesn't sell that secret.

Comment by jbash on The Mask Comes Off: At What Price? · 2024-10-23T16:19:47.568Z · LW · GW

... assuming the values you want are learnable and "convergeable" upon. "Alignment" doesn't even necessarily have a coherent meaning.

Actual humans aren't "aligned" with each other, and they may not be consistent enough that you can say they're always "aligned" with themselves. Most humans' values seem to drive them toward vaguely similar behavior in many ways... albeit with lots of very dramatic exceptions. How they articulate their values and "justify" that behavior varies even more widely than the behavior itself. Humans are frequently willing to have wars and commit various atrocities to fight against legitimately human values other than their own. Yet humans have the advantage of starting with a lot of biological commonality.

The idea that there's some shared set of values that a machine can learn that will make everybody even largely happy seems, um, naive. Even the idea that it can learn one person's values, or be engineered to try, seems really optimistic.

Anyway, even if the approach did work, that would just mean that "its own ideas" were that it had to learn about and implement your (or somebody's?) values, and also that its ideas about how to do that are sound. You still have to get that right before the first time it becomes uncontrollable. One chance, no matter how you slice it.

Comment by jbash on The Mask Comes Off: At What Price? · 2024-10-22T15:58:01.989Z · LW · GW

I'm not necessarily going to argue with your characterization of how the "AI safety" field views the world. I've noticed myself that people say "maintaining human control" pretty much interchangeably with "alignment", and use both of those pretty much interchangeably with "safety". And all of the above have their own definition problems.

I think that's one of several reasons that the "AI safety" field has approximately zero chance of avoiding any of the truly catastrophic possible outcomes.

Comment by jbash on The Mask Comes Off: At What Price? · 2024-10-22T02:49:05.878Z · LW · GW

Taxes enforced by whom?

Well, that's where the "safe" part comes in, isn't it?

I think a fair number of people would say that ASI/AGI can't be called "safe" if it's willing to wage war to physically take over the world on behalf of its owners, or to go around breaking laws all the time, or to thwart whatever institutions are supposed to make and enforce the laws. I'm pretty sure that even OpenAI's (present) "safety" department would have an issue if ChatGPT started saying stuff like "Sam Altman is Eternal Tax-Exempt God-King".

Personally, I go further than that. I'm not sure about "basic" AGI, but I'm pretty confident that very powerful ASI, the kind that would be capable of really total world domination, can't be called "safe" if it leaves really decisive power over anything in the hands of humans, individually or collectively, directly or via institutions. To be safe, it has to enforce its own ideas about how things should go. Otherwise the humans it empowers are probably going to send things south irretrievably fairly soon, and if they don't do so very soon they always still could, and you can't call that safe.

Yeah, that means you get exactly one chance to get "its own ideas" right, and no, I don't think that success is likely. I don't think it's technically likely that anybody will be able to "align" it to any particular set of values. I also don't think people or institutions would make good choices about what values to give it even if they could. AND I don't think anybody can prevent it from getting built for very long. I put more hope in it being survivably unsafe (maybe because it just doesn't usually happen to care to do anything to/with humans), or on intelligence just not being that powerful, or whatever. Or even in it just luckily happening to at least do something less boring or annoying than paperclipping the universe or mass torture or whatever.

Comment by jbash on The Mask Comes Off: At What Price? · 2024-10-22T00:59:47.856Z · LW · GW
  1. OpenAI charges headfirst to AGI, and succeeds in building it safely. [...] The world transforms, and OpenAI goes from previously unprofitable due to reinvestment to an immensely profitable company.

You need a case where OpenAI successfully builds safe AGI, which may even go on to build safe ASI, and the world gets transformed... but OpenAI's profit stream is nonexistent, effectively valueless, or captures a much smaller fraction than you'd think of whatever AGI or ASI produces.

Business profits (or businesses) might not be a thing at all in a sufficiently transformed world, and it's definitely not clear that preserving them is part of being safe.

In fact, a radical change in allocative institutions like ownership is probably the best case, because it makes no sense in the long term to allocate a huge share of the world's resources and production to people who happened to own some stock when Things Changed(TM). In a transformed-except-corporate-ownership-stays-the-same world, I don't see any reason such lottery winners' portion wouldn't increase asymptotically toward 100 percent, with nobody else getting anything at all.

Radical change is also a likely case[1]. If an economy gets completely restructured in a really fundamental way, it's strange if the allocation system doesn't also change. That's never happened before.

Even without an overtly revolutionary restructuring, I kind of doubt "OpenAI owns everything" would fly. Maybe corporate ownership would stay exactly the same, but there'd be a 99.999995 percent tax rate.


  1. Contingent on the perhaps unlikely safe and transformative parts coming to pass. ↩︎

Comment by jbash on LLM Psychometrics and Prompt-Induced Psychopathy · 2024-10-18T23:17:29.234Z · LW · GW

Random reactions--

  • It looks like you're really assigning scores to the personae the models present, not to the models themselves.

    The models as opposed to the personae may or may not actually have anything that can reasonably be interpreted as "native" levels of psychopathy. It's kind of hard to tell whether something is, say, prepared to manipulate you, when there's no strong reason to think it particularly cares about having any particular effect on you. But if they do have native levels--

    • It doesn't "feel" to me as though the human-oriented questions on the LSRP are the right sorts of ways to find out. The questions may suit the masks, but not the shoggoth.

    • I feel even less as though "no system prompt" would elicit the native level, rather than some default persona's level.

  • By asking a model to play any role to begin with, you're directly asking it to be deceptive. If you tell it it's a human bicycle mechanic named Sally, in fact it still is an AI system that doesn't have a job other than to complete text or converse or whatever. It's just going along with you and playing the role of Sally.

    When you see the model acting as psychopathic as it "expects" that Sally would be, you're actually demonstrating that the models can easily be prompted to in some sense cheat on psychopathy inventories. Well, effectively cheat, anyway. It's not obvious to me that the models-in-themselves have any "true beliefs" about who or what they are that aren't dependent on context, so the question of whether they're being deceptive may be harder than it looks.

    But they seem to have at least some capacity to "intentionally" "fake" targeted levels of psychopathy.

  • By training a model to take on roles given in system prompts in the first place, its creators are intentionally teaching it to honor requests to be deceptive.

    Just blithely taking on whatever roles you think fit the conversation you're in sounds kind of psychopathic, actually.

  • By "safety training" a model, its creators are causing it to color its answers according to what people want to hear, which I would think would probably make it more, not less, prone to deception and manipulation in general. It could actually inculcate something like psychopathy. And it could easily fail to carry over to actions, rather than words, once you get an agentic system.

    I'm still not convinced the whole approach has any real value even for the LLMs we have now, let alone for whatever (probably architecturally different) systems end up achieving AGI or ASI.

    All that goes double for teaching it to be "likable".

  • Since any given model can be asked to play any role, it might be more interesting to try to figure out which of all the possible roles it might be "willing" to assume would make it maximally deceptive.

Comment by jbash on How much I'm paying for AI productivity software (and the future of AI use) · 2024-10-11T18:15:13.335Z · LW · GW

I could spend a lot more than $1000/month, because cloud services are a non-starter.

It seems to me that if you're going to use something like this to its real potential, it has to be integrated into your habitual ways of doing things. You have to use it all the time. It's too jarring to have to worry about whether it's trustworthy, or to "code switch" because you can't use it for some reason[1].

I can't imagine integrating any of those things into my normal, day to day routine unless the content of what I was doing were, in normal course, exposed only to me. Which in practice means locally hosted. Which would be prohibitively expensive even if it were possible.


  1. This is actually the same reason I rarely customize applications very much. It's too jarring when I get onto another machine and have to use the vanilla version. ↩︎

Comment by jbash on What constitutes an infohazard? · 2024-10-08T23:51:59.960Z · LW · GW

I do not want to put forth an idea that could possibly have a detrimental future consequence-i.e. basilisk.

I would suggest you find somebody who's not susceptible to basilisks, or at least not susceptible to basilisks of your particular kind, and bounce it off of them.

For example, I don't believe there's a significant chance that any AIs operating in our physics ever will run, or even be able to run, any really meaningful number of simulations containing conscious beings with experiences closely resembling the real world. And I think that acausal trade is silly nonsense. And not only do I not want to fill the whole future light cone with the maximum possible number of humans or human analogs, but I actively dislike the idea. I've had a lot of time to think about those issues, and have read many "arguments for". I haven't bought any of it and I don't ever expect to buy any of it.

So I can reasonably be treated as immune to any basilisks that rely on those ideas.

Of course, if your idea is along those lines, I'm also likely to tell you it's silly even though others might not see it that way. But I could probably make at least an informed guess as to what such people might buy into.

Note, by the way, that the famous Roko's basilisk didn't actually cause much of a stir, and the claims that it was a big issue seem to have come from somebody with an axe to grind.

I am afraid to a certain extent that thinking of the theory was already enough and it's too late. Perhaps an AI exists already and it already knows my thoughts in realtime.

To know your thoughts in real time, it would have to be smart enough to (a) correctly guess your thoughts based on limited information, or (b) secretly build and deploy some kind of apparatus that let it actually read your thoughts.

(a) is probably completely impossible, period. Even if it is possible, it definitely requires an essentially godlike level of intelligence. (b) still requires the AI to be very smart. And they both imply a lot of knowledge about how humans think.

I submit that any AI that could do either (a) or (b) would long ago have come up with your idea on its own, and could probably come up with any number of similar ideas any time it wanted to.

It doesn't make sense to worry that you could have leaked anything to some kind of godlike entity just by thinking about it.

Comment by jbash on Nathan Helm-Burger's Shortform · 2024-10-03T21:40:16.380Z · LW · GW

you've likely already lost the reins of the future

"Having the reins of the future" is a non-goal.

Also, you don't "have the reins of the future" now, never have had, and almost certainly never will.

some rogue AIs are building an antimatter-powered laser in the Oort Cloud to boost solar-sail-powered von Neumann probes to nearby solar systems.... you lost.

It's true that I'd probably count anybody building a giant self-replicating swarm to try to tile the entire light cone with any particular thing as a loss. Nonetheless, there are lots of people who seem to want to do that, and I'm not seeing how their doing it is any better than some AI doing it.

Comment by jbash on shminux's Shortform · 2024-09-29T13:18:14.508Z · LW · GW

It's set in about 1995...

Comment by jbash on The Existential Dread of Being a Powerful AI System · 2024-09-27T14:56:14.093Z · LW · GW

Why would you think that an AGI/ASI, even if conscious, would have an emotional makeup or motivational structure in any way similar to that of a human? Why should it care about any of this?

Comment by jbash on Mira Murati leaves OpenAI/ OpenAI to remove non-profit control · 2024-09-26T13:28:21.724Z · LW · GW

Pretty obvious that Altman really, really should have stayed fired, even, maybe especially, if it totally blew up OpenAI.

Comment by jbash on Counting arguments provide no evidence for AI doom · 2024-09-22T01:08:11.804Z · LW · GW

This inspired me to give it the sestina prompt from the Sandman ("a sestina about silence, using the key words dark, ragged, never, screaming, fire, kiss"). It came back with correct sestina form, except for an error in the envoi. The output even seemed like better poetry than I've gotten from LLMs in the past, although that's not saying much and it probably benefited a lot from the fact that the meter in the sestina is basically free.

I had a similar-but-different problem in getting it to fix the envoi, and its last response sounded almost frustrated. It gave an answer that relaxed one of the less agreed-upon constraints, and more or less claimed that it wasn't possible to do better... so sort of like the throwing-up-the-hands that you got. Yet the repair it needed to do was pretty minor compared to what it had already achieved.

It actually felt to me like its problem in doing the repairs was that it was distracting itself. As the dialog went on, the context was getting cluttered up with all of its sycophantic apologies for mistakes and repetitive explanations and "summaries" of the rules and how its attempts did or did not meet them... and I got this kind of intuitive impression that that was interfering with actually solving the problem.

I was sure getting lost in all of its boilerplate, anyway.

https://chatgpt.com/share/66ef6afe-4130-8011-b7dd-89c3bc7c2c03

Comment by jbash on AI #80: Never Have I Ever · 2024-09-11T01:14:14.653Z · LW · GW

Nvidia’s products are rather obviously superior

CUDA seems to be superior to ROCm... and has a big installed-base and third-party tooling advantage. It's not obvious, to me anyway, that NVidia's actual silicon is better at all.

... but NVidia is all about doing anything it can to avoid CUDA programs running on non-NVidia hardware, even if NVidia's own code isn't used anywhere. Furthermore, if NVidia is like all the tech companies I saw during my 40 year corporate career, it's probably also playing all kinds of subtle, hard-to-prove games to sabotage the wide adoption of any good hardware-agnostic APIs.

Comment by jbash on AI #80: Never Have I Ever · 2024-09-11T01:09:01.376Z · LW · GW

We caution against purely technical interpretations of privacy such as “the data never leaves the device.” Meredith Whittaker argues that on-device fraud detection normalizes always-on surveillance and that the infrastructure can be repurposed for more oppressive purposes. That said, technical innovations can definitely help.

I really do not know what you are expecting. On-device calculation using existing data and other data you choose to store only, the current template, is more privacy protecting than existing technologies.

She's expecting, or at least asking, that certain things not be done on or off of the device, and that the distinction between on-device and off-device not be made excessively central to that choice.

If an outsider can access your device, they can always use their own AI to analyze the same data.

The experience that's probably framing her thoughts here is Apple's proposal to search through photos on people's phones, and flag "suspicious" ones. The argument was that the photos would never leave your device... but that doesn't really matter, because the results would have. And even if they had not, any photo that generated a false positive would have become basically unusable, with the phone refusing to do anything with it, or maybe even outright deleting it.

Similarly, a system that tries to detect fraud against you can easily be repurposed to detect fraud by you. To act on that detection, it has to report you to somebody or restrict what you can do. On-device processing of whatever kind can still be used against the interests of the owner of the device.

Suppose that there was a debate around the privacy implications of some on-device scanning that actually acted only in the user's interest, but that involved some privacy concerns. Further suppose that the fact that it was on-device was used as an argument that there wasn't a privacy problem. The general zeitgeist might absorb the idea that "on-device" was the same as "privacy-preserving". "On device good, off device bad".

A later transition from "in your interest" to "against your interest" could easily get obscured in any debate, buried under insistence that "It's on-device".

Yes, some people with real influence really, truly are that dumb, even when they're paying close attention. And the broad sweep of opinion tends to come from people who aren't much paying attention to begin with. It happens all the time in complicated policy arguments.

Comment by jbash on Why Swiss watches and Taylor Swift are AGI-proof · 2024-09-06T20:50:26.001Z · LW · GW

Turns out that this is today's SMBC comic. Which gets extra points for the line "Humans are a group-level psychiatric catastrophe!"

https://www.smbc-comics.com/comic/scarcity

Comment by jbash on Why Swiss watches and Taylor Swift are AGI-proof · 2024-09-06T14:42:19.410Z · LW · GW

If the value of Taylor Swift concerts comes mostly from interactions between the fans, is Swift herself essential to it?

Comment by jbash on Why Swiss watches and Taylor Swift are AGI-proof · 2024-09-06T14:41:18.787Z · LW · GW

this is a fair critique of AIs making everyone losing their jobs.

I have never heard anybody push the claim that there wouldn't be niche prestige jobs that got their whole value from being done by humans, so what's actually being critiqued?

... although there is some question about whether that kind of thing can actually sustain the existence of a meaningful money economy (in which humans are participants, anyway). It's especially subject to question in a world being run by ASIs that may not be inclined to permit it for one reason or another. It's hard to charge for something when your customers aren't dealing in money.

It also seems like most of the jobs that might be preserved are nonessential. Not sure what that means.

Comment by jbash on What happens if you present 500 people with an argument that AI is risky? · 2024-09-04T23:45:10.477Z · LW · GW

If humanity develops very advanced AI technology, how likely do you think it is that this causes humanity to go extinct or be substantially disempowered?

I would find this difficult to answer, because I don't know what you mean by "substantially disempowered".

I'd find it especially hard to understand because you present it as a "peer risk" to extinction. I'd take that as a hint that whatever you meant by "substantially disempowered" was Really Bad(TM). Yet there are a lot of things that could reasonably be described as "substantially disempowered", but don't seem particularly bad to me... and definitely not bad on an extinction level. So I'd be lost as to how substantial it had to be, or in what way, or just in general as to what you were getting at it with it.

Comment by jbash on Ruby's Quick Takes · 2024-08-30T21:49:13.441Z · LW · GW

Comment by jbash on Shortform · 2024-08-25T15:13:15.234Z · LW · GW

Your snake mnemonic is not the standard one and gives an incorrect, inverted result. Was that intentional?

This is a coral snake, which is dangerously venomous:

[image: Eastern Coral Snake | National Geographic]

This is a king snake, which is totally harmless unless you're a vole or something:

Comment by jbash on A primer on the current state of longevity research · 2024-08-23T20:18:52.354Z · LW · GW

But does it work at all?

It seems counterintuitive that there would be one single thing called "aging" that would happen everywhere in the body at once; have a single cause or even a small set of causes; be measurable by a single "biological age" number; and be slowed, arrested, or reversed by a single intervention... especially an intervention that didn't have a million huge side effects. In fact, it seems counterintuitive that that would even be approximately true. Biology sucks because everything interacts in ways that aren't required to have any pattern, and are still inconvenient even when they do have patterns.

How do you even do a meaningful experiment? For example, isn't NAD+ right smack in the middle of the whole cell energy cycle? So if you do something to NAD+, aren't you likely to have a huge number of really diverse effects that may or may not be related to aging? If you do that and your endpoint is just life span, how do you tease out useful knowledge? Maybe the sirtuins would have extended life span, but for the unrelated toxic effects of all that NAD+. Or maybe the sirtuins are totally irrelevant to what's actually going on.

The same sort of thing applies to any wholesale messing with histones and gene expression, via sirtuins or however else. You're changing everything at once when you do that.

Reprogramming too: you mentioned different kinds of cells responding differently. It seems really un-biological to expect that difference to be limited to how fast the cells "come around", or the effects to be simply understandable by measuring any manageable number of things or building any manageable mental model.

And there are so many other interactions and complications even outside of the results of experiments. OK, senescent cells and inflammation are needed for wound healing... but I'm pretty sure I don't heal nearly as fast at over 60 as I did at, say, 20, even with lots more senescent cells available and more background inflammation. So something else must be going on.

And then there are the side effects, even if something "works". For example, isn't having extra/better telomeres a big convenience if you want to grow up to be a tumor? Especially convenient if you're part of a human and may have decades to accumulate other tricks, as opposed to part of a mouse and lucky to have a year. How do you measure the total effect of something like that in any way other than full-on long-term observed lifespan and healthspan in actual humans?

And and and...

Comment by jbash on Would you benefit from, or object to, a page with LW users' reacts? · 2024-08-20T19:02:24.153Z · LW · GW

I often bounce through my comment history as a relatively quick way of re-finding discussions I've commented in. Just today, I wanted to re-find a discussion I'd reacted to, and realized that I couldn't do that and would have to find it another way.

Comment by jbash on Decision Theory in Space · 2024-08-18T15:05:32.495Z · LW · GW

There's surely some point or joke in this, but I'm just going "Wat?". This disturbs me because not many things go completely over my head. Maybe I'm not decision theory literate enough (or I guess maybe I'm not Star Wars literate enough).

Is Vader supposed to have yet another decision theory? And what's the whole thing with the competing orders supposed to be about?

Comment by jbash on It's time for a self-reproducing machine · 2024-08-08T15:09:35.586Z · LW · GW

Well, I seem to be talking to someone who knows more about alloys than I do.

Maybe. But what I know tends to be very patchy, depending on what rabbit holes I happen to have gone down at various times.

I figure there's a need for Neodymium Iron Boron, for motor cores,

I hadn't thought about magnetics at all, or anything exotic. I was just talking about basic steel.

Unless I'm mixed up, NdFeB is for permanent magnets. You might not need any permanent magnets. If you do, I believe also you need a big solenoid, possibly in an oven, to magnetize them. Said solenoid needs a metric butt-ton of current when it's on, by the way, although it probably doesn't have to be on for long.

Inductor and electromagnet cores, including for motors, are made out of "electrical steel", which is typically cut to shape in thin plates, then laminated with some kind of lacquer or something for insulation against eddy currents. You can also use sintered ferrite powders, which come in a bewildering array of formulations, but if you're just worried about motors, you'd probably only really need one or two.

Those plates are an example of a generalized issue, by the way. I think those plates are probably normally hot die cut in a roll process. In fact, I suspect they're normally made in a plant that can immediately drop the offcuts, probably still hot to save energy on reheating them, into equipment that rerolls them back into more stock. Or maybe they even roll them out in their final shapes directly from a melt somehow.

You could mill every single plate in a motor core out of sheet stock on a milling machine... but it would take eternity, go through a ton of tooling, and generate a lot of waste (probably in the form of oily swarf mixed in with oily swarf of every other thing you process in the shop).

There are lots of processes like that, where stuff that you could "hand make" with the "mother machines" isn't made that way in practice, because specialized machines, often colocated with other specialized machines in large specialized plants, are qualitatively more efficient in terms of time, energy, waste, consumables, you name it. Stuff that's hot is kept hot until it needs to be cool (and often you try to cool it by putting as much as possible of the heat back into an input stream). Steps are colocated to avoid reheats. Waste products are recycled or used for something else, and the plant for "something else" is often also colocated.

It's really hard to compete with that kind of efficiency. Most of the individual specialized machines are a lot more than a cubic meter, too. You already mentioned that temperature-sensitive processes tend to have optimal sizes, which are often really big.

Can you afford to use 10 times the energy and produce 10 times the waste of "traditional" processes? If not, you may need a lot of specialized equipment, more than you could fit in a reasonable-sized self-replicating module.

Cast Iron in the form of near-net-shape castings for machine frames,

All castings are imported, right?

By the way, you need nichrome or Kanthal or something like that for the heating elements in your furnace. Which isn't really different from the copper wire you use, but it's another item.

some kind of hardenable tool steel for everything else.

Here I break down. I suspect, but do not know, that if you only think in terms of making usable parts, you could at least get away only with "mild steel", "alloy steel", "tool steel", and perhaps "spring steel". Or maybe with only three or even two of those. I could be wrong, though, because there are tons of weird issues when you start to think about the actual stresses a part will experience and the environment it'll be in.

If you do want to reduce the number of alloys to the absolute minimum, you probably also have to be able to be very sophisticated about your heat treating. I'd be pretty shocked, for instance, if a high-quality bearing ball is actually in the same condition all the way through. You'd want to be able to case-harden things and carburize things and do other stuff I don't even know about. And, by the way, where are you quenching the stuff?

Even if you can use ingenuity to only absolutely need a relatively small number of alloys, on a similar theme to what I said above, there's efficiency to worry about. The reason there are so many standard alloys isn't necessarily that you can't substitute X for Y, but that X costs three or four or ten times as much as Y for the specific application that Y is optimized for. Costs come from the ingredients, from their purification, from their processing when the alloy is formulated, and from post-processing (how hard is the stuff on your tooling, how much wear and tear does it put on the heating elements in your furnace, how much energy do you use, how much coolant do you go through, etc).

Comment by jbash on It's time for a self-reproducing machine · 2024-08-08T00:54:15.205Z · LW · GW

How many different alloys are you expecting to have in stock in there?

HSS is apparently pretty finicky to work with. As I understand it most hobbyists content themselves with O1 or A1 or whatever, which wear a lot faster. But it's true they'll cut themselves.

There's probably a reason, above and beyond cost, why the "bodies" of machines tend to be cast iron rather than tool steel. And for the truly staggering number of standardized alloys that are out there in general.