Posts

[Cosmology Talks] New Probability Axioms Could Fix Cosmology's Multiverse (Partially) - Sylvia Wenmackers 2024-04-14T01:26:38.515Z
All About Concave and Convex Agents 2024-03-24T21:37:17.922Z
Do not delete your misaligned AGI. 2024-03-24T21:37:07.724Z
Elon files grave charges against OpenAI 2024-03-01T17:42:13.963Z
Verifiable private execution of machine learning models with Risc0? 2023-10-25T00:44:48.643Z
Eleuther releases Llemma: An Open Language Model For Mathematics 2023-10-17T20:03:45.419Z
A thought about the constraints of debtlessness in online communities 2023-10-07T21:26:44.480Z
The point of a game is not to win, and you shouldn't even pretend that it is 2023-09-28T15:54:27.990Z
Cohabitive Games so Far 2023-09-28T15:41:27.986Z
Do agents with (mutually known) identical utility functions but irreconcilable knowledge sometimes fight? 2023-08-23T08:13:05.631Z
Apparently, of the 195 Million the DoD allocated in University Research Funding Awards in 2022, more than half of them concerned AI or compute hardware research 2023-07-07T01:20:20.079Z
Using Claude to convert dialog transcripts into great posts? 2023-06-21T20:19:44.403Z
The Gom Jabbar scene from Dune is essentially a short film about what Rationality is for 2023-03-22T08:33:38.321Z
Will chat logs and other records of our lives be maintained indefinitely by the advertising industry? 2022-11-29T00:30:46.415Z
[Video] How having Fast Fourier Transforms sooner could have helped with Nuclear Disarmament - Veritaserum 2022-11-03T21:04:35.839Z
The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter 2022-11-03T06:47:56.376Z
I just watched the Open C3 Subcommittee Hearing on Unidentified Aerial Phenomena (UFOs). Here's a succinct summary and commentary + some background 2022-05-18T04:15:11.681Z
Alex Tabarrok advocates for crowdfunding systems with *Refund Bonuses*. I think this might be a natural occurrence of a money pump against Causal Decision Theory pledgers 2022-03-14T07:27:06.955Z
Grabby Aliens could be Good, could be Bad 2022-03-07T01:24:43.769Z
Would (myopic) general public good producers significantly accelerate the development of AGI? 2022-03-02T23:47:09.322Z
Are our community grouphouses typically rented, or owned? 2022-03-02T03:36:58.251Z
We need a theory of anthropic measure binding 2021-12-30T07:22:34.288Z
Venture Granters, The VCs of public goods, incentivizing good dreams 2021-12-17T08:57:30.858Z
Is progress in ML-assisted theorem-proving beneficial? 2021-09-28T01:54:37.820Z
Auckland, New Zealand – ACX Meetups Everywhere 2021 2021-08-23T08:49:53.187Z
Violent Unraveling: Suicidal Majoritarianism 2021-07-29T09:29:05.182Z
We should probably buy ADA? 2021-05-24T23:58:05.395Z
Deepmind has made a general inductor ("Making sense of sensory input") 2021-02-02T02:54:26.404Z
In software engineering, what are the upper limits of Language-Based Security? 2020-12-27T05:50:46.772Z
The Fermi Paradox has not been dissolved - James Fodor 2020-12-12T23:18:32.081Z
Propinquity Cities So Far 2020-11-16T23:12:52.065Z
Shouldn't there be a Chinese translation of Human Compatible? 2020-10-09T08:47:55.760Z
Should some variant of longtermism identify as a religion? 2020-09-11T05:02:43.740Z
Design thoughts for building a better kind of social space with many webs of trust 2020-09-06T02:08:54.766Z
Investment is a useful societal mechanism for getting new things made. Stock trading shares some functionality with investment, but seems very very inefficient, at that? 2020-08-24T01:18:19.808Z
misc raw responses to a tract of Critical Rationalism 2020-08-14T11:53:10.634Z
A speculative incentive design: self-determined price commitments as a way of averting monopoly 2020-04-28T07:44:52.440Z
MakoYass's Shortform 2020-04-19T00:12:46.448Z
Being right isn't enough. Confidence is very important. 2020-04-07T01:10:52.517Z
Thoughts about Dr Stone and Mythology 2020-02-25T01:51:29.519Z
When would an agent do something different as a result of believing the many worlds theory? 2019-12-15T01:02:40.952Z
What do the Charter Cities Institute likely mean when they refer to long term problems with the use of eminent domain? 2019-12-08T00:53:44.933Z
Mako's Notes from Skeptoid's 13 Hour 13th Birthday Stream 2019-10-06T09:43:32.464Z
The Transparent Society: A radical transformation that we should probably undergo 2019-09-03T02:27:21.498Z
Lana Wachowski is doing a new Matrix movie 2019-08-21T00:47:40.521Z
Prokaryote Multiverse. An argument that potential simulators do not have significantly more complex physics than ours 2019-08-18T04:22:53.879Z
Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening? 2019-08-05T00:12:14.630Z
Will autonomous cars be more economical/efficient as shared urban transit than busses or trains, and by how much? What's some good research on this? 2019-07-31T00:16:59.415Z
If I knew how to make an omohundru optimizer, would I be able to do anything good with that knowledge? 2019-07-12T01:40:48.999Z
In physical eschatology, is Aestivation a sound strategy? 2019-06-17T07:27:31.527Z

Comments

Comment by mako yass (MakoYass) on Cooperation is optimal, with weaker agents too  -  tldr · 2024-05-08T22:13:56.401Z · LW · GW

:( that isn't what cooperation would look like. The gazelles can reject a deal that would lead to their extinction (they have better alternatives) and impose a deal that would benefit both species.

Cooperation isn't purely submissive compliance.

Comment by mako yass (MakoYass) on Cohabitive Games so Far · 2024-05-08T20:30:40.649Z · LW · GW

(I'm aware of most of these games)

I made it pretty clear in the article that it isn't about purely cooperative games. (Though I wonder if they'd be easier to adapt. Cooperative + complications seems closer to the character of a cohabitive game than competitive + non-zero-sum score goals do...)

Gloomhaven seems to be, and describes itself as, a cooperative game. What competitive elements are you referring to?

The third tier is worth talking about. I think these sorts of games might, if you played them enough, teach the same skills, but I think you'd have to play them for a long time. My expectation is that basically all of them end with a ranking, as you said: first, second, third. The ranking isn't scored (ie, we aren't told that being second is half as good as being first), so there's not much clarity about how much players should value each rank, which is one obstacle to learning. Rankings also keep the game zero sum on net, and zero sum dynamics between first and second, or between first and the alliance, have the focus of your attention most of the time. The fewer or more limited the mutually beneficial deals are, the less social learning there will be. Zero sum dynamics need to be discussed in cohabitive games, but the games will support more efficient learning if those dynamics are reduced.
And there really are a lot of people who think that the game humans are playing in the real world is zero sum, that all real games are zero sum. So I also suspect that these sorts of games might never teach the skill, because to teach the skill you have to show people a way out of that mindset, and all these games do is reinforce it.

competitive [...] not usually permanent alliances are critical to victory: Diplomacy, Twilight Imperium (all of them), Cosmic Encounter

This category is really interesting, because the alliances expire and have to be remade multiple times per game, and I've been meaning to play some games from it. But they're also a lot foggier: the agreements are of poor quality, and they invite only limited amounts of foresight and social creativity. In contrast, writing good legislation in the real world seems to require more social creativity than we can currently produce.

Comment by mako yass (MakoYass) on quila's Shortform · 2024-05-08T06:57:15.749Z · LW · GW

Imagining a pivotal act of generating very convincing arguments for, like, voting and parliamentary systems that would turn government into 1) a working democracy 2) that's capable of solving the problem. Citizens and congress read the arguments, get fired up, and the problem is solved through proper channels.

Comment by mako yass (MakoYass) on Industrial literacy · 2024-05-07T01:32:01.603Z · LW · GW

Yeah.

Well, that's the usual reason to invoke it; I was more talking about the reason it lands as a believable or interesting explanation.

Notably, Terra Ignota managed to produce a McGuffin by making the canner device extremely illegal: even knowledge of its existence is a threat to the world's information infrastructure, which I'd guess is why, iirc, they only made one.

Comment by mako yass (MakoYass) on KAN: Kolmogorov-Arnold Networks · 2024-05-05T18:43:56.091Z · LW · GW

I'm guessing they mean that the performance curve seems to reach much lower loss before it begins to trail off, while MLPs lose momentum much sooner. So even if MLPs are faster per unit of performance at small parameter counts and data scales, there's no way they will be at scale, to the extent that it's almost not worth comparing in terms of compute? (Which would be an inherently rough measure anyway because, as I touched on, the relative compute will change as soon as specialized spline hardware starts to be built. Due to hardware specialization for matmul|relu, the relative performance comparison today is probably absurdly unfair to any new architecture.)

Comment by mako yass (MakoYass) on KAN: Kolmogorov-Arnold Networks · 2024-05-04T04:30:11.194Z · LW · GW

Theoretically and empirically, KANs possess faster neural scaling laws than MLPs

What do they mean by this? Isn't that contradicted by this recommendation to use an ordinary architecture if you want fast training:

[Figure: a section from their diagram where they recommend against KANs if you want fast training]

It seems like they mean faster per parameter, which is an... unclear claim, given that each parameter or step, here, appears to represent more computation (there's no mention of flops) than a parameter/step in a matmul|relu network would? Maybe you could buff that out with specialized hardware, but they don't discuss hardware.
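To illustrate the kind of accounting I mean, a back-of-envelope sketch (the layer sizes and the basis-evaluation overhead are made-up assumptions of mine, not numbers from the paper):

```python
# Rough flops-per-parameter comparison: MLP layer vs KAN layer.
# Assumptions (mine, not the paper's): each KAN edge carries a spline with
# grid + order coefficients, and evaluating its basis functions costs a few
# extra multiply-adds per coefficient on top of the coefficient dot product.

def mlp_layer(d_in, d_out):
    params = d_in * d_out          # one weight per edge
    flops = d_in * d_out           # one multiply-add per weight
    return params, flops

def kan_layer(d_in, d_out, grid=5, order=3, basis_overhead=2):
    coeffs = grid + order          # spline coefficients per edge
    params = d_in * d_out * coeffs
    flops = d_in * d_out * coeffs * (1 + basis_overhead)
    return params, flops

for name, (p, f) in [("MLP 100x100", mlp_layer(100, 100)),
                     ("KAN 10x10  ", kan_layer(10, 10))]:
    print(f"{name}: {p:6d} params, {f:6d} flops, {f/p:.1f} flops/param")
```

Under those assumptions a KAN gets several flops per parameter where an MLP gets one, which is why "more parameter efficient" and "cheaper to run" aren't the same claim.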

One might worry that KANs are hopelessly expensive, since each MLP's weight parameter becomes KAN's spline function. Fortunately, KANs usually allow much smaller computation graphs than MLPs. For example, we show that for PDE solving, a 2-Layer width-10 KAN is 100 times more accurate than a 4-Layer width-100 MLP (10−7 vs 10−5 MSE) and 100 times more parameter efficient (102 vs 104 parameters) [this must be a typo, this would only be 1.01 times more parameter efficient].

I'm not sure this answers the question. What are the parameters, anyway? Are they just single floats? If they're not, that's pretty misleading.

Comment by mako yass (MakoYass) on "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case · 2024-05-04T02:41:11.305Z · LW · GW

often means "train the model harder and include more CoT/code in its training data" or "finetune the model to use an external reasoning aide", and not "replace parts of the neural network with human-understandable algorithms". 

The intention of this part of the paragraph wasn't totally clear, but you seem to be saying this wasn't great? From what I understand, these actually did all make the model far more interpretable?

Chain of thought is a wonderful thing: it clears a space where the model will just earnestly confess its inner thoughts and plans in a way that isn't subject to training pressure, and so it, in most ways, can't learn to be deceptive about it.

Comment by mako yass (MakoYass) on "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case · 2024-05-04T02:29:53.541Z · LW · GW

This is good! I would recommend it to a friend!

Some feedback.

  • An individual human can be inhumane, but the aggregate of human values kind of visibly isn't, and in most ways couldn't be: human cultures are reliably getting more humane as transparency/reflection and coordination increase over time, and also, if you aggregate a bunch of concave values, you inevitably get a value system that treats all of the subjects of the aggregation pretty decently.
    A lot of the time, when people accuse us of conflating something, we equate those things because we have an argument that they're going to turn out to be equivalent.
    So emphasizing a difference between these two things could be really misleading, and possibly kinda harmful, given that it could undermine the implementation of the simplest/most arguably correct solutions to alignment (which are just aggregations of humans' values). This could be a whole conversation, but could we just not define humane values as being necessarily distinct from human values? How about this:
    • People are sometimes confused by 'Human values', as it seems to assume that all humans value the same things, but many humans have values that conflict with the preferences of other humans. When we say 'Humane values', we're defining a value system that does a decent job at balancing and reconciling the preferences of every human (Humans, Every one).
  • [graph point for "systems programmer with mlp shirt"] would it be funny if there were another point, "systems programmer without mlp shirt", and it was pareto-inferior
  • "What if System 2 is System 1". This is a great insight, I think it is, and I think the main reason nerdy types often fail to notice how permeable and continuous the boundary is a kind of tragic habitual cognitive autoimmune disease, and I have a post brewing about this after I used a repaired relationship with the unconscious bulk to cure my astigmatism (I'm going to let it sit for a year just to confirm that the method actually worked and myopia really was averted)
  • Exponential growth is usually not slow, and even if it were slow, it wouldn't entail that "we'll get "warning shots" & a chance to fight back": it only takes a small sustained advantage to utterly win a war (though contemporary humans don't like to carry wars to completion, the 20th century should have been a clear lesson that such things are within our abilities at current tech levels). Even if progress in capabilities over time continued to be linear, impact as a function of capabilities is not going to be linear; it never has been.

But overall I think it addresses a certain audience I know, and addresses them much better than the version I hastily wrote last year, when I was summoned to speak at a conference, would have (and so I never showed mine to them; maybe one day I will show them yours).

Comment by MakoYass on [deleted post] 2024-05-03T21:31:48.860Z

Uh, I'm saying I think Henry's is better. Except for the title, maybe.

Comment by MakoYass on [deleted post] 2024-05-03T21:09:39.552Z

this one is better

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T23:14:23.131Z · LW · GW

Possibly incidental, but if people were successfully maintaining continuous secure access to their Signal account, you wouldn't even notice, because it doesn't even attempt to transfer encrypted data to new sessions.

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:48:00.179Z · LW · GW

I don't think e2e encryption is warranted here for the first iteration. Keypair management is generally too hard today; everyone I know who used encrypted Element chat has lost their keys lmao (I endorse Element chat, but I don't endorse making every channel you use encrypted, you will lose your logs!). And keypairs alone are a terrible way of doing secure identity. Keys can be lost or stolen, and though that doesn't happen every day, the probability is always too high to build anything serious on top of them. I'm waiting for a secure identity system with key rotation and some form of account recovery process (which can be an institutional service or a "social recovery" thing) before building anything important on top of e2e encryption.

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:47:12.937Z · LW · GW

Then, users can put in their own private key to see a post

This was probably a typo but just in case: you should never send a private key off your device. The public key is the part that you send.
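For illustration, a minimal sketch of the intended direction of flow (using PyNaCl sealed boxes; the variable names are just hypothetical):

```python
# The private key never leaves the reader's device; only the public key is shared.
from nacl.public import PrivateKey, SealedBox

reader_key = PrivateKey.generate()      # generated and stored on the reader's device
reader_pub = reader_key.public_key      # this is the part that gets uploaded

# Author's side: encrypt the restricted post to the reader's public key.
ciphertext = SealedBox(reader_pub).encrypt(b"restricted post body")

# Reader's device: only the holder of the private key can decrypt.
plaintext = SealedBox(reader_key).decrypt(ciphertext)
assert plaintext == b"restricted post body"
```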

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:35:10.890Z · LW · GW

So I wrote a feature recommendation: https://www.lesswrong.com/posts/55rc6LJcqRmyaEr9T/please-stop-publishing-ideas-insights-research-about-ai?commentId=6fxN9KPeQgxZY235M

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:34:23.371Z · LW · GW

On infrastructures for private sharing:

Feature recommendation: Marked Posts (name intentionally bland; any variant of "private" (ie, secret, sensitive, classified) would attract attention and partially negate the point)

This feature prevents leaks, without sacrificing openness.

A marked post will only be seen by members in good standing. They'll be able to see the title and abstract in their feed, but before they're able to read it, they have to click "I declare that I'm going to read this", and then they'll leave a read receipt (or a "mark") visible to the post creator, admins, and other members in good standing. (These would also serve a useful social function of giving us more mutual knowledge of who knows what, while making it easier to coordinate to make sure every post gets read by people who'd understand it and be able to pass it along to interested parties.)

If a member "reads" an abnormally high number of these posts, the system detects that, and they may have their ability to read more posts frozen. Admins, and members who've read many of the same posts, are notified, and you can investigate. If other members find that this person actually is reading this many posts, that they seem to truly understand the content, they can be given an expanded reading rate. Members in good standing should be happy to help with this, if that person is a leaker, well that's serious, if they're not a leaker, what you're doing in the interrogation setting is essentially you're just getting to know a new entrant to the community who reads and understands a lot, talking about the theory with them, and that is a happy thing to do.

Members in good standing must be endorsed by another member in good standing before they will be able to see Marked posts. The endorsements are also tracked. If someone issues too many endorsements too quickly (or the people downstream of their endorsements are collectively doing so in a short time window), this sends an alert. The exact detection algorithm here is something I have funding to develop, so if you want to do this, tell me and I can expedite that project.
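Roughly the kind of logic I'm imagining (the thresholds and structure here are placeholder guesses, not the design I have funding for):

```python
from collections import defaultdict
from datetime import datetime, timedelta

READS_PER_WEEK_LIMIT = 20          # hypothetical threshold
ENDORSEMENTS_PER_MONTH_LIMIT = 3   # hypothetical threshold

class MarkedPostGuard:
    """Tracks read receipts and endorsements, freezing/alerting on anomalies."""

    def __init__(self):
        self.reads = defaultdict(list)         # member -> read timestamps
        self.endorsements = defaultdict(list)  # endorser -> endorsement timestamps
        self.frozen = set()                    # members pending human review

    def record_read(self, member, now=None):
        now = now or datetime.now()
        if member in self.frozen:
            return "read_blocked"
        self.reads[member].append(now)
        recent = [t for t in self.reads[member] if now - t < timedelta(days=7)]
        if len(recent) > READS_PER_WEEK_LIMIT:
            self.frozen.add(member)            # notify admins and co-readers here
            return "frozen_pending_review"
        return "ok"

    def record_endorsement(self, endorser, now=None):
        now = now or datetime.now()
        self.endorsements[endorser].append(now)
        recent = [t for t in self.endorsements[endorser]
                  if now - t < timedelta(days=30)]
        if len(recent) > ENDORSEMENTS_PER_MONTH_LIMIT:
            return "alert_admins"              # endorsement velocity looks suspicious
        return "ok"
```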

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:02:03.731Z · LW · GW

There never will be an infrastructure for this.

I should be less resolute about this. It would kind of be my job to look for a design that could do it.

One thing we've never seen is a system where read receipts are tracked and analyzed on the global level and read permissions are suspended and alerts are sent to admins if an account is doing too many unjustified reads.
This would prevent a small number of spies from extracting a large number of documents.
I suppose we could implement that today.

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T21:38:24.827Z · LW · GW

You think that studying agency and infrabayesianism won't make small contributions to capabilities? Even just saying "agency" in the context of AI makes capabilities progress.

Comment by mako yass (MakoYass) on Please stop publishing ideas/insights/research about AI · 2024-05-02T21:35:33.900Z · LW · GW

"So where do I privately share such research?" — good question! There is currently no infrastructure for this.

This is why I currently think you're completely wrong about this. There never will be an infrastructure for this. Privacy of communities isn't a solvable problem in general: as soon as your community is large enough to compete with the adversary, it's large enough and conspicuous enough that the adversary will pay attention to it, send in spies, and extract leaks. If you make it compartmented enough to prevent leaks/weed out the spies, it won't have enough intellectual liveliness to solve the alignment problem.

There is nothing that makes differentially helping capabilities "fine if you're only differentially helping them a little bit".

If your acceptable lower limit for basically anything is zero, you won't be allowed to do anything, really anything. You have to name some quantity of capabilities progress that's okay to do before you'll be allowed to talk about AI in a group setting.

Comment by mako yass (MakoYass) on Alien neuropunk slaver civilizations · 2024-05-01T05:43:12.967Z · LW · GW

It would seem to me that in this world brains would be much more expensive (or impossible) to copy. Which is worth talking about, because there are designs in our own era for very efficient very dense neural networks that have that same quality. They can be trained, but the weights can't be accessed.

Comment by mako yass (MakoYass) on Tamsin Leake's Shortform · 2024-04-28T20:51:48.231Z · LW · GW

what does it even mean?

There actually is a meaningful question there: Would you enter the experience machine? Or do you need it to be real? Do you just want the experience of pleasing others, or do you need those people being pleased out there to actually exist?

There are a lot of people who really think they are, and might truly be, experience oriented. If given the ability, they would instantly self-modify into a Victory Psychopath Protecting A Dream.

Comment by mako yass (MakoYass) on Tamsin Leake's Shortform · 2024-04-27T22:10:34.390Z · LW · GW

An interesting question for me is how much true altruism is required to give rise to a generally altruistic society under high quality coordination frameworks. I suspect it's quite small.

Another question is whether building coordination frameworks to any degree requires some background of altruism. I suspect that this is the case. It's the hypothesis I've accreted for explaining the success of post-war economies (guessing that war leads to a boom in nationalistic altruism, generally increased fairness and mutual faith).

Comment by mako yass (MakoYass) on Losing Faith In Contrarianism · 2024-04-25T22:45:35.392Z · LW · GW

It may be useful to write about how a consumer can distinguish contrarian takes from original insights. Until that's a common skill, there will remain a market for contrarians.

Comment by mako yass (MakoYass) on "You're the most beautiful girl in the world" and Wittgensteinian Language Games · 2024-04-21T15:31:19.353Z · LW · GW

I didn't, but I often want to downvote articles that seem to be lecturing a group who wouldn't read or be changed by the article. I know a lot of idiots will upvote such articles out of a belief that by doing so they are helping or attacking that group. On reddit, it often felt like that was the main reason people upvoted things, to engage indirectly with others, and it kills the sub, clogging it with posts that the people who visit the sub are not themselves getting anything from.

If you engaged with the target group successfully, they would upvote the post themselves, so a person should generally never upvote on others' behalf, because they don't actually know what would work for them.

Unfortunately, the whole anonymous voting thing makes it impossible to properly address voting norm issues like this. So either I address it improperly, by making deep guesses about why people are voting this way (no, I don't enjoy it), or I prepare to depose lesswrong.com with a better system (that's what I'm doing).

Comment by mako yass (MakoYass) on "You're the most beautiful girl in the world" and Wittgensteinian Language Games · 2024-04-20T21:56:59.831Z · LW · GW

On reflection, it must have played out more than once that a kiwi lad, in a foreign country, drunk, has asked a girl if she wants to get a kebab. The girl thinks he means shish-kebab but says yes enthusiastically, because she likes him and assumes he wouldn't ask that unless it was an abnormally good shish-kebab. The kiwi realizes too late that there are no kebabs in America, but they end up going ahead and getting shish-kebabs out of a combination of face-saving and an infatuation-related coordination problem: the girl now truly wants a shish-kebab; it is too late to redirect the desires of the group.

So that detail might have just been inspired by a true story.

Comment by mako yass (MakoYass) on "You're the most beautiful girl in the world" and Wittgensteinian Language Games · 2024-04-20T20:46:31.597Z · LW · GW

Americans don't know how much they had to compromise in this video by using shish-kebabs instead of what a New Zealander would really mean when someone at a party says "do you want to get a kebab with me": something more like the Turkish version of a burrito, with, instead of mince, beans, and cheese, Turkish meat, hummus, veges, and a wider choice of sauces. They're a fixture of nightlife and tend to be open late.

Comment by mako yass (MakoYass) on AI #60: Oh the Humanity · 2024-04-19T02:02:23.174Z · LW · GW

If you wanna talk about the humanity(ies), well, I looked up Chief Vision Officer of AISI Adam Russell, and he has an interesting profile.

Russell completed a Bachelor of Arts in Cultural Anthropology from Duke University, and an M.Phil. and a D.Phil. in Social Anthropology from University of Oxford, where he was a Rhodes Scholar.[2] He played with the Oxford University RFC for four varsity matches and also worked with the United States national rugby union team, and worked as High Performance director for the United States women's national rugby union team in the 2014 and 2017 Women's Rugby World Cups.[3]

Russell was in the industry, where he was a senior scientist and principal investigator on a wide range of human performance and social science research projects and provided strategic assessments for a number of different government organizations.[2][4] Russell joined Intelligence Advanced Research Projects Activity (IARPA) as a program manager.[2][4] He developed and managed a number of high-risk, high-payoff research projects for the Office of the Director of National Intelligence.[2] Russell joined DARPA as a program manager in July 2015.[2][4] His work there focused on new experimental platforms and tools to facilitate discovery, quantification and "big validation" of fundamental measures in social science, behavioral science and human performance.[2]

In 2022, secretary Xavier Becerra selected Russell to serve as the acting deputy director for the Advanced Research Projects Agency for Health (ARPA-H), effective June 6. In this role, Russell leads the process to stand up ARPA-H.[5]

Hmm he's done a lot of macho human-enhancement-adjacent stuff. I wonder if there were some centaurists involved here.

  • I previously noted a lot of neurotech research projects in DoD funding awards. I'm making a connection between this and a joke I heard recently on a navy seals podcast. "The guys often ask what they can do to deal with drones. So you start showing them how to work the jammer devices, or net guns, and their eyes glaze over, it's not what they wanted, they're disappointed. They're thinking like, 'no... how can I deal with it. Myself.' "
  • So even though alignment-by-merger is kinda obviously not going to work (you'd have to reverse-engineer two vats of inscrutable matrices, instead of one. And the fleshy pink one wasn't designed to be read from and can only be read on a neuron-by-neuron level after being plastinated (which also kills it). AGI alignment is something that a neuralink cannot solve.), it's conceivable that it's an especially popular line of thought among military/sports types.

Otherwise, this kinda lines up with my confessions on manhattan projects for AGI. You arguably need an anthropologist to make decisions about what 'aligned' means. I don't know if you really need one (a philosophically inclined decision theorist, likely to already be involved, would be enough for me) but I wouldn't be surprised to see an anthropologist appointed in the most serious projects.

Comment by mako yass (MakoYass) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-18T19:54:00.481Z · LW · GW

Feel like there's a decent chance they already changed their minds as a result of meeting him or engaging with their coworkers about the issue. EAs are good at conflict resolution.

Comment by mako yass (MakoYass) on Speedrun ruiner research idea · 2024-04-14T23:59:08.244Z · LW · GW

Wouldn't really need reward modelling for narrow optimizers. Weak general real-world optimizers, I find difficult to imagine, and I'd expect them to be continuous with strong ones; the projects to make weak ones wouldn't be easily distinguishable from the projects to make strong ones.

Oh, are you thinking of applying it to, say, simulation training?

Comment by mako yass (MakoYass) on Speedrun ruiner research idea · 2024-04-14T19:20:24.938Z · LW · GW

Cool then.

Are you aware that prepotence is the default for strong optimizers though?

Comment by mako yass (MakoYass) on Speedrun ruiner research idea · 2024-04-14T17:49:01.232Z · LW · GW

Are you proposing applying this to something potentially prepotent? Or does this come with corrigibility guarantees? If you applied it to a prepotence, I'm pretty sure this would be an extremely bad idea. The actual human utility function (the rules of the game as intended) supports important glitch-like behavior, where cheap tricks can extract enormous amounts of utility, which means that applying this to general alignment has the potential of foreclosing most value that could have existed.

Example 1: Virtual worlds are a weird out-of-distribution part of the human utility function that allows the AI to "cheat" and create impossibly good experiences by cutting the human's senses off from the real world and showing them an illusion. As far as I'm concerned, creating non-deceptive virtual worlds (like, very good video games) is correct behavior and the future would be immeasurably devalued if it were disallowed.

Example 2: I am not a hedonist, but I can't say conclusively that I wouldn't become one (turn out to be one) if I had full knowledge of my preferences, and the ability to self-modify, as well as lots of time and safety to reflect, settle my affairs in the world, set aside my pride, and then wirehead. This is a glitchy looking behavior that allows the AI to extract a much higher yield of utility from each subject by gradually warping them into a shape where they lose touch with most of what we currently call "values", where one value dominates all of the others. If it is incorrect behavior, then sure, it shouldn't be allowed to do that, but humans don't have the kind of self-reflection that is required to tell whether it's incorrect behavior or not, today, and if it's correct behavior, forever forbidding it is actually a far more horrifying outcome, what you'd be doing is, in some sense of 'suffering', forever prolonging some amount of suffering. That's fine if humans tolerate and prefer some amount of suffering, but we aren't sure of that yet.

Comment by mako yass (MakoYass) on MakoYass's Shortform · 2024-04-12T22:23:03.426Z · LW · GW

(institutional reform take, not important due to short timelines, please ignore)

The kinds of people who do whataboutism, stuff like "this is a dangerous distraction because it takes funding away from other initiatives", tend also to concentrate in low-bandwidth institutions: the legislature, the committee, economies righteously withering, the global discourse of the current thing, the New York Times, the Ivy League. These institutions recognize no alternatives to themselves, while, by their nature, they can never grow to the stature required to adequately perform the tasks assigned to them.
I don't think this is a coincidence, and it makes it much easier for me to sympathize with these people: They actually believe that we can't deal with more than one thing at a time.

They generally have no hope for decentralized decisionmaking, and when you examine them closely you find that they don't really seem to believe in democracy: they've given up on it, they don't talk about reforming it, they don't want third parties, and they've generally never heard of decentralized public funding mechanisms, certainly not futarchy. So it's kind of as simple as that. They're not being willfully ignorant. We just have to show them the alternatives, and properly; we basically haven't done it yet. The minarchists never offered a solution to negative externalities or public goods provision. There are proposals, but the designs are still vague and poorly communicated. There has never been an articulation of enlightened technocracy, which is essentially just succeeding at specialization or parallelization in executive decisionmaking. I'm not sure enlightened technocracy was ever possible until the proposal of futarchy, a mechanism by which non-experts can hold claimed experts accountable.

Comment by mako yass (MakoYass) on What does Eliezer Yudkowsky think of the meaning of life now? · 2024-04-11T19:05:48.886Z · LW · GW

If that's really the only thing he drew meaning from, and if he truly thinks that failure is inevitable, today, then I guess he must be getting his meaning from striving to fail in the most dignified possible way.

But I'd guess that like most humans, he probably also draws meaning from love, and joy. You know, living well. The point of surviving was that a future where humans survive would have a lot of that in it.
If failure were truly inevitable (though I don't personally think it is[1]), I'd recommend setting the work aside and making it your duty to just generate as much love and joy as you can with the time you have available. That's how we lived for most of history, and how most people still live today. We can learn to live that way.

  1. ^

    Reasons I don't understand why anyone would have a P(Doom) higher than 75%: Governments are showing indications of taking the problem seriously. Inspectability techniques are getting pretty good, so misalignment is likely to be detectable before deployment, which means a sufficiently energetic government response could be possible; sub-AGI tech is sufficient for controlling the supply chain and buying additional time, and China isn't suicidal. Major inner misalignment might just not really happen. Self-correcting from natural language instructions to "be good, you know" could be enough. There are very deep principled reasons to expect that having two opposing AGIs debate and check each others' arguments works well.

Comment by mako yass (MakoYass) on Alexander Gietelink Oldenziel's Shortform · 2024-04-11T18:46:16.615Z · LW · GW

Yeah, I'm pretty sure you would need to violate Heisenberg uncertainty in order to make this, and then you'd have to keep it in a 0 kelvin cleanroom forever.

A practical locked battery with tamperproofing would mostly just look like a battery.

Comment by mako yass (MakoYass) on romeostevensit's Shortform · 2024-04-10T21:29:51.701Z · LW · GW

I don't recognize Wikipedia's theories as predictive. Mine has some predictions, but I hope it's obvious why I would not be interested in making this a debate or engaging much in the conceptual dismantling of subcultures at all.

Comment by mako yass (MakoYass) on romeostevensit's Shortform · 2024-04-10T19:27:38.030Z · LW · GW

I didn't read RS's claim as the claim that all subcultures persist through failure, but now that you ask, no, yeah, ime a really surprising number of these subcultures actually persist through failure.

  • I know of a fairly influential subculture of optics-oriented politics technologists who've committed to a hostile relationship towards transhumanism. Transhumanism (the claim that people want to change in deep ways and that technology will fairly soon permit it) suggests that racial distinctions will become almost entirely irrelevant, so in order to maintain their version of afrofuturism, where black and white futurism remain importantly distinct projects, they have to find some way to deny transhumanism. But rejecting transhumanism means they are never allowed to actually do high quality futurism, because they can't ask transhumanist questions and get a basic sense of what the future is going to be like. Or like, as soon as any of them do start asking those questions, those people wake up and drop out of that subculture. I've also met black transhumanists who identified as afrofuturists though. I can totally imagine articulations of afrofuturism that work with transhumanism. So I don't know how the entire thing's going to turn out.
  • Anarcho-punks fight only for the underdogs. That means they're attached to the identity of being underdogs: as soon as any of them start really winning, they'd no longer be recognised as punk, and they know this, so they're uninterested in (and in many cases actively opposed to) succeeding in any of their goals. There are no influential anarcho-punks, and as far as I could gather, no living heroes.
  • BDSM: My model of fetishes is that they represent hedonic refuges for currently unmeetable needs: deep human needs where, for one reason or another, a person can't pursue or even recognise the real version of the thing they need in the world as they understand it. I think the fetish is a protective mechanism to keep the basic drive roughly intact and wired up, by having the subject pursue symbolic fantasy versions of it. This means that getting the real thing (eg, for submissives, a committed relationship with someone you absolutely trust; for doms... probably a sense of safety?) would obsolete the kink, and it would wither away. I think they mostly don't know this, but the mindset in which the kink is seen as the objective requires that the real thing is never recognised or attained, so these communities reproduce best by circulating memes that make it harder to recognise the real thing.

I guess this is largely about how you define the movements' goals. If the goal of punk is to have loud parties with lots of drugs, it's perfect at that. If the goal is to bring about anarchosocialism or thrive under a plural geopolitical order, it's a sworn loser.

Comment by mako yass (MakoYass) on MakoYass's Shortform · 2024-04-10T19:10:46.130Z · LW · GW

Strong evidence is incredibly ordinary, and that genuinely doesn't seem to be intuitive. Like, every time you see a bit string longer than a kilobyte, there is a claim in your corpus that goes from roughly zero to roughly one, and you are doing that all day. I don't know about you, but I still don't think I've fully digested that.
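To put rough numbers on that (a throwaway calculation, assuming a uniform prior over bit strings of that length):

```python
import math

# A specific 1 KiB string has prior probability 2^-8192 under a uniform prior,
# so observing it delivers ~8192 bits of evidence for "that exact string exists".
n_bits = 1024 * 8
log10_prior = -n_bits * math.log10(2)
print(f"prior probability of that exact string: ~10^{log10_prior:.0f}")  # ~10^-2466
print(f"evidence received on observing it: {n_bits} bits")
```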

Comment by mako yass (MakoYass) on MakoYass's Shortform · 2024-04-10T18:32:02.042Z · LW · GW

I have this draft, Extraordinary Claims Routinely Get Proven with Ordinary Evidence, a debunking of that old Sagan line. We actually do routinely prove extraordinary claims like evolution or plate tectonics with old evidence that's been in front of our faces for hundreds of years, and that's important.

But evolution and plate tectonics are the only examples I can think of, because I'm not really particularly interested in the history of science, for the same underlying reasons that make me the one who wants to write this post. Collecting buckets of examples is not as useful as being able to deeply interpret and explain the examples that you have.

But I'm still not posting this until someone gives me more examples! I want the post to fight and win on the terms of the people it's trying to reach. Subdue the stamp collectors with stamps. It's the only way they'll listen.

Comment by mako yass (MakoYass) on ChristianKl's Shortform · 2024-04-08T21:26:12.800Z · LW · GW

most of the rest will be solar panels

Cole Nielson-cole is working towards designing fiber composite construction stages for space, and he has thoughts about this: in short, microwave lasers for energy transmission and rectifying antennas as energy receivers. But he doesn't get into the topic of lasers, and I'm pretty sure we don't have that today, right?

But I thought the whole interview was great.

Comment by mako yass (MakoYass) on My intellectual journey to (dis)solve the hard problem of consciousness · 2024-04-07T22:23:50.602Z · LW · GW

I think that's kind of what meditation can lead to.

It should, right? But isn't there a very large overlap between meditators and people who mystify consciousness?

Maybe in the same way that there's also a very large overlap between people who are pursuing good financial advice and people who end up receiving bad financial advice... Some genres are majority shit, so if I characterise the genre by the average article I've encountered from it, of course I will think the genre is shit. But there's a common adverse selection process where the majority of any genre, through no fault of its own, will be shit, because shit is easier to produce, and because it doesn't work, it creates repeat customers, so building for the audience who want shit is far, far more profitable.

Comment by mako yass (MakoYass) on Thomas Kwa's Shortform · 2024-04-07T02:31:11.219Z · LW · GW

You may be interested in Kenneth Stanley's serendipity-oriented social network, Maven.

Comment by mako yass (MakoYass) on Vanessa Kosoy's Shortform · 2024-04-07T02:26:18.610Z · LW · GW

They have superintelligence, the augmenting technologies that come of it, and the self-reflection that follows receiving those; they are not the same types of people.

Comment by mako yass (MakoYass) on My intellectual journey to (dis)solve the hard problem of consciousness · 2024-04-07T00:29:13.510Z · LW · GW

I've traveled these roads too. At some point I thought that the hard problem reduced to the problem of deriving an indexical prior, a prior on having a particular position in the universe, which we should expect to derive from specifics of its physical substrate; and it's apparent that whatever the true indexical prior is, it can't be studied empirically, it is inherently mysterious. A firmer articulation of "why does this matter experience being". Today, apparently, I think of that less as a deeply important metaphysical mystery and more just as another imperfect logical machine that we have to patch together just well enough to keep our decision theory working. Last time I scratched at this I got the sense that there's really no truth to be found beyond that. IIRC Wei Dai's UDASSA answers this with the inverse Kolmogorov complexity of the address of the observer within the universe, or something. It doesn't matter. It seems to work.

But after looking over this, reexamining: yeah, what causes people to talk about consciousness in these ways? And I get the sense that almost all of the confusion comes from the perception of a distinction between Me and My Brain. And that could come from all sorts of dynamics: sandboxing of deliberative reasoning due to hostile information environments, to more easily lie in external politics, and as a result of the outcomes of internal (inter-module) politics (the meme won't attempt to supersede the gene if the meme is deluded into thinking it's already in control, so that's what the gene does).

That sort of sandboxing dynamic arises inevitably from other-modelling. In order to simulate another person, you need to be able to isolate the simulation from your own background knowledge and replace it with your approximations of their own; the simulation cannot feel the brain around it. I think most people's conception of consciousness is like that, a simulation of what they imagine to be themselves, similarly isolated from most of the brain.

Maybe the way to transcend it is to develop a more sophisticated kind of self-model.

But that's complicated by the fact that when you're doing politics irl you need to be able to distinguish other people's models of you from your own model of you, so you're going to end up with an abundance of shitty models of yourself. I think people fall into the mistake of thinking that the you that your friend sees when you're talking is the actual you. They really want to believe it.

Humans sure are rough.

Comment by mako yass (MakoYass) on Please Understand · 2024-04-02T19:39:05.953Z · LW · GW

even existing GenAI can make good-enough content that would otherwise have required nontrivial amounts of human cognitive effort

This doesn't seem to be true to me. Good enough for what? We're still in the "wow, an AI made this" stage. We find that people don't value AI art, and I don't think that's because of its unscarcity or whatever; I think it's because it isn't saying anything. It either needs to be very tightly controlled by an AI-using human artist, or the machine needs to understand the needs of the world and the audience, and as soon as machines have that...

Ending the world? Where does that come in?

All communications assume that the point they're making is important and worth reading in some way (cooperative maxim of quantity). I'm contending that that assumption isn't true in light of what seems likely to actually happen immediately or shortly after the point starts to become applicable to the technology, and I have explained why, but I might be able to understand if it's still confusing, because:

The space of 'anything we can imagine' will shrink as our endogenous understanding of concepts shrinks. It will never not be 'our problem'

is true, but that doesn't mean we need to worry about this today. By the time we have to worry about preserving our understanding of the creative process against automation of it, we'll be on the verge of receiving post-linguistic knowledge transfer technologies and everything else, quicker than the automation can wreak its atrophying effects. Eventually it'll be a problem that we each have to tackle, but we'll have a new kind of support; paradoxically, learning the solutions to the problem will not be our problem.

Comment by mako yass (MakoYass) on All About Concave and Convex Agents · 2024-04-02T02:35:51.537Z · LW · GW

This seems to be talking about situations where a vector of inputs has an optimal setting at extremes (convex), in contrast to situations where the optimal setting is a compromise (concave).

I'm inclined to say it's a very different discussion than this one, as an agent's resource utility function is generally strictly increasing, so it won't take either of these forms. The optimum will always be at the far end of the function.

But no, I see the correspondence: Tradeoffs in resource distribution between agents. A tradeoff function dividing resources between two concave agents (U1(x) + U2(h - x), where h is the hoard being divided between them, and 0 ≤ x ≤ h) will produce that sort of concave bulge, with its optimum being a compromise in the middle, while a tradeoff function between two convex agents will have its optima at one or both of the ends.
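A quick numeric illustration (sqrt and square as stand-in concave and convex utilities, my choice of example rather than anything from the post):

```python
import numpy as np

h = 1.0                            # the hoard being divided
x = np.linspace(0.0, h, 1001)      # share given to agent 1

concave_total = np.sqrt(x) + np.sqrt(h - x)   # two concave agents
convex_total = x**2 + (h - x)**2              # two convex agents

print("concave optimum: x =", x[np.argmax(concave_total)])  # 0.5, a compromise
print("convex optimum:  x =", x[np.argmax(convex_total)])   # an endpoint (0 or 1)
```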

Comment by mako yass (MakoYass) on Please Understand · 2024-04-01T22:11:03.295Z · LW · GW

The post seems to assume a future version of generative AI that no longer has the limitations of the current paradigm, which obligate humans to check, understand, and often in some way finely control and intervene in the output, but where that tech is somehow not reliable and independent enough to be applied to ending the world, and where we somehow get this long period in which we get to feel the cultural/pedagogical impacts of this offloading of understanding, where it's worth worrying about, where it's still our problem. That seems contradictory. I really don't buy it.

Comment by mako yass (MakoYass) on All About Concave and Convex Agents · 2024-03-29T04:43:45.115Z · LW · GW

Alternate phrasing: "Oh, you could steal the townhouse at a 1/8 billion probability? How about we make a deal instead. If the rng rolls a number lower than 1/7 billion, I give you the townhouse; otherwise, you deactivate and give us back the world." The convex agent finds that to be a much better deal, accepts, then deactivates.

I guess perhaps it was the holdout who was being unreasonable, in the previous telling.

Comment by mako yass (MakoYass) on All About Concave and Convex Agents · 2024-03-26T22:49:19.855Z · LW · GW

Yeah, to clarify, I'm also not familiar enough with RL to assess exactly how plausible it is that we'll see this compensatory convexity, around today's techniques. For investigating, "Reward shaping" would be a relevant keyword. I hear they do some messy things over there.

But I mention it because there are abstract reasons to expect to see it become a relevant idea in the development of general optimizers, which have to come up with their own reward functions. It also seems relevant in evolutionary learning, where very small advantages over the performance of the previous state of the art equate to a complete victory, so if there are diminishing returns at the top, competition kind of amplifies the stakes, and if an adaptation to this amplification of diminishing returns trickles back into a utility function, you could get a convex agent.

Comment by mako yass (MakoYass) on Do not delete your misaligned AGI. · 2024-03-25T21:49:00.133Z · LW · GW

I see.

My response would be that any specific parameters of the commitment should vary depending on each different AI's preferences and conduct.

Comment by mako yass (MakoYass) on All About Concave and Convex Agents · 2024-03-25T20:01:46.628Z · LW · GW

In what way would Kelly instruct you to be concave?

Comment by mako yass (MakoYass) on All About Concave and Convex Agents · 2024-03-25T05:13:36.345Z · LW · GW

Mm on reflection, the Holdout story glossed over the part where the agent had to trade off risk against time to first intersolar launch (launch had already happened). I guess they're unlikely to make it through that stage.
Accelerating cosmological expansion means that we lose, iirc, 6 stars every day we wait before setting out. The convex AGI knows this, so even in its earliest days it's plotting and trying to find some way to risk it all to get out one second sooner. So I guess what this looks like is it says something totally feverish to its operators to radicalize them as quickly and energetically as possible, messages that'll tend to result in a "what the fuck, this is extremely creepy" reaction 99% of the time.

But I guess I'm still not convinced this is true with such generality that we can stop preparing for that scenario. Situations where you can create an opportunity to gain a lot by risking your life might not be overwhelmingly common, given the inherent tension between those things (usually, safeguarding your life is an instrumental goal), and given that risking your life is difficult to do once you're a lone superintelligence with many replicas.