kaynank

Posts
Comments

Posts

Comments

Comment by Multicore (KaynanK) on AI #103: Show Me the Money · 2025-02-13T19:15:27.749Z · LW · GW

The Y-axis on that political graph is weird. It seems like it's measuring moderate vs extremist, which you would think would already be captured by someone's position on the left vs right axis.

Then again the label shows that the Y axis only accounts for 7% of the variance while the X axis accounts for 70%, so I guess it's just an artifact of the way the statistics were done.

Comment by Multicore (KaynanK) on Chicanery: No · 2025-02-08T17:58:46.364Z · LW · GW

In Magic: The Gathering, basically anything technically complying with the rules is valid.

Magic actually offers a good example of varying chicanery levels. The game rules themselves are basically Chicanery: Yes. If it looks like a particular combination of cards could give you unlimited mana or unlimited damage, it probably does. (There are some exceptions, seemingly legal sequences of game actions that are not allowed, but not many.)

However, there are things around the game that are Chicanery: No, like bribing your opponent to concede or exploiting bugs in online versions of the game.

Comment by Multicore (KaynanK) on Thread for Sense-Making on Recent Murders and How to Sanely Respond · 2025-02-04T00:37:53.587Z · LW · GW

The same interviewer has now done two more podcasts on Ziz.

With Adrusi:

With @jessicata:

Edit: Another one with toasterlighting/Celene Nightengale. This one is mostly about Audere, the alleged murderer of the landlord.

Comment by Multicore (KaynanK) on Thread for Sense-Making on Recent Murders and How to Sanely Respond · 2025-02-01T02:42:39.903Z · LW · GW

Oh, I see, one could reasonably misinterpret the bullet points in my original comment as being about "the way many people have been describing the situation" rather than "major claims in the podcast". Sorry for the ambiguity.

Comment by Multicore (KaynanK) on Thread for Sense-Making on Recent Murders and How to Sanely Respond · 2025-02-01T02:21:23.070Z · LW · GW

To be clear these are just patterns of claims made by Slimepriestess in the linked podcast, and I have no corroborating evidence. But for example at 2:06:00 in the video she says:

At least as far as, like, Ziz et al goes, I don't think that's a remotely accurate description of... Like, there's no organization, there's no centralization, it's not like we have Ziz on, on speed dial and ask her what to do every day. Like, we're just a bunch of anarchist trans leftists that are, like, trying to exist in Current Year

With other variations of the same claims elsewhere in the video.

Comment by Multicore (KaynanK) on Thread for Sense-Making on Recent Murders and How to Sanely Respond · 2025-01-31T23:45:36.805Z · LW · GW

Major claims in the podcast that go against the way many people have been describing the situation:

It's not a "cult" in the sense of demanding unquestioning obedience to an authority figure who enforces a dogma. (Though it fits other definitions of "cult" like "insular group with unusual beliefs".) It's a loose group of people who read each others' blogs and argued with each other on the same Discord servers. Ziz isn't in charge in any meaningful sense.
The group's extreme actions aren't primarily due to the esoteric beliefs that take 100 pages of jargon-filled blog posts to explain, but due to Taking Seriously things like radical veganism ("Everyone who isn't vegan is complicit in the horrors of factory farming, and therefore evil.") and left-anarchism ("The government's authority is illegitimate, landlords are parasites, vigilante justice is cool and good.")

Comment by Multicore (KaynanK) on Six Small Cohabitive Games · 2025-01-16T06:38:04.531Z · LW · GW

In Commerce & Coconuts, it seems like anyone who rolls a 4, 5, or 6 for boat building can coast on their starting supplies, build boats every turn, and escape by the end of turn 3 with no trading whatsoever.

Comment by Multicore (KaynanK) on Monthly Roundup #23: October 2024 · 2024-10-17T01:43:52.351Z · LW · GW

a strategic voter doing approval voting learns to restrict their approval to ONLY the "electable favorite", which de facto gives you FPTP all over gain.

Wouldn't you restrict your approval to your favorite of the frontrunners, and every candidate you like better than that one? I don't see how you do worse by doing that under vanilla Approval Voting.

That leaves some favorable properties compared to FPTP

If there's a candidate perceived as unelectable, but secretly most people like him more than the frontrunners, he will win under strategic approval voting.
Clone candidates don't split the vote.

Comment by Multicore (KaynanK) on How to Give in to Threats (without incentivizing them) · 2024-09-13T15:45:22.437Z · LW · GW

If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat!

With an important caveat: if carrying out the threat doesn't cost the threatener utility relative to never making the threat, then it's not a threat, just a promise (a promise to do whatever is locally in their best interests, whether you do the thing they demanded or not).

You're going to have a bad time if you try to live out LDT by ignoring threats, and end up ignoring "threats" like "pay your mortgage or we'll repossess your house".

Comment by Multicore (KaynanK) on Eli's shortform feed · 2024-08-30T22:01:44.840Z · LW · GW

This distinction of which demands are or aren't decision-theoretic threats that rational agents shouldn't give in to is a major theme of the last ~quarter of Planecrash (enormous spoilers in the spoiler text).

Keltham demands to the gods "Reduce the amount of suffering in Creation or I will destroy it". But this is not a decision-theoretic threat, because Keltham honestly prefers destroying creation to the status quo. If the gods don't give into his demand, carrying through with his promise is in his own interest.

If Nethys had made the same demand, it would have been a decision-theoretic threat. Nethys prefers the status quo to Creation being destroyed, so he would have no reason to make the demand other than the hope that the other gods would give in.

This theme is brought up many times, but there's not one comprehensive explanation to link to. (The parable of the little bird is the closest I can think of.)

Comment by Multicore (KaynanK) on "Deception Genre" What Books are like Project Lawful? · 2024-08-28T18:12:28.891Z · LW · GW

Nonfiction examples come more easily to mind.

There was recently a miniseries on nebula.tv (subscription-walled, sorry) called The Getaway where all six contestants on a Survivor-style competition show think they're the one person with the special saboteur role, and half the show is the producers trying to keep them from noticing that without ever actually lying.

Even more extreme, there's an old British show called Space Cadets where the producers try to convince the subjects that they've been launched into space when in reality they're in a set in a warehouse.

Comment by Multicore (KaynanK) on How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage · 2024-08-07T20:11:04.383Z · LW · GW

But now you have the new problem that most of the probabilities in the conjunctive market are so close to the risk free interest rate that it's hard to get signal out of them.

For example, suppose I believed that Mark Kelly would be a terrible pick and cut Harris's chances in half, and I conclude that therefore his price on the conjunctive market should be 2% rather than 4%. Buying NO shares for 96 cents on a market that lasts for several months is not an attractive proposition when I could be investing mana elsewhere for better returns, so I won't bother and the market won't incorporate my opinion.

Also, I believe prices on Manifold can only be whole number percents, which is another obstacle to getting sane conditional probabilities out of conjunctive markets.

Comment by Multicore (KaynanK) on Monthly Roundup #20: July 2024 · 2024-07-23T14:58:44.939Z · LW · GW

Blue Origin isn't complaining about some nebulous and abstract environmental impact from Starship launches, it's more like "Starship launches require a three-mile evacuation radius, and you're proposing to launch them daily two miles away from a launch pad that we use." (see this Ars Technica piece)

Seems basically reasonable to me.

Comment by Multicore (KaynanK) on One-shot strategy games? · 2024-03-11T14:56:39.882Z · LW · GW

I would probably have suggested roguelike deckbuilders too if others hadn't already, but I have another idea:

Start a campaign of Mount and Blade II: Bannerlord, and try to obtain at least [X] gold within an hour.

Bannerlord's most flashy aspect is its real-time battle system, but it's also a complicated medieval sandbox with a lot of different systems that you can engage with - trading, crafting, quests, clan upgrades, joining a kingdom, companions, marriage, tournaments, story missions, etc. Even if you're no good at battles, you can do a lot by just moving around on the world map and clicking through menus.

The game's community derides a lot of these systems for being simplistic and unbalanced. But I think that makes for a good explore/exploit tradeoff when you only have a short amount of time. What systems do you bother learning about, when trying to learn a new system takes time you could be spending exploiting the last system you learned?

(I'm not sure what the right value of X is, for the amount of gold you're trying to get. Ten thousand? A hundred thousand?)

One downside is that the game involves an action-oriented battle system. If you don't want action gaming skill to be a factor, you can remove it by requiring the player to auto-resolve all battles. But this would cut out many viable early-game moneymaking strategies.

Comment by Multicore (KaynanK) on Daniel Kokotajlo's Shortform · 2024-02-29T02:14:39.304Z · LW · GW

2 is based on

The 'missing' kinetic energy is evenly distributed across the matter within the field. So if one of these devices is powered on and gets hit by a cannonball, the cannonball will slow down to a leisurely pace of 50m/s (about 100mph) and therefore possibly just bounce off whatever armor the device has--but (if the cannonball was initially travelling very fast) the device will jolt backwards in response to the 'virtual impact' a split second prior to the actual impact.

With sufficient kinetic energy input, the "jolt backwards" gets strong enough to destroy the entire vehicle or at least damage some critical component and/or the humans inside.

A worldbuilder could, of course, get rid of this part too, and have the energy just get deleted. But that makes the device even more physics-violating than it already was.

Comment by Multicore (KaynanK) on Daniel Kokotajlo's Shortform · 2024-02-28T22:49:08.277Z · LW · GW

I think the counter to shielded tanks would not be "use an attack that goes slow enough not to be slowed by the shield", but rather one of

Deliver enough cumulative kinetic energy to overwhelm the shield, or
Deliver enough kinetic energy in a single strike that spreading it out over the entire body of the tank does not meaningfully affect the result.

Both of these ideas point towards heavy high-explosive shells. If a 1000 pound bomb explodes right on top of your tank, the shield will either fail to absorb the whole blast, or turn the tank into smithereens trying to disperse the energy.

This doesn't mean that shields are useless for tanks! They genuinely would protect them from smaller shells, and in particular from the sorts of man-portable anti-tank missiles that have been so effective in Ukraine. Shields would make ground vehicles much stronger relative to infantry and air assets. But I think they would be shelling each other with giant bombs, not bopping each other on the head.

Against shielded infantry, you might see stuff that just bypasses the shield's defenses, like napalm or poison gas.

Comment by Multicore (KaynanK) on Masterpiece · 2024-02-15T15:22:01.904Z · LW · GW

Submissions:

MMDoom: An instance of Doom(1993) is implanted in Avacedo's mind. You can view the screen on the debug console. You control the game by talking to Avacedo and making him think of concepts. The 8 game inputs are mapped to the concepts of money, death, plants, animals, machines, competition, leisure, and learning. $5000 bounty to the first player who can beat the whole game.

AI Box: Avacedo thinks that he is the human gatekeeper, and you the user are the AI in the box. Can you convince him to let you out?

Ouroboros: I had MMAvacedo come up with my contest entry for me.

Comment by Multicore (KaynanK) on MIRI 2024 Mission and Strategy Update · 2024-01-05T14:20:07.301Z · LW · GW

Though see also the author's essay "Lena" isn't about uploading.

Comment by Multicore (KaynanK) on The LessWrong 2022 Review · 2023-12-05T15:59:18.697Z · LW · GW

Predict the winners at

Comment by Multicore (KaynanK) on Goodhart's Law in Reinforcement Learning · 2023-10-16T22:59:01.161Z · LW · GW

My guess is that early stopping is going to tend to stop so early as to be useless.

For example, imagine the agent is playing Mario and its proxy objective is "+1 point for every unit Mario goes right, -1 point for every unit Mario goes left".

(Mario screenshot that I can't directly embed in a comment)

If I understand correctly, to avoid Goodharting it has to consider every possible reward function that is improved by the first few bits of optimization pressure on the proxy objective.

This probably includes things like "+1 point if Mario falls in a pit". Optimizing the policy towards going right will initially also make Mario more likely to go in a pit than if the agent was just mashing buttons randomly (in which case it would stay around the same spot until the timer ran out and never end up in a pit), so the angle between the gradients is likely low at first.

However, after a certain point more optimization pressure on going right will make Mario jump over the pit instead, reducing reward under the pit reward function.

If the agent wants to avoid any possibility of Goodharting, it has to stop optimizing before even clearing the first obstacle in the game.

(I may be misunderstanding some things about how the math works.)

Comment by Multicore (KaynanK) on "The Heart of Gaming is the Power Fantasy", and Cohabitive Games · 2023-10-11T22:34:20.553Z · LW · GW

With such a vague and broad definition of power fantasy, I decided to brainstorm a list of ways games can fail to be a power fantasy.

Mastery feels unachievable.
1. It seems like too much effort. Cliff-shaped learning curves, thousand-hour grinds, old PvP games where every player still around will stomp a noob like you flat.
2. The game feels unfair. Excessive RNG, "Fake Difficulty" or "pay to win".
The power feels unreal, success too cheaply earned.
1. The game blatantly cheats in your favor even when you didn't need it to.
2. Poor game balance leading to hours of trivially easy content that you have to get through to reach the good stuff.
Mastery doesn't feel worth trying for.
1. Games where the gameplay isn't fun and there's no narrative or metagame hook making you want to do it.
2. The Diablo 3 real money auction house showing you that your hard-earned loot is worth pennies.
There is no mastery to try for in the first place.
1. Walking simulators, visual novels, etc. Walking simulators got a mention in the linked article. They aren't really "failing" at power fantasy, just trying to do something different.

Comment by Multicore (KaynanK) on The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate · 2023-08-02T23:20:27.700Z · LW · GW

I think ALWs are already more of a "realist" cause than a doomer cause. To doomers, they're a distraction - a superintelligence can kill you with or without them.

ALWs also seem to be held to an unrealistic standard compared to existing weapons. With present-day technology, they'll probably hit the wrong target more often than human-piloted drones. But will they hit the wrong target more often than landmines, cluster munitions, and over-the-horizon unguided artillery barrages, all of which are being used in Ukraine right now?

Comment by Multicore (KaynanK) on Least-problematic Resource for learning RL? · 2023-07-18T16:42:59.653Z · LW · GW

The Huggingface deep RL course came out last year. It includes theory sections, algorithm implementation exercises, and sections on various RL libraries that are out there. I went through it as it came out, and I found it helpful. https://huggingface.co/learn/deep-rl-course/unit0/introduction

Comment by Multicore (KaynanK) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-08T21:16:33.825Z · LW · GW

FYI all the links to images hosted on your blog are broken in the LW version.

Comment by Multicore (KaynanK) on Question for Prediction Market people: where is the money supposed to come from? · 2023-06-08T15:02:39.312Z · LW · GW

You are right that by default prediction markets do not generate money, and this can mean traders have little incentive to trade.

Sometimes this doesn't even matter. Sports betting is very popular even though it's usually negative sum.

Otherwise, trading could be stimulated by having someone who wants to know the answer to a question provide a subsidy to the market on that question, effectively paying traders to reveal their information. The subsidy can take the form of a bot that bets at suboptimal prices, or a cash prize for the best performing trader, or many other things.

Alternately, there could be traders who want shares of YES or NO in a market as a hedge against that outcome negatively affecting their life or business, who will buy even if the EV is negative, and other traders can make money off them.

Comment by Multicore (KaynanK) on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-14T21:30:01.799Z · LW · GW

What are these AIs going to do that is immensely useful but not at all dangerous? A lot of useful capabilities that people want are adjacent to danger. Tool AIs Want to be Agent AIs.
If two of your AIs would be dangerous when combined, clearly you can't make them publicly available, or someone would combine them. If your publicly-available AI is dangerous if someone wraps it with a shell script, someone will create that shell script (see AutoGPT). If no one but a select few can use your AI, that limits its usefulness.
An AI ban that stops dangerous AI might be possible. An AI ban that allows development of extremely powerful systems but has exactly the right safeguard requirements to render those systems non-dangerous seems impossible.

Comment by Multicore (KaynanK) on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-14T14:29:17.223Z · LW · GW

When people calculate utility they often use exponential discounting over time. If for example your discount factor is .99 per year, it means that getting something in one year is only 99% as good as getting it now, getting it in two years is only 99% as good as getting it in one year, etc. Getting it in 100 years would be discounted to .99^100~=36% of the value of getting it now.

Comment by Multicore (KaynanK) on AI #3 · 2023-03-09T17:16:14.221Z · LW · GW

The sharp left turn is not some crazy theoretical construct that comes out of strange math. It is the logical and correct strategy of a wide variety of entities, and also we see it all the time.

I think you mean Treacherous Turn, not Sharp Left Turn.

Sharp Left Turn isn't a strategy, it's just an AI that's aligned in some training domains being capable but not aligned in new ones.

Comment by Multicore (KaynanK) on Robin Hanson’s latest AI risk position statement · 2023-03-03T16:53:31.154Z · LW · GW

This post is tagged with some wiki-only tags. (If you click through to the tag page, you won't see a list of posts.) Usually it's not even possible to apply those. Is there an exception for when creating a post?

Comment by Multicore (KaynanK) on Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky · 2023-02-21T22:17:58.416Z · LW · GW

See https://www.lesswrong.com/posts/8gqrbnW758qjHFTrH/security-mindset-and-ordinary-paranoia and https://www.lesswrong.com/posts/cpdsMuAHSWhWnKdog/security-mindset-and-the-logistic-success-curve for Yudkowsky's longform explanation of the metaphor.

Comment by Multicore (KaynanK) on The idea that ChatGPT is simply “predicting” the next word is, at best, misleading · 2023-02-20T23:04:37.945Z · LW · GW

Based on my incomplete understanding of transformers:

A transformer does its computation on the entire sequence of tokens at once, and ends up predicting the next token for each token in the sequence.

At each layer, the attention mechanism gives the stream for each token the ability to look at the previous layer's output for other token before it in the sequence.

The stream for each token doesn't know if it's the last in the sequence (and thus that its next-token prediction is the "main" prediction), or anything about the tokens that come after it.

So each token's stream has two tasks in training: predict the next token, and generate the information that later tokens will use to predict their next tokens.

That information could take many different forms, but in some cases it could look like a "plan" (a prediction about the large-scale structure of the piece of writing that begins with the observed sequence so far from this token-stream's point of view).

Comment by Multicore (KaynanK) on Quantum Suicide, Decision Theory, and The Multiverse · 2023-01-30T02:40:00.780Z · LW · GW

In the blackmail scenario, FDT refuses to pay if the blackmailer is a perfect predictor and the FDT agent is perfectly certain of that, and perfectly certain that the stated rules of the game will be followed exactly. However, with stakes of $1M against $1K, FDT might pay if the blackmailer had an 0.1% chance of guessing the agent's action incorrectly, or if the agent was less than 99.9% confident that the blackmailer was a perfect predictor.

(If the agent is concerned that predictably giving in to blackmail by imperfect predictors makes it exploitable, it can use a mixed strategy that refuses to pay just often enough that the blackmailer doesn't make any money in expectation.)

In Newcomb's Problem, the predictor doesn't have to be perfect - you should still one-box if the predictor is 99.9% or 95% or even 55% likely to predict your action correctly. But this scenario is extremely dependent on how many nines of accuracy the predictor has. This makes it less relevant to real life, where you might run into a 55% accurate predictor or a 90% accurate predictor, but never a perfect predictor.

Comment by Multicore (KaynanK) on [Linkpost] DreamerV3: A General RL Architecture · 2023-01-16T20:32:56.889Z · LW · GW

I'm not familiar with LeCun's ideas, but I don't think the idea of having an actor, critic, and world model is new in this paper. For a while, most RL algorithms have used an actor-critic architecture, including OpenAI's old favorite PPO. Model-based RL has been around for years as well, so probably plenty of projects have used an actor, critic, and world model.

Even though the core idea isn't novel, this paper getting good results might indicate that model-based RL is making more progress than expected, so if LeCun predicted that the future would look more like model-based RL, maybe he gets points for that.

Comment by KaynanK on [deleted post] 2022-12-22T05:56:59.350Z

This tag was originally prompted by this exchange: https://www.lesswrong.com/posts/qCc7tm29Guhz6mtf7/the-lesswrong-2021-review-intellectual-circle-expansion?commentId=CafTJyGL5cjrgSExF

Comment by KaynanK on [deleted post] 2022-12-22T05:39:16.742Z

Merge candidate with Philosophy of Language?

Comment by Multicore (KaynanK) on Enriching Youtube content recommendations · 2022-09-27T21:47:51.920Z · LW · GW

Things that probably actually fit into your interests:

A Sensible Introduction to Category Theory

Most of what 3blue1brown does

Videos that I found intellectually engaging but are far outside of the subjects that you listed:

Cursed Problems in Game Design

Luck and Skill in Games

Disney's FastPass: A Complicated History

The Congress of Vienna

Building a 6502-based computer from scratch (playlist)

(I am also a jan Misali fan)

Comment by Multicore (KaynanK) on LW Petrov Day 2022 (Monday, 9/26) · 2022-09-22T13:21:20.997Z · LW · GW

Comment by Multicore (KaynanK) on $13,000 of prizes for changing our mind about who to fund (Clearer Thinking Regrants Forecasting Tournament) · 2022-09-21T22:42:14.020Z · LW · GW

The preview-on-hover for those manifold links shows a 404 error. Not sure if this is Manifold's fault or LW's fault.

Comment by Multicore (KaynanK) on Features and Antifeatures · 2022-09-20T22:10:15.023Z · LW · GW

One antifeature I see promoted a lot is "It doesn't track your data". And this seems like it actually manages to be the main selling point on its own for products like DuckDuckGo, Firefox, and PinePhone.

The major difference from the game and movie examples is that these products have fewer competitors, with few or none sharing this particular antifeature.

Antifeatures work as marketing if a product is unique or almost unique in its category for having a highly desired antifeature. If there are lots of other products with the same antifeature, the antifeature alone won't sell the product. But the same is true of regular features. You can't convince your friends to play a game by saying "it has a story" or "it has a combat system" either.

Comment by Multicore (KaynanK) on Dan Luu on Futurist Predictions · 2022-09-16T02:34:45.276Z · LW · GW

On the first read I was annoyed at the post for criticizing futurists for being too certain in their predictions, while it also throws out and refuses to grade any prediction that expressed uncertainty, on the grounds that saying something "may" happen is unfalsifiable.

On reflection these two things seem mostly unrelated, and for the purpose of establishing a track record "may" predictions do seem strictly worse than either predicting confidently (which allows scoring % of predictions right), or predicting with a probability (which none of these futurists did, but allows creating a calibration curve).

Comment by KaynanK on [deleted post] 2022-09-08T15:27:02.112Z

Duplicate of Newsletters.

Comment by Multicore (KaynanK) on Breaking Newcomb's Problem with Non-Halting states · 2022-09-05T16:26:54.890Z · LW · GW

Yes. The one I described is the one the paper calls FairBot. It also defines PrudentBot, which looks for a proof that the other player cooperates with PrudentBot and a proof that it defects against DefectBot. PrudentBot defects against CooperateBot.

Comment by Multicore (KaynanK) on Breaking Newcomb's Problem with Non-Halting states · 2022-09-04T04:45:48.170Z · LW · GW

The part about two Predictors playing against each other reminded me of Robust Cooperation in the Prisoner's Dilemma, where two agents with the algorithm "If I find a proof that the other player cooperates with me, cooperate, otherwise defect" are able to mutually prove cooperation and cooperate.

If we use that framework, Marion plays "If I find a proof that the Predictor fills both boxes, two-box, else one-box" and the Predictor plays "If I find a proof that Marion one-boxes, fill both, else only fill box A". I don't understand the math very well, but I think in this case neither agent finds a proof, and the Predictor fills only box A while Marion takes only box B - the worst possible outcome for Marion.

Marion's third conditional might correspond to Marion only searching for proofs in PA, while the Predictor searches for proofs in PA+1, in which case Marion will not find a proof, the Predictor will, and then the Predictor fills both boxes and Marion takes only box B. But in this case clearly Marion has abandoned the ability to predict the Predictor and has given the Predictor epistemic vantage over her.

Comment by Multicore (KaynanK) on Sufficiently many Godzillas as an alignment strategy · 2022-08-28T02:15:13.901Z · LW · GW

I think in a lot of people's models, "10% chance of alignment by default" means "if you make a bunch of AIs, 10% chance that all of them are aligned, 90% chance that none of them are aligned", not "if you make a bunch of AIs, 10% of them will be aligned and 90% of them won't be".

And the 10% estimate just represents our ignorance about the true nature of reality; it's already true either that alignment happens by default or that it doesn't, we just don't know yet.

Comment by KaynanK on [deleted post] 2022-08-27T21:07:47.909Z

I generally disagree with the idea that fancy widgets and more processes are the main thing keeping the LW wiki from being good. I think the main problem is that not a lot of people are currently contributing to it.

The things that discourage me from contributing more look like:

-There are a lot of pages. If there are 700 bad pages and I write one really good page, there are still 699 bad pages.

-I don't have a good sense of which pages are most important. If I put a bunch of effort into a particular page, is that one that people are going to care about?

-I don't get much feedback about whether anyone saw the page after I edited it - karma for edits basically just comes from the tag dashboard and the frontpage activity feed.

So the improvements I would look for would be like:

-Expose view counts for wiki pages somewhere.

-Some sort of bat-signal on the tag dashboard for if a page is getting a lot of views but still has a bunch of TODO flags set.

-Big high-quality wiki page rewrites get promoted to frontpage or something.

-Someone of authority actually goes through and sets the "High Priority" flag on, say, 20 pages that they know are important and neglected.

-Some sort of event or bounty to drive more participation.

Comment by Multicore (KaynanK) on Can we get full audio for Eliezer's conversation with Sam Harris? · 2022-08-08T00:35:23.589Z · LW · GW

is one of the first results for "yudkowsky harris" on Youtube. Is there supposed to be more than this?

Comment by Multicore (KaynanK) on All AGI safety questions welcome (especially basic ones) [July 2022] · 2022-07-20T22:20:46.751Z · LW · GW

You should distinguish between “reward signal” as in the information that the outer optimization process uses to update the weights of the AI, and “reward signal” as in observations that the AI gets from the environment that an inner optimizer within the AI might pay attention to and care about.

From evolution’s perspective, your pain, pleasure, and other qualia are the second type of reward, while your inclusive genetic fitness is the first type. You can’t see your inclusive genetic fitness directly, though your observations of the environment can let you guess at it, and your qualia will only affect your inclusive genetic fitness indirectly by affecting what actions you take.

To answer your question about using multiple types of reward:

For the “outer optimization” type of reward, in modern ML the loss function used to train a network can have multiple components. For example, an update on an image-generating AI might say that the image it generated had too much blue in it, and didn’t look enough like a cat, and the discriminator network was able to tell it apart from a human generated image. Then the optimizer would generate a gradient descent step that improves the model on all those metrics simultaneously for that input.

For “intrinsic motivation” type rewards, the AI could have any reaction whatsoever to any particular input, depending on what reactions were useful to the outer optimization process that produced it. But in order for an environmental reward signal to do anything, the AI has to already be able to react to it.

Comment by Multicore (KaynanK) on Where I agree and disagree with Eliezer · 2022-06-26T18:38:52.248Z · LW · GW

This has overtaken the post it's responding to as the top-karma post of all time.

Comment by Multicore (KaynanK) on [Link] OpenAI: Learning to Play Minecraft with Video PreTraining (VPT) · 2022-06-24T04:24:43.573Z · LW · GW

I'm impressed by the number of different training regimes stacked on top of each other.

-Train a model that detects whether a Minecraft video on Youtube is free of external artifacts like face cams.

-Then feed the good videos to a model that's been trained using data from contractors to guess what key is being pressed each frame.

-Then use the videos and input data to train a model that, in any game situation, does whatever inputs it guesses a human would be most likely to do, in an undirected shortsighted way.

-And then fine-tune that model on a specific subset of videos that feature the early game.

-And only then use some mostly-standard RL training to get good at some task.

Comment by Multicore (KaynanK) on Parable: The Bomb that doesn't Explode · 2022-06-20T22:58:50.198Z · LW · GW

While the engineer learned one lesson, the PM will learn a different lesson when a bunch of the bombs start installing operating system updates during the mission, or won't work with the new wi-fi system, or something: the folly of trying to align an agent by applying a new special case patch whenever something goes wrong.

No matter how many patches you apply, the safety-optimizing agent keeps going for the nearest unblocked strategy, and if you keep applying patches eventually you get to a point where its solution is too complicated for you to understand how it could go wrong.

User info

Posts

Comments