Posts

Piling bounded arguments 2024-09-19T22:27:41.534Z
What criterion would you use to select companies likely to cause AI doom? 2023-07-13T20:31:31.152Z
Cheat sheet of AI X-risk 2023-06-29T04:28:32.292Z
Was Eliezer Yudkowsky right to give himself 10% to succeed with HPMoR in 2010? 2022-06-14T07:00:29.955Z
Do you like excessive sugar? 2021-10-09T10:40:29.942Z
How can there be a godless moral world ? 2021-06-21T12:34:13.770Z

Comments

Comment by momom2 (amaury-lorin) on When to join a respectability cascade · 2024-09-25T13:50:51.622Z · LW · GW

I agree with the broad idea, but I'm going to need a better implementation.
In particular, the 5 criteria you give are insufficient because the example you give scores well on them, and is still atrocious: if we decreed that "black people" was unacceptable and should be replaced by "black peoples", it would cause a lot of confusion on account of how similar the two terms are and how ineffective the change is.

The cascade happens because of a specific reason, and the change aims at resolving that reason. For example, "Jap" is used as a slur, and not saying it shows you don't mean to use a slur. For black people/s, I guess the reason would be something like not implying that there is a single black people, which only makes sense in the context of a specialized discussion.

I can't adhere to the criteria you proposed because they don't work, and I don't want to bother thinking that deeply about every change of term on an everyday basis, so I'll keep on using intuition to choose when to solve respectability cascades for now.
For deciding when to trigger a respectability cascade, your criteria are interesting as an attempt at a principled approach, but I'm still not sure they outperform unconstrained discussion on the subject (which I assume is the default alternative for anyone who cares enough about deliberately triggering respectability cascades to have read your post in the first place).

Comment by momom2 (amaury-lorin) on Why the 2024 election matters, the AI risk case for Harris, & what you can do to help · 2024-09-25T13:12:43.874Z · LW · GW
  • Probability of existential catastrophe before 2032 assuming AGI arrives in that period and Harris wins[12] = 30%

  • Probability of existential catastrophe before 2032 assuming AGI arrives in that period and Trump wins[13] = 35%.

A lot of your AI-risk reason to support Harris seems to hinge on this, which I find very shaky. How wide are your confidence intervals here?
My own guesses are much fuzzier. According to your argument, if my intuition were .2 vs .5, it would be an overwhelming case for Harris, but I'm unfamiliar enough with the topic that it could easily be the reverse.
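To make that sensitivity concrete, here's a rough sketch (my own framing, not the post's exact model; the P(AGI before 2032) value is a placeholder I made up):

```python
# Rough sketch: how much the case for Harris swings with the assumed numbers.
# p_agi is a made-up placeholder for P(AGI arrives before 2032).
def risk_reduction(p_doom_trump, p_doom_harris, p_agi=0.5):
    """Expected reduction in existential risk from a Harris win."""
    return p_agi * (p_doom_trump - p_doom_harris)

print(risk_reduction(0.35, 0.30))  # the post's numbers  -> 0.025
print(risk_reduction(0.50, 0.20))  # my fuzzy intuition  -> 0.15 (a 6x stronger case)
print(risk_reduction(0.20, 0.50))  # ...or the reverse   -> -0.15 (the case flips)
```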

I would greatly appreciate more details on how you reached your numbers (and if they're vibes, your reasoning about whether to trust those vibes).
Alternatively, I feel like I should somehow discount the strength of the AI-risk reason based on how likely I think these numbers are to more or less hold true, but I don't know a principled way to do it.

Comment by momom2 (amaury-lorin) on No One Can Exempt You From Rationality's Laws · 2024-09-18T11:29:44.180Z · LW · GW

Seems like you need to go beyond arguments from authority and stating your conclusions, and instead go down to the object-level disagreements. You could say instead "Your argument for ~X is invalid because blah blah", and if Jacob says "Your argument for the invalidity of my argument for ~X is invalid because blah blah", then it's better than before, because it's easier to evaluate argument validity than ground truth.
(And if that process continues ad infinitum, consider that someone who cannot evaluate the validity of the simplest arguments is not worth arguing with.)

Comment by momom2 (amaury-lorin) on The Mountain Troll · 2024-09-18T10:43:26.889Z · LW · GW

It's thought-provoking.
Many people here identify as Bayesians, but are as confused as Saundra by the troll's questions, which indicates that they're missing something important.

Comment by momom2 (amaury-lorin) on Secret Collusion: Will We Know When to Unplug AI? · 2024-09-16T19:19:38.688Z · LW · GW
Comment by momom2 (amaury-lorin) on No Safe Defense, Not Even Science · 2024-09-12T21:34:49.605Z · LW · GW

It wasn't mine. I did grow up in a religious family, but becoming a rationalist happened gradually, without a sharp divide with my social network. I always figured people around me were making all sorts of logical mistakes, though, and noticed deep flaws in what I was taught very early on.

Comment by momom2 (amaury-lorin) on KAN: Kolmogorov-Arnold Networks · 2024-09-12T09:25:31.545Z · LW · GW

It's not. The paper is hype, the authors don't actually show that this could replace MLPs.

Comment by momom2 (amaury-lorin) on Survey: How Do Elite Chinese Students Feel About the Risks of AI? · 2024-09-05T23:20:14.537Z · LW · GW

This is very interesting!
I did not expect that Chinese students would be more optimistic about the benefits than worried about the risks, or that they would rank AI so low as an existential risk.
This is in contrast with posts I see on social media and articles showcasing safety institutes and discussing doomer opinions, which gave me the impression that Chinese academia was generally more concerned than the US about AI risk, and especially existential risk.

I'm not sure how to reconcile this survey's results with my previous model. Was I just wrong and updating too much on anecdotal evidence?
How representative of policymakers and of influential scientists do you think these results are?

Comment by momom2 (amaury-lorin) on Talking Snakes: A Cautionary Tale · 2024-09-02T15:44:01.757Z · LW · GW

About the Christians around me: it is not explicitly considered rude, but it is a signal that you want to challenge their worldview, and if you are going to predictably ask that kind of question often, you won't be welcome in open discussions.
(You could do it once or twice for anecdotal evidence, but if you actually want to know whether many Christians believe in a literal snake, you'll have to do a survey.)

Comment by momom2 (amaury-lorin) on Solving adversarial attacks in computer vision as a baby version of general AI alignment · 2024-08-29T22:10:28.191Z · LW · GW

I disagree – I think that no such perturbations exist in general, rather than that we have simply not had any luck finding them.

I have seen one such perturbation. It was two images of two people, one of which was clearly male and the other female, though in 15 seconds of looking I wasn't able to tell any significant difference between the two images except for a slight difference in hue.
Unfortunately, I can't find this example again after a 10-minute search. It was shared on Discord; the people in the images were white and freckled. I'll save it if I find it again.

Comment by momom2 (amaury-lorin) on But There's Still A Chance, Right? · 2024-08-01T20:27:59.575Z · LW · GW

The pyramids in Mexico and the pyramids in Egypt are related via architectural constraints and human psychology.

Comment by momom2 (amaury-lorin) on But There's Still A Chance, Right? · 2024-08-01T20:20:27.807Z · LW · GW

In practice, when people say "one in a million" in that kind of context, it's much higher than that. I haven't watched Dumb and Dumber, but I'd be surprised if Lloyd did not, actually, have a decent chance of ending up with Mary.

On one hand, we claim [dumb stuff using made up impossible numbers](https://www.lesswrong.com/posts/GrtbTAPfkJa4D6jjH/confidence-levels-inside-and-outside-an-argument) and on the other hand, we dismiss those numbers and fall back on there's-a-chancism.
These two phenomena don't always perfectly compensate for one another (as examples in both posts show), but common sense is more reliable than it may seem at first. (I'm not saying it's the correct approach, though.)

Comment by momom2 (amaury-lorin) on Is objective morality self-defeating? · 2024-07-30T18:52:20.749Z · LW · GW

Epistemic status: amateur, personal intuitions.

If this were the case, it makes sense to hold dogs (rather than their owners, or their breeding) responsible for aggressive or violent behaviour.

I'd consider whether punishing the dog would make the world better, or whether changing the system that led to its breeding, or providing incentives to the owner or any combination of other actions would be most effective.

Consequentialism is about considering the consequences of actions to judge them, but various people might wield this in various ways. 
Implicitly, with this concept of responsibility, you're considering a deontological approach to bad behavior: punish the guilty (perhaps using consequentialism to determine who's guilty, though that's unclear from your argument, afaict).

In an idealized case, I care about whether the environment I live in (including other people's and other people's dogs' actions) is performing well only insofar as I can change it; put otherwise, I care only about how I can perform better.

(Then again, the world is messy, and I need to account for coordination with other people whose intuitions might not match mine, society's recommendations, my own human impulses, etc. My moral system is only an intuition pump, for lack of a satisfactory metaethics.)

Comment by momom2 (amaury-lorin) on Towards more cooperative AI safety strategies · 2024-07-18T21:45:20.112Z · LW · GW

I can imagine plausible mechanisms for how the first four backlash examples were a consequence of perceived power-seeking from AI safetyists, but I don't see one for e/acc. Does someone have one?

Alternatively, what reason do I have to expect that there is a causal relationship between safetyist power-seeking and e/acc even if I can't see one?

Comment by momom2 (amaury-lorin) on Every Cause Wants To Be A Cult · 2024-07-13T12:11:42.295Z · LW · GW

That's not interesting to read unless you say what your reasons are and how they differ from other critics'. Perhaps not all of it in a comment, but at least link to a post.

Comment by momom2 (amaury-lorin) on Proving Too Much · 2024-07-09T13:45:37.584Z · LW · GW

Interestingly, I think that one of the examples of proving too much on Wikipedia can itself be demolished by a proving too much argument, but I’m not going to say which one it is because I want to see if other people independently come to the same conclusion.

For those interested in the puzzle, here is the page Scott was linking to at the time: https://en.wikipedia.org/w/index.php?title=Proving_too_much&oldid=542064614
The article was edited a few hours later, and subsequent conversation showed that Wikipedia editors came to the conclusion Scott hinted at, though the suspicious timing indicates that they probably did so on reading Scott's article rather than independently.

Comment by momom2 (amaury-lorin) on Proving Too Much · 2024-07-09T13:31:04.906Z · LW · GW

Another way to avoid the mistake is to notice that the implication is false, regardless of the premises. 
In practice, people's beliefs are not deductively closed, and (in the context of a natural language argument) we treat propositional formulas as tools for computing truths rather than as timeless statements.

Comment by momom2 (amaury-lorin) on Proving Too Much · 2024-07-09T13:26:42.897Z · LW · GW

it can double as a method for creating jelly donuts on demand

For those reading this years later, here's the comic that shows how to make ontologically necessary donuts.

Comment by momom2 (amaury-lorin) on No really, the Sticker Shortcut fallacy is indeed a fallacy · 2024-06-22T10:03:22.549Z · LW · GW

I'd appreciate examples of the sticker shortcut fallacy with in-depth analysis of why they're wrong and how the information should have been communicated instead.

Comment by momom2 (amaury-lorin) on OpenAI #8: The Right to Warn · 2024-06-19T15:49:28.579Z · LW · GW

"Anyone thinks they're a reckless idiot" is far too easy a bar to reach for any public figure.
I don't know of any major anti-Altman current in my country, but surveys consistently show a majority of people worried about AI risk, so a normal distribution of opinion extremeness guarantees there will be many who do consider Sam Altman a reckless idiot (for good or bad reasons; I expect most of them would attribute to him any negative trait that comes to their attention, because it is just that easy for a large portion of the population to hold a narrow, hateful opinion on a subject).

Comment by momom2 (amaury-lorin) on Boycott OpenAI · 2024-06-19T14:54:46.231Z · LW · GW

I have cancelled my subscription as well. I don't have much to add to the discussion, but I think signalling participation in the boycott will help conditional on the boycott having positive value.

Comment by momom2 (amaury-lorin) on Boycott OpenAI · 2024-06-19T14:53:36.033Z · LW · GW

Thanks for the information.
Consider, though, that for many people the price of the subscription is justified by the convenience of access and use.

It took me a second to see how your comment relates to the post, so here it is for others:
Given this information, using the API preserves most of the benefits of access to SOTA AI (assuming away the convenience value) while destroying most of the value for OpenAI, which makes this a very effective intervention compared to cancelling the subscription entirely.
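For reference, a minimal sketch of what the switch looks like with the official `openai` Python package (assuming you have an API key set in your environment; the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Pay-per-token query instead of a flat monthly subscription.
response = client.chat.completions.create(
    model="gpt-4o",  # example model name; substitute whatever you actually use
    messages=[{"role": "user", "content": "Explain the sunk cost fallacy in two sentences."}],
)
print(response.choices[0].message.content)
```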

Comment by momom2 (amaury-lorin) on LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!") · 2024-06-19T12:05:37.508Z · LW · GW

When I vote, I basically know the full effect this has on what is shown to other users or to myself. 

Mind-blowing moment: it has been a private pet peeve of mine that it was very unclear what policy I should follow when voting.

In practice, I vote mostly on vibes (and expect most people to), but given my own practices for browsing LW, I also considered alternative approaches.
- Voting in order to assign a specific score (weighted for inflation by time and author) to the post; a sketch of this approach follows the list. Related uses: comparing karma of articles, finding desirable articles on a given topic.
- Voting in order to match an equivalent-value article. Related uses: same; perhaps effective as a community norm but more effortful.
- Voting up if the article is good, down if it's bad (after memetic/community/bias considerations) (regardless of current karma). Related uses: karma as indicator of community opinion.
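Here's the kind of thing I mean by the first, score-targeting approach, as a minimal sketch (`target_score` is a hypothetical stand-in for my adjusted judgment of what a post deserves):

```python
def my_vote(current_karma, target_score):
    """Score-targeting policy: push the post's karma toward the score I think it
    deserves (target_score is assumed to already be adjusted for inflation by
    date and author)."""
    if current_karma < target_score:
        return +1  # upvote
    if current_karma > target_score:
        return -1  # downvote
    return 0       # abstain
```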

In the end, every approach that required extensive calculation turned out to be too much effort to keep my votes consistent, which is why I came back to vibes, amended by implicit considerations of consistent ways to vote.
 

I was trying to figure out ways to vote which would put me in a class of voters that marginally improved my personal browsing experience.
It never occurred to me to model the impact it would have on others and to optimize for their experience.
This sounds like an obviously better way to vote.

So for anyone in the same situation as me: please optimize for others' browsing experience (or your own) directly rather than overcalculating decision-theoretic whatevers.

Comment by momom2 (amaury-lorin) on What Failure Looks Like is not an existential risk (and alignment is not the solution) · 2024-02-03T19:52:03.540Z · LW · GW

Not everything suboptimal, but suboptimal in a way that causes suffering on an astronomical scale (e.g. galactic dystopia, or dystopia that lasts for thousands of years, or dystopia with an extreme number of moral patients (e.g. uploads)).
I'm not sure what you mean by Ord, but I think it's reasonable to have a significant probability of S-risk from a Christiano-like failure.

Comment by momom2 (amaury-lorin) on What Failure Looks Like is not an existential risk (and alignment is not the solution) · 2024-02-02T23:17:00.195Z · LW · GW

I think you're missing one important existential risk separate from extinction, which is having a lastingly suboptimal society: systematic institutional inefficiency, and being unable to change anything because of disempowerment.
In that scenario, maybe humanity is still around because one of the things we can measure and optimize for is making sure a minimum number of humans stay alive, but the living conditions are undesirable.

Comment by momom2 (amaury-lorin) on This might be the last AI Safety Camp · 2024-01-27T22:40:12.070Z · LW · GW

I'm not sure either, but here's my current model:
Even though it looks pretty likely that AISC is an improvement on no-AISC, there are very few potential funders:
1) EA-adjacent charitable organizations.
2) People from AIS/rat communities.

Now, how to explain their decisions?
For the former, my guess would be a mix of not having heard of (or received an application from) AISC and preferring to optimize heavily towards top-rated charities. AISC's work is hard to quantify, as you can tell from the most upvoted comments, and that's a problem when you're looking for projects to invest in, because you need to avoid being criticized for that kind of choice if it turns out AISC is crackpottish or a waste of funds. The Copenhagen interpretation of ethics applies hard here for any opponent with a grudge against the organization.
For the latter, it depends a lot on individual people, but here are the possibilities that come to mind:
- Not wanting to donate anything but feeling obligated to, which leads to large donations to a few projects once the urge to donate is strong enough to break the status quo bias.
- Being especially mindful of one's finances and donating only to preferred charities, either out of personal attachment (again, not likely to favor AISC a priori) or because they're provably effective.

To answer 2), could you say why you don't donate to AISC? Your motivations are probably very similar to those of other potential donors here.

Comment by momom2 (amaury-lorin) on Making a Secular Solstice Songbook · 2024-01-27T22:20:59.800Z · LW · GW

Follow this link to find it. The translation is mine and open to comments. Don't hesitate to suggest improvements.

Comment by momom2 (amaury-lorin) on ' petertodd'’s last stand: The final days of open GPT-3 research · 2024-01-23T22:01:59.704Z · LW · GW

It's not obvious at all to me, but it's certainly a plausible theory worth testing!

Comment by momom2 (amaury-lorin) on Making a Secular Solstice Songbook · 2024-01-23T21:45:36.677Z · LW · GW

To whom it may concern, here's a translation of "Bold Orion" in French.

Comment by momom2 (amaury-lorin) on Is being sexy for your homies? · 2023-12-14T02:03:55.274Z · LW · GW

A lot of the argumentation in this post is plausible, but also, like, not very compelling?
Mostly the "frictionless" model of sexual/gender norms, and the examples associated: I can see why these situations are plausible (if at least because they're very present in my local culture) but I wouldn't be surprised if they are a bunch of social myth either, in which case the whole post is invalidated.

I appreciate the effort though; it's food for thought even if it doesn't tell me much about how to update based on the conclusion.

Comment by momom2 (amaury-lorin) on Critique-a-Thon of AI Alignment Plans · 2023-12-05T21:25:53.763Z · LW · GW

Epistemic status: Had a couple of conversations about AI Plans with the founder and participated in the previous critique-a-thon. I've helped AI Plans a bit before, so I'm probably biased towards optimism.

 

Neglectedness: Very neglected. AI Plans wants to become a database of alignment plans that allows quick evaluation of whether an approach is worth spending effort on, at least as a quick sanity check for outsiders. I can't believe it didn't exist before! It's still very rough and unusable for that purpose for now, but that's what the critique-a-thon is for: hopefully, as critiques accumulate and more votes are fed into the system, it will become more useful.

Tractability: High. It may be hard to make winning critiques, but considering the current state of AI Plans, it's very easy to make an improvement. If anything, you can filter out the obvious failures.

Impact: I'm not as confident here. If AI Plans works as intended, it could be very valuable to allocate funds more efficiently and save time by figuring out which approaches should be discarded. However, it's possible that it will just fail to gain steam and become a stillborn project. I've followed it for a couple months, and I've been positively surprised several times, so I'm pretty optimistic.

 

The bar to entry is pretty low; if you've been following AIS blogs or forums for several months, you probably have something to contribute. It's very unlikely you'll have a negative impact.
It may also be an opportunity for you to talk with AIS-minded people and test your opinions on a practical problem; if you feel like an armchair safetyist and are tired of being one, this is the occasion to level up.
Another way to think about it: engagement was very low in the previous critique-a-thon, so if you have a few hours to spare, you can earn some easy money and fuzzies even if you're not sure about the value in utilons.

Comment by momom2 (amaury-lorin) on Game Theory without Argmax [Part 2] · 2023-12-05T15:51:07.923Z · LW · GW

Thank you, this is incredibly interesting! Did you ever write up more on the subject? I'm excited to see how it relates to mesa-optimisation in particular.

In the finite case, where , then 

Typo: I think you mean  ?

Comment by momom2 (amaury-lorin) on Integrity in AI Governance and Advocacy · 2023-11-22T12:51:19.843Z · LW · GW

I'm surprised to hear they're posting updates about CoEm.

At a conference held by Connor Leahy, I said that I thought it was very unlikely to work, and asked why they were interested in this research area, and he answered that they were not seriously invested in it.

We didn't go into the topic further and it was several months ago, so it's possible that 1) I misremember, 2) they changed their minds, or 3) I appeared adversarial and he didn't feel like debating CoEm. (For example, maybe he actually said that CoEm didn't look promising, and this changed recently?)
Still, anecdotal evidence is better than nothing, and I look forward to seeing OliviaJ compile a document to shed some light on it.

Comment by momom2 (amaury-lorin) on Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example · 2023-11-21T16:21:12.983Z · LW · GW

Nice! Is this on ai-plans already?

Comment by momom2 (amaury-lorin) on Age changes what you care about · 2023-11-21T14:06:03.183Z · LW · GW

I invite you. You can send me this summary in private to avoid downvotes.

Comment by momom2 (amaury-lorin) on “Why can’t you just turn it off?” · 2023-11-21T09:28:51.895Z · LW · GW

There's a whole part of the argument missing, which is the framing of this as being about AI risk.
I've seen various proposed explanations for why this happened; the board being worried about AI risk is one of them, but not the most plausible afaict.
 

In addition, this is phrased similarly to technical problems like corrigibility, which it is very much not about.
People who say "why can't you just turn it off" typically mean literally turning off the AI if it appears to be dangerous, which is not what this is about. This is about turning off the AI company, not the AI.

Comment by momom2 (amaury-lorin) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-10-31T07:08:01.880Z · LW · GW

1) I didn't know an Executive Order could be repealed easily. Could you please elaborate?
2) Why is it good news? To me, this looks like a clear improvement on the previous state of regulation.

Comment by momom2 (amaury-lorin) on Architects of Our Own Demise: We Should Stop Developing AI Carelessly · 2023-10-26T07:59:02.231Z · LW · GW

AlexNet dates back to 2012; I don't think earlier work on AI can be compared to modern statistical AI.
Paul Christiano's foundational paper on RLHF dates back to 2017.
Arguably, all of the agent foundations work has turned out to be useless so far, so prosaic alignment work may be what Roko is taking as the beginning of AIS as a field.

Comment by momom2 (amaury-lorin) on AI Safety is Dropping the Ball on Clown Attacks · 2023-10-22T09:50:12.036Z · LW · GW

The AI safety leaders currently see slow takeoff as humans gaining capabilities, and this is true; and also already happening, depending on your definition. But they are missing the mathematically provable fact that information processing capabilities of AI are heavily stacked towards a novel paradigm of powerful psychology research, which by default is dramatically widening the attack surface of the human mind.

I assume you do not have a mathematical proof of that, or you'd have mentioned it. What makes you think it is mathematically provable?
I would be very interested in reading more about the avenues of research dedicated to showing how AI can be used for psychological attacks from the perspective of AIS (I'd expect such research to be private by default due to infohazards).

Comment by amaury-lorin on [deleted post] 2023-10-19T09:20:32.617Z

I don't understand how the parts fit together. For example, what's the point of presenting the (t-,n)-AGI framework or the Four Background Claims?

Comment by amaury-lorin on [deleted post] 2023-10-19T09:17:12.130Z

I assume it's incomplete. It doesn't present the other 3 anchors mentioned, nor forecasting studies.

Comment by momom2 (amaury-lorin) on How should TurnTrout handle his DeepMind equity situation? · 2023-10-17T08:08:40.970Z · LW · GW

To avoid being negatively influenced by perverse incentives to make societally risky plays, couldn't TurnTrout just leave the handling of his finances to someone else and be unaware of whether or not he has Google stock?
 

It doesn't matter whether he actually does, as long as he doesn't think he does; and even if he's merely uncertain about it, I think that will already greatly reduce how much he psychologically cares about Google stock.

Comment by momom2 (amaury-lorin) on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-09-29T20:58:06.304Z · LW · GW

Not before reading the link, but Elizabeth did state that they expected the pro-meat section to be terrible without reading it, presumably because of the first part.

Since the article is low-quality in the part they read and expected low-quality in the part they didn't, they shouldn't take it as evidence of anything at all; that is why I think it's probably confirmation bias to take it as evidence against excess meat being related to health issues.

Reason for retraction: In hindsight, I think my tone was unjustifiably harsh and incendiary. Also, the karma suggests that whatever I wrote probably wasn't that interesting.

Comment by momom2 (amaury-lorin) on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-09-29T12:04:43.654Z · LW · GW

That’s the first five subsections. The next set maybe look better sourced, but I can’t imagine them being good enough to redeem the paper. I am less convinced of the link between excess meat and health issues than I was before I read it, because surely if the claim was easy to prove the paper would have better supporting evidence, or the EA Forum commenter would have picked a better source.

That's confirmation bias if I've ever seen it.
It seems likely to me that you're exposed to a lot of low-quality anti-meat content, and you should correct for selection bias, since you're likely to only read what supports your view that the arguments are bad, and recommendation algorithms often select for infuriatingness.

[Note: I didn’t bother reading the pro-meat section. It may also be terrible, but this does not affect my position.]

??? Surely if meat-being-good were easy to prove, the paper would have better (expected) supporting evidence.

You should probably take a step back and disengage from that topic to restore your epistemics about how you engage with (expected low-quality) pro-vegan content.

Comment by momom2 (amaury-lorin) on Understanding strategic deception and deceptive alignment · 2023-09-26T09:07:36.055Z · LW · GW

A model is deceptively aligned with its designers. However, the designers have very good control mechanisms in place such that they would certainly catch the AI if it tried to act misaligned. Therefore, the model acts aligned with the designers' intentions 100% of the time. In this world, a model that is technically deceptively aligned may still be safe in practice (although this equilibrium could be fragile and unsafe in the long run).

In that case, there is no strategic deception (the designers are not misled by the AI).

I think we call this case strategic deception because we have an intuition about the inputs the AI could receive (ones where it would be controlled loosely enough to act deceptively) that differs from the inputs it actually receives.

To fix this, I propose strategic deception not be defined according to the AI's behavior, but to its hypothetical behavior on an idealized class of inputs that represents all situations where the designers want the AI to behave in a certain way.

E.g. The designers are holding the AI in a simulation to check if it's deceptive. This information is given to the AI in a pre-prompt due to technical issues. However, the designers want the AI to avoid strategic deception even during deployment where it won't be told it's in a simulation, so their idealized test set includes prompts without this information. 
By this definition they cannot check if the AI exhibits strategic deception before deployment in this situation.
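A minimal sketch of the definition I'm proposing, with hypothetical helpers (`intended_behavior` and `furthers_own_goals` stand in for the designers' intent and for whatever operationalizes the AI's goals):

```python
def is_strategically_deceptive(model, idealized_inputs, intended_behavior, furthers_own_goals):
    """Judge deception on the idealized class of inputs (e.g. deployment-like prompts
    with no 'you are in a simulation' pre-prompt), not only on the inputs actually seen."""
    for prompt in idealized_inputs:
        output = model(prompt)
        # Deceptive if the hypothetical output diverges from what the designers
        # wanted, in a way that furthers the model's own goals.
        if output != intended_behavior(prompt) and furthers_own_goals(model, prompt, output):
            return True
    return False
```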


Also, I am unsatisfied by "in order to accomplish some outcome" and "[the AI's] goals" because this assumes an agentic framework, which might not be relevant in real-world AI.

How to fix the first, for agentic AI only: "for which the AI predicts an outcome that can be human-interpreted as furthering its goals"
Not sure how to talk about deceptive non-agentic AI.

Comment by momom2 (amaury-lorin) on Interpreting OpenAI's Whisper · 2023-09-26T08:46:37.828Z · LW · GW

At a glance, I couldn't find any significant capability externality, but I think that all interpretability work should, as a standard, include a paragraph explaining why the authors don't think their work will be used to improve AI systems in an unsafe manner.

Comment by momom2 (amaury-lorin) on Logical Pinpointing · 2023-09-25T06:38:40.877Z · LW · GW

Seeing as the above response wasn't very upvoted, I'll try to explain in simpler terms.
If 2+2 comes out 5 the one-trillionth-and-first time we compute it, then our calculation does not match numbers.
... which we can tell because?
...and writing this now I realize why the answer was more upvoted, because this is circular reasoning. ':-s
Sorry, I have no clue.

Comment by momom2 (amaury-lorin) on Where might I direct promising-to-me researchers to apply for alignment jobs/grants? · 2023-09-19T21:14:36.421Z · LW · GW

In France, EffiSciences is looking for new members and interns.

Comment by momom2 (amaury-lorin) on The salt in pasta water fallacy · 2023-09-18T15:53:43.098Z · LW · GW

Sounds like those people are victim of a salt-in-pasta-water fallacy.

Comment by momom2 (amaury-lorin) on Find Hot French Food Near Me: A Follow-up · 2023-09-09T17:16:13.774Z · LW · GW

It's also very old-fashioned. Can't say I've ever heard anyone under 60 say "pétard" unironically.