To me, it sounds like A is a member of a community that A wants to hold certain standards, and B is claiming membership in that community while not meeting them. In that circumstance, I think a discussion among various members of the community about the obligations of membership, the community's goals and beliefs, and how these things relate is very, very good. Do you
A) disagree with that framing of the situation in the dialogue
B) disagree that in the situation I described a discussion is virtuous, verging on necessary
C) other?
Lots of your comments on various posts seem rude to me--should I be attempting to severely punish you?
I am genuinely confused why this is on LessWrong instead of the EA Forum. What do you think the distribution of giving is like in each place, and what do you think the distribution of responses to the drowning-child argument is like in each?
Minor semantic quibble: I would say we always want positive expected utility, but how that translates into money/time/various intangibles can vary tremendously both situationally and from person to person.
This was very interesting, thanks for writing it :)
My zero-knowledge instinct is that sound-wave communication would be very likely to evolve in most environments. Motion -> pressure differentials seems pretty inevitable, so would almost always be a useful sensory modality. And any information channel that is easy to both sense and affect seems likely to be used for communication. Curious to hear your thoughts if your intuition is that it would be rare.
Do you have candidates for intermediate views? Many-drafts which seem convergent, or fuzzy Cartesian theatres? (Maybe graph-theoretically translating to nested subnetworks of neurons where we might say "this set is necessarily core, this larger set is semi-core/core in frequent circumstances, this still larger set is usually un-core but changeable, and outside this is nothing"?)
The conversations I've had with people at DeepMind, OpenAI, and in academia make me very sure that lots of ideas on capabilities increases are already out there, so there's a high chance anything you suggest would be something people are already thinking about. Possibly running your ideas past someone in those circles, and only sharing the ones they think are unoriginal, would be safe-ish?
I think one of the big bottlenecks is a lack of ways to predict how much different ideas would help without actually trying them at costly large scale. Unfortunately, this is also a barrier to good alignment work. I don't have good ideas on making differential progress on this.
I think lots of people would say that all three examples you gave are more about signalling than about genuinely attempting to accomplish a goal.
This seems like kinda a nonsense double standard. The declared goal of journalism is usually not to sell newspapers, that is your observation of the incentive structure. And while the declared goal of LW is to arrive at truth (or something similar--hone the skills which will better allow people to arrive at truth, or something), there are comparable parallel incentive structures to journalism.
It seems better to compare declared purpose to declared purpose, or inferred goal to inferred goal, doesn't it?
Can you name the organization?
I don't think I've fully processed what you or the OP have said here--my apologies, but this still seemed relevant.
I think the category-theory way I would describe this is: Bob is a category B, and Alice is a category A. A and B are big and complicated, and I have no idea how to describe all the objects or morphisms in them, although there is some structure-preserving morphism between them (your G). But what Bob does is try to find a straw-Alice category A' which is small and simple, along with functors from A' to A and from A' to B, which makes Alice predictable (or post-dictable).
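(In symbols, with my own letter choices for the two functors--so take this as a sketch of my reading, not of the post's notation:)

$$F : A' \to A, \qquad H : A' \to B,$$

where Bob reasons in the small category A' (via H) instead of the intractable A, and the straw model is faithful exactly to the extent that this is compatible with the real G relating A and B.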
Does that make any sense?
Alright, I'll bite. As a CDT fan, I will happily take the 25 dollars. I'll email you on setting up the experiment. If you'd like, we could have a third party hold money in escrow?
I'm open to some policy which will cap our losses if you don't want to risk $2050, or conversely, something which will give a bonus if one of us wins by more than $5 or something.
As far as Newcomb's problem goes, what if you find a superintelligent agent that says it tortures and kills anyone who would have one-boxed in Newcomb's problem? This seems roughly as likely to me as finding the omega from the original problem. Do you still think the right thing to do now is commit to one-boxing before you have any reason to think that commitment has positive EV?
This is (related to) a very old idea: https://en.wikipedia.org/wiki/Method_of_loci
Which is why you order the same thing at a restaurant every time, up to nutritional equivalence? And regularly murder people you know to be organ donors?
Two situations that are described differently are different. What differences are salient to you is a fundamentally arational question. Deciding that the differences you, Richard, care about are the ones which count to make two situations isomorphic cannot be defended as a rational position. It predictably loses.
It is obviously true in a bare bones consequences sense. It is obviously false in the aesthetics, which I expect will change the proportion of people answering a or b--as you say, the psychology of the people answering affects the chances.
That depends on your definition of isomorphism. I'm aware of the sense in which that is true. Are you aware of the sense in which it is false? Do you think I'm wrong when I say "There, the chances of everyone or nearly everyone choosing red seem (much!) higher"?
There, the chances of everyone or nearly everyone choosing red seem (much!) higher, so I think I would choose red.
Even in that situation, though, I suspect the first-mover signal is still the most important thing. If the first person to make a choice gives an inspiring speech and jumps in, I think the right thing to do is choose blue with them.
Thanks for doing the math on this :)
My first instinct is that I should choose blue, and the more I've thought about it, the more that seems correct. (Rough logic: The only way no-one dies is if either >50% choose blue, or if 100% choose red. I think chances of everyone choosing red are vanishingly small, so I should push in the direction with a wide range of ways to get the ideal outcome.)
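(A quick toy version of that rough logic--purely illustrative, with my own added assumption that everyone chooses independently with the same probability p of picking blue, and that blue-choosers die unless they are a strict majority:)

```python
# Toy model: n people independently choose blue with probability p.
# Blue-choosers die unless they are a strict majority; if everyone
# chose red, nobody dies. Expected deaths as a function of p.
from math import comb

def expected_deaths(n: int, p: float) -> float:
    total = 0.0
    for k in range(n + 1):  # k = number of blue-choosers
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        deaths = 0 if (k == 0 or k > n / 2) else k
        total += prob * deaths
    return total

for p in (0.0, 0.01, 0.1, 0.5, 0.6, 0.9):
    print(f"p = {p}: expected deaths = {expected_deaths(100, p):.2f}")
```

With 100 people, the red-side ideal requires p to be exactly 0, while anything comfortably above one half drives expected deaths toward zero--that's the "wide range of ways to get the ideal outcome" I mean.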
I do think the most important issue not mentioned here is a social-signal, first-mover one: If, before most people have chosen, someone loudly sends a signal of "everyone should do what I did, and choose X!", then I think we should all go along with that and signal-boost it.
I'm an AI alignment researcher who will be moving to the bay area soon (the two facts are unconnected--I'm moving due to my partner getting a shiny new job). I'm interested in connecting with other folks in the field, and feeling like I have coworkers. My background is academic mathematics.
1) In academia, I could show up at a department (or just look at their website) and find a schedule of colloquia/seminars/etc., ranging in frequency from about one a month to 10+ a week (depending on department size etc.). Are there similar things I should be aware of for AI folks in the Bay Area?
2) Where (if anywhere) do independent alignment people tend to work, and how could I go about renting an office there? I've heard of Constellation and Rose Garden Inn as the locations for several alignment organizations/events--do they also have office space for independent researchers?
3) Anything else I should know about?
One issue I don't think OpenAI convinced me they had dealt with is that saying "neuron activations are well correlated with x" is different from being able to say what, specifically, a neuron does mechanistically. I think of this similarly to how I think of the limitations of picking max-activating examples from a dataset or using gradient methods to find high activations: finding the argmax of a function doesn't necessarily tell you much about the function's...well, functionality.
This seems like it might have a related obstacle. While this method could eg make it easier to find a focus for mechanistic interpretability, I think the bulk of the hard work would still be ahead.
Relatedly, I'd really like to be able to attach private notes to authors' names. There are pairs of people on LW whose names I find easy to mistake for each other, and being able to look at the author of a post or comment and see a self-note like "This is the user who is really insightful about X" or "Don't start arguing with this person, it takes forever and goes nowhere", etc., would be very helpful.
I agree we are an existence proof for general intelligence. For alignment, what is the less intelligent thing whose goals humanity has remained robustly aligned to?
"To a decision-theoretic agent, the value of information is always nonnegative."
This seems false. If I selectively give you information in an adversarial manner, and you don't know that I'm picking the information to harm you, I think it's very clear that the value of the information you gain can be strongly negative.
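(A concrete toy version, of my own construction: an urn is equally likely to hold 2 red + 1 blue or 2 blue + 1 red, you win by naming the majority color, and I, knowing the urn, show you one genuine ball chosen to mislead. If you update as though it were a random draw, you go from winning half the time to never winning.)

```python
# Toy: adversarially selected (but true) information with naive updating
# does worse than having no information at all.
import random

def win_rate(naive_update: bool, trials: int = 100_000) -> float:
    wins = 0
    for _ in range(trials):
        majority = random.choice(["red", "blue"])       # true majority color of the urn
        shown = "blue" if majority == "red" else "red"  # I always show a minority-color ball
        if naive_update:
            # A random draw would be 2:1 evidence for its own color,
            # so the naive updater guesses the color they were shown.
            guess = shown
        else:
            guess = random.choice(["red", "blue"])      # ignore me entirely
        wins += (guess == majority)
    return wins / trials

print("naive update on my 'information':", win_rate(True))   # ~0.0
print("ignoring the information:        ", win_rate(False))  # ~0.5
```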
Yep! It might be easier to visualize with a train on tracks--the rope needs to be parallel to the intended direction of movement. Suppose the rope is nearly perfectly taut and tied to something directly in front of the train. Pulling the rope sideways with 100 newtons requires the perpendicular component of the rope's force to be 100 N, definitionally. But the rope can only exert force along itself, so if it misses being taut by an angle of θ radians, it'll be exerting enough tension T that T·sin(θ) = 100 N. But if the rope is very close to perfectly taut, then sin(θ) ≈ θ ≈ 0, so T ≈ 100/θ, and (in the limit) you're exerting infinite force along the rope.
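(Spelling out the small-angle arithmetic with made-up numbers:)

$$T\sin\theta = F_\perp \;\Rightarrow\; T \approx \frac{F_\perp}{\theta}\ \text{for small }\theta,$$

so a 100 N sideways pull on a rope that misses being taut by only θ ≈ 0.02 radians puts roughly 100 / 0.02 = 5000 N of tension along the rope.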
This fades pretty quickly as the rope gets away from the 0 angle, so you then need to secure the car so it won't move back (rocks under tires or something), and re-tighten the rope, and iterate.
I have actually tried this, not in tug-of-war, but with moving a stuck car (one end affixed to car, one end to a tree or lamppost or something). In that situation, where the objects aren't actively adjusting to thwart you, it works quite well!
Very cool idea!
It looked like several of the text samples were from erotica or something, which...seems like something I don't want to see without actively opting in--is there an easy way for you to filter those out?
(I imagine you know this, but for the sake of future readers) I think the word Torsor is relevant here :) https://math.ucr.edu/home/baez/torsors.html is a nice informal introduction.
Something I've found helpful for similar issues is to change my mindset from "I need to x" or (worse) "I should do x" to "I want to do x", even if the reason is because the consequences of not doing it seem very bad. Trying hard to reframe things as a desire of mine, rather than an obligation has been very good for me (when I can do it, which is not always).
I think you probably could do that, but you'd be restricting yourself to something that might work marginally worse than whatever would otherwise be found by gradient descent. Also, the more important part of the 768-dimensional vector that actually gets processed is the token embedding.
If you believe that neural nets store things as directions, one way to think of this is as the neural net reserving 3 dimensions for positional information, and 765 for the semantic content of the tokens. If the actual meaning of the words you read is roughly 250 times as important to your interpretation of a sentence as where they come in a sentence, then this should make sense?
This is kinda a silly way of looking at it--we don't have any reason (that I'm aware of) to think of these as separable, the interactions probably matter a lot--but might be not-totally-worthless as intuition.
@AdamYedidia This is super cool stuff! Is the magnitude of the token embeddings at all concentrated in or out of the 3 PCA dimensions for the positional embeddings? If it's concentrated away from that, we are practically using the addition as a direct sum, which is nifty.
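(If it's useful, here's roughly how I'd check that--a sketch I haven't run, assuming GPT-2 small via HuggingFace transformers, so treat the details as provisional:)

```python
# Sketch: how much of each token embedding's norm lies in the top-3
# principal directions of the positional embeddings (GPT-2 small).
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
wte = model.wte.weight.detach()  # token embeddings, shape (50257, 768)
wpe = model.wpe.weight.detach()  # positional embeddings, shape (1024, 768)

# Top-3 principal directions of the positional embeddings.
_, _, v = torch.pca_lowrank(wpe, q=3)  # v: (768, 3), orthonormal columns

# Fraction of each token embedding's squared norm inside that 3-dim subspace.
proj = wte @ v  # (50257, 3)
frac = proj.pow(2).sum(dim=1) / wte.pow(2).sum(dim=1)
print(f"mean fraction of token-embedding norm in the positional subspace: {frac.mean():.4f}")
```

If that fraction comes out tiny, the addition really is behaving like a direct sum in practice.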
Upon reflection, it was probably a mistake for me to write this phrased as a story/problem/thought experiment. I should probably have just written a shorter post titled something like "Newcomb's problem provides no (interesting, non-trivial) evidence against using causal decision theory." I had some fun writing this, though, and (mistakenly?) hoped that people would have fun reading it.
I think I disagree somewhat that "PNP references the strategy for NP". I think many (most?) LW people have decided they are "the type of person who one-boxes in NP", and believe that says something positive about them in their actual life. This post is an attempt to push back on that.
It seems from your comment that you think of "What I, Vladimir Nesov, would do in a thought experiment" as different from what you would actually do in real life. (eg, when you say "the problem statement is very confusing on this point."). I think of both as being much more closely tied.
Possibly the confusion comes from the difference between what you-VN-would-actually-do and what you think is correct/optimal/rational behavior? Like, in a thought experiment, you don't actually try to imagine or predict what real-you would do, you just wonder what optimal behavior/strategy is? In that case, I agree that this is a confusing problem statement.
I think the in-story you believes they will be killed if they make an inconsistent choice (or at least thinks the chance is high enough that they do choose consistently).
The point of the post isn't so much the specific set up, as it is an attempt to argue that Newcomb's problem doesn't provide any reason to be against causal decision theory.
Well, if you were confronted with Newcomb's problem, would you one-box or two box? How fully do you endorse your answer as being "correct" or maximally rational, or anything along those lines?
I'm not trying to argue against anyone who says they aren't sure, but they think they would one-box or two-box in some hypothetical, or anyone who has thought carefully about the possible existence of unknown unknowns and come down on the "I have no idea what's optimal, but I've predetermined to do X for the sake of predictability" side for either X.
I am arguing against people who think that Newcomb's problem means causal decision theory is wrong, and that they have a better alternative. I think Newcomb's provides no (interesting, nontrivial) evidence against CDT.
The intention was to portray the transparent box as having lots of money--call it $1,000,000.
The point of UDT as I understand it is that you should be the sort of person who predictably one-boxes in NP. This seems incorrect to me. I think if you are the sort of person who one-boxes in a surprise NP, you will have worse outcomes in general, and that if you have a surprise NP, you should two-box. If you know you will be confronted with NP tomorrow, then sure, you should decide to one-box ahead of time. But I think deciding now to "be the sort of person who would one-box in NP," (or equivalently, deciding now to commit to a decision theory which will result in that) is a mistake.
Eliezer Yudkowsky and the whole UDT crowd seem to think that you should commit to a decision theory which seems like a bad one to me, on the basis that it would be rational to have precommitted if you end up in this situation. They seem to have convinced most LW people of this. I think they are wrong. I think CDT is a better decision theory which is more intuitive. I agree CDT gives a suboptimal outcome in surprise-NP, but I think any decision theory can give a good or bad outcome in corner-cases, along the lines of "You meet a superintelligent agent which will punish people who use (good decision theory) and reward those who use (bad decision theory)." Thus, NP shouldn't count as a strike against CDT.
Succinctly, if someone runs into an omega which says "I will give you $1,000,000 if you are someone who would have two-boxed in Newcomb. If you would have one-boxed, I will kill your family", then the two-boxers have much better outcomes than the one-boxers. You may object that this seems silly and artificial. I think it is no more so than the original problem.
And yes--I think EY is very wrong in the post you link to, and this is a response to the consensus LW view that one-boxing is correct.
This is a really cool piece of work!
As someone who successfully first-tried the ball into the cup without any video analysis, my algorithm was:
1) ask to see the ball roll down the ramp but be stopped at the end
2) notice the ramp moving with significant flex
3) do the standard calculation for the ball (sketched below), assuming all potential energy is converted to kinetic + rolling, and calculate cup-lip placement accordingly
4) decide that "about 10-15% loss" sounded both right to compensate for the flex and looked good to my physics instincts, and so move the cup closer accordingly.
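(For concreteness, the "standard calculation" in step 3, assuming a uniform solid ball rolling without slipping:)

$$mgh = \tfrac{1}{2}mv^2 + \tfrac{1}{2}I\omega^2,\qquad I = \tfrac{2}{5}mr^2,\quad v = \omega r \;\Rightarrow\; v = \sqrt{\tfrac{10}{7}gh},$$

then treat the ball as a projectile launched from the lip of the ramp at that speed to get the horizontal distance to the cup.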
It was a fun exercise! thanks, John :)
I'm looking forward to both this series, and the workshop!
I think I (and probably many other people) would find it helpful if there was an entry in this sequence which was purely the classical story told in a way/with language which makes its deficiencies clear and the contrasts with the Watanabe version very easy to point out. (Maybe a -1 entry, since 0 is already used?)
"You cannot argue with a group. You cannot convince a group of things or change a group’s mind."
Forgive me if this comes across as trollish, but whose mind are you trying to change with this essay?
To me it seems like your point is either self-refuting (in form, if not meaning) or, at best, incomplete.
Another option: my father reports he usually memorizes phone numbers based on the geometric pattern they make on a typical keypad.