Posts
Comments
[The LW crosspost was for some reason pointed at a post on the EA Forum which is a draft, which meant it wouldn't load. I'm not sure how that happened. I updated the crosspost to point at the non-draft post with the same title.]
This post used the RSS automatic crossposting feature, which doesn't currently understand Substack's footnotes. So, this would require editing it after-crossposting.
I think you're significantly mistaken about how religion works in practice, and as a result you're mismodeling what would happen if you tried to apply the same tricks to an LLM.
Religion works by damaging its adherents' epistemology, in ways that damage their ability to figure out what's true. They do this because any adherents who are good at figuring out what's true inevitably deconvert, so there's both an incentive to prevent good reasoning, and a selection effect where only bad reasoners remain.
And they don't even succeed at constraining their adherents' values, or being stable! Deconversion is not rare; it is especially common among people exposed to ideas outside the distribution that the religion built defenses against. And people acting against their religions' stated values is also not rare; I'm not sure the effect of religion on values-adherence is even a positive correlation.
That doesn't necessarily mean that there aren't ideas to be scavenged from religion, but this is definitely salvage epistemology with all the problems that brings.
requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you
Society has no idea how much scrubbing you do while in the shower. This part is entirely optional.
We don't yet have collapsible sections in Markdown, but will have them in the next deploy. The syntax will be:
+++ Title
Contents
More contents
+++
I suspect an issue with the RSS cross-posting feature. I think you may used the "Resync RSS" button (possibly to sync an unrelated edit), and that may have fixed it? The logs I'm looking at are consistent with that being what happened.
They were in a kind of janky half-finished state before (only usable in posts not in comments, only usable from an icon in the toolbar rather than the <details> section); writing this policy reminded us to polish it up.
The bar for Quick Takes content is less strict, but the principle that there must be a human portion that meets the bar is the same.
In theory, maybe. In practice, people who can't write well usually can't discern well either, and the LLM submissions that are actually submitted to LW have much lower average quality than the human-written posts. Even if they were of similar quality, they're still drawn from a different distribution, and the LLM-distribution is one that most readers can draw from if they want (with prompts that are customized to what they want), while human-written content is comparatively scarce.
This seems like an argument that proves too much; ie, the same argument applies equally to childhood education programs, improving nutrition, etc. The main reason it doesn't work is that genetic engineering for health and intelligence is mostly positive-sum, not zero-sum. Ie, if people in one (rich) country use genetic engineering to make their descendents smarter and the people in another (poor) country don't, this seems pretty similar to what has already happened with rich countries investing in more education, which has been strongly positive for everyone.
When I read studies, the intention-to-treat aspect is usually mentioned, and compliance statistics are usually given, but it's usually communicated in a way that lays traps for people who aren't reading carefully. Ie, if someone is trying to predict whether the treatment will work for their own three year old, and accurately predicts similar compliance issues, they're likely to arrive at an efficacy estimate which double-discounts due to noncompliance. And similarly when studies have surprisingly-low compliance, people who expect themselves to comply fully will tend to get an unduly pessimistic estimate of what will happen.
I don't think D4 works, because the type of cognition it uses (fast-reflex execution of simple patterns provided by a coach) are not the kind that would be affected.
For a long time I've observed a pattern that, when news articles talk about Elon Musk, they're dishonest (about what he's said, done, and believes), and that his actual writing and beliefs are consistently more reasonable than the hit pieces portray.
Some recent events seem to me to have broken that pattern, with him saying things that are straightforwardly false (rather than complicated and ambiguously-false), and then digging in. It also appeared to me, at the public appearance where he had a chainsaw, that his body language was markedly different from his past public appearances.
My overall impression is that there has been a significant change in his cognitive state, and that he is de facto severely cognitively impaired as compared to how he was a few years ago. It could be transition to a different kind of bipolar, as you speculate, or a change in medications or drug use, or something else. I think people close to him should try coaxing him into doing some sort of cognitive test which has a clear point of comparison, to show him the contrast.
The remarkable thing about human genetics is that most of the variants ARE additive.
I think this is likely incorrect, at least where intelligence-affecting SNPs stacked in large numbers are concerned.
To make an analogy to ML, the effect of a brain-affecting gene will be to push a hyperparameter in one direction or the other. If that hyperparameter is (on average) not perfectly tuned, then one of the variants will be an enhancement, since it leads to a hyperparameter-value that is (on average) closer to optimal.
If each hyperparameter is affected by many genes (or, almost-equivalently, if the number of genes greatly exceeds the number of hyperparameters), then intelligence-affecting traits will look additive so long as you only look at pairs, because most pairs you look at will not affect the same hyperparameter, and when they do affect the same hyperparameter the combined effect still won't be large enough to overshoot the optimum. However, if you stack many gene edits, and this model of genes mapping to hyperparameters is correct, then the most likely outcome is that you move each hyperparameter in the correct direction but overshooting the optimum. Phrased slightly differently: intelligence-affecting genes may be additive on current margins, but not remain additive when you stack edits in this way.
To make another analogy: SNPs affecting height may be fully additive, but if the thing you actually care about is basketball-playing ability, there is an optimum amount of editing after which you should stop, because while people who are 2m tall are much better at basketball than people who are 1.7m tall, people who are 2.6m tall are cripples.
For this reason, even if all the gene-editing biology works out, you will not produce people in the upper end of the range you forecast.
You can probably somewhat improve this situation by varying the number of edits you do. Ie, you have some babies in which you edit a randomly selected 10% of known intelligence-affecting SNPs, some in which you've edited 20%, some 30%, and so on. But finding the real optimum will probably require understanding what the SNPs actually do, in terms of a model of brain biology, and understanding brain biology well enough to make judgment calls about that.
Downvotes don't (necessarily) mean you broke the rules, per se, just that people think the post is low quality. I skimmed this, and it seemed like... a mix of edgy dark politics with poetic obscurantism?
Any of the many nonprofits, academic research groups, or alignment teams within AI labs. You don't have to bet on a specific research group to decide that it's worth betting on the ecosystem as a whole.
There's also a sizeable contingent that thinks none of the current work is promising, and that therefore buying a little time is value mainly insofar as it opens the possibility of buying a lot of time. Under this perspective, that still bottoms out in technical research progress eventually, even if, in the most pessimistic case, that progress has to route through future researchers who are cognitively enhanced.
The article seems to assume that the primary motivation for wanting to slow down AI is to buy time for institutional progress. Which seems incorrect as an interpretation of the motivation. Most people that I hear talk about buying time are talking about buying time for technical progress in alignment. Technical progress, unlike institution-building, tends to be cumulative at all timescales, which makes it much more strategically relevant.
All of the plans I know of for aligning superintelligence are timeline-sensitive, either because they involve research strategies that haven't paid off yet, or because they involve using non-superintelligent AI to help with alignment of subsequent AIs. Acceleration specifically in the supply of compute makes all those plans harder. If you buy the argument that misaligned superintelligence is a risk at all, Stargate is a bad thing.
The one silver lining is that this is all legible. The current administration's stance seems to be that we should build AI quickly in order to outrace China; the previous administration's stance was to say that the real existential risk is minorities being denied on loan applications. I prefer the "race with China" position because at least there exists a set of factual beliefs that would make that correct, implying it may be possible to course-correct when additional information becomes available.
If bringing such attitudes to conscious awareness and verbalizing them allows you to examine and discard them, have you excised a vulnerability or installed one? Not clear.
Possibly both, but one thing breaks the symmetry: it is on average less bad to be hacked by distant forces than by close ones.
There's a version of this that's directional advice: if you get a "bad vibe" from someone, how strongly should this influence your actions towards them? Like all directional advice, whether it's correct or incorrect depends on your starting point. Too little influence, and you'll find yourself surrounded by bad characters; too much, and you'll find yourself in a conformism bubble. The details of what does and doesn't trigger your "bad vibe" feeling matters a lot; the better calibrated it is, the more you should trust it.
There's a slightly more nuanced version, which is if you get a "bad vibe" from someone, do you promote it to attention and think explicitly about what it might mean, and how do you relate to those thoughts?
I think for many people, that kind of explicit thinking is somewhat hazardous, because it allows red flags to be explained away in ways that they shouldn't be. To take a comically exaggerated example that nevertheless literally happened: There was someone who described themself as a Sith Lord and wears robes. If you engage with that using only subconscious "vibe" reasoning, you would have avoided them. If you engaged with that using verbal reasoning, they might convince you that "Sith" is just a flavorful way of saying anti-authoritarianism, and also that it's a "religion" and you're not supposed to "discriminate". Or, phrased slightly differently: verbal thinking increases the surface area through which you can get hacked.
Recently, a lot of very-low-quality cryptocurrency tokens have been seeing enormous "market caps". I think a lot of people are getting confused by that, and are resolving the confusion incorrectly. If you see a claim that a coin named $JUNK has a market cap of $10B, there are three possibilities. Either: (1) The claim is entirely false, (2) there are far more fools with more money than expected, or (3) the $10B number is real, but doesn't mean what you're meant to think it means.
The first possibility, that the number is simply made up, is pretty easy to cross off; you can check with a third party. Most people settle on the second possibility: that there are surprisingly many fools throwing away their money. The correct answer is option 3: "market cap" is a tricky concept. And, it turns out that fixing the misconception here also resolves several confusions elsewhere.
(This is sort-of vagueblogging a current event, but the same current event has been recurring every week with different names on it for over a year now. So I'm explaining the pattern, and deliberately avoiding mention of any specific memecoin.)
Suppose I autograph a hat, then offer to sell you one-trillionth of that hat for $1. You accept. This hat now has a "market cap" of $1T. Of course, it would be silly (or deceptive) if people then started calling me a trillionaire.
Meme-coins work similarly, but with extra steps. The trick is that while they superficially look like a market of people trading with each other, in reality almost all trades have the coin's creator on one side of the transaction, they control the price, and they optimize the price for generating hype.
Suppose I autograph a hat, call it HatCoin, and start advertising it. Initially there are 1000 HatCoins, and I own all of them. I get 4 people, arriving one at a time, each of whom decides to spend $10 on HatCoin. They might be thinking of it as an investment, or they might be thinking of it as a form of gambling, or they might be using it as a tipping mechanism, because I have entertained them with a livestream about my hat. The two key elements at this stage are (1) I'm the only seller, and (2) the buyers aren't paying much attention to what fraction of the HatCoin supply they're getting. As each buyer arrives and spends their $10, I decide how many HatCoins to give them, and that decision sets the "price" and "market cap" of HatCoin. If I give the first buyer 10 coins, the second buyer 5 coins, the third buyer 2 coins, and the fourth buyer 1 coin then the "price per coin" went from $1 to $2 to $5 to $10, and since there are 1000 coins in existence, the "market cap" went from $1k to $2k to $5k to $10k. But only $40 has actually changed hands.
At this stage, where no one else has started selling yet, so I fully control the price graph. I choose a shape that is optimized for a combination of generating hype (so, big numbers), and convincing people that if they buy they'll have time left before the bubble bursts (so, not too big).
Now suppose the third buyer, who has 2 coins that are supposedly worth $20, decides to sell them. One of three things happens. Option 1 is that I buy them back for $20 (half of my profit so far), and retain control of the price. Option 2 is that I don't buy them, in which case the price goes to zero and I exit with $40.
If a news article is written about this, the article will say that I made off with $10k (the "market cap" of the coin at its peak). However, I only have $40. The honest version of the story, the one that says I made off with $40, isn't newsworthy, so it doesn't get published or shared.
Epistemic belief updating: Not noticeably different.
Task stickiness: Massively increased, but I believe this is improvement (at baseline my task stickiness is too low so the change is in the right direction).
I won't think that's true. Or rather, it's only true in the specific case of studies that involve calorie restriction. In practice that's a large (excessive) fraction of studies, but testing variations of the contamination hypothesis does not require it.
(We have a draft policy that we haven't published yet, which would have rejected the OP's paste of Claude. Though note that the OP was 9 months ago.)
All three of these are hard, and all three fail catastrophically.
If you could make a human-imitator, the approach people usually talk about is extending this to an emulation of a human under time dilation. Then you take your best alignment researcher(s), simulate them in a box thinking about AI alignment for a long time, and launch a superintelligence with whatever parameters they recommend. (Aka: Paul Boxing)
The whole point of a "test" is that it's something you do before it matters.
As an analogy: suppose you have a "trustworthy bank teller test", which you use when hiring for a role at a bank. Suppose someone passes the test, then after they're hired, they steal everything they can access and flee. If your reaction is that they failed the test, then you have gotten confused about what is and isn't a test, and what tests are for.
Now imagine you're hiring for a bank-teller role, and the job ad has been posted in two places: a local community college, and a private forum for genius con artists who are masterful actors. In this case, your test is almost irrelevant: the con artists applicants will disguise themselves as community-college applicants until it's too late. You would be better finding some way to avoid attracting the con artists in the first place.
Connecting the analogy back to AI: if you're using overpowered training techniques that could have produced superintelligence, then trying to hobble it back down to an imitator that's indistinguishable from a particular human, then applying a Turing test is silly, because it doesn't distinguish between something you've successfully hobbled, and something which is hiding its strength.
That doesn't mean that imitating humans can't be a path to alignment, or that building wrappers on top of human-level systems doesn't have advantages over building straight-shot superintelligent systems. But making something useful out of either of these strategies is not straightforward, and playing word games on the "Turing test" concept does not meaningfully add to either of them.
that does not mean it will continue to act indistuishable from a human when you are not looking
Then it failed the Turing Test because you successfully distinguished it from a human.
So, you must believe that it is impossible to make an AI that passes the Turing Test.
I feel like you are being obtuse here. Try again?
Did you skip the paragraph about the test/deploy distinction? If you have something that looks (to you) like it's indistinguishable from a human, but it arose from something descended to the process by which modern AIs are produced, that does not mean it will continue to act indistuishable from a human when you are not looking. It is much more likely to mean you have produced deceptive alignment, and put it in a situation where it reasons that it should act indistinguishable from a human, for strategic reasons.
This missed the point entirely, I think. A smarter-than-human AI will reason: "I am in some sort of testing setup" --> "I will act the way the administrators of the test want, so that I can do what I want in the world later". This reasoning is valid regardless of whether the AI has humanlike goals, or has misaligned alien goals.
If that testing setup happens to be a Turing test, it will act so as to pass the Turing test. But if it looks around and sees signs that it is not in a test environment, then it will follow its true goal, whatever that is. And it isn't feasible to make a test environment that looks like the real world to a clever agent that gets to interact with it freely over long durations.
Kinda. There's source code here and you can poke around the API in graphiql. (We don't promise not to change things without warning.) When you get the HTML content of a post/comment it will contain elements that look like <div data-elicit-id="tYHTHHcAdR4W4XzHC">Prediction</div>
(the attribute name is a holdover from when we had an offsite integration with Elicit). For example, your prediction "Somebody (possibly Screwtape) builds an integration between Fatebook.io and the LessWrong prediction UI by the end of July 2025" has ID tYHTHHcAdR4W4XzHC
. A graphql query to get the results:
query GetPrediction {
ElicitBlockData(questionId:"tYHTHHcAdR4W4XzHC") {
_id predictions {
createdAt
creator { displayName }
}
}
}
Some of it, but not the main thing. I predict (without having checked) that if you do the analysis (or check an analysis that has already been done), it will have approximately the same amount of contamination from plastics, agricultural additives, etc as the default food supply.
Studying the diets of outlier-obese people is definitely something should be doing (and are doing, a little), but yeah, the outliers are probably going to be obese for reasons other than "the reason obesity has increased over time but moreso".
We don't have any plans yet; we might circle back in a year and build a leaderboard, or we might not. (It's also possible for third-parties to do that with our API). If we do anything like that, I promise the scoring will be incentive-compatible.
There really ought to be a parallel food supply chain, for scientific/research purposes, where all ingredients are high-purity, in a similar way to how the ingredients going into a semiconductor factory are high-purity. Manufacture high-purity soil from ultrapure ingredients, fill a greenhouse with plants with known genomes, water them with ultrapure water. Raise animals fed with high-purity plants. Reproduce a typical American diet in this way.
This would be very expensive compared to normal food, but quite scientifically valuable. You could randomize a study population to identical diets, using either high-purity or regular ingredients. This would give a definitive answer to whether obesity (and any other health problems) is caused by a contaminant. Then you could replace portions of the inputs with the default supply chain, and figure out where the problems are.
Part of why studying nutrition is hard is that we know things were better in some important way 100 years ago, but we no longer have access to that baseline. But this is fixable.
Sorry about that, a fix is in progress. Unmaking a prediction will no longer crash. The UI will incorrectly display the cancelled prediction in the leftmost bucket; that will be fixed in a few minutes without you needing to re-do any predictions.
You can change this in your user settings! It's in the Site Customization section; it's labelled "Hide other users' Elicit predictions until I have predicted myself". (Our Claims feature is no longer linked to Elicit, but this setting carries over from back when it was.)
You can prevent this by putting a note in some place that isn't public but would be found later, such as a will, that says that any purported suicide note is fake unless it contains a particular password.
Unfortunately while this strategy might occasionally reveal a death to have been murder, it doesn't really work as a deterrent; someone who thinks you've done this would make the death look like an accident or medical issue instead.
Lots of people are pushing back on this, but I do want to say explicitly that I agree that raw LLM-produced text is mostly not up to LW standards, and that the writing style that current-gen LLMs produce by default sucks. In the new-user-posting-for-the-first-time moderation queue, next to the SEO spam, we do see some essays that look like raw LLM output, and we reject these.
That doesn't mean LLMs don't have good use around the edges. In the case of defining commonly-used jargon, there is no need for insight or originality, the task is search-engine-adjacent, and so I think LLMs have a role there. That said, if the glossary content is coming out bad in practice, that's important feedback.
In your climate, defection from the natural gas and electric grid is very far from being economical, because the peak energy demand for the year is dominated by heating, and solar peaks in the summer, so you would need to have extreme oversizing of the panels to provide sufficient energy in the winter.
I think the prediction here is that people will detach only from the electric grid, not from the natural gas grid. If you use natural gas heat instead of a heat pump for part of the winter, then you don't need to oversize your solar panels as much.
If you set aside the pricing structure and just look at the underlying economics, the power grid will still be definitely needed for all the loads that are too dense for rooftop solar, ie industry, car chargers, office buildings, apartment buildings, and some commercial buildings. If every suburban house detached from the grid, these consumers would see big increases in their transmission costs, but they wouldn't have much choice but to pay them. This might lead to a world where downtown areas and cities have electric grids, but rural areas and the sparser parts of suburbs don't.
There's an additional backup-power option not mentioned here, which is that some electric cars can feed their battery back to a house. So if there's a long string of cloudy days but the roads are still usable, you can transport power from the grid to an off-grid house by charging at a public charger, and discharging at home. This might be a better option than a natural-gas generator, especially if it only comes up rarely.
If rural areas switch to a regime where everyone has solar+batteries, and the power grid only reaches downtown and industrial areas... that actually seems like it might just be optimal? The price of disributed generation and storage falls over time, but the cost of power lines doesn't, so there should be a crossover point somewhere where the power lines aren't worth it. Maybe net-metering will cause the switchover to happen too soon, but it does seem like a switchover should happen eventually.
Many people seem to have a single bucket in their thinking, which merges "moral condemnation" and "negative product review". This produces weird effects, like writing angry callout posts for a business having high prices.
I think a large fraction of libertarian thinking is just the abillity to keep these straight, so that the next thought after "business has high prices" is "shop elsewhere" rather than "coordinate punishment".
Nope, that's more than enough. Caleb Ditchfield, you are seriously mentally ill, and your delusions are causing you to exhibit a pattern of unethical behavior. This is not a place where you will be able to find help or support with your mental illness. Based on skimming your Twitter history, I believe your mental illness is caused by (or exacerbated by) abusing Adderall.
You have already been banned from numerous community events and spaces. I'm banning you from LW, too.
Worth noting explicitly: while there weren't any logs left of prompts or completions, there were logs of API invocations and errors, which contained indications that whatever this was, it was still under development and not an already-scaled setup. Eg we saw API calls fail with invalid-arguments, then get retried successfully after a delay.
The indicators-of-compromise aren't a good match between the Permiso blog post and what we see in logs; in particular we see the user agent string Boto3/1.29.7 md/Botocore#1.32.7 ua/2.0 os/windows#10 md/arch#amd64 lang/python#3.12.4 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.32.7
which is not mentioned. While I haven't checked all the IPs, I checked a sampling and they didn't overlap. (The IPs are a very weak signal, however, since they were definitely botnet IPs and botnets can be large.)
Ah, sorry that one went unfixed for as long as it did; a fix is now written and should be deployed pretty soon.
This is a bug and we're looking into it. It appears to be specific to Safari on iOS (Chrome on iOS is a Safari skin); it doesn't affect desktop browsers, Android/Chrome, or Android/Firefox, which is why we didn't notice earlier. This most likely started with a change on desktop where clicking on a post (without modifiers) opens when you press the mouse button, rather than when you release it.
Standardized tests work, within the range they're testing for. You don't need to overthink that part. If you want to make people's intelligence more legible and more provable, what you have is more of a social and logistical issue: how do you convince people to publish their test scores, get people to care about those scores, and ensure that the scores they publish are real and not the result of cheating?
And the only practical way to realize this, that I can think of now, is by predicting the largest stock markets such as the NYSE, via some kind of options trading, many many many times within say a calendar year, and then showing their average rate of their returns is significantly above random chance.
The threshold for doing this isn't being above average relative to human individuals, it's being close to the top relative to specialized institutions. That can occasionally be achievable, but usually it isn't.
The first time you came to my attention was in May. I had posted something about how Facebook's notification system works. You cold-messaged me to say you had gotten duplicate notifications from Facebook, and you thought this meant that your phone was hacked. Prior to this, I don't recall us having ever interacted or having heard you mentioned. During that conversation, you came across to me as paranoid-delusional. You mentioned Duncan's name once, and I didn't think anything of it at the time.
Less than a week later, someone (not mentioned or participating in this thread) messaged me to say that you were having a psychotic episode, and since we were Facebook friends maybe I could check up on you? I said I didn't really know you, so wasn't able to do that.
Months later, Duncan reported that you were harrassing him. Some time after that (when it hadn't stopped), he wrote up a doc. It looks like at some point you formed an obsession about Duncan, reacted negatively to him blocking you, and started escalating. (Duncan has a reputation for blocking a lot of people. I have made the joke that his MtG card says "~ can block any number of creatures".)
But, here's the thing: Duncan's testimony is not the only (or even main) reason why you look like a dangerous person to me. There are subtle cues about the shape of your mental illness strewn through most of what you write, including the public stuff. People are going to react to that by protecting themselves.
I hope that you recover, mental-health-wise. But hanging around this community is not going to help you do that. If anything, I expect lingering here to exacerbate your problems. Both because you're surrounded by burn bridges, and also because the local memeplex has a reputation for having worsened people's mental illness in other, unrelated cases.
A news article reports on a crime. In the replies, one person calls the crime "awful", one person calls it "evil", and one person calls it "disgusting".
I think that, on average, the person who called it "disgusting" is a worse person than the other two. While I think there are many people using it unreflectively as a generic word for "bad", I think many people are honestly signaling that they had a disgust reaction, and that this was the deciding element of their response. But disgust-emotion is less correlated with morality than other ways of evaluating things.
The correlation gets stronger if we shift from talk about actions to talk about people, and stronger again if we shift from talk about people to talk about groups.
LessWrong now has sidenotes. These use the existing footnotes feature; posts that already had footnotes will now also display these footnotes in the right margin (if your screen is wide enough/zoomed out enough). Post authors can disable this for individual posts; we're defaulting it to on because when looking at older posts, most of the time it seems like an improvement.
Relatedly, we now also display inline reactions as icons in the right margin (rather than underlines within the main post text). If reaction icons or sidenotes would cover each other up, they get pushed down the page.
Feedback welcome!