Habryka means we would have to pick one number per Stripe link (e.g. one link for $5/month, one for $100/month, etc.)
Are you checking the box for “Save my info for 1-click checkout with Link”? That’s the only way I’ve figured out to get Stripe to ask for my phone number. If so, you can safely uncheck that.
(Also, I don’t know if it’s important to you, but I don’t think we would see your phone number if you gave it to Stripe.)
What do you mean by A?
Habryka is slightly sloppily referring to using Janus' 'base model jailbreak' for Claude 3.5 Sonnet
as I understand it, the majority of this money will go towards supporting Lighthaven
I think if you take Habryka's numbers at face value, a hair under half of the money this year will go to Lighthaven (35% of core staff salaries @ $1.4M = $0.49M, plus $1M for a deferred interest payment, and then the claim that Lighthaven otherwise breaks even). And in future years, well under half.
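Spelling that arithmetic out (the ~$3M total is my inference from "a hair under half", not a figure quoted in the post):

```latex
0.35 \times \$1.4\text{M} + \$1\text{M} = \$0.49\text{M} + \$1\text{M} = \$1.49\text{M} \approx 0.5 \times \$3\text{M}
```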
I worry that the future of LW will be endangered by the financial burden of Lighthaven
I think this is a reasonable worry, but I again want to note that Habryka is projecting a neutral or positive cashflow from Lighthaven to the org.
That said, I can think of a couple of reasons for financial pessimism[1]. Having Lighthaven makes the org riskier: it involves a bunch of hard-to-avoid costs, so if Lighthaven has a bad year, that does indeed endanger the project as a whole.
Another reason to be worried: Lightcone might stop trying to make Lighthaven break even. Lightcone is currently fairly focused on using Lighthaven in revenue-producing ways. My guess is that we'll always try and structure stuff at Lighthaven such that it pays its own way (for example, when we ran LessOnline we sold tickets[2]). But maybe not! Maybe Lightcone will pivot Lighthaven to a loss-making plan, because it foresees greater altruistic benefit (and expects to be able to fundraise to cover it).
So the bundling of the two projects still leaks some risk from Lighthaven onto LessWrong.
Of course, you might also think Lighthaven makes LessWrong more financially robust, if on the mainline it ends up producing a modest profit that can be used to subsidise LessWrong.
[1] Other than just doubting Habryka's projections, which also might make sense.
[2] My understanding of the numbers is that we lost money once you take into account staff time, but broke even if you don't. And it seems the people most involved with running it are hopeful about cutting a bunch of costs in future.
I worry that, because this hasn't received a reply in a bit, people might think it's not in the spirit of the post. I'm even more worried people might think that critical comments aren't in the spirit of the post.
Both critical comments and high-effort-demanding questions are in the spirit of the post, IMO! But the latter might take a while to get a response.
The EIN is 92-0861538
My impression matches your initial one, to be clear: my point estimate of the median is something like 85%, but my confidence only extends to >50%.
Lightcone is also heterogeneous, but I think it's accurate that the median view at Lightcone is >50% on misaligned takeover
Maybe remove decimal numbers entirely throughout the graphs? This is what it looked like for me, and led to the error. And this image is way zoomed in compared to what I see naturally on my screen.
Good idea. Done.
Yes, I think you're right. I was confused by Shoshannah's numbers last night, but it was late and I didn't manage to summon enough sapience to realise something was wrong and offer a correction. Thanks for doing that!
as applied to them
A slight nitpick: I think this treats how much they like the activity as applied to them as a scalar, but I think it's also plausibly a function of how you, the applier, go about it. Maybe they are very into this activity as applied to them, but not in the way you do it.
If I had to pick a favourite, I'd probably go for Fire and AIs, but The GPT is also great: very much the terrifying sublime.
Perhaps say some words on why they might want to?
Curated! A short and sweet post, with a useful rule of thumb for those who live and die by the Fermi. There are some more good ones in the comments, too!
I don't think this distinction is robust enough to rely on for much of a defensive property. I think it's probably not that hard to think "I probably would have tried something in direction X, or direction Y", and then gather lots of bits about how well the clusters X and Y work.
I'm curious about Claude's coauthorship of this post. How did it help?
The crucial point is: when the untrusted model gets to step 6, it sees R in the history instead of A, so it knows its attempt was replaced.
Why does it know its attempt was replaced? Because it can guess it wouldn't have produced R? If it can tell that, how come it can't guess it specifically would have produced A?
I thought this title meant the post would be making a case from conservative (i.e. minimal) assumptions.
Maybe change the title to "making a politically conservative case for alignment" or something?
I wonder what the lifetime spend on dating apps is. I expect that for most people who ever pay it's >$100
I think the credit assignment is legit hard, rather than just being a case of bad norms. Do you disagree?
I would guess they tried it because they hoped it would be competitive with their other product, and sunset it because that didn't happen with the amount of energy they wanted to allocate to the bet. There may also have been an element of updating more about how much focus their core product needed.
I only skimmed the retrospective now, but it seems mostly to be detailing problems that stymied their ability to find traction.
It's possible no one tried literally "recreate OkC", but I think dating startups are very oversubscribed by founders, relative to interest from VCs [1] [2] [3] (and I think VCs are mostly correct that they won't make money [4] [5]).
(Edit: I want to note that those are things I found after a bit of googling to see if my sense of the consensus was borne out; they are meant in the spirit of "several samples of weak evidence")
I don't particularly believe you that OkC solves dating for a significant fraction of people. IIRC, a previous time we talked about this, @romeostevensit suggested you had not sufficiently internalised the OkCupid blog findings about how much people prioritised physical attraction.
You mention manifold.love, but also mention it's in maintenance mode – I think because the type of business you want people to build does not in fact work.
I think it's fine to lament our lack of good mechanisms for public good provision, and claim our society is failing at that. But I think you're trying to draw an update that's something like "tech startups should be doing an unbiased search through viable valuable businesses, but they're clearly not", or maybe, "tech startups are supposed to be able to solve a large fraction of our problems, but if they can't solve this, then that's not true", and I don't think either of these conclusions seems that licensed by the dating data point.
Yes, though I'm not confident.
I saw this poll and thought to myself "gosh, politics, religion and cultural opinions sure are areas where I actively try to be non-heroic, as they aren't where I wish to spend my energy".
They load it in as a web font (i.e. you load Calibri from their server when you load that search page). We don't do that on LessWrong
Yeah, that's a Google Easter egg. You can also try "Comic Sans" or "Trebuchet MS".
One sad thing about older versions of Gill Sans: capital I, lowercase l, and the digit 1 ("Il1") all look the same. Nova at least distinguishes the 1.
IMO, we should probably move towards system fonts, though I would like to choose something that preserves character a little more.
I don't think we've changed how often we use serifs vs sans serifs. Is there anything particular you're thinking of?
@gwern I think it prolly makes sense for me to assign this post to your account? Let me know if you're OK with that.
For me, Dark Forest Theory reads strongly as "everyone is hiding, (because) everyone is hunting", rather than just "everyone is hiding".
From the related book The Elephant in the Brain:
Here is the thesis we’ll be exploring in this book: We, human beings, are a species that’s not only capable of acting on hidden motives—we’re designed to do it. Our brains are built to act in our self-interest while at the same time trying hard not to appear selfish in front of other people. And in order to throw them off the trail, our brains often keep “us,” our conscious minds, in the dark. The less we know of our own ugly motives, the easier it is to hide them from others.
I think Steve Hsu has written some about the evidence for additivity on his blog (Information Processing). He also talks about it a bit in section 3.1 of this paper.
It seems like there's a general principle here, that it's hard to use pure empiricism to bound behaviour over large input and action spaces. You either need to design the behaviour, or understand it mechanistically.
I don't understand why you would short the market if your P(Doom) is high. I think most Dooms don't involve shorts paying off?
ANT has a stronger safety culture, and so it is a more pleasant experience to work at ANT for the average safety researcher. This suggests that there might be a systematic bias towards ANT that pulls away from the "optimal allocation".
I think this depends on whether you think AI safety at a lab is more of an O-ring process or a swiss-cheese process. Also, if you think it's more of an O-ring process, you might be generally less excited about working at a scaling lab.
the idea that social media was sending them personalized messages
I imagine they were obsessed with false versions of this idea, rather than obsessed with (actual) targeted advertising?
I'm not sure I'm understanding your setup (I only skimmed the post). Are you using takeoff to mean something like "takeoff from now", or "takeoff from [some specific event that is now in the past]"? If I look at your graph at the end, it looks to me like "Paul Slow" is a faster timeline but a longer takeoff (Paul Slow's takeoff beginning near the beginning of the graph, and Fast takeoff beginning around the intersection of the two blue lines).
Wasn't the relevant part of your argument like, "AI safety research outside of the labs is not that good, so that's a contributing factor among many to it not being bad to lose the ability to do safety funding for governance work"? If so, I think that "most of OpenPhil's actual safety funding has gone to building a robust safety research ecosystem outside of the labs" is not a good rejoinder to "isn't there a large benefit to building a robust safety research ecosystem outside of the labs?", because the rejoinder is focusing on relative allocations within "(technical) safety research", and the complaint was about the allocation between "(technical) safety research" vs "other AI x-risk stuff".
I've not seen the claim that the scaling laws are bending. Where should I look?
possible worlds that split off when the photon was created
I don't think this is a very good way of thinking about what happens. I think worlds appear as fairly robust features of the wavefunction when quantum superpositions get entangled with large systems that differ in lots of degrees of freedom based on the state of the superposition.
So, when the intergalactic photon interacts non-trivially with a large system (e.g. Earth), a world becomes distinct in the wavefunction, because there's a lump of amplitude that is separated from other lumps of amplitude by distance in many, many dimensions. This means it basically doesn't interact with the rest of the wavefunction, and so looks like a distinct world.
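To make "separated by distance in many, many dimensions" a bit more concrete, here is the standard decoherence sketch (my notation, not anything from the post): a superposition entangles with an environment that has an enormous number of degrees of freedom,

```latex
(\alpha\,|0\rangle + \beta\,|1\rangle)\otimes|E\rangle
\;\longrightarrow\;
\alpha\,|0\rangle|E_0\rangle + \beta\,|1\rangle|E_1\rangle,
\qquad \langle E_0 | E_1 \rangle \approx 0 .
```

Because the environment states |E_0⟩ and |E_1⟩ differ in so many degrees of freedom, their overlap is essentially zero, so the two terms no longer interfere and each behaves like a separate world.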
I tried to replicate. At 20 it went on to 25, and I explained what it got wrong. I tried again. I interrupted at 6 and it stopped at 7, saying "Gotcha, stopped right at eleven!". I explained what happened and it said something like "Good job, you found the horrible, marrow cricket" (these last 3 words are verbatim) and then broke.
Thanks. I think a bunch of discussions I've seen or been part of could have been more focused by establishing whether the crux was "1 is bad" vs "I think this is an instance of 3, not 1".
IMO, pro Slack instances are wonderful for searching & good for many different kinds of media, though not mixed media (i.e. you can upload videos, photos, pdfs (and search over them all, including with speech recognition!) but inserting photos into a message is annoying).
I'm not really familiar with Zulip or Discord.
(Also, I'm not sure whether pro Slack instances really qualify for (2) anymore.)
I don't know what "private" means to you, but if you just mean you can control who joins, I think google groups are a good choice for 2 - 4.
Zulip, Discord and Slack are all options as well, though they all (to differing degrees) encourage shorter, chattier posts.
I also expect it would be a bit more expensive than something like Said’s suggestions
Is the central argumentative line of this post that high-quality & informative text in the training distribution rarely corrects itself, post-training locates the high-quality part of the distribution, and so LLMs rarely correct themselves?
Or is it the more specific claim that post-training is locating parts of the distribution where the text is generated by someone in a context that highlights their prestige from their competence, and such text rarely corrects itself?
I don't see yet why the latter would be true, so my guess is you meant the former. (Though I do think the latter prompt would more strongly imply non-self-correction).
I'm not sure whether this is important to the main thrust of the post, but I disagree with most of this paragraph:
Again, they're an expert in the field -- and this is the sort of claim that would be fairly easy to check even if you're not an expert yourself, just by Googling around and skimming recent papers. It's also not the sort of claim where there's any obvious incentive for deception. It's hard to think of a plausible scenario in which this person writes this sentence, and yet the sentence is false or even controversial.
In my experience, it's quite hard to check what "the gold standard" of something is, particularly in cutting-edge research fields. There are lots of different metrics on which methods compete, and it's hard to know their importance as an outsider.
And the obvious incentive for deception is that the physics prof works on NPsM, and so is talking it up (or has developed a method that beats NPsM on some benchmark, and so is talking it up to impress people with their new method ...)
Regarding the sign of Lightcone Offices: I think one sort of score for a charity is the stuff that it has done, and another is the quality of its generator of new projects (and the past work is evidence for that generator).
I'm not sure exactly the correct way to combine those scores, but my guess is most people who think the offices and their legacy were good should like us having money because of the high first score. And people who think they were bad should definitely be aware that we ran them (and chose to close them) when evaluating our second score.
So, I want us to list it on our impact track record section, somewhat regardless of sign.
What are the semantics of "otherwise"? Are they more like:

X otherwise Y ↦ X → ¬Y

or

X otherwise Y ↦ X ↔ ¬Y?
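To make the difference between the two readings concrete, here's a quick truth-table check (a minimal sketch; the encodings are just the two candidates above):

```python
from itertools import product

# Two candidate readings of "X otherwise Y":
#   reading 1:  X → ¬Y   (if X holds, Y doesn't; says nothing when X is false)
#   reading 2:  X ↔ ¬Y   (exactly one of X and Y holds)
def reading_1(x: bool, y: bool) -> bool:
    return (not x) or (not y)   # material conditional X → ¬Y

def reading_2(x: bool, y: bool) -> bool:
    return x == (not y)         # biconditional X ↔ ¬Y

for x, y in product([True, False], repeat=2):
    print(f"X={x!s:5} Y={y!s:5}  reading1={reading_1(x, y)!s:5}  reading2={reading_2(x, y)}")

# The two readings differ only at X=False, Y=False:
# reading 1 is satisfied there, reading 2 is not.
```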