One of my takeaways from how the negotiations went is that sama seems extremely concerned with securing access to lots of compute, and that the person who ultimately got their way was the person sitting on the compute.
The "sama running Microsoft" idea seems a bit magical to me. Surely the realpolitik update here should be: power lies in the hands of those with legal voting power, and those controlling the compute. Sama has neither of those things at Microsoft. If he can be fired by a board most people have never heard of, then for sure he can get fired by the CEO of Microsoft.
People seem to think he is somehow a linchpin of building AGI. Remind me... how many of OpenAI's key papers did he coauthor? Paul Graham says if you dropped him into an island of cannibals he would be king in 5 years. Seems plausible. Paul Graham did not say he would've figured out how to engineer a raft good enough to get him out of there. If there were any Manifold markets on "Sama is the linchpin to building AGI", I would short them for sure.
We already have strong suspicion from the open letter vote counts there's a personality cult around Sama at OpenAI (no democratic election ever ends with a vote of 97% in favor). It also makes sense people in the LessWrong sphere would view AGI as the central thing to the future of the world and on everyone's minds, and thus fall in the trap of also viewing Sama as the most important thing at Microsoft. (Question to ask yourself about such a belief: who does it benefit? And is that beneficiary also a powerful agent deliberately attempting to shape narratives to their own benefit?)
Satya Nadella might have a very different perspective than that, on what's important for Microsoft and who's running it.
If there was actually a spooky capabilities advance that convinced the board that drastic action was needed, then the board's actions were on net justified, regardless of what other dynamics were at play and whether cooperative principles were followed.
If the board neither abided by cooperative principles in the firing nor acted on substantial evidence warranting the firing in line with the charter, and was nonetheless largely EA motivated, then EA should be disavowed and dismantled.
Those are some interesting papers, thanks for linking.
In the case at hand, I do disagree with your conclusion though.
In this situation, the most a user could find out is who checked them in dialogues. They wouldn't be able to find any data about checks not concerning themselves.
If they happened to be a capable enough dev and were willing to go through the schleps to obtain that information, then, well... we're a small team and the world is on fire, and I don't think we should really be prioritising making Dialogue Matching robust to this kind of adversarial cyber threat for information of comparable scope and sensitivity! Folks with those resources could probably uncover all kinds of private vote data already, if they wanted to.
Here's some quick notes on how I think of LessWrong user data.
Any data that's already public -- reacts, tags, comments, etc -- is fair game. It just seems nice to do some data science and help folks uncover interesting patterns here.
On the other side of the spectrum, the team and I generally never look at users' upvotes and downvotes, except in cases where there's strong enough suspicion of malicious voting behavior (like targeted mass downvoting).
Then there's stuff in the middle. Like, what if we tell a user "you and this user frequently upvote each other"? That particular example currently feels like it reveals too much private data. As another example, the other day a teammate and I discussed whether, on the matchmaking page, we could show people recently active users who had already checked them, to make it more likely they'd find a match. We tentatively postulated it would be fine to do this as long as seeing a name on your match page gave no more than about a 5:1 update toward those people having checked you. We sketched out some algorithms to implement this that would also be stable under repeated refreshing and the like. (We haven't implemented the algorithm or the feature yet.)
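To illustrate the shape of that bound (a hypothetical sketch, not the algorithm we actually drafted -- the function names and probabilities here are made up for illustration): if you show people who checked you with probability p, you must also show people who didn't with probability at least p/5, and the "coin flips" need to be a deterministic function of the (viewer, candidate) pair so that refreshing the page doesn't leak extra bits.

```python
import hashlib

MAX_ODDS = 5.0  # cap on the likelihood-ratio update a viewer may get


def stable_coin(viewer_id: str, candidate_id: str) -> float:
    """Deterministic 'coin flip' in [0, 1), stable across page refreshes,
    derived from a hash of the (viewer, candidate) pair."""
    h = hashlib.sha256(f"{viewer_id}:{candidate_id}".encode()).hexdigest()
    return int(h[:8], 16) / 16**8


def pick_suggestions(viewer_id, checkers, others, p_checker=0.5):
    """Show users who checked the viewer with probability p_checker, and
    other recently active users with probability p_checker / MAX_ODDS.
    Then P(shown | checked) / P(shown | not checked) <= MAX_ODDS, so seeing
    a name is at most a 5:1 update toward 'this person checked me'."""
    p_other = p_checker / MAX_ODDS
    shown = [c for c in checkers if stable_coin(viewer_id, c) < p_checker]
    shown += [c for c in others if stable_coin(viewer_id, c) < p_other]
    return shown
```

Hashing the pair (rather than calling a random number generator) is what makes the suggestion list stable: refreshing re-derives the same coins, so an adversarial user can't average over many reloads to sharpen the 5:1 bound.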
So my general take on features "in the middle" is, for now, to treat them case by case, guided by principles like: try hard to avoid revealing anything that's not already public; if you do, leave plausible deniability bounded by some number of leaked bits; only reveal metadata or aggregate data; reveal it only to one other user or a small set of users; think about whether the info is actually high or low stakes; and see if you can get away with only using data from people who opted in to revealing it.
Space flight doesn't involve a 100 percent chance of physical death
I think historically folks have gone to war or on other kinds of missions that had death rates of like, at least, 50%. And folks, I dunno, climb Mount Everest, or figured out how to fly planes before they could figure out how to make them safe.
Some of them were for sure fanatics or lunatics. But I guess I also think there are just great, sane, and in many ways whole, people, who care about things greater than their own personal life and death, and are psychologically constituted to be willing to pursue those greater things.
GPT4 can't solve IMO problems. Now take an IMO gold medalist about to walk into their exam, and upload them at that state into an Em without synaptic plasticity. Would the resulting upload still be able to solve the exam at a similar level to the full human?
I don't have a strong belief, but my intuition is that they would. I recall once chatting to @Neel Nanda about how he solved problems (as he is in fact an IMO gold winner), and recall him describing something that to me sounded like "introspecting really hard and having the answers just suddenly 'appear'..." (though hopefully he can correct that butchered impression)
Do you think such a student Em would or would not perform similarly well in the exam?
Separately, I'm kind of awed by the idea of an "uploadonaut": the best and brightest of this young civilisation, undergoing extensive mental and research training to have their minds able to deal with what they might experience post upload, and then courageously setting out on a dangerous mission of crucial importance for humanity.
(I tried generating some Dall-E 1960's style NASA recruitment posters for this, but they didn't come out great. Might try more later)
Noting that I gave this a weak downvote, as I found this comment to be stating many strong claims without correspondingly strong (or sometimes any) arguments. I am still interested in the reasons you believe these things, though (for example, a fermi on inference cost at runtime).
I don't think you're going to get a lot of volunteers for destructive uploading (or actually even for nondestructive uploading). Especially not if the upload is going to be run with limited fidelity. Anybody who does volunteer is probably deeply atypical and potentially a dangerous fanatic.
Seems falsified by the existence of astronauts?
Comment by jacobjacob on [deleted post]
Reference class: I'm old enough to remember the founding of the Partnership on AI. My sense from back in the day was that some (innocently misguided) folks wanted in their hearts for it to be an alignment collaboration vehicle. But I think it's decayed into some kind of epiphenomenal social justice thingy. (And for some reason they have 30 staff. I wonder what they all do all day.)
I hope Frontier Model Forum can be something better, but my hopes ain't my betting odds.
What played essentially no role in any of it, as far as I can tell, was AI.
One way I would expect it to play a role at this stage, would be armies of bot commenters on X and elsewhere, giving monkey brain the impression of broad support for a position. (Basically what Russia has been doing for ages, but now AI enabled.)
I haven't been able to tell whether or not that happened. Have you?
Hm, I was a bit confused reading this. My impression was "seems like there are multiple viable solutions", but then they were discarded for reasons that seemed kind of tangential, or not dealbreakers to me -- places where some extra fiddling might've done the trick?
If I get the time later will write up more concretely why some of them still seemed promising.
If you don't want to put your questions in public, there's a form you can fill in, where only the LessWrong team sees your suggestions, and we'll do the matchmaking to connect you with someone only if it seems like a good mutual fit :)
I'm often very up for interviewing people (as I've done here and here) -- if I have some genuine interest in the topic, and if it seems like LessWrong readers would like to see the interview happen. So if people are looking for someone-to-help-you-get-your-thoughts-out, I might be down. (See also the interview request form.)
For interviewing folks, I'm interested in more stuff than I can list. I'll just check what others are into being interviewed about and if there's a match.
For personally dialogueing, topics I might want to do (writing these out quickly and roughly, rather than not at all!):
metaphors and isomorphisms between lean manufacturing <> functional programming <> maneuver warfare, as different operational philosophies in different domains that still seem to be "climbing the same mountain from different directions"
state space metaphors applied to operations. i have some thoughts here, for thinking about how effective teams and organisations function, drawing upon a bunch of concepts and abstractions that I've mostly found around writing by wentworth and some agent foundations stuff... I can't summarise it succinctly, but if someone is curious upon hearing this sentence, we could chat
strategically, what are technologies such that 1) in our timeline they will appear late (or too late) on the automation tree, 2) they will be blocking for accomplishing certain things, and 3) there's tractable work now for causing them to happen sooner? For example: will software and science automation progress to a point where we will be able to solve a hard problem like uploading, in a way that then leaves us blocked on something like "having 1000 super-microscopes"? And if so, should someone just go try to build those microscopes now? Are there other examples like this?
I like flying and would dialogue about it :)
enumerative safety sounds kind of bonkers... but what if it isn't? And beyond that, what kind of other ambitious, automated, alignment experiments would it be good if someone tried?
Cyborg interfaces. What are natural and/or powerful ways of exploring latent space? I've been thinking about sort of mindlessly taking some interfaces over which I have some command -- a guitar, a piano, a stick-and-rudder -- and hooking them up to something allowing me to "play latent space" or "cruise through latent space". What other metaphors are there here? What other interfaces might be cool to play around with?
So if some boss often drove his employees to tears, as long as he was pretty insightful, you don't think that the employees should be able to know before taking the job? Surely that's not your position. But then what is?
I wanted to add a perspective to the conversation that I didn't see mentioned, moreso than advocating a very thought out position. I have conflicting intuitions, and the territory seems messy!
On the one hand, it does seem to me like there should be some kind of "heads up about intensity". It's real bad to create hidden slippery slopes along the intensity scale. It's real bad to first make people dependent on you (by, say, paying most of their salary in yet-to-be-vested equity, or making them work long enough that they can't explore external opportunities and maintain outside friends) and then shift into a potentially abusive stance (heavily frame controlling, demoralising, etc). It is when these kinds of pressures are applied that I think things move into unacceptable territory. (And my suggested community response would probably be something like: "Sandbox the culprit in ways where they're able to remain highly productive while doing less damage, give people accurate indications about their style (conveying this might actually fall on someone other than the culprit to do -- that division of labor might just be our only way to get all the good stuff here!), and avoid giving people inaccurate impressions or being a wide-eyed feeder school.")
For comparison, when I imagine pursuing a career in investment banking, it seems like I'd be opting into a shark tank. I'm just kind of accepting there'll be some real abusive folks around, following the $$$, and I'll be betting on my ability to navigate that without losing myself in the process. Being part of a healthy community means somehow having people around me who can help me see these things. I do think there are some young undergrads who naively will believe the faceless Goldman ads. I feel like Taleb would have a word for them -- the "sucker" or the "Intellectual Yet Idiot". They'll get hurt, and this is bad, and the recruiting ads that led them into this are immoral.
(From that perspective, I'm pretty into my straw version of military ads, which is more like "You'll have the worst time of your life and be tested to your limits. You're too weak for this. But you'll gain glory. Sign up here.")
On the other hand, I also have the intuition that requesting of individual researchers that they signpost and warn about their unusual communication style seems to be locating the onus of this in the wrong location... and I kind of just don't expect it to work, empirically? I feel like the getting-a-job-at-MIRI pipeline should somehow make it clear to people what level of f*ckery is about to happen to them, insofar as it is. I currently don't know whose responsibility I think that is (and I'm shipping this comment in a confused state, rather than not shipping it at all).
I don’t really know what I think about retrospectives in general, and I don’t always find them that helpful, because causal attribution is hard. Nonetheless, here are some reasons why I wanted to curate this:
I like that it both covers a broad spectrum of the stuff that influences a research project and manages to go into some interesting detail: the high-level research direction and its feasibility, concrete sub-problems attempted and their outcomes, particular cognitive and problem-solving strategies that were tried, as well as motivational and emotional aspects of the project. Hearing about what it was like when the various agent confusions collided with the various humans involved was quite interesting, and I feel like it actually gave me a somewhat richer view of both.
It discusses some threads that seem important and that I’ve heard people talk about offline, but that I’ve seen less discussed online recently (the impact of infohazard policies, the ambiguous lines between safety and capabilities research and how different inside views might cause one to pursue one rather than the other, and people’s experience of interfacing with the MIRI_Nate way of doing research and communicating).
It holds different perspectives from the people involved in the research group, and I like how the dialogues feature kind of allows each to coexist without feeling a need to squeeze them into a unified narrative (the way things might feel if one were to coauthor a post or paper).
(Note: I chose to curate this, and I am also listed as a coauthor. I think this is fine because ultimately the impetus for writing up this content came from Thomas. Raemon and I mostly just served as facilitators and interlocutors helping him get this stuff into writing.)
I did undergrad and grad school in neuroscience and can at the very least say that this was also my conclusion.
I remember the introductory lecture for the Cognitive Neuroscience course I took at Oxford. I won't mention the professor's name, because he's got his own lab and is all senior and stuff, and might not want his blunt view to be public -- but his take was "this field is 95% nonsense. I'll try to talk about the 5% that isn't". Here's a lecture slide:
Wanted to briefly add a perspective I didn't see mentioned yet --
First -- seems like you had a particularly rough interaction, and I do want to express empathy for that. I feel like I recognise some of the things you point to, and think it's plausible that I might have been similarly demoralised by that situation, and that would really suck for me and I'd be really sad. So, genuinely sorry about that. I hope you'll find ways to regain motivation that was unfairly lost, and the ability to draw on insights that ended up involuntarily screened off from you.
Second, the perspective I've come to hold for these situations is... Basically the world does seem full of people who are extraordinarily productive in important ways, and who also... are kind of d*cks. (Important footnote: )
I think exceptional people are sufficiently rare that, as things go, I'd rather take a bunch of productive d*cks than tune down their cognitive spikiness at the cost of dulling the productive peaks.
I observe and am strategic about how I allocate my soul and motivation points to things. In the past I would kind of always pour full soul into things, but that led to a lot of sadness, because other people by default might not be able to hold things that are precious to me, and if I unilaterally pour it on them, they also really don't have a responsibility to hold it! Ouch.
I try to satisfy different needs from different people. In various professional domains I'll be pretty thick skinned and put up with a lot of nonsense to extract interesting insights from people or get things done. Then with my partner or close friends I'll do a bunch of stuff that's emotionally nurturing and care a lot about holding aspects of each other's experience in ways that aren't rude.
I beware of people employing dynamics that get inside and mess with my OODA loop, and have various allergic responses to this, and might often explicitly limit interactions, or hold the interaction in a particular way. Regardless of whether they've advertised being unusual in this regard, I just kind of have a way of holding my guard up.
I think holding this stance is my best strategy for getting around. Man, sometimes you gain so much great stuff from people who are rude, or weird, or norm-violating in various other ways, and I think "developing your own set of personal strategies that allow you to put up with stuff" can be a pretty decent superpower, judiciously deployed.
if not, face the normal consequences for being rude, like 'proportional loss of social regard'
So in light of the above, the way I orient to this would be something like: if someone is really killing it in terms of intellectual insight, or just getting important shit done -- that's the primary thing I care about around here (on LessWrong and the broader ecosystem). I'll try hard to carve a space for them to get those things out. If they're also a d*ck, I'll probably avoid inviting them to friendly gatherings I organise, and I might even just not have them work closely on my team specifically, because it'll mess too much with my OODA loop, and I want a certain culture.
But I do not think they have a responsibility to proactively inform people about their style.
On a community-wide level, the ratio of reward I'd give out for insight/productivity vs punishment for rudeness is like at least 30:1 or something, on some imaginary scale. I don't like rudeness and work best among people who are pretty empathetic and nurturing; but hey, the world is what it is, I'll take what I can get, and think this is the best point on the tradeoff curve.
(And again, to reiterate, I do hope you also have or will find a way to orient to these things where you can gain the good insights + motivation, and avoid taking annoying hit points!)
Note: I don't want to make any strong claims here about how insightful, or how much of a d*ck, this one particular empirical guy Nate is, with whom I have only interacted very little (though I like his blog posts!). Don't construe my comment as claiming that he is actually either of those things!
So, when a human lies over the course of an interaction, they'd be holding a hidden state in mind throughout. However, an LLM wouldn't carry any cognitive latent state over between telling the lie, and then responding to the elicitation question. I guess it feels more like "I just woke up from amnesia, and seems I have just told a lie. Okay, now what do I do..."
Stating this to:
1. Verify that this is indeed how the paper works, and that there's no way of passing latent state that I missed, and
2. Ask: any thoughts on how this affects the results and approach?
A program-like data structure is natural for representing locality + symmetry
Didn't quite get this from the lecture. For one, every rookie programmer has probably experienced that programs can work in ways with mysterious interactions that sure don't seem very local... but maybe in your case you'd still say that at the end of the day it would all just be unpackable into a graph of function calls, respecting locality at each step?
Question: what's an example of a data structure very similar to program-like ones, while failing to respect locality + symmetry?
I was only able to quickly skim this during my morning meeting, so might have missed a relevant point addressing this; but my first thought on seeing the results is "Sounds like you successfully trained a cohort of potential capabilities researchers"
Making the building simple, with repeated components (the window example was a great one) is a better answer
Yeah... I was once working on a remodeling project, and had the "clever" idea that we could save time by only selectively demoing certain sections. "Tear down this wall, but leave this window-sill, and this doorframe looks good, leave that too, oh and maybe leave this section of drywall which looks fine"...
Terrible idea. Crews got confused and paralyzed. I now believe it's much faster to just give clear and simple instructions -- "tear it all down to the studs". In the chaos and complexity of dealing with a building, simple instructions allow crews to move more independently and make their own decisions, and also makes it more feasible to deploy more labor (as it's easier to onboard and delegate).