More insightful than what is conserved under the scaling symmetry of ReLU networks is what is not conserved: the gradient. Scaling $w_{in}$ by $\alpha$ scales $\partial L/\partial w_{in}$ by $1/\alpha$ and $\partial L/\partial w_{out}$ by $\alpha$, which means that we can obtain arbitrarily large gradient norms by simply choosing small enough $\alpha$. And in general, bad initializations can induce large imbalances in how quickly the parameters on either side of the neuron learn.
Some time ago I tried training some networks while setting these symmetries to the values that would minimize the total gradient norm, effectively trying to distribute the gradient norm as equally as possible throughout the network. This significantly accelerated learning, and allowed extremely deep (100+ layers) networks to be trained without residual layers. This isn't that useful for modern networks because batchnorm/layernorm seems to effectively do the same thing, and isn't dependent on having ReLU as the activation function.
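To make the gradient imbalance concrete, here is a minimal sketch on a single scalar ReLU neuron. The names `w_in`, `w_out` and `alpha`, the squared-error loss, and the "balanced" choice of `alpha` are all illustrative assumptions; the balancing rule is just one way to read "distribute the gradient norm as equally as possible", not necessarily the procedure used in the experiments described above.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def forward_and_grads(w_in, w_out, x, target):
    """Output and gradients for loss = 0.5 * (w_out * relu(w_in * x) - target)**2."""
    h = relu(w_in * x)
    y = w_out * h
    err = y - target
    g_in = err * w_out * x if w_in * x > 0.0 else 0.0  # dL/dw_in
    g_out = err * h                                      # dL/dw_out
    return y, g_in, g_out

def rescale(w_in, w_out, alpha):
    """Apply the ReLU scaling symmetry (alpha > 0 leaves the output unchanged)."""
    return alpha * w_in, w_out / alpha

x, target = 1.5, 2.0
w_in, w_out = 0.8, -0.5
_, g_in0, g_out0 = forward_and_grads(w_in, w_out, x, target)

# alpha that equalises the two gradient magnitudes; it also minimises
# g_in0**2 / alpha**2 + alpha**2 * g_out0**2 over alpha > 0.
balanced = math.sqrt(abs(g_in0 / g_out0))

for alpha in (1.0, 0.1, 0.01, balanced):
    y, g_in, g_out = forward_and_grads(*rescale(w_in, w_out, alpha), x, target)
    print(f"alpha={alpha:.4f}  y={y:.4f}  grad norm={math.hypot(g_in, g_out):.4f}")
```

The output `y` is the same for every `alpha`, while the gradient norm blows up as `alpha` shrinks and is smallest at the balanced value.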
Thus, the γ value is a “conserved quantity” under gradient descent associated with the symmetry. If the symmetry only holds for a particular solution in some region of the loss landscape rather than being globally baked into the architecture, the γ value will still be conserved under gradient descent so long as we're inside that region.
Minor detail, but this is false in practice because we are doing gradient descent with a non-zero learning rate, so there will be some diffusion between different hyperbolas in weight space as we take gradient steps of finite size.
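A small sketch of that diffusion, under stated assumptions: a single scalar ReLU neuron with squared-error loss, where the conserved quantity is taken to be `gamma = w_in**2 - w_out**2` (the hyperbola label). That form of γ, the starting weights and the function names are illustrative choices, not taken from the post. For this toy model one gradient step changes γ by exactly `lr**2 * (g_in**2 - g_out**2)`, so the drift vanishes only in the limit of zero step size.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def grads(w_in, w_out, x, target):
    """Gradients of 0.5 * (w_out * relu(w_in * x) - target)**2."""
    h = relu(w_in * x)
    err = w_out * h - target
    g_in = err * w_out * x if w_in * x > 0.0 else 0.0
    g_out = err * h
    return g_in, g_out

def gamma(w_in, w_out):
    # Assumed form of the conserved quantity for this toy neuron.
    return w_in ** 2 - w_out ** 2

def gamma_drift(lr, total_time=5.0, x=1.0, target=1.0):
    """Run plain gradient descent and return |gamma_end - gamma_start|."""
    w_in, w_out = 1.5, 0.2
    start = gamma(w_in, w_out)
    for _ in range(round(total_time / lr)):
        g_in, g_out = grads(w_in, w_out, x, target)
        w_in, w_out = w_in - lr * g_in, w_out - lr * g_out
    return abs(gamma(w_in, w_out) - start)

# Same total "training time", smaller and smaller steps.
for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: drift in gamma = {gamma_drift(lr):.6f}")
```

The drift shrinks as the learning rate shrinks, which is the sense in which γ is conserved only under gradient flow.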
I suspect the expert judges would need to resort to known jailbreaking techniques to distinguish LLMs. A fair and interesting test might be against expert-but-not-in-ML judges.
Sorry to be blunt, but any distraction filter that can be disabled through the chrome extension menu is essentially worthless. Speaking from experience, for most people this will work for exactly 3 days until they find a website they really want to visit and just "temporarily" disable the extension in order to see it.
For #5, I think the answer would be to make the AI produce the AI safety ideas which not only solve alignment, but also yield some aspect of capabilities growth along an axis that the big players care about, and in a way where the capabilities are not easily separable from the alignment. I can imagine this being the case if the AI safety idea somehow makes the AI much better at instruction-following using the spirit of the instruction (which is after all what we care about). The big players do care about having instruction-following AIs, and if the way to do that is to use the AI safety book, they will use it.
Do you expect LeCun to have been assuming that the entire field of RL stops existing in order to focus on his specific vision?
Very many things wrong with all of that:
- RL algorithms don't minimize costs, but maximize expected reward, which can well be unbounded, so it's wrong to say that the ML field only minimizes cost.
- LLMs minimize the expected negative log probability of the correct token, which is indeed bounded at zero from below, but achieving zero in that case means perfectly predicting every single token on the internet.
- The boundedness of the thing you're minimizing is totally irrelevant, since maximizing $f(x)$ is exactly the same as maximizing $g(f(x))$ where $g$ is a monotonically increasing function. You can trivially turn a bounded function into an unbounded one without changing the solution set.
- Even if utility is bounded between 0 and 1, an agent maximizing the expected utility will still never stop, because you can always decrease the probability you were wrong. Quadruple-check every single step and turn the universe into computronium to make sure you didn't make any errors.
This is very dumb, LeCun should know better, and I'm sure he *would* know better if he spent 5 minutes thinking about any of this.
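The monotonic-transformation point in the list above can be checked in a few lines. This is only an illustrative sketch: the utility `x**2 / (1 + x**2)`, the transform `u / (1 - u)` and the candidate list are made-up examples, chosen so that the first is bounded in [0, 1) and the second is strictly increasing and unbounded on that range.

```python
def bounded_utility(x):
    # Always in [0, 1): bounded above by 1.
    return x * x / (1.0 + x * x)

def unbound(u):
    # Strictly increasing on [0, 1) and unbounded above.
    return u / (1.0 - u)

candidates = [-4.0, -1.0, 0.0, 2.0, 3.5]

best_bounded = max(candidates, key=bounded_utility)
best_unbounded = max(candidates, key=lambda x: unbound(bounded_utility(x)))

rank_bounded = sorted(candidates, key=bounded_utility)
rank_unbounded = sorted(candidates, key=lambda x: unbound(bounded_utility(x)))

print(best_bounded, best_unbounded)
print(rank_bounded == rank_unbounded)
```

Both objectives pick the same maximiser and induce the same ranking, so whether the objective is bounded says nothing about what the optimiser will do.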
The word "privilege" has been so tainted by its association with guilt that it's almost an infohazard to think you've got privilege at this point: it makes you lower your head in shame at having more than others, and brings about a self-flagellation sort of attitude. It elicits an instinct to lower yourself rather than bring others up. The proper reaction to all the things you've listed is gratitude for your circumstances and compassion towards those who don't have them. And certainly everyone should be very careful about any instinct they have to publicly "acknowledge their privilege"... it's probably your status-raising instincts having found a good opportunity to boast about your intelligence, appearance and good looks while appearing like you're being modest.
Weird side effect to beware of with retinoids: they make dry eyes worse, and in my experience this can significantly decrease your quality of life, especially if it prevents you from sleeping well.
Basically, this shows that every term in a standard Bayesian inference, including the prior ratio, can be re-cast as a likelihood term in a setting where you start off unsure about what words mean, and have a flat prior over which set of words is true.
If the possible meanings of your words are a continuous one-dimensional variable x, a flat prior over x will not be a flat prior if you change variables to y = f(x) for an arbitrary bijection f, and the construction would be sneaking in a specific choice of function f.
Say the words are utterances about the probability of a coin falling heads, why should the flat prior be over the probability p, instead of over the log-odds log(p/(1-p)) ?
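To spell out the coin example as a worked change of variables (the symbols $\pi_p$, $\pi_y$ and $\sigma$ are introduced here only for illustration): take a flat prior $\pi_p(p) = 1$ on $p \in (0, 1)$ and reparametrize by the log-odds $y = \log\frac{p}{1-p}$, so that $p = \sigma(y) = \frac{1}{1 + e^{-y}}$. The induced prior on $y$ is

$$
\pi_y(y) = \pi_p\big(\sigma(y)\big)\left|\frac{dp}{dy}\right| = \sigma(y)\big(1 - \sigma(y)\big) = \frac{e^{y}}{(1 + e^{y})^{2}},
$$

which peaks at $y = 0$ and decays in both tails, so it is not flat. Going the other way, a flat (improper) prior on $y$ corresponds to $\pi_p(p) \propto \frac{1}{p(1-p)}$, which piles up its mass near $p = 0$ and $p = 1$.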
Most of the weird stuff involving priors comes into being when you want posteriors over a continuous hypothesis space, where you get in trouble because reparametrizing your space changes the form of your prior, so a uniform "natural" prior is really a particular choice of parametrization. Using a discrete hypothesis space avoids big parts of the problem.
Wait, why doesn't the entropy of your posterior distribution capture this effect? In the basic example where we get to see samples from a Bernoulli process, the posterior is a beta distribution that gets ever sharper around the truth. If you compute the entropy of the posterior, you might say something like "I'm unlikely to change my mind about this, my posterior only has 0.2 bits to go until zero entropy". That's already a quantity which estimates how much future evidence will influence your beliefs.
Surely something like the expected variance of the probability estimate over time would be a much simpler way of formalising this, no? The probability over time is just a stochastic process, and OP is expecting the variance of this process to be very high in the near future.
Unfortunately the entire complexity has just been pushed one level down into the definition of "simple". The L2 norm can't really be what we mean by simple, because simply scaling the weights in a layer by A, and the weights in the next layer by 1/A leaves the output of the network invariant, assuming ReLU activations, yet you can obtain arbitrarily high L2 norms by just choosing A high enough.
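A minimal sketch of that rescaling argument, on a tiny two-layer ReLU network with made-up weights (the matrices `W1`, `W2`, the input `x` and the helper names are illustrative assumptions):

```python
import math

def relu_vec(v):
    return [max(0.0, z) for z in v]

def forward(W1, W2, x):
    """Scalar output of W2 . relu(W1 @ x)."""
    hidden = relu_vec([sum(w * xi for w, xi in zip(row, x)) for row in W1])
    return sum(w * h for w, h in zip(W2, hidden))

def l2_norm(W1, W2):
    return math.sqrt(sum(w * w for row in W1 for w in row) + sum(w * w for w in W2))

def rescale(W1, W2, A):
    """Scale the first layer by A and the second by 1/A (A > 0)."""
    return [[A * w for w in row] for row in W1], [w / A for w in W2]

W1 = [[0.5, -1.0], [1.5, 0.25]]
W2 = [2.0, -0.75]
x = [1.0, 2.0]

for A in (1.0, 10.0, 1000.0):
    S1, S2 = rescale(W1, W2, A)
    print(f"A={A}: output={forward(S1, S2, x):.4f}  L2 norm={l2_norm(S1, S2):.4f}")
```

The output is identical for every `A` while the L2 norm grows without bound, so the norm cannot by itself be a measure of how "simple" the function is.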
Unfortunately if OpenAI the company is destroyed, all that happens is that all of its employees get hired by Microsoft, they change the lettering on the office building, and sama's title changes from CEO to whatever high-level manager position he'll occupy within Microsoft.
Hmm, but here the set of possible world states would be the domain of the function we're optimising, not the function itself. Like, No-Free-Lunch states (from wikipedia):
Theorem 1: Given a finite set $V$ and a finite set $S$ of real numbers, assume that $f : V \to S$ is chosen at random according to uniform distribution on the set $S^V$ of all possible functions from $V$ to $S$. For the problem of optimizing $f$ over the set $V$, then no algorithm performs better than blind search.
Here $V$ is the set of possible world arrangements, which is admittedly much smaller than all possible data structures, but the theorem still holds because we're averaging over all possible value functions on this set of worlds, a set which is not physically restricted by anything.
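The averaging claim can be checked by brute force on a tiny instance. This is only a sketch under assumptions: a 4-point domain, 3 possible values, and three hypothetical search strategies (two fixed visiting orders and one that adapts to the values it has seen), with performance measured as the best value found after `k` evaluations, summed over every possible function.

```python
from itertools import product

V = range(4)   # tiny stand-in for the set of world arrangements
S = (0, 1, 2)  # possible values of the objective

def fixed_order(order):
    """Searcher that visits points in a predetermined order."""
    def choose(history):
        return order[len(history)]
    return choose

def adaptive(history):
    """Searcher whose next point depends on the last value it observed."""
    visited = {p for p, _ in history}
    unvisited = [p for p in V if p not in visited]
    if not history:
        return unvisited[0]
    last_value = history[-1][1]
    return unvisited[last_value % len(unvisited)]

def total_best_after_k(choose, k):
    """Sum, over ALL functions V -> S, of the best value found in k evaluations."""
    total = 0
    for values in product(S, repeat=len(V)):
        history = []
        for _ in range(k):
            point = choose(history)
            history.append((point, values[point]))
        total += max(v for _, v in history)
    return total

searchers = {
    "order 0,1,2,3": fixed_order([0, 1, 2, 3]),
    "order 3,1,0,2": fixed_order([3, 1, 0, 2]),
    "adaptive": adaptive,
}
for k in range(1, len(V) + 1):
    print(k, {name: total_best_after_k(s, k) for name, s in searchers.items()})
```

For every `k` the three totals coincide: once you average over all value functions, no non-revisiting search strategy does better than any other.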
I'd be very interested if you can find Byrnes' writeup.
Obviously LLMs memorize some things. The easy example is that the pretraining dataset of GPT-4 probably contained lots of cryptographically hashed strings which are impossible to infer from the overall patterns of language. Predicting those accurately absolutely requires memorization; there's literally no other way unless the LLM breaks the hash function. Then there are in-between things like Barack Obama's age, which might be possible to infer from other language (a president is probably not 10 yrs old or 230), but within the plausible range, you also just need to memorize it.
There is no optimization pressure from “evolution” at all. Evolution isn’t tending toward anything. Thinking otherwise is an illusion.
Can you think of any physical process at all where you'd say that there is in fact optimization pressure? Of course at the base layer it's all just quantum fields changing under unitary evolution with a given Hamiltonian, but you can still identify subparts of the system that are isomorphic with a process we'd call "optimization". Evolution doesn't have a single time-independent objective it's optimizing, but it does seem to me that it's basically doing optimization on a slowly time-changing objective.
Why would you want to take such a child and force them to ‘emotionally develop’ with dumber children their own age?
Because you primarily make friends in school with people in your grade, and if you skip too many grades, the physical difference between the gifted kid and other kids will prevent them from building a social circle based on physical play, and probably make any sort of dating much harder.
Predicting the ratio at t=20s is hopeless. The only sort of thing you can predict is the variance in the ratio over time: the ratio fluctuates around its mean, and the large number of atoms lets you predict the size of those fluctuations, but the exact number after 20 seconds is chaotic. To get an exact answer for how much initial perturbation still leads to a predictable state, you'd need to compute the Lyapunov exponents of an interacting classical gas system, and I haven't been able to find a paper that does this within 2 min of searching. (Note that if the atoms are non-interacting the problem stops being chaotic, of course, since they're just bouncing around off the walls of the box.)
I'll try to say the point some other way: you define "goal-complete" in the following way:
By way of definition: An AI whose input is an arbitrary goal, which outputs actions to effectively steer the future toward that goal, is goal-complete.
Suppose you give me a specification of a goal as a function from a state space to a binary output. Is the AI which just tries out uniformly random actions in perpetuity until it hits one of the goal states "goal-complete"? After all, no matter the goal specification this AI will eventually hit it, though it might take a very long time.
I think the interesting thing you're trying to point at is contained in what it means to "effectively" steer the future, not in goal-arbitrariness.
E.g. I claim humans are goal-complete General Intelligences because you can give us any goal-specification and we'll very often be able to steer the future closer toward it.
If you're thinking of "goals" as easily specified natural-language things, then I agree with you, but the point is that Turing-completeness is a rigorously defined concept, and if you want to get the same level of rigour for "goal-completeness", then most goals will be of the form "atom 1 is at location x, atom 2 is at location y, ..." for all atoms in the universe. And when averaged across all such goals, literally just acting randomly performs as well as a human or a monkey trying their best to achieve the goal.
Goal-completeness doesn't make much sense as a rigorous concept because of No-Free-Lunch theorems in optimisation. A goal is essentially a specification of a function to optimise, and all optimisation algorithms perform equally well (or rather poorly) when averaged across all functions.
There is no system that can take in an arbitrary goal specification (which is, say, a subset of the state space of the universe) and achieve that goal on average better than any other such system. My stupid random action generator is equally as bad as the superintelligence when averaged across all goals. Most goals are incredibly noisy, the ones that we care about form a tiny subset of the space of all goals, and any progress in AI we make is really about biasing our models to be good on the goals we care about.
Zvi, you continue to be literally the best news aggregator on the planet for the stuff that I actually care about. Really, thanks a lot for doing this, it's incredibly valuable to me.
Wouldn't lowering IGF-1 also lead to really shitty quality of life from lower muscle mass and much longer recovery times from injury?
The proteins themselves are primarily covalent, but a quick google search says that the forces in the lipid layer surrounding cells are primarily non-covalent, and the forces between cells seem also non-covalent. Aren't those forces the ones we should be worrying about?
It seems like Eliezer is saying "the human body is a sand-castle, what if we made it a pure crystal block?", and you're responding with "but individual grains of sand are very strong!"
But perhaps the bigger reason is that I find SIA intuitively extremely obvious. It’s just what you get when you apply Bayesian reasoning to the fact that you exist.
Correct, except for the fact that you're failing to consider the possibility that you might not exist at all...
My entire uncertainty in anthropic reasoning is bound up in the degree to which an "observer" is at all a coherent concept.
And my guess is that is how Hamas see and bill themselves.
And your guess would be completely, hopelessly wrong. There is an actual document called "The Covenant of Hamas", written in 1988 and updated in 2017, which you can read here. It starts with:
Praise be to Allah, the Lord of all worlds. May the peace and blessings of Allah be upon Muhammad, the Master of Messengers and the Leader of the mujahidin, and upon his household and all his companions.
... so, uh, not a good start for the "not religious" thing. It continues:
1. The Islamic Resistance Movement “Hamas” is a Palestinian Islamic national liberation and resistance movement. Its goal is to liberate Palestine and confront the Zionist project. Its frame of reference is Islam, which determines its principles, objectives and means.
In the document they really seem to want to clarify at every opportunity that yes, indeed they are religious at the most basic level, and that religion impacts every single aspect of their decision-making. I strongly recommend that everyone here read the whole thing, just to see what it really means to take your religion seriously.
The 2017 version has been cleaned up, but in the 1988 covenant you also had this gem:
> The Day of Judgment will not come about until Moslems fight Jews and kill them. Then, the Jews will hide behind rocks and trees, and the rocks and trees will cry out: 'O Moslem, there is a Jew hiding behind me, come and kill him.' (Article 7)
> The HAMAS regards itself the spearhead and the vanguard of the circle of struggle against World Zionism... Islamic groups all over the Arab world should also do the same, since they are best equipped for their future role in the fight against the warmongering Jews.
It is important that Gazans won't feel like their culture is being erased.
A new education curriculum is developed which fuses western education, progressive values and Muslim tradition while discouraging political violence.
These two things are incompatible. Their culture is the entire problem. To get a sense of the sheer vastness of the gap, consider the fact that Arabs read on average 6 pages per year. It would take a superintelligence to somehow convince the palestinians to embrace western thought and values while not feeling like their culture is being erased.
Oh, true! I was going to reply that since probability is just a function of a physical system, and the physical system is continuous, then probability is continuous... but if you change an integer variable in C from 35 to 5343 or whatever, there's no real sense in which the variable goes through all intermediate values, even if the laws of physics are continuous.
If he's ever attended an event which started out with less than a 28% chance of orgy, which then went on to have an orgy, then that statement is false by the Intermediate Value Theorem, since there would have been an instant in time where the probability of the event crossed 28%.
The most basic rationalist precept is to not forcibly impose your values onto another mind.
What? How does that make any sense at all? The most basic precept of rationality is to take actions which achieve future world states that rank highly under your preference ordering. Being less wrong, more right, being bayesian, saving the world, not imposing your values on others, etc. are all deductions that follow from that most basic principle: Act and Think Such That You Win.
Wait, do lesswrongers not know about semaglutide and tirzepatide yet? Why would anyone do something as extreme as bariatric surgery when tirzepatide patients lose pretty much the same amount of weight after a year as with the surgery?
But if you are right that you only respond to a limited set of story types, do you therefore aspire to opening yourself to different ones in future, or is your conclusion that you just want to stick to films with 'man becomes strong' character arcs?
Not especially, for the same reason that I don't plan on starting to eat 90% dark chocolate to learn to like it, even if other people like it (and I can even appreciate that it has a few health benefits). I certainly am not saying that only movies that appeal to me be made, I'm happy that Barbie exists and that other people like it, but I'll keep reading my male-protagonist progression fantasies on RoyalRoad.
Greta Gerwig obviously thinks so: when she says: "I think equally men have held themselves to just outrageous standards that no one can meet. And they have their own set of contradictions where they’re walking a tightrope. I think that’s something that’s universal."
I have a profound sense of disgust and recoil when someone tells me to lower my standards about myself. Whenever I hear something like "it's ok, you don't need to improve, just be yourself, you're enough", I react strongly, because That Way Lay Weakness. I don't have problems valuing myself, and I'm very good at appreciating my achievements, so that self-acceptance message is generally not properly aimed at me, it would be an overcorrection if I took that message even more to heart than I do right now.
I watched Barbie and absolutely hated it. Though it did provide some value to me after I spent some time thinking about why precisely I hated it. Barbie really showed me the difference between the archetypal story that appeals to males and the female equivalent, and how much just hitting that archetypal story is enough to make a movie enjoyable for either men or women.
The plot of the basic male-appealing story is "Man is weak. Man works hard with clear goal. Man becomes strong". I think men feel this basic archetypal story much more strongly than women, so that even an otherwise horrible story can be entertaining if it hits that particular chord well enough (evidence: most isekai stories), if the man is weak enough at the beginning, or the work especially hard. I'm not exactly clear what the equivalent story is for women, but it's something like "Woman thinks she's not good enough, but she needs to realise that she is already perfect". And the Barbie movie really hits on that note, which is why I think the women in my life seemed to enjoy it. But that archetype just doesn't resonate with me at all.
The apparent end-point for the Kens in the movie is that they "find themselves". This was (to me) a clear misunderstanding by the female authors of what the masculine instinct is like. Men don't "find themselves", they decide who they want to be and work towards climbing out of their pitiful initial states. (There was also the weird Ken obsession with horses, which are mostly a female-only thing)
I'm fairly sure that there are architectures where each layer is a linear function of the concatenated activations of all previous layers, though I can't seem to find them right now. If you add possible sparsity to that, then I think you get a fully general DAG.
Their paper for the sample preparation (here) has a trademark sign next to the "LK-99" name, which suggests they've trademarked it... strongly suggesting that the authors actually believe in their stuff.
There are a whole bunch of ways that trying to optimise for unpredictability is not a good idea:
- Most often technical discussions are not just exposition dumps, they're a part of the creative process itself. Me telling you an idea is an essential part of my coming up with the idea. I essentially don't know where I'm going before I get there, so it's impossible for me to optimise for unpredictability on your end.
- This ignores a whoooole bunch of status-effects and other goals of human conversation. The point of conversation is not solely to transmit information. In real life information-transfer is a minuscule part of most conversations: try telling your girlfriend to "speak unpredictably" when she gets home and wants to vent to you about her boss.
- People often don't say what they mean. The process of translating a mental idea into words on-the-fly often results in sequences of words that are very bad at communicating the idea. The only solution to this is to be redundant, repeat the idea multiple times in different ways until you hit one that your interlocutor understands.
Humans are not Vulcans, and we shouldn't try to optimise human communication the way we'd optimise a network protocol.
I think you might want to look at the literature on "sparse neural networks", which is the right search term for what you mean here.
I'm really confused about how anybody thinks they can "license" these models. They're obviously not works of authorship.
I'm confused why you're confused. If I write a computer program that generates an artifact that is useful to other people, obviously the artifact should be considered a part of the program itself, and therefore subject to licensing just like the generating program. If I write a program to procedurally generate interesting Minecraft maps, should I not be able to license the maps, just because there's one extra step of authorship between me and them?
The word "curiosity" has a fairly well-defined meaning in the Reinforcement Learning literature (see for instance this paper). There are vast numbers of papers that try to come up with ways to give an agent intrinsic rewards that map onto the human understanding of "curiosity", and almost all of them are some form of "go towards states you haven't seen before". The predictable consequence of prioritising states you haven't seen before is that you will want to change the state of the universe very very quickly.
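As a hedged illustration of what "go towards states you haven't seen before" looks like in code, here is a minimal count-based novelty bonus. This is one common family of intrinsic rewards, not necessarily the method of the paper linked above; the class name, the `1 / sqrt(count)` form and the `scale` parameter are illustrative assumptions.

```python
import math
from collections import Counter

class CountBasedCuriosity:
    """Intrinsic reward that decays with how often a state has been visited."""

    def __init__(self, scale=1.0):
        self.visits = Counter()
        self.scale = scale

    def reward(self, state):
        # Record the visit, then pay a bonus that shrinks as the count grows.
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])

curiosity = CountBasedCuriosity()
rewards_a = [curiosity.reward("A") for _ in range(3)]  # same state, three visits
reward_b = curiosity.reward("B")                       # a state never seen before

print(rewards_a, reward_b)
```

Revisiting "A" pays less each time, while the never-seen "B" pays the full bonus, so an agent maximising this signal is pushed to keep moving to new states rather than stay anywhere.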
Not too sure about the downvotes either, but I'm curious how the last sentence misses the point? Are you aware of a formal definition of "interesting" or "curiosity" that isn't based on novelty-seeking?
According to reports xAI will seek to create a "maximally curious" AI, and this also seems to be the main new idea how to solve safety, with Musk explaining: "If it tried to understand the true nature of the universe, that's actually the best thing that I can come up with from an AI safety standpoint," ... "I think it is going to be pro-humanity from the standpoint that humanity is just much more interesting than not-humanity."
Is Musk just way less intelligent than I thought? He still seems to have no clue at all about the actual safety problem. Anyone thinking clearly should figure out that this is a horrible idea within at most 5 minutes of thinking.
Obviously pure curiosity is a horrible objective to give to a superAI. "Curiosity" as currently defined in the RL literature is really something more like "novelty-seeking", and in the limit this will cause the AI to keep rearranging the universe into configurations it hasn't seen before, as fast as it possibly can...
A theory of the popularity of anime.
Much like there have been ten thousand reskins of Harry Potter I’ve been waiting for more central examples of English-language cultural products to take that story archetype and just run with it. There is clearly a demand.
Well then Rejoice! The entire genre of Progression Fantasy is what you desire, and you need only browse the Best Of RoyalRoad to see lots of English-language stories that scratch that particular itch. In fact, I find these English stories immensely superior to anything in anime or manga.
A particularly good example is the recently-finished 12-book series Cradle, whose books ranked at #1 on fantasy Audible for the past few years.
Overall, a headline that seems counterproductive and needlessly divisive.
Probably the understatement of the decade, this article is literally an "order" from Official Authority to stop talking about what I believe is literally the most important thing in the world. I guess this is not literally the headline that would maximally make me lose respect for Nature... but it's pretty close.
This article is a pure appeal to authority. It contains no arguments at all, it only exists as a social signal that Respectable Scientists should steer away from talk of AI existential risk.
The AI risk debate is no longer about any actual arguments; it's now about slinging around political capital and scientific prestige. It has become political in nature.
That's not a math or physics paper, and it includes a bit more "handholding" in the form of an explicit database than would really make me update. The style of scientific papers is obviously very easy to copy for current LLMs. What I'm trying to get at is that if LLMs can start to make genuinely novel contributions at a slightly below-human level and learn from the mediocre articles they write, pure volume of papers can make up for quality.
- "This has been killing people!"
- "Yes, but it might kill all people!"
- "Yes, but it's killing people!"
- "Of course, sure, whatever, it's killing people, but it might kill all people!"
But this isn't the actual back-and-forth, the third point should be "no it won't, you're distracting from the people currently being killed!". This is all a game to subtly beg the question. If AI is an existential threat, all current mundane threats like misinformation, job loss, AI bias, etc. are rounding errors to the total harm, the only situation where you'd talk about them is if you've already granted that the existential risks don't exist.
If a large comet is heading towards Earth, and some group thinks it won't actually hit Earth, but merely pass harmlessly close-by, and they start talking about the sun's reflections off the comet making life difficult for people with sensitive eyes... they are trying to get you to assume the conclusion.
I don't think we need superhuman capability here for stuff to get crazy, pure volume of papers could substitute for that. If you can write a mediocre but logically correct paper with $50 of compute instead of with $10k of graduate student salary, that accelerates the pace of progress by a factor of 200, which seems enough for me to enable a whole bunch of other advances which will feed into AI research and make the models even better.
If we get to that point of AI capabilities, we will likely be able to make 50 years of scientific progress in a matter of months for domains which are not too constrained by physical experimentation (just run more compute for LLMs), and I'd expect AI safety to be one of those. So either we die quickly thereafter, or we've solved AI safety. Getting LLMs to do scientific progress basically telescopes the future.
Fair point, "non-trivial" is too subjective, the intuition that I meant to convey was that if we get to the point where LLMs can do the sort of pure-thinking research in math and physics at a level where the papers build on top of one another in a coherent way, then I'd expect us to be close to the end.
Said another way, if theoretical physicists and mathematicians get automated, then we ought to be fairly close to the end. If in addition to that the physical research itself gets automated, such that LLMs write their own code to do experiments (or run the robotic arms that manipulate real stuff) and publish the results, then we're *really* close to the end.
If the question is ‘what’s one experiment that would drop your p(doom) to under 1%?’ then I can’t think of such an experiment that would provide that many bits of data, without also being one where getting the good news seems absurd or being super dangerous.
Not quite an experiment, but to give an explicit test: if we get to the point where an AI can write non-trivial scientific papers in physics and math, and we then aren't all dead within 6 months, I'll be convinced that p(doom) < 0.01, and that something was very deeply wrong with my model of the world.