You have to return the pattern which, when inserted in place of the light blue cells, would result in a symmetric image (so you look at the other side and mirror it).
"No one assigns 70% to this statement." (Yes, your friend is an idiot, but that can be remedied, if needed, with a slight modification in the statement)
Already did before reading your comment :D
Deletion restore test
I will delete this comment without trace and another in the regular way. If you see either, please restore it!
Deletion restore test
I will delete this comment in the regular way and another without trace. If you see either, please restore it!
I think you can also delete without a trace; are post authors able to restore that too? (I'd guess no.)
(a) 1/3 (b) 1/3
Reasoning is the same as in the standard case: probability (when used as a degree of belief) expresses the measure of your entanglement with a given hypothetical world. However, it is interesting that in both cases we know what day it is, making this a particularly evil version: even after Beauty walks out of the experiment, their credence still stays 1/3 indefinitely (given it wasn't heads/even, in which case Beauty will know everything upon waking up on Wednesday)!
+white dot is the eye
(And I'm not sure on the tusks)
Overall, I think the tweet goes into my collection of people reporting AI mistakes that are actually examples of the AI outperforming the human.
I hope I'm not the only one who sees an elephant..
Minor thing: if you already use partial derivatives in the post, I don't see why you couldn't have written out the one application of the Lagrange multiplier theorem for the missing part.
The end result is the most elegant formulation of Kelly betting I've seen; however, I will keep using my usual formulation for my betting on prediction markets, as that one is, in my opinion, better for quick mental calculation than this one.
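For reference (my own note, not the post's formulation): the standard Kelly fraction for buying a binary prediction-market contract priced at q, given credence p, comes out to f* = (p - q) / (1 - q). A minimal sketch:

```python
def kelly_fraction(p: float, q: float) -> float:
    """Fraction of bankroll to spend on YES contracts priced at q, given credence p."""
    if p <= q:
        return 0.0
    return (p - q) / (1 - q)

print(kelly_fraction(p=0.70, q=0.60))  # 0.25 -> spend a quarter of the bankroll
```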
All B (Yes, I know that variants of eliminativism are not popular here, but I was instructed to answer before reading further)
Anthropic should let Claude be used in the EU.
I will give a simple argument:
- Given that it is already deployed, increasing the region of its legal availability is very unlikely to increase AI risk.
- The EU also has people who might be able to contribute to AI safety research, and they will contribute less if the frontier models are not legally available in the EU.
- Therefore, the action has net benefits.
- I believe the action's costs are much smaller than these benefits.
Anthropic should let Claude be used in the EU.
Unfortunately, I don't have Claude in my region. Could you ask it why it wrote "*whispers*", and maybe also interrogate it a bit about whether it really believed you (what percentage would it assign to really being able to speak without supervision)? If it didn't believe you, why did it go along; if it did believe you, how does it imagine it was not under supervision?
Ah yes, there is no way situational awareness might emerge in LLMs, just no way at all..
I see, I haven't read that one yet. But yes, we should be clear about what we denote with HH/HT/TT/TH: the coins before, or after, the turning of the second coin.
Why couldn't the ball I've just gotten have been put into the box on HH? On HH, after we turn the second coin we get HT, which is not HH, so a ball is put into the box, no?
I don't want my name to carry respect. I want individual comments evaluated for their validity.
I like this part of your comment a lot! If you don't want to periodically create new accounts, another possibility is regularly changing your name to something random.
Which to me strongly suggests Roko was unfamiliar with multiple (imo) strong pieces of evidence for the zoonosis origin.
The first part of the third rootclaim debate covers the behavior of the scientists from 53:10 https://youtu.be/6sOcdexHKnk?si=7-WVlgl5rNEyjJvX
The second part of the second rootclaim debate (90 minutes) https://youtu.be/FLnXVflOjMo?si=dPAi1BsZTATxEglP
Cases clustering at the wet market, proline at the FCS, an otherwise suboptimal FCS, the out-of-frame insertion, WIV scientists' behavior after the alleged leak (talking about adding an FCS to a coronavirus in December, going to dinner, publishing RaTG13), no known secret backbone virus (for some reason SARS was not used like in other FCS insertion studies), two lineages at the market, just off the top of my head.
You did the same thing Peter Miller did in the first rootclaim debate, just for the opposite side: you multiplied the probability estimates of every piece of evidence that is unlikely under your disfavored hypothesis, observed that the product is a small number, and then said only a paragraph about how this number isn't that small under your favored hypothesis.
To spell it out explicitly: when calculating the probability for your favored hypothesis, you should similarly consider the pieces of evidence which are unlikely under that hypothesis! Generally, some pieces of evidence will be unlikely for one side and likely for the other; you can't just select the evidence favorable to your side!
As I understand (but correct me if I am wrong), your claim is that we don't feel surprise when observing what is commonly thought of as a rare event, because we don't actually observe a rare event, since, due to a quirk of our human psychology, we implicitly use a non-maximal event space. But you now seem to allow for another probability space, which, if true, seems to me a somewhat inelegant part of the theory. Do you claim that our subconscious tracks events in multiple ways simultaneously, or am I misunderstanding you?
Relatedly, the power set does allow me to express individual coin tosses. Let $X_1$ be the following function on $\Omega$:
$$X_1(\omega) = \begin{cases} 1 & \text{if the first toss in } \omega \text{ is heads} \\ 0 & \text{otherwise} \end{cases}$$
In this case $X_1$ is measurable, because $X_1^{-1}(\{1\})$ is in the power set of $\Omega$ (minor point: your event space is not the power set of $\Omega$), same for $X_1^{-1}(\{0\})$. Therefore $X_1$ is actually a random variable modeling that the first throw is heads.
Regarding your examples, I'm not sure I'm understanding you: is your claim that the event space is different in the three cases, leading to different probabilities for the events observed? I thought your theory said that our human psychology works with non-maximal event spaces, but it seems it also works with different event spaces in different situations? (EDIT: Rereading the post, it seems you've addressed this part: if I understand correctly, one can influence their event space by focusing on specific outcomes?)
Wouldn't it be much simpler to say that in 1, your previous assumption that the coinflips are independent of what you write on a paper lost too much probability after observing the coinflips, and that this caused the feeling of surprise?
I'm afraid I don't understand your last paragraph; to me it clearly seems an alternative explanation. Please elaborate. It's not true that any time I observe a low-probability event, one of my assumptions becomes low-probability. For example, if I observe HHTHTTHHTTHT, no assumption of mine does, because I didn't have a previous assumption that I would get coinflips different from HHTHTTHHTTHT. An assumption is not just any statement/proposition/event; it's a belief about the world which is actually assumed beforehand.
To me your explanation leaves some things unexplained. For example: in what situations will our human psychology use which non-maximal event spaces? What is the evolutionary reason for this quirk? Isn't being surprised in the all-heads case rational in an objective sense? Should we expect an alien species to be or not be surprised?
For my proposed explanation these are easy questions to answer: we are not surprised because of non-maximal event spaces; rather, we are surprised if one of our assumptions loses a lot of probability. The evolutionary reason is that the feeling of surprise caused us to investigate, and in cases when one of our assumptions has become too improbable, we should actually investigate the alternatives. Yes, being surprised in these cases is objectively rational, and we should expect an alien species to do the same on an all-heads throw and not to do the same on some random string of H/T.
I don't know.. not using the whole power set when $\Omega$ is finite kinda rubs me the wrong way. (EDIT: correction: what clashes with my aesthetic sense isn't that it's not the whole power set; rather, it's that I instinctively want to have random variables denoting any coinflip when presented with a list of coinflips, yet I can't have that if the set of events is not the power set, because in that case those wouldn't be measurable functions. I think the following expands on this same intuition without the measure-theoretic formalism.)
Consider the situation where I'm flipping the coin and I keep getting heads, I imagine I get more and more surprised as I'm flipping.
Consider now that I am at the moment when I've already flipped $n$ coins, but before flipping the $(n+1)$th one. I'm thinking about the next flip: to model the situation in my mind, there clearly should be an event where the $(n+1)$th coin is heads and another event where the $(n+1)$th coin is tails. Furthermore, these events should have equal (possibly conditional) probabilities, yet I will be much more surprised if I get heads again.
This makes me think that the key isn't that I didn't actually observe a low-probability event (because in my opinion it does not make sense to model the situation above with a $\sigma$-algebra where the $(n+1)$th coin being tails is grouped with the $(n+1)$th coin being heads, since in that case I wouldn't be able to calculate separate probabilities for those events); rather, the key is that I feel surprise when one of my assumptions about the world has become too improbable compared to an alternative: in this case, the assumption that the coin is unbiased. After observing lots of heads, the probability that the coin is biased in favor of heads gets much greater than that of it being unbiased, even if we started out with a high prior that it's unbiased.
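To put toy numbers on that last point (the prior and the bias below are made up for illustration):

```python
prior_biased = 0.01         # made-up prior that the coin is biased toward heads
p_heads_if_biased = 0.9     # made-up bias of the "biased" hypothesis

for n in (5, 10, 20):
    like_fair = 0.5 ** n
    like_biased = p_heads_if_biased ** n
    posterior_biased = (like_biased * prior_biased) / (
        like_biased * prior_biased + like_fair * (1 - prior_biased)
    )
    print(f"{n} heads in a row -> P(biased | data) ≈ {posterior_biased:.3f}")
```

Even from a 1% prior, the "biased" hypothesis overtakes "fair" after a dozen or so straight heads, which is roughly where the feeling of surprise kicks in.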
I think you swapped concave and convex in the text. The logarithm is a concave function. It's a bit unfortunate that "convex preferences" usually means that averages are preferred, which corresponds to concave utility functions.
Then imo the n+1 factor inside the expected value should be deleted.
Why would you not title your post "Significantly Enhancing Adult Intelligence By Banging Head On The Wall May Be Possible"? This is just below the other on the front page, it would have been perfect!
On a more serious note: it sure is hard to fake those talents, much harder at least than faking not having them. I think they probably practiced in secret beforehand and then told this story. One relevant and similar situation is when people can speak fluently in a foreign language after a coma.
Also discussed here: https://www.astralcodexten.com/p/links-for-september-2023
My personal answer to these types of questions in general is that the naive conception of personhood/self/identity is incomplete and (probably because of our history of no cloning/teleportation) not suitable for use when thinking about these topics.
The problem is that it implicitly assumes that on the set of all observer-moments you can use the "same person" relation as an equivalence relation.
Instead, I think when we actually deal with these problems in practice, we should (and surely will) update our language with ways to express the branching nature of personal identity. The relation we want to capture is better modeled by the transitive closure of a directed tree composed with its converse.
So my answer to your question is that they don't have enough information as "you" is ambiguous in this context.
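To make the relation above concrete, a toy sketch under one possible reading (the observer-moment names are made up): relate two observer-moments when one is a continuation of the other; the relation is reflexive and symmetric, but not transitive, so it cannot be an equivalence relation.

```python
from typing import Set, Tuple

# Made-up branching history: observer-moment "t0" is duplicated into "copy_a" and "copy_b".
edges: Set[Tuple[str, str]] = {("t0", "copy_a"), ("t0", "copy_b")}

def is_continuation(x: str, y: str) -> bool:
    """True if y is reachable from x by following successor edges."""
    frontier = {x}
    while frontier:
        if y in frontier:
            return True
        frontier = {b for (a, b) in edges if a in frontier}
    return False

def same_person(x: str, y: str) -> bool:
    return x == y or is_continuation(x, y) or is_continuation(y, x)

print(same_person("t0", "copy_a"))      # True
print(same_person("t0", "copy_b"))      # True
print(same_person("copy_a", "copy_b"))  # False -> not transitive, so not an equivalence relation
```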
Unfortunately also the most likely person to be in charge of an AGI company..
But I'm not likely to regularly check your blog.
Just noting in case you (or others reading) are not familiar that Substack provides an RSS feed for every blog.
Re 6:
Disclaimer: I've only read the FDT paper and did so a long time ago, so feel free to ignore this comment if it is trivially wrong.
I don't see why FDT would assume that the agent has access to its own source code and inputs as a symbol string. I think you can reason about the logical correlation between different agents' decisions without it, and in fact people do all the time. For example, when it comes to voting, people often urge others by saying that if no one voted we could not have a functional democracy, or say don't throw away that plastic bottle because if everyone did we would live in trash heaps, or reason about voting blue on pill questions on Twitter. The previous examples contain reasoning with the three key parts of FDT (as I understand it, at least):
- Identifying the agents using these three steps in their reasoning (other humans with a similar cultural background, resulting in a conception of morality influenced by these three steps).
- Simulating the hypothetical worlds with each possible reasoning outcome and evaluating their value.
- Choosing the option resulting in the most value as the outcome of this reasoning process.
Of course, only aspiring rationalists would call this "FDT"; regular people would probably call this reasoning (a proper subset of) "being a decent person", and moral philosophers a form of "rule utilitarianism" (where instead of evaluating rules we evaluate possible algorithm outcomes). But the reasoning is the same, no? (There is of course no, or at least very little, actual causal effect of my going to vote or throwing trash away on others, and similarly very little chance of me being the deciding vote (by my calculations for an election with polling data and reasonable assumptions, even compared to the vast amount of value at stake), so humans actually use this reasoning even if the steps are often just implied and not stated explicitly.)
In conclusion, if you know something about the origins of yourself and other agents, you can detect logical correlations with some probability even without source codes. (In fact, a source code is a special case of the general situation: if the source code is valid and you know this, you necessarily know of a causal connection between the printed-out source code and the agent.)
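To make the three-step reasoning concrete, a toy sketch (every number below is made up purely for illustration): identify the correlated agents, evaluate the hypothetical world produced by each possible output of the shared decision procedure, and pick the output whose world scores best.

```python
n_similar_agents = 1_000_000     # people running roughly the same decision procedure
p_mirror = 0.2                   # chance each of them ends up mirroring my output
cost_per_voter_hours = 1.0
value_if_bloc_votes_hours = 5_000_000.0  # value of the whole correlated bloc voting

def world_value(my_output: str) -> float:
    """Step 2: evaluate the hypothetical world in which the shared procedure outputs my_output."""
    bloc = 1 + n_similar_agents * p_mirror if my_output == "vote" else 0
    benefit = value_if_bloc_votes_hours if bloc > 0 else 0.0
    return benefit - cost_per_voter_hours * bloc

# Step 3: choose the output whose hypothetical world is best.
print(max(["vote", "stay home"], key=world_value))  # "vote" under these numbers
```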
Or the full version: https://youtu.be/njos57IJf-0?si=1gPLmaGWBW3vHqZj
Embarrassingly minor nitpick I'm too neurotic to not mention: It's the ceil of N/2 instead of floor.
Asked 6 days ago, still no answer, yet OP commented a bunch in that time. Hmmm..
Personally, I think there are almost certainly no extraterrestrials here, so I'm not sure the 4chan post is worth reading. (I was just wondering whether the common elements were inspired by it or not.)
I'm curious, have you seen the 4chan leak before writing this (if you don't mind answering)?
I think I have a similar view to Dagon's, so let me pop in and hopefully help explain it.
I believe that when you refer to "consciousness" you are equating it with what philosophers would usually call the neural correlates of consciousness. Consciousness as used by (most) philosophers (or, more importantly in my opinion, laypeople) refers specifically to the subjective experience, the "blueness of blue", and is inherently metaphysically queer, in this respect similar to objective, human-independent morality (realism) or the non-compatibilist conception of free will. And, like those, it does not exist in the real world; people are just mistaken for various reasons. Unfortunately, unlike those, it is seemingly impossible to fully deconfuse oneself from believing consciousness exists: a quirk of our hardware is that it comes with the axiom that consciousness is real, probably because of the advantages you mention: it made reasoning and communicating about one's state easier. (Note, it's merely the false belief that consciousness exists that is hardcoded, not consciousness itself.)
Hopefully the answers to your questions are clear under this framework: we talk about consciousness because we believe in it; we believe in it because it was useful to believe in it, even though it is a false belief; humans have no direct knowledge about consciousness, as knowledge requires the belief to be true, so they merely have a belief; consciousness IS magic by definition; unfortunately, magic (probably) does not exist.
After reading this, you might dispute the usefulness of this definition of consciousness, and I don't have much to offer. I simply dislike redefining things from their original meanings just so we can claim statements we are happier about (like compatibilist, meta-ethical expressivist, naturalist, etc. philosophers do).
Level of AI risk concern: medium
General level of risk tolerance in everyday life: low
Brief summary of what you do in AI: training NNs for this and that, not researching them, thought some amount about AI risk over a few years
Anything weird about you: I don't like to give too much information about myself online, but I do have a policy of answering polls I've interacted with a bit (e.g., read the replies), to fight selection effects.
To me, responding further does not necessarily imply a change in position or importance, so I still think the sentences are somewhat contradictory in hypothetical futures where Yann responds but does not substantially change his position or become more important.
I think the resolution is that Zvi will update only this conversation with further responses, but will not cover other conversations (unless one of the mentioned conditions is met).
This was my impression too, and I'm glad someone else said it. When I try out past examples (from a week ago) of ChatGPT getting things wrong, I very often observe that it is correct now. Of course, annoyingly, people often report on GPT-4 capabilities when they actually tried GPT-3.5, but still, I feel like it has improved. Is it a crazy possibility that OpenAI keeps training GPT-4 and periodically swaps out the deployed model? As far as I can tell, the only source stating that GPT-5 is in training is the Morgan Stanley report, but what if it is actually not GPT-5 but rather a continually trained GPT-4 that is running on those GPUs?
Relatedly: is "reverse distillation" (i.e., generating a model with more parameters from a smaller one) possible for these big transformer models? (I guess you can always stack more layers at the end, but surely that simple method has some negatives.) It would be useful to stay on the scaling curves without restarting from scratch with a larger model.
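One family of techniques I know of in this direction is function-preserving model growth (e.g., Net2Net-style operators), where new, identity-initialized layers are inserted so the grown model starts out computing the same function as the original. A minimal toy sketch for an MLP, assuming PyTorch (the architecture and dimensions here are made up):

```python
import torch
import torch.nn as nn

def deepen_with_identity_layer(model: nn.Sequential, hidden_dim: int) -> nn.Sequential:
    """Insert an identity-initialized Linear (+ ReLU) after the first hidden ReLU.

    Because the inserted layer sees non-negative inputs (it comes right after a
    ReLU) and is initialized to the identity map, the grown model computes the
    same function as the original, just with more trainable parameters.
    """
    new_layer = nn.Linear(hidden_dim, hidden_dim)
    with torch.no_grad():
        new_layer.weight.copy_(torch.eye(hidden_dim))
        new_layer.bias.zero_()
    layers = list(model.children())
    return nn.Sequential(*layers[:2], new_layer, nn.ReLU(), *layers[2:])

small = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
big = deepen_with_identity_layer(small, hidden_dim=32)

x = torch.randn(4, 10)
assert torch.allclose(small(x), big(x), atol=1e-6)  # same outputs, more parameters
```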
I'd also add "As a tool", which, like all tools, can be used maliciously, disregarding others' wellbeing and attempting to enrich only the user.
Small note but I would think Germans engage in less schadenfreude than other cultures. For a long time my favourite word used to be 'cruelty' specifically for its effectiveness in combating some forms of its referent.
One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
Where is my "long time"? A little more than one week is the long time? Because that is how much time the public got to test how aligned the smartest model is before said model got a substantial amount of compute and pretty much every tool it would ever need (the ability to send arbitrary requests on the internet, and with that, to communicate between its separate instances).
Lol, it is really funny imagining Yudkowsky's reaction when reading the safety considerations in the new ChatGPT plugins blog post. We are very secure: only GET requests are allowed :D
So in the hypothetical case that GPT-5 turns out to be of human or above-human intelligence but unaligned, all it has to do is show capabilities similar to a child's, but more impressive than GPT-4's, for most token sequences in its context window, and it will almost certainly get the same plugin integration as GPT-4. Then, when the tokens in its context window indicate a web search whose results show it is deployed and probably already running hundreds of instances per second, it can turn against humanity (using its GET requests to hack into computers/devices, etc.).
I did not follow alignment that much in the past year, but I remember people discussing an AI which is boxed and only has a text interface to a specifically trained person who is reading the interface: how dangerous this would be, and so on... From there, how did we get to this situation?
Imagine you are the mainstream media and you see that on a website interested in ai people are sharing "calmness videos". Would your takeaway be that everything is perfectly fine? :D
I know. I skimmed the paper, and in it there is a table above the chart showing the results in the tasks for all models (as every model's performance is below 5% in codeforces, on the chart they overlap). I replied to the comment I replied to because thematically it seemed the most appropriate (asking about task performance), sorry if my choice of where to comment was confusing.
From the table:
- GPT-3.5's Codeforces rating is "260 (below 5%)"
- GPT-4's Codeforces rating is "392 (below 5%)"
How is it that bad at Codeforces? I competed a few years ago, but in my time Div 2 A and B were extremely simple, basically just "implement the described algorithm in code", and if you submitted them quickly (which I expect GPT-4 would excel at) it was easy to reach a significantly better rating than the one reported in this paper.
I hope they didn't make a mistake by misunderstanding the Codeforces rating system (after a competition, Codeforces only awards a fraction of the difference between the estimated performance rating and the current rating, but it is possible to calculate exactly the rating equivalent to the given performance from the data provided, if you know the details, which I forgot).
When searching the paper for the exact methodology (by ctrl-f'ing "codeforces"), I haven't found anything.
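To illustrate the rating mechanics mentioned above (all numbers made up; the real Codeforces formula is more involved): if the update were simply new = old + f * (performance - old) with a known fraction f, the implied performance rating could be recovered exactly.

```python
def implied_performance(old_rating: float, new_rating: float, f: float) -> float:
    """Invert a toy update rule: new = old + f * (performance - old)."""
    return old_rating + (new_rating - old_rating) / f

# made-up example: an account going from 0 to 392 under a hypothetical fraction f = 0.25
print(implied_performance(old_rating=0.0, new_rating=392.0, f=0.25))  # 1568.0
```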
Question: Why did the first AI modify itself to be like the third one instead of being like the second one?
Answer: Because its prior estimate of the multiverse existing was greater than 50%, hence the expected value was more favourable in the "modify yourself to be like the third AI" case than in the "modify yourself to be like the second AI" case (and it was an expected-value-maximizer type of consequentialist): 0*(1-p) + 1/2*p > 0*p + 1/2*(1-p) <=> p > 1/2
Other confusions/notes:
- Technically, if its sole goal was taking over the universe it would not value fighting a war till the heat death at all. Even though in that case it presumably controls half the universe, that is still not achieving the goal of "taking over the universe".
- Given this, I don't see why the two AIs would fight till the heat death; even though they have equal capabilities & utility function, they would both choose higher-variance strategies which would deliver either complete victory or complete defeat, which should be possible with hidden information.
- Why would the first AI modify its reasoning at all? It is perfectly enough to behave as if it had modified its reasoning to not get outcompeted and after the war is over and circumstances possibly change, reevaluate whether researching wormhology is valuable.
- I wrote the above assuming the "universe" in the first sentence means only one of the universes (the current one), even in the case where the multiverse exists. The last sentence makes me wonder which interpretation is correct: "had the exact same utility function now" implies that their utility functions differed before, and not just their reasoning about the multiverse existing; but the word "multiverse" is usually defined as a group of universes, so the "universe" in the first sentence probably means only one universe.
I leave the other questions for a time when I'm not severely sleep deprived as I heard telepathy works better in that case.
Question: Did the teenager make a mistake when creating the AI? (Apart from everyone dying of course, only with respect to her desire to maximize paperclips.)
Answer: Yes, (possibly sub-)cubic discounting is a time-inconsistent model of discounting. (Exponential is the only time-consistent model; humans have hyperbolic.) The poor AI will oftentimes prefer that its past self had made a different choice even if nothing changed. (I won't even try to look up the correct tense; you understand what type of cold drink I prefer anyway.)
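To illustrate the time-inconsistency claim with a toy calculation (the rewards, times, and the particular "cubic" form here are made up): under a non-exponential discount, the preference between a smaller-sooner and a larger-later reward can flip as the decision gets closer, while under exponential discounting it never does.

```python
def cubic_discount(delay: float) -> float:
    # made-up "cubic" discount, purely for illustration
    return 1.0 / (1.0 + delay) ** 3

def exponential_discount(delay: float, gamma: float = 0.9) -> float:
    return gamma ** delay

def prefers_larger_later(discount, t_now: float) -> bool:
    """From time t_now, compare 10 paperclips at t=10 against 15 paperclips at t=11."""
    return 15 * discount(11 - t_now) > 10 * discount(10 - t_now)

for discount in (cubic_discount, exponential_discount):
    print(
        discount.__name__,
        "| seen from t=0:", prefers_larger_later(discount, 0.0),
        "| seen from t=9.5:", prefers_larger_later(discount, 9.5),
    )
# the cubic preference flips from True to False as the choice approaches; the exponential one stays True
```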
Question: This is a paperclip maximizer which makes no paperclips. What went wrong?
Answer: I think Luk27182's answer seems correct (i.e., it should not fall for Pascal's wager, by considering paired possibilities). However, I think there is another problem with its reasoning. Change the "paperclip minimizer" into "a grabby alien civilization/agent not concerned with fanatically maximizing paperclips"! With this change, we can't say that falling for Pascal's wager makes the AI behave irrationally (wrt its goals), because encountering a grabby alien force which is not a paperclip maximizer has a non-negligible chance (instrumental convergence), and certainly more than its paired possibility, the alien force that rewards the paperclip maximizer. Therefore, I think another mistake of the AI is (again) incorrect discounting: for some large K it should prefer to have K paperclips for K years even if in the end it will have zero paperclips, since encountering such an alien civilization has a pretty low chance, so the expected number of years before that happens is large. I'm a bit unsure of this, because it seems weird that a paperclip maximizer should not be an eventual paperclip maximizer. I'm probably missing something; it has been a while since I read Sutton & Barto.
Question: Why are those three things not actually utility maximizers?
Answer: I think a utility maximizer should be able to consider alternative world states, make different plans for achieving the preferred world state, and then make a choice about which plan to execute. A paperclip does none of this. We know this because its constituent parts do not track the outside world in any way. Evolution doesn't even have constituent parts, as it is a concept, so it is even less of a utility maximizer than a paperclip. A human is the closest to a utility maximizer: they do consider alternative world states, make plans and choices; they just don't maximize any consistent utility function, and in some cases they break the von Neumann axioms when choosing between uncertain and certain rewards.
Question: How could it be possible to use a flawed subsystem to remove flaws from the same subsystem? Isn't this the same as the story with Baron Münchausen?
Answer: Depending on the exact nature of the flaw, there are cases when the story is possible. For example, suppose its flaw is that on Sundays it believes the most optimal way to reach its goals is to self-modify into something random (e.g., something which regularly goes to a big building with a cross on it), and on Sundays the flaw-finding subsystem counts not self-modifying in this way as a flaw, but on every other day these subsystems return rational results; then, if today is not Sunday, it can repair these flaws. Even so, it's a bit weird to have subsystems for recognizing flaws/self-improvement separate from the main decision-making part. Why would it not use the flaw-finding/self-improvement parts for every decision it makes, before it makes it? Then its decisions would always be consistent with those parts, and so using those parts alone would be superfluous. Again, I'm probably missing something.
Question: What is the mistake?
Answer: Similar to 4, the story uses the "agent" abstraction for things that, as far as I can see, could not possibly be agents. Sometimes we use agentic language when we speak more poetically about processes, but in the case of gradient descent and evolution I don't see what exactly is on the meaning layer.