Did you mean to link to my specific comment for the first link?
The main difference in my mind is that a human can never be as powerful as potential ASI and cannot dominate humanity without the support of sufficiently many cooperative humans. For a given power level, I agree that humans are likely scarier than an AI of that power level. The scary part about AI is that their power level isn't bounded by human biological constraints and the capacity to do harm or good is correlated with power level. Thus AI is more likely to produce extinction-level dangers as tail risk relative to humans even if it's more likely to be aligned on average.
Related question: What is the least impressive game current LLMs struggle with?
I’ve heard they’re pretty bad at Tic Tac Toe.
I’m new to the term AIXI and went three links deep before I learned what it refers to. I’d recommend making this journey easier for future readers by linking to a definition or explanation near the beginning of the post.
The terms "tactical voting" or "strategic voting" are also relevant.
I think your assessment may be largely correct but I do think it's worth considering how things are not always nicely compressible.
This review led me to find the following podcast version of Planecrash. I've listened to the first couple of episodes and the quality is quite good.
this concern sounds like someone walking down a straight road and then closing their eyes cause they know where they want to go anyway
This doesn't sound like a good analogy at all. A better analogy might be a stylized subway map compared to a geographically accurate one. Sometimes removing detail can make it easier to process.
I don't think it's necessarily GDPR-related but the names Brian Hood and Jonathan Turley make sense from a legal liability perspective. According to info via ArsTechnica,
Why these names?
We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.
The case was ultimately resolved in April 2023 when OpenAI agreed to filter out the false statements within Hood's 28-day ultimatum. That is possibly when the first ChatGPT hard-coded name filter appeared.
As for Jonathan Turley, a George Washington University Law School professor and Fox News contributor, 404 Media notes that he wrote about ChatGPT's earlier mishandling of his name in April 2023. The model had fabricated false claims about him, including a non-existent sexual harassment scandal that cited a Washington Post article that never existed. Turley told 404 Media he has not filed lawsuits against OpenAI and said the company never contacted him about the issue.
Interestingly, Jonathan Zittrain is on record saying the Right to be Forgotten is a "bad solution to a real problem" because "the incentives are clearly lopsided [towards removal]".
User throwayian on Hacker News ponders an interesting abuse of this sort of censorship:
I wonder if you could change your name to “April May” and submitted CCPA/GDPR what the result would be..
It's not a classic glitch token. Those did not cause the current "I'm unable to produce a response" error that "David Mayer" does.
Is there a salient reason LessWrong readers should care about John Mearsheimer's opinions?
I didn't mean to suggest that you did. My point is that there is a difference between "depression can be the result of a locally optimal strategy" and "depression is a locally optimal strategy". The latter doesn't even make sense to me semantically whereas the former seems more like what you are trying to communicate.
I feel like this is conflating two different things: experiencing depression and behavior in response to that experience.
My experience of depression is nothing like a strategy. It's more akin to having long covid in my brain. Treating it as an emotional or psychological dysfunction did nothing. The only thing that eventually worked (after years of trying all sorts of things) was finding the right combination of medications. If you don't make enough of your own neurotransmitters, store-bought are fine.
Aren't most of these famous vulnerabilities from before modern LLMs existed and thus part of their training data?
Knight odds is pretty challenging even for grandmasters.
@gwern and @lc are right. Stockfish is terrible at odds and this post could really use some follow-up.
As @simplegeometry points out in the comments, we now have much stronger odds-playing engines that regularly win against much stronger players than OP.
This sounds like metacognitive concepts and models. Like past, present, future, you can roughly align them with three types of metacognitive awareness: declarative knowledge, procedural knowledge, and conditional knowledge.
#1 - What do you think you know, and how do you think you know it?
Content knowledge (declarative knowledge) which is understanding one's own capabilities, such as a student evaluating their own knowledge of a subject in a class. It is notable that not all metacognition is accurate.
#2 - Do you know what you are doing, and why you are doing it?
Task knowledge (procedural knowledge) refers to knowledge about doing things. This type of knowledge is displayed as heuristics and strategies. A high degree of procedural knowledge can allow individuals to perform tasks more automatically.
#3 - What are you about to do, and what do you think will happen next?
Strategic knowledge (conditional knowledge) refers to knowing when and why to use declarative and procedural knowledge. It is one's own capability for using strategies to learn information.
Another somewhat tenuous alignment is with metacognitive skills: evaluating, monitoring, and planning.
#1 - What do you think you know, and how do you think you know it?
Evaluating: refers to appraising the final product of a task and the efficiency at which the task was performed. This can include re-evaluating strategies that were used.
#2 - Do you know what you are doing, and why you are doing it?
Monitoring: refers to one's awareness of comprehension and task performance.
#3 - What are you about to do, and what do you think will happen next?
Planning: refers to the appropriate selection of strategies and the correct allocation of resources that affect task performance.
Quotes are adapted from https://en.wikipedia.org/wiki/Metacognition
The customer doesn't pay the fee directly. The vendor pays the fee (and passes the cost to the customer via price). Sometimes vendors offer a cash discount because of this fee.
It already happens indirectly. Most digital money transfers are things like credit card transactions. For these, the credit card company takes a percentage fee and pays the government tax on its profit.
Additional data points:
o1-preview and the new Claude Sonnet 3.5 both significantly improved over prior models on SimpleBench.
The math, coding, and science benchmarks in the o1 announcement post:
How much does o1-preview update your view? It's much better at Blocksworld for example.
There should be some way for readers to flag AI-generated material as inaccurate or misleading, at least if it isn’t explicitly author-approved.
Neither TMS nor ECT did much for my depression. Eventually, after years of trial and error, I did find a combination of drugs that works pretty well.
I never tried ketamine or psilocybin treatments but I would go that route before ever thinking about trying ECT again.
I suspect fine-tuning specialized models is just squeezing a bit more performance in a particular direction, and not nearly as useful as developing the next-gen model. Complex reasoning takes more steps and tighter coherence among them (the o1 models are a step in this direction). You can try to devote a toddler to studying philosophy, but it won't really work until their brain matures more.
Seeing the distribution calibration you point out does update my opinion a bit.
I feel like there’s still a significant distinction though between adding one calculation step to the question versus asking it to model multiple responses. It would have to model its own distribution in a single pass rather than having the distributions measured over multiple passes align (which I’d expect to happen if the fine-tuning teaches it the hypothetical is just like adding a calculation to the end).
As an analogy, suppose I have a pseudorandom black box function that returns an integer. In order to approximate the distribution of its outputs mod 10, I don’t have to know anything about the function; I can just sample the function and apply mod 10 post hoc. If I want to say something about this distribution without multiple samples, then I actually have to know something about the function.
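To make the analogy concrete, here is a minimal sketch of the "sample, then apply mod 10 post hoc" approach. The `black_box` function and its constants are made up purely for illustration; the point is that `empirical_mod10` never looks inside it.

```python
import random
from collections import Counter
from typing import Dict


def black_box(rng: random.Random) -> int:
    """Hypothetical opaque function; the caller treats its internals as unknown."""
    return rng.getrandbits(32) * 2654435761 % 1_000_003


def empirical_mod10(n_samples: int, seed: int = 0) -> Dict[int, float]:
    """Estimate the distribution of black_box() % 10 purely by sampling."""
    rng = random.Random(seed)
    counts = Counter(black_box(rng) % 10 for _ in range(n_samples))
    # Counter returns 0 for digits that never appeared.
    return {digit: counts[digit] / n_samples for digit in range(10)}


dist = empirical_mod10(10_000)
```

Predicting `dist` in a single pass, without drawing the samples, would require actual knowledge of what `black_box` does, which is the distinction I'm trying to draw.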
This essentially reduces to "What is the next country: Laos, Peru, Fiji?" and "What is the third letter of the next country: Laos, Peru, Fiji?" It's an extra step, but questionable if it requires anything "introspective".
I'm also not sure asking about the nth letter is a great way of computing an additional property. Tokenization makes this sort of thing unnatural for LLMs to reason about, as demonstrated by the famous Strawberry Problem. Humans are a bit unreliable at this too, as demonstrated by your example of "o" being the third letter of "Honduras".
I've been brainstorming about what might make a better test and came up with the following:
Have the LLM predict what its top three most likely choices are for the next country in the sequence and compare that to its actual object-level output distribution when asked for just the next country. You could also ask for the probability of each potential choice and see how well-calibrated it is regarding its own logits.
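A rough sketch of how the scoring for such a test might look, assuming you already have (a) the model's self-predicted top three, (b) its stated probabilities, and (c) a list of object-level samples. All function names and the example data below are hypothetical; no real model API is called.

```python
from collections import Counter
from typing import Dict, List


def empirical_distribution(samples: List[str]) -> Dict[str, float]:
    """Relative frequency of each answer across repeated object-level samples."""
    counts = Counter(samples)
    return {item: c / len(samples) for item, c in counts.items()}


def top_k_overlap(predicted_top: List[str], samples: List[str], k: int = 3) -> float:
    """Fraction of the self-predicted top-k that appears in the sampled top-k."""
    emp = empirical_distribution(samples)
    ranked = sorted(emp.items(), key=lambda kv: (-kv[1], kv[0]))
    actual_top = {item for item, _ in ranked[:k]}
    return len(set(predicted_top[:k]) & actual_top) / k


def mean_abs_calibration_gap(stated: Dict[str, float], samples: List[str]) -> float:
    """Mean absolute gap between stated and sampled probabilities."""
    emp = empirical_distribution(samples)
    keys = set(stated) | set(emp)
    return sum(abs(stated.get(x, 0.0) - emp.get(x, 0.0)) for x in keys) / len(keys)


# Made-up example data, not real model output.
samples = ["Chad"] * 5 + ["Cuba"] * 3 + ["Iran"] * 2
overlap = top_k_overlap(["Chad", "Cuba", "Oman"], samples)
gap = mean_abs_calibration_gap({"Chad": 0.5, "Cuba": 0.3, "Iran": 0.2}, samples)
```

With real logits available you could replace the sampled frequencies with the model's actual next-token probabilities, but sampling keeps the sketch model-agnostic.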
What do you think?
Thanks for pointing that out.
Perhaps the fine-tuning process teaches it to treat the hypothetical as a rephrasing?
It's likely difficult, but it might be possible to test this hypothesis by comparing the activations (or similar interpretability technique) of the object-level response and the hypothetical response of the fine-tuned model.
It seems obvious that a model would better predict its own outputs than a separate model would. Wrapping a question in a hypothetical feels closer to rephrasing the question than probing "introspection". Essentially, the response to the object level and hypothetical reformulation both arise from very similar things going on in the model rather than something emergent happening.
As an analogy, suppose I take a set of data, randomly partition it into two subsets (A and B), and perform a linear regression and logistic regression on each subset. Suppose that it turns out that the linear models on A and B are more similar than any other cross-comparison (e.g. linear B and logistic B). Does this mean that linear regression is "introspective" because it better fits its own predictions than another model does?
I'm pretty sure I'm missing something as I'm mentally worn out at the moment. What am I missing?
I see what you're gesturing at but I'm having difficulty translating it into a direct answer to my question.
Cases where language is fuzzy are abundant. Do you have some examples of where a truth value itself is fuzzy (and sensical) or am I confused in trying to separate these concepts?
Can you help me tease out the difference between language being fuzzy and truth itself being fuzzy?
It's completely impractical to eliminate ambiguity in language, but for most scientific purposes, it seems possible to operationalize important statements into something precise enough to apply Bayesian reasoning to. This is indeed the hard part though. Bayes' theorem is just arithmetic layered on top of carefully crafted hypotheses.
The claim that the Earth is spherical is neither true nor false in general but usually does fall into a binary if we specify what aspect of the statement we care about. For example, "does it have a closed surface", "is its sphericity greater than 99.5%", "are all the points on its surface between radius * ( 1 +/- epsilon)", "is the circumference of the equator greater than that of the prime meridian".
Synthetically enhancing and/or generating data could be another dimension of scaling. Imagine how much deeper understanding a person/LLM would have if instead of simply reading/training on a source like the Bible N times, they had to annotate it into something more like the Oxford Annotated Bible and that whole process of annotation became training data.
I listened to this via podcast. Audio nitpick: the volume levels were highly imbalanced at times and I had to turn my volume all the way up to hear both speakers well (one was significantly quieter than the other).
Appropriate scaffolding and tool use are other potential levers.
Kudos for referencing actual numbers. I don’t think it makes sense to measure humans in terms of tokens, but I don’t have a better metric handy. Tokens obviously aren’t all equivalent either. For some purposes, a small fast LLM is way more efficient than a human. For something like answering SIMPLEBENCH, I’d guess o1-preview is less efficient while still significantly below human performance.
Is this assuming AI will never reach the data efficiency and energy efficiency of human brains? Currently, the best AI we have comes at enormous computing/energy costs, but we know by example that this isn't a physical requirement.
IMO, a plausible story of fast takeoff could involve the frontier of the current paradigm (e.g. GPT-5 + CoT training + extended inference) being used at great cost to help discover a newer paradigm that is several orders of magnitude more efficient, enabling much faster recursive self-improvement cycles.
CoT and inference scaling imply current methods can keep things improving without novel techniques. No one knows what new methods may be discovered and what capabilities they may unlock.
It's cool that the score voting input can be post-processed in multiple ways. It would be fascinating to try it out in the real world and see how often Score vs STAR vs BTR winners differ.
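As a toy illustration of how the same score ballots can yield different winners, here is a minimal sketch of plain Score and STAR tallies. The ballots are invented, the tie-breaking rules are my own assumption, and BTR is left out to keep it short.

```python
from typing import Dict, List

Ballot = Dict[str, int]  # candidate -> score; assumes every ballot scores every candidate


def score_totals(ballots: List[Ballot]) -> Dict[str, int]:
    """Sum each candidate's scores across all ballots."""
    totals: Dict[str, int] = {}
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0) + score
    return totals


def score_winner(ballots: List[Ballot]) -> str:
    """Plain score voting: highest total wins (ties broken alphabetically here)."""
    totals = score_totals(ballots)
    return max(sorted(totals), key=totals.get)


def star_winner(ballots: List[Ballot]) -> str:
    """STAR: the top two by total score go to a head-to-head runoff."""
    totals = score_totals(ballots)
    first, second = sorted(totals, key=lambda c: (-totals[c], c))[:2]
    prefer_first = sum(1 for b in ballots if b[first] > b[second])
    prefer_second = sum(1 for b in ballots if b[second] > b[first])
    # Assumption: a tied runoff goes to the higher-scoring finalist.
    return second if prefer_second > prefer_first else first


# Invented 0-5 ballots: A wins on total score, but a majority prefers B to A.
ballots = [{"A": 5, "B": 0, "C": 1}] * 3 + [{"A": 3, "B": 4, "C": 0}] * 4
```

Here `score_winner(ballots)` picks A (27 points to B's 16), while `star_winner(ballots)` picks B because four of the seven voters scored B above A.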
One caution with score voting is that you don't want high granularity and lots of candidates or else individual ballots become distinguishable enough that people can prove they voted a particular way (for the purpose of getting compensated). Unless marked ballots are kept private, you'd probably want to keep the options 0-5 instead of 0-9 and only allow candidates above a sufficient threshold of support to be listed.
Yes, but with a very different description of the subjective experience -- kind of like getting a sunburn on your back feels very different than most other types of back pain.
Your third paragraph mentions "all AI company staff" and the last refers to "risk evaluators" (i.e. "everyone within these companies charged with sounding the alarm"). Are these groups roughly the same or is the latter subgroup significantly smaller?
I agree. I would not expect the effect on health over 3 years to be significant outside of specific cases like it allowing someone to afford a critical treatment (e.g. insulin for a diabetic person), especially given the focus on a younger population.
This is a cool paper with an elegant approach!
It reminds me of a post from earlier this year on a similar topic that I highly recommend to anyone reading this post: Ironing Out the Squiggles
OP's model does not resonate with my experience either. For me, it's similar to constantly having the flu (or long COVID) in the sense that you persistently feel bad, and doing anything requires extra effort proportional to the severity of symptoms. The difference is that the symptoms mostly manifest in the brain rather than the body.
This is a cool idea in theory, but imagine how it would play out in reality when billions of dollars are at stake. Who decides the damage amount and the probabilities involved and how? Even if these were objectively computable and independent of metaethical uncertainty, the incentives for distorting them would be immense. This only seems feasible when damages and risks are well understood and there is consensus around an agreed-upon causal model.
I also guessed the ratio of the spheres was between 2 and 3 (and clearly larger than 2) by imagining their weight.
I was following along with the post about how we mostly think in terms of surfaces until the orange example. Having peeled many oranges and separated them into sections, they are easy for me to imagine in 3D, and I have only a weak "mind's eye" and moderate 3D spatial reasoning ability.
Even for people who understand your intended references, that won't prevent them from thinking about the evil-spirit association and having bad vibes.
Being familiar with daemons in the computing context, I perceive the term as whimsical and fairly innocuous.
The section on Chevron Overturned surprised me. Maybe I'm in an echo chamber, but my impression was that most legal scholars (not including the Federalist Society and The Heritage Foundation) consider the decision to be the SCOTUS arrogating yet more power to the judicial branch, overturning 40 years of precedent (which was based on a unanimous decision) without sufficient justification.
I consider the idea that "legislators should never have indulged in writing ambiguous law" rather sophomoric. I don't think it's always possible to write law that is complete, unambiguous, and also good policy. Nor do I think Congress is always the best equipped to do so. I don't fully trust government agencies delegated with rulemaking authority, but I have much less trust that a forum-shopped judge in the Northern District of Texas is likely to make better-informed decisions about drug safety than the FDA.
FWIW, I haven't really thought much about Loper as it relates to AI, tech, and crypto specifically. The consequences of activist judges versus the likes of the DOJ, CDC, FDA, and EPA are mostly what come to mind. Maybe it's attention bias given recent SCOTUS decisions versus more limited memory of out-of-control agencies but I feel uneasy tilting the balance of power toward judicial dominance.
This is similar to the quantum suicide thought experiment:
https://en.wikipedia.org/wiki/Quantum_suicide_and_immortality
Check out the Max Tegmark references in particular.
[Epistemic status: purely anecdotal]
I know people who work in the design and construction of data centers and have heard that some popular data center cities aren't approving nearly as many data centers due to power grid concerns. Apparently, some of the newer data center projects are being designed to include net new power generation to support the data center.
For less anecdotal information, I found this useful: https://sprottetfs.com/insights/sprott-energy-transition-materials-monthly-ais-critical-impact-on-electricity-and-energy-demand/
I can definitely imagine them plausibly believing they're sticking to that commitment, especially with a sprinkle of motivated reasoning. It's "only" incrementally nudging the publicly available SOTA rather than taking bigger steps like GPT2 --> GPT3 --> GPT4.
Exactly. All that’s needed for “transcendence” is removing some noise.
I highly recommend the book Noise by Daniel Kahneman et al. on this topic.
My hypothesis is that poor performance on ARC is largely due to lack of training data. If there were billions of diverse input/output examples to train on, I would guess standard techniques would work.
Efficiently learning from just a few examples is something that humans are still relatively good at, especially in simple cases where System 1 and System 2 synergize well. I’m not aware of many cases where AI approaches human level without orders of magnitude more training data than a human ever sees in a lifetime.
I think the ARC challenge can be solved within a year or two, but doing so won’t be super interesting to me unless it breaks new ground in sample efficiency (not trained on billions of synthetic examples) or generalization (e.g. solved using existing LLMs rather than a specialized net).