Comments
For every token, model activations are computed once when the token is encountered and then never explicitly revised -> "only [seems like it] goes in one direction"
with the only recursive element of its thought being that it can pass 16 bits to its next running
I would point to the activations for all previous tokens as the relevant "element of thought" here that gets passed, and those can amount to gigabytes.
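A rough back-of-the-envelope check of the "gigabytes" claim, assuming GPT-3-175B-scale hyperparameters (96 layers, d_model = 12288, 2048-token context) and fp16 keys/values; this counts only the per-token keys and values that attention reuses, not every intermediate activation:

```python
# Keys and values kept for every previous token across all attention layers (the "KV cache").
n_layers, d_model, n_tokens, bytes_per_scalar = 96, 12288, 2048, 2  # fp16
kv_cache_bytes = 2 * n_layers * n_tokens * d_model * bytes_per_scalar  # 2 = keys + values
print(kv_cache_bytes / 1e9)  # ~9.7 GB at full context
```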
From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor its previous token states) being visibly updated.
Among them, one I found especially peculiar is that I distinctly started feeling some sort of sensations outside of my body.
I had this, and it lasted for a year after the retreat. I also found that there's a strong tendency for the sensations to happen in the area you described.
I could feel sensations substantially outside of the area accessible to my hands too, but they were a bit more difficult to feel. They could correspond to priors for tactile-like affordances for objects at a distance (e.g. graspability of a cup, or speed of a fast-moving vehicle) that are readily constructed by ordinary perception.
Seems related to https://www.lesswrong.com/posts/qpgkttrxkvGrH9BRr/superintelligence-is-not-omniscience, https://www.lesswrong.com/posts/epgCXiv3Yy3qgcsys/you-can-t-predict-a-game-of-pinball, and similar objections might be applicable.
I've thought a bit about datasets before, and it seems to me that what most needs collecting is detailed personal-preference datasets: e.g. input-output examples of how you generally prefer information to be filtered, processed, communicated to you, and refined with your inputs; what your success criteria for tasks are; and where the places in your day flow / thought flow are at which the thing needs to actively intervene and correct you, especially where you feel you could benefit from cognitive extensions most, based on your bottlenecks. These could initially be too hard to infer from screen logs alone.
Random idea about preventing model stealing. After finetuning a mixture of experts model with your magic sauce, place the trained experts on geographically distinct servers with heterogeneous tech stacks and security systems to avoid common vulnerabilities. Horcrux vibes
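A minimal sketch of what the serving side could look like, assuming each expert sits behind its own HTTP endpoint; the URLs, payload format, and gating setup here are all made up for illustration:

```python
import numpy as np
import requests

# Hypothetical endpoints: each expert of the finetuned MoE lives on its own server
# (different region, different tech stack), so compromising one box leaks only one expert.
EXPERT_URLS = {
    0: "https://expert-0.eu-west.example.com/forward",
    1: "https://expert-1.us-east.example.net/forward",
    2: "https://expert-2.ap-south.example.org/forward",
}

def moe_layer(hidden_state: np.ndarray, gate_weights: np.ndarray, top_k: int = 2) -> np.ndarray:
    """Route a token's hidden state to the top-k remote experts and mix their outputs."""
    # Softmax gating over experts, computed locally by the router.
    logits = gate_weights @ hidden_state
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top_experts = np.argsort(probs)[-top_k:].tolist()

    # Each selected expert runs its feed-forward block remotely and returns an output vector.
    mixed = np.zeros_like(hidden_state)
    for e in top_experts:
        resp = requests.post(EXPERT_URLS[e], json={"hidden_state": hidden_state.tolist()})
        mixed += probs[e] * np.array(resp.json()["output"])
    return mixed
```

The obvious cost is a network round trip per expert per token, so this is more a statement about where the weights physically live than a practical inference setup; stealing one server gets you one expert, not the whole finetuned model.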
Vaguely related paper: Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models is an early attempt to prevent models from being re-purposed via fine-tuning.
It doesn't seem like a meaningfully positive result. For example, all their plots only track finetuning on up to 200 examples; I imagine they might even have had clear negative results in conditions with >200 examples available for finetuning. After 50-100 examples, the gap between normal finetuning and finetuning from random init, though still small, grows fast. There are also no plots with x-axis = finetuning iterations. And when they optimize for "non-finetunability", they don't aim to maintain language modeling performance; instead, they only impose the constraint of "maintaining finetunability" on one downstream "professions detection task".
I expect naive solutions to continue to work very poorly on this problem.
I think "on most cognitive tasks" means for an AGI its t is defined as the first t for which it meets the expert level at most tasks. However, what exactly counts as a cognitive task does seem to introduce ambiguity and would be cool to clarify, e.g. by pointing to a clear protocol for sampling all such task descriptions from an LLM.
A several-months-AGI is required to be coherent in the sense of the coherence human experts display today. I think this is pretty distinct from the kind of coherence humans were being optimized for before behavioral modernity (50K years ago).
I agree that evolution optimized hard for some kind of coherence: persistent self-schema, attitudes, emotional and behavioral patterns, attachments, long-term memory access. But what humans have going for them is the combination of this prior coherence and just 50K years of evolution after unlocking the abstract-thinking toolkit. I don't think we can expect that to have contributed much to the ability to coherently plan complex tasks or to the ability to write and reason abstractly.
This makes me think that humans struggling with coherence is not good evidence that building agents with large t is much more difficult than building agents with small t: there simply wasn't enough optimization pressure.
on most cognitive tasks, it beats most human experts
I think this specifies both thresholds to be 50%.
It doesn't seem like "shorter timelines" in the safest quadrant has much to do with their current strategy, as they have a gpt-4 paper section on how they postponed the release to reduce acceleration.
https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need was vaguely similar
Relevant recent paper: https://www.lesswrong.com/posts/8F4dXYriqbsom46x5/pretraining-language-models-with-human-preferences
why it is so good in general (GPT-4)
What are the examples indicating it's at the level of performance at complex tasks you would expect from GPT-4? Especially performance which is clearly attributable to improvements that we expect to be made in GPT-4? I looked through a bunch of screenshots but haven't seen any so far.
Can confirm I consistently had non-deterministic temp-0 completions on older davinci models accessed through the API last year.
Bloomberg reported on plans to invest $10B today
Have you seen this implemented in any blogging platform other people can use? I'd love to see this feature implemented in some Obsidian publishing solution like quartz, but for now they mostly don't care about access management.
Wow, Zvi's example is basically what I've been doing recently with hyperbolic discounting too, after spending a fair amount of time thinking about Joe Carlsmith's "Can you control the past". It seems to work. "It gives me a lot of the kind of evidence about my future behavior that I like" is now the dominant reason behind certain decisions.
How much time do you expect the form, the coding test, and the interview to take for an applicant?
This idea tries to discover translations between the representations of two neural networks, but without necessarily discovering a translation into our representations.
I think this has been under investigation for a few years in the context of model fusion in federated learning, model stitching, and translation between latent representations in general.
Relative representations enable zero-shot latent space communication - an analytical approach to matching representations (though this is new work, it may not be that good, I haven't checked); a rough sketch of the core idea follows this list
Git Re-Basin: Merging Models modulo Permutation Symmetries - recent model stitching work with some nice results
Latent Translation: Crossing Modalities by Bridging Generative Models - some random application of unsupervised translation to translation between autoencoder latent codes (probably not the most representative example)
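As a concrete taste of the first of these, here is a rough sketch of the relative-representations idea as I understand it: describe each sample by its cosine similarities to a shared set of anchor samples, which makes embeddings from independently trained encoders directly comparable. The toy "two networks differing by a rotation" setup below is my own illustration, not taken from the paper:

```python
import numpy as np

def relative_representation(z: np.ndarray, anchors_z: np.ndarray) -> np.ndarray:
    """Describe each sample by cosine similarities to a shared set of anchor samples.

    z:         (n_samples, d) embeddings from one network
    anchors_z: (n_anchors, d) embeddings of the same anchor inputs from that network
    """
    z_norm = z / np.linalg.norm(z, axis=1, keepdims=True)
    a_norm = anchors_z / np.linalg.norm(anchors_z, axis=1, keepdims=True)
    return z_norm @ a_norm.T  # (n_samples, n_anchors)

# Two "networks" whose latent spaces differ by an arbitrary rotation,
# standing in for independently trained encoders of the same data.
rng = np.random.default_rng(0)
d, n, n_anchors = 16, 100, 10
z_a = rng.normal(size=(n, d))
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
z_b = z_a @ rotation

anchor_idx = rng.choice(n, size=n_anchors, replace=False)
rel_a = relative_representation(z_a, z_a[anchor_idx])
rel_b = relative_representation(z_b, z_b[anchor_idx])

# The absolute embeddings differ, but the relative ones coincide (up to float error),
# giving a common space in which the two networks' representations can be matched.
print(np.abs(rel_a - rel_b).max())
```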
I don't expect Putin to use your interpretation of "d" instead of his own interpretation of it, which he publicly advertises whenever he gives a big public speech on the topic.
From the latest speech:
> In the 80s they had another crisis they solved by "plundering our country". Now they want to solve their problems by "breaking Russia".
This directly references an existential threat.
From the speech a week ago:
> The goal of that part of the West is to weaken, divide and ultimately destroy our country. They are saying openly now that in 1991 they managed to split up the Soviet Union and now is the time to do the same to Russia, which must be divided into numerous regions that would be at deadly feud with each other.
Same.
Also, consider nuclear false flags: the framing for them, including in these same speeches, has been created and maintained throughout the entire year.
From my experience of playing VR games on mobile devices (Quest 1 and Quest 2), the majority of in-game characters look much better than this and it doesn't impact the framerate at all. This seems like a 100% stylistic choice.
"... the existing literature on the influence of dopamine enhancing agents on working memory provides reasonable support for the hypothesis that augmenting dopamine function can improve working memory."
—Pharmacological manipulation of human working memory, 2003
Is it supposed to be helping working memory?
I'd be really interested in a head-to-head comparison with R on a bunch of real-world examples of writing down beliefs that were not selected to favor either R or Squiggle. R, because specifying and manipulating distributions there seems, at least in part, to require less boilerplate than in Python.
I wonder what happens when you ask it to generate
> "in the style of a popular modern artist <unknown name>"
or
> "in the style of <random word stem>ism".
You could generate both types of prompts with GPT-3 if you wanted, so it would be a complete pipeline.
"Generate conditioned on the new style description" may be ready to be used even if "generate conditioned on an instruction to generate something new" is not. This is why a decomposition into new style description + image conditioned on it seems useful.
If this is successful, then more of the high-level idea generation involved can be shifted onto a language model by letting it output a style description. Leave blanks in it and run the model for each blank, while ensuring the generations form a coherent story, as in the template and sketch below.
>"<new style name>, sometimes referred to as <shortened version>, is a style of design, visual arts, <another area>, <another area> that first appeared in <country> after <event>. It influenced the design of <objects>, <objects>, <more objects>. <new style name> combined <combinatorial style characteristic> and <another style characteristic>. During its heyday, it represented <area of human life>, <emotion>, <emotion> and <attitude> towards <event>."
DALL-E can already model the distribution of possible contexts (image backgrounds, other objects, states of the object) + possible prompt meanings. And go from the description 1) to high-level concepts, 2) to ideas for implementing these concepts (relative placement of objects, ideas for how to merge concepts), 3) to low-level details. All within one forward pass, for all prompts! This is what astonished me most about DALL-E 1.
Importantly, placing, implementing, and combining concepts in a picture is done in a novel way without a provided specification. For style generation, the model would need to model a distribution over all possible styles and use each style, all without a style specification. This doesn't seem much harder to me and could probably be achieved with slightly different training. The procedure I described is just meant to introduce helpful stochasticity into the prompt and reuse an established generation pipeline.
I wonder if the macronutrient ratios have shifted. This would influence the total calories you end up with, because absorption rates differ across macronutrients. How the food is processed also influences absorption (as well as the total number of calories, which may not be reflected on the package).
If these factors changed, calories today don't mean exactly the same thing as calories in 1970.
Since FDA allows a substantial margin of error for calories, maybe producers also developed a bias that allows them to stay within this margin of error but show fewer calories on the package?
Maybe this is all controlled for in studies, dunno, I just did a couple of google searches and had these questions.
I could imagine that it's too hard for OpenAI to attract enough top talent to sustain their level of research achievements while also filtering the people they hire by their seriousness about reducing civilization-level risks. Or at least it could easily have been infeasible 4 years ago.
I know a couple of people at DeepMind and none of them have reducing civilization-level risks as one of their primary motivations for working there, as I believe is the case with most of DeepMind.
I have an argument for capabilities research being good but with different assumptions. The assumption that's different is that we would progress rapidly towards AGI capabilities (say, in 10 years).
If we agree 95% of progress towards alignment happens very close to the AGI, then the duration of the interval between almost-AGI and AGI is the most important duration.
Suppose the ratio of capabilities research to alignment research is low (probably what most people here want). Then AI researchers and deployers will have the option to say "Look, so many resources have already been put towards safety, it's actually fine, we're employing the 2027 comprehensive robustness benchmarks, and IDA+, in fact our quality assurance team is implementing it right now, no need to worry", prompting decision-makers to relax and let it go. The almost-AGI -> AGI interval is 2 years.
On the other hand, if it's high, this may cause decision-makers to freak out when they have their almost-AGI on the table and contain the development (e.g. with regulation). This may primarily be mediated via easier-to-avoid public failures and accidents. Or by AI safety people quickly and loudly demonstrating that we don't yet have the tools to avoid even these easier-to-avoid failures. Then regulation extends the Almost-AGI -> AGI interval to 8 years.
The point is that this is 4x more time to work on 95% of safety research progress.
- When you say that coherent optimizers are doing some bad thing, do you mean it would always be a bad decision for the AI to make its goal stable? But wouldn't that heavily depend on what other options it thinks it has, and in some cases maybe be worth the shot? If such a decision problem is presented to the AI even once, that doesn't seem good.
- The stability of the value function seems like something multidimensional, so perhaps it doesn't immediately turn into a 100% hardcore explicit optimizer forever, but there is at least some stabilization. In particular, bottom-up signals that change the value function most drastically may be blocked.
- AI can make its value function more stable to external changes, but it can also make it more malleable internally to partially compensate for Goodharting. The end result for outside actors though is that it only gets harder to change anything.
- Edit: BTW, I've read some LW articles on Goodharting, but I'm not yet convinced it will be such a huge problem at superhuman capability levels; it seems uncertain to me. Some factors may make it worse as you get there (complexity of the domain, dimensionality of the solution space), and some factors may make it better (the better you model the world, the better you can optimize for the true target). For instance, as the model gets smarter, the problems from your examples seem to be eliminated: in 1, it would optimize end-to-end, and in 2, the quality of the decisions would grow (if the model had access to the ground-truth value function all along, it would grow because of better world models and better tree search for decision-making). If the model has to check in and use feedback from the external process (human values) to not stray off course, then as it gets smarter it discovers more efficient ways to collect that feedback, has better priors, etc.
Every other day I have a bunch of random questions related to AI safety research pop up but I'm not sure where to ask them. Can you recommend any place where I can send these questions and consistently get at least half of them answered or discussed by people who are also thinking about it a lot? Sort of like an AI safety StackExchange (except there's no such thing), or a high-volume chat/discord. I initially thought about LW shortform submissions, but it doesn't really look like people are using the shortform for asking questions at all.
But the mere fact that one network may be useful for many tasks at once has been extensively investigated since the 1990s.
To receive epistemic credit, make sure people can tell that you haven't made all possible predictions on the topic this way and then revealed only the right one after the fact. You can probably publish plaintext metadata for this.
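A minimal sketch of one way to do this with a hash commitment plus plaintext metadata; the field names and the single-commitment convention are just an example:

```python
import hashlib
import json
import secrets

def commit(prediction: str) -> dict:
    """Return a record whose hash you publish now; reveal `prediction` and `salt` later."""
    salt = secrets.token_hex(16)  # keeps short predictions from being brute-forced out of the hash
    digest = hashlib.sha256((salt + prediction).encode()).hexdigest()
    return {"commitment": digest, "salt": salt, "prediction": prediction}

record = commit("Company X announces product Y before 2025-01-01")

# The metadata goes out in plaintext alongside the hash, so everyone can verify you made
# exactly one commitment on this topic and didn't later pick a winner from many hidden drafts.
public_post = {
    "topic": "Company X product launches",
    "n_commitments_on_topic": 1,
    "commitment": record["commitment"],
}
print(json.dumps(public_post, indent=2))
# Keep record["salt"] and record["prediction"] private until reveal time.
```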
An update on Israel:
> Citizenship is typically granted 3 months after arrival; you can fill out a simple form to waive this waiting period, however.
I think it's not the case, because you receive an internal ID of a citizen immediately after a document check, but they only give you a passport you can use for visas after 3 months (which you can also spend outside the country).
Waiving the waiting period is possible in 2022, but you have to be smart about it and go to exactly the right place to do it (because many local governments are against it).
> Israel has mandatory conscription into its military if you are under 28 years old and residing in the country.
No, for people receiving citizenship it's under 22, not under 28.
> Israel has worldwide taxation
I don't think so? It only taxes you if you are considered to be residing in Israel. There may be some recent (2021-2022) exceptions related to social security, but the amount of tax there is very small.
More cons:
- you are prohibited from entering some countries by Israel. Iran, Saudi Arabia, Palestine areas, etc.
- if you don't live in Israel, you still have citizenship, but it's sort of crippled: you can't vote, you can only get a more restricted type of passport, and you don't have a way to quickly restore access to the medical system
> A monthly payment for a number of years if you reside in Israel (I think for a single individual this was about $300 a month)
I think the base pay is currently around $900/mo for 6 months. Counting all bonuses, it has recently started to be roughly doubled to $1800 for a very significant share of people making aliyah, but this may not apply to everyone, and they may remove these additional bonuses from July.
Actually, the Metaculus community prediction has a recency bias:
> approximately sqrt(n) new predictions need to happen in order to substantially change the Community Prediction on a question that already has n players predicting.
In this case n=298, so the prediction should change substantially after sqrt(n)≈18 new predictions (usually it takes up to a few days for that many to accumulate). Over the past week there were almost this many predictions, the AGI community median has shifted 2043 -> 2039, and the 30th percentile is 8 years out.
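A quick check of that arithmetic:

```python
import math
n = 298
print(math.sqrt(n))  # ~17.3, i.e. roughly 18 new predictions to move the community median
```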
No disagreements here; I just want to note that if "the EA community" waits too long for such a pivot, at some point AI labs will probably be faced with people from the general population protesting because even now a substantial share of the US population views the AI progress in a very negative light. Even if these protests don't accomplish anything directly, they might indirectly affect any future efforts. For example, an EA-run fire alarm might be compromised a bit because the memetic ground would already be captured. In this case, the concept of "AI risk" would, in the minds of AI researchers, shift from "obscure overconfident hypotheticals of a nerdy philosophy" to "people with different demographics, fewer years of education, and a different political party than us being totally unreasonable over something that we understand far better".
ICML 2022 reviews dropped this week.
"What if outer space were udon" (CLIP guided diffusion did really well, this is cherry-picked though: https://twitter.com/nshepperd1/status/1479118002310180882)
"colourless green ideas sleep furiously"
Are PaLM outputs cherry-picked?
I reread the description of the experiment and I'm still unsure.
The protocol on page 37 goes like this:
- the 2-shot exemplars used for few-shot learning were not selected or modified based on model output. I infer this from the line "the full exemplar prompts were written before any examples were evaluated, and were never modified based on the examination of the model output".
- greedy decoding is used, so they couldn't filter outputs given a prompt.
What about the queries (the full prompt without the QAQA few-shot part)? Are they included under "the full exemplar prompts" or not? If they are, there's no output selection; if they aren't, the outputs could be strongly selected, with the selection magnitude unreported. On one hand, "full prompts" should refer to full prompts. On the other hand, they only use "exemplar" when talking about the QAQA part they prepend to every query, versus "evaluated example" meaning the query.
These games are really engaging for me and haven't been named:
Eleven Table Tennis. Ping-pong in VR (+ multiplayer and tournaments).
Racket NX. This one is much easier but you still move around a fair bit. The game is "Use the racket to hit the ball" as well.
Synth Riders. An easier and more chill Beat Saber-like game.
Holopoint. Archery + squats, gets very challenging on later levels.
Excellent games that have already been named:
Beat Saber. "The VR game". You can load songs from the community library using mods.
Thrill of the Fight (boxing).
You can buy fladrafinil or flmodafinil without any process (see reddit for reports, seems to work much better than adrafinil)
One thing you probably won't find in an evidence review is that, years after making the switch, it still feels more pleasant for me to type in Colemak than in QWERTY. That's a pretty big factor as well, considering how many hours we put into typing.
I would also highlight this as seemingly by far the most wrong point. Consider how many Omicron cases we have by now, and we still don't know for sure that it's significantly less severe. Now consider how many secret controlled infections of humans with each of the novel strains you're working with you would need to run to be confident enough that a given strain is less severe and thus makes sense to release.
Does anyone have a good model of how to reconcile
1) a pretty large psychosis rate in this survey, a bunch of people in https://www.lesswrong.com/posts/MnFqyPLqbiKL8nSR7/my-experience-at-and-around-miri-and-cfar-inspired-by-zoe saying that their friends developed mental health issues after using psychedelics, and anecdotal experiences and stories about psychedelic-induced psychosis in the general cultural field
and
2) Studies https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3747247/ https://journals.sagepub.com/doi/10.1177/0269881114568039 finding no correlation, or, in some cases, negative correlation between psychedelic consumption and mental health issues?
- Studies done wrong?
- Studies don't have enough statistical power?
- Something with confounding and Simpson's paradox? Maybe there's a particular subgroup of the population within which psychedelic use correlates negatively with the likelihood of mental health issues, or a subgroup with both more psychedelic use and a lower average likelihood of mental health issues (a toy numeric illustration follows this list)?
- Psychedelics impart mental well-being and resilience to some people to such a degree that it cancels out the negative mental health effects in other people, so that in expectation psychedelics wouldn't affect your mental health negatively?
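To illustrate the Simpson's paradox option with deliberately made-up numbers: if the people who reach for psychedelics are disproportionately drawn from a low-risk subgroup, the aggregate correlation can come out null or negative even while use raises risk inside the vulnerable subgroup. A toy sketch:

```python
# Entirely made-up numbers, chosen only to show the direction of the effect.
# Within the "vulnerable" subgroup, use doubles the psychosis rate; in aggregate,
# users still look *less* likely to develop psychosis, because most users
# happen to come from the "resilient" subgroup.
groups = {
    #              users: (n, cases)    non-users: (n, cases)
    "resilient":  {"users": (900, 9),  "non_users": (100, 1)},   # 1.0% vs 1.0%
    "vulnerable": {"users": (100, 10), "non_users": (900, 45)},  # 10.0% vs 5.0%
}

def pooled_rate(pairs):
    n = sum(p[0] for p in pairs)
    cases = sum(p[1] for p in pairs)
    return cases / n

for name, g in groups.items():
    print(name, g["users"][1] / g["users"][0], g["non_users"][1] / g["non_users"][0])

print("all users:    ", pooled_rate([g["users"] for g in groups.values()]))      # 0.019
print("all non-users:", pooled_rate([g["non_users"] for g in groups.values()]))  # 0.046
```

So a null or negative population-level correlation is compatible with real harm in a subgroup, as long as use and vulnerability are anti-correlated.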
"Training takes between 24 and 48 hours for most models"; I assumed both are trained within 48 hours (even though this is not precise and may be incorrect).
Ohh OK I think since I wrote "512 TPU cores" it's 512x512, because in Appendix C here https://arxiv.org/pdf/1809.11096.pdf they say it corresponds to 512x512.
It should be referenced here in Figure 1: https://arxiv.org/pdf/2006.16668.pdf
"I have heard that they get the details wrong though, and the fact that they [Groq] are still adversing their ResNet-50 performance (a 2015 era network) speaks to that."
I'm not sure I fully get this criticism: ResNet-50 is the most standard image recognition benchmark and unsurprisingly it's the only (?) architecture that NVIDIA lists in their benchmarking stats for image recognition as well: https://developer.nvidia.com/deep-learning-performance-training-inference.
This is a very neat idea, is there any easy way to enable this for Android and Google Calendar notifications? I guess not