I bought a cheap watch, the T-Watch 2020, that has WiFi and a microphone. The goal is to have an easily accessible langchain agent connected to my LocalAI instance.
I'm a bit stuck for now because of a driver written in C (I know mostly Python), but I'm getting there.
You meant speech-to-text instead of text-to-speech. They only added the latter recently, and we don't know the model behind it AFAIK.
Nah. Although you can see patients having binges that you then realize were just one Big Mac, indicating something closer to anorexia.
The suicide rate is about 2% per 10 years, which is insanely high. Also, it is not uncommon for people with bulimia to have (sometimes severe) deficiencies regardless of their weight.
To add some perspective: I suspect some people don't really understand how large the caloric intake can be in bulimia. I routinely see patients eating upwards of 50,000 calories (I have even seen 100,000 a few times) per day when a crisis occurs. Think eating several large peanut butter jars in a row, etc.
- The only difference between encoder and decoder transformers is the attention mask. In an encoder, tokens can attend to future tokens (acausal, bidirectional attention), while in a decoder, tokens cannot attend to future tokens (causal attention). The term "decoder" is used because decoders can be used to generate text, while encoders cannot (since you can only run an encoder if you already know the full input).
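For concreteness, here is a minimal sketch of that difference (illustrative PyTorch of my own, not part of the quoted comment; it assumes raw attention scores for one head are already computed):

```python
# Minimal illustration of encoder vs decoder attention masks (sketch only).
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw q·k attention scores for one head

# Encoder: every token may attend to every other token (bidirectional / acausal).
encoder_weights = torch.softmax(scores, dim=-1)

# Decoder: token i may only attend to tokens j <= i (causal mask).
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
decoder_weights = torch.softmax(scores.masked_fill(~causal_mask, float("-inf")), dim=-1)
```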
This was very helpful to me. Thank you.
Hi,
I had a question the other day and figured I'd post it here. Do we have any idea what would happen if we used the steering vector of the input itself?
For example: take sentenceA, pass it through the LLM and store its embedding, then take sentenceA again and pass it through the LLM while adding that embedding.
As is, this would simply double the length (norm) of the hidden vector, but I'm wondering what would happen if we instead played with the embedding, say taking it after the 5th token of sentenceA and adding it at the 3rd token.
Similarly, would anything interesting happen with subtraction? With adding a random orthogonal vector?
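To make the experiment concrete, here is roughly what I have in mind (a sketch assuming a GPT-2-style HuggingFace model and forward hooks; the layer index, token positions, and sentence are arbitrary placeholders):

```python
# Rough sketch of "steering a sentence with its own activations".
# Assumes a GPT-2-style HuggingFace model; layer_idx / positions are arbitrary choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
layer_idx = 5                    # which transformer block to read from / write to
source_pos, target_pos = 5, 3    # take activation after the 5th token, add it at the 3rd

inputs = tokenizer("This is sentenceA, the sentence we want to steer with itself.",
                   return_tensors="pt")
stored = {}

def record_hook(module, inp, out):
    hidden = out[0] if isinstance(out, tuple) else out
    stored["vec"] = hidden[:, source_pos, :].detach().clone()

handle = model.transformer.h[layer_idx].register_forward_hook(record_hook)
with torch.no_grad():
    model(**inputs)              # first pass: record the activation
handle.remove()

def steer_hook(module, inp, out):
    hidden = out[0] if isinstance(out, tuple) else out
    hidden[:, target_pos, :] += stored["vec"]   # try -= or an orthogonal vector too
    return (hidden,) + out[1:] if isinstance(out, tuple) else hidden

handle = model.transformer.h[layer_idx].register_forward_hook(steer_hook)
with torch.no_grad():
    steered_logits = model(**inputs).logits     # second pass: steered forward pass
handle.remove()
```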
Thanks
Personally, I come to (and organize) meetups to make my brain sweat, and I actively avoid activities that leave me unchanged (I won't change much during a play, while I grow a lot after each confrontation or discussion). But to each their own, of course!
FWIW, I tend to see a good part of ADHD medication's effect as changing the trade-off between exploration and exploitation, ADHD being an excess of exploration and the meds nudging towards an excess of exploitation. If you struggle with a perceived excess of exploration, you might ask yourself whether you would be helped by taking those medications, or whether you might fit the diagnostic criteria.
Related: taking too much of those psychostimulants usually produces an extreme type of exploitation often called "tunnel vision", which can be detrimental as it feels like being a robot doing something on repeat.
Also: branched thinking is not only related to ADHD but also to people with unusually high IQ. So let me just stress that YMMV.
Also 2: another interesting thread about ADHD from the other day: https://www.lesswrong.com/posts/zi6Qq5jeWat8xFodq/what-works-for-adhd-and-or-related-things
That sounds like something easy to do with langchain btw
edit: I can make the prompt more or less compressed easily, just ask. The present example is "pretty compressed" but I can make a more verbose one
Not really what you're asking but :
I'm coincidentally working on the side on a DIY summarizer to manage my inputs. I summarized a bit of the beginning of part 1. If you think it has any value I can run the whole thing :
note that '- ---' indicates the switch to a new chunk of text by the LLM
This is formatted as Logseq / Obsidian markdown.
- Carl Shulman (Pt 1) - Intelligence Explosion, Primate Evolution, Robot Doublings, & Alignment - https://youtube.com/watch?v=_kRg-ZP1vQc
summarization_date:: 28/06/2023
token_cost:: 12057
dollar_cost:: 0.01911
summary_reading_length:: 4.505
doc_reading_length:: 120.5025
author:: Dwarkesh Patel
- Carl Shulman: highly regarded intellectual known for ideas on intelligence explosion and its impacts
- Advisor to Open Philanthropy project
- Research associate at Future of Humanity Institute at Oxford
- Feedback loops and dynamics when approaching human-level intelligence involve:
- Development of new computer chips, software, and training runs
- Concept of input-output curves important in understanding increasing difficulty of improving AI
- Productivity of computing has increased significantly over the years, but investment and labor required for advancements have also increased
- In a world where AI is doing the work, doubling computing performance translates to a doubling or better of effective labor supply
- Doubling labor force can result in several doublings of compute, accelerating AI development
- Bloom paper mentioned:
- 35% increase in transistor density
- 7% increase per year in number of researchers required to sustain that pace
- ---
- The bloom paper mentioned:
- 35% increase in transistor density
- 7% increase per year in the number of researchers required to sustain that pace
- There is a question of whether AI can be seen as a population of researchers that grows with compute itself.
- Compute is a good proxy for the number of AI researchers because:
- If you have an AI worker that can substitute for a human, having twice as many computers allows for running two separate instances and getting more gains.
- Improvements in hardware and software technology contribute to the progress of AI.
- The work involved in designing new hardware and software is done by people, but computer time is not the primary cost.
- The number of people working on AI research is in the low tens of thousands, with companies like Nvidia, TSMC, and DeepMind having significant numbers of employees.
- The capabilities of AI are doubling on a shorter time scale than the number of people required to develop them.
- ---
- The capabilities of AI are doubling faster than the number of people needed to develop them.
- Hardware efficiency has historically doubled 4-5 times per doubling of human inputs, but this rate has slowed down as Moore's Law nears its end.
- On the software side, the doubling time for workers driving software advances is several years, while the doubling time for effective compute from algorithmic progress is faster.
- Epoch, a group that collects datasets relevant to forecasting AI progress, found the following doubling times:
- Hardware efficiency doubles in about 2 years.
- Budget growth doubles in about 6 months.
- Algorithmic progress doubles in less than 1 year.
- The growth of effective compute for training big AIs is drastic, with estimates that GPT-4 cost around 50 million dollars to train.
- Effective compute can increase through greater investment, better models, or cheaper training chips.
- Software progress is measured by the reduction in compute needed to achieve the same benchmark as before.
- The feedback loop between AI and compute can help with hardware design and chip improvements.
- Automating chip design work could lead to faster improvements, but it is less important for the intelligence explosion.
- ---
- Improving chip design through AI automation is less important for the intelligence explosion because it only applies to future chips.
- Faster improvements can be achieved through AI automation.
- The most disruptive and important aspect of AI automation is on the software side.
- Improvements can be immediately applied to existing GPUs.
- The question is when AI will contribute significantly to AI progress and software development.
- This contribution could be equivalent to having additional researchers.
- The magnitude of AI's contribution is crucial.
- It should boost effective productivity by 50-100% or more.
- AI can automate certain tasks in the AI research process.
- This allows for more frequent and cost-effective completion of these tasks.
- The goal is to have AI that can significantly enhance performance.
- This is even with its weaknesses, rather than achieving human-level AI with no weaknesses.
- Existing fabs can produce tens of millions of advanced GPUs per year.
- If they run AI software as efficient as humans, with extended work hours and education, it can greatly surpass human capabilities.
- ---
- The education level of AI models surpasses that of humans and focuses on specific tasks.
- Tens of millions of GPUs, each equivalent to the work of the best humans, contribute to significant discoveries and technological advancements.
- Human-level AI is currently experiencing an intelligence explosion, starting from a weaker state.
- The feedback loop for AI researchers begins when they surpass small productivity increases and reach a level equivalent to or close to human researchers.
- AI systems can compensate for their weaknesses by deploying multiple less intelligent AIs to match the capabilities of a human worker.
- AI can be applied to tasks such as voting algorithms, deep search, and designing synthetic training data, which would be impractical for humans.
- As AI becomes more advanced, it can generate its own data and identify valuable skills to practice.
- For instance, AlphaZero generated its own data through self-play and followed a curriculum to always compete against an opponent of equal skill.
That would most certainly cause a bad trip at night, as taking uppers to stay awake for long will also increase anxiety, which will not be helped by the residual hallucinations from the earlier hallucinogen.
In my experience, a good deal of bad trips are actually caused by being sleep deprived.
I can't check right now, but IIRC there is a marked neurotoxicity caused by too much cholinergic activity during mania, leading to quicker-than-average dementia onset, proportional to time spent in mania. This might be controversial among specialists, and might not apply to hypomania, but it could be a useful prior nonetheless. I recommend the website Elicit to quickly reduce uncertainty on this question.
Edit: this is also related to whether putting everyone on at least a low Adderall dose might be a good thing.
edit: rereading your above comments, I see that I should have made clear that I was thinking more about learned architectures. In which case we apparently agree, as I meant what you said in https://www.lesswrong.com/posts/ftEvHLAXia8Cm9W5a/data-and-tokens-a-30-year-old-human-trains-on?commentId=4QtpAo3XXsbeWt4NC
Thank you for taking the time.
I agree that it's probably terminology that is the culprit here. It's entirely my fault: I was using the word pretraining loosely and meant more something like the hyperparameters (number of layers, inputs, outputs, activation fn, loss) being "learned" by evolution, leaving to us poor creatures only the task of pruning neurons and adjusting the synaptic weights.
The reason I was thinking of it this way is that I've been reading about NEAT recently, an algorithm that uses a genetic algorithm to learn an architecture as well as train the selected architecture. A bit like evolution?
To rephrase my initial point: evolution does its part of the heavy lifting by finding the right brain to live on Earth. This shrinks tremendously the space of computation a human has to explore in their lifetime to have a brain fitted to the environment. This "shrinking of the space" is kind of like a strong bias towards certain computations. And model pretraining is having the weights of the network already initialized at a value that "already works", kind of like a strong bias too. Hence the link in my mind.
But yeah, evolution does not give us synaptic weights that work, so pretraining is not the right word. Unless you are thinking about learned architectures, in which case my point can somewhat work, I think.
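To illustrate what I mean by "evolution finds the architecture, the lifetime tunes the weights", here is a toy sketch of my own (not real NEAT, just an outer architecture search wrapped around an inner weight-training loop):

```python
# Toy illustration of the analogy (not real NEAT): an outer "evolution" loop
# searches over architectures, an inner "lifetime learning" loop tunes weights.
import random
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)      # XOR-like toy task

def lifetime_learning(hidden_width, steps=300, lr=0.5):
    """Train a 1-hidden-layer net of a given width; return its final loss."""
    W1 = rng.normal(scale=0.5, size=(2, hidden_width))
    W2 = rng.normal(scale=0.5, size=(hidden_width, 1))
    for _ in range(steps):
        h = np.tanh(X @ W1)
        p = 1 / (1 + np.exp(-(h @ W2)[:, 0]))
        err = p - y                             # gradient of BCE w.r.t. logits
        W2 -= lr * h.T @ err[:, None] / len(X)
        W1 -= lr * X.T @ ((err[:, None] @ W2.T) * (1 - h ** 2)) / len(X)
    return float(np.mean((p - y) ** 2))

# "Evolution": mutate the architecture (hidden width), keep whatever learns best.
population = [random.randint(1, 8) for _ in range(6)]
for generation in range(5):
    best = sorted(population, key=lifetime_learning)[:3]
    population = best + [max(1, w + random.choice([-1, 1])) for w in best]

print("evolution settled on hidden widths:", sorted(population))
```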
If all humans have about as many neurons in the gyrus that is hardwired to receive input from the eyes, it seems safe to assume that the vast majority of humans will end up with that gyrus extracting the same features.
Hence my view is that evolution, by imposing a few hardwired connections and gyri geometries, gives an enormous bias in the space of possible networks, which is similar to what pretraining does.
In essence, evolution gives us a foundation model that we fine-tune with our own experiences.
What do you think? Does that make sense?
I think that gyri are mostly hardcoded by evolution, and given how strongly they restrict the computation space that a cortical area can learn, one could consider the cortex to be heavily pretrained by evolution.
Studying the correlation between gyri geometry and psychiatric conditions is an ongoing hot topic.
b. Saying "no" to a certain activity means saying "yes" to myself and our relationship. When you propose something and I say "no" to it, I'm simultaneously saying "yes" to our relationship. Because when I say "yes" while I'm actually a "no", I slowly accumulate resentment that poisons the connection between us without you being able to do anything about it. And, you will inevitably sense when I said "yes" to something but my heart is not in it. Having been on both sides of this, I know how awkward that feels. So, the moment when I start to really feel comfortable around someone is when I heard the first "no" from them, and when saying "no" to them has become something casual for me. "No" is actually a very, very precious gift. Let's treat it as such, and encourage one another to give it more often.
Here's a quote from Steve Jobs about that:
“People think focus means saying yes to the thing you've got to focus on. But that's not what it means at all. It means saying no to the hundred other good ideas that there are. You have to pick carefully. I'm actually as proud of the things we haven't done as the things I have done. Innovation is saying no to 1,000 things.” ― Steve Jobs
source: https://www.goodreads.com/quotes/629613-people-think-focus-means-saying-yes-to-the-thing-you-ve
FYI, radiology is actually not mostly looking at pictures; a lot of it is image-guided intervention (for example embolization), which is significantly harder to automate.
Same for family doctors: it's not just following guidelines and renewing scripts; a good part is physical examination.
I agree that AI can do a lot of what happens in medicine, though.
Thanks! Regarding the survey: some people, like me, might be having issues due to not having a Google account or Google device. If you could consider using other form tools, like the ones supplied by Nextcloud, Framaforms, etc., that might help!
Sorry for being that guy, and thanks for the summaries :)
Question: what do you think of the opinion of Chinese officials on easily accessible LLMs for Chinese citizens? As long as alignment is unsolved, I can imagine China being extremely leery of how citizens could somehow be exposed to ideas that go against official propaganda (human rights, genocide, etc.).
But China can't accept being left out of this race either, is my guess.
So in the end China is incentivized to solve alignment, or at least to slow down its progress.
Have you thought about any of this? I'm extremely curious about anyone's opinion on the matter.
I strongly disagree. I think most people here think that AGI will be created eventually, and we have to make sure it does not wipe us all out. Not everything is an infohazard, and exchanging ideas is important to coordinate on making it safe.
What do you think?
Medical student here. I'm actually convinced we could build an AGI right now by using multiple LLMs with langchain agents, memory, and a few tools. We could even make it multimodal and embodied.
Just have them impersonate each of the basal ganglia nuclei and a few other structures.
This would allow throttling its thinking speed, making it alignable because you can tweak it internally.
Lots of other benefits, but I'm on mobile. Anyone get in touch if interested!
Pinging @stevenbyrnes: do you agree with me that instead of mapping those proto-AGIs to a queue of instructions, it would be best to have the AGI be made from a bunch of brain structures with corresponding prompts? For example, an "amygdala" would be in charge of returning an int between 0 and 100 indicating fear level, a "hippocampus" would be in charge of storing and retrieving memories, etc. I guess the thalamus would be consciousness and the cortex would process some abstract queries.
We could also use active inference and Bayesian updating to model current theories of consciousness. We could even use it to model schizophrenia by changing the number of past messages some structures can access (i.e. modeling long-range connection issues), etc.
To me that sounds way easier to inspect and align than pure black boxes, as you can throttle the speed and manually change values, for example making sure the AGI does not feel threatened.
Is anyone aware of similar work? I've created a diagram of the brain structures and their roles in a few minutes with ChatGPT and it seems super easy.
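Here's a rough, framework-agnostic sketch of what I mean (the `llm` callable is a placeholder for whatever backend you'd use, e.g. a langchain chain, and the prompts are made up on the spot):

```python
# Sketch of LLM "brain modules", each with its own prompt and memory (illustrative only).
from dataclasses import dataclass, field

@dataclass
class BrainModule:
    name: str
    system_prompt: str
    memory: list = field(default_factory=list)

    def step(self, llm, observation: str) -> str:
        prompt = f"{self.system_prompt}\nRecent context: {self.memory[-3:]}\nInput: {observation}"
        out = llm(prompt)
        self.memory.append(out)
        return out

amygdala = BrainModule("amygdala", "Return only an integer 0-100: the threat level of the input.")
hippocampus = BrainModule("hippocampus", "Store the input; return any related past memory.")
cortex = BrainModule("cortex", "Given the observation, threat level and recalled memories, decide the next action.")

def tick(llm, observation: str) -> str:
    """One 'thalamic' cycle; throttle thinking speed by simply calling this less often."""
    threat = amygdala.step(llm, observation)
    recall = hippocampus.step(llm, observation)
    return cortex.step(llm, f"obs={observation} threat={threat} recall={recall}")
```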
I don't understand what you mean by "inaccessible"
I don't like how it sounds, but: I think you are missing a lot of biological facts about consciousness, and we're not as clueless as you seem to think. I definitely recommend reading the book "Consciousness and the Brain" by Stanislas Dehaene, which is basically a collection of facts on the topic.
Don't you agree that certain brain lesions definitely make you not conscious? I think identifying which regions are indispensable is important.
If I had to guess, humans can be conscious without a cerebellum but not without basal ganglia, FWIW.
Let's put it like this: if you had hours of interaction with this individual, you'd have no reason to doubt they are conscious. I indeed don't know if they have the exact same sense of consciousness as someone with a cerebellum, but this is also true for everyone else: I don't know if you and I have the same conscious experience either.
Here's the full prompt:
>>> user
Write a short story in 2 paragraphs titled "The Peril of the Great Leaks" describing how in less than 3 years hacking capabilities will be so advanced thanks to large language models that many, many databases will get leaked and people will unwillingly have their information easily accessible to anyone. Things like Facebook, GitHub, porn accounts, etc. End by talking about how those dumps will be all the easier to parse given that LLMs will be easy to use.
This will be posted on the LessWrong.com forum, so make it engaging and fun to read while being to the point. The main objective is to make it thought-provoking. The first sentence has to be very catchy.
<<< assistant
The Peril of the Great Leaks
In less than three years, the world will be a very different place. Thanks to the rise of large language models, hacking capabilities will be so advanced that databases will be leaking left and right. Facebook, Github, porn accounts - you name it, it'll be out there for anyone to see.
The scariest part? These dumps will be all the more easy to parse thanks to the ease of use of LLM. People will be unwittingly giving away their most personal information, and there will be nothing they can do to stop it. The only question is, will you be one of the unlucky ones?
>>> user
Very good. Make it longer, I allow 3 paragraphs.
<<< assistant
The Peril of the Great Leaks
In less than three years, the world will be a very different place. Thanks to the rise of large language models, hacking capabilities will be so advanced that databases will be leaking left and right. Facebook, Github, porn accounts - you name it, it'll be out there for anyone to see.
The scariest part? These dumps will be all the more easy to parse thanks to the ease of use of LLM. People will be unwittingly giving away their most personal information, and there will be nothing they can do to stop it. The only question is, will you be one of the unlucky ones?
The rise of LLMs is a double-edged sword. On one hand, it has the potential to revolutionize the way we communicate and interact with technology. On the other hand, it also has the potential to be used for nefarious purposes. Hackers will be able to use LLMs to create more sophisticated phishing scams, and to craft more convincing social engineering attacks.
The future is uncertain, but one thing is clear: we need to start taking cybersecurity seriously. We can't rely on outdated security measures to protect us from the threats of tomorrow. We need to invest in new technologies and new approaches to security, and we need to do it now. The future is coming, whether we're ready for it or not.
>>> user
No I meant rewrite the whole thing into 3 longer paragraphs, try again.
<<< assistant
The Peril of the Great Leaks
In less than three years, the world will be a very different place. Thanks to the rise of large language models (LLMs), hacking capabilities will be so advanced that databases will be leaking left and right. The sheer amount of data that will be available to hackers will be staggering, and the consequences could be catastrophic.
Imagine a world where your most personal information is readily available to anyone with an internet connection. Your social security number, your credit card information, your medical records - all of it out in the open. This is the future we're facing, and it's not a pretty one.
The scariest part is that these dumps will be all the more easy to parse thanks to the ease of use of LLMs. Hackers will be able to sift through massive amounts of data in a matter of seconds, making it easier than ever to find the information they're looking for. And with the rise of the dark web, it will be easier than ever for them to sell that information to the highest bidder.
But it's not just individuals who will be at risk. Companies and governments will also be vulnerable to these attacks. Imagine a world where the secrets of major corporations and government agencies are readily available to anyone who wants them. The consequences could be disastrous, both for the organizations themselves and for the people they serve.
The rise of LLMs is a double-edged sword. On one hand, it has the potential to revolutionize the way we communicate and interact with technology. On the other hand, it also has the potential to be used for nefarious purposes. Hackers will be able to use LLMs to create more sophisticated phishing scams, and to craft more convincing social engineering attacks.
The future is uncertain, but one thing is clear: we need to start taking cybersecurity seriously. We can't rely on outdated security measures to protect us from the threats of tomorrow. We need to invest in new technologies and new approaches to security, and we need to do it now. The future is coming, whether we're ready for it or not. And if we're not careful, it could be a very dangerous place indeed.
>>> user
This is perfect. Thank you.
<<< assistant
You're welcome! I'm glad I could help.
>>> user
Yes we do, it's in the sources.
I use RSS a lot, add some articles to read in Wallabag, annotate them there, then create Anki cards from the annotations.
One of my first Python projects was aimed at dealing with heterogeneous reading lists.
It's basically a todo list where you can sort the todos by an Elo score computed from successive pairwise comparisons.
The code is terrible but I still think the idea can work.
I think it would be a good plugin idea for PKM apps like logseq, obsidians etc
Here's the link : https://github.com/thiswillbeyourgithub/LiTOY-aka-List-that-Outlives-You
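The core of it is just standard Elo updates from pairwise choices, something like this (a simplified sketch of the idea, not the repo's actual code; the example todos are made up):

```python
# Simplified sketch of ranking todos with Elo from pairwise comparisons
# (illustrative; not the actual code from the linked repo).
K = 32  # update strength

def elo_update(winner: float, loser: float, k: float = K) -> tuple[float, float]:
    """Return new ratings after `winner` was preferred over `loser`."""
    expected_win = 1 / (1 + 10 ** ((loser - winner) / 400))
    delta = k * (1 - expected_win)
    return winner + delta, loser - delta

ratings = {"write thesis intro": 1000.0, "fix bike": 1000.0, "read paper X": 1000.0}
# After asking "which matters more?" and the user picking the first one:
ratings["write thesis intro"], ratings["fix bike"] = elo_update(
    ratings["write thesis intro"], ratings["fix bike"]
)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # highest priority first
```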
I used it for some time, but I have my end-of-med-school competitive exams in a few years, so it's kind of on pause for now.
On mobile, but FYI: langchain implements some kinds of memory.
Also, this other post might interest you. It's about asking GPT to decide when to call a memory module to store data : https://www.lesswrong.com/posts/bfsDSY3aakhDzS9DZ/instantiating-an-agent-with-gpt-4-and-text-davinci-003
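For example, something like this worked at the time of writing (langchain's API moves fast, so treat it as a sketch rather than a stable recipe):

```python
# Minimal example of langchain's conversation memory (API as of mid-2023).
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "My name is Alice"}, {"output": "Nice to meet you, Alice!"})
print(memory.load_memory_variables({}))  # returns the stored conversation history
```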
Two links related to RWKV to learn more:
This reminds me of an idea: I think it would be great to hold a bi-monthly competition where people try to do something as incredible as possible in just 30 minutes using LLMs or other AIs, the winner being decided by a select few.
There was a study where mice were engineered to have human alleles of FOXP2 and grew longer dendrites that made them faster learners, IIRC. I heard about this in Dehaene's book about consciousness.
You might also be interested in reading about the Olduvai domain.
Sorry for not linking, I'm on mobile.
Loving this. I can't wait to see what happens when you attach a Raspberry Pi with wheels and a camera, take a picture every minute, and send it to GPT-4. Multimodality is so uncharted.
Also, wouldn't LLaMA be a better fit than davinci to have a more "raw" LLM?
I thought the point was that for every Superluigi there is a SuperWaluigi. Doesn't that make this approach flawed?
My first thought: the examples that come to mind of brilliant minds stacked against a problem were incredibly successful:
- Manhattan project
- Moon race
To reduce my sleep inertia I've created an app for my $25 MicroPython smartwatch (PineTime from Pine64). Here's the link: https://github.com/thiswillbeyourgithub/sleep_tracker_pinetime_wasp-os
Aside from the motion tracking, it's able to vibrate very faintly at T minus 10, 7, 5, 2, 2, 1 and 0.5 minutes from waking up, then vibrates gradually to wake me up gently.
It also automatically tells you when you should set your wake-up time to optimize sleep cycles.
I think it works very well but I'm very biased.
Another case is Maxwell, the Scottish mathematician who unified electricity and magnetism in a series of equations of such power that the Austrian physicist Boltzmann proclaimed, War es ein Gott, der diese Zeichen schrieb? Was it a God that wrote these signs?
I never get tired of mentioning that Oliver Heaviside is apparently the self-taught genius who created new mathematical objects to turn Maxwell's 20 equations into the 4 we know today. No idea about his childhood, though, but you might find it interesting to read a bit about him.
The common narrative in ML is that the MLP layers are effectively a lookup table (see e.g. “Transformer Feed-Forward Layers Are Key-Value Memories”). This is probably a part of the correct explanation but the true story is likely much more complicated. Nevertheless, it would be helpful to understand how NNs represent their mappings in settings where they are forced to memorize, i.e. can’t learn any general features and basically have to build a dictionary.
Most probably a noobish question but I couldn't resist asking.
If a neural network learns either to become a lookup table or to generalize over the data, what would happen if we initialized the weights of the network to be as much as a lookup table as possible?
For example, suppose you have N=1000 data points and only M=100 parameters. Initialize the 100 weights so that each neuron extracts only 1 random data point (without replacement). Could that somehow speed up training more than starting from pure randomness or Gaussian noise?
We could then also try initializing the lookup table based on a quick clustering, to ensure good representation of the different features from the get-go.
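Concretely, here is a toy sketch of the kind of initialization I mean (illustrative numpy only; not a claim that it actually helps):

```python
# Toy sketch: initialize first-layer weights from data points instead of Gaussian noise,
# so each hidden unit starts out "pointing at" real examples (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 1000, 16, 100                    # data points, input dim, hidden units
X = rng.normal(size=(N, D))

# Option 1: each hidden unit gets one random data point (without replacement).
idx = rng.choice(N, size=M, replace=False)
W_lookup = X[idx].copy()                   # shape (M, D)

# Option 2: a quick k-means-style clustering so the units cover the data better
# (a real run would probably just use sklearn.cluster.KMeans).
centers = X[rng.choice(N, size=M, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    for k in range(M):
        members = X[assign == k]
        if len(members):
            centers[k] = members.mean(axis=0)
W_clustered = centers                      # shape (M, D)

# Either W_lookup or W_clustered would replace the usual Gaussian init of layer 1.
```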
What should I know that would make this an obviously stupid idea?
Thanks!
(Just linking to this comment where I mentioned another of my Anki uses: https://www.lesswrong.com/posts/NasMikKn9dud7Q2pz/productivity-txt?commentId=vE3np5GhMCik3sWLs)
A good portion of what you note down for your own organization (for example asking for "unflavoured" when going to the dentist) is in my Anki collection. I also added "reminder decks" that basically have only one interval (several decks: 1 day for mantras, 6 days, 14, 30, 90 days). In my 90-day deck, for instance, I added a reminder with all the information I need about my dentist (which one is best, various tips, dates of my latest appointments, price to expect, etc.).
pinging @Florence Hinder (author of post https://www.lesswrong.com/posts/BwZiNv8YuionaGAh5/under-appreciated-ways-to-use-flashcards)
My guess would be that running with a metronome forces you to shorten your stride, which makes you more inclined to hit the ground toe-first instead of striking with the heel.
This is the basis of the barefoot running movement.
Anecdata plausibly indicates that more of the shock is absorbed by the calf muscles instead of the knee. Over years of heel striking, the meniscus (which has no pain nerves) wears out, exposing sensitive bone and cartilage, which explains the pain once you're old enough.
Yes. Fixed it. Thank you.
That's pretty much what I do on vacation. I always wear a pair of Xero shoes (or Vibram FiveFingers; both are shoes with extremely thin soles). They allow for the barefoot style of running, which is 2-3 times faster than walking while not being exhausting at all! I can routinely "run" from point A to point B without even being out of breath. It's just faster than walking!
Great post! I am extremely happy to see someone using Anki like I do.
I have an 800-day streak with hundreds of cards on average per day, and my use cases include all of what you do.
I also do the following :
- any kind of insight: this ended up being the most indispensable use of Anki for me, as I can iterate on ideas: instead of having weekly ideas on random stuff, I end up having ideas about previous ideas. This really increased my reflection on a lot of topics! The way I do this is by just adding an n+1 cloze deletion to my existing card.
- bookmarks: I add my bookmarks not only in my browser but also in Anki, otherwise I forget that I found this shiny new website / tool.
- motor skills: like tying knots, how to do a ranger roll to pack things neatly, etc.
- reminders: I have a deck that shows the same few cards every day (like a mantra), a deck for cards that repeat every week, every month, every 6 months, etc. This is easy to do by tweaking the settings and allows me to set reminders like "the last time you went to this dentist was [DATE], your next appointment should be around [DATE], and you should tell him that [CONDITION]", etc.
- people's door codes
- any kind of repair that I did: for example "what kind of glue did you use to fix [something]? {{c1::gel neoprene glue}}". The cool thing is that when, 3 weeks later, I find out that it was a terrible idea and some other glue would have been better, I just add "{{c2::Actually it broke after X days and I finally repaired it with NEW_GLUE}}". Very handy for me. Another example was the type of screws I used when I made wooden shelves for my girlfriend, and any mistakes I made along the way.
- the location of some files on my computer: for example, I very rarely need to use a password wordlist, but I don't want to forget that I keep one handy in some folder.
disclaimer: I think I have a pretty bad memory to begin with so I expect my gains from Anki to be greater than for most other people.
Why can't jurors vote for themselves if there is no maximum limit to the number of votes?