A chat log is not a simulation because it uses English for all state updates. It’s a story. In a story you’re allowed to add plot twists that wouldn’t have any counterpart in anything we’d consider a simulation (like a video game), and the chatbot may go along with it. There are no rules. It’s Calvinball.
For example, you could redefine the past of the character you’re talking to, by talking about something you did together before. That’s not a valid move in most games.
There are still mysteries about how a language model chooses its next token at inference time, but however it does it, the only thing that matters for the story is which token it ultimately chooses.
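Mechanically, the very last step is just sampling from a probability distribution over the vocabulary. A minimal sketch (the logits are made up, and this is not any particular model's code):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng()):
    """Pick one token id from unnormalized scores (logits)."""
    z = (logits - np.max(logits)) / temperature  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for a 3-token vocabulary
print(sample_next_token(logits))    # usually 0, sometimes 1 or 2
```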
Also, the “shoggoth” doesn’t even exist most of the time. There’s nothing running at OpenAI from the time it’s done outputting a response until you press the submit button.
If you think about it, that’s pretty weird. We think of ourselves as chatting with something but there’s nothing there when we type our next message. The fictional character’s words are all there is of them.
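To make that concrete: a chat service is roughly a pure function of the transcript. A toy sketch, where `generate` is a hypothetical stand-in for the model:

```python
def generate(transcript: list[str]) -> str:
    # Hypothetical stand-in: a real service would run the model over
    # the whole transcript and return the next reply.
    return "(model reply)"

transcript = []  # ALL of the "character's" state lives here, as text
while True:
    transcript.append("User: " + input("> "))
    reply = generate(transcript)   # the model only "runs" during this call
    transcript.append("Assistant: " + reply)
    print(reply)
```

Between calls to `generate`, there is no process, no memory, no character. Just the transcript.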
Interesting work! Could this be fixed in training by giving it practice at repeating each token when asked?
Another thing I’ve wondered is how substring operations can work for tokenized text. For example, if you ask for the first letter of a string, it will often get it right. How does that happen, and are there tokens where it doesn’t work?
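For anyone who wants to poke at this, the open-source tiktoken library makes it easy to see how words split into tokens (the exact splits vary by tokenizer):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["cat", "encyclopedia", "antidisestablishmentarianism"]:
    ids = enc.encode(word)
    print(word, "->", [enc.decode([i]) for i in ids])
# A common word often comes back as a single token, so the model never
# "sees" its letters directly; spelling has to be learned as an
# association between whole tokens and their characters.
```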
I think this is a question about markets, like whether people are more likely to buy healthy versus unhealthy food. Clearly, unhealthy food has an enormous market, but healthy food is doing pretty well too.
Porn is common and it seems closer to unhealthy food. Therapy isn’t so common, but that’s partly because it’s expensive, and it’s not like being a therapist is a rare profession.
Are there healthy versus unhealthy social networks? Clearly, some are more unhealthy than others. I suspect it’s in some ways easier to build a business around mostly-healthy chatbots than to create a mostly-healthy social network, since you don’t need as big an audience to get started?
At least on the surface, alignment seems easier for a single-user, limited-intelligence chatbot than for a large social network, because people are quite creative and rebellious. Short term, the biggest risk for a chatbot is probably the user corrupting it. (As we are already seeing with people trying to break chatbots.)
Another market question: how intelligent would people want their chatbot to be? Sure, if you’re asking for advice, maybe more intelligence is better, but for companionship? Hard to say. Consider pets.
There's an assumption that the text that language models are trained on can be coherently integrated somehow. But the input is a babel of unreliable and contradictory opinions. Training to convincingly imitate any of a bunch of opinions, many of which are false, may not result in a coherent model of the world, but rather a model of a lot of nonsense on the Internet.
I'm wondering: who, if anyone, keeps track of throughput at a port? Ideally there would be some kind of graph of containers shipped per day, so we could see long-term shipping trends.
(This assumes, incorrectly, that containers are fungible, but we would at least have a rough idea of how bad the problem is.)
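Something like this is the graph I have in mind, assuming some port authority published a daily feed (the CSV file and its columns here are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: one row per day with a "teu" column (container count).
df = pd.read_csv("port_throughput.csv", parse_dates=["date"])
daily = df.set_index("date")["teu"]
daily.rolling("30D").mean().plot(title="Containers shipped (30-day mean, TEU/day)")
plt.show()
```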
Could you say anything more specific or concrete about how reading HPMOR changed your life?
While improvements to moderation are welcome, I suspect it’s even more important to have a common, well-understood goal for the large group of strangers to organize around. For example, Wikipedia did well because the strangers who gathered there already knew what an encyclopedia was.
Tag curation seems a bit like a solution in search of a problem. If we knew what the tags were for, maybe we would be more likely to adopt a tag and try to make a complete collection of things associated with that tag?
Maybe tags (collections of useful articles with something in common) should be created by the researchers who need them? They can be bootstrapped with search. Compare with playlists on YouTube and Spotify.
It seems like a genuinely collaborative project, where articles are intended to be useful and somewhat more evergreen, would probably end up looking something like Wikipedia or perhaps an open source project.
There needs to be some concept of shared goals, a sense of organization and incompleteness, of at least a rough plan with obvious gaps to be filled in. Furthermore, attempts to fill the gaps need to be welcomed.
Wikipedia had the great advantage of previous examples to follow. People already knew what an encyclopedia was supposed to be.
I suspect that attempts at a “better discussion board” are too generic to inspire anyone. Someone needs to come up with a more specific and more appealing idea of what the thing will look like when it’s built out enough to actually be useful. How will you read it? What will you learn?
I’ve played around with Anki a bit, but never used it seriously because I was never sure what I wanted to memorize, versus look up when needed.
I wonder if it might be better to look at it a different way: use a note-taking tool to take advantage of forgetting rather than fight it. That is, you could take notes and only start reviewing cards seriously when you’re going to take a test. Afterwards, you might slack off and forget things, but you’d still have your notes.
After all, we write things down so we don’t have to remember them.
Such a tool would be unopinionated about remembering things. You could start out taking notes, optimize some of them for memorization, take more notes, and so on. The important thing is persistence. Is this really a note-taking system you’ll keep using?
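For concreteness, here’s the shape I imagine, as a toy sketch (my own invention, not any existing tool’s data model): every entry starts as a plain note, and promoting it to a flashcard is optional and reversible.

```python
from dataclasses import dataclass

@dataclass
class Note:
    text: str
    card: tuple[str, str] | None = None  # optional (question, answer) pair

notebook = [Note("Mitochondria produce ATP.")]             # starts as a plain note
notebook[0].card = ("What produces ATP?", "Mitochondria")  # promoted before a test
notebook[0].card = None                                    # demoted after; the note persists
```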
Teaching people to use such a tool would fall under “learning how to learn.” Ideally you would want them to take their own notes, see how useful it is for studying for a test, and get in the habit of using them for other classes. If not, at least they would know that such tools exist.
Back when I was in school, one teacher had us keep a journal, probably for similar reasons. Maybe that got some people to start keeping a diary, who knows? For myself, I got in the habit of taking notes in class, but I found that I rarely went back to them; they were write-only. I kept doing it anyway, because I thought the act of taking notes helped a bit to remember the material.
You talked about rest, but have you looked into stretches, contrast baths (alternating your wrists between tubs of hot and cold water), ice packs, and so on? I had a different problem (tendonitis) and these helped.
This isn't my area of expertise, but I found this quote in an article about anticipating climate change in the Netherlands to be food for thought:
If we turn the Netherlands into a fort, we will need to build gigantic dikes, but also, and perhaps more importantly, gigantic pumping stations. This is essential, because at some point we will need to pump all of the water from the Rhine, Meuse, Scheldt and Ems – which by that time will be lower than sea level – over those enormous dikes. The energy costs will be higher – but that is not the only problem, because when the enormous pumping stations pump out the fresh water, the heavier salt water will seep in under the ground. You can get rid of the water, but not the salt, which is disastrous for agriculture in its current form. Instead of a fort, it may make more sense to talk about a semi-porous bath tub.
From: https://www.vn.nl/rising-sea-levels-netherlands/
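To get a feel for the scale, here's a back-of-envelope calculation with my own rough numbers (not from the article):

```python
# Power needed to pump a river's mean discharge over a dike: P = rho*g*Q*h.
rho, g = 1000.0, 9.81  # water density (kg/m^3), gravity (m/s^2)
Q = 2200.0             # m^3/s, roughly the Rhine's mean discharge (assumed)
h = 5.0                # m of lift, an assumed future land/sea-level gap

power_W = rho * g * Q * h                            # ~1.1e8 W, i.e. ~110 MW
energy_TWh_per_year = power_W * 8766 * 3600 / 3.6e15 # ~1 TWh/yr, running nonstop
print(f"{power_W/1e6:.0f} MW, {energy_TWh_per_year:.2f} TWh/yr")
```

That's on the order of a tenth of a large power plant running around the clock, for one river at one assumed lift, before storms, rainfall, and the other rivers.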
The South Bay infill wouldn't be the same - much smaller, creeks instead of rivers (though flooding is still a concern), and probably no agriculture. But I wonder what other engineering problems get swept under the rug by assuming that "modern engineering is well up to the task"? Thinking about such questions from a very high-level view often misses important details.
This is just spitballing, but it seems like it would be prudent to build the new land up higher than the anticipated future sea level. And then the very expensive existing land around the infill might actually end up downhill, below sea level, which could make drainage interesting.
Here's a nice introduction to causal inference in a machine learning context:
ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus
Here's an earlier paper by Judea Pearl:
Bayesianism and Causality, or, Why I am Only a Half-Bayesian
Hmm. I don't know anything about Galleani, but wanting to inspire the masses to action via "propaganda of the deed" seems incompatible with directly terrorizing the masses? (Excuses about "collateral damage" aside.)
It seems like this might have something to do with tribalism: who do the terrorists consider "us" versus "them"?
I'm not sure this will help in your case, but the usual framework for causal calculations is that you have a DAG representing the causal connections between variables (without probabilities), plus statistical data. From these together, some things can be calculated that couldn't be inferred from the statistical data alone.
The causal graph usually can't be inferred from the data. However, some statistical tests can disprove a proposed graph. For example, the graph might imply that certain variables are statistically independent, and that can be checked.
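As a toy illustration (my own simulated example, not from the paper): the chain A -> B -> C implies that A and C are independent given B, which we can check with a partial correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=n)
B = 2 * A + rng.normal(size=n)     # B caused by A
C = -1.5 * B + rng.normal(size=n)  # C caused by B only

print(np.corrcoef(A, C)[0, 1])     # strongly correlated (~ -0.86)

# Partial correlation of A and C given B: regress each on B, then
# correlate the residuals. Should be near zero if the chain is right.
beta_A = np.cov(A, B)[0, 1] / np.var(B)
beta_C = np.cov(C, B)[0, 1] / np.var(B)
print(np.corrcoef(A - beta_A * B, C - beta_C * B)[0, 1])  # ~ 0
```

If that residual correlation were far from zero, the chain graph would be disproved; but a near-zero value is also consistent with other graphs, which is why the graph can't simply be read off the data.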
Surveys are really hard to design correctly.
"Remember, these were true/false questions, so 50% means no knowledge at all."
This isn't apparent from the data. A score of 50% could mean that nobody knows the answer and everyone is guessing randomly. Or it could mean that 50% of survey-takers know the right answer and the other 50% confidently believe the wrong one. Or something in between. Without more information, we can't distinguish these cases.
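A quick toy calculation (my numbers) shows why:

```python
# Expected true/false score when a fraction "know" the right answer,
# a fraction "misbelieve" the wrong one, and the rest guess randomly.
def expected_score(know, misbelieve):
    guess = 1 - know - misbelieve
    return know + 0.5 * guess  # guessers get it right half the time

print(expected_score(know=0.0, misbelieve=0.0))  # everyone guessing -> 0.5
print(expected_score(know=0.5, misbelieve=0.5))  # polarized population -> 0.5
```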
I'd also argue that three of the questions were ambiguous or uncertain:
- Does the big bang really count as an explosion? It's not much like other explosions.
- Are clones really genetically identical? After all, a recent study [1] showed that neurons are usually not genetically alike, due to somatic mutations. Organisms are apparently not even genetically identical to themselves.
- There are edge cases for gender.
Part of test-taking ability seems to be selectively ignoring ambiguity if you think the people who designed the test weren't testing for that edge case.
[1] http://science.sciencemag.org/content/356/6336/eaal1641