Posts

Inaccessible finely tuned RNG in humans? 2020-10-07T17:04:36.142Z

Comments

Comment by Sandi on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-18T18:20:56.883Z · LW · GW

Yep, that's what I was trying to describe as well. Thanks!

Comment by Sandi on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-17T20:41:50.970Z · LW · GW

We do this by performing standard linear regression from the residual stream activations (64 dimensional vectors) to the belief distributions (3 dimensional vectors) which are associated with them in the MSP.


I don't understand how we go from this to the fractal. The linear probe gives us a single 2D point for every forward pass of the transformer, correct? How do we get the picture with many points in it? Is it by sampling from the transformer while reading the probe after every token and then putting all the points from that on one graph?

Is this result equivalent to saying "a transformer trained on an HMM's output learns a linear representation of the probability distribution over the HMM's states"?
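To make my question concrete, here's how I'd naively sketch the procedure, on toy data (this is my guess at the pipeline, not the authors' actual code or setup): collect residual-stream activations across many forward passes, regress them onto the ground-truth belief distributions, then project every predicted 3-dim point onto the 2-simplex and scatter them all on one plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: activations from many (sequence, position) pairs,
# and the ground-truth belief distribution the MSP assigns to each prefix.
n_points, d_resid, n_states = 5000, 64, 3
proj = rng.normal(size=(d_resid, n_states))
activations = rng.normal(size=(n_points, d_resid))
logits = activations @ proj
beliefs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Standard linear regression (with intercept) from activations to beliefs.
X = np.hstack([activations, np.ones((n_points, 1))])
W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
predicted = X @ W  # one 3-dim point per forward-pass position

# Project each predicted distribution into 2D simplex coordinates.
# Scattering all n_points of these at once would give the fractal picture;
# a single forward pass contributes only one point per position.
simplex_corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
points_2d = predicted @ simplex_corners
```

If this reading is right, the many-points picture comes from pooling probe outputs over a large batch of contexts, not from a single run.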

Comment by Sandi on larger language models may disappoint you [or, an eternally unfinished draft] · 2022-05-03T23:05:30.776Z · LW · GW

Very comprehensive, thank you!

Comment by Sandi on larger language models may disappoint you [or, an eternally unfinished draft] · 2022-05-03T21:47:36.631Z · LW · GW

Epistemic status: I'm not familiar with the technical details of how LMs work, so this is more word association.

You can glide along almost thinking "a human wrote this," but soon enough, you'll hit a point where the model gives away the whole game.  Not just something weird (humans can be weird) but something alien, inherently unfitted to the context, something no one ever would write, even to be weird on purpose.

What if the missing ingredient is a better sampling method, as in this paper? To my eye, the completions they show don't seem hugely better. But I do buy their point that sampling for high probability means you get low information completions.

Comment by Sandi on Quick Thoughts on A.I. Governance · 2022-05-03T21:29:45.226Z · LW · GW

How many of the decision makers in the companies mentioned care about or even understand the control problem? My impression was: not many.

Coordination is hard even when you share the same goals, but we don't have that luxury here.

An OpenAI team is getting ready to train a new model, but they're worried about its self-improvement capabilities getting out of hand. Luckily, they can consult MIRI's 2025 Reflexivity Standards when reviewing their codebase, and get 3rd-party auditing done by The Actually Pretty Good Auditing Group (founded 2023).

Current OpenAI wants to build AGI.[1] Current MIRI could confidently tell them that this is a very bad idea. Sure, they could be advised that step 25 of their AGI-building plan is dangerous, but so were steps 1 through 24.

MIRI's advice to them won't be "oh implement this safety measure and you're golden" because there's no such safety measure because we won't have solved alignment by then. The advice will be "don't do that", as it is currently, and OpenAI will ignore it, as they do currently. 

  1. ^

    Sure, they could actually mean "build AGI in a few decades when alignment is solved and we're gonna freeze all our current AGI building efforts long before then", but no they don't.

Comment by Sandi on Humans pretending to be robots pretending to be human · 2022-03-29T18:40:52.548Z · LW · GW

TL;DR: Thought this post was grossly misleading. Then I saw that the GPT3 playground/API changed quite a lot recently in notable and perhaps worrying ways. This post is closer to the truth than I thought but I still consider it misleading.

Initially strongly downvoted since the LW post implies (to me) that humans provide some of the GPT3 completions in order to fool users into thinking it's smarter than it is. Was that interpretation of your post more in the eye of the beholder?

Nested three layers deep is one of two pieces of actual evidence:

InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.

My impression was that InstructGPT was a new/separate model, available as an option in the API along with the base GPT3, that is openly finetuned with human feedback as a way of aligning the base model. That was the whole point of this paper: https://arxiv.org/abs/2203.02155

This is very different from what I saw this post as implying, because OpenAI are open about it, it's different from the main GPT3 and it's not humans providing completions but humans aligning a language model. Hence strong downvote.

(The examples about the completions for select "gotcha" prompts improving over time aren't very compelling evidence for what this post implies. The ones changing in a day are pretty compelling, though - how weird!)

Then I opened the GPT3 API playground for the first time in a few months and realized that my understanding was outdated. Looks like InstructGPT and the old non-finetuned davinci have been merged into text-davinci-002, which is now the default model.

Trying the "Does grape juice taste better if you add sour milk?" prompt many times over keeps giving me the exact same answer that Gary got, even with max temperature. To test where on the spectrum between "aligned using human review" and "repeating human answers" this lies, I tried some variations:

Does grape juice taste better if you add sour cream?

No, grape juice does not taste better if you add sour cream.

Does wine taste better if you add sour cream?

There is no definitive answer to this question as it depends on personal preferences. Some people might find that wine tastes better when paired with sour cream, while others may not notice a difference. Ultimately, it is up to the individual to decide whether or not they think wine tastes better with sour cream.

Should I add sour milk to grape juice in order to improve taste?

It is not necessary to add sour milk to grape juice in order to improve its taste. Grape juice is already a tart and flavorful beverage. However, if you enjoy the taste of sour milk, you may add it to grape juice to create a tart and refreshing drink.

While GPT3 might not literally outsource a portion of the requests to MTurk, I don't think it's unfair to say that some of the completions are straight-up human-provided. If the corrected completions were added in a way that generalized (e.g. aligning using human feedback like in the paper), it would have been a different story. But they clearly weren't.

So to recap:

  • the curation of InstructGPT is now in the default model
  • human completions are substituted within a day in response to publicized embarrassing completions (I'm alleging this)
  • human completions aren't added such that the model is aligned to give more helpful answers, because very similar prompts still give bad completions

In addition, and more intangibly, I'm noticing that GPT3 is not the model I used to know. The completions vary a lot less between runs. More strikingly, they have this distinct tone. It reads like a NYT expert fact checker or first page Google results for a medical query.

I tried one of my old saved prompts for a specific kind of fiction prompt and the completion was very dry and boring. The old models are still available and it works better there. But I won't speculate further since I don't have enough experience with the new (or the old) GPT3.

Comment by Sandi on We got what's needed for COVID-19 vaccination completely wrong · 2021-02-11T03:25:12.712Z · LW · GW

The Kefauver-Harris Drug Amendments of 1962 coincide with a drop in the rate of life-span increase.

 

I believe that, but I couldn't find a source. Do you remember where you got it from?

Comment by Sandi on Inaccessible finely tuned RNG in humans? · 2020-10-08T08:34:00.928Z · LW · GW

I wonder if, in that case, your brain picks the stopping time, stopping point or "flick" strength using the same RNG source that is used when people just do it by feeling.

What if you tried a 50-50 slider on Aaronson's oracle, if it's not too exhausting to do it many times in a row? Or write down a sequence here and we can do randomness tests on it. Though I did see some tiny studies indicating that people can improve at generating random sequences.
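For the "write down a sequence" option, the obvious first checks are the fraction of ones and the number of runs (humans famously alternate too often, which inflates the run count). A quick sketch of both, nothing rigorous:

```python
def bias_and_runs(bits):
    """Two crude randomness checks for a human-typed 0/1 sequence:
    the fraction of ones, and the observed run count versus the
    Wald-Wolfowitz expectation for a random ordering."""
    n = len(bits)
    ones = sum(bits)
    # A run ends wherever two adjacent bits differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected_runs = 1 + 2 * ones * (n - ones) / n
    return ones / n, runs, expected_runs

# Example: a short sequence that alternates a bit too eagerly.
seq = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0]
frac, runs, expected = bias_and_runs(seq)
```

On the toy sequence above the bit balance is perfect but the run count sits above expectation, which is the typical human signature.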

Comment by Sandi on Inaccessible finely tuned RNG in humans? · 2020-10-07T18:52:20.623Z · LW · GW

Hm, could we tell apart yours and Zack's theories by asking a fixed group of people for a sequence of random numbers over a long period of time, with enough delay between each query for them to forget? 

Comment by Sandi on Inaccessible finely tuned RNG in humans? · 2020-10-07T18:44:02.119Z · LW · GW

I seriously doubt the majority of the participants in these casual polls are doing anything like that.

Comment by Sandi on Inaccessible finely tuned RNG in humans? · 2020-10-07T18:42:51.414Z · LW · GW

This occurred to me, but I didn't see how it could work with different ratios. I guess if you have a sample from a uniformly distributed variable with a big support (> 100 values), that would work (e.g. if x is your birthday as a day of the year, then x/365 < 0.2 would give roughly a 20% event).
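A toy check of the birthday trick (my reading of it: map a roughly uniform day-of-year onto a threshold matching the target ratio, e.g. 20%). Simulating a crowd of respondents with uniform birthdays:

```python
import random

random.seed(1)

target = 0.20
threshold = round(365 * target)  # day-of-year cutoff, here 73

# Hypothetical crowd: birthdays uniform over the year.
samples = [random.randint(1, 365) for _ in range(100_000)]
frac_yes = sum(d <= threshold for d in samples) / len(samples)
```

Any target ratio works by moving the threshold, which is what the fixed-coin-flip trick can't do.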

It would be interesting to test this with a very large sample where you know a lot of information about the respondents and then trying to predict their choice.

Comment by Sandi on Inaccessible finely tuned RNG in humans? · 2020-10-07T18:31:09.949Z · LW · GW

Well, I'm quite satisfied with that. Thank you!

Comment by Sandi on Rationality for Kids? · 2020-09-17T11:31:30.362Z · LW · GW

Here's an Android game that works like Zendo but has colorful caterpillars, might be great for kids: https://play.google.com/store/apps/details?id=org.gromozeka1980.caterpillar_logic

Comment by Sandi on Open thread, July 31 - August 6, 2017 · 2017-07-31T20:25:23.416Z · LW · GW

What would be the physical/neurological mechanism powering ego depletion, assuming it existed? What stops us from doing hard mental work all the time? Is it even imaginable to, say, study every waking hour for a long period of time, without ever having an evening of youtube videos to relax? I'm not asking what the psychology of willpower is, but rather whether there's a neurology of willpower.

And beyond ego depletion, there's a very popular model of willpower where the brain is seen as a battery, used up when hard work is being done and charged when relaxing. I see this as a deceptive intuition pump since it's easy to imagine and yet it doesn't explain much. What is this energy being used up, physically?

Surely it isn't actual physical energy (in terms of calories), since I recall that the brain's energy consumption isn't significantly increased while studying. In addition, physical energy is abundant nowadays because food is plentiful. If a lack of physical energy were the issue, we could just keep going by eating more sugar.

The reason we can't work out for 12 hours straight is understood, physiologically. Admittedly, I don't understand it very well myself, but I'm sure an expert could provide reasons related to muscles being strained, energy being depleted, and so on. (Perhaps I would understand the mental analogue better if I understood this.) I'm looking for a similar mechanism in the brain.

To better explain what I'm talking about, what kind of answer would be satisfying, I'll give you a couple fake explanations.

  • Hard mental work sees higher electrical activity in the brain. If this were kept up for too long, neurons would get physically damaged due to their sensitivity. To prevent damage, brains evolved a feeling of tiredness when the brain is overused.
  • There is a resource (e.g. dopamine) that is literally depleted during tasking brain operation and regenerated when resting.
  • There could also be a higher level explanation. The inspiration for this came from an old text by Yudkowsky. (I didn't seriously look at those explanations as an answer to my problem because of reasons). I won't quote the source since I think that post was supposed to be deleted. This excerpt gives a good intuitive picture:

My energy deficit is the result of a false negative-reinforcement signal, not actual damage to the hardware for willpower; I do have the neurological ability to overcome procrastination by expending mental energy. I don't dare. If you've read the history of my life, you know how badly I've been hurt by my parents asking me to push myself. I'm afraid to push myself. It's a lesson that has been etched into me with acid. And yes, I'm good enough at self-alteration to rip out that part of my personality, disable the fear, but I don't dare do that either. The fear exists for a reason. It's the result of a great deal of extremely unpleasant experience. Would you disable your fear of heights so that you could walk off a cliff? I can alter my behavior patterns by expending willpower - once. Put a gun to my head, and tell me to do or die, and I can do. Once.

Let me speculate on the answer.

1) There is no neurological limitation. The hardware could, theoretically, run demanding operations indefinitely. But theories like ego depletion are deceptive memes that spread throughout culture, and so we came to accept a nonexistent limitation. Our belief in the myth is so strong, it might as well be true. The same mechanism as learned helplessness. Needless to say, this could potentially be overcome.

2) There is no neurological limitation, but otherwise useful heuristics stop us from kicking it into higher gear. All of the psychological explanations for akrasia, the kind that are discussed all the time here, come into play. For example, youtube videos provide a tiny, but steady and plentiful stimulus to the reward system, unlike programming, which can have a much higher payout, but one that's inconsistent, unreliable and coupled with frustration. And so, due to a faulty decision making procedure, the brain never gets to the point where it works to its fullest potential. The decision making procedure is otherwise fast and correct enough, thus mostly useful, so simply removing it isn't possible. The same mechanism as cognitive biases. It might be similar to how we cannot do arithmetic effortlessly even though the hardware is probably there.

3) There is an in-built neurological limitation because of an evolutionary advantage. Now, defining this evolutionary advantage can lead to the original problem. For example, it cannot be due to minimizing energy consumption, as discussed above. But other explanations don't run into this problem. Laziness can often lead to more efficient solutions, which is beneficial, so we evolved ego depletion to promote it, and now we're stuck with it. Of course, all the pitfalls customary to evolutionary psychology apply, so I won't go in depth about this.

4) There is a neurological limitation deeply related to the way the brain works. Kind of like cars can only go so fast, and it's not good for them if you push them to maximum speed all the time. At first glance, the brain is propagating charge through neurons all the same, regardless of how tiring an action it's accomplishing. But one could imagine non-trivial complexities to how the brain functions which account for this particular limitation. I dare not speculate further since I know so little about neurology.

Comment by Sandi on Open thread, May 8 - May 14, 2017 · 2017-05-10T20:46:23.712Z · LW · GW

What does TapLog lack, besides a reminder feature? It seems pretty nifty from the few screenshots I just saw.

Comment by Sandi on Open thread, May 8 - May 14, 2017 · 2017-05-09T17:28:18.135Z · LW · GW

Yeah, that's why I kept comparing it to a spreadsheet. Ease of use is a big point. I don't want to write SQL queries on my phone.

Comment by Sandi on Open thread, May 8 - May 14, 2017 · 2017-05-09T17:27:25.758Z · LW · GW

Thanks! I didn't know this was such a developed concept already and that there are so many people trying to measure stuff about themselves. Pretty cool. I'll check out Quantified Self and what's linked.

Comment by Sandi on Introducing the Instrumental Rationality Sequence · 2017-05-08T22:06:02.966Z · LW · GW

That is indeed very low weight. My prior is pretty shaky as-is, but that evidence shouldn't move it much.

I thought about priming a lot while reading. Many of the results he lists are similar to priming, but priming being false doesn't mean all results similar to it are false. One could consider a broader hypothesis encompassing all of them, namely "humans can be influenced by subtle cues to their subconscious to a significant degree". That's the similarity I see with priming: both it and many of Cialdini's hypotheses follow from this premise. The priming failure would suggest it's false, but those experiments used extremely subtle subliminal cues, as if they were designed not to work. Much of Cialdini's work affirms this broader thesis. It's no meta-study, but the guy lists a lot of studies, all affirming it. A lot of Kahneman's work does, too. Surely it is acceptable that humans often act on instinct (unconsciously) and that they are subconsciously influenced by their surroundings. This follows from System 1 being so prevalent in our thought.

SSC has a new open thread right now, I should ask there. Maybe Scott can clear it up.

Comment by Sandi on Open thread, May 8 - May 14, 2017 · 2017-05-08T19:44:50.392Z · LW · GW

I have a neat idea for a smartphone app, but I would like to know if something similar exists before trying to create it.

It would be used to measure various things in one's life without having to fiddle with spreadsheets. You could create documents of different types, each type measuring something different. Data would be added via simple interfaces that fill in most of the necessary information. Reminders based on time, location and other factors could be set up to prompt for data entry. The gathered data would then be displayed using various graphs and could be exported.

The cool thing is that it would be super simple to reliably measure most things on a phone in a way that's much simpler than keeping a spreadsheet. For example: you want to measure how often you see a seagull. You'd create a frequency-measuring document, entitle it "Seagull sightings", and each time you open it, there'd be a big button for you to press indicating that you just saw a seagull. Pressing the button would automatically record the time and date, perhaps the location, when this happened. Additional fields could be added, like the size of the seagull, which would be prompted and logged with each press. With a spreadsheet, you'd have to enter the date yourself, and the interface isn't nearly as convenient.

Another example: you're curious as to how long you sleep and how you feel in the morning. You'd set up an interval-measuring document with a 1-10 integer field for sleep quality and reminders tied into your alarm app or the time you usually wake up. Each morning you'd enter hours slept and rate how good you feel. After a while you could look at pretty graphs and mine for correlations.

A third example: you can emulate the experience sampling method for yourself. You would have your phone remind you to take the survey at specific times in the day, whereupon you'd be presented with sliders, checkboxes, text fields and other fields of your choosing.

This could be taken further in a useful way by adding a crowd sourcing aspect. Document-templates could be shared in a sort of template marketplace. The data of everyone using a certain template would accumulate in one place, making for a much larger sample size.
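A minimal sketch of the data model I have in mind (all names hypothetical, just to pin down the idea): a document is one thing being measured, an entry is one button press, and the timestamp is filled in automatically so the user only supplies the optional extra fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Entry:
    """One logged observation; the timestamp is stamped automatically."""
    values: dict
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Document:
    """One thing being measured, e.g. 'Seagull sightings'."""
    title: str
    extra_fields: list                     # e.g. ["size"]
    entries: list = field(default_factory=list)

    def record(self, **values):
        """The 'big button': log an entry without touching a spreadsheet."""
        self.entries.append(Entry(values=values))

doc = Document(title="Seagull sightings", extra_fields=["size"])
doc.record(size="large")
doc.record(size="small")
```

Reminders, graphs, and the template marketplace would sit on top of this; the core is just auto-timestamped entries grouped by document.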

Comment by Sandi on Introducing the Instrumental Rationality Sequence · 2017-05-08T19:31:59.832Z · LW · GW

Cialdini? I'm finishing "Influence" right now. I was extra skeptical during reading it since I'm freshly acquainted with the replication crisis, but googling each citation and reading through the paper is way too much work. He supports many of his claims with multiple studies and real-life anecdotes (for all that's worth). Could you point me to the criticism of Cialdini you have read?

Comment by Sandi on Open thread, Apr. 17 - Apr. 23, 2017 · 2017-04-22T03:11:04.598Z · LW · GW

The SSC article about omega-6 surplus causing criminality brought to my attention the physiological aspect of mental health, and health in general. Up until now, I prioritized mind over body. I've been ignoring the whole "eat well" thing because 1) it's hard, 2) I didn't know how important it was and 3) there's a LOT of bullshit literature. But since I want to live a long life and I don't want my stomach screwing with my head, the reasonable thing to do would be to read up. I need book (or any other format, really) recommendations on nutrition 101. Something practical, the do's and don'ts of food and research citations to back it up. On a broader note, I want to learn more about biodeterminism, also from a practical perspective. There might be conditions in my environment causing me issues that I don't even know of. It goes beyond nutrition.

Comment by Sandi on Open thread, March 13 - March 19, 2017 · 2017-03-15T22:28:23.115Z · LW · GW

I have two straightforward empirical questions for which I was unable to find a definitive answer.

1) Does ego depletion exist? There was a recent meta-study that found a negligible effect, but the result is disputed.

2) Does visualizing the positive outcome of an endeavor help one achieve it? There are many popular articles confirming this, but I've found no studies in either direction. My prediction is no, it doesn't, since the mind would feel like it already reached the goal after visualizing it, so no action would be taken. It has been like this in my personal experience, although inferring from personal experience is incredibly unreliable.

Comment by Sandi on Welcome to Less Wrong! (11th thread, January 2017) (Thread B) · 2017-02-09T21:08:55.234Z · LW · GW

Depending on where you are in your life and education, you could consider enrolling in graduate school.

If I've managed to translate "graduate school" to our educational system correctly, then I currently am in undergraduate school. Our mileages vary by quite a bit; most people I meet aren't of the caliber. Also, it's hard to find out if they are. Social etiquette prevents me from bringing up the heavy-hitting topics except on rare occasions.

I guess I should work on my social skills then cast a bigger net. The larger the sample, the better odds I have of finding someone worthwhile. Needless to say I'm introverted and socialization doesn't come easily, but I'll find a way.

I do this too.

Oh, thank the proverbial God.

Comment by Sandi on Welcome to Less Wrong! (11th thread, January 2017) (Thread B) · 2017-02-08T22:15:13.245Z · LW · GW

I'm not 100% clear as to where the non-ambitious posts should go, so I will write my question here.

Do you know of a practical way of finding intellectual friends, so as to have challenging/interesting conversations more often? Not only is the social aspect of friendship in general invaluable (of course I wouldn't be asking here if that was the sole reason), but I assume talking about the topics I care and think about will force me to flesh them out and keep me closer to Truth, and is a great source of novelty. So, from a purely practical standpoint (although I don't deny other motives), I want to improve this part of my life.

Sporadic discourse with my normal friends often pops up in unsuitable conditions and with underequipped participants. Meeting the right type of person in real life takes a huge sample and social skills. Focused forums, like this one, contain the right type of people and are very useful, but lacking in one-to-one personal and casual conversation (neither method is superior, I'd prefer a mix of both to the current imbalance).

Fun fact about me (or a thinly veiled plea for a diagnosis): Often when I'm bothered by a problem or simply bored, my mind will conjure vivid conversations with one of my friends and have us argue this problem. I never actually aim for it to happen; it's as spontaneous as normal thinking. I have no proof, but I'd say those imaginary conversations are more productive, because my imaginary listeners will disagree or misunderstand me, raising important points or faults in my reasoning. Whereas with normal thinking, I agree with myself the vast majority of the time.