Posts

LLM Guardrails Should Have Better Customer Service Tuning · 2023-05-13T22:54:16.255Z

Comments

Comment by Jiao Bu (jiao-bu) on A Teacher vs. Everyone Else · 2024-03-24T13:00:44.020Z · LW · GW

I think now you're talking more about desired qualities of a system than teachers, which might also be interesting in the other cases.  In some technical sense probably it applies to the farmer, but human use of food is so constant and cyclical, it feels misapplied there.  The doctor may be similar to a farmer in that regard, making money off the nature of humans to occasionally be ill.

However, the lawyer is most like what you are describing above, fully dependent on the system of conflicts for its sustenance, as the Dao De Jing states, "The more laws and ordinances are promulgated, The more thieves and robbers there are."  Hence, perhaps, the general easy animosity towards lawyers.

I wonder if there is a social proportion to a school system having more of factor X and it getting more social animosity.  I suspect it would be the same factor that creates droves of disaffected, burnt-out teachers.  Of course, there is also the illness-industrial-complex system, which most people react badly to, compared to doctors themselves.   What is that factor though?

Comment by Jiao Bu (jiao-bu) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-06T04:43:54.458Z · LW · GW

"Comes from external stumuli" in this case, or more accurately incorporates external information =/= brainwashing into slavery.  To some extent what you're saying is built of correct sentences, but you're keeping things vague enough and unconnected enough to defend.  Above you said, "subset of this scenario is a nightmarish one where humans are brainwashed by their mindless but articulate creations and serve them, kind of like the ancients served the rock idols they created. Enslaved by an LLM, what an irony."

Yes, I have changed my mind based on things I have read and watched.  One should do this based on new information.  As for "happens consistently and feels like your own volition" I think you would need to unpack it a bit.  "Consistently," I don't know.  I'm 44 and an engineer and kind of a jackass, so maybe I don't change my mind as often as I should.  My new partner has a PhD in Nutrition though, so I have changed my mind partly based on studies she has presented (including some of her own research) and input regarding diet in the last several months.

That it "Feels like" "my" "volition" is even more complicated.  I don't know from whence will and volition arise, and they seem stochastic.  I'm not entirely sure what """I""" am or where consciousness is, if the continuity of it is an illusion, or etc.  These questions get really quickly out of what anyone knows for sure.  But having been presented with both the papers and the food, eaten a lot, and noticed improved mood and energy levels, I'm pretty well sold on her approach being sound and the diet being great.

But you jump to service and enslavement?  This is a bit more like someone needs to headbag me and then dump me in the back of their truck and drag me to a hidden site and inject me with LSD for six months or something.  You are jumping scales drastically without discussing concrete anything, really.  It might have emotional salience, but that hardly seems fit for a rationalist board.

Though I welcome discussion of concrete scenarios/possibilities of how you think this might go down.  If those are realistic, this might be more interesting.

Comment by Jiao Bu (jiao-bu) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T18:28:37.768Z · LW · GW

"Cause Panic."

Outside of the typical Drudge Report-level "AI admits it wants to kill and eat people" type of headline, what do you expect?

My prediction, with medium confidence, is there won't be meaningful panic until people see it directly connected with job loss.  There will be handwringing about deepfakes and politics, but unfortunately that is almost a lost cause since I can already make deepfakes on my own expensive GPU computer from 3 years ago with open source GANs.  Anthropic and others will probably make statements about it (I hear the word "safe" so much said by every tech company in this space, it makes me nervous, like saying "Our boys will be home by Christmas" or something).  But as far as meaningful action?  A large number of people will need to first lose economic security/power.

Comment by Jiao Bu (jiao-bu) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T18:23:24.254Z · LW · GW

"Brainwashing" is pretty vague and likely difficult.  Hypnosis and LSD usually will not get you there, if I'm to believe what is declassified.  It would need to have some way to set up incentives to get people to act, no?  Or at least completely control my environment (and have the ability to administer the LSD and hypnosis?)

Comment by Jiao Bu (jiao-bu) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T15:44:00.558Z · LW · GW

>There is no way for such a collective pretence to get started. (This is the refutation of p-zombies.)

It could have originally had coordination utility for the units, and thus been transmitted in the manner of culture and language.

One test might then be whether feral children or dirt-digger tribesmen assert their own individual consciousness (though I wonder if a language with "I" built into it could force one to backfill something into that space, on the spot, whenever patterns involving the word "I" are used, which could also be happening with the LLMs).

Comment by Jiao Bu (jiao-bu) on Notes on Innocence · 2024-01-30T09:09:01.608Z · LW · GW

They are most definitely two different things, though it is popular to conflate them.  Innocence of Evil does not require naivete, only that you are pure of doing the evil.

And the purity distinction is important.  Otherwise we will fall prey to the delusion that it was our goodness itself which betrayed us or that in order to be pure, we must be fools regarding some part of the Truth.  Though it is popular to think, as you have pointed out in the sexual distinction above, that awareness of consequence necessarily begets heaviness or loss of innocence (as if we cannot now take wiser action and secure our freedom, whereas prior to accurate knowledge, it was only through dumb luck something had not already gone wrong).

As for some psychical scarring occurring due to knowledge of the potential of humans to do harm, yes this is unpleasant, and the knowledge of it may cause some discomfort -- as you have said "cognitohazard."  The question then is what is the nature of this discomfort?  The bulk of it boils down to self-pity that the world is not as one wishes it to be, or that the world contains people who are damaged.  The remainder, what you called "psychic scarring," is usually an accretion of previous unhealed trauma getting triggered (PTSD), or one's self-pity wishing to perpetuate naivete.

We could say that innocence is supreme sobriety, sober enough and seeing enough truth to be absent of evil in the situation, and naivete is drunkenness -- if anything, whatever good it manages is just one's having stumbled blindly into it.  As a simple thought experiment, if sobriety and awareness of truth does not lead to good will and good actions, then our understanding of good will and good actions must be updated; if it is otherwise, then virtue does not exist in any form, and the "effective altruism" aspect of this community is wrongheaded and impossible (naive).

Otherwise, let's get back to the business of being "Innocent as a Dove and as Shrewd as the Serpent."

Comment by Jiao Bu (jiao-bu) on Notes on Innocence · 2024-01-27T19:51:25.947Z · LW · GW

You are mostly describing Naivete.

Innocence is closest to purity, as it describes absence of evil.  It is compatible with guile, to be "As innocent as a dove, and as shrewd as the serpent."  To do so would describe cleverness, even craftiness in service of definite intentions, without any evil in your heart.  A clear example might be deceiving someone doing human trafficking in order to save those being trafficked.  Sometimes a razor's edge to walk, no doubt, but one that broaches not an epsilon of naivete (which could get someone killed in the above example of trafficking).

Can you tease apart those two traits, naivete and innocence?

Comment by Jiao Bu (jiao-bu) on Monthly Roundup #14: January 2024 · 2024-01-24T16:22:04.821Z · LW · GW

I believe the leverage advice is very good, and people may not know how good it is or how broadly it really applies.  Real estate with 20% down amounts to a 5x leveraged investment (and one which is expensive to maintain).  For about half a century it was a home run for most people who did it, despite caveats.  Since 2011, the volatility has been higher than before, and I am not much more confident in that as a hill to die on than in NVDA.

Accelerated progress also means increased volatility / wider confidence bands, probably on everything.
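To make the 5x figure concrete, here is a minimal sketch of the arithmetic.  The function name `leveraged_return` is made up for illustration, and it deliberately ignores interest, taxes, maintenance, and transaction costs, so treat it as a rough picture rather than a real model of a mortgage.

```python
def leveraged_return(price_change: float, down_payment_fraction: float) -> float:
    """Return on the buyer's own equity for a fractional change in asset price.

    Simplifying assumption: no interest, taxes, maintenance, or fees.
    """
    if not 0 < down_payment_fraction <= 1:
        raise ValueError("down_payment_fraction must be in (0, 1]")
    leverage = 1 / down_payment_fraction  # 20% down -> 5x
    return price_change * leverage


# The same leverage that multiplies gains also multiplies losses.
for change in (0.10, -0.10, -0.20):
    equity_change = leveraged_return(change, 0.20)
    print(f"{change:+.0%} price move -> {equity_change:+.0%} on equity")
```

Under these assumptions a 20% drop in price wipes out the entire down payment, which is why wider volatility bands matter so much more to a leveraged position.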

Comment by Jiao Bu (jiao-bu) on The Mountain Troll · 2024-01-22T19:04:26.009Z · LW · GW

We are also overloading the word "Child" here, which we may need to disambiguate at this point.

What you are saying applies broadly to a 7 year old, and less to a 16 year old.   For the 16 year old, there's no longer 2 possible outcomes "succeed as a Salafi" or "fail as a Salafi."  There is often the very real option to "Make your way towards something else."  And the seeds of that could easily start (probably did!) in the 13 or 14 year old.

It's also neat that humans are kind of wired where the great questioning/rebellion tends to happen more in the 13-to-16-year-old than the 7-year-old.  Thus the common phenomenon where the person graduates high school and church at the same time, or leaves the cult, emigrates, etc.

Comment by Jiao Bu (jiao-bu) on Introduce a Speed Maximum · 2024-01-11T16:10:30.377Z · LW · GW

I think that you are correct: policies that "everyone knows" aren't "real" tend to reduce the degree to which everyone takes other policies seriously.  But I think a lot of the "unreal" policies are in place for reasons of liability or risk management, or as some other communication tool.  Also, seldom is any policy actually absolute or meant to be absolute.  Just ask your lawyer; nearly everything in life is negotiable.

What's more, speed limit policy is geared towards a complex set of goals, politically decided upon in a risk-managed, engineered way.  Then voted on by a board of people who might somewhat understand the problem.

You know what is almost never discussed explicitly in politics?  "What are our goals here?" and "What tradeoffs do we suspect these different decisions entail?"  Making this explicit, and letting people vote on politicians based on all this lucidity, would be great, but reading, say, Thomas Schelling, I think it is also utterly impossible.

So we bluff speed limits (and nearly everything), and negotiate about it later.

Comment by Jiao Bu (jiao-bu) on The Mountain Troll · 2024-01-11T15:59:00.417Z · LW · GW

"The best way to get out of a local maximum that I've found is to incorporate elements of a different, but clearly functional, intellectual tradition."

I agree wholeheartedly with this being a good way (Not sure about "best").  The crux is "clearly functional" and "maxima" -- and as an adult, I can make pretty good judgments about this.  I'm also likely to bake in some biases about this that could be wrong.  And depending on what society you find yourself within, you might do the same.

If I understand you, you are basically asking to jump from one maximum to another, assuming that in running this search algorithm, you will eventually find a maximum that's better than the one you're in, or get enough information to go back to the previous one.  And we limit our search to what is "functional."
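As I read it, the procedure looks roughly like the sketch below.  Everything in it is an illustrative assumption on my part: the toy `landscape` list, the function names, and especially the hand-picked `jump_points`, which stand in for "clearly functional" traditions that someone has already identified.

```python
def climb(landscape, start):
    """Greedy local search: keep stepping to the higher neighbor until stuck."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        if not neighbors:
            return i
        best = max(neighbors, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return i  # local maximum: no neighbor is higher
        i = best


def search_with_jumps(landscape, start, jump_points):
    """Climb from the start, then try each jump point and keep the best peak."""
    best = climb(landscape, start)
    for point in jump_points:
        candidate = climb(landscape, point)
        if landscape[candidate] > landscape[best]:
            best = candidate
    return best


landscape = [1, 3, 2, 1, 4, 7, 5, 2, 6, 9, 8]  # toy "quality of thinking" scores
home = climb(landscape, 0)
better = search_with_jumps(landscape, 0, jump_points=[4, 8])
print(home, landscape[home])      # peak reached without jumping
print(better, landscape[better])  # peak reached after trying the jump points
```

Note that the sketch only works because the searcher can read the scores in `landscape` and was handed sensible `jump_points`, and those are exactly the two things in question.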

But what if you have little information or priors available as to what would be functional or not, or even what constitutes a maximum?  There's no information telling a child not to go join a fringe religious group, for example (and I think they often do their recruiting among the very young, for this reason).

Moreover, if someone (1) lacks clear criteria for what constitutes a "maximum" or "functional," or (2) wishes to explore other models of "functional" because they suspect their current model may be self-limiting, then we get to questioning.

And I think in (2) above, I am defining the positive side of post-modernism, which also exists and contributes to our society.  The most salient criticism of post-modernism is usually that it is anti-hierarchical yet insists it is a better approach than those before it, which constitutes a performative contradiction.  Also, I think they are sometimes guilty of taking a "noble savage" approach to other cultures or ways of thinking (failure to judge what is functional).

However, if we combine the "questioning" (broad search, willing to approach with depth where it seems useful), with some level of judgement about "functional" (assuming our judgement is sound), then I think it's still a useful approach.

Because what you have presented offers no method I can see for a child without existing priors, or someone educated in a Salafi school or similar (where judgement of "functional" is artificially curtailed), to find better ways to think.

Comment by Jiao Bu (jiao-bu) on The Mountain Troll · 2023-12-12T16:54:16.544Z · LW · GW

"There are large bodies of highly reliable knowledge in the world,[...]"

The purpose of the questioning is to find out which objects are in that bucket, and which objects are in some other bucket.

If the child accepts what she is told, namely that (A) there are large bodies of highly reliable knowledge in the world, and (B) this is one of them, then you might get many types of crazy.

TL;DR:  The idea of firmly established ideas is unfortunately culturally and sub-culturally bound, at least to an extent.  Which "firmly established truths" are currently being taught in Salafi schools?  I think the "flat-earthers, QAnon, etc." could easily destroy the nonsense of their beliefs if they could employ a bit of the questioning.

Maybe what you and I are saying is a strong case of reversible advice?

Comment by Jiao Bu (jiao-bu) on The First Sample Gives the Most Information · 2023-06-26T00:47:35.677Z · LW · GW

Related:  I got two master's degrees, at midlife, after doing other stuff.  I also moved back to the USA during that time and found it useful to learn a lot of little things I never needed to think about in Taiwan, like how to fix a car.  So, having learned a handful of new skills in the past eight years or so, from car repairs to calculus, as a general heuristic I find that doing something independently from beginning to end, and fixing the problems along the way, teaches about 50% of the knowledge the first time.  2-3 times gets to 75%.  3-5 times gets to 90%.  Past the 90% mark, you spend the rest of your life making small improvements in the last 10% of the knowledge.

Basically, you don't need to do that many Taylor series to see the pattern and grok what's going on (and have improved your understanding of polynomial representations of calculus, and start getting intuitions when other approaches are used).  You don't need to switch the motor mounts on that many cars to basically get it (and to have learned frankly a lot about similar types of car work).  Etc.

That first time, maybe the second in some cases, is the biggest lift and the biggest learning.

Comment by Jiao Bu (jiao-bu) on Sexual Abuse attitudes might be infohazardous · 2023-05-29T15:46:34.806Z · LW · GW

I think OP is painting with a broad brush.  However, he probably has a point that social attitudes end up shaping the experience itself.  Similar to the above poster talking about age gaps or miscarriages.

A problem in your objection, as well as any rebuttal to it, is how would we separate social contagion from the data?  It seems that if OP is right, we wouldn't have the data to say he's right or wrong.  If he's wrong, the data wouldn't really show that or not either.  Embedded social attitudes are a matter of the fish not knowing the water in which it swims.

If, indeed, that water is so thick that OP (as well as several others who have responded) feels it is even taboo to admit their own experience was not traumatizing, then such a deep social fact is also likely to permeate all the data.

Now, in defense of the taboo (like all taboos), sexual molestation is basically such a bad thing in some sense that we don't want to allow any talk that would make this bad thing potentially happen more.  The taboo is like a field around a Schelling fence that is trying to inoculate everyone against walking even within 200m of that fence.  For whatever reason, the taboo also has some utility that should not be dismissed until it is also understood carefully.

In other words, it is taboo specifically because his talking about it risks pushing us deep into nuances that are risky.  In fact, even assuming OP's position in a broad and hard form is fully correct, it wouldn't undo the damage that people felt from being molested, and talking about it could hurt more.  So, the entire topic is likely to be an infohazard, actually regardless of the truth value of OP's comment.

Comment by Jiao Bu (jiao-bu) on The way AGI wins could look very stupid · 2023-05-16T18:29:38.756Z · LW · GW

As I think more about this, the LLM as a collaborator alone might have a major impact.  Just off the top of my head, a kind of Rube Goldberg attack might be <redacted for info hazard>.  Thinking about it in one's isolated mind, someone might never consider carrying something like that out.  Again, I am trying to model the type of person who carries out a real attack, and I don't estimate that person having above-average levels of self-confidence.  I suspect the default is to doubt themselves enough to avoid acting, in the same way most people do about their entrepreneurial ideas.

However, if they either presented it to an LLM for refinement, or if the LLM suggested it, there could be just enough psychological boost of validity to push them over the edge to trying it.  And after a few successes on the news of either "dumb" or "bizarre" or "innovative" attacks being successful due to "AI telling these people how to do it" then the effect might get even stronger.

To my knowledge, one could have bought an AR-15 since the mid to late 1970s.  My cousin has a Colt from 1981 he bought when he was 19.  Yet people weren't mass shooting each other, even during times when the overall crime/murder rate was higher than it is now.  Some confluence of factors has driven the surge, one of them probably being a strong meme, "Oh, this actually tends to 'work.'"  Basically, a type of social proofing of efficacy.

And I am willing to bet $100 that the media will report big on the first few cases of "Weird Attacks Designed by AI."

It seems obvious to me that the biggest problems in alignment are going to be the humans, both long before the robots, and probably long after.

Comment by Jiao Bu (jiao-bu) on The way AGI wins could look very stupid · 2023-05-12T20:02:25.451Z · LW · GW

Solving for "A viable attack, maximum impact" given an exhaustive list of resources and constraints seems like precisely the sort of thing GPT-4-level AI can solve with aplomb when working hand in hand with a human operator. As with the example of shooting a substation, humans could probably solve this in a workshop-style discussion with some Operations Research principles applied, but I assume the type of people wanting to do those things probably don't operate in such functional and organized ways. When they do, it seems to get very bad.

The LLM can easily supply cross-domain knowledge and think within constraints. With a bit of prompting and brainstorming, one could probably come up with a dozen viable attacks in a few hours. So the lone bad actor doesn't have to assemble a group of five or six people who are intelligent, perhaps educated, and also want to do an attack. I suspect the only reason people aren't already prompting for such methods and then setting up automation of them is the existence of guardrails. When truly open-source LLMs get to GPT-4.5 capability and good interfaces to the internet and other software tools (such as phones), we may see a lot of trouble.  Few people would have the drive and intellect needed (at least early on) to carry out such an attack, but those few could cause very outsized trouble.

TL;DR:  The "Fun" starts waaaaaaay before we get to AGI.

Comment by Jiao Bu (jiao-bu) on The Puce Tribe · 2021-10-02T15:30:07.245Z · LW · GW

So is the hypothetical Puce just otherwise Blue tribers who tolerate or welcome some amount of forbidden talk, media, ideas?

What would you call an educated leftist who has no objection at all to Alt-right or anti-vaxxers speaking freely on twitter? What about one who is actively bothered when those people get deplatformed or legally interfered with, even if it is something truly repugnant such as neonazis? I have read a few corners of leftist media that express these ideas. Is this Puce, Grey, or something else?

Comment by Jiao Bu (jiao-bu) on Status-Regulating Emotions · 2020-11-27T02:28:14.012Z · LW · GW

In MBTI terms, you may have an Se blindspot.  Se, or "External Sensation," is just what is right in front of you, what you see.  People with high Se tend to be pretty good at status symbols, both reading them and communicating in them (and they also often fall prey to "what you see is all there is" illusions/delusions, as well as "X resembles Y enough that X=Y, and I'm done with any need for further information.").

Se Blindspot can make people basically fail to grok social status cues at all, and "Your strongpoint is your weakpoint" applies here.