Comments

Comment by Kristin Lindquist (kristin-lindquist) on Has anyone actually changed their mind regarding Sleeping Beauty problem? · 2024-02-18T19:00:16.881Z · LW · GW

I have weak intuitions for these problems, and on net they make me feel like my brain doesn't work very well. With that to disclaim my taste, FWIW I think your posts are some of the most interesting content on modern-day LW.

It'd be fun to hear you debate anthropic reasoning with Robin Hanson, esp. since you invoke grabby aliens. Maybe you could invite yourself onto Robin & Agnes' podcast.

Comment by Kristin Lindquist (kristin-lindquist) on Goals selected from learned knowledge: an alternative to RL alignment · 2024-01-28T01:26:54.413Z · LW · GW

If System X is of sufficient complexity / high dimensionality, it's fair to say that there are many possible dimensional reductions, right? And not just globally better or worse options; instead, reductions that are more or less useful for a given context.

However, a shoggoth's theory-of-human-mind context would probably be a lot like our context, so it'd make sense that the representations would be similar.

Comment by Kristin Lindquist (kristin-lindquist) on Goals selected from learned knowledge: an alternative to RL alignment · 2024-01-27T21:02:39.543Z · LW · GW

That's interesting re: LLMs as having "conceptual interpretability" by their very nature. I guess that makes sense, since some degree of conceptual interpretability naturally emerges given 1) a sufficiently large and diverse training set, and 2) sparsity constraints. LLMs satisfy both - definitely #1, and #2 given regularization and some practical upper bounds on the total number of parameters. And then there is your point - that LLMs are literally trained to create output we can interpret.

I wonder about representations formed by a shoggoth. For the most efficient prediction of what humans want to see, the shoggoth would seemingly form representations very similar to ours. Or would it? Would its representations be more constrained by and therefore shaped by its theory of human mind, or by its own affordances model? Like, would its weird alien worldview percolate into its theory-of-human-mind representations? Or would its alienness not be weird-theory-of-human-mind so much as everything else going on in shoggoth mind?

More generically, say there's a System X with at least moderate complexity. One generally intelligent creature learns to predict System X with N% accuracy, but from context A (which includes its purpose for learning System X / goals for it). Another generally intelligent creature learns to predict System X with N% accuracy, but from a very different context B (it has very different goals and a different background). To what degree would we expect their representations to be similar / interpretable to one another? How does that change given the complexity of the system, the value of N, etc.?

Anyway, I really just came here to drop this paper - https://arxiv.org/pdf/2311.13110.pdf - re: @Sodium's wondering about "some suitable loss function that incentivizes the model to represent its learned concepts in an easily readable form." I'm curious about the same question, more from the applied standpoint of how to get a model to learn "good" representations faster. I haven't played with it yet tho.

Comment by Kristin Lindquist (kristin-lindquist) on After Alignment — Dialogue between RogerDearnaley and Seth Herd · 2024-01-10T17:23:57.627Z · LW · GW

I've thought about this and your sequences a bit; it's fascinating to consider given its 1,000- or 10,000-year-monk nature.

A few thoughts that I forward humbly, since I have incomplete knowledge of alignment and only read 2-3 articles in your sequence:

  • I appreciate your eschewing of idealism (as in, not letting "morally faultless" be the enemy of "morally optimized"), and relatedly, found some of your conclusions disturbing. But that's to be expected, I think!
  • While "one vote per original human" makes sense given your arguments, its moral imperfection makes me wonder - how to minimize that which requires a vote? Specifically, how to minimize the likelihood that blocs of conscious creatures suffer as a result of votes in which they could not participate? As in, how can this system be more federation than democracy? Are there societal primitives that can maximize the autonomy of conscious creatures, regardless of voting status?
  • I object, though perhaps ignorantly, to the idea that a fully aligned ASI would not consider itself as having moral weight. How confident are you that this is necessary? Is it an argument analogous to "When is Goodhart catastrophic?" - that the bit of unalignment arising from an ASI considering itself a moral entity, amplified by its superintelligence, maximally diverges from human interest? If so, I don't necessarily agree. An aligned ASI isn't a paperclip maximizer. It could presumably have its own agenda provided it doesn't, and wouldn't, interfere with humanity's... or provided it imposed only a modicum of restraint on humanity (e.g. just because we can upload ourselves a million times doesn't mean that is a wise allocation of compute).
  • Going back to my first point, I appreciate you (just like others on LW) going far beyond the bounds of intuition. However, our intuitions act as imperfect but persistent moral moorings. I was thinking last night that, given the x-risk of it all, I don't fault Yud et al. for some totalitarian thinking. However, that is itself an infohazard. We should not get comfortable with ideas like totalitarianism, enslavement of possibly conscious entities and restricted suffrage... because we shouldn't overestimate our own rationality, nor that of our community, and thus believe we can handle normalizing concepts that our moral intuitions scream about for good reason. But 1) this comment isn't specific to your work, of course, 2) I don't know what to do about it, and 3) I'm sure this point has already been made eloquently and extensively elsewhere on LW. It is more that I found myself contemplating these ideas with a certain nihilism, and had to remind myself of the immense moral weight of these ideas in action.

Comment by Kristin Lindquist (kristin-lindquist) on How do you feel about LessWrong these days? [Open feedback thread] · 2024-01-04T14:29:34.135Z · LW · GW

LW and Astral Codex Ten are the best places on the internet. Lately LW tops the charts for me, perhaps because I've made it through Scott's canon but not LW's. As a result, my experience on LW is more about the content than the meta and community. Just coming here, I don't stumble across much evidence of conflict within this community - I only learned about it after friending various rationalists on FB such as Duncan (btw, I really like having rationalists in my FB feed; it gives me a sense of community and belonging... perhaps there is something to having multiple forums).

On the slight negative side, I have long believed LW to be an AI doom echo chamber. This is partly due to my fibrotic intuitions, persisting despite reading Superintelligence and having friends in AI safety research, and only breaking free after ChatGPT. But part of it I still believe is true. The reasons include hero worship (as mentioned already on this thread), the community's epistemic landscape (as in, it is harder and riskier to defend a position of low vs high p(doom)), and perhaps even some hegemony of language.

In terms of the app: it is nice. From my own experiences building apps with social components, I would have never guessed that a separate "karma vote" and "agreement vote" would work. Only on LW!

Comment by Kristin Lindquist (kristin-lindquist) on Apologizing is a Core Rationalist Skill · 2024-01-04T01:01:49.161Z · LW · GW

+1

I internalized the value of apologizing proactively, sincerely, specifically and without any "but". While I recommend it from a virtue ethics perspective, I'd urge starry-eyed green rationalists to be cautious. Here are some potential pitfalls:

- People may be confused by this type of apology and conclude that you are neurotic or insincere. Both can signal low status if you lack unambiguous status markers or aren't otherwise effectively conveying high status.
- If someone is an adversary (whether or not you know it), apologies can be weaponized. As a conscientious but sometimes off-putting aspie, I try to apologize for my frustration-inducing behaviors such as being intense, overly persistent and inappropriately blunt - no matter the suboptimal behavior of the other person(s) involved. In short, apology is an act of cooperation and people around you might be inclined to defect, so you must be careful.

I've been too naive on this front, possibly because some of the content I've found most inspirational comes from high status people (the Dalai Lama, Sam Harris, etc) and different rules apply (i.e. great apologies as counter-signaling). It's still really good to develop these virtues; in this case, to learn how to be self-aware, accountable and courageously apologetic. But in some cases, it might be best to just write it in a journal rather than sharing it to your disadvantage.

Comment by Kristin Lindquist (kristin-lindquist) on ACX/SSC Boulder meetup- September 23 · 2023-08-26T15:20:42.528Z · LW · GW

I'll be attending, probably with a +1.

Comment by Kristin Lindquist (kristin-lindquist) on All AGI safety questions welcome (especially basic ones) [July 2022] · 2022-07-17T22:28:44.093Z · LW · GW

Not an answer but a related question: is habituation perhaps a fundamental dynamic in an intelligent mind? Or did the various mediators of human mind habituation (e.g. downregulation of dopamine receptors) arise from evolutionary pressures?

Comment by Kristin Lindquist (kristin-lindquist) on What cognitive biases feel like from the inside · 2021-12-02T19:35:59.981Z · LW · GW

I'm reading this for the first time today. It'd be great if more biases were covered this way. The "illusion of transparency" one is eerily close to what I've thought so many times. Relatedly, sometimes I do succeed at communicating, but people don't signal that they understand (or not in a way I recognize). Thus sometimes I only realize I've been understood after someone (politely) asks that I stop repeating myself, mirroring back to me what I had communicated. This is a little embarrassing, but also a relief - once I know I've been understood, I can finally let go.

Comment by Kristin Lindquist (kristin-lindquist) on Frame Control · 2021-11-30T23:54:56.969Z · LW · GW

I think kindness is a good rule for rationalists, because unkindness is rhetorically OP yet so easily rationalized ("I'm just telling it like it is, y'all" while benefitting – again, rhetorically – from playing offense).

Your implication that Aella is not speaking, writing or behaving sanely is, frankly, hard to fathom. You may disagree with her; you may consider her ideas and perspectives incomplete; but to say she has not met the standards of sanity?

She speaks about an incredibly painful and personal issue with remarkable sanity and analytical distance. Does that mean she's objective? No. But she's a solid rationalist, and this post is appropriately representative.

But see, here we are trading subjective takes. You imply this post is insane. I say that it is impressively sane. Are we shouldering the burden of standards for speaking, writing and behaving sanely?

In other words, you've set quite a high bar there, friend, and conveniently it is to your rhetorical advantage. Is this all about being rational or achieving rhetorical wins?

--

Wrt "burn it with fire" - she goes on to say that she can't have frame controllers in her life, not that she plans on committing arson. Her meaning was clear to me. If I detect that someone is attempting coercive control on me (my preferred phrasing), I block them on all channels. This has happened 2x in the last 5 years, since I escaped the abuser. I cut them out of my life with a sort of regret; not because I think they're bad, but because I've determined that continued interaction puts me at risk. This is my personal nuclear option too (like Aella) because I'm not one to block people nor consider them irredeemable.

Perhaps you could re-read that part of her post with principle o' charity / steel manning glasses on.

While I'm at it, your other criticism about normal or praiseworthy traits: she explicitly says "Keep in mind these are not the same thing as frame control itself, they’re just red flags." A red flag doesn't mean "a bad behavior" but rather means a warning sign. As is said elsewhere in the comment section (perhaps by you), some of those red flags might be exhibited by Aspie types or those who have successfully overcome some unhelpful social norms. As a different example, I have a friend who talks quickly, genuinely wants to help out even if there is nothing in it for him, and is polymathic - his rapidly covering lots of intellectual ground and wanting to help me out set off my "bullshitter" red flags. But that isn't the case. He's a good guy. And given that, the aforementioned traits are awesome. Red flags are signals and not necessarily bad behaviors.

Comment by Kristin Lindquist (kristin-lindquist) on Frame Control · 2021-11-30T22:18:20.418Z · LW · GW

"Honestly, this is a terrible post. It describes a made-up concept that, as far as I can tell, does not actually map to any real phenomenon [...]" - if I am not mistaken, LessWrong contains many posts on "made-up concepts" - often newly minted concepts of interest to the pursuit of rationality. Don't the rationalist all-stars like Scott Alexander and Yudkowsky do this often?

As a rationalist type who has also experienced abuse, I value Aella's attempt to characterize the phenomenon.

Years of abuse actually drove my interest in rationality and epistemology. My abuser's frame-controlling (or whatever it should be called) drove me to desperately seek undeniable truths (e.g. "dragging one's partner around by the hair while calling them a stupid crazy bitch is objectively wrong"). My partner hacked our two-person consensus reality so thoroughly that this "truth" was dangerous speculation on my part, and he'd punish me for asserting it.

I think abuse is a form of epistemic hacking. Part of the 'hack' is detection avoidance, which can include use of / threat of force (such as "I will punish you if you say 'abuse' one more time"), he-said-she-said ("you accuse me of abuse, but i'll accuse you right back"), psychological jabs that are 100% clear given context but plausibly denied outside that context, and stupid but effective shit like "this isn't abuse because you deserve it." In my experience, detection avoidance is such a systemic part of abuse that it is almost as if it could all be explained as a gnarly mess of instrumental goals gone wild.

My point is, abuse defies description. It is designed (or rather, honed) to defy description.

I don't know if you'll find this persuasive in the slightest. But if you do, even a tiny bit, maybe you could chill out on the "this is a terrible post" commentary. To invoke SSC (though I know those aren't the rules here), that comment isn't true, kind OR necessary.

Comment by Kristin Lindquist (kristin-lindquist) on How do you assess the quality / reliability of a scientific study? · 2019-11-02T20:23:40.409Z · LW · GW

Already many good answers, but I want to reinforce some and add others.

1. Beware of multiplicity - does the experiment include a large number of hypotheses, explicitly or implicitly? Implicit hypotheses include "Does the intervention have an effect on subjects with attributes A, B or C?" (subgroups) and "Does the intervention have an effect that is shown by measuring X, Y or Z?" (multiple endpoints). If multiple hypotheses were tested, were the results for each diligently reported? Note that multiplicity can be sneaky and you're often looking for what was left unsaid, such as a lack of plausible mechanism for the reported effect.

For example, take the experimental result "Male subjects who regularly consume Vitamin B in a non-multivitamin form have a greater risk of developing lung cancer (irrespective of dose)." Did they *intentionally* hypothesize that vitamin B would increase the likelihood of cancer, but only if 1) it was not consumed as part of a multivitamin and 2) in a manner that was not dose-dependent? Unlikely! The real conclusion of this study should have been "Vitamin B consumption does not appear correlated with lung cancer risk. Some specific subgroups did appear to have a heightened risk, but this may be a statistical anomaly."
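
To make the multiplicity point concrete, here's a toy sketch in Python (the 6-subgroup x 4-endpoint layout is invented for illustration, not taken from the vitamin B example):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05
n_subgroups, n_endpoints = 6, 4       # made-up counts for illustration
k = n_subgroups * n_endpoints         # 24 implicit hypotheses

# If every null hypothesis is true, P(at least one "significant" result):
print(round(1 - (1 - alpha) ** k, 2))                     # ~0.71 analytically

# Same thing by simulation: p-values are uniform under the null
p_values = rng.uniform(size=(100_000, k))
print(round(np.mean((p_values < alpha).any(axis=1)), 2))  # ~0.71 again
```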

2. Beware of small effect sizes and look for clinical significance - does the reported effect sound like something that matters? Consider the endpoint (e.g. change in symptoms of depression, as measured by the Hamilton Depression Rating Scale) and the effect size (e.g. d = 0.3, which is generally interpreted as a small effect). As a depressive person, I don't really care about a drug that has a small effect size.* I don't care if the effect is real but small or not real at all, because I'm not going to bother with that intervention. The "should I care" question cuts through a lot of the bullshit, binary thinking and the difficulty in interpreting small effect sizes (given their noisiness).
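
If it helps to translate a "small" standardized effect into raw terms, here's a rough sketch; the HDRS standard deviation and the d = 0.3 below are assumed numbers, not figures from any particular trial:

```python
import numpy as np
from scipy.stats import norm

# Assumed numbers, not from any particular trial:
sd_hdrs = 7.0    # rough SD of Hamilton Depression Rating Scale scores
d = 0.3          # reported standardized effect size (Cohen's d)

# Cohen's d = (mean_treatment - mean_control) / pooled SD, so in raw units:
print(f"difference of about {d * sd_hdrs:.1f} HDRS points")            # ~2.1 points

# Another framing: probability a randomly chosen treated patient does
# better than a randomly chosen control patient (probability of superiority).
print(f"probability of superiority ~ {norm.cdf(d / np.sqrt(2)):.2f}")  # ~0.58
```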

3. Beware of large effect sizes - lots of underpowered studies + publication bias = lots of inflated effect sizes reported. Andrew Gelman's "Type M" (magnitude) errors are a good way to look at this - an estimate of how inflated the effect size is likely to be. However, this isn't too helpful unless you're ready to bust out R when reading research. Alternately, a good rule of thumb is to be skeptical of 1) large effect sizes reported from small-N studies and 2) confidence intervals wide enough to drive a truck through.
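
For anyone willing to run a few lines of code rather than a full Gelman-style workup, a quick simulation conveys the Type M idea; the true effect, SD and sample size below are assumptions chosen to mimic an underpowered study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed scenario: small true effect, small study
true_effect = 0.1
se = 1.0 * np.sqrt(2 / 25)          # SE of a two-group mean difference, n=25 per arm

# Simulate many replications of the same study
estimates = rng.normal(true_effect, se, size=200_000)
significant = estimates[np.abs(estimates) > 1.96 * se]

print("power:", round(len(significant) / len(estimates), 2))             # ~0.06
print("mean |estimate| when 'significant':",
      round(np.abs(significant).mean(), 2))
print("exaggeration (Type M) ratio:",
      round(np.abs(significant).mean() / true_effect, 1))                # ~6-7x the truth
```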

4. Beware of low prior odds - is this finding in a highly exploratory field of research, and itself rather extraordinary? IMO this is an under-considered conclusion of Ioannidis' famous "Why Most Published Research Findings Are False" paper. This Shiny app nicely illustrates "positive predictive value" (PPV), which takes into account bias & prior odds.
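
The PPV calculation itself is easy to reproduce; here's a sketch of Ioannidis' formula, with pre-study odds and bias values that are purely illustrative assumptions:

```python
def ppv(r, power=0.8, alpha=0.05, bias=0.0):
    """Positive predictive value of a 'significant' finding, per Ioannidis (2005).

    r: pre-study odds that a tested hypothesis is true; bias: share of analyses
    that would report a positive result even when the data don't support one.
    """
    true_pos = power * r + bias * (1 - power) * r
    false_pos = alpha + bias * (1 - alpha)
    return true_pos / (true_pos + false_pos)

# Exploratory field, extraordinary claim: maybe 1 in 50 tested hypotheses is true
print(round(ppv(r=1 / 50, bias=0.1), 2))   # ~0.10 -- most "findings" are false
# Well-trodden, confirmatory territory: 1 in 2 true, little bias
print(round(ppv(r=1.0, bias=0.0), 2))      # ~0.94
```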

5. Consider study design - obviously look for placebo control, randomization, blinding etc. But also look for repeated measures designs, e.g. "crossover" designs. Crossover designs achieve far higher power with fewer participants. If you're eyeballing study power, keep this in mind.
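
As a rough illustration of why repeated measures buy power, here's a toy simulation comparing a parallel-group design with a paired/crossover-style analysis (the sample size, effect size and within-subject correlation are all assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def simulated_power(paired, n=20, effect=0.5, rho=0.7, sims=5_000, alpha=0.05):
    """Crude simulated power of a t-test; all parameters are toy assumptions."""
    hits = 0
    for _ in range(sims):
        if paired:
            # each subject measured under both conditions; rho = within-subject correlation
            x = rng.multivariate_normal([0.0, effect], [[1, rho], [rho, 1]], size=n)
            p = stats.ttest_rel(x[:, 1], x[:, 0]).pvalue
        else:
            p = stats.ttest_ind(rng.normal(effect, 1, n), rng.normal(0, 1, n)).pvalue
        hits += p < alpha
    return hits / sims

print("parallel-group power:", simulated_power(paired=False))   # roughly 0.33
print("crossover-style power:", simulated_power(paired=True))   # roughly 0.8
```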

6. Avoid inconsistent skepticism - for one, don't be too skeptical of research just because of its funding source. All researchers are biased. It's small potatoes $$-wise compared to a Pfizer, but postdoc Bob's career/identity is on the line if he doesn't publish. Pfizer may have $3 billion on the line for their Phase III clinical trial, but if Bob can't make a name for himself, he's lost a decade of his life and his career prospects. Then take Professor Susan, who built her career on Effect X being real - what were those last 30 years for, if Effect X was just an anomaly?

Instead, look at 1) the quality of the study design, and 2) the quality and transparency of the reporting (including COI disclosures, preregistrations, the detail and organization in said preregistrations, etc.).

7. Learn to love meta-analysis - where possible, look at meta-analyses rather than individual studies. But beware: meta-analyses can suffer their own design flaws, leading some people to say "lies, damn lies and meta-analysis." Cochrane is the gold standard; if they have a meta-analysis for the question at hand, you're in luck. Also, check out the GRADE criteria - a pragmatic framework, used by Cochrane and others, for evaluating the quality of research.


*unless there is high heterogeneity in the effect amongst a subgroup with whom I share attributes, which is why subgrouping is both hazardous and yet still important.