Posts

What's The Best Place to Look When You Have A Question About x? 2022-05-25T13:51:14.384Z

Comments

Comment by Jalen Lyle-Holmes on Anxiety vs. Depression · 2024-03-19T02:27:21.099Z · LW · GW

These are really good descriptions! (Going by my own and friends' experience). For me I might just tweak it to put anxiety as the height rather than the gravity. Thank you for writing these up!

Comment by Jalen Lyle-Holmes on The End of Anonymity Online · 2023-02-14T01:55:57.403Z · LW · GW

You also could just use this to disguise your 'style' if you want to say something anonymously going forward (doesn't work for stuff you've already got out there). Just ask an LLM to reword it in a different style before you post, could be a plugin or something, and then it can't be identified as being by you, right?
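A minimal sketch of what such a "reword before posting" step might look like under the hood, using the OpenAI Python client as one possible backend (any LLM API would do; the function name, prompt, and model choice here are just illustrative):

```python
# Hypothetical sketch of a "reword before posting" step, using the OpenAI
# Python client as one possible backend (any LLM API would do).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def disguise_style(draft: str, target_style: str = "plain, neutral prose") -> str:
    """Ask an LLM to rewrite a draft in a different style, preserving meaning."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Rewrite the user's text in {target_style}. "
                        "Keep the meaning; change the wording and sentence rhythm."},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

# reworded = disguise_style("my original comment text here")
```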

Comment by Jalen Lyle-Holmes on About probabilities · 2022-11-28T03:16:15.392Z · LW · GW

Yep that's it! Glad my explanation helped.
(Though if we want to be a bit pedantic about it, we'd say that actually a world where 21 heads in a row ever happens is not unlikely (if heaps and heaps of coin tosses happen across the world over time, like in our world), but a world where any particular given sequence of 21 coin flips is all heads is, yes, very unlikely (before any of them have been flipped).)

Comment by Jalen Lyle-Holmes on About probabilities · 2022-11-27T09:03:40.522Z · LW · GW
Comment by Jalen Lyle-Holmes on About probabilities · 2022-11-27T08:58:48.352Z · LW · GW

Ah yes this was confusing to me for a while too, glad to be able to help someone else out with it!

The key thing to realise, for me, is that the probability of 21 heads in a row changes as you toss each of those 21 coins.

The sequence of 21 heads in a row does indeed have much less than a 0.5 chance: to be precise, 0.5^21, which is about 0.000000476837158. But it only has such a tiny probability before any of those 21 coins have been tossed. As soon as the first coin is tossed, the probability of those 21 coins all being heads changes. If the first coin is tails, the probability of all 21 coins being heads drops to 0; if the first coin is heads, it rises to 0.5^20. Say you by unlikely luck keep tossing heads. Then with each additional head in a row you toss, the probability of all 21 coins being heads goes steadily up and up, until by the time you've tossed 20 heads in a row, the probability of all 21 being heads is now... 0.5, i.e. the same as the probability of a single coin toss being heads! And our apparent contradiction is gone :)

The more 'mathematical' way to express this would be: the unconditional probability of tossing 21 heads in a row is 0.5^21, i.e. about 0.000000476837158, but the probability of tossing 21 heads in a row conditional on having already tossed 20 heads in a row is 0.5.
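If it helps to see this numerically, here's a minimal Python sketch (my own illustration, not part of the original exchange) that computes the unconditional probability directly and checks the conditional one:

```python
import random

# Unconditional probability of 21 heads in a row, computed directly.
p_unconditional = 0.5 ** 21
print(f"P(21 heads in a row) = {p_unconditional:.15f}")  # ~0.000000476837158

# Conditional probability: given the first 20 tosses were already heads,
# only the 21st toss remains, so it's just one fair coin flip.
trials = 100_000
heads_on_final_toss = sum(random.random() < 0.5 for _ in range(trials))
print(f"P(all 21 heads | first 20 heads) ~= {heads_on_final_toss / trials:.3f}")  # ~0.5
```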



Let me know if any of that is still confusing.

 

Comment by Jalen Lyle-Holmes on On clinging · 2022-09-09T07:38:56.600Z · LW · GW

Love this way of pointing at this distinction!

Comment by Jalen Lyle-Holmes on What's The Best Place to Look When You Have A Question About x? · 2022-05-30T11:02:08.207Z · LW · GW

Oooo cool, I didn't know this GitHub trick!

Comment by Jalen Lyle-Holmes on What's The Best Place to Look When You Have A Question About x? · 2022-05-28T15:43:36.017Z · LW · GW

thanks!

Comment by Jalen Lyle-Holmes on What's The Best Place to Look When You Have A Question About x? · 2022-05-28T15:41:13.995Z · LW · GW

Thank you!

Comment by Jalen Lyle-Holmes on What's The Best Place to Look When You Have A Question About x? · 2022-05-28T15:40:13.792Z · LW · GW

Thanks! What do you mean about the cross referencing?

Comment by Jalen Lyle-Holmes on What's The Best Place to Look When You Have A Question About x? · 2022-05-28T15:38:59.145Z · LW · GW

Thanks!

Comment by Jalen Lyle-Holmes on What's The Best Place to Look When You Have A Question About x? · 2022-05-26T07:55:43.300Z · LW · GW

Good for any particular topic or just in general?

Comment by Jalen Lyle-Holmes on A method of writing content easily with little anxiety · 2022-04-16T03:26:52.239Z · LW · GW

Oooo thanks for this, just used it to write a post for my blog and it was more fun and easier than usual: My Anxiety Keeps Persisting Even Though It Has Been Diagnosed, Rude 

Comment by Jalen Lyle-Holmes on Introducing myself: Henry Lieberman, MIT CSAIL, whycantwe.org · 2022-03-04T10:04:11.418Z · LW · GW

Thanks for sharing your ideas. I'm a bit confused about your core claim and would love it if you could clarify (or refer to the specific part of your writing that addresses these questions): I get the general gist of your claim, that AI alignment depends on whether humans can all have the same values, but I don't know how much 'the same' you mean. You say 'substantially' align; could you give some examples of how aligned you mean? For example, do you mean all humans sharing the same political ideology (libertarian/communist/etc.)? Do you mean that for all non-trivial ethical questions (When is abortion permissible? How much duty do you have to your family vs yourself? How many resources should we devote to making things better on earth vs exploring space?), you would need to be able to ask any human on earth and, say, 99% would give you the same answer?

Likewise with the idea of humans needing to compete less and cooperate more. How much less and more? For example, competition between firms is a core part of capitalism, do you think we need to completely eliminate capitalism? Or do you only mean eliminating zero/negative sum competition like war?

Comment by Jalen Lyle-Holmes on Introducing myself: Henry Lieberman, MIT CSAIL, whycantwe.org · 2022-03-04T09:52:40.705Z · LW · GW

Yes, great question. Looking at programming in general, there seem to be many obvious counterexamples, where computers have certain capabilities ('features') that humans don't (e.g. doing millions of arithmetic operations extremely fast with zero clumsy errors) and likewise where they have certain problems ('bugs') that we don't (e.g. adversarial examples for image classifiers, which don't trip humans up at all but entirely ruin the neural net's classification).

Comment by Jalen Lyle-Holmes on Introducing myself: Henry Lieberman, MIT CSAIL, whycantwe.org · 2022-03-04T09:46:23.036Z · LW · GW

Yes, if all humans agreed on everything, there would still be significant technical problems to get an AI to align with all the humans. Most of the existing arguments for the difficulty of AI alignment would still hold even if all humans agreed. If you (Henry) think these existing arguments are wrong, could you say something about why you think that, i.e. offer counterarguments?

Comment by Jalen Lyle-Holmes on How would you learn absolute pitch? · 2022-01-30T05:15:55.684Z · LW · GW

Chris Aruffo has done some work on this: http://www.aruffo.com/eartraining/

Comment by Jalen Lyle-Holmes on Streaming Science on Twitch · 2022-01-04T08:46:02.453Z · LW · GW

I love this idea and would watch this stream!

Comment by Jalen Lyle-Holmes on Omicron: My Current Model · 2022-01-04T08:43:00.248Z · LW · GW

These are a couple of posts I came up with in a quick search, so they're not necessarily the best examples:

Covid 9/23: There Is a War

"The FDA, having observed and accepted conclusive evidence that booster shots are highly effective, has rejected allowing people to get those booster shots unless they are over the age of 65, are immunocompromised or high risk, or are willing to lie on a form. The CDC will probably concur.  I think we all know what this means. It means war! ..."

Covid 11/18: Paxlovid Remains Illegal

"It seems to continue to be the official position that:

  1. Paxlovid is safe and effective.
  2. Paxlovid has proven this sufficiently that it isn’t ‘ethical’ to continue running a clinical trial on it.
  3. Paxlovid will be approved by the FDA in December.
  4. Until then, Paxlovid must remain illegal.
    [...]
    Washington Examiner points out the obvious, that the FDA is killing thousands of people by delaying Pfizer’s and Merck’s Covid treatments. It’s good to state simple things simply:

"So, set Merck aside for now and consider Pfizer’s Paxlovid. In the past 30 days, more than 37,000 people died of COVID in the United States, according to the CDC . Over the next 35 days, Paxlovid could prevent tens of thousands of avoidable deaths. But instead, the FDA won’t immediately let Pfizer sell a drug it knows to be lifesaving. "..."

 

 Face Masks: Much More Than You Wanted To Know (SSC post written when CDC was still telling people not to wear masks)

So if studies generally suggest masks are effective, and the CDC wasn’t deliberately lying to us, why are they recommending against mask use? ...[He goes on to give some possible reasons.]

Covid 8/27: The Fall of the CDC

"An attempt at a “good faith” interpretation of the new testing guidelines, that you ‘do not necessarily need a test’ even if you have been within 6 feet of a known positive for fifteen minutes, is that the CDC is lying. That they are doing the exact same thing with testing that was previously done with masks. ..."

 

CDC Changes Isolation Guidelines

Here was the CDC’s explicit reasoning on not requiring a test, quote from the Washington Post article:

"Walensky said the agency decided not to require a negative test result after people had isolated for five days because the results are often inaccurateat that point in an infection. PCR tests — those typically performed in a lab which are around 98 percent effective — can show positive results long after a person is no longer infectious because of the presence of viral remnants,she said. It remains unclear how well rapid, at-home tests determine someone’s ability to transmit the virus in the latter part of their infection, she added."

This is standard government thinking. We can’t use PCR for the sensible reason that it will stay positive long after infectiousness. We can’t use rapid tests because we don’t know how accurate they are in this particular situation, so instead we’re going to (1) not run experiments to find out, experiments remain illegal and (2) instead not run any tests at all, which is known to be about 50% accurate. I call heads.

 

 

ETA: https://twitter.com/robertwiblin/status/1463748021011681285

"The US CDC in an article updated in October 2021 is still telling people not to wear N95 masks, even though they are in abundant supply and vastly more effective than the cloth masks they seemingly recommend.
Absolutely disgraceful: https://cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/about-face-coverings.html"

Comment by Jalen Lyle-Holmes on Book review: "Feeling Great" by David Burns · 2021-08-23T06:02:52.521Z · LW · GW

There's a 16 week Zoom book club coming up for Burns' book about TEAM-CBT, facilitated by a TEAM-CBT trainer, in case anyone is interested (starts Sep 8th 2021): https://www.feelinggreattherapycenter.com/book-club
(I just signed up)

Comment by Jalen Lyle-Holmes on TEAM: a dramatically improved form of therapy · 2021-08-23T06:01:21.749Z · LW · GW

There's a 16 week Zoom book club coming up for Burns' book about TEAM-CBT, facilitated by a TEAM-CBT trainer, in case anyone is interested: https://www.feelinggreattherapycenter.com/book-club
(I just signed up)

Comment by Jalen Lyle-Holmes on Knowledge is not just precipitation of action · 2021-06-30T05:57:15.961Z · LW · GW

To me it seems useful to distinguish two different senses of 'containing knowledge', and to note that some of your examples implicitly assume different senses. Sense 1: how much knowledge a region contains, regardless of whether an agent in fact has access to it (this is the sense in which the sunken map does contain knowledge). Sense 2: how much knowledge a region contains and how easily a given agent can physically get information about the relevant state of the region in order to 'extract' the knowledge it contains (this is the sense in which the go-kart with a data recorder does not contain a lot of knowledge).

If we don't make this distinction, it seems like either both or neither of the sunken map and go kart with data recorder examples should be said to contain knowledge. You make an argument that the sunken map should count as containing knowledge, but it seems like we could apply the same reasoning to the go-kart with data recorder:

"We could board the ship and see an accurate map being drawn. It would be strange to deny that this map constitutes knowledge simply because it wasn’t later used for some instrumental purpose."

becomes

"We could retrieve the data recorder and see accurate sensor recordings being made. It would be strange to deny that this data recorder constitutes knowledge simply because it wasn't later used for some instrumental purpose."

Though there does seem to be a separate quantitative distinction between these two cases, which is something like "once you know the configuration of the region in question (map or data recorder), how much computation do you have to do in order to be able to use it for improving your decisions about what turns to make." (The map needs less computation; the data recorder needs more, as you need to compute the track shape from the sensor data.) But this 'amount of computation' distinction is different to the distinction you make about 'is it used for an instrumental purpose'.

Comment by Jalen Lyle-Holmes on Problems facing a correspondence theory of knowledge · 2021-06-30T05:46:26.468Z · LW · GW

Hm, on reflection I actually don't think this does what I thought it did. Specifically, I don't think it captures the amount of 'complexity barrier' reducing the usability of the information. I think I was indeed equivocating between computational (space and time) complexity and Kolmogorov complexity. My suggestion captures the latter, not the former.

Also, some further Googling has told me that the expected absolute mutual information, my other suggestion at the end, is "close" to Shannon mutual information (https://arxiv.org/abs/cs/0410002) so doesn't seem like that's actually significantly different to the mutual information option which you already discussed.

 

Comment by Jalen Lyle-Holmes on Problems facing a correspondence theory of knowledge · 2021-06-26T01:39:29.546Z · LW · GW

Building off Chris' suggestion about Kolmogorov complexity, what if we consider the Kolmogorov complexity of the thing we want knowledge about (e.g. the location of an object) given the 'knowledge containing' thing (e.g. a piece of paper with the location coordinates written on it) as input?

Wikipedia tells me this is called the 'conditional Kolmogorov complexity' of X (the thing we want knowledge about) given Y (the state of the region potentially containing knowledge), written K(X | Y).

(Chris I'm not sure if I understood all of your comment, so maybe this is what you were already gesturing at.)

It seems like the problem you (Alex) see with mutual information as a metric for knowledge is that it doesn't take into account how "useful and accessible" that information is. I am guessing that what you mean by 'useful' is 'able to be used' (i.e. if the information was 'not useful' to an agent simply because the agent didn't care about it, I'm guessing we wouldn't want to therefore say the knowledge isn't there), so I'm going to take the liberty of saying "usable" here to capture the "useful and accessible" notion (but please correct me if I'm misunderstanding you).

I can see two ways that information can be less easily "usable" for a given agent. 1. Physical constraint, e.g. a map is locked in a safe so it's hard for the agent to get to it, or the map is very far away. 2. Complexity, e.g. rather than a map we have a whole bunch of readings from sensors on a go-kart that has driven around the area whose layout we want to know. This is less easily "usable" than a map, because we need a longer algorithm to extract the answers we want from it (e.g. "what road will this left turn take me to?"). [EDIT: Though maybe I'm equivocating between Kolmogorov complexity and runtime complexity here?] This second way of being less easily usable is what duck_master articulates in their comment (I think!).

It makes sense to me to not use sense 1 (physical constraint) in our definition of knowledge, because it seems like we want to say a map contains knowledge regardless of whether it is, in your example for another post, at the bottom of the ocean or not.

So then we're left with sense 2, for which we can use the conditional Kolmogorov complexity to make a metric.

To be more specific, perhaps we could say that for a variable X (e.g. the location of an object), and the state Y of some physical region (e.g. a map), the knowledge which Y contains about X is K(x) - K(x | Y),

where x is the value of the variable X.

This seems like the kind of thing that would already have a name, so I just did some Googling, and yes, it looks like this is "absolute mutual information", notated I(x : Y).

Choosing this way to define knowledge means we include cases where the knowledge is encoded by chance -- e.g. if someone draws a dot on a map at random and the dot coincidentally matches the position of an object, this metric would say that the map now does contain knowledge about the position of the object. I think this is a good thing -- it means that we can e.g. consider a rock that came in from outer space with an inscription on it and say whether it contains knowledge, without having to know about the causal process that produced those inscriptions. But if we wanted to only include cases where there's a reliable correlation and not just chance, we could modify the metric (perhaps just modify it to the expected absolute mutual information, E[I(x : Y)]).
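Kolmogorov complexity is uncomputable, but a common practical stand-in is compressed length (as in the normalized compression distance literature), with K(x | y) approximated by C(yx) - C(y). Here's a minimal Python sketch of what the metric above might look like with that proxy -- purely my own illustration, with made-up example data:

```python
import zlib

def c(data: bytes) -> int:
    """Compressed length as a (crude) stand-in for Kolmogorov complexity K."""
    return len(zlib.compress(data, 9))

def approx_knowledge(x: bytes, y: bytes) -> int:
    """Proxy for K(x) - K(x | Y): how much shorter the description of x
    becomes once y is available (approximating K(x | y) by C(yx) - C(y))."""
    return c(x) - max(c(y + x) - c(y), 0)

# Toy example: the 'map' literally states the object's location,
# so it should carry a lot of (proxy) knowledge about x.
x = b"object location: 151.2093E, 33.8688S"
map_like = b"MAP -- object location: 151.2093E, 33.8688S"
unrelated = b"the quick brown fox jumps over the lazy dog" * 2

print(approx_knowledge(x, map_like))   # relatively large: the map pins x down
print(approx_knowledge(x, unrelated))  # much smaller: unrelated bytes barely help
```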

P.S. I commented on another post in this sequence with a different idea last night, but I like this idea better :)

Comment by Jalen Lyle-Holmes on Problems facing a correspondence theory of knowledge · 2021-06-26T00:53:31.721Z · LW · GW
Comment by Jalen Lyle-Holmes on Knowledge is not just precipitation of action · 2021-06-25T17:48:16.544Z · LW · GW

Interesting sequence so far! 

Could we try like an 'agent relative' definition of knowledge accumulation?

e.g. Knowledge about X (e.g. the shape of the coastline) is accumulating in region R (e.g. the parchment) accessibly for an agent A (e.g. a human navigator) to the extent that agent A is able to condition its behaviour on X by observing R and not X directly. (This is borrowing from the Cartesian Frames definition of an 'observable' being something the agent can condition on).

 

If we want to break this down to lower level concepts than 'agents' and 'conditioning behaviour' and 'observing', we could say something roughly  like (though this is much more unwieldy):

X is some feature of the system (e.g. shape of coastline).

R is some region of the system (e.g. the parchment).

A is some entity in the system which can 'behave' in different ways over time (e.g. the helmsman, who can turn the ship's wheel over time; 'over time' in the sense that they don't just have the single option to 'turn right' or 'turn left' once, rather they have the option to 'turn right for thirty minutes, then turn left for twenty minutes, then...' or some other trajectory).

Definition for 'conditioning on': We say A is 'conditioning on' R if changing R causes a change in A's behaviour (i.e. if we perturb R (e.g. change the map) then A's behaviour changes (e.g. the steering changes)). So just a Pearlian notion of causality, I think.

An intermediate concept: We say A is 'utilising the knowledge in R about X' if: 1. A is conditioning on R (e.g. the helmsman is conditioning their steering on the content of the parchment), and 2. there exists some basin of attraction B which goes to some target set T (e.g. B is some wide range of ways the world can be, and T is 'the ship ends up at this village by this time') and if A were not conditioning on R then B would be smaller (if the helmsman were not steering according to the map then they would only end up at the village on time in far fewer worlds), and 3. if A were to also condition on X, this would not expand B much (e.g. seeing the shape of the coastline once you can already read the map doesn't help you much), but 4. if A were not conditioning on R, then conditioning on X would expand B a lot more (e.g. if you couldn't steer by the map, then seeing the shape of the coastline would help you a lot). (You could also put all this in terms of utility functions instead of target sets, I reckon, but the target set approach seemed easier for this sketch.)
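One compressed way to restate conditions 2-4 (this is just my own sketch notation, not from the post): write B(π) for the basin of attraction that reaches the target set T when A follows policy π, and let π_∅, π_R, π_X, π_{R,X} be the policies that condition on neither, on R only, on X only, or on both. Then, roughly:

```latex
% Condition 2: conditioning on R enlarges the basin that reaches T.
|B(\pi_R)| > |B(\pi_\emptyset)|
% Condition 3: adding X on top of R barely enlarges it further.
|B(\pi_{R,X})| \approx |B(\pi_R)|
% Condition 4: without R, conditioning on X would enlarge it a lot.
|B(\pi_X)| \gg |B(\pi_\emptyset)|
```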

So we've defined what it means for A to 'utilise the knowledge in R about X', but what we really want is to say what it means for A to be able to utilise the knowledge in R about X, because when A is able to utilise the knowledge in R about X, we can say that R contains knowledge about X accessibly for A (e.g. if the map is not on the ship, the helmsman will not be utilising its knowledge, but in some sense they 'could', and thus we would still say the map contains the knowledge).

But now I find that it's far past my bedtime and I'm too sleepy to work out this final step haha! Maybe it's something like: R contains knowledge about X accessibly to A 'if we can, without much change to R or A, cause A to utilise the knowledge in R about X' (e.g. just by moving the map onto the ship, and not changing anything else, we can cause the helmsman to utilise the knowledge in the map). Though a clear problem here is: what if A is not 'trying' to achieve a goal that requires the knowledge on the map? (e.g. if the helmsman were on the other side of the world trying to navigate somewhere else, then they wouldn't utilise the knowledge in this map because it wouldn't be relevant). In this case it seems we can't cause A to utilise the knowledge in R about X 'without much change to R or A' -- we would need to change A's goal to make it utilise the knowledge in R. Hmm.....

One thing I like about this approach is that when R does have information about X but it's not in a very 'action ready' or 'easily usable' form (e.g. if R is a  disk of 10,000 hours of video taken by ships, which you could use to eventually work out the shape of the coastline) then I think this approach would say that R does contain knowledge about X (accessibly to A) to some degree but less so than if it just directly gave the shape of the coastline. What makes this approach say this? Because in the "10,000 hours of footage" case, the agent is less able to condition its behaviour on X by observing R (which is the 'definition' of knowledge under this approach)-- because A has to first do all the work of watching through the footage and extracting/calculating the relevant knowledge before it can use it, and so therefore in all that time when it is doing this processing it cannot yet condition its behaviour on X by observing R, so overall over time its behaviour is 'less conditioned' on X via R.

Anyway curious to hear your thoughts about this approach, I might get to finish filling it out another time!

Comment by Jalen Lyle-Holmes on Beware over-use of the agent model · 2021-05-29T01:47:28.370Z · LW · GW

Thank you Alex! Just sent you a PM :)

Comment by Jalen Lyle-Holmes on Beware over-use of the agent model · 2021-05-18T12:44:01.089Z · LW · GW

Oh cool, I'm happy you think it makes sense!
I mean, could the question even be as simple as "What is an optimiser?", or "What is an optimising agent?"?
With the answer maybe being something roughly to do with:
1. being able to give a particular Cartesian frame C = (A, E, ⋅) over possible world histories, such that there exists an agent 'strategy' ('way that the agent can be') a ∈ A such that for some 'large' subset of possible environments E′ ⊆ E, and some target set of possible worlds S ⊆ W, we have a ⋅ e ∈ S for all e ∈ E′

and 2. that the agent 'strategy'/'way of being' a is in fact 'chosen' by the agent
?

(1) is just a weaker version of the 'ensurable' concept in Cartesian frames, where the property only has to hold for a subset of E rather than all of it. I think E′ would correspond to both 'the basin of attraction' and 'perturbations', as a set of ways the environment can 'be' (thinking of A and E as sets of possible sort of 'strategies' for agent and environment respectively across time). (Though I guess E′ is a bit different to basin of attraction + perturbations, because the Ground of Optimization notion of 'basin of attraction' includes the whole system, not just the 'environment' part, and likewise perturbations can be to the agent as well... I feel like I'm a bit confused here.) S would correspond to the target configuration set (or actually, the set of world histories in which the world 'evolves' towards a configuration in the target configuration set).
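For reference, here's a rough sketch of how (1) relates to the standard ensurables definition, using the usual Cartesian frame notation C = (A, E, ⋅) with ⋅ : A × E → W (the 'large subset' E′ phrasing is my own):

```latex
% Standard ensurables of a Cartesian frame C = (A, E, \cdot) over W:
\mathrm{Ensure}(C) = \{\, S \subseteq W \;:\; \exists a \in A,\ \forall e \in E,\ a \cdot e \in S \,\}

% Weakened version sketched in (1): S only needs to be guaranteed
% over some 'large' subset E' of the environments:
\exists a \in A,\ \exists E' \subseteq E \ (\text{'large'}),\ \forall e \in E',\ a \cdot e \in S
```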

Something along those lines maybe? I'm sure you could incorporate the time aspect better by using some of the ideas from the 'Time in Cartesian Frames' post, which I haven't done here.

Comment by Jalen Lyle-Holmes on Beware over-use of the agent model · 2021-04-27T14:09:28.349Z · LW · GW

Side question: After I read some of the Cartesian Frames sequence, I wondered if something cool could come out of combining its ideas with your ideas from your Ground of Optimisation post. Because: Ground of Optimisation 1. formalises optimisation but 2. doesn't split the system into agent and environment, whereas Cartesian Frames 1. gives a way of 'imposing' an agent-environment frame on a system (without pretending that frame is a 'top level' 'fundamental' property of the system) , but  2. doesn't really deal with optimisation. So I've wondered  if there might be something fruitful in trying to combine them in some way, but am not sure if this would actually make sense/be useful (I'm not a researcher!), what do you think?

Comment by Jalen Lyle-Holmes on Beware over-use of the agent model · 2021-04-27T14:09:02.317Z · LW · GW

I assume you've read the Cartesian Frame sequence? What do you think about that as an alternative to the traditional agent model?