jonathan_graehl

I'm unclear on whether the 'dimensionality' (complexity) component to be minimized needs revision from the naive 'number of nonzeros' (or continuous but similar zero-rewarded priors on parameters).

Either:

the simplest equivalent (by naive score) 'dimensionality' parameters are found by the optimization method, in which case what's the problem?
not. Then either there's a canonicalization of the equivalent onto- parameters available that can be used at each step, or an adjustment to the complexity score that does a good job doing so, or we can't figure it out and we risk our optimization methods getting stuck in bad local grooves because of this.

Does this seem fair?

Comment by Jonathan_Graehl on Neural networks generalize because of this one weird trick · 2023-01-18T21:32:43.037Z · LW · GW

This appears to be a high-quality book report. Thanks. I didn't see anywhere the 'because' is demonstrated. Is it proved in the citations or do we just have 'plausibly because'?

Physics experiences in optimizing free energy have long inspired ML optimization uses. Did physicists playing with free energy lead to new optimization methods or is it just something people like to talk about?

Comment by Jonathan_Graehl on On Cooking With Gas · 2023-01-17T21:39:05.314Z · LW · GW

This kind of reply is ridiculous and insulting.

Comment by Jonathan_Graehl on Scaling laws vs individual differences · 2023-01-13T21:51:13.466Z · LW · GW

We have good reason to suspect that biological intelligence, and hence human intelligence roughly follow similar scaling law patterns to what we observe in machine learning systems

No, we don't. Please state the reason(s) explicitly.

Comment by Jonathan_Graehl on Google Search loses to ChatGPT fair and square · 2023-01-11T22:51:57.125Z · LW · GW

Google's production search is expensive to change, but I'm sure you're right that it is missing some obvious improvements in 'understanding' a la ChatGPT.

One valid excuse for low quality results is that Google's method is actively gamed (for obvious $ reasons) by people who probably have insider info.

IMO a fair comparison would require ChatGPT to do a better job presenting a list of URLs.

Comment by Jonathan_Graehl on Sparse trinary weighted RNNs as a path to better language model interpretability · 2022-09-17T20:33:46.837Z · LW · GW

how is a discretized weight/activation set amenable to the usual gradient descent optimizers?

Comment by Jonathan_Graehl on Argument against 20% GDP growth from AI within 10 years [Linkpost] · 2022-09-14T03:04:10.347Z · LW · GW

You have the profits from the AI tech (+ compute supporting it) vendors and you have the improvements to everyone's work from the AI. Presumably the improvements are more than the take by the AI sellers (esp. if open source tools are used). So it's not appropriate to say that a small "sells AI" industry equates to a small impact on GDP.

But yes, obviously GDP growth climbing to 20% annually and staying there even for 5 years is ridiculous unless you're a takeoff-believer.

Comment by Jonathan_Graehl on Taking the parameters which seem to matter and rotating them until they don't · 2022-08-26T20:36:46.316Z · LW · GW

You don't have to compute the rotation every time for the weight matrix. You can compute it once. It's true that you have to actually rotate the input activations for every input but that's really trivial.
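A minimal sketch of the point above, assuming a single linear layer y = W x and a plain 2D rotation R (the angle, weights and inputs are made up for illustration): since R^T R = I, the product W R^T can be computed once, and only the cheap R x is done per input.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

theta = 0.7  # arbitrary illustrative angle (assumption)
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# Hypothetical 3x2 layer weights (assumption, not from the post).
W = [[1.0, 2.0],
     [-0.5, 0.25],
     [3.0, -1.0]]

# Done once: fold the inverse rotation (R^T, since R is orthogonal) into W.
W_rot = matmul(W, transpose(R))

def forward_original(x):
    return matvec(W, x)

def forward_rotated(x):
    # Done per input: rotate the activations, then apply the precomputed weights.
    return matvec(W_rot, matvec(R, x))

inputs = [[1.0, 0.0], [0.3, -2.0], [5.0, 4.0]]
max_diff = max(abs(a - b)
               for x in inputs
               for a, b in zip(forward_original(x), forward_rotated(x)))
```

Here `max_diff` should be at floating-point noise level, i.e. the rotated network computes the same function.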

Comment by Jonathan_Graehl on Taking the parameters which seem to matter and rotating them until they don't · 2022-08-26T20:35:28.156Z · LW · GW

Interesting idea.

Obviously doing this instead with a permutation composed with its inverse would do nothing but shuffle the order and not help.

You can easily do the same with any affine transformation, no? Skew, translation (scale doesn't matter for interpretability).

More generally if you were to consider all equivalent networks, tautologically one of them is indeed more input activation => output interpretable by whatever metric you define (input is a pixel in this case?).

It's hard for me to believe that rotations alone are likely to give much improvement. Yes, you'll find a rotation that's "better".

What would suffice as convincing proof that this is valuable for a task: the transformation increases the effectiveness of the best training methods.

I would try at least fine-tuning on the modified network.

I believe people commonly try to train not a sequence of equivalent power networks (w/ a method to project from weights of the previous architecture to the new one), but rather a series of increasingly detailed ones.

Anyway, good presentation of an easy to visualize "why not try it" idea.

Comment by Jonathan_Graehl on Is population collapse due to low birth rates a problem? · 2022-08-26T18:14:23.172Z · LW · GW

If human lives are good, depopulation should not be pursued. If instead you only value avg QOL, there are many human lives you'd want to prevent. But anyone claiming moral authority to do so should be intensely scrutinized.

Comment by Jonathan_Graehl on Is population collapse due to low birth rates a problem? · 2022-08-26T18:11:35.422Z · LW · GW

To sustain high tech-driven growth rates, we probably need (pre-real-AI) an increasing population of increasingly specialized and increasingly long-lived researchers+engineers at every intelligence threshold - as we advance, it takes longer to climb up on giants' shoulders. It's unclear what the needs are for below-threshold population (not zero, yet). Probably Elon is intentionally not being explicit about the eugenic-adjacent angle of the situation.

Comment by Jonathan_Graehl on What's up with the bad Meta projects? · 2022-08-19T21:02:29.631Z · LW · GW

IMO this project needs an aesthetic leader. A bunch of technically competent people building tools they think might be useful is very likely to result in a bunch of unappealing stuff no one wants.

Comment by Jonathan_Graehl on What's up with the bad Meta projects? · 2022-08-19T21:00:58.499Z · LW · GW

In Carmack's recent 5+hr interview on Lex Fridman [1], he points out that finding a particular virtual setting that people love and focusing effort on that is usually how we arrive at games/spaces that have historically driven hardware/platform adoption, and that Zucc is very obviously not doing that. The closest successful virtual space to Zucc's approach is Roblox, a kind of social game construction kit (with pretty high market cap), but in his opinion the outcome is usually you build it and they don't come. I believe Carmack also favors the technical results of optimizing a platform along with a particular game, which is part of his strong motivation for making things better in his immediate environment.

[1]

Comment by Jonathan_Graehl on The longest training run · 2022-08-17T17:45:35.509Z · LW · GW

This is good thinking. Breaking out of your framework: trainings are routinely checkpointed periodically to disk (in case of crash) and can be resumed - even across algorithmic improvements in the learning method. So some trainings will effectively be maintained through upgrades. I'd say trainings are short mostly because we haven't converged on the best model architectures and because of publication incentives. IMO benefitting from previous trainings of an evolving architecture will feature in published work over the next decade.
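A toy sketch of the checkpoint-and-resume pattern, assuming a one-parameter "model" trained by gradient descent and a JSON file as the checkpoint (the function `train`, the file names and the loss are all hypothetical, not anyone's real training setup):

```python
import json
import os
import tempfile

def train(total_steps, path, lr=0.1):
    """Minimise (w - 3)^2 by gradient descent, checkpointing after every step."""
    state = {"step": 0, "w": 0.0}
    if os.path.exists(path):
        # Resume from the last checkpoint instead of starting over.
        with open(path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        grad = 2 * (state["w"] - 3.0)
        state["w"] -= lr * grad
        state["step"] += 1
        with open(path, "w") as f:
            json.dump(state, f)
    return state

with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "run_a.json")
    partial = train(5, ckpt)    # run "crashes" or is paused after 5 steps
    resumed = train(10, ckpt)   # later invocation picks up at step 5
    straight = train(10, os.path.join(d, "run_b.json"))  # uninterrupted run
```

The resumed run ends in the same state as the uninterrupted one. The second call could just as well pass a different `lr` or an improved update rule, which is the "maintained through upgrades" case.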

Comment by Jonathan_Graehl on Sexual Abuse attitudes might be infohazardous · 2022-07-29T01:17:18.075Z · LW · GW

One of the reasons abusers of kids/teens aren't fully prosecuted is because parents of victims rightly predict that everyone knowing you were raped by the babysitter or whatever will generate additional psych baggage and selfishly refrain from protecting other children from the same predator.

Comment by Jonathan_Graehl on Donohue, Levitt, Roe, and Wade: T-minus 20 years to a massive crime wave? · 2022-07-06T18:34:03.824Z · LW · GW

How are we ever supposed to believe that enough variables were 'controlled for'?

More abortions -> [lag 15 years] less crime is of course plausible. We should expect smaller families produced by abortion to have more resources available for the surviving children, if any, which plausibly could reduce their criminality. But the hypothesis is clearly also motivated by a belief that we should hope genetically criminal-inclined people differentially have most of the abortions (though I'm sure this motivation is not forefronted by authors).

Comment by Jonathan_Graehl on Looking back on my alignment PhD · 2022-07-01T05:18:03.496Z · LW · GW

Congrats on the accomplishments. Leaving aside the rest, I like the prompt: why don't people wirehead? Realistically, they're cautious due to having but one brain and a low visibility into what they'd become. A digital-copyable agent would, if curious about what slightly different versions of themselves would do, not hesitate to simulate one in a controlled environment.

Generally I would tweak my brain if it would reliably give me the kind of actions I'd now approve of, while providing at worst the same sort of subjective state as I'd have if managing the same results without the intervention. I wouldn't care if the center of my actions was different as long as the things I value today were bettered.

Anyway, it's a nice template for generating ideas for: when would an agent want to allow its values to shift?

I'm glad you broke free of trying to equal others' bragged-about abilities. Not everyone needs to be great at everything. People who invest in learning something generally talk up the benefits of what they paid for. I'm thinking of Heinlein's famous "specialization is for insects" where I presume much of the laundry lists of things every person should know how to do are exactly the arbitrary things he knows how to do.

Comment by Jonathan_Graehl on Failing to fix a dangerous intersection · 2022-06-30T18:35:21.398Z · LW · GW

LA has a tradition of guerrilla freeway sign enhancements as a result of similar authority non-responsiveness. http://www.slate.com/blogs/the_eye/2015/02/11/guerrilla_public_service_on_99_invisible_richard_ankrom_replaced_a_los_angeles.html

Comment by Jonathan_Graehl on Why I don't believe in doom · 2022-06-20T20:48:24.641Z · LW · GW

A general superhuman AI motivated to obtain monopoly computational power could do a lot of damage. Security is hard. Indeed we'd best lay measures in advance. 'Tool' (we hope) AI will unfortunately have to be part of those measures. There's no indication we'll see provably secure human-designed measures built and deployed across the points of infrastructure/manufacturing leverage.

Comment by Jonathan_Graehl on Why I don't believe in doom · 2022-06-20T20:43:17.638Z · LW · GW

Most people agree with this, of course, though perhaps not most active users here.

Comment by Jonathan_Graehl on Georgism, in theory · 2022-06-15T18:26:52.165Z · LW · GW

Where's the model?

Comment by Jonathan_Graehl on Explaining the Twitter Postrat Scene · 2022-04-06T19:34:26.640Z · LW · GW

Consider also that activities you find you enjoy, such as LW or Twitter posting, are likely to be judged by you as more useful than they are. Agree that LW-style is not the only one to think in. Authors here could give more weight to being easily understood than showing off.

Comment by Jonathan_Graehl on Explaining the Twitter Postrat Scene · 2022-04-06T19:29:38.137Z · LW · GW

I liked your Qanon-feminist tweet, but we have to remember that something that upsets people by creating dissonance around the mistake you intend (even if they can't pin down the intent) is not as good as actually correcting the mistake. It's certainly easier to create an emotionally jarring contrast around a mistaken belief than to get people to understand+accept an explicit correction, so I can see why you'd enjoy creating the easy+viral.

Comment by Jonathan_Graehl on They Don’t Know About Second Booster · 2022-03-31T18:24:43.564Z · LW · GW

I haven't seen booster net efficacy assessed in an honest way, since they often exclude events for the first 2 weeks post-boost. Agree that we should expect a small effect only; I would approve for whoever wants and leave it at that.

Comment by Jonathan_Graehl on Why is the war in Ukraine particularly dangerous for the world? · 2022-03-11T23:07:34.507Z · LW · GW

While I lived through and can confirm the prevalence of the 'extinguish all civilization' MAD narrative, I wonder today how extinguished it actually would have been. (famine due to a year of reduced sunlight from dust floating around was part of the story)

Comment by Jonathan_Graehl on Lives of the Cambridge polymath geniuses · 2022-01-26T08:32:04.449Z · LW · GW

https://aviation.stackexchange.com/questions/75411/was-ludwig-wittgensteins-aircraft-propeller-ever-built imaginative I suppose. Why is Wittgenstein thought to have contributed anything of worth? Yes, he was clever. Yes, some of his contemporaries praised him.

Comment by Jonathan_Graehl on I have COVID, for how long should I isolate? · 2022-01-13T21:23:03.796Z · LW · GW

sniffles don't matter; 10 days after fever's end seems generous/considerate. allegedly positive nasal swab antigen tests will persist for days after it's impossible to lab-culture the virus from a snot sample but in any case such tests are definitely negative after 14 days of onset

Comment by Jonathan_Graehl on The Speed of Rumor · 2022-01-04T19:42:23.337Z · LW · GW

Aren't rumors typically rounded up for impact in the fashion you caught this someone doing by luck of existing direct knowledge?

Comment by Jonathan_Graehl on COVID Skepticism Isn't About Science · 2021-12-31T06:33:23.779Z · LW · GW

Poll inadequacy: zero is not right, but I think the answer to P(hospitalized|covid) is <1%

Comment by Jonathan_Graehl on Help figuring out my sexuality? · 2021-12-22T22:23:38.334Z · LW · GW

Do you like strip clubs?

Comment by Jonathan_Graehl on Help figuring out my sexuality? · 2021-12-22T22:22:14.741Z · LW · GW

Sounds like you've imprinted some sort of not exactly resentment+rejection of the power+value of female sexuality (as I think some gay men have) but rather frustrated worship+submission to it, congruent with high porn consumption, although you say you don't actually consume much since the out and about the powerless man ogling/frustration stimulus is enough.
This voyeurish mode and esp. the powerlessness arousal fetish doesn't help you pose as the typically high-value 'prize' so the lack of access isn't surprising. As an unsolicited prescription, I'd suggest getting used to interacting with as high-value women as you can stand as powerfully as possible (even if that means just not acting thirsty; confident flirtation/approaches are even better). If your desire were more connected to pursuit you'd learn+calibrate as part of a road to increasing comfort and inevitable results.

Comment by Jonathan_Graehl on Let's buy out Cyc, for use in AGI interpretability systems? · 2021-12-08T03:05:45.943Z · LW · GW

Cyc's human-input 'knowledge' would indeed be an interesting corpus, but my impression has always been that nothing really useful has come of it to date. I wouldn't pay much for access.

Comment by Jonathan_Graehl on Frame Control · 2021-11-30T03:14:07.232Z · LW · GW

I'm not seeing any difference between pressure and aggression these days.

Comment by Jonathan_Graehl on Why don't our vaccines get updated to the delta spike protein? · 2021-11-28T04:17:38.035Z · LW · GW

Trying to push out a revision costs money and doesn't earn any expected money. And everyone knows this is so. Unofficial market collusion regularly manages to solve harder problems; you don't need explicit comms at all.

I'll grant that we'll hear some competitive "ours works better on variant X" marketing but a new even faster approval track would be needed if we really wanted rapid protein updates.

As evhub mentions, the antibodies you make given the first vaccine you're exposed to are what will get manufactured every time you see a similar-enough provocation. It may be impossible to switch the learned immune response without some specially designed "different enough" protein that's hoped to also be protective against the latest variant. I buy the 'original antigenic sin' concept - there has to be a reason we're not naturally immune to flu and corona-colds already after many previous encounters.

Comment by Jonathan_Graehl on Paxlovid Remains Illegal: 11/24 Update · 2021-11-25T08:07:10.815Z · LW · GW

Why are you quoting without correction someone who thinks 5 billion divided by 10 million is 500,000 (it's 500)?

Comment by Jonathan_Graehl on Insights from Modern Principles of Economics · 2021-10-18T18:38:10.667Z · LW · GW

presumably perfect competition defects from perfect price discrimination

Comment by Jonathan_Graehl on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-18T03:40:31.317Z · LW · GW

'how-level' would be easier to parse

Comment by Jonathan_Graehl on Is GPT-3 already sample-efficient? · 2021-10-06T17:18:36.458Z · LW · GW

In general a language model will 'know' the sentence related to the single occurrence of a rare name. I don't think you learn much here if there are enough parameters available to support this memory.

Comment by Jonathan_Graehl on Is GPT-3 already sample-efficient? · 2021-10-06T17:17:06.632Z · LW · GW

Perhaps GPT-3 has more parameters than are probably needed to roughly memorize its very large training data. This would be good since the data contains some low quality garbage, false claims, etc (can think of them as 'noise'). I believe GPT-n are adding parameters faster than training data. Here's my summary of a paper that suggests this is the right move:

https://www.youtube.com/watch?v=OzGguadEHOU Microsoft guy Sebastian Bubeck talking about seemingly overparameterized neural models being necessary for learning (due to label noise?). Validation 'early stopping' of training duration or size scaling is a mistake. After you're over some initial hump that would trigger validation early stopping, overfitting is 'benign' [already known, dubbed 'double descent']. As soon as you can defeat adversarial attacks then you're probably using enough parameters. He (+intern) proves that in order to perfectly memorize the label-noised data set such that small perturbations in the noise don't change predicted output, you need a much larger parameter set than the data set (perfectly memorizing the training data set should be possible within some constant factor of its size). He predicts that ImageNet (image labeling task) could benefit from 10-100 billion parameters instead of the current sub-1-billion.

(obviously GPT-n are language models but they can be thought of as having an output which is the masked word or the sentence-before-or-after or whatever they're using to train)

Comment by Jonathan_Graehl on Covid 9/23: There Is a War · 2021-09-23T17:35:48.799Z · LW · GW

Two reasons you could recommend boosters for vulnerable only:

global first doses first thinking
awareness that eradicating covid by rapid vaccination to herd immunity is futile given current effectiveness+adoption and hope to reduce the Marek's-like adaptation of more vax-resistant strains so that the vulnerable can have more of the benefit preserved to them

It does seem that, temporary supply shortages aside, you should advocate universal 'vaccination' (say w/ moderna) iff you also advocate ongoing doses until a real vaccine is available.

Comment by Jonathan_Graehl on Long Covid Is Not Necessarily Your Biggest Problem · 2021-09-01T08:51:02.846Z · LW · GW

Your contrary cite notwithstanding, I predict Delta will end up less damaging on average and more cases will go uncounted due to its mildness. This may also drive some overestimation of its virulence. It does appear to spread well enough that it is a question of when not if you'll be exposed.

Comment by Jonathan_Graehl on COVID/Delta advice I'm currently giving to friends · 2021-08-24T03:57:40.636Z · LW · GW

agree. thanks

Comment by Jonathan_Graehl on Pedophile Problems · 2021-08-15T19:10:37.383Z · LW · GW

as always the legal term 'minor' is not really germane to the topic people really care about

Comment by Jonathan_Graehl on Pedophile Problems · 2021-08-15T19:09:44.641Z · LW · GW

Everyone wants fewer of these people.

If there's a way there that involves an edit of existing people (including by invasive 'minder' future tech), fine.

Otherwise, prevent them being born or destroy them.

Comment by Jonathan_Graehl on Causes of a Debt Crisis—Economic · 2021-07-02T00:01:45.569Z · LW · GW

Holders of prepayable loans don't really benefit much when rates drop, so I'll assume you mean bond-like instruments (or ones that aren't likely to be refinanced out of, or that pay some bonus in that event).

Comment by Jonathan_Graehl on How will OpenAI + GitHub's Copilot affect programming? · 2021-06-29T20:14:47.190Z · LW · GW

surely private installations of the facility will be sold to trade-secret-protecting teams

Comment by Jonathan_Graehl on Covid vaccine safety: how correct are these allegations? · 2021-06-14T20:33:58.687Z · LW · GW

'If this were true, where are the lawsuits against the vaccine makers?'

Surely they've been shielded from liability so there won't be any.

Comment by Jonathan_Graehl on Often, enemies really are innately evil. · 2021-06-07T17:17:47.774Z · LW · GW

To me, 'evil' means 'should be destroyed if possible'. Therefore I don't like to hand out the label recklessly, as it leads generally to impotent rage, which is harmful to me.

Comment by Jonathan_Graehl on Why has no one compared Covid-19 and Vaccine Risks? · 2021-06-04T21:18:26.701Z · LW · GW

Is only 1/3 of Long Covid sufferers actually having had covid definitely a thing, too? I think it is (or maybe antibody tests give many false positives?)

Comment by Jonathan_Graehl on Why has no one compared Covid-19 and Vaccine Risks? · 2021-06-04T21:16:50.001Z · LW · GW

That seems a bit overconfident. Immunity is one supposed long-term effect. Death is another long-term effect though obviously infrequent in approved vaccines.
