Posts

Learning Written Hindi From Scratch 2024-04-11T11:13:17.743Z
David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud. 2024-01-27T13:21:05.068Z
Wobbly Table Theorem in Practice 2023-09-28T14:33:16.898Z
“Thinking Physics” as an applied rationality exercise 2023-08-27T15:31:00.814Z
“Thinking Physics” as an applied rationality exercise 2023-08-10T08:32:01.075Z
Karlsruhe Rationality Meetup: Inadequate Equilibria pt2 2022-11-10T10:00:44.150Z
Karlsruhe Rationality Meetup: Inadequate Equilibria pt1 2022-11-02T09:15:57.489Z
What Is the Idea Behind (Un-)Supervised Learning and Reinforcement Learning? 2022-09-30T16:48:06.523Z
Does the existence of shared human values imply alignment is "easy"? 2022-09-26T18:01:10.661Z
Karlsruhe Rationality Meetup: Predictions 2022-09-06T16:56:57.021Z
Moneypumping Bryan Caplan's Belief in Free Will 2022-07-16T00:46:03.176Z
Returns on cognition of different board games 2022-02-13T20:40:49.163Z
Coping with Undecidability 2022-01-27T10:31:00.520Z
Time until graduation as a proxy for picking between (German) universities 2022-01-24T18:27:32.984Z
Are "non-computable functions" always hard to solve in practice? 2021-12-20T16:32:25.118Z
What is the evidence on the Church-Turing Thesis? 2021-09-19T11:34:49.377Z
Chance that "AI safety basically [doesn't need] to be solved, we’ll just solve it by default unless we’re completely completely careless" 2020-12-08T21:08:47.575Z
Morpheus's Shortform 2020-08-07T22:35:57.530Z

Comments

Comment by Morpheus on My AI Model Delta Compared To Christiano · 2024-07-18T14:53:09.405Z · LW · GW

So on the meta-level you need to correct weakly in the other direction again.

Comment by Morpheus on Brief notes on the Wikipedia game · 2024-07-14T06:20:40.372Z · LW · GW

I used Alex Turner's entire shortform as context in my prompt for gpt-4, which worked well enough to make the task difficult for me. But maybe I just suck at this task.

Comment by Morpheus on What and Why: Developmental Interpretability of Reinforcement Learning · 2024-07-10T07:53:06.417Z · LW · GW

By the way, if you want to donate to this but thought, like me, that you need to be an “accredited investor” to fund Manifund projects, that only applies to their impact certificate projects, not this one.

Comment by Morpheus on gwern's Shortform · 2024-07-02T12:19:49.632Z · LW · GW

My point is more that 'regular' languages form a core to the edifice because the edifice was built on it, and tailored to it

If that was the point of the edifice, it failed successfully, because those closure properties made me notice that visibly pushdown languages are nicer than context-free languages: they still allow matching parentheses and are arguably what regexps should have been built upon.

Comment by Morpheus on gwern's Shortform · 2024-07-01T12:42:44.921Z · LW · GW

My comment was just based on a misunderstanding of this sentence:

The 'regular' here is not well-defined, as Kleene concedes, and is a gesture towards modeling 'regularly occurring events' (that the neural net automaton must process and respond to).

I think you just meant that there's no really satisfying analogy explaining why it's called 'regular'. What I thought you were implying is that this class wasn't crisply characterized, then or now, in terms of math (it is). Thanks to your comment, though, I noticed a large gap in the CS-theory understanding I thought I had. I thought that the 4 levels usually mentioned in the Chomsky hierarchy were the only strict subsets of languages that are well characterized by a grammar, an automaton, and a whole lot of closure properties. Apparently the emphasis on these languages in my two stacked classes on the subject 2 years ago was a historical accident? (Looking at Wikipedia, visibly pushdown languages are closed under intersection, so from my quick skim they seem more natural than context-free languages.) They were only discovered in 2004, so perhaps I can forgive my two classes on the subject for not having included developments 15 years in the past. Does anyone have post recommendations for propagating this update?
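
To make the "visibly" part concrete, here is a toy sketch in Python (my own illustration; the alphabet partition is made up, not anything from the thread): the input symbol alone dictates the stack action, which is exactly the restriction that buys back the nice closure properties.

# Toy visibly pushdown recognizer: push on call symbols, pop on return
# symbols, and never inspect the stack to decide which action to take.
CALLS = {"(", "<"}
RETURNS = {")", ">"}
INTERNALS = {"a", "b"}
MATCHING = {")": "(", ">": "<"}

def accepts(word: str) -> bool:
    """Accept iff calls and returns are well nested."""
    stack = []
    for ch in word:
        if ch in CALLS:
            stack.append(ch)
        elif ch in RETURNS:
            if not stack or stack.pop() != MATCHING[ch]:
                return False
        elif ch not in INTERNALS:
            return False  # symbol outside the alphabet
    return not stack  # everything opened was closed

assert accepts("(a<b>a)")
assert not accepts("(<)>")  # crossing brackets are rejected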

Comment by Morpheus on Morpheus's Shortform · 2024-07-01T10:28:58.114Z · LW · GW

I noticed some time ago that there is a big overlap between the lines of hope mentioned in Garrett Baker's post and lines of hope I already had. The remaining things he mentions are lines of hope that I at least can't antipredict, which is rare. It's currently the top plan/model of alignment that I would want to read a critique of (to destroy or strengthen my hopes). Since no one else seems to have written that critique yet, I might write a post myself (leave a comment if you'd be interested in reviewing a draft or have feedback on the points below).

  • If singular learning theory is roughly correct in explaining confusing phenomena about neural nets (double descent, grokking), then the things that are confusing about these architectures are pretty straightforward implications of probability theory (implying we might expect fewer differences in priors between humans and neural nets, because the biases are less architecture-dependent).
  • The idea that something like “reinforcing shards” can be stable if your internals are part of the context during training, even if you don't have perfect interpretability.
  • The idea that maybe the two ideas above can stack? If for both humans and AI the training data is the most crucial ingredient, then perhaps we can develop methods for comparing human brains and AI. If we get to the point of being able to do this in detail (big if; especially on the neuroscience side this seems possibly hopeless?), then we could get further guarantees that the AI we are training is not a “psychopath”.

Quite possibly further reflection or feedback would change my mind, so counterarguments/feedback would be appreciated. I am quite worried about motivated reasoning making this plan look better to me than it is, because it would give me something tractable to work on. I also wonder to what extent people planning to work on methods robust enough to survive a sharp left turn are pessimistic about lines of research like this only because of the capability externalities. I have a hard time evaluating the capability externalities of publishing research on plans like the above. If you are interested in writing a post about this or reading one, feel free to leave a comment.

Comment by Morpheus on gwern's Shortform · 2024-06-25T08:07:39.005Z · LW · GW

Aren't regular languages really well defined, as the weakest level in the Chomsky hierarchy?

Comment by Morpheus on jacquesthibs's Shortform · 2024-06-13T10:18:25.967Z · LW · GW

Would it change your mind if gpt-4 were able to do the grid tasks when I manually transcribed them into different tokens? I tried to have gpt-4 turn the image into a Python array, but it indeed has trouble performing just that task alone.

Comment by Morpheus on My AI Model Delta Compared To Christiano · 2024-06-12T19:19:06.334Z · LW · GW

That propagates into a huge difference in worldviews. Like, I walk around my house and look at all the random goods I’ve paid for - the keyboard and monitor I’m using right now, a stack of books, a tupperware, waterbottle, flip-flops, carpet, desk and chair, refrigerator, sink, etc. Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which were either nonobvious or nonsalient to me, but unambiguously make my life worse and would unambiguously have been worth-to-me the cost to make better.

Based on my one deep dive on pens a few years ago, this seems true. Maybe that would be too high-dimensional and unfocused a post, but maybe there should be a post on "the best X" for every common product people use every day? And then we'd somehow filter for people with actual expertise. For pens, you want to go with the recommendations of The Pen Addict.

Comment by Morpheus on jacquesthibs's Shortform · 2024-06-12T19:02:54.301Z · LW · GW
Comment by Morpheus on jacquesthibs's Shortform · 2024-06-12T18:14:10.605Z · LW · GW

For concreteness: in this task it fails to recognize that all of the cells get filled, not only the largest one. To me that gives the impression that the image is just not getting compressed well, and that the reasoning gpt-4 is doing is just fine.

Comment by Morpheus on jacquesthibs's Shortform · 2024-06-12T18:05:25.228Z · LW · GW

I think humans just have a better visual cortex, and I expect this benchmark, too, to just fall with scale.

Comment by Morpheus on jacquesthibs's Shortform · 2024-06-12T17:56:51.800Z · LW · GW

Looking at how gpt-4 did on the benchmark when I gave it some screenshots, the thing it failed at was the visual "pattern matching" (things completely solved by my system 1) rather than the abstract reasoning.

Comment by Morpheus on jacquesthibs's Shortform · 2024-06-12T17:37:47.064Z · LW · GW

Thanks for clarifying! I just tried a few simple ones by prompting gpt-4o and gpt-4, and they do an absolutely horrific job! Maybe actually good prompting could help solve it, but this is definitely already an update for me!

Comment by Morpheus on jacquesthibs's Shortform · 2024-06-12T16:43:36.271Z · LW · GW

LLMs have failed at ARC for the last 4 years because they are simply not intelligent and basically pattern-match and interpolate to whatever is within their training distribution. You can say, "Well, there's no difference between interpolation and extrapolation once you have a big enough model trained on enough data," but the point remains that LLMs fail at the Abstract Reasoning and Concepts benchmark precisely because they have never seen such examples.

No matter how 'smart' GPT-4 may be, it fails at simple ARC tasks that a human child can do. The child does not need to be fed thousands of ARC-like examples; it can just generalize and adapt to solve the novel problem.

I don't get it. I just looked at ARC, and it seemed obvious that gpt-4/gpt-4o can easily solve these problems by writing Python. Then I looked it up on Papers with Code, and it seems close to solved? Probably the ones remaining would be hard for children also. Did the benchmark leak into the training data, and is that why they don't count those results?

Comment by Morpheus on Arjun Panickssery's Shortform · 2024-06-12T09:23:12.464Z · LW · GW

Feel free to write a post if you find something worthwhile. I didn't know how likely the whole Biden-leaving-the-race thing was, so 5% seemed prudent. At those odds, even if I believe the FiveThirtyEight numbers, I'd rather leave my money in ETFs. I'd probably need something like a >>1.2 multiplier in expected value before I'd bother. Last year when I was betting on Augur, I was also heavily bitten by gas fees ($150 in transaction costs to get my money back, because ETH gas fees exploded), so it would be good to know whether this is a problem on Polymarket too.
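
For concreteness, the kind of comparison I mean, as a sketch with made-up numbers (none of these come from an actual market):

# Made-up numbers: does the bet beat just holding ETFs until resolution?
p_win = 0.30         # my probability that the position pays out
price = 0.25         # market price per share paying $1
fee_fraction = 0.02  # round-trip fees/gas as a fraction of the stake
months_locked = 5
etf_monthly = 0.005  # ~6%/year opportunity cost

bet_multiplier = p_win / price * (1 - fee_fraction)
etf_multiplier = (1 + etf_monthly) ** months_locked
print(f"bet: {bet_multiplier:.2f}x, ETF: {etf_multiplier:.2f}x")
# 0.3/0.25*0.98 ≈ 1.18x expected vs ~1.03x: not obviously worth the hassle.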

Comment by Morpheus on MichaelDickens's Shortform · 2024-06-11T20:57:43.897Z · LW · GW

Heuristics I heard: cutting away the moldy bits is okay for solid food (like cheese or carrots). Don't eat moldy bread, because of mycotoxins (googling this, I don't know why people mention bread in particular here). Gpt-4 gave me the same heuristics.

Comment by Morpheus on Morpheus's Shortform · 2024-06-09T13:54:16.034Z · LW · GW

Has anyone here investigated whether washing vegetables/fruits is worth it? Until recently I never washed my vegetables, because I classified that as a bullshit value claim.

Intuitively, if I am otherwise also not super hygienic (with things like washing my hands before eating), it doesn't seem that plausible to me that vegetables are where I am going to get infected from other people having touched the carrots etc. Being in quarantine during a pandemic might be an exception, but then again, I don't know if I am going to get rid of viruses by just lazily rinsing them with water in my sink. In general, washing vegetables is a trivial inconvenience I'd like to avoid, because it leads me to eat fewer vegetables/fruits (like raw carrots or apples).

Also, I assume a little pesticide residue and dirt are not that bad (which might be wrong).

Comment by Morpheus on CstineSublime's Shortform · 2024-06-07T18:58:40.613Z · LW · GW

Sounds like the right kind of question to ask, but without more concrete data on which questions your predictions were off by how much, it is hard to give any better advice than: if your gut judgement tends to be 20% off after considering all evidence, move the number 20% up.

Personally, my partner and I have a similar bias, but only about ourselves, so making predictions together on things like "Application for xyz will succeed" or "Y will read, be glad about, and reply to the message I send them" can be helpful in cases where there are large disagreements.

Comment by Morpheus on Rationality Cardinality · 2024-05-21T16:58:36.356Z · LW · GW

I've recently tried to play this again with @Towards_Keeperhood. We think it was still working a year ago. He would be happy to pay a $50 bounty for this to get fixed by reverting it to the previous version (or whatever happened there). If the code were public, that would also be helpful, because then I might get around to fixing it myself.

Comment by Morpheus on Rationality Cardinality · 2024-05-21T16:58:22.715Z · LW · GW
Comment by Morpheus on Benefitial habits/personal rules with very minimal tradeoffs? · 2024-05-14T14:06:47.184Z · LW · GW

Weekdays ^^

Comment by Morpheus on Benefitial habits/personal rules with very minimal tradeoffs? · 2024-05-13T21:51:20.810Z · LW · GW

“Sweets only on weekdays starting with S”. Depending on your lifestyle and preference for sweets, this can be easy to implement.

Comment by Morpheus on Thoughts on seed oil · 2024-05-08T18:11:18.085Z · LW · GW

Need to know.

Comment by Morpheus on Spatial attention as a “tell” for empathetic simulation? · 2024-04-29T15:13:16.753Z · LW · GW

Are there any disorders impairing spatial attention that you think would also impair empathy? I asked GPT-4 for disorders of spatial attention, and it gave me hemispatial neglect and Bálint's syndrome. If things were really convenient, then with hemispatial neglect I can imagine that people always think of some of their thoughts and feelings as being on the “left” side. Then they would have difficulty having those feelings once they have trouble attending to anything on the left side. For a cliché example, someone associating their love with their heart on the left side. (Maybe that's a bad example. Perhaps better would be something where someone has trouble telling whether something was their own or another person's thought or feeling.)

Comment by Morpheus on Superposition is not "just" neuron polysemanticity · 2024-04-28T12:57:06.720Z · LW · GW

Wouldn't “Neuron Polysemanticity is not 'just' Superposition” be a more fitting title?

Comment by Morpheus on Job Search Advice · 2024-04-23T10:43:24.811Z · LW · GW

A piece of advice I frequently hear: always make sure you call somebody in the company you're applying for.

Is this still up-to-date advice? Or is messaging someone over LinkedIn or similar more appropriate? Mostly asking because I got the impression that the internet has changed the norms such that no one does phone calls anymore.

Comment by Morpheus on A couple productivity tips for overthinkers · 2024-04-23T08:41:03.397Z · LW · GW
  1. If you find that you’re reluctant to delete computer files / emails, don’t empty the trash

In Gmail, I like to scan the email headers and then bulk-select and archive them (* a e, thanks to vim shortcuts). After 5 years of doing this, I still haven't run out of the free storage in Gmail. I already let Gmail sort the emails into "Primary", "Promotions", "Updates", etc. Usually the only important things are in "Primary", plus 1 or 2 in "Updates".

Comment by Morpheus on Morpheus's Shortform · 2024-04-20T12:03:19.745Z · LW · GW

Can anyone here recommend particular tools to practice grammar? Or do you have strong opinions on the best workflow/tool to correct grammar on the fly? I already know Grammarly and LanguageTool, but Grammarly seems steep at $30 per month when I don't know if it is any good. I have tried GPT-4 before, but the main problems I have there are that it is too slow and changes my sentences more than I would like (I tried to make it do that less through prompting, which did not help much).

I notice that feeling unconfident about my grammar/punctuation leads me to write less online; especially applying for jobs or fellowships feels more icky because of it. That seems like an avoidable failure mode.

Ideally, I would like something like the German Orthografietrainer (it was created to teach middle- and high-school children spelling and grammar). It teaches you, on a sentence-by-sentence basis, where to put the commas and why, by explaining the sentence structure (illustrated through additional examples). Because it trains you on particularly tricky sentences, the training is effective, and I rapidly became better at punctuation than my parents within ~3 hours. Is there a similar tool for English that I have never heard of?

While writing this, I noticed that I did not have the free version of Grammarly enabled anymore, so I tried it on this text. One trick I noticed is that it lists what kinds of errors you are making across the whole text, so it is easy to infer which particular mistake I made in which spot and then correct it myself. Also, Grammarly did not catch a few simple spelling and punctuation mistakes that LanguageTool caught (like “anymore” or the comma at the start of this sentence). At the end, I also tried ProWritingAid, which found additional issues.

Comment by Morpheus on Is LLM Translation Without Rosetta Stone possible? · 2024-04-11T09:27:36.672Z · LW · GW

Trying to learn a language from scratch, just from text, is a fun exercise for humans too. I recently tried this with Hindi after I had a disagreement with someone about the exact question of this post. I didn't get very far in 2 hours, though.

Comment by Morpheus on Quinn's Shortform · 2024-04-06T20:50:58.135Z · LW · GW

Tridactyl is amazing. You can disable the mode on specific websites by running the blacklistadd command. If you have configured that already, these settings can also be saved in your config file. Here's my config (though be careful before copying it: it has fixamo_quiet enabled, a command that almost got Tridactyl removed when it was enabled by default. You should read what it does before you enable it.)

Here are my ignore settings:

autocmd DocStart https://youtube.com mode ignore
autocmd DocStart https://todoist.com mode ignore
autocmd DocStart mail.google.com mode ignore
autocmd DocStart calendar.google.com mode ignore
autocmd DocStart keyma.sh mode ignore
autocmd DocStart monkeytype.com mode ignore
autocmd DocStart https://www.youtube.com mode ignore
autocmd DocStart https://ilias.studium.kit.edu/ mode ignore
autocmd DocStart localhost:888 mode ignore
autocmd DocStart getguestimate.com mode ignore
autocmd DocStart localhost:8888 mode ignore
Comment by Morpheus on The Best Tacit Knowledge Videos on Every Subject · 2024-03-31T18:50:20.864Z · LW · GW

Juggling: Anthony Gatto's juggling routine from 2000. Anthony Gatto holds several juggling world records. This routine is infamous in the juggling world (here's a decent juggler commenting on it), as is the fact that he gave up juggling to work with concrete instead (because it pays the bills). Here's more context on Gatto and his routine (the guy picking up the balls for him in the video is his father, for example).

Comment by Morpheus on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-23T21:05:18.419Z · LW · GW

Agreed. Especially the “electoral college is good, actually” part is where I started laughing. If you don't want tyranny of the majority, perhaps not crippling your system with first-past-the-post voting would be a first step toward a more sane system.

Comment by Morpheus on On green · 2024-03-23T20:15:20.541Z · LW · GW

Absolutely love this essay! The view of green from the perspective of non-green thoughts really resonated with things I have thought in the past and made me notice how I have been confused by green. Helpful for AGI or not, this is giving me a bunch of fresh thoughts about problems/confusing areas in my own life, so thanks!

Comment by Morpheus on Natural Latents: The Concepts · 2024-03-21T22:44:36.185Z · LW · GW

A quick intuitive check for whether something is a natural latent over some parts of a system consists of two questions:

  • Are the parts (approximately) independent given the candidate natural latent?

I first had some trouble checking this condition intuitively, and I might still not have it right. One of the main things that confused me at first is that if I want to reason about natural latents for “a” dog, I need to think about a group of dogs, even though there are also natural latents for an individual dog (like fur color being a natural latent across the dog's patches of fur). Say I check the independence condition for a set of animals, each either a cat or a dog. If I look at a single animal's shoulder height in those sorted clusters, it tells me which of the two clusters it's in; but once I have updated on that information, my guesses for the other dogs' heights will not improve any further.
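
Here is the toy version of that check as I understand it (my own sketch; the species, heights, and all numbers are made up):

import numpy as np

rng = np.random.default_rng(0)
n_pairs = 10_000
label = rng.integers(0, 2, size=n_pairs)  # 0 = cat, 1 = dog (the latent)

def draw_height(lbl):
    mean = np.where(lbl == 1, 55.0, 25.0)  # made-up shoulder heights, cm
    sd = np.where(lbl == 1, 5.0, 3.0)
    return rng.normal(mean, sd)

h1, h2 = draw_height(label), draw_height(label)  # two animals per cluster

# Unconditionally, the heights are strongly correlated (each reveals the latent):
print(np.corrcoef(h1, h2)[0, 1])              # ~0.9
# Given the latent, the residual correlation vanishes:
dogs = label == 1
print(np.corrcoef(h1[dogs], h2[dogs])[0, 1])  # ~0.0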

An important example of something that is not a natural latent is the empirical mean of a fat-tailed distribution at real-world sample sizes, whereas it is one for thin-tailed distributions. This doesn't mean fat-tailed distributions don't have natural latents; this fact is what Nassim Taleb keeps harping on. For Pareto distributions (think: pandemics, earthquakes, wealth), one still has natural latents like the tail index (estimated from plotting the data on a log-log plot by dilettantes like me, and more sophisticatedly by real professionals).
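
And a sketch of the fat-tail point (again my own illustration; the tail index of 1.1 is an assumption):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gauss = rng.normal(1.0, 1.0, size=n)
pareto = 1 + rng.pareto(1.1, size=n)  # classical Pareto, tail index ~1.1

# The empirical mean has stabilized for the Gaussian but not for the Pareto:
for name, x in [("gaussian", gauss), ("pareto", pareto)]:
    print(name, round(x[:1_000].mean(), 2), round(x.mean(), 2))

# The tail index, by contrast, is still estimable (crude Hill estimator):
k = 500
top = np.sort(pareto)[-k:]  # largest k observations
print("tail index ~", round(1 / np.mean(np.log(top / top[0])), 2))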

Comment by Morpheus on Morpheus's Shortform · 2024-03-21T14:27:36.564Z · LW · GW

If I had more time I would have written a shorter letter.

TLDR: I looked into how much it would take to fine-tune gpt-4 to do Fermi estimates better. If you liked the post/paper on fine-tuning language models to make predictions, you might like reading this. I evaluated gpt-4 on the first dataset I found, but gpt-4 was already making better Fermi estimates than the examples in the dataset, so I stopped there (my code).

First problem I encountered: there is no public access to fine-tuning gpt-4 so far. Okay, we might as well just do gpt-3.5, I guess.

First, I found this Fermi estimate dataset. (While doing this, I was thinking I should perhaps search more widely for what kinds of AI benchmarks exist, since a dataset evaluating a similar capability is probably already out there; I just don't know its name.)

Next, I looked at this paper, where people used, among others, gpt-3.5 and gpt-4 on this benchmark. Clearly these people weren't even trying, though, because gpt-4 does worse than gpt-3.5. One of the main issues I saw was that they were trying to make the LLM output the answer as a program in the domain-specific language used in that dataset. They couldn't even get the LLM to output valid programs more than 60% of the time. (Their metric measures, on a log scale, whether the answer by the LLM is within 3 orders of magnitude of the real answer; 1 is best, 0 means more than 3 orders of magnitude off: fp-score(x) = max(0, 1 - (1/3) * |log10(prediction/answer)|).)
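
In code, my reading of their metric looks like this (a sketch; the function name is mine):

from math import log10

def fp_score(prediction: float, answer: float) -> float:
    """1.0 for an exact match, decaying to 0.0 at 3 orders of magnitude off."""
    return max(0.0, 1.0 - abs(log10(prediction / answer)) / 3.0)

print(fp_score(1e6, 1e6))  # 1.0
print(fp_score(1e7, 1e6))  # ~0.67 (one order of magnitude off)
print(fp_score(2e9, 1e6))  # 0.0 (more than three orders off)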


My conjecture was that just using Python instead should give better results. (This turned out to be true.) I get a mean score of ~0.57 on 100 sample problems, so results with gpt-4-turbo as good as they only get when they first provide “context” by giving the LLM the values of the key variables needed to compute the answer (why would this task even still be hard at that point?).

When gpt-4 turned out to get a worse fp-score than gpt-4-turbo on my 10 samples, I got suspicious, and after looking at the samples where gpt-4 got a bad score, it was clear this was mostly to blame on the bad quality of the dataset. Two reference answers were flat-out confused/not using the correct variables, while gpt-4 was answering correctly. Once, the question didn't make clear what unit to use. On 2 of the samples, gpt-4 gave a better answer: once by using a better approach (using geometry, instead of wrong figures for how much energy the earth gets from the sun, to determine the fraction of the sun's energy the earth receives), and once by having better input estimates, like how many car miles are driven in total in the US.

So on this dataset, gpt-4 already seems to be at the point of data saturation. I was actually quite impressed by how well it was doing. When I had tried using gpt-4 for this task before, I had always felt like it was doing quite badly. One guess is that this is because when I ask gpt-4 for an estimate, it is often a practical question, which is actually harder than these artificial ones. In addition, the reason I ask gpt-4 is usually that the question is hard and I expect to need a lot of cognitive labor to do it myself.

Another data point here was the “Thinking Physics” exercises, which I tried with some of my friends. On that task, gpt-4 was better than the people who were bad at it, but worse than the people who were good at it and given 5–10 minutes of thinking time (though I did not evaluate this rigorously). GPT-4 is probably better than most humans at Fermi estimates given 10 minutes of time, especially in domains one is unfamiliar with, since it has so much more breadth.

I would be interested to see what one could get out of actually making a high-quality dataset by taking Fermi estimates from people whose work in this area I rate highly.

Comment by Morpheus on D0TheMath's Shortform · 2024-03-20T11:49:08.074Z · LW · GW

Not exactly what you were looking for, but I recently noticed that there were a bunch of John Wentworth's posts I had been missing out on, written over the past 6 years. So if you get a lot out of them too, I recommend just sorting by 'old'. I really liked don't get distracted by the boilerplate (the first example made something click about math for me that hadn't clicked before, and it would have helped me engage with some “boilerplate” more productively). I also liked constraints and slackness, but I haven't gone beyond the first exercise yet. There are also more technical posts that I haven't had time to dig into yet.

bhauth doesn't have as long a track record, but I got some interesting ideas from his blog that aren't on his LessWrong account. I really liked proposed future economies and the legibility bottleneck.

Comment by Morpheus on Useful Vices for Wicked Problems · 2024-03-07T04:49:05.543Z · LW · GW

This post warms my heart. Thank you.

Comment by Morpheus on Alex_Altair's Shortform · 2024-03-07T01:19:54.637Z · LW · GW

The PDF linked by @CstineSublime is definitely toward the textbook end. I've started reading it, and it has been an excellent read so far. I will probably write a review later.

Comment by Morpheus on Morpheus's Shortform · 2024-03-06T06:02:37.254Z · LW · GW

While there is currently a lot of attention on evaluating language models, it puzzles me that no one seems to be independently assessing the quality of different search engines and recommender systems. Shouldn't this be easy to do? The only related thing I could find is this Russian site (it might be propaganda from Yandex, as Yandex is listed as the top-quality site?). Taking their “overall search quality” rating at face value does seem to support the popular hypothesis that Google's search quality has slightly deteriorated over the last 10 years (although compared to 2009-2012, quality has been basically the same according to this measure). Overall search result quality.

The gpt-4-translated version of their blog states that they gave up actively maintaining this project in 2014, because search engine quality had become reasonable according to them:

For the first time in the history of the project, we have decided to shut down one of the analyzers: SEO pressing as a phenomenon has essentially become a thing of the past, and the results of the analyzer have ceased to be interesting.

Despite the fact that search engine optimization as an industry continues to thrive, search engine developers have made significant progress in combating the clutter of search results with specially promoted commercial results. The progress of search engines is evident to the naked eye, including in the graph of our analyzer over the entire history of measurements:

[Graph: SEO pressing analyzer, share of commercial results over the full history of measurements]

The result of the analyzer is the share of commercial sites in the search results for queries that do not have a clearly commercial meaning; when there are too many such sites in the search results, it is called susceptibility to SEO pressing. It is easy to see that a few years ago, more than half (sometimes significantly more than half) of the search results from all leading search engines consisted of sites offering specific goods or services. This is, of course, a lot: a query can have different meanings, and the search results should cater to as many of them as possible. At the same time, a level of 2-3 such sites seems decent, since a user who queries "Thailand" might easily be interested in tours to that country, and one who queries "power station" might be interested in power stations for a country home.

If we are worried that current recommender systems are already doing damage and expect things to get worse in the future, it might be good to actively monitor this so we don't get boiled like the proverbial frog.

Comment by Morpheus on Morpheus's Shortform · 2024-03-04T04:56:36.394Z · LW · GW

Metaculus recently updated the way they score user predictions. For anyone who used to be active on Metaculus and hasn't logged on for a while, I recommend checking out your peer and baseline accuracy scores in the past years. With the new scoring system, you can finally determine whether your predictions were any good compared to the community median. This makes me actually consider using it again instead of Manifold.

By the way, if you are new to forecasting and want to get better, I recommend pastcasting and/or calibration games instead, because of the faster feedback loops: instead of within weeks, you'll know within 1–2 hours whether you tend to be overconfident or underconfident.
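
The feedback you get is essentially this table, computed after an hour of pastcasting rather than weeks of waiting (a minimal sketch with made-up forecasts):

# Made-up (probability, outcome) pairs from a pastcasting session.
forecasts = [(0.9, True), (0.9, True), (0.9, False), (0.9, False),
             (0.7, True), (0.7, True), (0.7, False), (0.6, True)]

bins: dict[float, list[bool]] = {}
for p, outcome in forecasts:
    bins.setdefault(p, []).append(outcome)

for p in sorted(bins):
    outcomes = bins[p]
    hit_rate = sum(outcomes) / len(outcomes)
    tag = "overconfident" if hit_rate < p else "fine (or underconfident)"
    print(f"said {p:.0%}, got {hit_rate:.0%} over {len(outcomes)} -> {tag}")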

Comment by Morpheus on Approaching Human-Level Forecasting with Language Models · 2024-03-01T09:42:30.355Z · LW · GW

Something like this sounds really useful just for my personal use. Is someone fine-tuning (or has someone already fine-tuned) a system to be generally numerate and good at Fermi estimates? My amateurish prompting of gpt-4 gave pretty mediocre results for that purpose.

Comment by Morpheus on ask me about technology · 2024-02-29T02:50:45.536Z · LW · GW

Do you have any views on the most promising avenues for human intelligence enhancement through biology? I'd be most interested in approaches that would give us (humanity) better odds in worlds where AI takes off in the next 1–15 years.

Comment by Morpheus on CFAR Takeaways: Andrew Critch · 2024-02-28T01:03:52.215Z · LW · GW

Rationality seems to be missing an entire curriculum on "Eros" or True Desire.

I got this curriculum from other trainings, though. There are places where it's hugely emphasized and well-taught.

What are these places?

Comment by Morpheus on How I internalized my achievements to better deal with negative feelings · 2024-02-28T00:00:39.978Z · LW · GW

This information and introspective techniques like Focusing helped me discover that these negative feelings came from some unmet need to feel worthwhile and recognized, but the problem was that I heavily tied my self-worth to the amount of progress I made.

Oops! And thanks! Somehow this articulates the uneasy relationship with impact that I had noticed in myself, in a way that makes me feel like I can finally address it.

Comment by Morpheus on shoes with springs · 2024-02-24T11:26:16.881Z · LW · GW

Reminds me of this

Comment by Morpheus on I'd also take $7 trillion · 2024-02-24T09:56:05.208Z · LW · GW

AI is something I've thought about a lot, but I think I've already posted everything about that that I want to, and people didn't seem to appreciate this that much.

Thanks for linking it! I think one reason I bounced off this article the first time was that, based on the title, I had pattern-matched it to the abundant posts on this platform that mostly distill existing arguments.

Comment by Morpheus on CFAR Takeaways: Andrew Critch · 2024-02-15T08:29:08.956Z · LW · GW

Causal Diagramming

  1. or, some kind of external media, other than writing

Does anyone know a nice way to drill this skill? I was just reading one of Steven Byrnes's posts, which made me notice that he is good at this and that I currently lack this skill. It also reminds me of Thinking in Systems, which I read, found cool, and then mostly went about my life without really applying. I think I have a pretty good intuitive understanding of statistical causal relationships and have thought a lot about confounders, but I've never felt compelled to whip out a diagram.

Comment by Morpheus on I played the AI box game as the Gatekeeper — and lost · 2024-02-13T23:15:19.544Z · LW · GW

I'd also bet $50 as a gatekeeper. I won this game as a gatekeeper before and now need someone to put my ego in its place. I'd prefer to play against someone who has won as the AI before.

This post prompted me to wonder to what degree there might be publication bias going on, in that people don't report when they "predictably" won as the gatekeeper (as I did).

Comment by Morpheus on I played the AI box game as the Gatekeeper — and lost · 2024-02-13T23:14:38.011Z · LW · GW