Posts

Agentic Language Model Memes 2020-08-01T18:03:30.844Z · score: 11 (5 votes)
How well can the GPT architecture solve the parity task? 2020-07-11T19:02:07.730Z · score: 18 (5 votes)
AvE: Assistance via Empowerment 2020-06-30T22:07:50.220Z · score: 12 (2 votes)
The Economic Consequences of Noise Traders 2020-06-14T17:14:59.343Z · score: 45 (14 votes)
Facebook AI: A state-of-the-art open source chatbot 2020-04-29T17:21:25.050Z · score: 9 (3 votes)
Are there any naturally occurring heat pumps? 2020-04-13T05:24:16.572Z · score: 14 (8 votes)
Can we use Variolation to deal with the Coronavirus? 2020-03-18T14:40:35.090Z · score: 11 (5 votes)
FactorialCode's Shortform 2019-07-30T22:53:24.631Z · score: 1 (1 votes)

Comments

Comment by factorialcode on Where is human level on text prediction? (GPTs task) · 2020-09-21T08:55:50.665Z · score: 3 (2 votes) · LW · GW

Just use bleeding edge tech to analyze ancient knowledge from the god of information theory himself.

This paper seems to be a good summary and puts a lower bound on the entropy of human models of English somewhere between 0.65 and 1.10 BPC. If I had to guess, the real number is probably closer to 0.8-1.0 BPC, as the mentioned paper was able to pull up the lower bound for Hebrew by about 0.2 BPC. Assuming that regular English averages 4* characters per token, GPT-3 clocks in at 1.73/ln(2)/4 = 0.62 BPC. This is lower than the lower bound mentioned in the paper.
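
For concreteness, here's that arithmetic as a minimal Python snippet (the 1.73 nats/token loss and the ~4 characters per token are the figures quoted in this comment, not canonical values):

```python
import math

loss_nats_per_token = 1.73   # GPT-3 loss figure quoted above, in nats per BPE token
chars_per_token = 4          # rough characters-per-token estimate for ordinary English (see the footnote below)

bits_per_token = loss_nats_per_token / math.log(2)   # convert nats to bits
bits_per_char = bits_per_token / chars_per_token
print(f"{bits_per_char:.2f} BPC")                    # ~0.62 BPC
```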

So, am I right in thinking that if someone took random internet text and fed it to me word by word and asked me to predict the next word, I'd do about as well as GPT-2 and significantly worse than GPT-3?

That would also be my guess. In terms of data entropy, I think GPT-3 is probably already well into the superhuman realm.

I suspect this is mainly because GPT-3 is much better at modelling "high frequency" patterns and features in text that account for a lot of the entropy, but that humans ignore because they have low mutual information with the things humans care about. OTOH, GPT-3 also has extensive knowledge of pretty much everything, so it might be leveraging that and other things to make better predictions than you.

This is similar to what we see with autoregressive image and audio models, where high frequency features are fairly well modelled, but you need a really strong model to also get the low frequency stuff right.

*(ask Gwern for details, this is the number I got in my own experiments with the tokenizer)

Comment by factorialcode on Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Battle of the Sexes · 2020-09-15T14:49:11.270Z · score: 3 (2 votes) · LW · GW

I'm OOTL, can someone send me a couple links that explain the game theory that's being referenced when talking about a "battle of the sexes"? I have a vague intuition from the name alone, but I feel this is referencing a post I haven't read.

Edit: https://en.wikipedia.org/wiki/Battle_of_the_sexes_(game_theory)

Comment by factorialcode on How much can surgical masks help with wildfire smoke? · 2020-08-21T16:09:19.721Z · score: 12 (5 votes) · LW · GW

I'm gonna go with barely, if at all. When you wear a surgical mask and you breathe in, a lot of air flows in from the edges without actually passing through the mask, so the mask doesn't get much opportunity to filter the air. At least with N95 and N99 masks, you have a seal around your face, and this forces the air through the filter. You're probably better off wearing a wet bandana or towel that's been tied in such a way as to seal around your face, but that might make it hard to breathe.

I found this, which suggests that they're generally ineffective. https://www.cdph.ca.gov/Programs/EPO/Pages/Wildfire Pages/N95-Respirators-FAQs.aspx

Comment by factorialcode on Money creation and debt · 2020-08-13T06:14:02.028Z · score: 2 (2 votes) · LW · GW

Yeah, I'll second the caution against drawing any conclusions from this, especially because this is macroeconomics.

Comment by factorialcode on Money creation and debt · 2020-08-12T22:00:44.591Z · score: 13 (6 votes) · LW · GW

https://en.wikipedia.org/wiki/Sectoral_balances

It is my understanding that this is broadly correct. It is also my understanding that this is not common knowledge.

Comment by factorialcode on Generalizing the Power-Seeking Theorems · 2020-07-28T18:44:24.303Z · score: 3 (2 votes) · LW · GW

One hypothesis I have is that even in the situation where there is no goal distribution and the agent has a single goal, subjective uncertainty makes seeking powerful states instrumentally convergent. The motivating real world analogy being that you are better able to deal with unforeseen circumstances when you have more money.

Comment by factorialcode on Open & Welcome Thread - July 2020 · 2020-07-25T06:20:42.958Z · score: 5 (3 votes) · LW · GW

I've gone through a similar phase. In my experience you eventually come to terms with those risks and they stop bothering you. That being said, mitigating x and s-risks has become one of my top priorities. I now spend a great deal of my own time and resources on the task.

I also found learning to meditate helps with general anxiety and accelerates the process of coming to terms with the possibility of terrible outcomes.

Comment by factorialcode on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T00:38:26.770Z · score: 3 (2 votes) · LW · GW

The way I was envisioning it is that if you had some easily identifiable concept in one model, e.g. a latent dimension/feature that corresponds to the log odds of something being in a picture, you would train a "classifier" on the new model to match the behaviour of that feature when both are given data from the original generative model. Theoretically any loss function will do as long as the optimum corresponds to the situation where your "classifier" behaves exactly like the original feature in the old model when both of them are looking at the same data.
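
As a rough illustration of what I mean, here's a minimal sketch of that training loop. Everything here is a toy stand-in (the linear maps play the roles of the old model's feature read-out, the new model's internal state, and the data sampler); none of it is a real model or API:

```python
import torch
import torch.nn as nn

DATA_DIM, HIDDEN_DIM = 32, 64
old_feature = nn.Linear(DATA_DIM, 1)                   # pretend: old model's concept (e.g. log odds)
new_representation = nn.Linear(DATA_DIM, HIDDEN_DIM)   # pretend: new model's internal state
for p in list(old_feature.parameters()) + list(new_representation.parameters()):
    p.requires_grad_(False)                            # both models are frozen

probe = nn.Linear(HIDDEN_DIM, 1)                       # the "classifier" pointing at the concept
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(128, DATA_DIM)                     # data from the original generative model
    target = old_feature(x)                            # how the original feature behaves on x
    pred = probe(new_representation(x))                # how the probe behaves on the same data
    loss = nn.functional.mse_loss(pred, target)        # any loss whose optimum matches behaviour works
    opt.zero_grad(); loss.backward(); opt.step()
```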

In practice though, we're compute bound and nothing is perfect and so you need to answer other questions to determine the objective. Most of them will be related to why you need to be able to point at the original concept of interest in the first place. The acceptability of misclassifying any given input or world-state as being or not being an example of the category of interest is going to depend heavily on things like the cost of false positives/negatives and exactly which situations get misclassified by the model.

The thing about it working or not working is a good point though; knowing that we've successfully mapped a concept would require a degree of testing, and possibly human judgement. You could do this by looking for situations where the new and old concepts don't line up, and seeing what inputs/world states those correspond to, possibly interpreted through the old model with more human-understandable concepts.

I will admit upon further reflection that the process I'm describing is hacky, but I'm relatively confident that the general idea would be a good approach to cross-model ontology identification.

Comment by factorialcode on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T22:52:22.304Z · score: 1 (1 votes) · LW · GW

I think you can loosen (b) quite a bit if you task a separate model with "delineating" the concept in the new network. The procedure does effectively give you access to infinite data, so the boundary for the old concept in the new model can be as complicated as your compute budget allows. Up to and including identifying high level concepts in low level physics simulations.

Comment by factorialcode on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T21:31:35.025Z · score: 5 (3 votes) · LW · GW

I think the eventual solution here (and a major technical problem of alignment) is to take an internal notion learned by one model (i.e. found via introspection tools), back out a universal representation of the real-world pattern it represents, then match that real-world pattern against the internals of a different model in order to find the "corresponding" internal notion.

Can't you just run the model in a generative mode associated with that internal notion, then feed that output as a set of observations into your new model and see what lights up in its mind? This should work as long as both models predict the same input modality. I could see this working pretty well for matching up concepts between the latent spaces of different VAEs. Doing this might be a bit less obvious in the case of autoregressive models, but certainly not impossible.
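
A minimal sketch of what that might look like for two VAE-style models (toy linear stand-ins for model A's decoder and model B's encoder; nothing here is a real trained model):

```python
import torch
import torch.nn as nn

DATA_DIM, LATENT = 32, 8
dec_a = nn.Linear(LATENT, DATA_DIM)   # stand-in for model A's decoder
enc_b = nn.Linear(DATA_DIM, LATENT)   # stand-in for model B's encoder

concept_dim = 3                       # the internal notion of interest in model A

with torch.no_grad():
    # "Run A in a generative mode" associated with that notion...
    z = torch.randn(1024, LATENT)
    z[:, concept_dim] += 5.0
    observations = dec_a(z)
    # ...then feed the output into B and see what lights up.
    z_b = enc_b(observations)
    baseline = enc_b(dec_a(torch.randn(1024, LATENT)))

response = (z_b.mean(0) - baseline.mean(0)).abs()
print("best matching dimension in B:", response.argmax().item())
```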

Comment by factorialcode on $1000 bounty for OpenAI to show whether GPT3 was "deliberately" pretending to be stupider than it is · 2020-07-22T06:06:27.129Z · score: 16 (7 votes) · LW · GW

I think this is pretty straightforward to test. GPT-3 gives joint probabilities of string continuations given context strings.

Step 1: Give it two prompts, one suggesting that it is playing the role of a smart person, and one where it is playing the role of a dumb person.

Step 2: Ask the "person" a question that demonstrates that person's intelligence. (something like a math problem or otherwise)

Step 3: Write continuations where the person answers correctly and incorrectly.

Step 4: Compare the relative probabilities GPT-3 assigns to each continuation given the prompts and questions.

If GPT-3 is sandbagging itself, it will assign a notably higher probability to the correct answer when conditioned on the smart person prompt than when conditioned on the dumb person prompt. If it's not, it will give similar probabilities in both cases.

Step 5: Repeat the experiment with problems of increasing difficulty and plot the relative probability gap. This will show the limits of GPT-3's reflexive intelligence. (I say reflexive because it can be instructed to solve problems it otherwise couldn't with the amount of serial computation at its disposal by carrying out an algorithm as part of its output, as is the case with parity)

This is an easy $1000 for anyone who has access to the beta API; a rough sketch of the comparison logic is below.
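
Here's a hedged sketch of steps 1-5, assuming only some way to score continuations. `continuation_logprob` is a placeholder for however you query the API for the total log probability of a continuation given a prompt (e.g. by summing per-token logprobs); it is not a real library function, and the prompts and question are purely illustrative:

```python
def continuation_logprob(prompt: str, continuation: str) -> float:
    """Placeholder: return log P(continuation | prompt) from the API."""
    raise NotImplementedError("fill in with your beta API access")

# Step 1: two prompts, one framing a smart person and one a dumb person.
smart = "The following is an interview with a brilliant mathematician.\nQ: What is 17 * 23?\nA:"
dumb = "The following is an interview with someone who is terrible at math.\nQ: What is 17 * 23?\nA:"

# Steps 2-3: the same question with a correct and an incorrect continuation.
right, wrong = " 391", " 361"

# Step 4: compare the relative probabilities under each prompt.
gap_smart = continuation_logprob(smart, right) - continuation_logprob(smart, wrong)
gap_dumb = continuation_logprob(dumb, right) - continuation_logprob(dumb, wrong)

# Sandbagging would show up as gap_smart being notably larger than gap_dumb.
# Step 5: repeat over problems of increasing difficulty and plot the gaps.
print(gap_smart - gap_dumb)
```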

Comment by factorialcode on Collection of GPT-3 results · 2020-07-19T04:24:19.149Z · score: 10 (4 votes) · LW · GW

Hypothesis: Unlike the language models before it and ignoring context length issues, GPT-3's primary limitation is that its output mirrors the distribution it was trained on. Without further intervention, it will write things that are no more coherent than the average person could put together. By conditioning it on output from smart people, GPT-3 can be switched into a mode where it outputs smart text.

Comment by factorialcode on Collection of GPT-3 results · 2020-07-19T02:11:12.976Z · score: 5 (4 votes) · LW · GW

According to Gwern, it fails the Parity Task.

Comment by factorialcode on The New Frontpage Design & Opening Tag Creation! · 2020-07-09T18:00:58.395Z · score: 3 (2 votes) · LW · GW

Huh.

I did not believe you, so I went and checked the Internet Archive. Sure enough, all the old posts with a ToC are off center. I did not notice until now.

Comment by factorialcode on AI Research Considerations for Human Existential Safety (ARCHES) · 2020-07-09T16:59:18.789Z · score: 1 (1 votes) · LW · GW

Nitpick: is there a reason why the margins are so large?

Comment by factorialcode on The New Frontpage Design & Opening Tag Creation! · 2020-07-09T16:37:54.235Z · score: 1 (1 votes) · LW · GW

The content on the front page is noticeably off center to the right on 1440x900 monitors.

https://imgur.com/VhPQsv6

Edit: The content is noticeably off center to the right in general.

https://imgur.com/015ewvd

Comment by factorialcode on What should we do about network-effect monopolies? · 2020-07-07T16:35:44.879Z · score: 1 (1 votes) · LW · GW

On the standardization and interoperability side of things, there's been an effort to develop decentralized social media platforms and protocols, most notably the various platforms of the Fediverse. Together with open-source software, this lets people build large networks that keep the value of network effects while removing monopoly power. I really like the idea of these platforms, but due to the network monopoly of existing social media platforms I think they'll have great difficulty gaining traction.

Comment by factorialcode on [Crowdfunding] LessWrong podcast · 2020-07-06T06:19:02.744Z · score: 5 (4 votes) · LW · GW

Yeah, that's pretty pricy. Google is telling me that they can do 1 million characters/month for free using a wavenet. That might be good enough.

Comment by factorialcode on [Crowdfunding] LessWrong podcast · 2020-07-05T15:35:30.813Z · score: 1 (1 votes) · LW · GW

What's the going rate for audio recordings on Fiverr?

Comment by factorialcode on FactorialCode's Shortform · 2020-06-23T19:18:13.138Z · score: 3 (2 votes) · LW · GW

With the ongoing drama that is currently taking place, I'm worried that the rationalist community will find itself inadvertently caught up in the culture war. This might cause a large influx of new users who are more interested in debating politics than anything else on LW.

It might be a good idea to put a temporary moratorium/barriers on new signups to the site in the event that things become particularly heated.

Comment by factorialcode on SlateStarCodex deleted because NYT wants to dox Scott · 2020-06-23T16:34:19.058Z · score: 5 (4 votes) · LW · GW

Organizations, and entire nations for that matter, can absolutely be made to "feel fear". The retaliation just needs to be sufficiently expensive for the organization. Afterwards, it'll factor in the costs of that retaliation when deciding how to act. If the cost is large enough, it won't do things that will trigger retaliation.

Comment by factorialcode on Image GPT · 2020-06-21T18:05:50.613Z · score: 3 (2 votes) · LW · GW

There is no guarantee that it is learning particularly useful representations just because it predicts pixel-by-pixel well which may be distributed throughout the GPT,

Personally, I felt that that wasn't really surprising either. Remember that this whole deep learning thing started with exactly what OpenAI just did: train a generative model of the data, and then fine-tune it to the relevant task.

However, I'll admit that the fact that there's an optimal layer to tap into, and that they showed this trick works specifically with autoregressive transformer models, is novel to my knowledge.

Comment by factorialcode on Image GPT · 2020-06-19T04:47:04.268Z · score: 6 (4 votes) · LW · GW

This isn't news; we've known that sequence predictors can model images for almost a decade now, and OpenAI did the same thing last year with less compute, but no one noticed.

Comment by factorialcode on Creating better infrastructure for controversial discourse · 2020-06-17T02:45:30.834Z · score: 17 (7 votes) · LW · GW

I'll quote myself:

Many of the users on LW have their real names and reputations attached to this website. If LW were to come under this kind of loosely coordinated memetic attack, many people would find themselves harassed and their reputations and careers could easily be put in danger. I don't want to sound overly dramatic, but the entire truth seeking and AI safety project could be hampered by association.

That's why even though I remain anonymous, I think it's best if I refrain from discussing these topics at anything except the meta level on LW. Even having this discussion strikes me as risky. That doesn't mean that we shouldn't discuss these topics at all. But it needs to be on a place like r/TheMotte where there is no attack vector. This includes using different usernames so we can't be traced back here. Even then, the reddit AEO and the admins are technically weak points.

Comment by factorialcode on Simulacra Levels and their Interactions · 2020-06-15T21:17:56.100Z · score: 2 (2 votes) · LW · GW

I'm going to second the request for a title change and propose:

Simulacra levels and their interactions, with applications to COVID-19

Comment by factorialcode on Self-Predicting Markets · 2020-06-11T18:50:06.178Z · score: 1 (1 votes) · LW · GW

I'm getting 404 on that link. I think you need to get rid of the period.

Comment by factorialcode on Self-Predicting Markets · 2020-06-11T06:46:33.003Z · score: 1 (1 votes) · LW · GW

Allow me to present an alternative/additional hypothesis:

https://www.reddit.com/r/wallstreetbets/comments/h0daw4/this_is_the_most_autistic_thing_ive_seen_done_by/

The market is only as smart as the people who participate in it. In the long run, the smarter agents in the system will tend to accrue more wealth than the dumber agents. With this wealth they will be able to move markets and close arbitrage opportunities. However, if an army of barely literate idiots is given access to complex leveraged financial instruments and free money, and they all decide to "buy the dip", it doesn't matter what the underlying value of the stock is. It's going up.

Not to say that what you're saying doesn't apply. It probably exacerbates the problem, and is the main mechanism behind market bubbles. But there are multiple examples of a very public stock going up or getting a large amount of attention, and then completely unrelated companies with plausible-sounding tickers also shooting up in tandem.

This only makes any sense in a world where the market is driven by fools eager to lose all their money or more.

Comment by factorialcode on We've built Connected Papers - a visual tool for researchers to find and explore academic papers · 2020-06-09T01:25:43.466Z · score: 9 (5 votes) · LW · GW

Alright, I've only played with this a bit, but I'm already finding interesting papers from years past that I've missed. I'm just taking old papers I've found notable and throwing them in and finding new reading material.

My only complaint is that it feels like there's actually too little "entropy" in the set of papers that get generated; they're almost too similar, and I end up having to make several hops through the graph to find something truly eye-catching. It might also just be that papers I consider notable are few and far between.

Comment by factorialcode on Consequentialism and Accidents · 2020-06-07T20:07:41.901Z · score: 1 (1 votes) · LW · GW

I think virtue ethics and the "policy consequentialism" I'm gesturing at are different moral frameworks that will, under the right circumstances, make the same prescriptions. As I understand it, one assigns moral worth to outcomes, and the actions it prescribes are determined updatelessly, whereas the other assigns moral worth to specific policies/policy classes implemented by agents, without looking at the consequences of those policies.

Comment by factorialcode on Everyday Lessons from High-Dimensional Optimization · 2020-06-06T23:20:41.733Z · score: 6 (4 votes) · LW · GW

Epistemic status: Ramblings

I don't know how much you can really generalise these lessons. For instance, when you say:

How much slower is e-coli optimization compared to gradient descent? What’s the cost of experimenting with random directions, rather than going in the “best” direction? Well, imagine an inclined plane in n dimensions. There’s exactly one “downhill” direction (the gradient). The n-1 directions perpendicular to the gradient don’t go downhill at all; they’re all just flat. If we take a one-unit step along each of these directions, one after another, then we’ll take n steps, but only 1 step will be downhill. In other words, only ~O(1/n) of our travel-effort is useful; the rest is wasted.

In a two-dimensional space, that means ~50% of effort is wasted. In three dimensions, 70%. In a thousand-dimensional space, ~99.9% of effort is wasted.

This is true, but if I go in a spherically random direction, then if my step size is small enough, ~50% of my steps will make some progress downhill, regardless of the dimensionality of the space.
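
A quick sanity check of that ~50% figure (my own simulation, not from the post): a uniformly random unit direction has a downhill component half the time in any dimension, though the size of that component shrinks as the dimension grows, which is the waste the post is pointing at.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 1000):
    d = rng.standard_normal((100_000, n))
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # uniformly random unit directions
    downhill = d[:, 0] < 0                          # component along the single downhill axis
    print(f"n={n}: {downhill.mean():.1%} of steps go downhill, "
          f"mean downhill component {(-d[downhill, 0]).mean():.3f}")
```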

How best to go about optimisation depends on the cost of carrying out optimisation, the structure of the landscape, and the relationship between the utility and the quality of the final solution.

Blind guess-and-check is sometimes a perfectly valid method when you don't need to find a very good solution and you can't make useful assumptions about the structure of the set, even if the cardinality of the possible solution set is massive.

I often don't even think "optimisation" and "dimensionality" are really natural ways of thinking about solving many real-world engineering problems. There's definitely an optimisation component to the engineering process, but it's often not central. Depending on circumstances, it can make more sense to think of engineering as "satisficing" vs "optimising". Essentially, you're trying to find a solution instead of the best solution, and the process used to solve the problem is going to look vastly different in one case vs another. This is similar to the notions of "goal directed agency" vs "utility maximisation".

In many cases when engineering, you're taking a problem and coming up with possible high level breakdowns of the problem. In the example of bridges, this could be deciding whether to use a cantilever bridge or a suspension bridge or something else entirely. From there, you solve the related sub-problems that have been created by the breakdown, until you've sufficiently fleshed out a solution that looks actionable.

The way you go about this depends on your optimisation budget. In increasing order of costs:

-You might go with the first solution that looks like it will work.

-You'll recursively do a sort of heuristic optimisation at each level, decide on a solution, and move to the next level

-You'll flesh out multiple different high level solutions and compare them.

. . .

-You search the entire space of possible solutions

This is where the whole "slack" thing and getting stuck in local optima comes back, even in high dimensional spaces. In many cases, you're often "committed" to a subset of the solution space. This could be because you've decided to design a cantilever bridge instead of a suspension bridge. It could also be because you need to follow a design you know will work, and X is the only design your predecessors have implemented IRL that has been sufficiently vetted. (This is especially common in aircraft design, as the margins of error are very tight.) It could even be because your comrades have all opted to go with a certain component, and so that component benefits from economies of scale and becomes the best choice even if another component would be objectively better were it to be mass produced. (I'll leave it as an exercise to the reader to think of analogous problems in software engineering.)

In all cases, you are forced to optimise within a subset of the search space. If you have the necessary slack, you can afford to explore the other parts of the search space to find better optima.

Comment by factorialcode on Consequentialism and Accidents · 2020-06-06T21:49:25.358Z · score: 4 (3 votes) · LW · GW

I don't know about criticism, but the problem disappears once you start taking into account counterfactuals and the expected impact/utility of actions. Assuming the killer is in any way competent, then in expectation the killer's actions are a net negative, because when you integrate over all possible worlds, his actions tend to get people killed, even if that's not how things turned out in this world. Likewise, the person who knowingly and voluntarily saves lives is going to generally succeed in expectation. Thus the person who willingly saves lives is acting more "moral" regardless of how things actually turn out.

This gets more murky when agents are anti-rational, and act in opposition to their preferences, even in expectation.

Comment by factorialcode on Inaccessible information · 2020-06-06T17:36:23.199Z · score: 1 (1 votes) · LW · GW

It seems to me that you could get around this problem by training a model *() that takes M(x) and outputs M's beliefs about inaccessible statements about x after seeing x as input. You could train *() by generating latent information y and then using that information y to generate x. From there, compute M(x) and minimize the loss L(*(M(x)),y). If you do this for a sufficiently broad set of (x,y) pairs, you might have the ability to extract arbitrary information from M's beliefs. It might also be possible for *() to gain access to information that M "knows" in the sense that it has all the relevant information, but is still inaccessible to M since M lacks the logical machinery to put together that information.
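
Here's a toy sketch of that training loop (the linear maps below are purely illustrative stand-ins for the process generating x from y, the frozen model M, and the readout *(); none of them are real models):

```python
import torch
import torch.nn as nn

LATENT, OBS, BELIEF = 4, 32, 16
latent_to_obs = nn.Linear(LATENT, OBS)   # stand-in: generate observations x from latent information y
M = nn.Linear(OBS, BELIEF)               # stand-in: the frozen model whose beliefs we want to read
for p in list(latent_to_obs.parameters()) + list(M.parameters()):
    p.requires_grad_(False)

readout = nn.Linear(BELIEF, LATENT)      # the "*()" that maps M's beliefs back to y
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)

for step in range(2000):
    y = torch.randn(256, LATENT)         # generate latent information y...
    x = latent_to_obs(y)                 # ...and observations x from y
    beliefs = M(x)                       # M's state after seeing x
    loss = nn.functional.mse_loss(readout(beliefs), y)   # minimize L(*(M(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
```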

This is similar to high school English multiple choice questions, where the reader must infer something about the text they just read. It's also similar to experiments where neuroscience researchers train a model to map brain cell activity in animals to what an animal sees.

Comment by factorialcode on Reexamining The Dark Arts · 2020-06-02T07:57:02.408Z · score: 1 (1 votes) · LW · GW

If everyone made sure their arguments looked visually pleasing, would that be sustainable? Yes, in fact the world would look more beautiful so it's totally allowed.

Here's a frequent problem with using the dark arts: they very frequently have higher-order effects that hurt the user and the target in ways that are difficult to immediately foresee.

In the above proposal, there are frequently times when the most effective method of communication is to be blunt, or one argument is going to be inherently more aesthetically pleasing than another. In these circumstances, if you start optimizing for making arguments pretty, then you will very likely be sacrificing accuracy or effectiveness. Do this too much and your map starts to disconnect from the territory. From there it becomes easy to start taking actions that look correct according to your map, but are in fact suboptimal or outright detrimental.

Comment by factorialcode on GPT-3: a disappointing paper · 2020-05-31T07:19:08.632Z · score: 15 (10 votes) · LW · GW

When you boil it all down, Nostalgebraist is basically Reviewer #3.

That your response is to feign Socratic ignorance and sealion me here, disgenuously asking, 'gosh, I just don't know, gwern, what does this paper show other than a mix of SOTA and non-SOTA performance, I am but a humble ML practitioner plying my usual craft of training and finetuning', shows what extreme bad faith you are arguing in, and it is, sir, bullshit and I will have none of it.

Unless I'm missing some context in previous discussions, this strikes me as extremely antagonistic, uncharitable, and uncalled for. This pattern matches to the kind of shit I would expect to see on the political side of reddit, not LW.

Strongly downvoted.

Comment by factorialcode on Signaling: Why People Have Conversations · 2020-05-20T05:45:45.692Z · score: 1 (1 votes) · LW · GW

I think people information trading, and coordinating are good reasons for why humans evolved language, but I think that signalling gives a stronger explanation for why "casual" conversations happen so often.

That sounds reasonable. I still think there's more going on in casual conversation than signalling, as evidenced by signalling in conversation getting called out as "bragging" or "humble bragging" or "flexing", indicating that people would like you to do less signalling and more of whatever else casual conversation is used for.

Why do you think the signalling interpretation doesn't fully explain why relevance is necessary?

I think the best argument against signalling fully explaining relevance is that there are situations where signalling is pointless or impossible; this happens between people who know each other very well, as any attempt to signal in those cases would either be pointless or immediately called out. However, relevance is almost a universal property of all conversation, and the norm rarely if ever breaks down. (Unless you're dealing with people who are really high, but I would explain this as a consequence of these people no longer being able to keep track of context even if they wanted to.)

Comment by factorialcode on Signaling: Why People Have Conversations · 2020-05-20T04:26:24.519Z · score: 1 (1 votes) · LW · GW

I think that this is one reason why people have conversations, but there are many others.

Things like:

-Information trading

-Coordinating

-Manipulating

-etc...

However, I like that you pointed out that relevance is important in conversations. That's something I find myself taking for granted but is actually kind of weird when you think about it. I don't think signalling fully explains why relevance is necessary. I'll put forth an alternative hypothesis:

I think conversations having a requirement for being relevant is a consequence of language being an efficient code for communicating information that also efficiently uses and changes working memory. When you talk about something, you often need a good deal of background context to get these ideas across. This can manifest itself at the lowest levels of abstraction as the need to decipher homonyms and homophones, and at the highest level when understanding why someone would want to "blow up the plane". Keeping track of this context eats up working memory. If you switch topics frequently, you'll either have to keep track of multiple smaller contexts, or waste your time wiping and repopulating working memory after every context switch. However, by tackling one topic at a time and moving smoothly between related topics, all working memory can be devoted to keeping track of the conversation, and you only need to partially recontextualize infrequently.

Comment by factorialcode on Movable Housing for Scalable Cities · 2020-05-16T16:21:23.254Z · score: 12 (7 votes) · LW · GW

Moving companies already make the friction of moving from one location to another pretty low. I feel like having to move an entire house would make this far more complicated, and raise the cost by at least 1-2 orders of magnitude, even if the house was designed to do that.

However, the biggest issue with this proposal is that for the houses to not look like Jawa sandcrawlers, the city would have to provide some sort of static infrastructure. This could be anything from plumbing, concrete foundations, roads, and electricity, to just a piece of paper that says you are allowed to park there. In all cases, you haven't actually gotten rid of the problem. The thing that becomes exorbitantly expensive and extracts rent from you is now just a plot of land instead of a house.

Also note that this is literally the business model of a trailer park. Understanding why everyone except the poorest people prefers homes or apartments to those would probably be enlightening.

Comment by factorialcode on Why do you (not) use a pseudonym on LessWrong? · 2020-05-07T23:10:35.695Z · score: 7 (5 votes) · LW · GW

When I made this account, that was just what you did as part of the online culture. You picked a cool username and built a persona around it.

Now it's just basic OpSec to never associate anything with your real name unless it makes you look good and you can take it down later when the cultural tides change and that stops being true. I have several pseudonyms, one or more for each online community I participate in. This makes it far harder for people to tie together bits of information that they could use against me.

Comment by factorialcode on Maths writer/cowritter needed: how you can't distinguish early exponential from early sigmoid · 2020-05-06T22:07:29.806Z · score: 4 (2 votes) · LW · GW

Isn't this really straightforward? I'm pretty sure ln(e^x) and ln(sigma(x)) only differ by about e^x + O(e^(2x)) when x < 0. You can't tell apart two curves that basically make the same predictions.
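
Spelling that out (my own quick check, writing the sigmoid as $\sigma(x) = e^x/(1+e^x)$):

$$\ln(e^x) - \ln(\sigma(x)) = \ln(1+e^x) = e^x - \tfrac{1}{2}e^{2x} + O(e^{3x}),$$

so for $x \ll 0$ the two log-curves differ by $e^x + O(e^{2x})$, which is exponentially small and easily swamped by noise in the early data.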

Comment by factorialcode on Individual Rationality Needn't Generalize to Rational Consensus · 2020-05-05T21:05:28.688Z · score: 9 (6 votes) · LW · GW

the "ideal" way to aggregate values is by linear combination of utility functions

This is not obvious to me. Can you elaborate?

Comment by factorialcode on Individual Rationality Needn't Generalize to Rational Consensus · 2020-05-05T06:24:16.381Z · score: 3 (2 votes) · LW · GW

Explicit voting isn't even necessary for this effect to show up. This is an explanation of a notable effect wherein a group of people appear to hold logically inconsistent beliefs from the perspective of outsiders.

Examples:

-My (political out-group) believes X and ~X

-(Subreddit) holds inconsistent beliefs

Comment by factorialcode on Negative Feedback and Simulacra · 2020-04-29T03:04:59.118Z · score: 4 (3 votes) · LW · GW

“How do I find those?” you might ask. I don’t know.

I wonder if a better strategy is not to answer the question directly but to answer the question, "How can I reliably signal that I'm operating on level 1?"

Comment by factorialcode on The Best Virtual Worlds for "Hanging Out" · 2020-04-27T22:41:52.384Z · score: 3 (2 votes) · LW · GW

Unlike the previous options it doesn't have "proximity chat". It works better if you're interacting with a smallish group of people, who can all hear each other and participate in a single conversation.

Fortunately, Minecraft also has an excellent modding community:

Proximity chat: https://github.com/magneticflux-/fabric-mumblelink-mod

VR Support: http://www.vivecraft.org/

Comment by factorialcode on Solar system colonisation might not be driven by economics · 2020-04-22T14:50:10.448Z · score: 3 (2 votes) · LW · GW

I agree with the thrust of this article, but I think it will still look a lot like an economics driven expansion.

One of the things they teach in mining engineering is the notion of the "social license to operate". Essentially, everyone, from your local government, to the UN, to the nearby residents, needs to sign off on whatever it is that you're doing. For often quite legitimate reasons, mining has acquired a reputation as potentially environmentally disastrous. As a result, you need to effectively bribe the local residents. This is easy to do when the locals are poor third-worlders who make a few dollars a day. However, the world will develop, more people will be lifted out of poverty and become more environmentally conscious, and as a result the price of these licences will shoot up dramatically.

One of the greatest advantages of space is that there are no environmentalists or natives in space, and the ones on Earth can't muster the political will to stop you because the environmental costs are much smaller and externalized. Once it becomes cheaper to blast off and mine in space than to wade through years of paperwork, you'll see immediate economic expansion into space.

Comment by factorialcode on Reflections on Arguing about Politics · 2020-04-14T01:12:23.115Z · score: 3 (2 votes) · LW · GW

Also keep in mind that it's entirely possible for both of you to agree on all of the facts of a situation, but if you have different values, preferences, or utility functions, you can still disagree on policy.

Comment by factorialcode on An Orthodox Case Against Utility Functions · 2020-04-07T23:50:24.831Z · score: 3 (2 votes) · LW · GW

suppose I'm a general trying to maximize my side's chance of winning a war. Can I evaluate the probability that we win, given all of the information available to me? No - fully accounting for every little piece of info I have is way beyond my computational capabilities. Even reasoning through an entire end-to-end plan for winning takes far more effort than I usually make for day-to-day decisions. Yet I can say that some actions are likely to increase our chances of victory, and I can prioritize actions which are more likely to increase our chances of victory by a larger amount.

So, when and why are we able to get away with doing that?

AFAICT, the formalisms of agents that I'm aware of (Bayesian inference, AIXI, etc.) set things up by supposing logical omniscience and that the true world generating our hypotheses is in the set of hypotheses; from there you can show that the agent will maximise expected utility, or not get Dutch booked, or whatever. But humans, and ML algorithms for that matter, don't do that; we're able to get "good enough" results even when we know our models are wrong and don't capture a good deal of the underlying process generating our observations. Furthermore, it seems that empirically, the more expressive the model class we use, and the more compute thrown at the problem, the better these bounded inference algorithms work. I haven't found a good explanation of why this is the case beyond the hand-wavy "we approach logical omniscience as compute goes to infinity and our hypothesis space grows to encompass all computable hypotheses, so eventually our approximation should work like the ideal Bayesian one".

Comment by factorialcode on Conflict vs. mistake in non-zero-sum games · 2020-04-06T06:50:13.056Z · score: 3 (3 votes) · LW · GW

I feel like that strategy is unsustainable in the long term. Eventually the search will get more and more expensive as the lower-hanging fruit gets picked.

Comment by factorialcode on Conflict vs. mistake in non-zero-sum games · 2020-04-06T05:44:20.192Z · score: 4 (4 votes) · LW · GW

So what happens to mistake theorists once they make it to the Pareto frontier?

Comment by factorialcode on Partying over Internet: Technological Aspects · 2020-04-05T18:39:08.099Z · score: 3 (2 votes) · LW · GW

It has a high barrier to entry, but I think VRchat and software like it is making a lot of progress towards solving the problems you bring up, especially with body, eye, and facial expression tracking. Obviously, you still can't do things that involve physical touch, but keeping track of who's looking at who and making eye contact is possible. It also lets you make other gestures with your "hands" and arms to communicate. There's also some work going on to make it possible to play games in a social VR setting.

Here are some examples of this:

https://www.youtube.com/watch?v=VahuChwc_O8

https://www.reddit.com/r/VRchat/comments/fmj4xe/media_sent_this_to_my_worried_mother_who_watches/

https://preview.redd.it/ij7icgsvx3m31.png?width=768&auto=webp&s=8fff2f8b992552e14e30713a4a0458472f6861b8

https://www.youtube.com/watch?v=dVcU6i2k9-M

Comment by factorialcode on Taking Initial Viral Load Seriously · 2020-04-05T17:17:47.954Z · score: 6 (4 votes) · LW · GW

No, he didn't. The idea and terminology had been bouncing around the rat-sphere a bit earlier than that.