Comments

Comment by etymologik on Welcome to Less Wrong! (9th thread, May 2016) · 2016-05-24T21:08:23.612Z · LW · GW

Thanks much for the link to the Secretary Problem solution. That will serve perfectly. Even if I don't know the total number of houses that will be candidates for serious consideration, I do know there's an average, which is (IIRC) six houses visited before a purchase.
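For reference, a minimal sketch of the classic Secretary Problem stopping rule applied to a run of about six houses. It assumes the number of houses is known in advance (here the recalled average of six stands in for it) and that each house can be boiled down to a single overall score; the function name and the scores are made up for illustration.

```python
import math

def secretary_choice(scores):
    """Return the index of the house the classic 1/e rule would pick.

    scores: overall scores of the houses, in the order they are visited.
    Assumes the total number of houses is known up front.
    """
    n = len(scores)
    skip = int(n / math.e)  # look-only phase: roughly the first n/e houses
    # Best score seen during the look-only phase becomes the benchmark.
    best_seen = max(scores[:skip], default=float("-inf"))
    for i in range(skip, n):
        if scores[i] > best_seen:
            return i  # first house that beats the benchmark
    return n - 1  # nothing beat the benchmark, so the last house is all that's left

# Hypothetical overall scores for six houses, in visiting order.
visits = [0.55, 0.62, 0.58, 0.71, 0.66, 0.80]
picked = secretary_choice(visits)
```

With six houses the look-only phase covers the first two, so the rule commits to the first later house that beats both. It does not guarantee the best house of the six; it only maximizes the chance of landing on it.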

As for cheating ... what I mean by that is deluding myself about some aspects of the property I'm looking at so that I believe "this is the one" and make an offer just to stop the emotional turmoil of changing homes and spending a zillion dollars that I don't happen to possess. "Home sweet home" and "escape the evil debt trap" are memes at war in my head, and I will do things like hallucinate room dimensions that accommodate my furniture rather than admit to myself that an otherwise workable floor plan in a newly gutted and beautifully renovated yet affordable home is too dang small and located in a declining neighborhood. I take a tape measure and grid paper with me to balk the room size cheat. But I also refer to the table, which requires me to check for FEMA flood zone location. This particular candidate home was in a FEMA 100-year flood zone, and the then-undeveloped site had in fact been flooded in 1952. That fact was enough to snap me out of my delusion. At that point the condition of the neighboring homes became salient.

The extent to which self-delusion and trickery are entwined in everyday thought is terribly disheartening, if you want to know the truth.

On weighting my functional criteria based on dollars, the real estate market has worked out a marvelous short circuit for rationality. Houses are no longer assessed for value based on an individual home's actual functional specifications. The quantity of land matters (so yard size matters to price). Otherwise, overwhelmingly, residential properties are valued for sale and for mortgages based on recent sales of "comparable" homes. "Comparable" means "the same square footage & same number of bedrooms and baths within half a mile of your candidate home." The two homes can otherwise be completely dissimilar, but will nevertheless be considered "comparable". No amount of improvements to the house or yard will change the most recent sale price of the other homes in the neighborhood. What this means is that sales prices are just for generic shelter plus the land, where the land is most of the value and neighborhood location is most of the land value. So the price of the home you're looking at is not really very closely tied to anything you might value about the home. This makes it very difficult to come up with a reasonable market price for, say, an indoor laundry versus laundry facilities in the garage. It's certainly beyond my meager capacity to calibrate the value of home amenities based on dollars.

I'm told it wasn't this way in the 1950s, but given the history of land scams in the U.S., which go all the way back to colonial land scams in Virginia, I have my doubts that prices for real estate were ever rational.

But I'll try to find something for weights. Back to the drawing board. And thanks for your help.

Comment by etymologik on Welcome to Less Wrong! (9th thread, May 2016) · 2016-05-24T05:54:42.802Z · LW · GW

In the home purchase decision use case, I'm currently working with a "utility table" where the columns list serious home purchase candidates, and one column is reserved for my current home as a baseline. (The theory there is I know what my current home feels like, so I can map abstract attribute scores to a tangible example. Also, if a candidate new home fails to score better overall than my current home, there's no sense in moving.)

The rows in the utility table list various functions or services that a home with its land might perform and various attributes related to those functions. Examples:

  • Number of bedrooms (related to family size & home uses like office, library)
  • Floor plan features (count number of desirable features from list - walk-in closet in MBR, indoor laundry, etc)
  • Interior steps (related to wheelchair friendliness)
  • Exterior steps (ditto)
  • Roof shape (related to leak risk/water damage & mold repair costs, also roof replacement frequency)
  • Exterior material (related to termite risk/repair costs, earthquake risk)
  • Size of yard (related to maintenance costs, uses such as entertaining, horses, home business)
  • Slope, elevation, sun exposure, wind exposure factors
  • Social location factors (distance to work, distance to favorite grocery store, walkable to public transit, etc)
  • Wildfire risk zone
  • FEMA flood risk zone
  • Number of access/evacuation routes
  • Price
  • Square footage
  • Cost per square foot
  • Total housing cost per month
  • Etc

Each of these gets a set of possible values defined, and the possible values are then ranked from 1 to n, where 1 is less desirable and n is more desirable. A rank of 0 is assigned to outright aversive conditions such as being located in a high wildfire risk zone or in a FEMA 100-year flood zone or a multi-story home (your rankings will vary). I then normalize the rank scores for each row to a value between zero and 1.
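A minimal sketch of that normalization step, assuming the simplest reading: each row's ranks run from 0 (aversive) through 1..n, and the normalized score is just rank divided by n. The function name and the example row are hypothetical.

```python
def normalize_rank(rank, n):
    """Map a rank in 0..n onto the interval 0..1.

    0 is reserved for outright aversive conditions, 1 is the least
    desirable acceptable value, and n is the most desirable value.
    """
    if n < 1:
        raise ValueError("a row needs at least one ranked value")
    if not 0 <= rank <= n:
        raise ValueError("rank must be between 0 and n")
    return rank / n

# Hypothetical row with n = 4 possible values, plus the aversive 0.
aversive = normalize_rank(0, 4)   # e.g. a home in a 100-year flood zone
middling = normalize_rank(2, 4)
best = normalize_rank(4, 4)
```

Dividing by n keeps every row on the same 0-to-1 scale no matter how many possible values the row defines, which is what lets rows be summed later.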

One squirrelly feature of my system is that some of the row score ranks are not predefined but dynamic. By that I mean that the actual base value before scoring -- such as the price of the house -- is rank ordered across all the columns rather than placed in a standard price interval that is given a rank relative to other price intervals. This means that the ranks assigned to each of the home candidates can change when a new home is added to the table. (And yes, I have to renormalize when this happens, because n increments by 1.)
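The dynamic rows can be sketched roughly as follows. This assumes a lower raw value is better (as with price) unless stated otherwise, and that ties simply get adjacent ranks in table order; the function name and the prices are invented for illustration.

```python
def dynamic_ranks(values, lower_is_better=True):
    """Rank raw values across all homes in the table.

    Returns one rank per home: 1 = least desirable, n = most desirable.
    Ties are broken by table order (an assumption, not a rule).
    """
    # Sort home indices from least to most desirable.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=lower_is_better)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Hypothetical asking prices for three homes in the table.
prices = [650_000, 480_000, 720_000]
before = dynamic_ranks(prices)

# A fourth home arrives: n goes from 3 to 4 and existing ranks shift.
after = dynamic_ranks(prices + [500_000])
```

In this example the first home's price rank stays at 2, but its normalized score drops from 2/3 to 2/4 once the fourth home is added, which is why the whole row has to be renormalized.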

Then I sum up the scores for each candidate, plus my baseline existing home, and see which one wins.

It all sounds logical enough, but unfortunately it's not as easy as it sounds. It's hard to optimize across all possible home choices in practice, because candidate homes have a basically random arrival rate on the market and they can't all be compared at once. You can't even wait to make pairwise comparisons, because -- at least in Southern California -- any reasonably affordable, acceptable home is likely to be snapped up for cash by an investor or flipper within days of coming on the market unless you make an offer first, right then.

Another problem with the serial arrival rate of candidate homes is that you can get fixated on the first house you see on Zillow or the first house your real estate agent trots out for you to visit. I've got hacks outside of the utility table (as I'm calling it) for getting around that tendency, but I want the utility table to work as a tool for preventing fixation as well.

Just trying to create a utility table has helped me tremendously with figuring out what I want and don't want in a home. That exercise, when combined with looking at real homes, also taught me that most things I thought were absolutes on paper actually were not so when I got to looking at real houses and real yards and real neighborhoods and experiencing what the tradeoffs felt like to occupy. The combination of experience and analysis has been a good tool for updating my perceptions as well as my utility table. Which is why I think this might be a useful tool: it gives me a method of recording past experience for use in making rapid but accurate judgments on subsequent, serially presented, one-time-opportunities to make a home purchase.

But I've also had a lot of trouble making the scoring work. First I tried to weight each row by how important I considered it to be, but that made things too easy to cheat on.

Then I tried to weight rows by probability of experiencing the lifestyle function or cost or risk involved. For example, I sleep every night and don't travel much, so a functional bedroom matters basically 99.9% of the time. The risk of wildfire, on the other hand, is lower, but how do I calculate it? This county averages about 8 major fires a year -- but what base do I use to convert that to a percentage? Divide 365 days per year into 8 fire events per year, or into 8 times the average duration of a fire in days? Or should I count the number of homes burned per year as a percentage of all homes in the county? These latter statistics are not easily obtained, unlike the count of fires per year. Plus, I plan to live here 30 years, and here has more fires than elsewhere, while sleeping probability is unaffected by location. How do I account for that? And then there's the fact that only one fire would be catastrophic and possibly life-ending, while one night of bad sleep can be shrugged off. In the end, I couldn't find any combination of probabilities and costs that was commensurable across all the rows and not subject to cheating.
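One possible way to frame the 30-year question, offered only as a sketch: take the homes-burned-per-year option as the annual per-home probability, assume each year is independent of the others, and compound over the planned stay. The homes-burned and homes-in-county figures below are invented placeholders, not real county statistics.

```python
def risk_over_horizon(annual_probability, years):
    """Chance of at least one event over `years`, assuming independent years."""
    if not 0 <= annual_probability <= 1:
        raise ValueError("annual_probability must be between 0 and 1")
    # Complement of surviving every single year without the event.
    return 1 - (1 - annual_probability) ** years

# Placeholder inputs (NOT real data): homes burned per year / homes in county.
homes_burned_per_year = 200
homes_in_county = 400_000
annual = homes_burned_per_year / homes_in_county

thirty_year_risk = risk_over_horizon(annual, 30)
```

This only answers "how likely over 30 years"; it does nothing about the other objection, that one fire is catastrophic while one bad night is not, so it still doesn't make the rows commensurable.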

I also tried giving negative numbers to aversive situations like high wildfire risk and FEMA flood zones, but all that did was make an always crummy but safe house and an always spectacular but occasionally life-threatening house look exactly the same. This just didn't feel right to me.

So I ended up just taking the total unweighted but normalized rank scores for each house, and supplementing that with a separate count of the negatives. That gives me two ways to score the candidate homes, and if the same house wins on both measures, I consider that an indicator of reliability.
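A rough sketch of the two-measure check, assuming each home is stored as a mapping from row name to normalized score and that a normalized score of exactly 0 marks an aversive condition. The function names, row names, and scores are all hypothetical.

```python
def score_home(rows):
    """Return (unweighted total, count of aversive rows) for one home."""
    total = sum(rows.values())
    negatives = sum(1 for score in rows.values() if score == 0)
    return total, negatives

def agreed_winner(homes):
    """Return the home with the highest total if it also has the fewest
    negatives (ties on negatives allowed); otherwise return None."""
    results = {name: score_home(rows) for name, rows in homes.items()}
    best_total = max(results, key=lambda name: results[name][0])
    fewest_negatives = min(neg for _, neg in results.values())
    if results[best_total][1] == fewest_negatives:
        return best_total
    return None

# Hypothetical table: current home as baseline plus two candidates.
homes = {
    "current":     {"bedrooms": 0.5, "flood_zone": 1.0, "wildfire_zone": 0.5},
    "candidate_a": {"bedrooms": 1.0, "flood_zone": 0.0, "wildfire_zone": 1.0},
    "candidate_b": {"bedrooms": 1.0, "flood_zone": 1.0, "wildfire_zone": 0.5},
}
winner = agreed_winner(homes)
```

Returning None when the two measures disagree keeps the spectacular-but-risky house from winning on total alone; that case is left for judgment rather than arithmetic.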

By keeping score on all the homes I seriously consider making an offer on, I think I can make a pretty good serial judgment on the current candidate even if I can't optimize concurrently. Or so I believe.

Is this a reasonable approach? I doubt there's anything Bayesian whatsoever about it, but I really don't care as long as the method is reasonable and doesn't have any obvious self-deception in it.

Comment by etymologik on Welcome to Less Wrong! (9th thread, May 2016) · 2016-05-20T18:22:54.812Z · LW · GW

Thanks for that explanation of utility functions, gjm, and thanks to protostar for asking the question. I've been struggling with the same issue, and nothing I've read seems to hold up when I try to apply it to a concrete use case.

What do you think about trying to build a utility TABLE for major, point-in-time life decisions, though, like buying a home or choosing a spouse?

P.S. I'd upvote your response to protostar, but I can't seem to make that happen.

Comment by etymologik on The Best Textbooks on Every Subject · 2011-01-31T07:49:48.105Z · LW · GW

I would also like to recommend two superb encyclopedia-style works on linguistics:

(1) "The Cambridge Encyclopedia of Language", by David Crystal

(2) "The Cambridge Encyclopedia of the English Language," by David Crystal

Both are characterized by a lot of short articles, sidebars, pictures, cartoons, and examples of texts relevant to the point at hand. I read them both cover to cover, and have referred to them again and again when beginning to explore a new topic in the field.

Comment by etymologik on The Best Textbooks on Every Subject · 2011-01-31T07:38:09.826Z · LW · GW

Recommended for LINGUISTICS: "Contemporary Linguistics", by William O'Grady, John Archibald, Mark Aronoff, & Janie Rees-Miller. Truly comprehensive, addressing ALL the areas of interesting work in linguistics -- phonetics, phonology, morphology, syntax, semantics, historical linguistics, comparative linguistics & language universals, sign languages, language acquisition and development, second language acquisition, psycholinguistics, neurolinguistics, sociolinguistics & discourse analysis, written vs spoken language, animal communication, & computational/corpus linguistics. Each chapter is sharp & targeted; you will really know what you want to read next after studying this text.

NOT recommended: "Linguistics: An Introduction to Linguistic Theory", edited by Victoria A. Fromkin & authored by Bruce Hayes, Susan Curtiss, Anna Szabolcsi, Tim Stowell, Edward Stabler, Dominique Sportiche, Hilda Koopman, Patricia Keating, Pamela Munro, Nina Hyams, & Donca Steriade. This text provides a solid guide to generative phonology, generative syntax, and formal semantics -- but only in their mainstream (aka Chomskian) formulations, and with no reference to actual language use (which, for theoretical reasons, is anathema to the Chomskian crowd). Interestingly, at least 8 of the authors I recognize as faculty from UCLA, which makes the text a bit ingrown for my taste.

NOT recommended: "Syntax: A Generative Introduction", by Andrew Carnie. First problem: This book covers syntax and only syntax, and does so solely from a generative perspective. Second problem: Although Carnie is a renowned expert in Irish Gaelic syntax and doubtless knows his stuff, he can't write a clear expository textbook to save his soul. This is the most confusing book on linguistics that I've ever read.