Comments

Comment by Mark_Neznansky on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2024-03-15T17:41:30.706Z · LW · GW

Regarding the bar charts: given that 100 nokens were sampled at each radius, and supposing that at least some of the outputs were mutually exclusive, how come both themes, "group membership" and "group nonmembership", have full bars at the low radii?

Comment by Mark_Neznansky on Phallocentricity in GPT-J's bizarre stratified ontology · 2024-03-15T13:23:39.492Z · LW · GW

It might be astonishing, but this is fundamentally how word embedding works: by modelling the co-distribution of words/expressions. You know the "nudge, nudge, you know what I mean" Monty Python sketch? Try appending "if you know what I mean" to the end of random sentences.

Comment by Mark_Neznansky on A vote against spaced repetition · 2014-05-14T18:12:35.387Z · LW · GW

Funny. I once used triumphant LotR music to overcome my terrible fear of heights. I was climbing Mount Katahdin with friends (including traversing the "Knife Edge"), and humming/singing this music out loud (plus imagining a chopper camera shooting from above) completely effaced my fear. Possibly being called "Legolas" during middle school and high school helped, too.

Comment by Mark_Neznansky on A vote against spaced repetition · 2014-05-14T16:59:08.944Z · LW · GW

It was to be expected: someone has already created a "Hierarchy Tags" add-on: https://ankiweb.net/shared/info/1089921461

I haven't used it myself, but a comment there said "Simple, nice, and easy."

Comment by Mark_Neznansky on A vote against spaced repetition · 2014-05-14T15:25:36.694Z · LW · GW

This is an idea I have only toyed with and have yet to try in practice, but one could create meta-cards for non-data learning. Instead of creating cards that demand an answer, create cards that demand a drill, or a drill with a specific success criterion. I find it a bit hard to find "the best example" for this, perhaps because the spectrum of learnable skills is so broad, but just for the sake of illustration: if you're learning to paint, you can have "draw a still object", "draw a portrait", "practice color", "practice composition", "practice perspective" &c. cards. After you finish your card-prompted drill, you move to the next card. Or if you're practicing to go pro at a game (one with existing computer AIs), you can have cards like "play AI X in game situation S and achieve A", "practice the game opening against the AI until able to reach a certain state", or "practice a disadvantaged end-game situation against the AI and bring the game to a draw". Of course, reviewing such cards would take longer, but they are only meant as scaffolding to harness the Anki spacing algorithm. The numeric parameters of the algorithm might need adjustment (which is easy to do in Anki), but I think that qualitatively it should work, at least for specific skills. Of course this set-up, especially if it needs a major parametric overhaul[1], is an investment, but every human breakthrough required its avant-garde.

[1] Which is not a given: perhaps the algorithm is only problematic at the beginning of the "learning", being too frequent, in which case you can just "cheat" carefully and "pass" every other review for a while, which is not a major disturbance. Or, on the contrary, perhaps "well-learned" cards (interval > 3 months, or even 1 month, for example) should be swapped for more challenging ones (e.g., "beat the expert AI" replacing "beat the beginner AI", or "juggle 5 balls while riding a unicycle on a tightrope" replacing "juggle 4 balls"), which is even less of a problem, as you should immediately recognize well-learned skills (e.g., "practice counting up to 20").
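To gesture at how the spacing would behave for such drill cards, here is a toy sketch in the spirit of Anki's scheduler. Anki's real algorithm (a descendant of SM-2) is more elaborate, and all the names and numbers below are mine:

```python
from datetime import date, timedelta

def next_interval(prev_interval_days, ease, passed):
    # Toy SM-2-style spacing: grow the interval after a successful drill,
    # reset to tomorrow after a failed one. Anki's real scheduler also
    # adjusts the ease factor, adds fuzz, etc.
    if not passed:
        return 1
    return max(1, round(prev_interval_days * ease))

# A "meta-card" is just a drill prompt plus its spacing state.
card = {"prompt": "Practice a disadvantaged end-game against the AI "
                  "and bring the game to a draw",
        "interval": 4, "ease": 2.5}

# After a session where the drill's success criterion was met:
card["interval"] = next_interval(card["interval"], card["ease"], passed=True)
card["due"] = date.today() + timedelta(days=card["interval"])
```

Tuning would then amount to changing the ease factor or the failure reset, which is roughly what adjusting Anki's numeric parameters does.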

Comment by Mark_Neznansky on A vote against spaced repetition · 2014-05-14T14:48:21.430Z · LW · GW

This is not quite a "tech-tree" dependency structure, but you can use tags to stratify your cards and always review them in sequence from basic to dependent (i.e., first clear out the "basic" cards, then "intermediate", then "expert"). Even if the grouping is arbitrary, I think you can go a long way with it. If your data is expected to be very large and/or to have a predictable structure, you can always go for a "multiple-pyramid" structure, i.e., have "fruits basic" < "fruits advanced" < "fruits expert" and "veggies basic" < "veggies pro" tags &c., and perhaps even an "edibles advanced" tag, above both veggies and fruits, for very dependent cards.
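A rough sketch of the "review in sequence from basic to dependent" idea, treating tags as tiers (the tag names and card records are illustrative, not Anki's actual data model):

```python
TIER_ORDER = ["basic", "intermediate", "expert"]

cards = [
    {"front": "Washington was the first president of...?",
     "tags": ["presidents", "intermediate"]},
    {"front": "What is an American president?",
     "tags": ["presidents", "basic"]},
    {"front": "Order the first five presidents by term",
     "tags": ["presidents", "expert"]},
]

def tier(card):
    # A card's tier is its lowest-ranked tier tag; cards with no tier
    # tag sort last, so they never jump the queue.
    ranks = [TIER_ORDER.index(t) for t in card["tags"] if t in TIER_ORDER]
    return min(ranks) if ranks else len(TIER_ORDER)

# Clear out "basic" first, then "intermediate", then "expert".
review_queue = sorted(cards, key=tier)
```

In practice, this is roughly what reviewing one tag's cards to empty before moving to the next tag accomplishes by hand.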

On the assumption that the Anki algorithm works, just "reviewing down" each tag to an empty queue and proceeding thus sequentially from tag to tag should work too. Even if by one Sunday you had forgotten the (basic) "what is an American president" fact, it might still be profitable to rehearse the "Washington was the first president" card that day, despite the "20 rules" mentioned somewhere above. Presumably, if you had forgotten what a president is, the appropriate card is probably going to appear for review in the next few days, and so with consistent (or even semi-consistent) use of Anki it would probably turn out alright. More for anecdote's sake: this reminds me of a time I burst out laughing out loud at the dictionary. I was reading "Three Men in a Boat" at the time, and there was one sentence in which I didn't know 2-3 of the words; the punchline clicked as I read the definition of the last of them.

Either way, somewhere higher up this comment thread I have also thought about the possibility (or rather, the lack thereof) of creating dependencies in Anki. I'm actually thinking of creating an add-on/plugin to enable that--- I'm learning Python these days (which Anki runs on), and I'm just about to start grad school (if I get admitted), so it seems like just the right time to make this (possibly major) meta-learning investment.*

* Not to mention that, since I'm learning Python, it's also a (non-meta) learning investment. Win-win.

Comment by Mark_Neznansky on A vote against spaced repetition · 2014-05-14T14:00:44.693Z · LW · GW

Just to comment on the last bit: it seems odd to me that you stress the "3 weeks BARE minimum" and the "crossing point at 3 to 6 months" as a con, when you have used SRS for three years. Given that SRS is used for retention, and assuming that 6 months is the "crossing point", one would think that after three years of consistent SRS use you'd reap a very nice yield.

I know it's metaphorical language, but it seems additionally ironic that the "BARE minimum" you stress equals your frequency of exams, while you disfavor cloze deletion's tendency to teach "guessing the teacher's password".

Is the advice perhaps against using SRS to learn/cram complex knowledge under a very limited time?

Comment by Mark_Neznansky on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2014-04-21T20:42:50.577Z · LW · GW

Thank you.

Comment by Mark_Neznansky on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2014-04-20T21:10:29.748Z · LW · GW

Being new to this whole area, I can't say I have a preference for anything, and I cannot imagine how any programming paradigm relates to its capabilities and potential. Where I stand, I'd rather be given a (paradigmatic, if you will) direction than be recommended a specific programming language given a programming paradigm of choice. But as I understand it, what you're saying is that if one opts for Haskell, he'd be better off going for F# instead?

Comment by Mark_Neznansky on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2014-04-19T23:33:48.270Z · LW · GW

I was thinking in a similar direction. From a biological perspective, computation seems to be a costly activity --- just think of the metabolic demand the brain puts on the human body. I assumed that it is very different with computers, however. I thought that the main cost of computation for computers, nowadays, is in size rather than energy. I might be wrong, but I assumed that even with laptops the monitor is a significant battery drain compared to the actual computing. (Sorry, mainly thinking out loud; I'd better read this and related posts more carefully. I'm glad to see the restriction on computations per unit of time, which I had thought was unbounded here.)

Comment by Mark_Neznansky on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2014-04-19T23:25:29.422Z · LW · GW

PS.

I. It probably doesn't add much to the consideration of the language of choice, but I thought I might as well add it: in my conceptualization of the game, the constitution of each agent is more than the "behavioral sheet" --- there are properties of several types that constitute an interface with the environment, and they affect the way the agent comes into interaction with other individuals and with the environment at large (mainly the former).

II. I'm speaking here of learning programming languages as if it were as easy as buying eggs at the corner store, but I wanted to mention that during my browsing, Haskell did come to my attention (I actually think I've seen the name on LW before, a long time ago, which drew further attention to it), and it did seem a worthwhile language for me to learn. The existence of Botworld now seems like further evidence that it is suited to one of my current main directions of inquiry with programming --- though I wonder whether, at this point, where I have little existing programming proficiency, it wouldn't be better to learn another language better suited to my task at hand?

Comment by Mark_Neznansky on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2014-04-19T23:00:10.542Z · LW · GW

Hey,

Sounds very cool, promising and enticing. I do have a technical question for you (or anybody else, naturally).

I was wondering how "intentional" the choice of Haskell was. Was it chosen mainly because it seemed the best-fitting programming language of all the familiar ones, or due to existing knowledge of/proficiency in it at the time the Botworld idea was formulated? How did cost/utility come into play here?

My inquiry is for purely practical, not theoretical, purposes--- I'm looking for advice. In the summer two years ago I was reading as much as I could about topics related to evolutionary psychology and behavioral ecology. During the same period, I was also working with my physics professor, modeling particle systems using Wolfram Mathematica. I think it was this concurrence that engendered in me the idea of programming a "game of life" similar to yours, yet different.

Back then, programming things in AutoHotkey and in Mathematica was as far as my programming went. Later that year I took a terribly basic Python course (concerned mainly with natural language processing), and that was about it. However, in the last couple of weeks I have returned to Python, this time taking the study of it seriously. It brought back the idea of my life game, but this time I feel like I can acquire the skills to execute the plan. I'm currently experiencing a sort of honeymoon period of excitement with programming, and I expect the next few months, at least, to be rather obligation-free for me and an opportune time to learn new programming languages.

I've read the above post only briefly (mainly due to time constraints--- I plan to read it and related posts soon), but it seems to me that our motivations and intentions with our respective games (mine being the currently non-existent one) are different, though there are similarities as well. I'm mainly interested in the (partially random) evolution/emergence of signaling/meaning/language/cooperation between agents. I've envisioned a grid-like game with agents that are "containers" of properties. That is, unlike Conway's game, where the progression is determined purely by the on-the-grid mechanics, but like yours (as I understand it), an individual agent is linked to an "instruction sheet" that lies outside the grid.

I think what differentiates my game from yours (and excuse me for any misunderstandings) is the "place" where the Cartesian barrier sits.[1] While in yours there is a completely outside "god" (and a point I had missed is whether the "player" writes a meta-language at t=0 that dictates how the command-issuing robot brain is modified, after which the game is left to propagate itself, or whether the player has finer turn-by-turn control), in mine the god simply creates the primordial soup and then stands watching. Mine is more like a toy, perhaps, as there is no goal whatsoever (the existential version?). To go with the Cartesian analogy, it's as if every agent in my game contains an array of pineal glands of different indices, each one mapped to a certain behavior (of the agent) and to certain rules governing how the gland interacts with other glands in the same agent. One of the "core" rules of the game is the way these glands are inherited by future agents from past agents.

What I had foreseen two years ago as the main obstacle to programming it remains my concern today, after I have acquired some familiarity with Python. I want the behavior building blocks (to which the agent's "glands" are mapped) to be as (conceptually) "reduced" as possible --- that is, for the complex behavior of the agents to be a phenomenon emerging from the complexity of interaction between the simple behaviors/commands --- and to be as mutable as possible. As far as I can tell, Python is not the best language for that.

While browsing languages on Wikipedia, I came across Lisp, which appealed to me since it (quoth Wikipedia) "treats everything as data" --- functions and statements are cut from the same cloth --- and it is further suggested there that it is well suited to metaprogramming. What do you (or anybody else in here) think? Also, quite apart from this pursuit, I intend to at least begin learning R. I suspect it won't have much relevance for the construction of the game (though perhaps for the analysis of actual instances of gameplay), but if it somehow enters into the consideration of the main language of choice--- well, here you go.
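For what it's worth, Python can approximate the code-as-data flavor to a degree, since functions are first-class values that can be stored in tables, inherited, and mutated. A toy sketch of the "glands" idea under that reading (everything here is hypothetical naming, not a design proposal):

```python
import random

# Primitive behaviors are plain functions: values that can be indexed,
# copied into offspring, and recombined.
def move_right(agent, world):
    agent["x"] += 1

def emit_signal(agent, world):
    world["signals"].append((agent["x"], "ping"))

PRIMITIVES = [move_right, emit_signal]

def make_agent():
    # Each "gland" is an index into the primitive table plus a firing weight.
    return {"x": 0,
            "glands": [(random.randrange(len(PRIMITIVES)), random.random())]}

def step(agent, world):
    for idx, weight in agent["glands"]:
        if random.random() < weight:
            PRIMITIVES[idx](agent, world)

def inherit(parent, mutation_rate=0.1):
    # Offspring copy the parent's glands; occasionally a new gland appears.
    glands = list(parent["glands"])
    if random.random() < mutation_rate:
        glands.append((random.randrange(len(PRIMITIVES)), random.random()))
    return {"x": 0, "glands": glands}

world = {"signals": []}
agent = make_agent()
for _ in range(10):
    step(agent, world)
child = inherit(agent)
```

In Lisp, the primitive table could hold (and rewrite) actual code rather than references to fixed functions, which is where the metaprogramming advantage would show.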

Thank you very much for your time,

[1] My point here is mainly to underscore what seem to be possible differences between your game and mine so that you could – if you will – advise me better about the programming language of choice.

Comment by Mark_Neznansky on Rationality Quotes April 2014 · 2014-04-19T05:13:41.583Z · LW · GW

I can't comment on the size (so LW is growing?), but I have a tingling memory that a long time ago (several years back) people did post LW quotes. Since LW hasn't existed that long, I suppose that was the case at its inception. I can't say for sure, but Eugine's post actually seems to suggest as much; otherwise they wouldn't have been "creeping into" the threads. Either way, it should be easy to check. I, too, think it is worthwhile to post LW quotes. I remember (I do!) reading them and being led to read the original articles whence they came.

Comment by Mark_Neznansky on How to Beat Procrastination · 2011-04-29T05:11:57.233Z · LW · GW

I suppose there have been studies of the placebo effect --- which I haven't read --- but just a thought: could it be that placebo treatment induces the placebo effect not only by making patients believe they perceive a positive effect, but by actually changing their behavior? Of course it depends on the problem being treated, but a placebo surely raises patients' expectation of getting better and thus raises their motivation to help themselves (per the procrastination equation).

Comment by Mark_Neznansky on How to Beat Procrastination · 2011-04-25T05:37:07.210Z · LW · GW

Do you know of any research that relates this to the "anti-" case? That is, how expectancy, "value", delay, and impulsiveness affect the evaluation of risk and potential future punishment, and how one behaves under that evaluation?

I wonder how this can be applied to actions one might perform that are shunned by society, such as crime. Perhaps it's basically the same case (we incorporate the risk and adverse effects into the value and expectancy), but it seems that there are two stages in such cases which make them more complex: there's the cost of performing the action, there's the expected reward of the action (with its own value, expectancy, etc.), and then there's the expected punishment exerted by society (with its own expectancy --- the probability of getting caught --- value/loss of value, etc.).

How do the temporal relations between the reward and the punishment affect the decision? The crime might have an immediate benefit, which means it comes before the punishment (if caught); or the crime might induce a permanent change to the world that can be enjoyed after the punishment (if the culprit is able to enjoy said change), so the reward comes after the punishment. Any thoughts/research about this? I used the example of crime, but this applies to any action taken "against society", or anything that invites an expected counter-action from one's surroundings. Dissidents, rebels, and the like can be inspected similarly.
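For concreteness, one way to phrase the question in the post's terms: score the reward and the punishment separately, each with its own expectancy, value, and delay, and compare. A sketch (I use a variant with 1 + impulsiveness x delay in the denominator so that zero delay stays finite; the exact form in the post may differ, and all the numbers are made up):

```python
def motivation(expectancy, value, impulsiveness, delay):
    # Temporal-motivation-theory style scoring, a variant of the
    # "procrastination equation" discussed in the post.
    return expectancy * value / (1 + impulsiveness * delay)

# An immediate reward weighed against a distant, uncertain punishment:
reward = motivation(expectancy=0.9, value=100, impulsiveness=1.0, delay=0)
punishment = motivation(expectancy=0.3, value=500, impulsiveness=1.0, delay=30)
net = reward - punishment  # positive here: the delayed punishment is discounted away
```

Flipping the delays (punishment now, reward only after serving it) discounts the reward term instead, which is the asymmetry between the two orderings described above.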

Comment by Mark_Neznansky on Just Try It: Quantity Trumps Quality · 2011-04-10T06:00:35.268Z · LW · GW

I wish to expand on your conclusions and look for their limits. This might be more relevant to the "Go Try Things" post, but since this is a kind of series of posts, I suppose it makes most sense to comment here.

So, data collection is good. But aside from getting one better at some area in which one seeks expertise or improvement, data collection is also good for discovering almost totally new facets of reality, territory outside the map's margins.

Data collection brings to light not only known unknowns, but unknown unknowns too. There's a risk involved, however. It seems that for the most part, the opportunity cost of researching unknown unknowns is greater than that of researching known unknowns: when practicing anything, the costs and possible benefits are pretty well known. You know what you have to do to get better at playing an instrument, building better robots, programming, or dancing tango. You also pretty much know what the fruits of that labor are (though perhaps not entirely, especially when they are many "quantum steps" away in terms of expertise).

On the other hand, when you consider whether to delve into some new unknown territory, you're less familiar with the costs (you don't know whether you'll enjoy "uncovering data" or not, for example) as well as with the possible utility. Say some person A is invited to a salsa dancing party or class. He considers the idea but decides not to go. He reasons that it will obviously take a few hours, which he could invest in more familiar activities that yield more utility than dancing; that it will probably carry some social costs, as does any unfamiliar new endeavor, especially one that involves moving one's body; that even if he enjoys it, he won't have the time to invest in more such occasions; and that doing it once won't be very useful; etc.

However, what if this person is unaware that salsa, were he to try it, would greatly benefit him? Elevate his spirit, exercise his body, and provide some new kind of social interaction that would benefit him on non-dancing social occasions; and that, if he decided to fully incorporate it into his life, it would provide excellent rest from his usual activity (say, his profession) and even benefit it in other ways?

So it must be beneficial to also collect data outside the map, to explore new frontiers and horizons. But there must be a limit to this. The great many activities the world offers could probably fill a few lifetimes (or maybe not?). Either way, there must be some point where more exploration is actually adverse in its effects, if no activity is engaged in more than superficially. So how can one decide whether to embark on exploration or not?

Of course, there is meta-data available on activities. There is some text on the internet for probably most tried-out activities, friends share their experience with things they've done, movies and books tell us about activities unknown to us, and so on. But would such data actually help a person decide whether to engage in an activity --- is it overwhelming enough to "change his mind" from not doing the activity to doing it? My guess is no. Most people (as noted in "Hold Off On Proposing Solutions") probably decide whether they want to engage in an activity upon first hearing about the opportunity; more than that, I suspect their decision is based less on the nature of the activity and more on the nature of the "activists", the people commonly engaged in it. Many activities produce some kind of culture around them, which can hardly be ignored. For an activity to exist it needs to be done, and if it is being done then someone must be doing it; so to imagine that activity one must imagine someone doing it, or imagine oneself as the kind of person who does it (of course, if it is taken more seriously, one can imagine the activity more "naturally", ignoring the nature of the other people who engage in it).

To actually decide whether to engage in some new activity, one needs to take the decision seriously. But then, to avoid "analysis paralysis", it would probably be easier just to start doing it instead of thinking about it (with the exception of activities with really high costs, such as exploring the South Pole or conceiving a child). But then again, there must be a limit to the number of "new things" a person can do. Some people are likely (have a high probability) to benefit greatly from exploration, while others are unlikely to benefit from it. How can one recognize which kind one is?

What do you think?

Comment by Mark_Neznansky on Just Try It: Quantity Trumps Quality · 2011-04-09T08:11:01.921Z · LW · GW

Wouldn't doing that (instead of writing up the whole argument in full) make you feel as if you've already achieved the materialization of the idea, hence reducing your motivation to write it up in the future (which might lead to never actually writing the text)?

Comment by Mark_Neznansky on Just Try It: Quantity Trumps Quality · 2011-04-09T05:57:18.381Z · LW · GW

A "class for fun" implies that grades shouldn't matter to the participants, so, allegedly, the two different grading schemes wouldn't affect the participants' behavior.

But things (such as motivation) change when a person who did pottery for fun at home goes to do pottery for fun in a class, don't they?

Comment by Mark_Neznansky on Rational Reading: Thoughts On Prioritizing Books · 2011-04-06T00:57:44.551Z · LW · GW

Assuming you're familiar with both, which one do you think works better? RescueTime or ManicTime?

Comment by Mark_Neznansky on Spaced Repetition Database for the Mysterious Answers to Mysterious Questions Sequence · 2010-07-02T00:07:55.762Z · LW · GW

Anki also allows tagging cards, so instead of splitting your database into different decks, you can split it into different tags within a single deck. This way you can review them all together, as well as review specific tags if the need arises.

Comment by Mark_Neznansky on Positive Bias Test (C++ program) · 2009-06-07T21:31:29.937Z · LW · GW

So it's not really about the laws themselves (being "mindless" or "mind") so much as the context in which the guessing/researching is done. Guessing a natural law known by a person in front of you is different from discovering it anew by yourself.

Comment by Mark_Neznansky on Positive Bias Test (C++ program) · 2009-05-20T14:44:13.635Z · LW · GW

What's the difference between one's mind laws and mindless "natural" laws?

Comment by Mark_Neznansky on Epistemic vs. Instrumental Rationality: Approximations · 2009-04-28T14:47:27.773Z · LW · GW

It seems to me you use the wrong wording. Contrary to the epistemic rationalist, the instrumental rationalist does not "gain" any "utility" from changing his beliefs; he gains utility from changing his actions. Since he can either prepare or not prepare for a meteoritic catastrophe, and not "half-prepare", I think the numbers you should choose are 0 and 1, not 0 and 0.5. I'm not entirely sure what different results that would yield, but I think it's worth mentioning.

Comment by Mark_Neznansky on Epistemic vs. Instrumental Rationality: Approximations · 2009-04-28T14:27:15.012Z · LW · GW

I admit that I've learned about the KL divergence just now, through the wiki link, and that my math in general is not so profound. But since it's not about the calculation but about the reasoning behind the calculation, I suppose I can have my say:

The wiki-entry mentions that

Typically P represents the "true" distribution of data, observations, or a precise calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.

So P here is 10^-18 and Q is either 0 or 0.5.
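To make the comparison concrete, here is a minimal sketch of the KL divergence for two-outcome (Bernoulli) distributions; the function is my own illustration, using the standard definition:

```python
import math

def kl_bernoulli(p, q):
    # KL(P || Q) in nats for two-outcome distributions. It is infinite
    # whenever Q assigns zero probability to an outcome P allows.
    total = 0.0
    for pi, qi in ((p, q), (1 - p, 1 - q)):
        if pi == 0:
            continue  # the 0 * log(0/q) term is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

p = 1e-18                       # the "true" probability
kl_half = kl_bernoulli(p, 0.5)  # about log 2, i.e. ~0.693 nats
kl_zero = kl_bernoulli(p, 0.0)  # infinite
```

So under KL itself, Q = 0 comes out infinitely worse than Q = 1/2; as I understand it, that is exactly the feature of the measure being argued over here.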

What your epistemic rationalist has done looks like falling prey to the bias of anchoring and adjusting. The use of mathematical equations just makes the anchoring mistake look more formal; it's no less wrong in any way. So while the instrumental rationalist might have a reason to choose the arbitrary figure of 1/2 (it makes his decisions simpler, for example), the epistemic rationalist does not.

If the epistemic rationalist were shown the two figures of 0 and 1/2 and asked which approximation is "better", he would probably say 0, and that for several reasons. First of all, being an epistemic rationalist and thus truth-seeking, he wouldn't use the KL equation at all. The KL takes something accurate (or true), P, and makes it less accurate (or less true), and that's exactly contrary to what he is seeking: more accurate and true results. But you tell me he has to choose between "0" and "1/2". Well, even if he has to choose one of these numbers, he will still not choose to use the KL equation. The wiki mentions that the Q in the equation typically stands for "... a theory, model, description, or approximation of P", while the number "1/2" in your example is none of these but an arbitrary number; this equation, then, does not fit the situation. He will use a different mathematical method (say, subtraction) and see which difference has the smaller absolute value, in which case it will be 0's. Also, since 1/2 and 0 are arbitrary numbers, an epistemic rationalist would know better than to use either of them in any equation, since it would produce a result exactly as accurate as if he had used any other two arbitrary numbers. He would know that he should do his own calculations, ignoring the numbers 0 and 1/2, and then compare his result to the numbers he is "offered" (0 and 1/2) and choose the one closest to his own calculation. Since he knows that the "true" probability is 10^-18, he will choose the number closest to that, which seems to be 0.

Of course, everything I said about "1/2" above holds true of "0" as well.

(I'm sorry in advance if my mathematical explanations are unclear or clumsy. If I explain arguments through math badly, then I explain arguments-through-math in English much worse, as I studied mathematics in a different language.)