Supermarkets, Stoners, and n-dimensional vector analysis
Go to the market and fill your shopping cart. Wait in line at the checkout counter. When the person behind the counter asks you if you have a "Club Card," you have a decision to make. If you give out that number you'll get a few bucks off catfood. But you've also pegged yourself in a table index. You are ROW 1 OF 1, and would you like 15 cents off this inferior brand of catfood next time you shop?
But you're a savvy one. You don't provide ID when asked. You NOSPAM your email address. You PGP sign all your email. And you definitely don't slide your "club card." And you think you're just an anonymous shopper, an opaque entity. No database row, you!
But you're wrong....
Have you ever asked a search engine to find "Pages Similar To This One?" At the core of that search engine may lie a document vector model. To understand how this works, consider that in order to index and search documents, a search engine must first tokenize each one. Semantic sugar (stopwords) like "the," "and," and "a" is stripped from the document, and some words are normalized (crying ⇒ cry). For each document d, the number of times each remaining token appears becomes one coordinate of a vector in an n-dimensional space.
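Here is a rough sketch of that tokenizing step in Python. Everything in it is an assumption made for illustration: the three-word stopword list, the normalize helper (a crude stand-in for a real stemmer that only knows how to chop off "ing"), and the term_counts name. No real search engine is this naive.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; real engines use much longer ones.
STOPWORDS = {"the", "and", "a"}

def normalize(token):
    """Crude stand-in for a stemmer: strip a trailing "ing" and nothing else."""
    if token.endswith("ing") and len(token) > 4:
        return token[:-3]
    return token

def term_counts(text):
    """Lowercase, split into words, drop stopwords, normalize, then count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(normalize(t) for t in tokens if t not in STOPWORDS)

# Each distinct token is one dimension; its count is the coordinate.
counts = term_counts("The cat and the crying cat")
print(counts)
```

The counter that comes back is the document's vector, stored sparsely: any token that never appears simply has a coordinate of zero.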
We live in a three-dimensional space, and points in our universe are therefore formed by linear combinations of vectors in three directions. Documents live in n-dimensional space. What is n? n is the total number of tokens that have been or ever can be found in any document (that's "all words, ever"). Every document has a special place in n-dimensional space. And just as you can find the closest supermarkets in three-dimensional space with a little trigonometry, so too can you find the nearest documents in n-dimensional space using a little n-gonometry. Those are the "Pages Similar To This One."
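What does the n-gonometry look like? One common choice, and it is my assumption here rather than a claim about any particular search engine, is cosine similarity: the cosine of the angle between two count vectors. A minimal sketch, with made-up documents and hypothetical function names:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as token -> count."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query, corpus):
    """Return the name of the corpus document closest in angle to the query."""
    return max(corpus, key=lambda name: cosine_similarity(query, corpus[name]))

# Made-up documents, already reduced to token counts.
corpus = {
    "pets": {"cat": 2, "food": 1},
    "cars": {"engine": 3, "oil": 1},
}
query = {"cat": 1, "food": 1}
print(most_similar(query, corpus))
```

Two documents that share no tokens score 0, and a document compared with itself scores 1, so "nearest" here means "pointing in nearly the same direction," regardless of document length.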
Speaking of supermarkets... What happens when you fill up your cart with groceries? Some catfood here, two red peppers, one green pepper, three cans of soup (spaghetti with meatballs), a gallon of milk, a Hershey's chocolate bar. Your shopping cart is more unique than you think. And when you go to checkout, the items in that cart become a vector in an n-dimensional space, where n is the number of products for sale in the supermarket. Sure, one week you don't need catfood, and decide to try a different brand of toothpaste. But for every cart, it's possible to ask "Find Me Shoppers Similar To This One." And chances are, those shoppers are you!
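To make the cart-as-vector idea concrete, here is a small sketch. The seven-item CATALOG is hypothetical (a real store's n runs to tens of thousands), the cart_vector name is mine, and I'm assuming "some catfood" means one unit.

```python
# Hypothetical catalog; a real supermarket's would list every product it sells.
CATALOG = [
    "catfood", "red pepper", "green pepper", "soup",
    "milk", "chocolate bar", "toothpaste",
]

def cart_vector(cart, catalog=CATALOG):
    """Return one quantity per catalog product, in catalog order."""
    return [cart.get(product, 0) for product in catalog]

# The cart from the paragraph above; the catfood quantity is a guess.
cart = {
    "catfood": 1, "red pepper": 2, "green pepper": 1,
    "soup": 3, "milk": 1, "chocolate bar": 1,
}
vector = cart_vector(cart)
print(vector)
```

Every product you didn't buy is a zero, which is why most of a real cart's vector is empty space.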
Supermarkets can use the "club card" and the vector space model simultaneously. Once your card identity is linked to your own little home in n-dimensional space, they can identify you by name next time... even if you don't swipe your card, and pay by cash.
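I have no idea how any actual supermarket does this, so treat the following as a thought experiment in code. It assumes each card holder's "home" is the average of their past carts, that closeness is measured by cosine similarity, and that a match needs a score of at least 0.8. The card numbers, the four-product catalog, and the threshold are all invented.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length quantity vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0

def centroid(vectors):
    """Average a shopper's past carts into a single profile vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def identify(cart, profiles, threshold=0.8):
    """Return the best-matching card holder, or None if nobody is close enough."""
    best_name, best_score = None, threshold
    for name, past_carts in profiles.items():
        score = cosine(cart, centroid(past_carts))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Invented purchase histories over a 4-product catalog:
# [catfood, peppers, soup, chips]
profiles = {
    "card-1001": [[1, 3, 3, 0], [0, 2, 3, 0]],
    "card-2002": [[0, 0, 0, 5], [0, 0, 1, 4]],
}
cash_cart = [1, 2, 3, 0]  # paid in cash, no card swiped
print(identify(cash_cart, profiles))
```

The threshold is what separates "that's probably you" from "a stranger who also likes soup"; set it too low and everyone gets somebody else's coupons.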
I make this point only to give perspective to an experience I had this week. My wife and I stopped in the supermarket to pick up some popcorn before watching a DVD with a friend. And realizing that we had not yet had dinner that night, we wandered the aisles self-indulgently, grabbing any snack food that caught our eyes. We checked out with shopping bags full of absolute crap. On the back of my receipt, there was an advertisement for Cheetos. We hadn't purchased any product confusingly similar to Cheetos...
And then I realized. We were the shopper most similar to The Stoner. Not just any stoner, but that Archetypal Stoner. That "aggregate" customer, that volume in n-dimensional space carved out by 2am munchie runs since time immemorial.
And dammit, I had just swiped my club card.
Comments
Yeah, privacy doesn't exist anymore. You ever read the book 'Snow Crash'?
Posted by: Nick Gerakines | February 4, 2006 02:05 PM
Hey, are you going to use that Cheetos coupon?
Posted by: Paul | February 23, 2006 12:41 PM