Scratchpad

Silos

23 Jan. 2008

I finally decided that after being in NY for four months it was time to check out at least one local lesbian hangout, albeit with trepidation. (Honestly, could there possibly be a less interesting scene in the entire world than a lesbian bar? If so, morbid curiosity impels me to keep looking for it.) So I popped on down to a little joint in the West Village, sidled up to the bar, turned to my left, and to my complete non-surprise saw another Sarah Lawrence grad from my class on the stool next to me. We'll call her H.

Needless to say we got to talking (if nothing else, it's hard to shut a Sarah Lawrence student up) on a whole host of topics, including words, neighborhoods, experiments in group ethics, pulling people off of subway tracks, girlfriends, the Internet, waiting tables, tiny apartments. You know, the usual topics for liberal arts geeks who get together. All in all, quite an enjoyable evening, and I had to scold myself for expecting it to turn out badly.

Even so, I was - and have remained - particularly discouraged by one aspect of the conversation. We were talking about words. I believe her friend asked whether a particular word was a real word or if she was making it up, and the bartender, who was a weird, veritable font of random trivia and aspired one day to be on Jeopardy, indicated that it was. Then the OED came up and we started talking dictionaries and made-up words. I suggested that if H. was really as intrigued and simultaneously disgusted by made-up words as she indicated, she should check out the Urban Dictionary.

Now, as an aside, I find Urban Dictionary wonderful and fascinating and hilarious, and although I used to be rather uptight about "proper English," I've since come to embrace how wonderful and rich language can be through the process of evolution. Sure, there are certain words that I abhor and refuse to use (I'm thinking mostly of business-ese here...the utilizes and synergies and concretizing), but, on the whole, I think that playfulness and ingenuity are admirable traits in all other areas of life, so why not with language?

So, having divulged that about myself, I shouldn't have to tell you that I was pretty crestfallen when the bartender piped in with, "Urban Dictionary? Oh my God, that is the worst fucking site ever. I fucking hate that site. It's an affront to the English language. It's illiterate, ghetto central," or something pretty similar.

I felt like I had to come to the site's rescue. "Well that's the point, isn't it? That's what makes it so fascinating. I mean, it's precisely because these aren't the types of words you would use that it's so wonderful. This isn't stuff you're going to hear from most of your friends or the circles you normally run in, it gives you a little window into a whole different culture." Which is true, but I know that my defense came across as pretty lame.

And then, what the fuck, out of the blue, H. comes at me with something about how everyone thinks the Internet is so wonderful but it's not, because it leaves out whole swaths of the population. Essentially, she came at me with the access argument, but couched in slightly different terms. Her implication was: how could I be defending the Internet as being great when there are so many minorities who aren't represented on it? (As if that's the Internet's fault and not society's fault, but I haven't quite got to that part of the story yet.)

Now, that's fine, that's all well and good, I've heard that argument a million times before. But the part that really slapped me at the time, and which has only been seeping more persistently into every pore and nagging the hell out of me ever since, is that we had just been discussing a website frequented by and essentially made by these very same "underrepresented minorities" she was purporting to defend, and she and everyone else in the conversation were trashing the site as being an illiterate piece of shit. But more to the point, they were making it very, very clear that they absolutely, positively do not go to those sorts of websites.

So the bit that's really started to nag at me is just how accurate is the party line on underrepresented groups on the Internet? And I do mean party line - I hear this argument stated as fact all the time on any number of the mailing lists I'm on, in articles in the Times, on Slate, Personal Telco...everywhere. (And by everywhere, I mean everywhere there are liberal, white, educated folk who have the white man's burden to make sure everyone gets access, or at least talk about it whether or not they are actually trying to do anything about it.) After watching the conversations in such places carefully over the last several years, after studying topics like viral marketing, after listening to endless political rhetoric, I've become keenly aware of how truths wax and wane and become more true within closed communities and it's become very, very hard for me to accept anything as fact just because I hear it a lot. If anything, the more I hear something the more suspect it becomes in my mind. I've come to hate taking anything for granted, least of all my own beliefs.

So that's it. It's been bothering me like mad ever since I had this conversation. I've been on some crazy sites during my love affair with the Internet. Forums haunted by professional mercenaries. Social networks made up of miners and welfare moms. Porn sites. I came across a blog the other day by and for perfume industry professionals that practically bordered on scent fetishism. Black power sites. Sites from school kids in rural Appalachia. Sites from African-American expats living in Africa. Latino dating sites. I even looked at a couple of neo-Nazi sites once...for a few minutes, anyway. Even I have my limits. But mostly I just stick to my little corner of things and chat with the folks I know and who have similar interests to my own. And that's the part that bothers me. Here we are talking about underrepresented this and unfair that and no access and human rights, and from what I've seen (and this extends beyond H., to be fair to her) most of the people doing the talking aren't willing to explore the very corners of the Internet they claim don't exist.

So is it that they don't exist? Or are they there and are we just too damn ignorant and self-important to know they do exist? Do we actually want to know they exist? Or maybe we want them to have access, but only if the pages they make look and sound white?

The art of search

21 Nov. 2007

Recently, I've been thinking more and more about the skills that go into an effective search, prompted in particular by my efforts to search for high-quality, academic pages written in Spanish. I was looking, specifically, for sites on the history of information.

In English, this would be a no-brainer: I'd start out with "history of (information|epistemology|knowledge)," based on my assumption that a person discussing the topic I was interested in would actually use the complete phrase "history of X." After 5 seconds of skimming result titles, I'd probably tack on something like -"history of information technology." Tweak, play, rinse, repeat. In Spanish, however, I immediately ran up against several problems:

  1. Word frequency.
    I don't actually have any idea how common particular words or phrases are in Spanish. I can do a direct translation, turning "history of information" into "historia de la informacion" (taking the time to actually use accent marks, which I am not doing right now), but I don't know if a native Spanish speaker would use this particular turn of phrase. I don't know how often they would use it. I don't know how often one word is used over another.
  2. Word pairings.
    Just as certain words are more common than others, so do certain words appear near others with degrees of regularity. Using Spanish, I don't have a clue what words might be used in conjunction with one another.
  3. Synonyms.
    Related to word frequency, I don't know which synonym for a word would be used in the context I am looking at. I don't know what particular words are considered high-brow, low-brow, pedantic, or slangy. If I'm looking at academic pages, I can assume the word chingar (fuck) will not appear, but what is the preferred alternative? "Relaciones sexuales?"
  4. Tone, source.
    In the unlikely event that I actually found a page that seemed to discuss what I was interested in, I found I had no way to evaluate the source using methods I am accustomed to. Normally a person reads something and decides, using multiple, subtle clues, whether the author seems reliable. Have they made absurd statements? Do you agree with their political sensibilities? Do they seem well-spoken, or practically illiterate? Is their language pretentious, plain, educated, insane? Is there a good flow? Does it just "sound right?" Using Spanish, I had no way to actually perform these actions.
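
For comparison, the English routine described before the list (try each "history of X" phrase, then tack on an exclusion) can be sketched in a few lines of Python. This is only an illustration: `build_queries` is a hypothetical helper name, and the quoted-phrase and `-"..."` exclusion syntax is taken from the example above rather than from any particular search engine's documentation.

```python
def build_queries(terms, exclusions=()):
    """Return one quoted phrase query per term, each followed by the exclusions.

    Hypothetical helper: it only builds query strings, it does not run a search.
    """
    # Each exclusion becomes a -"phrase" suffix, as in the example in the text.
    suffix = "".join(f' -"{phrase}"' for phrase in exclusions)
    return [f'"history of {term}"{suffix}' for term in terms]


queries = build_queries(
    ["information", "epistemology", "knowledge"],
    exclusions=["history of information technology"],
)
for query in queries:
    print(query)
```

The Spanish problem is that I have nothing sensible to put in the `terms` and `exclusions` lists: the mechanics are trivial, the vocabulary is not.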

Of course, this is important even for people who are ostensibly fluent in the language they are performing their searches in. I noticed earlier today that someone landed on my site using the search "animal planet mouse that fucks till its heart explodes" (without quotes). After getting over my initial reaction (something along the lines of, "dear god, what the fuck?"), I immediately started picking apart the search itself. Why didn't the user put "animal planet" in quotes as a phrase? Why didn't the user replace fucks with something more appropriate to an Animal Planet audience, like copulates, mates, or just good ol' sex? Why didn't the user notice, before going to the effort to click through to my site, that the Google preview of that result shows that my site has nothing to do at all with the topic being searched for?
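
The two fixes I wished on that searcher (quote the known phrase, swap the slang for tamer alternatives) are mechanical enough to sketch. `rewrite_query` is a hypothetical helper, and the `(a|b)` grouping is just the shorthand I used above for alternatives, not a syntax any given search engine is guaranteed to accept.

```python
import re


def rewrite_query(raw, phrases=(), substitutions=None):
    """Quote known phrases and replace single words with (a|b) alternatives.

    Hypothetical helper for illustration; it rewrites the string only.
    """
    query = raw
    for word, alternatives in (substitutions or {}).items():
        group = "(" + "|".join(alternatives) + ")"
        # A function replacement avoids any escape handling in the new text.
        query = re.sub(rf"\b{re.escape(word)}\b", lambda _m, g=group: g, query)
    for phrase in phrases:
        quoted = f'"{phrase}"'
        query = re.sub(rf"\b{re.escape(phrase)}\b", lambda _m, q=quoted: q, query)
    return query


better = rewrite_query(
    "animal planet mouse that fucks till its heart explodes",
    phrases=["animal planet"],
    substitutions={"fucks": ["copulates", "mates"]},
)
print(better)
```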

My second reaction was to marvel at how totally inappropriate all the results were, and to realize that blogging has had a profound effect on the usability of search results. The default setting for most blogs presents 10 posts on one page - 10 posts that often have little if anything to do with each other. 10 posts which are, in essence, completely different documents, but which search engines are treating as a single entity. Couple this with the fact that most blog posts are informal, written on the fly, and not carefully planned for things like SEO, and it's a wonder people can find anything.
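
Here is a toy version of that failure, as a rough sketch. The three "posts" are invented for illustration, and the all-terms-present matching rule is an assumption that is far cruder than whatever a real search engine does. The point is only that a page of unrelated posts can satisfy a query that no single post satisfies.

```python
def matches(text, query_terms):
    """True if every query term appears as a whole word in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query_terms)


# Invented example posts that happen to share one blog front page.
posts = [
    "watched a great animal planet special last night",
    "my heart explodes every time the subway stalls",
    "a mouse got into the apartment again",
]
page = " ".join(posts)  # the whole front page treated as one document
query = ["animal", "planet", "mouse", "heart", "explodes"]

page_hit = matches(page, query)                       # page-level indexing
post_hits = [p for p in posts if matches(p, query)]   # post-level indexing

print("page matches:", page_hit)
print("posts matching:", len(post_hits))
```

Indexed as one entity, the page looks like a hit; indexed post by post, nothing on it matches, which is the honest answer.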