Scratchpad

Slightly improved KTRU Top 35 list


4 Aug. 2007

I realized my last list cut out a huge swath of much older Top 35 lists, so I've modified my script slightly. It's still ugly and definitely not perfect, but, really, how much time do I have to play with perfecting this (don't ask)?


wget -w3 -r -l1 -IOLDLIST --no-parent http://bang.rice.edu/top35archive.shtml


rm -f test.txt; for i in * ; do sed -n '/op 35:/,$p' "$i" >> test.txt ; done
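For anyone wondering what that sed range actually does, here is a rough sketch on a made-up page. The `sample_page` function and its contents are invented for illustration; the real archive pages will look different.

```shell
# Made-up stand-in for one downloaded page; real pages will look different.
sample_page() {
    printf '%s\n' \
        '<html><body>' \
        'navigation junk' \
        'Top 35: some week<br>' \
        '1. Some Band - Some Album<br>' \
        '</body></html>'
}

# -n suppresses normal output; the p command prints only the lines in the
# range from the first match of "op 35:" through the last line ($).
sample_page | sed -n '/op 35:/,$p'
```

Everything above the "Top 35:" line gets dropped, and the last three lines of the sample are what land in test.txt.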


sed 's/<br>/%/g' test.txt | tr '%' '\n' | tr "[:upper:]" "[:lower:]" | egrep -v "top 35" | sed -e 's/- /:: /g; s/ \/ / :: /g; s/   / :: /g' | egrep '::' | sed -e 's/^[0-9+] :: //g; s/^.[0-9] :: //g; s/^[0-9+]. //g; s/^.[0-9]. //g; s/^ //g' | sed -e :a -e '$!N;s/\n[^a-z0-9]/ /;ta' -e 'P;D' | tr -s " " | sed -e 's/<[a-z0-9 /"=]*>//g; s/\r//g; s/: ::/ ::/g;' | sort | uniq > top35.txt
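That one-liner is a lot to take in, so here is a stripped-down sketch of its main stages: split on `<br>`, lowercase, drop the heading lines, turn the artist/album dash into `::`, strip the rank, and de-duplicate. The `normalize_entries` function name, the simplified rank-stripping expression, and the sample input are my assumptions for illustration; this is not the full pipeline and it skips the line-joining and tag-stripping steps.

```shell
# Simplified stand-in for the long pipeline; reads raw page text on stdin.
normalize_entries() {
    sed 's/<br>/%/g' |                  # mark each <br> with a placeholder
        tr '%' '\n' |                   # turn the placeholders into newlines
        tr '[:upper:]' '[:lower:]' |    # lowercase so duplicates compare equal
        grep -v 'top 35' |              # drop the "top 35: ..." heading lines
        sed -e 's/- /:: /g' |           # "artist - album" becomes "artist :: album"
        grep '::' |                     # keep only lines that look like entries
        sed -e 's/^[0-9]*\. *//' |      # strip a leading rank such as "12. "
        sort -u                         # sort and remove duplicate entries
}

# Made-up input: two "weeks" that share one entry.
printf '%s\n' \
    'Top 35: Some Week<br>1. Some Band - Some Album<br>2. Other Act - Other Record<br>' \
    '1. Some Band - Some Album<br>' |
    normalize_entries
```

On the sample input this leaves two lines, "other act :: other record" and "some band :: some album", with the repeated entry collapsed into one.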

Apologies if it comes out like crap or doesn't work for anyone. The special characters may or may not appear properly in the browser. If you run it and it doesn't work, give me a yell. Or, just look at the finished list.
