parsing another website

theice · Post by **theice** » Sun Mar 23, 2008 12:48 am

set title [lrange $text 0 end]

putserv "PRIVMSG $c :$title:" 
regexp {<td><b>"<a href="/wiki/.*?" title="$title">.*?</a>"</b></td>(.*?)</tr>} $data - data
regexp {<td><b><a href="/wiki/.*?" title="(.+?)">.*?</a></b></td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $data - artist guitar bass drums vocals band
putserv "PRIVMSG $c :by-$artist , Difficulties: Guitar-$guitar , Bass-$bass , VoX-$vocals , Drums-$drums , Band-$band"

http::cleanup $data
			
}

working partially:

http://en.wikipedia.org/wiki/List_of_songs_in_Rock_Band

I'm trying to grab the information from the site. The problem is, it's using different types of HTML coding for each title =[

Code:

[00:47] <@|ICE|> .song Black Hole Sun
[00:47] <+ICEdrop> Black Hole Sun:
[00:47] <+ICEdrop> by-Jet (band) , Difficulties: Guitar-Tier 6 , Bass-Tier 6 , VoX-Tier 7 , Drums-Tier 5 , Band-Tier 6

Instead of grabbing the row for the correct $title, it grabs the very first one, "Are You Gonna Be My Girl".

speechles · Post by **speechles** » Sun Mar 23, 2008 11:13 pm

theice wrote:

Code:

regexp {<td><b>"<a href="/wiki/.*?" title="$title">.*?</a>"</b></td>(.*?)</tr>} $data - data

This is wrong and will never work: substitution does not take place within curly braces, so $title reaches regexp as literal text instead of its value. The type of regexp you want is known as a dynamic regexp. Look at the wikipedia/wikimedia portion of the unofficial google script, which uses these for #subtag look-ups. To use one correctly, first build your regexp into a variable, using double quotes so that $title is substituted, then pass that variable to regexp.

Code:

set dynregex "<td><b>\"<a href=\"/wiki/.*?\" title=\"$title\">.*?</a>\"</b></td>(.*?)</tr>"
if {![regexp "$dynregex" $data - data]} {
  #notfound
} {
  #found
}

Notice, you MUST escape quotes within other quotes, but within curly braces there is no need.
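
To make the difference concrete, here is a minimal sketch of the brace-versus-quote behaviour. The $data string below is a made-up one-row fragment, not the real Wikipedia markup, and the variable names safe, braced, bracedMatch, found and row are only for illustration. The metacharacter-escaping step is an added precaution that was not in the code above: a title such as "Jet (band)" would otherwise be read as pattern syntax.

```tcl
# Hypothetical sample data; the real page markup may differ.
set title "Black Hole Sun"
set data {<tr><td><b>"<a href="/wiki/Black_Hole_Sun" title="Black Hole Sun">Black Hole Sun</a>"</b></td><td>Tier 6</td></tr>}

# Braces: no substitution, so the pattern holds the literal text $title.
set braced {title="$title"}

# Escape regexp metacharacters in the user-supplied title (added precaution).
set safe [regsub -all {[][\\.*+?^$(){}|]} $title {\\&}]

# Quotes: $safe is substituted, and inner quotes must be backslash-escaped.
set dynregex "<td><b>\"<a href=\"/wiki/.*?\" title=\"$safe\">.*?</a>\"</b></td>(.*?)</tr>"

# The braced pattern cannot match; the dynamic one captures the rest of the row.
set bracedMatch [regexp $braced $data]
set found [regexp $dynregex $data - row]
```

With this sample, bracedMatch should be 0 and found should be 1, with the remaining cells of the matching row in $row.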

also, what is the purpose of this beauty?!

Code:

set title [lrange $text 0 end]

Remember, do not confuse lists with strings, or vice versa. When you do, unexpected behavior occurs, and you will be constantly fighting it later with code kludges and messy filters to compensate. It's always better to do it correctly to begin with.

Code:

set title [join [lrange [split $text] 0 end]]

Notice the split (to protect against special characters mischievous users may try as input), then an lrange on the list that split creates, and afterwards a join to turn this list back into a string. Remember, the #1 rule of Tcl: never confuse a list and a string.
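
A small sketch of why the split matters, assuming a user types a title containing an unbalanced brace (the input string and the variable names rawFailed and words are made up for illustration). Handing the raw string to a list command raises an error, while splitting first yields a well-formed list.

```tcl
# Hypothetical user input with an unbalanced open brace.
set text "Are You \{Gonna Be My Girl"

# Treating the raw string as a list fails: it is not a well-formed list.
set rawFailed [catch {lrange $text 0 end} err]

# split turns the string into a valid list; join turns it back into a string.
set words [split $text]
set title [join [lrange $words 0 end]]
```

Here rawFailed should be 1 (the error message is left in $err), while $title ends up as the same text the user typed.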

metroid · Post by **metroid** » Wed Apr 02, 2008 3:23 pm

Though you told him how to use split and join properly, you still didn't fix that nasty lrange.

Using lrange $var 0 end is the exact same as not doing anything at all.

In this case, you can just use set title $text, because "set title [join [lrange [split $text] 0 end]]" quite simply gives the exact same result.
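
A quick check of that claim, as a sketch with made-up inputs and illustrative variable names. For ordinary space-separated text the round trip gives back the original string. One caveat worth knowing: split with no second argument splits on any whitespace and join puts single spaces back, so a tab in the input comes out as a space.

```tcl
# Ordinary input: the split/lrange/join round trip changes nothing.
set text "Black Hole Sun"
set viaList [join [lrange [split $text] 0 end]]
set same [string equal $viaList $text]

# Caveat: a tab is treated as a separator and rejoined as a space.
set tabbed "Black\tHole"
set roundTrip [join [split $tabbed]]
```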

egghelp/eggheads community
