parsing another website

theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

Code: Select all

set title [lrange $text 0 end]

putserv "PRIVMSG $c :$title:" 
regexp {<td><b>"<a href="/wiki/.*?" title="$title">.*?</a>"</b></td>(.*?)</tr>} $data - data
regexp {<td><b><a href="/wiki/.*?" title="(.+?)">.*?</a></b></td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $data - artist guitar bass drums vocals band
putserv "PRIVMSG $c :by-$artist , Difficulties: Guitar-$guitar , Bass-$bass , VoX-$vocals , Drums-$drums , Band-$band"

http::cleanup $data
			
}
working partially:

http://en.wikipedia.org/wiki/List_of_songs_in_Rock_Band

I'm trying to grab the information from the site. The problem is that it uses different types of HTML coding for each title =[

Code: Select all

[00:47] <@|ICE|> .song Black Hole Sun
[00:47] <+ICEdrop> Black Hole Sun:
[00:47] <+ICEdrop> by-Jet (band) , Difficulties: Guitar-Tier 6 , Bass-Tier 6 , VoX-Tier 7 , Drums-Tier 5 , Band-Tier 6
Instead of grabbing the row for the correct $title, it grabs the very first one, "Are You Gonna Be My Girl".
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Re: parsing another website

Post by speechles »

theice wrote:

Code: Select all

regexp {<td><b>"<a href="/wiki/.*?" title="$title">.*?</a>"</b></td>(.*?)</tr>} $data - data
This is wrong and will never work within curly braces: no substitution takes place inside braces, so regexp sees the literal text $title instead of the variable's value. The type of regexp you want is known as a dynamic regexp. Look at the wikipedia/wikimedia portion of the unofficial google script; it uses these for #subtag look-ups. To use them correctly, first build your regexp into a variable, using double quotes so that $title is substituted.

Code: Select all

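# Double quotes let $title be substituted; the literal quotes in the HTML must be escaped.
set dynregex "<td><b>\"<a href=\"/wiki/.*?\" title=\"$title\">.*?</a>\"</b></td>(.*?)</tr>"
if {![regexp $dynregex $data - data]} {
  # not found
} else {
  # found: $data now holds the rest of the matching table row
}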
Notice, you MUST escape quotes within other quotes, but within curly braces there is no need.
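
One caveat with a dynamic regexp is that any regexp metacharacters in $title (parentheses, dots, question marks) become part of the pattern. A minimal sketch of one way to guard against that, assuming the row layout from the pattern above; the sample $data, the name safeTitle, and the variable rest are made up for illustration and are not from the original script:

```tcl
# Hypothetical sample standing in for the fetched page; the real markup may differ.
set data {<tr><td><b>"<a href="/wiki/Some_Song" title="Some Song">Some Song</a>"</b></td><td>first row</td></tr><tr><td><b>"<a href="/wiki/Reaper" title="(Don't Fear) The Reaper">(Don't Fear) The Reaper</a>"</b></td><td>second row</td></tr>}

set title "(Don't Fear) The Reaper"

# Backslash-escape every non-word character so the title is matched literally
# instead of being interpreted as regexp syntax.
regsub -all {\W} $title {\\&} safeTitle

# Build the pattern in double quotes so $safeTitle is substituted.
set dynregex "<td><b>\"<a href=\"/wiki/.*?\" title=\"$safeTitle\">.*?</a>\"</b></td>(.*?)</tr>"

if {[regexp $dynregex $data -> rest]} {
    puts "rest of row: $rest"
} else {
    puts "title not found"
}
```

Without the regsub step, the parentheses in this title would be treated as a capture group and the literal "(" in the page would never match.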

Also, what is the purpose of this beauty?!

Code: Select all

set title [lrange $text 0 end] 
Remember, do not confuse lists with strings, or vice versa. When you do, unexpected behavior occurs, and you will be constantly fighting it later with code kludges and messy filters to compensate. It's always better to do it correctly to begin with.

Code: Select all

set title [join [lrange [split $text] 0 end]]
Notice the split (to protect against special characters that mischievous users may try as input), then an lrange on the list that split creates, and afterwards a join to turn this list back into a string. Remember, the #1 rule of Tcl: never confuse a list and a string.
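
A small sketch of why this matters, using a made-up input string with a stray brace (the variable names failed, err, and words are illustrative only):

```tcl
# Hypothetical user input containing an unbalanced brace.
set text "play {Black Hole Sun"

# Treating the raw string as a list fails, because the brace is never closed.
set failed [catch {lrange $text 0 end} err]
puts "lrange on raw string: failed=$failed ($err)"

# split always produces a valid list, whatever characters the input contains.
set words [split $text]
puts "word count: [llength $words]"
puts "joined back: [join [lrange $words 1 end]]"
```

The list commands only become safe once split has turned the string into a proper list; join then turns the selected elements back into a plain string.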
metroid
Owner
Posts: 771
Joined: Wed Jun 16, 2004 2:46 am

Post by metroid »

Though you told him how to use split and join properly, you still didn't fix that nasty lrange.

Using lrange $var 0 end is the exact same as not doing anything at all.

In this case, you can just use set title $text, because "set title [join [lrange [split $text] 0 end]]" quite simply gives the exact same result.
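
A quick check of that equivalence, as a sketch with made-up input. It holds when the words are separated by spaces; since split breaks on any whitespace by default while join inserts spaces, a tab in the input would come back as a space:

```tcl
# Space-separated input (note the double space) survives the round trip unchanged.
set text "Black Hole  Sun"
set title [join [lrange [split $text] 0 end]]
set sameForSpaces [expr {$title eq $text}]
puts "space-separated input unchanged: $sameForSpaces"

# A tab is split on as whitespace but re-joined with a space, so the result differs.
set tabbed "Black\tHole"
set sameForTabs [expr {[join [split $tabbed]] eq $tabbed}]
puts "tab-separated input unchanged: $sameForTabs"
```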