
parsing this website

Help for those learning Tcl or writing their own scripts.
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

parsing this website

Post by theice »

I'm new to Tcl, as you know...

but I couldn't find out anywhere how to easily open a connection to a website and parse the information.

http://www.rockband.com/dlc is the website I want to open.

Code:

</style><div class="dlc_week">=== Week 17 ===</div>

<tr class="dlc_info_row">
<td>Black Tide</td>
<td>Shockwave</td>
</tr>

<tr class="dlc_info_row">
<td>Paramore</td>
<td>Crushcrushcrush</td>
<td>Master</td>
</tr>

<tr class="dlc_info_row">
<td>Serj Tankian</td>
<td>Beethoven's C***</td>
<td>Master</td>
</tr>
is the information on the website I want to grab, i.e. set into variables like this:

Code:

</style><div class="dlc_week">$week</div>
<tr class="dlc_info_row">
<td>$artist</td>
<td>$song</td>
<td>$type</td>
</tr>
Last edited by theice on Sat Mar 15, 2008 5:57 am, edited 1 time in total.
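For the "open a connection" part, Tcl ships with the http package. The sketch below is a minimal, synchronous example, not the only way to do it; the helper name fetch_page is made up for illustration and the 10-second default timeout is an arbitrary choice. One detail worth noting: http::cleanup takes the token returned by http::geturl, not the page text.

```tcl
package require http

# fetch_page is a hypothetical helper name (not part of eggdrop or of the
# http package). It downloads $url synchronously and returns the page body.
proc fetch_page {url {timeout 10000}} {
    # geturl can raise an error itself (e.g. for an unsupported URL type);
    # -timeout is in milliseconds.
    set token [http::geturl $url -timeout $timeout]
    set status [http::status $token]
    set code   [http::ncode $token]
    set body   [http::data $token]
    # cleanup takes the token, not the page body; it frees the request state.
    http::cleanup $token
    if {$status ne "ok"} {
        error "fetching $url failed: $status"
    }
    if {$code ne "200"} {
        error "fetching $url returned HTTP status $code"
    }
    return $body
}
```

In a script you would call something like `set data [fetch_page http://www.rockband.com/dlc]` and run the regexps on $data. Keep in mind that a synchronous geturl stalls the bot until the request finishes, that https URLs additionally need the tls package registered with http::register, and that the rockband.com URL is the one from 2008 and may no longer serve this markup.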
rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Re: Connecting to websites in tcl

Post by rosc2112 »

theice wrote: but I couldn't find out anywhere how to easily open a connection to a website and parse the information.
Are you kidding?

You mean, out of all the hundreds of web-parsing script topics here, and all the web-parsing scripts in the archive, you couldn't find one example of how to grab/parse a website?
DragnLord
Owner
Posts: 711
Joined: Sat Jan 24, 2004 4:58 pm
Location: C'ville, Virginia, USA

Post by DragnLord »

Do you want all the weeks, or just the latest?
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Re: Connecting to websites in tcl

Post by theice »

rosc2112 wrote:
theice wrote: but I couldn't find out anywhere how to easily open a connection to a website and parse the information.
Are you kidding?

You mean, out of all the hundreds of web-parsing script topics here, and all the web-parsing scripts in the archive, you couldn't find one example of how to grab/parse a website?

Sorry, I guess I didn't look around very much. I'm dumb.

I think it would be cool to have just the current week. Once I figure out how to do that, maybe in the future I'll write something like "Last 3 Weeks of Downloadable Content"!
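For a "last 3 weeks" feature later on, one option is to pair every week header with the table that follows it. This sketch assumes each <div class="dlc_week"> is followed by its own <table class="dlc_table" ...>, which is how the markup quoted in this thread appears to be laid out; dlc_weeks is a hypothetical name. All quantifiers are non-greedy on purpose: in Tcl's regexp engine, a pattern's overall preference follows the first quantifier in it, so mixing greedy and non-greedy atoms can give surprising matches.

```tcl
# dlc_weeks is a hypothetical helper. ASSUMPTION: every
# <div class="dlc_week"> is followed by its own <table class="dlc_table" ...>.
# Returns a flat list: week1 table1 week2 table2 ... for at most $max weeks.
proc dlc_weeks {html {max 3}} {
    set pattern {<div class="dlc_week">(.*?)</div>.*?<table class="dlc_table".*?>(.*?)</table>}
    set result {}
    # With two capture groups, -all -inline yields triples:
    # full match, group 1 (week header), group 2 (table contents).
    foreach {whole week table} [regexp -all -inline $pattern $html] {
        if {[llength $result] >= 2 * $max} break
        lappend result [string trim $week] $table
    }
    return $result
}
```

The result can be walked with `foreach {week table} [dlc_weeks $data 3] { ... }`, parsing each table's rows separately.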
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

This is easier than I thought.

So far I've been able to grab the week via:

Code:

regexp {<div class="dlc_week">(.*?)</div>} $data - week
Now I just have to figure out how to make it read the other info.

I'm having trouble getting the artist / song / type, if someone can help me.
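A small refinement on the week regexp above: regexp returns 1 when the pattern matches and 0 when it doesn't, and it only sets the capture variables on a match. Checking that return value keeps a later putserv from sending a stale or unset $week if the page layout changes. dlc_current_week is a hypothetical helper name, and `->` is just a conventional name for the full-match variable, like the `-` used above.

```tcl
# dlc_current_week is a hypothetical helper: it returns the text of the first
# <div class="dlc_week"> on the page, or an empty string if there is none.
proc dlc_current_week {html} {
    # "->" receives the whole match; "week" receives capture group 1.
    if {[regexp {<div class="dlc_week">(.*?)</div>} $html -> week]} {
        return [string trim $week]
    }
    return ""
}
```

Usage would look like `set week [dlc_current_week $data]` followed by `if {$week ne ""} { putserv "PRIVMSG $c :$week" }`.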
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

Still not having any luck with the multi-line parsing.

It's hard:

Code:

regexp {<div class="dlc_week">(.*?)</div>} $data - week
putserv "PRIVMSG $c :$week" 

regexp {<tr class="dlc_info_row">
        <td>(.+?)</td>} $data - artist
    putserv "PRIVMSG $c :$artist"

http::cleanup $data

That was what I was trying, with no luck.
rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Post by rosc2112 »

Try this, as one line:

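Code:

regexp {<tr class="dlc_info_row">.*?<td>(.*?)</td>} $data - artist
Newlines are just another char in regexp (and in Tcl, "." matches a newline by default, so .*? will happily skip across line breaks). When you split the pattern over two lines inside braces, the newline and the indentation become literal parts of the pattern, so the page has to contain exactly that whitespace or the match fails. Escaping the newline with a backslash won't help either: inside braces, a backslash-newline and the whitespace after it are turned into a single space.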
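The difference is easy to see in a standalone tclsh session. The sample string below is made up (a tab where the page might use something else); it shows the two-line pattern failing, the one-line .*? version matching, and the -line option switching off "." matching newlines.

```tcl
# Made-up sample: the <td> line is indented with a tab.
set html "<tr class=\"dlc_info_row\">\n\t<td>Paramore</td>"

# Pattern split across two lines: it contains a literal newline followed
# by the exact spaces typed before <td>, so a tab on the page won't match.
set literal {<tr class="dlc_info_row">
        <td>(.+?)</td>}

# One-line pattern: .*? skips whatever whitespace sits between the tags.
set flexible {<tr class="dlc_info_row">.*?<td>(.+?)</td>}

set m1 [regexp $literal $html -> artist1]
set m2 [regexp $flexible $html -> artist2]
# With -line, "." stops matching newlines, so .*? can't cross the line break.
set m3 [regexp -line $flexible $html -> artist3]
puts "$m1 $m2 $artist2 $m3"   ;# prints: 0 1 Paramore 0
```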
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

Code:

regexp {<tr class="dlc_info_row">.*?<td>(.*?)</td>.*?<td>(.*?)</td>.*?<td>(.*?)</td>} $data - artist title type

ended up being the code, thanks to you, rosc2112...

But I can't come up with a good way to keep grabbing more songs until I get to </table>.

This is how the page is laid out:

Code:

<tr class="dlc_label_row">
<td width="45%">Band</td>
<td width="40%">Song</td>
<td width="15%">Type</td>
</tr>
<tr class="dlc_info_row">
<td>*</td> #Artist
<td>*</td> #Title
<td>*</td> #Type
</tr>
<tr class="dlc_credits_row">
<td colspan="3"></td>
</tr>
<tr class="dlc_info_row">
<td>*</td> #Artist
<td>*</td> #Title
<td>*</td> #Type
</tr>
<tr class="dlc_credits_row">
<td colspan="3"> “CrushCrushCrush” as performed by Paramore courtesy of Warner Music Group<br />
Hayley Williams and Josh Farro<br />
2007 WB Music Corp. (Ascap), But Father, I Just Want To Sing Music (ASCAP), FBR Music (ASCAP) And Josh's Music (ASCAP) All rights reserved.  Used by permission</td>
</tr>
<tr class="dlc_info_row">
<td>*</td> #Artist
<td>*</td> #Title
<td>*</td> #Type
</tr>
<tr class="dlc_credits_row">
<td colspan="3"> “Beethoven’s C***” as performed by Serjical Strike courtesy of Warner Music Group<br />Serj Tankian<br />2007 Stunning Suppository Sounds (BMI)All Rights Administered By Warner Tamerlane Publishing Corp.<br />
All rights reserved.  Used by permission</td>
</tr>
</table>
The *'s are what I want. The problem is that this page is updated each week: the format will stay the same, but the number of songs won't, so I don't know what to do.
user
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Post by user »

Code:

foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</tr>} $data] {
	# put your regexp to extract details from $tr here
}
Have you ever read "The Manual"?
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

Code:

regexp {<div class="dlc_week">(.*?)</div>} $data - week
putserv "PRIVMSG $c :$week" 

foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</table>} $data] { 
regexp {<tr class="dlc_info_row">.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $data - artist title type 
putserv "PRIVMSG $c :$artist - $title ($type)"
}
http::cleanup $data
			
}
Definitely not working right.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Code:

foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</tr>} $data] { 
    regexp {<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $tr - artist title type 
    putserv "PRIVMSG $c :$artist - $title ($type)"
}
You goofed user's help. This corrects your goof. Don't change it this time.
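One caveat with a three-group regexp: it only matches rows that really have three cells. The first post shows a Black Tide row with just two, and for such a row the regexp fails and $artist, $title and $type keep whatever they held before (or are unset on the first pass). Below is a sketch that collects however many <td> cells each row has instead of assuming three; dlc_rows is a hypothetical name, and the pattern assumes the info-row cells are plain <td> tags without attributes, as in the markup quoted here.

```tcl
# dlc_rows is a hypothetical helper. It returns one list per
# <tr class="dlc_info_row">, containing the trimmed text of each <td>,
# however many cells the row has.
proc dlc_rows {html} {
    set rows {}
    foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</tr>} $html] {
        set cells {}
        # One capture group, so -all -inline yields pairs: full match, cell text.
        foreach {whole cell} [regexp -all -inline {<td>(.*?)</td>} $tr] {
            lappend cells [string trim $cell]
        }
        lappend rows $cells
    }
    return $rows
}
```

Each row can then be formatted with something like `join $row " - "`, or checked with `llength $row` before assuming a type column exists.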
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

That works great, but what I was trying to figure out is how to make it output only the first week's information. Right now it keeps going until it runs out of songs, when I want it to stop after it reaches the first </table>.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Code:

# your header
regexp {<div class="dlc_week">(.*?)</div>} $data - week
putserv "PRIVMSG $c :$week"

# slim our html down to just that one week
# (argument order: pattern, string, match var, capture var)
regexp {<table class="dlc_table".*?>(.*?)</table>} $data - data

# parse as usual, one <tr> at a time
foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</tr>} $data] {
  regexp {<tr class="dlc_info_row">.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $tr - artist title type
  putserv "PRIVMSG $c :$artist - $title ($type)"
}
edit: corrected obvious mistake... future glances can see this as correct now.
Last edited by speechles on Mon Mar 17, 2008 11:52 pm, edited 1 time in total.
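Putting the pieces together, here is a sketch that only looks at the first dlc_table and returns the lines to announce, skipping any info row that lacks all three cells. dlc_first_week_lines is a hypothetical name, and it relies on the same page-layout assumptions as the snippets in this thread (week div first, then a table with class dlc_table).

```tcl
# dlc_first_week_lines is a hypothetical helper. It returns a list of
# ready-to-send lines: the week header, then "artist - title (type)" for
# each info row in the FIRST <table class="dlc_table" ...> only.
proc dlc_first_week_lines {html} {
    set lines {}
    if {[regexp {<div class="dlc_week">(.*?)</div>} $html -> week]} {
        lappend lines [string trim $week]
    }
    # Keep only the contents of the first dlc_table; stop if there is none.
    if {![regexp {<table class="dlc_table".*?>(.*?)</table>} $html -> table]} {
        return $lines
    }
    foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</tr>} $table] {
        # Rows without three cells don't match and are skipped.
        if {[regexp {<td>(.*?)</td>.*?<td>(.*?)</td>.*?<td>(.*?)</td>} $tr -> artist title type]} {
            lappend lines "$artist - $title ($type)"
        }
    }
    return $lines
}
```

In the bot, the result could be sent with `foreach line [dlc_first_week_lines $data] { putserv "PRIVMSG $c :$line" }`, where $data holds the page body and $c the channel, as in the code above.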
theice
Voice
Posts: 36
Joined: Thu Mar 13, 2008 4:20 pm

Post by theice »

Your way didn't completely work, but thanks, you got me where I needed to be =]

Code:

regexp {<div class="dlc_week">(.*?)</div>} $data - week
putserv "PRIVMSG $c :$week"

regexp {<table class="dlc_table".*?>(.*?)</table>} $data - data

foreach tr [regexp -all -inline {<tr class="dlc_info_row">.*?</tr>} $data] {
  regexp {<tr class="dlc_info_row">.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $tr - artist title type
  putserv "PRIVMSG $c :$artist - $title ($type)"
}
http::cleanup $token ;# $token is an assumed name for the value returned by http::geturl; cleanup takes the token, not the page body
			
}