Help with regexp/regsub

Help for those learning Tcl or writing their own scripts.
Metuant
Voice
Posts: 3
Joined: Sat Jul 28, 2007 6:08 pm

Post by Metuant »

Hi,

I'm trying to parse some information from a website using regsub and regexp, but I'm completely useless at regexp, and now that they've updated their website the regexp no longer works.

The block of information I'm trying to parse (which is sometimes repeated multiple times - hence the while in the code) is:

<tr>
<td class='tablebottom'><img src="/img/member.gif" alt="[M]"/></td>
<!--name--><td class='tablebottom'>Abyssal whip</td>
<td class="tablebottom" title="Former average price: 1,650,000gp [decreased by 50,000gp]"><img src="/img/market/p_d.gif" alt="This price has decreased" /></td>
<!--price--><td class="tablebottom">1,550,000gp - 1,650,000gp</td>
<td class="tablebottom" width="20"><a href="/priceguide.php?report=45&par=" title="Report Incorrect Price"><img src="/img/!.gif" alt="[!]" border="0" /></a></td>
<td class="tablebottom"><a href="/priceguide.php?category=45">Obsidian & Abyssal</a></td>
</tr>

</table></form><br />

I'm trying to grab the item name (Abyssal whip) and its price (1,550,000gp - 1,650,000gp), using:

Code:

	                
		while {[regexp "<!--name--><td class=\'tablebottom\'>(.*?)</td>\n\n<!--price--><td class=\"tablebottom\">(.*?)</td>\n<td class=\"tablebottom\" width=\"20\">" $data junk tname tprice]} {
			
			regsub "<!--name--><td class=\'tablebottom\'>[addslashes $tname]</td>\n\n<!--price--><td class=\"tablebottom\">[addslashes $tprice]</td>\n<td class=\"tablebottom\" width=\"20\">" $data - data
			if {$i == 0 || ([string match [string tolower [string range $item 0 1]] [string tolower [string range $tname 0 1]]] && [string length $tname] < [string length $name])} {
				set name $tname
				set price $tprice

I'm assuming that you can't just use \n\n to skip the line of useless data between the name and price cells, as I'd hoped.

Any help is appreciated
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Perhaps sanitize the data before you attempt to parse it, so that newlines, carriage returns, tabs, etc. are eliminated before you get to that step.

Code:

regsub -all "\t" $data "" data
regsub -all "\n" $data "" data
regsub -all "\r" $data "" data
regsub -all "\v" $data "" data
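
As an alternative to the four regsub calls, the same clean-up could probably be done in one pass with string map. This is only a sketch of a swapped-in technique, not what the post uses, and the sample string is made up for illustration:

```tcl
# Hypothetical sample input; the real $data would come from the fetched page.
set data "<tr>\r\n\t<td>Abyssal whip</td>\v\n</tr>"

# Map tab, newline, carriage return and vertical tab to the empty string
# in a single pass over the data.
set data [string map [list "\t" "" "\n" "" "\r" "" "\v" ""] $data]

puts $data
```

Either form leaves the markup on one line, so the pattern no longer has to account for line breaks.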

Once the whitespace is stripped, you can use a non-greedy wildcard (.*?) to skip over whatever sits between the name and price cells. This snippet should work:

Code:

      while {[regexp "<!--name--><td class=\'tablebottom\'>(.*?)</td>.*?<!--price--><td class=\"tablebottom\">(.*?)</td>.*?<td class=\"tablebottom\" width=\"20\">" $data junk tname tprice]} {

         regsub "<!--name--><td class=\'tablebottom\'>[addslashes $tname]</td>.*?<!--price--><td class=\"tablebottom\">[addslashes $tprice]</td>.*?<td class=\"tablebottom\" width=\"20\">" $data - data
         if {$i == 0 || ([string match [string tolower [string range $item 0 1]] [string tolower [string range $tname 0 1]]] && [string length $tname] < [string length $name])} {
            set name $tname
            set price $tprice
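
Since the snippet above is cut off and relies on the poster's own addslashes helper, here is a minimal self-contained sketch of the same idea. It assumes the markup looks like the sample in the question; the second row, and the variable names pattern and items, are made up for illustration. It uses regexp -all -inline to collect every name/price pair in one call, instead of matching and then blanking out each hit with regsub:

```tcl
# Sample markup modelled on the block in the question. The second row is
# invented so that the loop has more than one item to find.
set data {<tr>
<!--name--><td class='tablebottom'>Abyssal whip</td>
<td class="tablebottom" title="Former average price: 1,650,000gp"><img src="/img/market/p_d.gif" /></td>
<!--price--><td class="tablebottom">1,550,000gp - 1,650,000gp</td>
<td class="tablebottom" width="20"></td>
</tr>
<tr>
<!--name--><td class='tablebottom'>Example item</td>
<td class="tablebottom" title="x"></td>
<!--price--><td class="tablebottom">100gp - 200gp</td>
<td class="tablebottom" width="20"></td>
</tr>}

# Strip tabs, newlines, carriage returns and vertical tabs first.
foreach ch [list "\t" "\n" "\r" "\v"] {
    regsub -all $ch $data "" data
}

# Every quantifier is non-greedy, so .*? only skips as far as the next
# price cell rather than running on to a later row.
set pattern {<!--name--><td class='tablebottom'>(.*?)</td>.*?<!--price--><td class="tablebottom">(.*?)</td>}

# -all -inline returns a flat list: full match, name, price, repeated
# once per matching row.
set items {}
foreach {match tname tprice} [regexp -all -inline $pattern $data] {
    lappend items $tname $tprice
    puts "$tname: $tprice"
}
```

The name-selection logic from the original loop (the $i, $item and $name checks) is left out here; it would go inside the foreach body.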
Metuant
Voice
Posts: 3
Joined: Sat Jul 28, 2007 6:08 pm

Post by Metuant »

Thanks for the help - it works great :]