RegExp help please [SOLVED]

Post by **Wannabe** » Thu May 03, 2007 2:48 pm

Hey, I'm still learning regular expressions and I'm pretty much stumped on this one. Several people have tried to help me already and it's just not right yet. I'm using the http package to read a website and then parse it for a specific piece of info, which is stored in a table.

The two lines that I'm looking at are:

<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>

What I need to do is check that the previous line contains Kills per Death:, and then pull the value from the next line, which in this case is 0.6193. Several sections of the table are identical apart from the text Kills per Death:, which is why I need to check both lines.

I have tried many different regexps to get this working, and all return nothing.

I would be very grateful for any help.

Post by **Sir_Fz** » Thu May 03, 2007 2:59 pm

Code:

regexp {\d+\.\d+} {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} value

This will store 0.6193 in the variable value.

Post by **Wannabe** » Thu May 03, 2007 3:07 pm

That gives me a result of 4.01. I'm not sure where it's getting that from, but it's not correct.

The entire source that the regexp needs to search through is the source of this page: http://ns.wireplay.co.uk/hlstats.php?mo ... &player=55

I hope that helps. I really don't understand the regexp you gave me at all; I struggle to get my head around it.

EDIT:
OK, I found that it matches the 4.01 in this line:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Again, I don't really have a clue how the regexp works, or I'd try fixing it myself.
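The 4.01 result can be reproduced offline. This is only a sketch: the variable `page` is a made-up two-line stand-in for the real page source, built from lines quoted in this thread. It shows that regexp returns the leftmost match in whatever input it is given, so on a full page the DOCTYPE line wins.

```tcl
# Hypothetical stand-in for the page source (two lines quoted in this thread).
set page {<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}

# regexp stops at the first (leftmost) match in the input.
regexp {\d+\.\d+} $page first
puts $first

# -all -inline returns every match as a list, which shows what else the pattern hits.
set all [regexp -all -inline {\d+\.\d+} $page]
puts $all
```

With this sample the first puts prints 4.01 and the second prints both 4.01 and 0.6193, so the pattern itself is fine; it just needs to be applied to the right line.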

Post by **Sir_Fz** » Thu May 03, 2007 3:13 pm

Well, \d matches any digit, \. matches a literal period '.', and + (or {1,}) means one or more of the preceding item. An alternative regexp you can try is:

Code:

regexp {<.+><.+>(.+)<.+><.+>} {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} grbg value

grbg receives the whole match and can be ignored; $value should contain 0.6193.
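To make the capture group easier to follow, here is a small sketch run against the single line from the first post. The second pattern, `>([^<>]+)<`, is not from the thread; it is an assumed variant that avoids the greedy `.+` by only accepting text that contains no angle brackets.

```tcl
set line {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}

# The first variable after the input receives the whole match; the next one
# receives whatever the first parenthesised group captured.
regexp {<.+><.+>(.+)<.+><.+>} $line whole value
puts $value

# Assumed variant: [^<>]+ cannot cross a tag boundary, so it captures the first
# piece of text that sits between a closing '>' and an opening '<'.
# "->" is just a conventional throwaway variable name for the whole match.
regexp {>([^<>]+)<} $line -> cell
puts $cell
```

Both commands yield 0.6193 on this line. Neither checks for Kills per Death:, so they still have to be applied to the correct line of the page.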

Post by **Wannabe** » Thu May 03, 2007 3:24 pm

I think I explained it badly. The regexp has to work on the entire source of the website, not just the two lines I posted; that's the reason I wanted to match the words Kills per Death:, so that I was sure it was the right data.

The problem I have is that I don't know how to deal with the two separate lines. But thanks for explaining that regexp, it actually makes sense to me now.

Post by **Sir_Fz** » Thu May 03, 2007 3:30 pm

If you had provided code, it would've been easier. The concept is simple; this should explain it:

Code:

# variable $lines is a list containing the html source
set notFound 1
foreach line $lines {
 if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
  set notFound 0
 } elseif {!$notFound} {
  regexp {\d+\.\d+} $line value
  break
 }
}
# $value contains the number.
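The same two-line check can also be sketched as a single regexp over the unsplit source, since `\s*` in a Tcl regexp also matches the newline between the two rows. This is an alternative under assumptions, not the method used in this thread: the variable `html` is a made-up stand-in built from lines quoted above, and the pattern assumes the value cell always follows the label cell in exactly this td/font layout.

```tcl
# Hypothetical stand-in for the unsplit page source.
set html {<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}

set ratio ""
# Match the label, the end of its cell, any whitespace (including the newline),
# the opening tags of the next cell, and capture the number that follows.
if {[regexp {Kills per Death:</font></td>\s*<td[^>]*><font[^>]*>(\d+\.\d+)<} $html -> ratio]} {
    puts "Kills per Death: $ratio"
} else {
    puts "label not found"
}
```

Because the label is part of the pattern, the 4.01 in the DOCTYPE line can no longer be picked up by mistake.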

Post by **Wannabe** » Thu May 03, 2007 3:49 pm

I've attempted to do what you suggested, but it never seems to find Kills per Death:.

I'm wondering if I've split the file wrong. I've written $::html to a text document, and it comes out exactly as it is in the source, so I'm not sure why it wouldn't work. My code is:

Code:

set notFound 1
set lines [split $::html \n]
foreach line $lines {
    if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
        set notFound 0
    } elseif {!$notFound} {
        putquick "PRIVMSG $chan : Line $line found"
        regexp {\d+\.\d+} $line value
        putquick "PRIVMSG $chan : Value is $value"
        break
    }
}

Post by **Sir_Fz** » Thu May 03, 2007 6:36 pm

Worked fine for me; I tested it in tclsh:

Code:

proc bla {} {
 set url "http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55"
 set token [::http::geturl $url]
 set content [::http::data $token]
 ::http::cleanup $token
 set notFound 1
 foreach line [split $content \n] {
  if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
   set notFound 0
  } elseif {!$notFound} {
   regexp {\d+\.\d+} $line value
   puts $value
   break
  }
 }
}

% package require http
2.5.2
% bla
0.6649

Post by **Wannabe** » Thu May 03, 2007 8:10 pm

Yep, sorry about that. I accidentally deleted a character when removing some junk code, which made it check the wrong html variable, hence no result. It's all fixed and working now, thanks.

Post by **Sir_Fz** » Thu May 03, 2007 8:54 pm

This is typically a faster way to grab the information, since lsearch finds the label line without a Tcl-level loop:

Code:

proc blo {} {
 set url "http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55"
 set token [::http::geturl $url]
 set content [split [::http::data $token] \n]
 ::http::cleanup $token
 if {[set i [lsearch -glob $content {*Kills per Death:*}]]!=-1} {
  regexp {\d+\.\d+} [lindex $content [incr i]] value
  puts $value
 }
}
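For reuse with other rows of the table, the lsearch idea could be wrapped in a small helper that works on content already fetched. This is only a sketch: the proc name `statAfterLabel` and the `sample` data are hypothetical, and the helper assumes the label contains no glob special characters (`*`, `?`, `[`, `]`, `\`). Unlike the version above, it returns an empty string instead of raising an error when the number is missing.

```tcl
# Hypothetical helper: return the decimal number found on the line that follows
# the first line containing $label, or "" if either the label or the number is missing.
proc statAfterLabel {lines label} {
    set i [lsearch -glob $lines "*$label*"]
    if {$i == -1} {
        return ""
    }
    # lindex yields "" past the end of the list, so regexp simply fails there.
    if {[regexp {\d+\.\d+} [lindex $lines [expr {$i + 1}]] value]} {
        return $value
    }
    return ""
}

# Made-up stand-in for the split page source (lines quoted in this thread).
set sample [split {<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} \n]

puts [statAfterLabel $sample "Kills per Death:"]
```

In the blo proc above, the same helper could be called with the list stored in $content.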