This is the new home of the egghelp.org community forum.

RegExp help please [SOLVED]

Help for those learning Tcl or writing their own scripts.
Wannabe
Voice
Posts: 17
Joined: Fri Feb 10, 2006 1:02 pm


Post by Wannabe »

Hey, I'm still learning regular expressions and I'm pretty stumped on this one. Several people have already tried to help me, but it's not right yet. I'm using the http package to fetch a website and then parse it for a specific piece of information, which is stored in a table.

The two lines I'm looking at are:

<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>

What I need to do is check that the first line contains "Kills per Death:", and then pull the value from the next line, which in this case is 0.6193. Several sections of the table are identical apart from the "Kills per Death:" text, which is why I need to check both lines.

I have tried many different regexps to get this working, and all of them return nothing.

Any help would be greatly appreciated.
Last edited by Wannabe on Thu May 03, 2007 8:11 pm, edited 1 time in total.
Sir_Fz
Revered One
Posts: 3794
Joined: Sun Apr 27, 2003 3:10 pm
Location: Lebanon

Post by Sir_Fz »

Code:

regexp {\d+\.\d+} {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} value
This will store 0.6193 in the variable value.
Last edited by Sir_Fz on Thu May 03, 2007 3:07 pm, edited 1 time in total.

Post by Wannabe »

That gives me a result of 4.01. I'm not sure where it's getting that from, but it's not correct.

The text the regexp needs to search through is the entire source of this page, if that's any help: http://ns.wireplay.co.uk/hlstats.php?mo ... &player=55

I really don't understand the regexp you gave me at all; I struggle to get my head around it.


EDIT:
OK, I found that it matches the 4.01 in this line:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Again, I don't really have a clue how the regexp works, or I'd try fixing it myself.
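To see why the 4.01 comes back: without the -all switch, regexp reports only the leftmost match in the string it is given, and the DOCTYPE line comes before the table. The sketch below assumes a hypothetical variable `sample` that stands in for the page source (just the DOCTYPE plus the two table lines from the first post); `-all -inline` lists every match, which is a handy way to check what a pattern actually hits.

```tcl
# Hypothetical stand-in for the page source: the DOCTYPE comes first,
# then the two table lines from the question.
set sample {<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}

# Without -all, regexp stops at the leftmost match in the whole string.
regexp {\d+\.\d+} $sample first
puts $first                                  ;# prints 4.01

# With -all -inline, every match is returned as a list.
set all [regexp -all -inline {\d+\.\d+} $sample]
puts $all                                    ;# prints 4.01 0.6193
```

So the pattern itself is fine; it just needs something (like the "Kills per Death:" label) to tell it which number is wanted.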

Post by Sir_Fz »

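Well, \d matches any digit, \. matches a literal period '.', and + (the same as {1,}) means one or more of the preceding item. So \d+\.\d+ matches one or more digits, a period, then one or more digits, and regexp returns the first such match in the string, which on the full page is the 4.01 in the DOCTYPE. An alternative regexp you can try is: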

Code:

regexp {<.+><.+>(.+)<.+><.+>} {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} grbg value
$value should contain 0.6193.
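The individual tokens can be checked one string at a time. In this sketch the test strings are made up for illustration; `^` and `$` anchor the pattern so the whole string has to match, and the second pattern spells `+` out as `{1,}` to show that both behave the same:

```tcl
# \d   = one digit
# \.   = a literal period (a bare . would match any character)
# +    = one or more of the preceding item, the same as {1,}
# ^..$ = anchors, so the whole string must match
set results [dict create]
foreach s {0.6193 4.01 12 .5 1.} {
    set plus   [regexp {^\d+\.\d+$} $s]
    set bounds [regexp {^\d{1,}\.\d{1,}$} $s]
    dict set results $s [list $plus $bounds]
    puts [format "%-7s + gives %d, {1,} gives %d" $s $plus $bounds]
}
```

Only 0.6193 and 4.01 match: "12" has no period, and ".5" and "1." are missing digits on one side of it.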

Post by Wannabe »

I think I explained it badly: the regexp has to work on the entire source of the website, not just the two lines I posted. That's why I wanted to match the words "Kills per Death:", so that I could be sure I had the right data.

The problem is that I don't know how to deal with two separate lines. But thanks for explaining that regexp; it actually makes sense to me now :)

Post by Sir_Fz »

It would have been easier if you had posted your code. The concept is simple; this should explain it:

Code:

# $lines is a list of the lines of the HTML source, e.g. [split $html \n]
set notFound 1
foreach line $lines {
 if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
  set notFound 0
 } elseif {!$notFound} {
  regexp {\d+\.\d+} $line value
  break
 }
}
# $value contains the number.
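If you would rather handle both lines with one pattern, a single regexp can match across the line break, because in Tcl \s also matches newlines (and carriage returns, if the page uses CRLF line endings). This is a sketch of a different technique from the loop above, not what was used in this thread; `getKillsPerDeath` is a hypothetical name, and the pattern assumes the exact markup quoted in the first post, so it will stop matching if the site changes its HTML:

```tcl
# Hypothetical helper: returns the number in the cell that follows the
# "Kills per Death:" cell, or "" if the expected markup is not found.
proc getKillsPerDeath {html} {
    # Kills per Death:</font></td>   the label cell, matched literally
    # \s*                            the line break between the two td lines
    # <td[^>]*><font[^>]*>           the next cell's opening tags, any attributes
    # ([^<]+)                        the cell text, captured up to the next tag
    set re {Kills per Death:</font></td>\s*<td[^>]*><font[^>]*>([^<]+)</font>}
    if {[regexp $re $html -> value]} {
        return [string trim $value]
    }
    return ""
}

# Stand-in for the page source: the two lines from the first post.
set sample {<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}

puts [getKillsPerDeath $sample]   ;# prints 0.6193
```

The trade-off is that the pattern is tied to the exact tags, while the line-by-line loop only cares about the label and the number.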

Post by Wannabe »

I've attempted to do what you suggested, but it never seems to find the "Kills per Death:" text.

I'm wondering if I've split the file wrong. I've written $::html to a text file, and it comes out exactly as it is in the source, so I'm not sure why it wouldn't work. My code is:

Code:

set notFound 1
set lines [split $::html \n]
foreach line $lines {
    if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
        set notFound 0
    } elseif {!$notFound} {
        putquick "PRIVMSG $chan : Line $line found"
        regexp {\d+\.\d+} $line value
        putquick "PRIVMSG $chan : Value is $value"
        break
    }
}
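One thing worth guarding against in the code above: if the regexp on the following line does not match, `value` is never set and reading `$value` raises a `can't read "value": no such variable` error, and if the label is never found, nothing at all is printed. A minimal sketch of a guarded version, assuming a hypothetical helper `findAfterLabel` that returns the number or an empty string. It uses `string first` for the label (a swap from the original regexp, so the label needs no escaping), and the eggdrop `putquick` calls are left as comments so the sketch runs in plain tclsh:

```tcl
# Hypothetical helper: finds the line containing $label and returns the
# first decimal number on the line after it, or "" if either step fails.
proc findAfterLabel {html label} {
    set found 0
    foreach line [split $html \n] {
        if {!$found} {
            # Plain substring test; no regexp escaping needed for the label.
            if {[string first $label $line] >= 0} {
                set found 1
            }
        } elseif {[regexp {\d+\.\d+} $line value]} {
            return $value
        } else {
            # The label was found but the next line holds no number.
            return ""
        }
    }
    # Label never seen (or it was on the last line).
    return ""
}

# Inside the eggdrop proc this might be used as:
#   set kpd [findAfterLabel $::html "Kills per Death:"]
#   if {$kpd eq ""} {
#       putquick "PRIVMSG $chan :Kills per Death not found"
#   } else {
#       putquick "PRIVMSG $chan :Value is $kpd"
#   }
```

Returning an empty string keeps the failure visible in the channel instead of silently doing nothing or throwing an error from inside the bind.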

Post by Sir_Fz »

Worked fine for me; I tested it in tclsh:

Code:

proc bla {} {
 set url "http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55"
 set token [::http::geturl $url]
 set content [::http::data $token]
 ::http::cleanup $token
 set notFound 1
 foreach line [split $content \n] {
  if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
   set notFound 0
  } elseif {!$notFound} {
   regexp {\d+\.\d+} $line value
   puts $value
   break
  }
 }
}
% package require http
2.5.2
% bla
0.6649
Last edited by Sir_Fz on Thu May 03, 2007 8:58 pm, edited 1 time in total.

Post by Wannabe »

Yep, sorry about that. I accidentally deleted a character when removing some junk code, which made it check the wrong html variable, hence no result. It's all fixed and working now, thanks :)

Post by Sir_Fz »

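This should be a much faster way to grep the information, since lsearch scans the whole list internally instead of running a Tcl-level foreach with a regexp on every line: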

Code:

proc blo {} {
 set url "http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55"
 set token [::http::geturl $url]
 set content [split [::http::data $token] \n]
 ::http::cleanup $token
 if {[set i [lsearch -glob $content {*Kills per Death:*}]]!=-1} {
  regexp {\d+\.\d+} [lindex $content [incr i]] value
  puts $value
 }
}