Regexp Problem

Help for those learning Tcl or writing their own scripts.
Jarek
Voice
Posts: 3
Joined: Mon Nov 19, 2007 10:41 am

Post by Jarek »

Hi Folks.

I'd like to get the profile id value out of this line:

Code: Select all

<td align="center"><a class="profil_link" href="javascript:;" onclick="window.open('/profile/index.php?profile_id=20129','_blank','width=730,height=600,status=no,toolbars=no,scrollbars=yes');"><img class="td_border" src="/pictures/60x80/11-07/20129_47400a57eefb8.jpg" width="60" height="80" border="0" alt="jaroslove"></a></td>
How do I have to build the regular expression to get the value "20129"?

Thanks.
rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Post by rosc2112 »

Assuming the data is in a var called $html:

Code: Select all

regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch exactmatch

# the data you want will be in $exactmatch var.
Jarek
Voice
Posts: 3
Joined: Mon Nov 19, 2007 10:41 am

Post by Jarek »

Hm, $exactmatch is empty after doing this.

My proc looks like this:

Code: Select all

proc poloniaflirt::internalCom { suche } {

  set fullmatch ""
  set exactmatch ""
  set log1 [open pf.txt a]
  set log2 [open reg.txt a]
  set pfsearchurl "http://www.polonia-flirt.de/search/index.php"
  set pfquery [::http::formatQuery sea_nickname "$suche" send "send"]
  set page [http::config -useragent "Mozilla/4.0 (compatible\; MSIE 6.0\; Windows NT 5.0)"]
  set page [::http::geturl $pfsearchurl -query $pfquery -timeout $poloniaflirt::pftimeout]
  set html [::http::data $page]
  puts $log1 "$html"
  close $log1
  regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch exactmatch
  puts $log2 "$exactmatch"
  close $log2
  return $page

}
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Jarek wrote:

Code: Select all

regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch exactmatch

Code: Select all

regexp {'/profile/index\.php\?profile_id=(.*?)'} $html fullmatch exactmatch
You need to \escape the period(.) and you need to \escape the question mark(?)
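
To make the difference concrete, here is a minimal sketch that runs both patterns against a shortened copy of the line from the first post. The variable names (unescaped, escaped, id2, id3) and the shortened sample string are only for illustration, and the third pattern using \d+ is an alternative that is not from the thread.

```tcl
# Sample input: a shortened version of the line from the first post (illustrative only).
set html {<a class="profil_link" onclick="window.open('/profile/index.php?profile_id=20129','_blank');">}

# Unescaped: "php?" means "ph" plus an optional "p", so the literal "?" in the URL is never matched.
set unescaped [regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch id1]

# Escaped: "\." and "\?" match the literal dot and question mark.
set escaped [regexp {'/profile/index\.php\?profile_id=(.*?)'} $html fullmatch id2]

# Alternative (an assumption, not from the thread): anchor on the parameter name and capture digits only.
set digits [regexp {profile_id=(\d+)} $html fullmatch id3]

puts "unescaped matched: $unescaped"
puts "escaped matched:   $escaped (id: $id2)"
puts "digits matched:    $digits (id: $id3)"
```

Checking the return value of regexp (1 on a match, 0 otherwise) before using the capture variable also avoids silently logging an empty string when nothing matched.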
rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Post by rosc2112 »

I don't think that would make much difference, as both . (dot) and ? are wildcard chars, so it should have matched the string.

Is the var $html empty of data? You don't handle any error conditions, so it could be that the data is not being retrieved.

Here is an example of getting html data and handling error conditions, then fishing out the data you want:

Code: Select all

set xeurl "http://www.xe.com/ucc/convert.cgi"
set xequery [::http::formatQuery Amount "$amount" From "$fromcur" To "$tocur"]
if {[catch {set page [::http::geturl $xeurl -query $xequery -timeout $xeutimeout]} error]} {
        # geturl failed (e.g. "couldn't open socket"), so $page was never set and there is nothing to clean up
        puthelp "PRIVMSG $nick :Error: couldn't connect to XE.com. Try again later"
        return
}
if { [::http::status $page] == "timeout" } {
        puthelp "PRIVMSG $nick :Error: Connection timed out to XE.com."
        ::http::cleanup $page
        return
}
set html [::http::data $page]
::http::cleanup $page

if {[regexp {>Live rates at (.*?)</span>} $html match xetime]} {
        #some of the IF above has been deleted for this example
        # manipulate the data:
        regsub -all {<!.*?>} $fromamount {} fromamount
        regsub -all {<!.*?>} $toamount {} toamount
        puthelp "PRIVMSG $chan :XE.COM: \002$fromamount\002 equals \002$toamount\002 as of $xetime"
} else {
        puthelp "PRIVMSG $chan :Could not obtain results from XE.com, sorry!"
}
nml375
Revered One
Posts: 2860
Joined: Fri Aug 04, 2006 2:09 pm

Post by nml375 »

Half right, half wrong...
. would match any character, and would survive not being escaped.
? however does not match any characters by itself, but is used to match 0 or 1 occurrences of the preceding atom (in this case the character p). In this case it must be escaped.
NML_375
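
The behaviour described above can be checked with a few one-line tests. This is only a sketch: the test strings and variable names are made up for illustration, and the patterns are anchored with ^ and $ so that each result depends on the whole string.

```tcl
# "php?" matches "ph" or "php": the "?" makes the preceding "p" optional.
set short_form   [regexp {^php?$} "ph"]
set long_form    [regexp {^php?$} "php"]

# It does not match a literal question mark; that needs "\?".
set literal_raw  [regexp {^php?$} "php?"]
set literal_esc  [regexp {^php\?$} "php?"]

# An unescaped "." still matches a real dot, it is just looser than intended.
set dot_real     [regexp {^index.php$} "index.php"]
set dot_loose    [regexp {^index.php$} "indexXphp"]
set dot_strict   [regexp {^index\.php$} "indexXphp"]

puts "php? vs ph:    $short_form"
puts "php? vs php:   $long_form"
puts "php? vs php?:  $literal_raw"
puts "php\\? vs php?: $literal_esc"
puts "unescaped dot: $dot_real $dot_loose, escaped dot: $dot_strict"
```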
Jarek
Voice
Posts: 3
Joined: Mon Nov 19, 2007 10:41 am

Post by Jarek »

You need to \escape the period(.) and you need to \escape the question mark(?)
Thanks, mate! This was the right thing. I had to escape the special chars. Now it works!