Parsing HTML

Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

Basically, I have zero idea how to even start this. I want to make a server status script for my bot that reads a webpage (http://chronicle.ubi.com/). If you view that page, you will see about 10 servers listed on the right. I would like to get the status of the one server called Deception, so basically I just want to see whether it says UP or Down beside Deception.

I have the http script loaded, but once I try the parsing I get totally lost. If someone can provide some insight, that would be greatly appreciated. Thanks.
CrazyCat
Revered One
Posts: 1279
Joined: Sun Jan 13, 2002 8:00 pm
Location: France

Post by CrazyCat »

If you can fetch the HTML page, you're all right.
Just put each line in $line and do:

Code: Select all

set present [string first "Deception" $line]
# string first returns -1 when the substring is not found (0 is a valid index)
if {$present != -1} {
    # it's OK, you have your line
} else {
    # this is not the right line
}
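
A minimal sketch of the check above, runnable in plain tclsh. `string first` returns the index of the match, or -1 when there is none, so a hit at index 0 is still a hit. The sample lines and the `find_line` helper are made up for illustration and are not part of any script in this thread.

```tcl
# Hypothetical sample lines; only the second one mentions the server name.
set lines {
    {<td>Some other server</td>}
    {<td><a href="/WorldMap/CityList.htm?World=Deception">Deception</a></td>}
}

# Hypothetical helper: return the first line containing $needle, or "" if none does.
proc find_line {lines needle} {
    foreach line $lines {
        if {[string first $needle $line] != -1} {
            return $line
        }
    }
    return ""
}

puts [string first "Deception" "Deception is up"]   ;# 0: found at the very start
puts [string first "Deception" "nothing here"]      ;# -1: not found
puts [find_line $lines "Deception"]
```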
Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

OK, I can't even get the connection part down. How do I connect to the site, search for the line, and take arg 1 from that line?
Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

Code: Select all

proc status_callback {sock} { 
  global inifilestatus
  set data [egghttp:data $sock] 
  egghttp:cleanup $sock 
  regsub -all "\n" $data "" data 
  regsub -all "<br>" $data "\n" data 
  foreach line [split $data \n] { 
    if {[string match "*Deception Up*" $line]} { 
      set item [join [lindex [split $line] 1]]
    }
    return 0
  } 
} 
That probably looks totally wrong. I'm not sure it even works, or whether I'm calling it right.

The page contains the word "Deception" several times, so I'm hoping that *Deception Up* will work; then I can just write another if statement for servers that are down.

My problem now is that I can't get the data from that proc back into a variable so I can use it in the !status command.

So now I just need help fixing that messy code and setting the variable in the !status proc. Thanks.
Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

Oh, and this is the line on the page that I am trying to get:

<tr><td>  </td><td><a href="/WorldMap/CityList.htm?World=Deception">Deception</a></td><td><font color="green"><b>Up</b></font></td><td>  </td></tr>
strikelight
Owner
Posts: 708
Joined: Mon Oct 07, 2002 10:39 am

Post by strikelight »

Darkj wrote:Oh and this is the line on the page that I am trying to get

<tr><td>  </td><td><a href="/WorldMap/CityList.htm?World=Deception">Deception</a></td><td><font color="green"><b>Up</b></font></td><td>  </td></tr>
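If that's the line you are trying to get, then your string match pattern won't match it: the line never contains the literal text "Deception Up", because HTML tags sit between the two words.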

Code: Select all

if {[string match "*Deception Up*" $line]} { 
should be:

Code: Select all

if {[string match "*Deception*Up*" $line]} {
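
As a rough alternative to two separate string match tests, a single regexp can pull the status word straight out of the row. This is only a sketch: `server_status` is a hypothetical helper, the sample input is the row quoted in this thread, and it assumes the status is always the literal word Up or Down in a `<b>` tag after the server link, and that the server name contains no regexp metacharacters.

```tcl
# Sample input: the table row quoted in this thread.
set line {<tr><td>  </td><td><a href="/WorldMap/CityList.htm?World=Deception">Deception</a></td><td><font color="green"><b>Up</b></font></td><td>  </td></tr>}

# Hypothetical helper: return "Up" or "Down" for $server, or "" if no such row is found.
proc server_status {html server} {
    # .*? is non-greedy, so the match stops at the first <b>Up</b> or <b>Down</b>
    # that follows the link text.
    set pattern [format {>%s</a>.*?<b>(Up|Down)</b>} $server]
    if {[regexp -- $pattern $html -> status]} {
        return $status
    }
    return ""
}

puts [server_status $line "Deception"]   ;# Up
```

Because the pattern does not depend on line breaks, it can also be run against the whole page body instead of line by line.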
CrazyCat
Revered One
Posts: 1279
Joined: Sun Jan 13, 2002 8:00 pm
Location: France

Post by CrazyCat »

Darkj wrote:Oh and this is the line on the page that I am trying to get

<tr><td>  </td><td><a href="/WorldMap/CityList.htm?World=Deception">Deception</a></td><td><font color="green"><b>Up</b></font></td><td>  </td></tr>
If you're on a *nix system with lynx, you can use the simple approach I often use:

Code: Select all

file delete -force $usertemp
set fs [open $usertemp w]
puts $fs [exec $lynx -preparsed -dump $rub]
close $fs
and there is no HTML code left in the $usertemp file :)
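
Where lynx is not available, a rough pure-Tcl stand-in is to strip the tags yourself. The sketch below is not what lynx does internally; `strip_tags` is a made-up helper, the sample row is shortened from the one quoted in this thread, and HTML entities are left undecoded.

```tcl
# Sample input: a shortened version of the table row quoted in this thread.
set row {<td><a href="/WorldMap/CityList.htm?World=Deception">Deception</a></td><td><font color="green"><b>Up</b></font></td>}

# Hypothetical helper: reduce HTML to its visible text (entities are NOT decoded).
proc strip_tags {html} {
    # Replace each tag with a space so neighbouring cells do not run together.
    regsub -all {<[^>]+>} $html " " text
    # Collapse runs of whitespace and trim both ends.
    regsub -all {\s+} $text " " text
    return [string trim $text]
}

set text [strip_tags $row]
puts $text                                        ;# Deception Up
puts [string match "*Deception Up*" $text]        ;# 1
```

Once the tags are gone, a plain pattern such as *Deception Up* matches the text directly.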
GodOfSuicide
Master
Posts: 463
Joined: Mon Jun 17, 2002 8:00 pm
Location: Austria

Post by GodOfSuicide »

The problem with lynx is that there is no real timeout.
I had some of my scripts using lynx too, but when the webpage was too slow etc., the exec just took several minutes -> the bot times out....
Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

Code: Select all

proc status_callback {sock} { 
  global inifilestatus
  set data [egghttp:data $sock] 
  egghttp:cleanup $sock 
  regsub -all "\n" $data "" data 
  regsub -all "<br>" $data "\n" data 
  foreach line [split $data \n] { 
    if {[string match "*Deception*Up*" $line]} { 
      set item [join [lindex [split $line] 1]]
      ini_write $inifilestatus server status $item
    }
    if {[string match "*Deception*Down*" $line]} { 
      set item [join [lindex [split $line] 1]]
      ini_write $inifilestatus server status $item
    }
    return 0
  } 
} 
OK, so when it checks the status, it's supposed to use my ini_write proc. But the problem is that it's not even getting the line; I'm not sure if it's even getting the page properly.

To call it in another script, I do this:

Code: Select all

set sock [egghttp:geturl chronicle.ubi.com/ status_callback]
set server_status [ini_read $inifilestatus server status]
But that's totally wrong, as I've never used this http stuff in a script before, so I'm totally lost.
Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

Anyone able to help?
strikelight
Owner
Posts: 708
Joined: Mon Oct 07, 2002 10:39 am

Post by strikelight »

Darkj wrote:

Code: Select all

proc status_callback {sock} { 
  global inifilestatus
  set data [egghttp:data $sock] 
  egghttp:cleanup $sock 
  regsub -all "\n" $data "" data 
  regsub -all "<br>" $data "\n" data 
  foreach line [split $data \n] { 
    if {[string match "*Deception*Up*" $line]} { 
      set item [join [lindex [split $line] 1]]
      ini_write $inifilestatus server status $item
    }
    if {[string match "*Deception*Down*" $line]} { 
      set item [join [lindex [split $line] 1]]
      ini_write $inifilestatus server status $item
    }
    return 0
  } 
} 
OK, so when it checks the status, it's supposed to use my ini_write proc. But the problem is that it's not even getting the line; I'm not sure if it's even getting the page properly.

To call it in another script, I do this:

Code: Select all

set sock [egghttp:geturl chronicle.ubi.com/ status_callback]
set server_status [ini_read $inifilestatus server status]
But that's totally wrong, as I've never used this http stuff in a script before, so I'm totally lost.
Because egghttp works in non-blocking mode (i.e. it won't freeze up your bot), your 'set server_status' line will most likely run before the callback proc gets called. If you want to set anything, it will have to be done within the callback proc itself. To verify it is working, add 'putlog' statements in your callback proc, then stay on your bot's partyline while the bot tries to connect to the site and watch what is going on.
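
The ordering problem can be reproduced in plain tclsh without egghttp. In this sketch `fake_geturl` is a made-up stand-in that returns immediately and fires the callback a little later; the real egghttp callback receives a socket rather than the page data, and the `vwait` is only needed because a standalone tclsh has no event loop running.

```tcl
# Stand-in for egghttp:geturl (an assumption for this demo): it returns at once
# and schedules the callback to run later from the event loop.
proc fake_geturl {url callback} {
    after 50 [list $callback "page body for $url"]
    return "sock0"
}

set events {}
set server_status "unknown"

proc status_callback {data} {
    global events server_status done
    lappend events "callback ran"
    # Anything that depends on the fetched page has to happen in here.
    set server_status "Up"
    set done 1
}

set sock [fake_geturl chronicle.ubi.com/ status_callback]
# This line runs BEFORE the callback, so the status is still "unknown" here.
lappend events "line after geturl ran, status=$server_status"

vwait done   ;# only for standalone tclsh: process events until the callback fires
puts [join $events "\n"]
puts "final status: $server_status"
```

The first recorded event is the line after the fetch call, still showing "unknown"; the callback only runs afterwards, which is why the variable has to be set inside it.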
Syntax
Voice
Posts: 10
Joined: Sun Oct 05, 2003 2:37 pm

Post by Syntax »

Are there any good tutorials on parsing files/HTML?
If there are, I would really like a URL.
Or if someone could help me, that would be good too,
preferably in real time, like chat on ICQ or IRC.
Darkj
Halfop
Posts: 86
Joined: Sun Jul 06, 2003 9:58 pm

Post by Darkj »

GodOfSuicide wrote: The problem with lynx is that there is no real timeout.
I had some of my scripts using lynx too, but when the webpage was too slow etc., the exec just took several minutes -> the bot times out....
What would be the best method here? Does fetch or wget have a good timeout?
BarkerJr
Op
Posts: 104
Joined: Sun Mar 30, 2003 1:25 am

Post by BarkerJr »

You really don't want to use exec on anything slow. Use open and use a pipe. Then you can read from the fd that open returns.

e.g.:
# -q silences progress output; -O - writes the page to stdout so the pipe can read it
set fd [open "|wget -q -O - http://www.cnn.com/" r]
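
A minimal sketch of the pipe approach with a timeout, assuming a Unix-like system. The helper names (`pipe_fetch`, `pipe_read`, `pipe_finish`, `got_page`) are hypothetical, and `echo` stands in for the real download command so the example runs without a network. On timeout the callback receives whatever was read so far.

```tcl
# Hypothetical helper: run $cmd (a Tcl list) through a non-blocking pipe and
# call $callback with the collected output at EOF or after $timeout_ms.
proc pipe_fetch {cmd timeout_ms callback} {
    global pipe_buf pipe_timer
    set fd [open "|$cmd" r]
    fconfigure $fd -blocking 0
    set pipe_buf($fd) ""
    set pipe_timer($fd) [after $timeout_ms [list pipe_finish $fd $callback]]
    fileevent $fd readable [list pipe_read $fd $callback]
    return $fd
}

# Called by the event loop whenever the pipe has data or reaches EOF.
proc pipe_read {fd callback} {
    global pipe_buf
    append pipe_buf($fd) [read $fd]
    if {[eof $fd]} {
        pipe_finish $fd $callback
    }
}

# Cancel the timer, close the pipe, and hand the data to the callback.
proc pipe_finish {fd callback} {
    global pipe_buf pipe_timer
    after cancel $pipe_timer($fd)
    set data $pipe_buf($fd)
    unset pipe_buf($fd) pipe_timer($fd)
    catch {close $fd}
    uplevel #0 $callback [list $data]
}

# Demo callback: store the trimmed output in a global.
proc got_page {data} {
    global result
    set result [string trim $data]
}

pipe_fetch [list echo "Deception Up"] 5000 got_page
vwait result   ;# only for standalone tclsh: wait until the callback has run
puts $result   ;# Deception Up
```

For a real page, the command list could be something like `[list wget -q -O - --timeout=10 --tries=1 http://chronicle.ubi.com/]`; wget's `--timeout` and `--tries` options also bound how long the download itself may take.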