Simple parse

Post by TC^ »

I need help on how to parse information from a simple HTML page...

for example:

This is the HTML-page:
9,1,16,80,9,128,some text
And I need the script to return the first number (9) on !first and "some text" on !second.


Please don't redirect me to some large documentation! I'm still quite a newbie at Tcl scripting...

Hope someone can help...
ppslim
Revered One
Posts: 3914
Joined: Sun Sep 23, 2001 8:00 pm
Location: Liverpool, England

Post by ppslim »

I am afraid we will direct you to large documentation.

However, we will give you pointers on where to look within it.

First, you say it's an HTML page, but what you posted isn't. So I am guessing that the text is embedded somewhere within the HTML page itself.

The first thing I suggest is to download google.tcl, then read through the Tcl docs and see how to download the HTML page. This part is straightforward.

Next, you will need to make the Tcl script parse through the page and locate the text in question.

A series of string commands or regexps should get you to this location, by looking in the text for strings that remain within the page even if the page is dynamically generated.

Once you have obtained the text and stored it in a variable, you can use "split" and "lindex" to obtain the values you need.
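
For the sample line from the first post, that last step might look like the sketch below. It assumes the text has already been extracted from the page; the variable names (line, fields, first, title) are only illustrative.

```tcl
# Assumed: the comma-separated text is already stored in a variable.
set line "9,1,16,80,9,128,some text"

# split turns the string into a Tcl list, one element per field.
set fields [split $line ","]

# lindex picks a single element; indices start at 0.
set first [lindex $fields 0]

# Treat everything from field 6 onward as the trailing text.
# Rejoining with "," keeps any commas that belong to that text.
set title [join [lrange $fields 6 end] ","]

puts $first
puts $title
```

Using lrange plus join instead of a plain "lindex $fields 6" is a design choice: it avoids cutting the trailing text short if it ever contains a comma of its own.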

If you need any more information, you will need to ask specific questions. Simply saying "I need A from a web page", without giving any other details about the page contents, is a non-starter for us. Only you know what the content is so far.

Post by TC^ »

Sorry I wasn't more precise in my question.

This is the page I'm talking about: http://213.114.155.110:8000/7.html

But thanks anyway for your fast answer. I'm thankful for that!

If you could give me some more pointers, I'll be delighted!
ppslim

Post by ppslim »

A nice simple page.

First off, you should learn how to use the HTTP package for Tcl. As stated, google.tcl uses this, so it is a good place to find an example.
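
As a rough idea of what using the http package involves, here is a minimal sketch. The helper name fetch_page and the 10-second timeout are assumptions made for illustration; they are not taken from google.tcl.

```tcl
package require http

# Hypothetical helper: download a page and return its body as a string.
proc fetch_page {url} {
    # -timeout is given in milliseconds, so a dead server cannot
    # hold the script up forever.
    set token [::http::geturl $url -timeout 10000]
    set status [::http::status $token]
    set data [::http::data $token]
    # Release the state kept for this request once we have the data.
    ::http::cleanup $token
    if {$status ne "ok"} {
        error "download failed: $status"
    }
    return $data
}
```

It would be called as "set html [fetch_page $url]", ideally wrapped in catch, since both geturl and the helper itself can raise an error when the download fails.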

Using the data it downloads, you can start parsing the contents.

One simple idea would be to locate the end of the <BODY> tag. This can be done using some of the simple string commands Tcl provides.

You would then save all content up to the </BODY> tag, again with the provided commands.
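
A sketch of that idea with "string first" and "string range" follows. The helper name body_text and the sample page are made up for illustration, and the code assumes a plain <body> tag without attributes.

```tcl
# Hypothetical helper: return the text between <body> and </body>.
proc body_text {html} {
    # Search a lower-cased copy so <BODY> and <body> both match;
    # the character offsets also apply to the original string.
    set lower [string tolower $html]
    set openIdx [string first "<body>" $lower]
    if {$openIdx < 0} { return "" }
    # The content starts right after the end of the opening tag.
    set start [expr {$openIdx + [string length "<body>"]}]
    set closeIdx [string first "</body>" $lower $start]
    if {$closeIdx < 0} { return "" }
    return [string range $html $start [expr {$closeIdx - 1}]]
}

# Made-up sample page in the same shape as the data in the first post.
set page {<html><head></head><BODY>9,1,16,80,9,128,some text</BODY></html>}
puts [body_text $page]
```

Returning an empty string when either tag is missing is just one possible choice; a real script might prefer to report the problem instead.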

Post by TC^ »

I've tried looking through google.tcl, and I must confess, it's a little too complicated for me.

However, I've found another script that is far simpler, but not nearly as advanced, and therefore harder to get to do what I want.

Code:

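set radio_url "http://213.114.155.110:8000/7.html"

bind pub - !users users_get

# Fetch the status page with lynx and post the listener data to the channel.
proc users_get {nick mask hand chan text} {
	global radio_url
	# The whole page is a single line, so one gets is enough.
	set file [open "|lynx -source $radio_url" r]
	set html [gets $file]
	# Close the pipe; catch ignores a non-zero exit status from lynx.
	catch {close $file}
	# Strip the header, then turn <body> into a bold "Listeners:" label.
	regsub "<HTML><meta http-equiv=\"Pragma\" content=\"no-cache\"></head>" $html "" html
	regsub "<body>" $html "\002 Listeners: \002" html
	# putchan is provided by alltools.tcl.
	putchan $chan $html
}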
The webpage is the following:
<HTML><meta http-equiv="Pragma" content="no-cache"></head><body>4,1,8,80,4,128,Fpu - Racer Car (Voidcom)</body></html>
And the script returns:
Listeners: 4,1,8,80,4,128,Fpu - Racer Car (Voidcom)</body></html>
My question is: can regsub be used in such a way that it ignores all text from the first comma onward, so that it returns this:
Listeners: 4
I hope it is detailed enough this time ;)
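
One way this could be done, as a sketch rather than the only answer: a regsub with the pattern ",.*" matches the first comma and everything after it, so replacing that match with an empty string leaves only the part before the comma. The sample string below is the script's output from above, with the bold codes left out for readability.

```tcl
# Output of the script above (bold control codes omitted).
set html "Listeners: 4,1,8,80,4,128,Fpu - Racer Car (Voidcom)</body></html>"

# ",.*" starts at the first comma and runs to the end of the string;
# replacing it with "" drops everything from that comma onward.
regsub {,.*} $html "" html

puts $html
```

This prints "Listeners: 4". The same result could also be reached without a regular expression by taking "lindex [split $html ","] 0".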