Parsing from webcontent

Help for those learning Tcl or writing their own scripts.
Elfriede
Halfop
Posts: 67
Joined: Tue Aug 07, 2007 4:21 am

Post by Elfriede »

Hi @ all :)

I'd like to parse ->

Code:

<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Adventure">Adventure</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>
the genres, so that I end up with something like Action/Adventure/Comedy. What I have so far:

Code:

foreach line [split $content \n] {
    if {[regexp -nocase {<a\shref="\/genre\/(.*)">(.*)<\/a>} $line match genre1 genre2 genre3]} {
and I know that is bad. Next problem: I don't know in advance how many genres I'll have to parse. It can be just one, or up to four or so.

Thank you !
Trixar_za
Op
Posts: 143
Joined: Wed Nov 18, 2009 1:44 pm
Location: South Africa

Post by Trixar_za »

Could you post the link to the website? Might be easier to see how it handles different kinds of input and how it changes the code.

foreach is a good start, by the way, but why not add each match to the end of a single variable, like set real_var "$real_var|$match"? That way the regex doesn't have to do all the work in a single match, and you can collect a practically unlimited number of genres, as long as each one matches the regex. With the example above, they'll all end up looking like Action|Adventure|Comedy.
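
A rough sketch of that idea, in case it helps. The proc name extract_genres is made up for illustration, and it assumes $content already holds the fetched page source and that the genre links look like the snippet posted above (the live page markup may differ). Instead of gluing strings together with "|", it collects the captured genres with lappend and joins them at the end, which avoids a leading "|" on the first genre.

```tcl
# Hypothetical helper (name is an assumption): collect the text of every
# genre link found in the page source passed in as content.
proc extract_genres {content} {
    set genres {}
    foreach line [split $content \n] {
        # -all -inline returns a flat list of pairs:
        # full match, captured text, full match, captured text, ...
        # The negated character classes stop at the first quote or tag,
        # so several links on one line are matched one by one.
        set hits [regexp -all -inline -nocase {<a\s+href="/genre/[^"]+">([^<]+)</a>} $line]
        foreach {match genre} $hits {
            lappend genres $genre
        }
    }
    return $genres
}

# Sample input copied from the first post.
set content {<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Adventure">Adventure</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>}

puts [join [extract_genres $content] "|"]
```

Because the result is a plain Tcl list, it does not matter whether the page has one genre or four; llength tells you how many were found, and join lets you pick any separator.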
Elfriede
Halfop
Posts: 67
Joined: Tue Aug 07, 2007 4:21 am

Post by Elfriede »

http://www.imdb.com/title/tt0942385/
set real_var "$real_var|$match"
Sounds good :) I'm excited to see what that proc part will look like.