Parsing from webcontent

Post by **Elfriede** » Wed Jan 26, 2011 10:14 am

Hi @ all

I'd like to parse the genres out of this HTML:

<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Adventure">Adventure</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>

The result should be the genre names, like Action/Adventure/Comedy. What I have so far:

foreach line [split $content \n] {
    if {[regexp -nocase {<a\shref="\/genre\/(.*)">(.*)<\/a>} $line match genre1 genre2 genre3]} {

And I know that is bad. Next problem: I don't know in advance how many genres I'll have to parse. It can be just one, or up to 4 or so.

Thank you !

Post by **Trixar_za** » Wed Jan 26, 2011 12:15 pm

Could you post the link to the website? Might be easier to see how it handles different kinds of input and how it changes the code.

foreach is a good start, by the way. But rather than making a single regexp capture everything, why not add each match to the end of one variable, like set real_var "$real_var|$match"? That way the regexp only has to match one genre at a time, and you can collect a nearly unlimited number of genres as long as they match the pattern. With the example above, they'll all end up looking something like Action|Adventure|Comedy.
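
A minimal sketch of that accumulation idea, assuming the page body is already in a variable named content (filled here with the sample line from the first post). It collects the names into a list with lappend and joins them at the end, which avoids the leading | that plain string concatenation produces when real_var starts out empty. The names genres and name are made up for illustration.

```tcl
# Sample input taken from the first post. In a real script this would be the fetched page.
set content {<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Adventure">Adventure</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>}

set genres {}
foreach line [split $content \n] {
    # -all -inline returns every match on the line as a flat list:
    # full match, captured name, full match, captured name, ...
    # The negated classes stop at the next quote or tag, so each link is matched on its own.
    foreach {match name} [regexp -all -inline -nocase {<a\s+href="/genre/[^"]*">([^<]+)</a>} $line] {
        lappend genres $name
    }
}

# Join once at the end instead of concatenating inside the loop.
set real_var [join $genres |]
puts $real_var
```

The original pattern used (.*), which is greedy and would swallow everything up to the last </a> on the line. Matching one link at a time is what lets the number of genres vary.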

Post by **Elfriede** » Wed Jan 26, 2011 12:39 pm

http://www.imdb.com/title/tt0942385/

set real_var "$real_var|$match"

Sounds good

I'm excited to see what that proc part will look like.
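
One way the proc part could look, as a sketch only. The proc name parse_genres is hypothetical, and it assumes the genre links on the page have the same shape as the sample in the first post (the markup of the live page was not checked here). It scans the whole chunk of HTML in one pass and returns a list, so one genre or several are handled the same way.

```tcl
# Hypothetical helper: return the genre names found in a chunk of HTML.
proc parse_genres {html} {
    set genres {}
    foreach {match name} [regexp -all -inline -nocase {<a\s+href="/genre/[^"]*">([^<]+)</a>} $html] {
        # Skip duplicates in case the same link shows up more than once on the page.
        if {[lsearch -exact $genres $name] == -1} {
            lappend genres $name
        }
    }
    return $genres
}

# Works the same for a single genre or for several.
set one  [parse_genres {<a href="/genre/Drama">Drama</a>}]
set many [parse_genres {<a href="/genre/Action">Action</a> <span>|</span> <a href="/genre/Comedy">Comedy</a>}]
puts [join $one |]
puts [join $many |]
```

Returning a list and leaving the join to the caller keeps the separator (| or /) a display choice rather than something baked into the parser.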