The last bit of your regexp doesn't take into account the carriage returns in the page:
You have:
</table></td>
The webpage has a line break between those two tags.
You probably don't need the </td> anyway, as the html is very simple, so this bit should get your data:
regexp {table width="100%">(.*?)</table>} $html match data
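If you want to try that outside the bot, here's a rough sketch you can paste into tclsh. The $html value is a made-up sample (not the real page) with a line break between </table> and </td>, and puts stands in for putcmdlog, which only exists inside eggdrop.

```tcl
# Hypothetical sample page fragment, with a newline between </table> and </td>
set html "<td><table width=\"100%\">\n<tr><td width=\"43%\" valign=\"top\">Movie One</td></tr>\n</table>\n</td>"

# (.*?) is non-greedy, and in Tcl's regexp "." also matches newlines,
# so the capture runs from the opening table tag to the first </table>
if {[regexp {table width="100%">(.*?)</table>} $html match data]} {
    puts "captured: [string trim $data]"
} else {
    puts "no match"
}
```

Because the pattern stops at </table>, whatever whitespace sits between </table> and </td> no longer matters.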
Then you'll have to clean things up. One thing to note: putcmdlog won't show you the data if it contains carriage returns, so you can either strip them out first with regsub or use a foreach loop:
foreach line [split $data \n] {
    putcmdlog "line '$line'"
}
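For the regsub route, something along these lines should work. It's only a sketch: the sample $data and the variable name flat are made up for illustration, and puts is used in place of putcmdlog so it runs in plain tclsh.

```tcl
# Made-up sample with both \r\n and bare \n line endings
set data "first line\r\nsecond line\nthird line"

# Collapse every run of carriage returns / newlines into a single space
regsub -all {[\r\n]+} $data " " flat

puts "flat '$flat'"
```

That gives you one long line, which is fine for a quick look in the log; the foreach version is better when you want to see each line separately.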
Once you have your block of data, this should clean up the data and give you just the movie titles:
regsub -all {<td width="43%" valign="top">(.*?)</td>} $data {\1} outputvar
foreach line [split $outputvar \n] {
    set line [string trim $line]
    if {$line ne ""} {
        putcmdlog "line '$line'"
    }
}
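A different way to get at the same thing, if other tags are left over after the regsub, is to collect the captures directly with regexp -all -inline instead of substituting. This is an alternative sketch, not the approach above: the sample $data is invented, titles is just a name picked for the example, and puts replaces putcmdlog so it runs in tclsh.

```tcl
# Invented sample block; the real page's markup may differ
set data "<tr>\n\t<td width=\"43%\" valign=\"top\"> Movie One </td>\n</tr>\n<tr>\n\t<td width=\"43%\" valign=\"top\">Movie Two</td>\n</tr>"

set titles {}
# -all -inline returns a flat list: full match, capture, full match, capture, ...
foreach {full title} [regexp -all -inline {<td width="43%" valign="top">(.*?)</td>} $data] {
    set title [string trim $title]
    if {$title ne ""} {
        lappend titles $title
    }
}

puts $titles
```

This way only the text inside the matching cells ends up in the list, and any surrounding rows or tags are ignored.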
One peculiarity I've noticed when using regsub's substitutions (like \& \0 \1 etc.) is that bold codes and runs of I's show up in the output. They may simply be tab or other control characters carried over from the page rather than something regsub adds. string trim seems to get rid of them in otherwise empty lines. You'll probably see them in the raw output.
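If string trim isn't enough because the stray characters sit in the middle of a line, one option is to strip control characters explicitly. This is a sketch under the assumption that the junk really is control characters (tabs, bold codes and so on); the sample $line is made up, and puts stands in for putcmdlog.

```tcl
# Made-up line with tabs and bold codes (\002) around the title
set line "\t\t\002Movie One\002\t"

# Replace each run of control characters with a space, then trim the ends
regsub -all {[[:cntrl:]]+} $line " " clean
set clean [string trim $clean]

puts "line '$clean'"
```

Replacing with a space rather than nothing keeps words apart if a tab happened to be the only thing separating them.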