Stripping HTML tags

Ofloo
Owner
Posts: 953
Joined: Tue May 13, 2003 1:37 am
Location: Belgium

Post by Ofloo »

How do I remove HTML tags from a string?

Example:
text <a href="http://host.tld/">desc</a> text => text http://host.tld/ text
I know this has been discussed before, but I can't seem to find anything like it in the forum.

I tried several patterns, but none produced the result I was looking for.

I looked at regsub, but I can't put my finger on it.
XplaiN but think of me as stupid
greenbear
Owner
Posts: 733
Joined: Mon Sep 24, 2001 8:00 pm
Location: Norway

Post by greenbear »

Something like this?

Code:

set html {text <a href="http://host.tld/">desc</a>}
regexp {(.*).<a.*="(.*)">(.*)</a>} $html garbage text url desc
Ofloo
Owner
Posts: 953
Joined: Tue May 13, 2003 1:37 am
Location: Belgium

Post by Ofloo »

I knew that, but it doesn't exactly solve my problem:
% set html {text <a href="http://host.tld/">desc</a> test text <a href="http://host.tld/">desc</a> test}
text <a href="http://host.tld/">desc</a> test text <a href="http://host.tld/">desc</a> test
% regexp {<a href=\"(.*)\">(.*)</a>} $html a b
1
% puts $a
<a href="http://host.tld/">desc</a> test text <a href="http://host.tld/">desc</a>
% puts $b
http://host.tld/">desc</a> test text <a href="http://host.tld/
%
See, I only need the URLs, so I can remove them.

A string can contain more than one URL tag.

Code:

proc tagRemove {arg} {
  set arg [string map {\\ \\\\ \[ \\\[ \] \\\] \{ \\\{ \} \\\} \( \\\( \) \\\) \" \\\"} $arg]
  regexp {<a href=\"(.*)\">(.*)</a>} $arg a b
  set arg [string map [list $a $b] $arg]
}
The code above didn't do that either, but thanks for trying.
XplaiN but think of me as stupid
greenbear
Owner
Posts: 733
Joined: Mon Sep 24, 2001 8:00 pm
Location: Norway

Post by greenbear »

Ah, I guess I didn't understand you correctly.

This code should remove all the HTML tags:

Code:

regsub -all {<(.|\n)*?>} $html {} clean
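For comparison, a minimal sketch of the same tag stripping written with a negated character class instead of a non-greedy quantifier. The sample string is made up for illustration, and the pattern assumes no attribute value contains a literal ">".

```tcl
# Hypothetical sample input, not taken from the thread.
set html {text <a href="http://host.tld/">desc</a> and <b>bold</b> end}

# [^>]* matches any run of characters other than '>' (newlines included),
# so every match ends at the first closing angle bracket.
regsub -all {<[^>]*>} $html {} clean

puts $clean
```

This drops every tag but keeps the text between them; attribute values such as the href are discarded along with the tag.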
Ofloo
Owner
Posts: 953
Joined: Tue May 13, 2003 1:37 am
Location: Belgium

Post by Ofloo »

Thanks, this worked. Not exactly as I wanted, but it worked ;)

It didn't return the URL from href="<here>", but that's OK.
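A possible way to keep the URL, sketched under the assumption that every link looks like the example in the first post: href is the only attribute, it is double-quoted, and the link text contains no nested tags. The variable name out is only illustrative.

```tcl
set html {text <a href="http://host.tld/">desc</a> test text <a href="http://host.tld/">desc</a> test}

# Replace each whole <a ...>...</a> element with its href value (\1).
# [^"]* cannot run past the closing quote and [^<]* cannot run past the
# next tag, so each match stays inside a single link.
regsub -all {<a href="([^"]*)">[^<]*</a>} $html {\1} out

puts $out
# text http://host.tld/ test text http://host.tld/ test
```

Any remaining tags could then be removed with a second regsub pass.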
XplaiN but think of me as stupid
user
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Post by user »

Looks like you're looking for non-greedy quantifiers. Try .*? instead of .* :)
Have you ever read "The Manual"?
Ofloo
Owner
Posts: 953
Joined: Tue May 13, 2003 1:37 am
Location: Belgium

Post by Ofloo »

% puts $html
text <a href="http://host.tld/">desc</a> test text <a href="http://host.tld/">desc</a> test
% regexp {<a href=\"(.*?)\">(.*?)</a>} $html a b
1
% puts $a
<a href="http://host.tld/">desc</a>
% puts $b
http://host.tld/
%
Very nice.
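Since a string can hold more than one link, here is a rough sketch of collecting every URL with the same non-greedy pattern. The second URL in the sample is changed from the original example so the two matches can be told apart; the names urls, match, url and desc are only illustrative.

```tcl
set html {text <a href="http://host.tld/">desc</a> test text <a href="http://host.tld/other">more</a> test}

# -all -inline returns a flat list: for every match, the full match
# followed by the contents of each capture group.
set urls {}
foreach {match url desc} [regexp -all -inline {<a href="(.*?)">(.*?)</a>} $html] {
    lappend urls $url
}

puts $urls
# http://host.tld/ http://host.tld/other
```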
XplaiN but think of me as stupid