This is the new home of the egghelp.org community forum.
All data has been migrated (including user logins/passwords) to a new phpBB version.
Help for those learning Tcl or writing their own scripts.
ComputerTech
Master
Posts: 399 Joined: Sat Feb 22, 2020 10:29 am
by ComputerTech » Tue Mar 30, 2021 12:57 am
So I am trying to retrieve the full HTML of https://google.com/search?q=lego
Code:
bind PUB - "!test" the:test
package require http
package require tls
proc the:test {nick host hand chan text} {
    http::register https 443 [list ::tls::socket]
    set url "https://www.google.com/search?q=lego"
    set data [::http::data [::http::geturl "$url" -timeout 10000]]
    ::http::config -useragent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"
    foreach lines2 $data {putserv "PRIVMSG $chan :$lines2"}
    http::unregister https
}
And I am getting this:
Code:
<Tech> <HTML><HEAD><meta
<Tech> http-equiv="content-type"
<Tech> content="text/html;charset=utf-8">
<Tech> <TITLE>302
<Tech> Moved</TITLE></HEAD><BODY>
<Tech> <H1>302
<Tech> Moved</H1>
<Tech> The
<Tech> document
<Tech> has
<Tech> moved
<Tech> <A
<Tech> HREF="https://www.google.com/sorry/index?continue=https://www.google.com/search%3Fq%3Dlego&q=EhAmB1MAAGEA2QAMAAAAAAAAGIDuioMGIhkA8aeDS7Cl4MTYJvxJOGvj5SyvlN0tmGEIMgFy">here</A>.
<Tech> </BODY></HTML>
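Side note on the output above: it arrives one word per message because foreach lines2 $data treats the HTML body as a Tcl list, which splits it on whitespace (and an unbalanced { or " in the page would even raise a list-parsing error). A minimal sketch that splits on newlines instead; irc_lines and its 400-character cap are hypothetical choices, not part of eggdrop or the http package:

```tcl
# Hypothetical helper: turn an HTTP body into a list of trimmed,
# non-empty lines, each at most maxLen characters long.
proc irc_lines {data {maxLen 400}} {
    set out {}
    # split on newlines instead of treating the HTML as a Tcl list
    foreach line [split $data "\n"] {
        # trim also removes a trailing \r left by CRLF line endings
        set line [string trim $line]
        if {$line eq ""} {
            continue
        }
        # chop overly long lines so each PRIVMSG stays well under
        # the 512-byte IRC line limit
        while {[string length $line] > $maxLen} {
            lappend out [string range $line 0 [expr {$maxLen - 1}]]
            set line [string range $line $maxLen end]
        }
        lappend out $line
    }
    return $out
}
```

Inside the bind proc this would become something like foreach line [lrange [irc_lines $data] 0 4] {putserv "PRIVMSG $chan :$line"}; the limit of five lines is an arbitrary choice to avoid dumping a whole search page into the channel.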
ComputerTech
CrazyCat
Revered One
Posts: 1306 Joined: Sun Jan 13, 2002 8:00 pm
Location: France
by CrazyCat » Tue Mar 30, 2021 1:55 am
This is because you don't handle potential redirections (such as 301 or 302) and don't check the response status.
Your line:
Code:
set data [::http::data [::http::geturl "$url" -timeout 10000]]
A better way (not the best):
Code:
set tok [::http::geturl $url]
if {[::http::ncode $tok]==301 || [::http::ncode $tok]==302} {
    # this is a redirection
} else {
    set data [::http::data $tok]
}
You can also use ::http::status and the other token accessors to check that you landed on the right page.
Have a look at
https://www.tcl.tk/man/tcl8.4/TclCmd/http.htm
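As a rough sketch of checking ::http::status and ::http::ncode before touching the data: classify_response and fetch_checked are hypothetical names, treating 301, 302, 303, 307 and 308 as redirects is an assumption, and ::http::cleanup is called so tokens don't pile up in a long-running bot.

```tcl
package require http

# Hypothetical helper: map the values of ::http::status and ::http::ncode
# to one of "ok", "redirect" or "error".
proc classify_response {status code} {
    if {$status ne "ok"} {
        # e.g. "timeout", "eof", "reset" or "error": no complete HTTP reply
        return error
    }
    if {$code in {301 302 303 307 308}} {
        return redirect
    }
    if {[string is integer -strict $code] && $code >= 200 && $code < 300} {
        return ok
    }
    return error
}

# Hypothetical wrapper: fetch a URL once and return {kind value}, where value
# is the body, the Location header or a short error description.
# ::http::geturl itself can still raise a Tcl error (e.g. unknown host),
# so callers may want to wrap this in catch.
proc fetch_checked {url} {
    set tok [::http::geturl $url -timeout 10000]
    set kind [classify_response [::http::status $tok] [::http::ncode $tok]]
    switch -- $kind {
        ok {
            set value [::http::data $tok]
        }
        redirect {
            # $tok names a global state array; its meta element is the
            # list of response headers (name value name value ...)
            upvar #0 $tok state
            set value ""
            foreach {name val} $state(meta) {
                if {[string equal -nocase $name Location]} {
                    set value $val
                }
            }
        }
        default {
            set value "[::http::status $tok]: [::http::code $tok]"
        }
    }
    # free the state array; $tok must not be used after this
    ::http::cleanup $tok
    return [list $kind $value]
}
```

In the bind proc, lassign [fetch_checked $url] kind value followed by a switch on $kind would replace the bare ::http::data call.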
ComputerTech
by ComputerTech » Tue Mar 30, 2021 2:27 am
Thanks CrazyCat, will try that.
ComputerTech
by ComputerTech » Tue Mar 30, 2021 4:16 pm
I tried your suggestion, CrazyCat:
Code:
bind PUB - "!test" the:test
package require http
package require tls
proc the:test {nick host hand chan text} {
    http::register https 443 [list ::tls::socket]
    set url "https://www.google.com/search?q=lego+ninjago"
    set tok [::http::geturl $url]
    if {[::http::ncode $tok]==301 || [::http::ncode $tok]==302} {
        putserv "PRIVMSG $chan :FAIL"
    } else {
        set data [::http::data $tok]
    }
    ::http::config -useragent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"
    foreach lines2 $data {putserv "PRIVMSG $chan :$lines2"}
    http::unregister https
}
Results:
Code:
<ComputerTech> !test
<Tech> FAIL
Google still thinks I am a bot. Any ideas how to get around this?
CrazyCat
Post
by CrazyCat » Tue Mar 30, 2021 5:46 pm
Google doesn't think you're a bot; Google redirects you to a version you can read (without JavaScript).
Code:
set tok [::http::geturl $url]
if {[::http::ncode $tok]==301 || [::http::ncode $tok]==302} {
    # $tok is the name of a global state array, so reach it with upvar
    upvar #0 $tok state
    array set meta $state(meta)
    set data [::http::data [::http::geturl $meta(Location)]]
} else {
    set data [::http::data $tok]
}
Note that this system works only if there is just one redirection.
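To cope with more than one hop, the same idea can go in a loop with a hop limit. This is only a sketch: redirect_target and fetch_follow are hypothetical names, the limit of 5 hops is arbitrary, ::http::meta is assumed to be available (it is in the http package shipped with Tcl 8.6), and only absolute Location values or ones starting with / are resolved, not every relative form.

```tcl
package require http

# Hypothetical helper: given the current URL, the numeric status code and
# the response header list, return the URL to follow next, or "" if this
# response is not a redirect we can follow.
proc redirect_target {url code meta} {
    if {$code ni {301 302 303 307 308}} {
        return ""
    }
    set location ""
    foreach {name value} $meta {
        # header names are matched case-insensitively
        if {[string equal -nocase $name Location]} {
            set location $value
        }
    }
    if {$location eq "" || [regexp {^https?://} $location]} {
        return $location
    }
    # Location starting with "/": reuse the scheme, host and port of $url
    if {[string index $location 0] eq "/" && [regexp {^(https?://[^/]+)} $url -> base]} {
        return $base$location
    }
    # other relative forms are not handled in this sketch
    return ""
}

# Hypothetical loop: follow at most maxHops redirects, return the final body.
proc fetch_follow {url {maxHops 5}} {
    for {set hop 0} {$hop <= $maxHops} {incr hop} {
        set tok [::http::geturl $url -timeout 10000]
        set status [::http::status $tok]
        set next [redirect_target $url [::http::ncode $tok] [::http::meta $tok]]
        set body [::http::data $tok]
        # free each token before moving on to the next hop
        ::http::cleanup $tok
        if {$status ne "ok"} {
            error "request to $url failed: $status"
        }
        if {$next eq ""} {
            return $body
        }
        set url $next
    }
    error "more than $maxHops redirects"
}
```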
And I don't understand why you call ::http::config -useragent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0" after having used ::http. ::http::config must be called when you initialise ::http, before any ::http::geturl.
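A sketch of that initialisation, done once at the top of the script rather than inside the bind proc, assuming the same User-Agent string. Wrapping package require tls in catch is an assumption of this sketch, so the file still loads (without https support) on a machine where tls is missing.

```tcl
package require http

# Run once when the script is sourced: every later ::http::geturl call
# will send this User-Agent header.
::http::config -useragent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"

# Register the https scheme once as well, instead of registering and
# unregistering it on every command.
if {![catch {package require tls}]} {
    ::http::register https 443 [list ::tls::socket]
}
```

The bind proc then only needs the ::http::geturl call and the response handling.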