testebr wrote:Can anyone check whether the whois google search is still working?
!g whois bbc.co.uk
In this example the bot doesn't reply with anything.
Thank you!
Solved. The problem is that for top results Google appears to be changing the normal div class=g into an h2 class=r. This might affect other onebox results and I may need to fix them as well, but this solves the issue for whois. I've also corrected search result totals so they appear once again, so accurate totals should be shown before the results are displayed. Local has also been corrected to parse "special locations" properly.
<speechles> !g whois bbc.co.uk
<sp33chy> 49,100 Results | Whois record for bbc.co.uk (Created on Aug. 01, 1996 and Expires on Dec. 13, 2008) @ http://whois.domaintools.com/bbc.co.uk | CoolWhois.com - WHOIS search of bbc.c @ http://www.coolwhois.com/d/bbc.co.uk | CoolWhois.com - WHOIS search of ns1.rb @ http://www.coolwhois.com/d/ns1.rbsov.bbc.co.uk | bbc.co.uk - Who.is @ http://www.who.is/whois-uk/ip-address/bbc.co.uk/
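For anyone curious what the fix amounts to, here is a rough sketch of accepting either wrapper. It is written in Python purely for illustration (the script itself is Tcl), and the markup in `sample` is an assumption built from the class names mentioned above, not Google's real HTML. The names `RESULT_RE` and `extract_results` are made up for this example.

```python
import re

# Assumed markup: a result link wrapped either in <div class=g> (normal
# results) or <h2 class=r> (top/onebox results). Real pages will differ.
RESULT_RE = re.compile(
    r'<(?:div class="?g"?|h2 class="?r"?)[^>]*>.*?<a href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_results(html):
    """Return (title, url) pairs, accepting both wrapper styles."""
    results = []
    for url, title in RESULT_RE.findall(html):
        title = re.sub(r"<[^>]+>", "", title)  # drop <b>...</b> highlighting
        results.append((title.strip(), url))
    return results


# Hypothetical sample page fragment using both wrappers.
sample = (
    '<h2 class=r><a href="http://whois.domaintools.com/bbc.co.uk">'
    'Whois record for <b>bbc.co.uk</b></a></h2>'
    '<div class=g><a href="http://www.who.is/whois-uk/ip-address/bbc.co.uk/">'
    'bbc.co.uk - Who.is</a></div>'
)

for title, url in extract_results(sample):
    print(f"{title} @ {url}")
```

Matching both wrappers in one pattern keeps the rest of the parsing the same whichever template the page happens to use.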
Code: Select all
# Customized Wikimedia
# allow customized triggers for special wikimedia pages
# Anything other than 0 will enable and will use the list below.
variable wiki_custom 1
# Custom wiki triggers
# This is used to customize triggers for different wikimedia sites.
# The format is "trigger:wikisite.here"
variable wiki_customs {
"swiki:wiki.sabayonlinux.org"
"gwiki:www.gentoo-wiki.com"
"ed:encyclopediadramatica.com"
"un:uncyclopedia.org"
}
This is included in the 1.9.8s update and was requested.
<speechles> !gwiki unix
<sp33chy> SECURITY Anonymizing Unix Systems | This text is for any human being out there who wishes to keep their data and doings private from any snooping eye - monitoring network traffic and stealing/accessing the computer including electronic forensics. Hackers, phreakers, criminals, members of democracy parties in totalitarian states, human rights workers, and people with high profiles might be interested in
<sp33chy> this information. It was especially written for novice hackers so they are not so easily convicted when busted for their early curiosity. @ http://www.gentoo-wiki.com/SECURITY_Ano ... ix_Systems
<speechles> !swiki unix
<sp33chy> Linux | Linux (also known as GNU/Linux) is a Unix-like computer operating system. It is one of the most prominent examples of open source development and free software; unlike proprietary operating systems such as Windows or Mac OS, all of its underlying source code is available to the general public for anyone to freely use, modify, and redistribute. According to wikipedia. Linux is the combination of
<sp33chy> the Linux kernel, the GNU set of operating system relates applications, and other FOSS (Free Open Source software) software. Linux is the basis of Gentoo GNU Linux which is the basis of Sabayon GNU Linux or Sabayon Linux. For more information see the Wikipedia link above. @ http://wiki.sabayonlinux.org/index.php?title=Linux
<speechles> !gwiki unix#toc
<sp33chy> SECURITY Anonymizing Unix Systems | ToC: THE AUDIENCE; GOAL; PREREQUISITES; USER DATA; Sensitive user data; Protecting home directories; Traceable user activity; Protecting /var/spool/* files; SYSTEM DATA; Sensitive system data; Traceable system activity; Logging - important and dangerous; Protecting system configs; Computer Memory and sensitive /proc interfaces; DELETE(D) DATA AND SWAP; How to delete
<sp33chy> files in a secure way; How to wipe free disk space; How to handle swap data; How to handle RAM; Temporary data - it is evil; NETWORK CONNECTIONS; HIDING PRIVACY SETTINGS; Mount is your friend; Removable Medias; ???; Final Comments; Example Configuration And Scripts; Crypto Filesystems; Tools; Additional thoughts; Credits; Greetings; Greets to individuals (in alphabetic order):; Greets to groups:; Greets to
<sp33chy> channel members: @ http://www.gentoo-wiki.com/SECURITY_Ano ... ystems#toc
<speechles> !gwiki unix#?
<sp33chy> SECURITY Anonymizing Unix Systems | ??? Any other ideas? Think about it! (and maybe send me your ideas ;-) @ http://www.gentoo-wiki.com/SECURITY_Ano ... #.3F.3F.3F [1 Redirect(s)]
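To illustrate the "trigger:wikisite.here" format, here is a small sketch of how such a list could be turned into a lookup table and used to resolve a command. It is Python for illustration only, not the script's Tcl, and `parse_custom_triggers` and `resolve` are hypothetical helpers. Splitting the search term at "#" is an assumption based on the `unix#toc` usage shown in the log above.

```python
def parse_custom_triggers(entries):
    """Turn "trigger:wikisite.here" strings into a {trigger: site} dict."""
    triggers = {}
    for entry in entries:
        trigger, sep, site = entry.partition(":")
        if not sep or not trigger or not site:
            continue  # skip malformed entries instead of failing
        triggers[trigger.lower()] = site
    return triggers


def resolve(command, triggers):
    """Map "!gwiki unix#toc" to (site, term, section), or None if unknown."""
    word, _, rest = command.lstrip("!").partition(" ")
    site = triggers.get(word.lower())
    if site is None:
        return None
    term, _, section = rest.partition("#")  # assumed section syntax
    return site, term.strip(), section or None


# Same entries as the wiki_customs list in the config above.
WIKI_CUSTOMS = [
    "swiki:wiki.sabayonlinux.org",
    "gwiki:www.gentoo-wiki.com",
    "ed:encyclopediadramatica.com",
    "un:uncyclopedia.org",
]

triggers = parse_custom_triggers(WIKI_CUSTOMS)
print(resolve("!gwiki unix#toc", triggers))
print(resolve("!swiki unix", triggers))
```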
eMxyzptlk wrote:Thank you for the custom wiki, you rock dude
P.S: The link in the post above is pointing to http://ereader.kiczek.com/incith-google-v1.98r.tcl instead of http://ereader.kiczek.com/incith-google-v1.98s.tcl ( Previous Version )
DOH!~ I've corrected that, thanks for spotting it. And about the props, it was an easy addition and makes the script more versatile with less typing, so why not make it a reality. Enjoy.
Phyxion wrote:GameSpot ain't working anymore speechles. They updated their code once again.
They did quite a bit more than update their HTML templates. They changed the entire query. It now uses a PHP backend to retrieve the search results using cookie and referrer fields, and I'd need to investigate how those even work before I could add something to fix it (although I do remember reading a post by user concerning this exact issue). If you leave any of these details out, the returned HTML merely contains a "searching..." where the results normally appear. You can test this yourself: do a !game anything, then check your eggdrop root for a file named ig-debug.txt. It contains the HTML with 'searching...' instead of usable results. I have to question why GameSpot would do something to prevent potential free advertising from any and all index/scrape bots. GameSpot must not be getting enough click-through impressions from people scraping their pages. I've always had direct links to GameSpot and every other scraped site appearing within the given results, so it isn't blatant theft; it's helping advertise for them, imo...
pwner wrote:hmm the script is great, but I have a little problem; out of all the features, only a few work for me (google search is gone, wiki, ebay and basically all the good ones ).
Could this be the fault of my shell provider, or of the Tcl version I'm currently using?
I'm using incith-google-v1.98s, someone please help...
Let me explain why with a slight ethics lesson. Websites wish their content to be viewed on their own medium. They sometimes take countermeasures to discourage scraping, which is the method of data retrieval this script uses.
<bot> redirected: http://search.ebay.com/dog_W0QQpqryZdog -> http://shop.ebay.com/items/_W0QQ_nkwZdo ... omZQQ_mdoZ
<bot> url: http://shop.ebay.com/items/_W0QQ_nkwZdo ... omZQQ_mdoZ charset: iso8859-1 encode_string: iso8859-1
I haven't written parsers for the template this new server gives. Notice that search.ebay.com becomes shop.ebay.com; this server uses a new template design that is not supported at the moment. Only the search.ebay.com template is supported presently.
<bot> redirected: http://www.google.com/search?hl=&q=anyt ... _all&num=1 -> http://sorry.google.com/sorry/?continue ... %26num%3D1
<bot> url: http://sorry.google.com/sorry/?continue ... %26num%3D1 charset: utf-8 encode_string:
This means Google will only allow you to use its services if you complete the captcha given on the sorry.google.com page. This is a problem between you and Google. The other Google-based sites may not work for you either, possibly because something identifies you as malicious. This is beyond my control; contact Google.
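Both cases above can be spotted from the redirect alone, before any parsing is attempted. Here is a minimal sketch of that check, in Python for illustration rather than the script's Tcl. `classify_redirect` and its return labels are hypothetical, and the example URLs are cut down to the host and first path segment, since the full ones are truncated in the debug output.

```python
from urllib.parse import urlsplit


def classify_redirect(original_url, final_url):
    """Label a redirect by comparing the hosts of the two URLs."""
    src = urlsplit(original_url).hostname or ""
    dst = urlsplit(final_url).hostname or ""
    if dst == "sorry.google.com":
        return "captcha"  # Google wants a captcha; nothing to parse
    if src == "search.ebay.com" and dst == "shop.ebay.com":
        return "unsupported-template"  # new eBay template, no parser yet
    if src != dst:
        return "redirected"  # some other host change
    return "ok"


print(classify_redirect("http://search.ebay.com/dog_W0QQpqryZdog",
                        "http://shop.ebay.com/items/"))
print(classify_redirect("http://www.google.com/search",
                        "http://sorry.google.com/sorry/"))
```

Reporting the label to the channel would tell the user why nothing came back, instead of the bot staying silent.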
speechles wrote:If you can tell me what you think, it would help. Is it immoral and wrong to scrape a website when it is obvious that the website is trying to eradicate scraping? If so, then it wouldn't be just of me to turn this script into something illicit (like heroin), where it's traded more for what it does wrong than for what it does right... If we are all damned and going to hell anyway, then we can soullessly and callously scrape them to death and update to a cookie/referrer approach rather than a simple query. It depends on what the object of this script is, which I leave solely up to each and every one of you, the people using the script.
The search URL still works and the info is also in the page, it is just built up differently (you can check using Firefox -> View page source). But since I don't understand much Tcl (regexp etc., unfortunately I don't understand any of it) I can't help.
Phyxion wrote:The search URL still works and the info is also in the page, it is just built up differently (you can check using Firefox -> View page source). But since I don't understand much Tcl (regexp etc., unfortunately I don't understand any of it) I can't help.
I can't believe you just said that... You fail to understand how eggdrop works. Sure, it works in Firefox, because Firefox can supply the cookie and referrer fields. It DOES NOT work on eggdrop until I supply those requirements. There IS NO search data to search for. There IS ONLY a static "searching..." message. Don't believe me? Check this out! Now where are the results to parse? There aren't any. Do you see what I've been saying all along now?
testebr wrote:Test -> Max Payne
The above comes from:
{"search_results":"<div class="sort_results">\n <select class="{'term':'max payne','type':'game','offset':false,'track':true}">\n <option selected="selected" value="rank">Sort By Rank<\/option>\n <option value="date">Sort By Date<\/option>\n \n <option value="score">Sort By Score<\/option>\n <\/se.....
testebr wrote:Read my reply above (I edited).
Read my reply. I already know this...
See the problem? The script merely does a single page load, which can get the HTTP headers. The script will need to make a second request to the search_ajax.php URL, filling in the request headers correctly, to retrieve any search results. The cookie session is all that matters: notice the referring site is egghelp and I still got successful search data in the browser.
gamespot wrote:Response Headers
Date Sun, 17 Aug 2008 18:08:12 GMT
Server Apache
Accept-Ranges bytes
X-Powered-By PHP/5.2.5
Set-Cookie gspot_side_081708=4; expires=Wed, 20-Aug-2008 18:08:12 GMT; path=/; domain=.gamespot.com
Keep-Alive timeout=300, max=990
Connection Keep-Alive
Transfer-Encoding chunked
Content-Type text/html; charset=ISO-8859-1
Request Headers
Host www.gamespot.com
User-Agent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16
Accept text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive 300
Connection keep-alive
Referer http://forum.egghelp.org/viewtopic.php?p=84640
Cookie gspot_side_081408=100; geolocn=NzAuMTMyLjAuOTE6ODQw; XCLGFbrowser=Cg8ILkh0Qr9HAAAAXg8; mbox=PC#1216060154750-11875#1280814507|session#1217742433671-451299#1217744367|check#true#1217742567; __qca=4869b91b-5b1c2-cf30b-ab8d7; MADCAPP=083B3d:1; __utmz=14953632.1217742436.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=14953632.3523376941471426000.1217742436.1217742436.1217742436.1; gspot_promo_081408=1; gspot_promo_081608=1; gspot_side_081608=2; u_srv_0_0=-1; __qcb=1709914989; gspot_side_081708=3
Cache-Control max-age=0
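The two-step idea can be sketched as follows: take the Set-Cookie values from the first response and send them back as a Cookie header on the second request. This is Python for illustration only, not the script's Tcl. `cookie_header` and `second_request_headers` are hypothetical helpers, the Referer value is an assumption, and the full search_ajax.php URL is left out because it is not shown in the thread.

```python
def cookie_header(set_cookie_values):
    """Build a Cookie header value from raw Set-Cookie header values."""
    pairs = []
    for value in set_cookie_values:
        # Keep only the leading name=value; drop expires/path/domain.
        name_value = value.split(";", 1)[0].strip()
        if "=" in name_value:
            pairs.append(name_value)
    return "; ".join(pairs)


def second_request_headers(first_response_headers, referer):
    """Headers for the follow-up request, reusing the session cookie."""
    set_cookies = [v for k, v in first_response_headers
                   if k.lower() == "set-cookie"]
    headers = {"Referer": referer}
    cookie = cookie_header(set_cookies)
    if cookie:
        headers["Cookie"] = cookie
    return headers


# Response headers copied from the capture above, as (name, value) pairs.
first_response = [
    ("Content-Type", "text/html; charset=ISO-8859-1"),
    ("Set-Cookie", "gspot_side_081708=4; expires=Wed, 20-Aug-2008 "
                   "18:08:12 GMT; path=/; domain=.gamespot.com"),
]

# The Referer value here is an assumption for the example.
print(second_request_headers(first_response, "http://www.gamespot.com/"))
```

Whether the site checks anything beyond the cookie is not something this sketch settles; it only shows the header plumbing.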
testebr wrote:Test -> Max Payne
The problem is not with the referrer, but with the JavaScript AJAX result :]
Try disabling JavaScript in your browser and test it.
That's what I meant too, speechles.
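Since the AJAX result is a JSON object with the HTML tucked inside a "search_results" string, like the payload quoted earlier in this thread, decoding it could look roughly like this. It is a Python sketch for illustration, not the script's Tcl. The `payload` below is a made-up, well-formed stand-in (the real response is truncated in the thread), and `results_text` is a hypothetical helper.

```python
import json
import re
from html import unescape


def results_text(payload):
    """Decode the JSON and flatten the embedded HTML to plain text."""
    data = json.loads(payload)
    fragment = data.get("search_results", "")
    text = re.sub(r"<[^>]+>", " ", fragment)  # replace tags with spaces
    return re.sub(r"\s+", " ", unescape(text)).strip()


# Hypothetical payload shaped like the quoted one; not real GameSpot data.
payload = json.dumps({
    "search_results": '<div class="sort_results">\n'
                      ' <option value="rank">Sort By Rank</option>\n'
                      ' <option value="date">Sort By Date</option>\n'
                      '</div>'
})

print(results_text(payload))
```

Letting a JSON decoder handle the `\/` and `\n` escapes first means the usual HTML regexps can run on the fragment unchanged.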