Alphacarbon wrote:I suppose my question was more of how to fix this problem from the bot's end...
If my bot is being sorried, does this mean that the script altogether is no longer useful? And if others are still getting it to work, where is the discrepancy?
I will look at the link you referenced. Maybe I can find my answer there.
The problem is not the script, and it is not your bot. The problem is the IP used to make the query. There are potentially hundreds upon thousands of others experiencing the same issue as you.
Below is merely a teaspoonful of others having the same problem:
http://blog.wired.com/monkeybites/2008/ ... s-sor.html
http://groups.google.com/group/Google_W ... 1648fbd316
http://www.boingboing.net/2008/01/10/gu ... le-is.html
http://www.mydigitallife.info/2007/11/2 ... gle-error/
http://digg.com/security/Google_s_Sorry_You_re_Infected
Basically, Google, in its infinite wisdom, has assumed (from a lack of cookie sessions, a missing referrer, missing headers, etc.) that the IP making the query (your bot) is scripted/automated. There is very little you can do. When Google first redirects the query (302), it issues a captcha image tied to the session id passed within the query, and the captcha then becomes visible. Once the captcha is entered, it is checked against the hash passed at the end of the redirected query session.
The only way to make this work would be to have users run their 'links' browser on the machine the bot is on to generate a valid session id and cookie, then use the data in that cookie to validate the script's Google searches. This is not as easy as it sounds, because the cookie sessions expire. Renewing them means going through the browser again and generating a new cookie. Very, very tedious...
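If you want the bot to at least notice when it is being sorried instead of choking on the response, something like the rough sketch below might help. It only classifies a response from its status code and Location header. The function name is made up for illustration, and the check for "sorry" in the redirect target's host or path is my assumption about how the redirect looks, not anything Google documents, so adjust it to whatever your bot actually receives.

```python
from urllib.parse import urlparse


def looks_like_sorry_redirect(status, location):
    """Guess whether a response is a 302 redirect to a 'sorry' page.

    status   -- HTTP status code of the response (int)
    location -- value of the Location header, or None if absent
    """
    # The redirect described above is a 302; anything else is not our case.
    if status != 302 or not location:
        return False
    parsed = urlparse(location)
    # Assumption: the block page has "sorry" in its host name or path.
    # The query string is ignored so a search for the word "sorry" is not flagged.
    return "sorry" in parsed.netloc.lower() or "sorry" in parsed.path.lower()
```

The bot would call this with redirects disabled in its HTTP client, and report something like "Google is blocking this IP" rather than trying to parse the result.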
You can read about it here; this is Google's official statement on the matter:
http://www.google.com/support/bin/answe ... swer=86640
If you have never witnessed a Google sorry page, well, there are major flaws within Google's sorry policy...
1) Our query below is simply 'forum topic' and we start at the first page. This Google will allow:
http://www.google.com/search?&q=forum+topic&start=1
2) Now we will keep the same 'forum topic' query, but instead let's go straight to page 18 (10 results per page, 10 x 18 = 180):
http://www.google.com/search?&q=forum+topic&start=180
For 1, almost everyone should be able to easily see the results in any browser. For 2, everybody should see the Google sorry page, and without a captcha. This is how you're treated when things go awry. Google hasn't even issued a captcha to allow the query to be fulfilled, for anyone. If this isn't censorship, then what is? Basically, Google has censored everyone from seeing beyond the 17th page for 'forum topic'.
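For anyone who wants to try other page numbers, the start value in example 2 is just the page number times the results per page. Here is a small sketch of that arithmetic. The function name and the default of 10 results per page are my own choices for illustration, and it produces "?q=" rather than the "?&q=" in the links above.

```python
from urllib.parse import urlencode


def search_url(query, page, per_page=10):
    """Build a search URL for a given page, using start = page * per_page."""
    start = page * per_page  # e.g. 18 * 10 = 180, as in example 2
    # urlencode turns spaces into '+', so 'forum topic' becomes 'forum+topic'.
    return "http://www.google.com/search?" + urlencode({"q": query, "start": start})
```

Calling search_url("forum topic", 18) gives the same query and start value as the second link above.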