Googlebot spidering simple GET forms
Nov 10, 2007
I’ve noticed some interesting behavior from Googlebot at work this week. I recently changed the forms on the teacher trainer lookup page to submit via GET, because that is the more appropriate method for the function (searching/lookup) and it makes it easier to track usage via the site’s request logging (unique URLs, instead of custom logging for POSTs made on this page). Since then, I’ve seen people from Google searches landing directly on the results pages, such as: http://suzukiassociation.org/teachers/training/trainers/?state=BC
Looking at the log files, Googlebot seems to be indexing these pages by submitting the forms with all the options in the select inputs. (Here, just state or instrument—only about 75 possibilities.) I’m not the only one to notice this:
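To make the pattern in the logs concrete, here is a minimal sketch of how a crawler could enumerate a simple GET form. This is my guess at the behavior, not anything Google has documented. The `state` parameter comes from the URL above; the `instrument` parameter name and all the option values are placeholders I made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://suzukiassociation.org/teachers/training/trainers/"

# Hypothetical option values; the real selects hold about 75 between them.
# The "instrument" parameter name is an assumption.
FORM_SELECTS = {
    "state": ["BC", "CA", "NY"],
    "instrument": ["violin", "cello"],
}


def candidate_urls(base_url, selects):
    """Yield one GET URL per option, varying a single select at a time."""
    for name, options in selects.items():
        for value in options:
            yield f"{base_url}?{urlencode({name: value})}"


urls = list(candidate_urls(BASE_URL, FORM_SELECTS))
for url in urls:
    print(url)
```

With only a handful of selects, the number of URLs is the sum of the option counts, which is why a form like this is cheap for a crawler to cover completely.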
On this particular page, having Google index all the search results pages is desirable—I’ve already seen a number of people find these pages in Google by searching on specific trainer names, which would not have been possible before. But there are many occasions I wouldn’t want Google to index form submission results: where there are a large number of possibly useless pages generated (such as submitting random queries to our site search), or submitting registration forms, causing junk records I would need to remove from the database. I hope Google is going to be intelligent about spidering forms—there are a lot of forms out there that are really for humans only!
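For the pages I would not want indexed, the usual tool is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to check how a well-behaved crawler would read such a rule. The `/search/` path and the `example.org` host are hypothetical stand-ins, not our actual site layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /search/ stands in for a site search results path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Search results are blocked; the trainer lookup pages stay crawlable.
search_allowed = crawler_may_fetch(
    ROBOTS_TXT, "Googlebot", "http://example.org/search/?q=violin"
)
lookup_allowed = crawler_may_fetch(
    ROBOTS_TXT, "Googlebot", "http://example.org/teachers/training/trainers/?state=BC"
)
```

robots.txt is advisory, so it only helps with crawlers that honor it. For forms that write to the database, such as registration, submitting via POST is the safer choice regardless.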
1 Comment
Ulrich, Mar 27, 2008
I found this page while looking for form-related Googlebot problems. On my site, I just discovered that Googlebot even (re)submits completely filled-out forms. For example, a user would fill out a support request (yes, it is a GET form), and a couple of minutes later the exact same request would come in a second time from Googlebot. This means that Google not only follows GET forms with values from select boxes etc., but also resubmits URLs that it must have taken from the client side (from the toolbar, a Firefox extension, or wherever else). This is spooky: imagine what happens if someone uses GET-style links in an admin context. IMHO this is one step too far for a search robot.