There are many parts of the web that Googlebot has not been able to access, but Google has been working to shrink that gap. Google wants to find content, and while many webmasters do not make it easy, Googlebot finds a way.
1. Crawling Flash!
Adobe announced today that it has released technology and information to Google and Yahoo that lets them crawl Flash files. It may take the search engines some time to integrate and implement these abilities, but the day is coming when rich media is less of a liability. I wonder if MSN/Live was left out to keep them from reverse engineering Flash for their new Silverlight competitor? At any rate, MSN is still working on accessing text links, so let's not swamp them.
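Until the engines actually ship Flash indexing, it still pays to give crawlers an HTML fallback alongside the movie. A minimal sketch, assuming a file called intro.swf (the file name and copy are hypothetical):

    <object type="application/x-shockwave-flash" data="intro.swf" width="600" height="400">
      <param name="movie" value="intro.swf" />
      <!-- Alternative content shown to anything that cannot render the SWF,
           including search engine crawlers -->
      <h2>Acme Widgets: hand-built widgets since 1997</h2>
      <p><a href="/products/">Browse the product catalog</a></p>
    </object>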
2. Crawling forms
Googlebot recently started filling out forms on the web in an attempt to discover content hidden behind jump menus and other forms. See our previous article if you’d like to keep Google out of your forms.
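If you would rather keep Googlebot from experimenting with a form, the usual approach is to disallow the URL the form submits to. A sketch in robots.txt, assuming the form's action points at /search-results/ (the path is hypothetical):

    User-agent: Googlebot
    Disallow: /search-results/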
3. Working with government entities to make information more accessible
A year or so ago, Google started providing training to government agencies to help them get their information onto the web. I'm assuming much of that information has been hidden behind URLs with large numbers of parameters.
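The classic webmaster-side fix is to rewrite those parameter-heavy URLs into something short and crawlable. A hypothetical Apache mod_rewrite sketch for an .htaccess file (the script name and parameters are made up):

    # Map /records/2007/1234 onto the real script:
    # /viewdoc.cgi?db=records&year=2007&id=1234
    RewriteEngine On
    RewriteRule ^records/([0-9]{4})/([0-9]+)$ /viewdoc.cgi?db=records&year=$1&id=$2 [L]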
4. Crawling JavaScript
Many menus and other dynamic navigation features are built in JavaScript, and Googlebot has started crawling those as well. Instead of relying on webmasters to provide search-friendly navigation, Google is finally getting access to sites created by neophyte webmasters who haven't been paying attention.
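Even so, navigation that works without scripting is still the safer bet. One common pattern is progressive enhancement: start with plain links and layer the dynamic behaviour on top. A sketch with hypothetical URLs:

    <ul id="nav">
      <li><a href="/products/">Products</a></li>
      <li><a href="/support/">Support</a></li>
    </ul>
    <script type="text/javascript">
      // The links above work with JavaScript turned off, so any crawler
      // that ignores scripts can still follow them; the script only
      // dresses the list up as a drop-down menu.
      document.getElementById('nav').className = 'dropdown';
    </script>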
5. Google’s patent to read text in images
Google also knows many newbie webmasters use image buttons, with the link text drawn into the graphic, for navigation. By attempting to read the text in those images, Googlebot will once again be able to open up previously inaccessible areas of a site.
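Until that patent turns into a shipping feature, the alt attribute is still what gives an image link its anchor text. A hypothetical example (the file name is made up):

    <!-- Without the alt text, the only anchor text here is trapped in the image pixels -->
    <a href="/contact/"><img src="btn-contact.gif" alt="Contact Us" /></a>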
6. Inbound links
Of course, Googlebot has always been great at following inbound links to new content. Much of the invisible web has been discovered just through humans linking to a previously unknown resource.
7. Submission
Of course, you can always submit the URL of currently invisible content to Google. This is usually the slowest route, especially compared to inbound links.
8. Google Toolbar visits, Analytics
Recently, many Denver SEO professionals have noticed URLs being indexed that were never submitted. The only plausible explanation is that Google has been mining its Toolbar and Analytics data for information about new URLs. Be careful: Google is watching and sees all!
9. Sitemap.xml files
The relatively new sitemap.xml protocol is very helpful to webmasters and Googlebot alike in getting formerly invisible content into Google's hands.
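A minimal sitemap is just an XML list of URLs. A sketch, with a hypothetical domain and dates:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/hidden-report.html</loc>
        <lastmod>2008-06-30</lastmod>
        <changefreq>monthly</changefreq>
      </url>
    </urlset>

Drop the file in the site root, then point Google at it with a Sitemap: line in robots.txt or by submitting it through Webmaster Tools.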