PERSONALIZED WEB CRAWLER FOR INVISIBLE WEB
Abstract
This paper discusses the hidden Web. Vast expanses of the Web are completely invisible to search engines. Worse, this "Invisible Web" is in all likelihood growing significantly faster than the familiar visible Web. The Invisible Web is made up of information stored in databases. Unlike pages on the visible Web, information in databases is generally inaccessible to the software spiders and crawlers that compile search engine indexes. In this paper I discuss the existence of a hidden or "deep" Web of approximately 500 billion individual documents, most of which are available to the public but not accessible through conventional search engines. This is because many of these documents use frames or reside in database-driven Web sites such as eBay, Amazon.com, and the Library of Congress, which spiders cannot crawl. I discuss the issues related to the invisible Web and the existing strategies for crawling the deep Web. I then propose a novel approach to crawling the deep Web: a personalized crawler.
Keywords
Deep/Invisible or Hidden Web, Crawlers, Spiders.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
© 2010-2022 International Journal of Mathematical Archive (IJMA)