Shukla Karishma*, Mahesh Singh


This paper discusses the hidden Web. Vast expanses of the Web are completely invisible to search engines, and this "invisible Web" is in all likelihood growing significantly faster than the visible Web. The invisible Web consists of information stored in databases; unlike pages on the visible Web, this information is generally inaccessible to the software spiders and crawlers that compile search engine indexes. The paper describes a hidden or "deep Web" of approximately 500 billion individual documents, most of which are available to the public but not reachable through conventional search engines, because many of them use frames or reside in database-driven Web sites such as eBay, Amazon.com, and the Library of Congress, which spiders cannot crawl. It then reviews the issues related to the invisible Web and the existing strategies for crawling the deep Web, and finally proposes a novel approach to crawling it: a personalized crawler.


Deep/Invisible or Hidden Web, Crawlers, Spiders.



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
© 2010-2024 International Journal of Mathematical Archive (IJMA)