World Wide Web: A Digital Library
By: Sanasam Ranbir Singh *



Within a decade, the World Wide Web (WWW) has grown from a small research project into a vast repository of information and a new medium of communication. Today, many organizations (governmental and non-governmental), institutions and businesses use the Web as their information medium, and the Internet has become an indispensable part of day-to-day life. Unlike other great networks of the past century, such as the electric power grid, the telephone system, or the highway and rail systems, the Web does not have an engineered architecture. Rather, it is a virtual network of content and hyperlinks, with over a billion interlinked “pages” created by the uncoordinated actions of tens of millions of individuals.

The WWW can be viewed as a graph in which each node represents a page and each edge connecting two nodes is a hyperlink. The topology of this graph determines the Web’s connectivity and, consequently, how effectively we can locate information on it. However, its enormous size, estimated at no fewer than 8 × 10^8 documents and growing rapidly, makes searching for pages related to a specific topic a big challenge. Major search engine companies have often claimed that they can keep up with the size of the Web. But its documents and links change continuously and it grows rapidly, which makes it impossible to catalogue all the vertices and edges. The estimated coverage of the Web by some search engines is given in the table below.

Search Engine     Coverage (%)
HotBot            57.5
AltaVista         46.5
Northern Light    32.9
Excite            23.1
Infoseek          16.5
Lycos             4.41


Table 1: Estimated coverage of each search engine (average over 575 queries performed during 15 to 17 December 1997) [10].
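The graph view of the Web described above can be made concrete with a small sketch. The page names and links below are hypothetical toy data, not taken from any real crawl, and the function name build_web_graph is only an illustrative choice. Each page is a node and each hyperlink is a directed edge, stored in forward and reverse adjacency sets.

```python
from collections import defaultdict


def build_web_graph(links):
    """Build forward and reverse adjacency sets from (source, target) pairs."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in links:
        out_links[src].add(dst)
        in_links[dst].add(src)
        # Touch both endpoints in both maps so every page appears as a node.
        out_links[dst]
        in_links[src]
    return dict(out_links), dict(in_links)


# Hypothetical toy hyperlinks (source page, target page).
links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
    ("d.html", "c.html"),
]

out_links, in_links = build_web_graph(links)
for page in sorted(out_links):
    print(page, "out-degree:", len(out_links[page]),
          "in-degree:", len(in_links[page]))
```

Keeping the reverse map as well as the forward one is a design choice: many link-analysis methods need to ask which pages point to a given page, not only which pages it points to.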

Because of the decentralized nature of its growth, the Web has been widely believed to lack structure and organization as a whole. However, analysis of the Web’s network of hyperlinks has revealed an intricate structure that is proving to be valuable for organizing information, improving search methods and understanding the Web in a broader technological and social context. HITS [2] and SALSA [3] are two widely used algorithms based on hyperlink structure, and they have been found to give surprisingly good results. Other link-based work includes PHITS [4], a probabilistic follow-up to HITS, and PageRank [1]. Several studies, such as spectral filtering [7] and HyCon [6], find that combining the content and the link structure of the Web gives better results. Our recent study on confidence-based web search [5] obtains surprisingly good results by monitoring users’ web page access records. Precise classification of web pages into categories is a key factor in web indexing, which plays a major role in improving web search performance and delivering quality pages. LSA (latent semantic analysis) [9], PLSA (probabilistic LSA) and k-nearest neighbor are some of the algorithms used to classify web pages.
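To give a feel for how a hyperlink-based algorithm works, here is a minimal sketch of the hub and authority iteration at the heart of HITS [2]. It is an illustration under simplifying assumptions, not a full implementation: the step that builds a query-focused subgraph is skipped, the iteration count is an arbitrary choice, and the toy graph is hypothetical. A page gets a high authority score when good hubs link to it, and a high hub score when it links to good authorities.

```python
def hits(out_links, iterations=50):
    """Return (hub, authority) scores for a graph given as page -> set of targets."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in out_links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: add up the new authority scores of the pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in out_links.get(p, ()))
            for p in pages
        }

        # Normalize both vectors to unit length so the scores stay bounded.
        auth_norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        auth = {p: v / auth_norm for p, v in new_auth.items()}
        hub = {p: v / hub_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical toy graph: three hub-like pages pointing at two authorities.
toy_links = {
    "h1": {"a1", "a2"},
    "h2": {"a1", "a2"},
    "h3": {"a1"},
}

hub, auth = hits(toy_links)
print("authorities:", sorted(auth, key=auth.get, reverse=True)[:2])
print("hubs:", sorted(hub, key=hub.get, reverse=True)[:2])
```

In this toy graph the page a1 is linked from all three hub pages, so it ends up as the strongest authority, while h3, which links to only one authority, is a weaker hub than h1 and h2.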

A recent study [8] indicates that the Web contains a large, strongly connected core in which every page can reach every other page by a path of hyperlinks. This core contains most of the prominent sites on the Web. The remaining pages can be characterized by their relation to the core: upstream nodes can reach the core but cannot be reached from it, downstream nodes can be reached from the core but cannot reach it, and “tendrils” contain nodes that can neither reach the core nor be reached from it.
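The four groups described above can be computed on a small graph with two breadth-first searches. The sketch below assumes that one page known to lie in the core is given as a seed; the function names and the toy graph are hypothetical. The core is then the set of pages that the seed can reach and that can also reach the seed.

```python
from collections import deque


def reachable(start, adjacency):
    """Return every node reachable from start by following adjacency."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def classify_by_core(out_links, core_seed):
    """Split pages into core, upstream, downstream and tendrils."""
    nodes = set(out_links)
    in_links = {}
    for src, targets in out_links.items():
        for dst in targets:
            nodes.add(dst)
            in_links.setdefault(dst, set()).add(src)

    forward = reachable(core_seed, out_links)   # pages the seed can reach
    backward = reachable(core_seed, in_links)   # pages that can reach the seed

    core = forward & backward
    upstream = backward - core
    downstream = forward - core
    # Following the definition above: neither reaches nor is reached.
    tendrils = nodes - forward - backward
    return core, upstream, downstream, tendrils


# Hypothetical toy graph with a three-page core.
toy_links = {
    "c1": {"c2"},
    "c2": {"c3", "d1"},
    "c3": {"c1"},
    "u1": {"c1", "t1"},
}

core, upstream, downstream, tendrils = classify_by_core(toy_links, "c1")
print("core:", sorted(core))
print("upstream:", sorted(upstream))
print("downstream:", sorted(downstream))
print("tendrils:", sorted(tendrils))
```

On a graph the size of the real Web this would need a far more scalable implementation, but the logic of intersecting forward and backward reachability stays the same.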

Searching this huge library for pages relevant to a specific topic is highly complex because the Web is unstructured and dynamic in nature. Based on the mechanisms they use, available search engines can be broadly classified into two types: content-based search engines and citation-based (hyperlink-based) search engines. Content-based search engines use only the content of the documents, on which they perform keyword matching: the user’s query topic is used as the keyword, and the engine finds the documents on the Web that contain it. Some of the full-text content-based search engines are Lycos (www.lycos.com), AltaVista (www.altavista.digital.com), Excite (www.excite.com), HotBot (www.hotbot.com), Infoseek (www.infoseek.com) and Northern Light (www.nlsearch.com). Several studies find that content-based search engines often provide poor-quality results. Hyperlink-structure-based search engines use citations as well as the content of the pages, and provide better results than content-based search engines. Google (www.google.com) and Clever are examples of hyperlink-based search engines.
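The keyword matching performed by a content-based search engine can be sketched with a simple inverted index, which maps each word to the set of documents containing it. This is only an illustration: the documents, identifiers and function names are hypothetical, and it is not a description of how any of the engines named above is actually built.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical toy documents.
documents = {
    "doc1": "The Web is a graph of pages and hyperlinks",
    "doc2": "Search engines index the Web",
    "doc3": "Hyperlinks help rank search results",
}

index = build_index(documents)
print(sorted(search(index, "web")))
print(sorted(search(index, "search web")))
```

Note that every matching document is returned on an equal footing: pure keyword matching says nothing about which of the matching pages is the most useful, which is the gap that hyperlink-based ranking tries to fill.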

Better web mining depends on many issues, such as extracting features from pages, the organizational structure of the Web, identifying communities of pages, crawling the Web, large-scale search engines and their architecture, personalized web search, page ranking methods, optimizing web structure and web indexing. Because the Web offers ample resources for research, many researchers are attracted to this area.

References:
[1] Sergey Brin, Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. In Proc. 7th Int. World Wide Web Conf., 1998.
[2] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46, 1999.
[3] R. Lempel and S. Moran. The stochastic approach for link-structure analysis (SALSA) and the TKC effect. In 9th Int. WWW Conference, Amsterdam, Netherlands, May 2000.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. Preprint, 2000.
[5] D. Mukhopadhyay, D. Giri, S. R. Singh. A Confidence Based Methodology to Deduce User Oriented Page Ranking in Searching the Web. Preprint, 2003.
[6] D. Mukhopadhyay, S. R. Singh. HyCon: A Hyperlink and Content based Topic Search Technique. Preprint, 2003.
[7] S. Chakrabarti, B. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Spectral filtering for resource discovery. Preprint, 1998.
[8] Jon Kleinberg, Steve Lawrence. The structure of the Web. Science, Vol. 294, 30 November 2001.
[9] S. Deerwester et al. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
[10] Steve Lawrence and C. Lee Giles. Searching the World Wide Web. Science, Vol. 280, 3 April 1998. www.science.com.


Sanasam Ranbir Singh is a scholar at the Dept. of Comp. Sc. & Engg., Haldia Institute of Technology, West Bengal, India.
