- PRESENTED BY:
- Group No: 13
- Jai Mashalkar 113050007
- Khushraj Madnani 113050041
- Lahari Poddar 113050029
- Semantic search seeks to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results.
- MOTIVATION FOR SEMANTIC SEARCH
- SEMANTIC SEARCH TECHNOLOGY
- Semantically Relatable Sets
- Query Expansion
- Relevance Feedback
- SEMANTICALLY RELATABLE SETS
- A semantically relatable set (SRS) of a sentence is a group of unordered words in the sentence (not necessarily consecutive) that appear in the semantic graph of the sentence as linked nodes.
- {CW,CW}
- {CW,FW,CW}
- {FW,CW}
- CW: Content Word or Clause
- FW: Function Words
- Example: The girl borrowed a book on AI from library.
- CW: girl, borrowed, book, AI, library
- FW: the, a, on, from
- THE GIRL BORROWED A BOOK ON AI FROM LIBRARY
- Sets Formed:
- {the,girl}
- {girl,borrowed}
- {borrowed,book}
- {book,on,AI}
- {borrowed,from,library}
- {a,book}
- THE PROFESSOR ANNOUNCED THAT HE WILL CONDUCT AN EXTRA LECTURE ON SUNDAY
- {the,professor}
- {professor,announced}
- {announced,that,SCOPE}
- SCOPE:{he,conduct}
- SCOPE:{will,conduct}
- SCOPE:{conduct,lecture}
- SCOPE:{conduct,on,sunday}
- SCOPE:{extra,lecture}
- SCOPE:{an,lecture}
- Rq(d) = Relevance of the document d to the query q
- |Sd| = Number of sentences in the document d
- rq(s) = Relevance of sentence s to the query q
- The relevance score for a document d:
- The relevance of the sentence s to the query q :
- weight(srs) = weight of the SRS srs.
- press(srs) = true if srs is present in sentence s, false otherwise.
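The two formulas themselves are not preserved in this text. A minimal sketch of one reading that is consistent with the definitions above, offered as an assumption and not as the authors' exact formulas: rq(s) sums weight(srs) over the query SRSs that are present in s, and Rq(d) averages rq(s) over the |Sd| sentences of d. The uniform weights, the frozenset representation, and the function names are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set

SRS = FrozenSet[str]  # an SRS as an unordered group of words


def sentence_relevance(query_srs: Dict[SRS, float], sentence_srs: Set[SRS]) -> float:
    """rq(s): assumed to be the sum of weight(srs) over query SRSs present in s."""
    return sum(weight for srs, weight in query_srs.items() if srs in sentence_srs)


def document_relevance(query_srs: Dict[SRS, float], doc: List[Set[SRS]]) -> float:
    """Rq(d): assumed to be the mean of rq(s) over the |Sd| sentences of d."""
    if not doc:
        return 0.0
    return sum(sentence_relevance(query_srs, s) for s in doc) / len(doc)


# Query SRSs with illustrative uniform weights.
query = {
    frozenset({"borrowed", "book"}): 1.0,
    frozenset({"book", "on", "ai"}): 1.0,
}

# A two-sentence document, using the SRSs listed for the example sentences.
doc = [
    {
        frozenset({"the", "girl"}),
        frozenset({"girl", "borrowed"}),
        frozenset({"borrowed", "book"}),
        frozenset({"book", "on", "ai"}),
        frozenset({"borrowed", "from", "library"}),
        frozenset({"a", "book"}),
    },
    {
        frozenset({"the", "professor"}),
        frozenset({"professor", "announced"}),
    },
]

first_sentence_score = sentence_relevance(query, doc[0])
score = document_relevance(query, doc)
```

Under these assumptions the first sentence matches both query SRSs and the second matches none, so the document score is the average of the two sentence scores.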
- SRS-based search gives very high precision (the fraction of retrieved instances that are relevant) compared to tf-idf based search.
- However, it falls short of tf-idf based search because of its low recall (the fraction of relevant instances that are retrieved).
- REASONS:
- Morphological Divergence
- Eg: Apparel for man: Clothes for men
- Synonymy/Hypernymy/Hyponymy Divergence
- Physical Separation Divergence
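To make the precision and recall definitions above concrete, here is a small sketch. The document ids and the two retrieved sets are invented purely to illustrate the trade-off; they are not measured results for either technique.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A strict matcher returns few documents, all of them relevant.
strict_retrieved = {"d1"}
# A looser matcher returns more relevant documents, plus some noise.
loose_retrieved = {"d1", "d2", "d3", "d7", "d8", "d9"}

strict_p, strict_r = precision_recall(strict_retrieved, relevant)
loose_p, loose_r = precision_recall(loose_retrieved, relevant)
```

In this toy case the strict matcher has precision 1.0 but recall 0.25, while the looser one has precision 0.5 and recall 0.75.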
- LOW RECALL - ENHANCEMENTS:
- Stemming
- Eg: Moving, moved, moves → move
- Word Similarity
- SRS Augmentation
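The stemming step can be sketched with a toy suffix stripper. The rule list below is an assumption made for illustration, not the stemmer used in the SRS work; a real system would use an established algorithm such as Porter's. Note that the common stem it produces ("mov") need not be a dictionary word.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # toy rule list, tried in this order


def toy_stem(word: str) -> str:
    """Strip one common suffix, then a trailing 'e', keeping at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # Drop a final "e" so that "move" and "mov(ing)" meet at the same stem.
    if w.endswith("e") and len(w) > 3:
        w = w[:-1]
    return w


stems = {toy_stem(w) for w in ["Moving", "moved", "moves", "move"]}
```

All four forms collapse to one stem, so a query containing any one of them can match a document containing another.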
- Query expansion is the process of reformulating a seed query to improve retrieval performance.
- Techniques involved:
- Finding synonyms of words.
- Finding all the various morphological forms of a word by stemming
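A minimal sketch of the expansion step, assuming a tiny hand-made lookup table. The `SYNONYMS` table and the `expand_query` helper are hypothetical; a real system would draw on a thesaurus and a stemmer.

```python
from typing import Dict, List

# Hypothetical table of synonyms and morphological variants.
SYNONYMS: Dict[str, List[str]] = {
    "apparel": ["clothes", "clothing"],
    "man": ["men"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the seed terms plus their listed variants, without duplicates."""
    expanded: List[str] = []
    for term in query.lower().split():
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded = expand_query("Apparel for man", SYNONYMS)
# expanded == ['apparel', 'clothes', 'clothing', 'for', 'man', 'men']
```

With the expanded terms, the query "Apparel for man" can now match a document that says "Clothes for men".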
- GLOBAL: Examine word occurrences and relationships using a thesaurus, which can be constructed manually or automatically.
- LOCAL: Use the top-ranked documents retrieved by the original query.
- Manual Thesaurus Generation:
- Use of a controlled vocabulary (maintained by human editors) that is built up from sets of synonymous names for concepts.
- Automatic Thesaurus Generation:
- Exploit word co-occurrence.
- Exploit grammatical relations or grammatical dependencies.
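The co-occurrence route can be sketched as follows: terms that often appear in the same document are treated as related. This is a simplified illustration under stated assumptions (whitespace tokenisation, document-level co-occurrence, raw counts, no stop-word removal); the function name and the sample documents are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List


def cooccurrence_thesaurus(docs: List[str], top_n: int = 1) -> Dict[str, List[str]]:
    """Map each term to the top_n terms it most often shares a document with."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        term: [other for other, _ in neighbours.most_common(top_n)]
        for term, neighbours in counts.items()
    }


docs = [
    "car engine repair",
    "car engine oil",
    "car tyre",
    "flower garden",
]
thesaurus = cooccurrence_thesaurus(docs)
```

Here "car" and "engine" share two documents, so each becomes the other's nearest neighbour, while "flower" is linked only to "garden".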
- ANALYSIS OF QUERY EXPANSION
- Query expansion is effective in increasing recall of relevant documents.
- But it may significantly decrease precision, particularly when the query contains ambiguous terms.
- In general, a domain-specific thesaurus is required for better performance.
- The user's initial query is run.
- Some results are retrieved.
- The results are analyzed to determine whether or not they are relevant.
- A modified query is formed from this analysis and run to produce the final search results.
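The steps above do not say how the modified query is formed. One widely used choice, named here as an assumption and not as the presenters' method, is a Rocchio-style update: move the query's term weights towards the documents judged relevant and away from those judged non-relevant. The `alpha`, `beta` and `gamma` values and the sample vectors are illustrative.

```python
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def rocchio(query: Vector, relevant: List[Vector], non_relevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    new_query: Counter = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight is not positive are dropped from the new query.
    return {term: w for term, w in new_query.items() if w > 0}


query = {"inferno": 1.0}
relevant_docs = [{"inferno": 1.0, "os": 1.0}]       # judged relevant
non_relevant_docs = [{"inferno": 1.0, "band": 1.0}]  # judged not relevant

modified_query = rocchio(query, relevant_docs, non_relevant_docs)
```

In this example the modified query gains the term "os" and never admits "band", so the second retrieval leans towards the documents the user actually wanted.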
- TYPES OF RELEVANCE FEEDBACK
- Explicit Feedback:
- Feedback is collected directly from users, who assess a given output (a set of documents).
- Eg: After a document is viewed, ask “Was this document helpful?”
- ADVANTAGE:
- It captures the actual requirements and expectations of the user.
- DISADVANTAGE:
- A large fraction of users may not be interested in participating in surveys and feedback.
- These surveys may be biased by the personal choices of users.
- E.g.: When searching for "inferno", most people may rank pages about the musical band named Inferno above pages about the Inferno OS.
- Implicit Feedback:
- Feedback is inferred from the user's actions on the output documents.
- Factors:
- Number of times a document is visited
- Duration of the visit on a particular URL
- Depth and number of links visited from the document
- ADVANTAGE:
- The interaction time with the user is eliminated, as the system takes the user's feedback implicitly.
- DISADVANTAGE:
- Number of hits on a URL: Users tend to click on the first document returned. Thus, if the search was initially not up to the mark, it may continue to perform poorly.
- Time spent on a URL: Sometimes the time taken to reject a document is long enough for the algorithm to believe that the document is relevant.
- Number and depth of links visited: This ranks a relevant document with links as relevant, but it fails to rank a good document without links as relevant.
- PSEUDO RELEVANCE FEEDBACK (PRF) OR BLIND FEEDBACK:
- Takes a query as an input.
- From the top k ranked results for that query, some keywords are selected according to their weights and added to the query, which is then used for a further search.
- ADVANTAGE:
- It is a completely automated process and hence free from human bias.
- DISADVANTAGE:
- The effectiveness depends heavily on the ranking algorithm used. If the top documents retrieved by the initial query are not very relevant, the final result will not be good either.
- The term associations obtained for query expansion (QE) are restricted to co-occurrence based relationships in the feedback documents, so other types of term associations, such as lexical and semantic relations (morphological variants, synonyms), are not explicitly captured.
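The pseudo relevance feedback procedure described above can be sketched as follows. The ranker (counting query-term occurrences) and the keyword weight (raw frequency in the top k documents) are simplifying assumptions, and all names and sample documents are invented for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def search(query_terms: List[str], docs: Dict[str, str]) -> List[str]:
    """Toy ranker: order documents by how often the query terms occur in them."""
    scored = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


def prf_expand(query_terms: List[str], docs: Dict[str, str], k: int = 2,
               n_terms: int = 2, stopwords: FrozenSet[str] = frozenset()) -> List[str]:
    """Add the n_terms most frequent new terms from the top k results to the query."""
    top_k = search(query_terms, docs)[:k]
    weights: Counter = Counter()
    for doc_id in top_k:
        for token in docs[doc_id].lower().split():
            if token not in query_terms and token not in stopwords:
                weights[token] += 1
    ranked = sorted(weights.items(), key=lambda p: (-p[1], p[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_terms]]


docs = {
    "d1": "inferno operating system plan9 kernel",
    "d2": "inferno operating system limbo language",
    "d3": "gardening tips for roses",
}
expanded_query = prf_expand(["inferno"], docs)
final_results = search(expanded_query, docs)
```

The sketch also shows the first disadvantage: if the top k documents were about the wrong sense of the query, the added keywords would come from those documents and reinforce the error.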
- Given a query in one language, we take the help of another language to ameliorate the well-known problems of PRF.
- The steps are:
- Translation: L1 -> L2
- PRF performed in L2.
- Result back-translation: L2 -> L1
- Combination of the feedback models of L1 and L2.
- Fetch a new ranked list of documents.
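A minimal sketch of the back-translation and combination steps, assuming that each feedback model is a term-to-probability map, that translation is a one-to-one dictionary lookup, and that the two models are combined by linear interpolation with a weight `lam`. These are illustrative assumptions; the steps above do not specify the translation or combination method, and the dictionary and models below are made up.

```python
from typing import Dict

Model = Dict[str, float]  # term -> probability


def normalise(model: Model) -> Model:
    """Rescale the weights so that they sum to 1."""
    total = sum(model.values())
    return {t: w / total for t, w in model.items()} if total else {}


def translate(model: Model, dictionary: Dict[str, str]) -> Model:
    """Map each term through the dictionary, dropping terms without an entry."""
    out: Model = {}
    for term, weight in model.items():
        if term in dictionary:
            target = dictionary[term]
            out[target] = out.get(target, 0.0) + weight
    return normalise(out)


def combine(model_l1: Model, model_l2_back: Model, lam: float = 0.5) -> Model:
    """Interpolate the L1 model with the back-translated L2 model."""
    terms = set(model_l1) | set(model_l2_back)
    return {
        t: lam * model_l1.get(t, 0.0) + (1 - lam) * model_l2_back.get(t, 0.0)
        for t in terms
    }


# Hypothetical feedback models: one from PRF in L1, one from PRF in L2.
feedback_l1 = {"car": 0.6, "engine": 0.4}
feedback_l2 = {"auto": 0.5, "motor": 0.25, "reifen": 0.25}
l2_to_l1 = {"auto": "car", "motor": "engine", "reifen": "tyre"}

back_translated = translate(feedback_l2, l2_to_l1)
combined = combine(feedback_l1, back_translated)
```

In this toy case the combined model contains "tyre", a term that PRF in L1 alone did not produce; this is the kind of gain the approach aims for.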
- ANALYSIS OF MULTILINGUAL PRF
- Good Feedback from Assisting Language: If the feedback model in the assisting language contains good terms, then the back-translation process will introduce the corresponding feedback terms in the source language, thus leading to improved performance.
- Finding Synonyms/Morphological Variations: Another situation in which MultiPRF leads to large improvements is when it finds terms semantically or lexically related to the query terms that the original feedback model was unable to find.
- Abundance of Documents: More documents are available on the web in the assisting language than in the base language.
- Semantic search is helpful for research searches but not much help for navigational searches.
- Semantic search performs better than traditional search methods on semantically meaningful sentences or phrases, but falls short on keyword-based searches.
- To use semantic search engines to their full potential, users also need to get used to searching with meaningful queries instead of just keywords.
- Semantic search may not be able to replace the traditional web completely, but it has the power to enhance it.
- With semantic search, the web will become more intelligent, as it will be able to understand what we mean instead of just matching keywords.
- Rajat Mohanty, Anupama Dutta and Pushpak Bhattacharyya, Semantically Relatable Sets: Building Blocks for Representing Semantics, 10th Machine Translation Summit (MT Summit 05), Phuket, September 2005.
- Manoj Chinnakotla, Karthik Raman and Pushpak Bhattacharyya, Multilingual PRF: English Lends a Helping Hand, SIGIR 2010, Geneva, Switzerland, July 2010.
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- Jinxi Xu and W. Bruce Croft, Query Expansion Using Local and Global Document Analysis, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst, MA 01003-4610, USA.
- http://en.wikipedia.org/wiki/Semantic_search, Last modified on 23 October 2011 at 14:11, Last accessed on 02 November 2011 at 17:31
- http://en.wikipedia.org/wiki/Query_expansion, Last modified on 7 October 2011 at 20:43, Last accessed on 04 November 2011 at 18:45
- http://en.wikipedia.org/wiki/Relevance_feedback, Last modified on 31 October 2011 at 03:46, Last accessed on 04 November 2011 at 19:10