Web search engines


Date: 03.11.2023

Other issues

  • Spamming
    • Adding popular query terms to a page unrelated to those terms
    • E.g.: Adding “Hawaii vacation rental” to a page about “Internet gambling”
    • Somewhat set back by hyperlink-based ranking, which depends on more than a page's own text
  • Titles, headings, meta tags and anchor-text
    • The plain TF-IDF framework treats all term occurrences the same, wherever they appear on the page
    • Search engines:
      • Assign extra weight to text occurring in title, heading and meta tags
    • Using anchor text on pages u that link to v
      • Anchor text on u offers a valuable editorial judgment about v, so it can be used when indexing v as well.
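
The idea of weighting text by where it occurs can be sketched as follows. This is a minimal illustration, not the method of any particular engine: the field names, the weights in `FIELD_WEIGHTS`, and the function `weighted_term_counts` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical field weights; real engines tune such values and do not publish them.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "anchor": 2.0, "body": 1.0}

def weighted_term_counts(fields):
    """Return term counts where each occurrence is scaled by its field weight.

    `fields` maps a field name to the text found in that field of page v.
    The "anchor" field is assumed to hold anchor text gathered from pages u
    that link to v.
    """
    counts = Counter()
    for field, text in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count as body text
        for term in text.lower().split():
            counts[term] += weight
    return counts

page_v = {
    "title": "Hawaii vacation rental",
    "body": "book a beach house",
    "anchor": "cheap hawaii rental",
}
counts = weighted_term_counts(page_v)
```

The weighted counts can stand in for raw term frequency in a TF-IDF score, so a term in the title or in incoming anchor text counts for more than the same term in the body.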

Other issues (contd.)

  • Including phrases to rank complex queries
    • Operators to specify word inclusions and exclusions
    • With operators and phrases, queries and documents can no longer be treated as ordinary points in vector space
  • Dictionary of phrases
    • Could be cataloged manually
    • Could be derived from the corpus itself using statistical techniques
    • Two separate indices:
      • one for single terms and another for phrases
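
A minimal sketch of the two-index layout and of inclusion/exclusion operators, under simplifying assumptions: whitespace tokenization, two-word phrases only, and a hand-written phrase dictionary. The names `PHRASES`, `build_indices` and `boolean_filter` are hypothetical.

```python
from collections import defaultdict

# Hypothetical phrase dictionary; it could be cataloged by hand or mined from the corpus.
PHRASES = {("vacation", "rental"), ("search", "engine")}

def build_indices(docs):
    """Return (term_index, phrase_index), each mapping a key to a set of doc ids."""
    term_index = defaultdict(set)
    phrase_index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for tok in tokens:
            term_index[tok].add(doc_id)
        # Only adjacent word pairs listed in the dictionary enter the phrase index.
        for pair in zip(tokens, tokens[1:]):
            if pair in PHRASES:
                phrase_index[pair].add(doc_id)
    return term_index, phrase_index

def boolean_filter(term_index, all_ids, include=(), exclude=()):
    """Keep documents that contain every included term and no excluded term."""
    result = set(all_ids)
    for term in include:
        result &= term_index.get(term, set())
    for term in exclude:
        result -= term_index.get(term, set())
    return result

docs = {
    1: "hawaii vacation rental by the beach",
    2: "rental car for your vacation",
    3: "web search engine ranking",
}
term_index, phrase_index = build_indices(docs)
matches = boolean_filter(term_index, docs, include=["vacation"], exclude=["car"])
```

The filter returns a set of documents, not a score, which is why such queries do not fit the plain vector-space picture: ranking is applied only to the documents that survive the filter.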

Corpus derived phrase dictionary

  • Two terms t1 and t2
  • Null hypothesis = occurrences of t1 and t2 are independent
  • To the extent the pair violates the null hypothesis, it is likely to be a phrase
    • Measuring violation with likelihood ratio of the hypothesis
    • Pick phrases that violate the null hypothesis with large confidence
  • Contingency table built from corpus statistics: counts of how often the two terms occur together and apart
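
The test can be sketched in code under one explicit assumption: the statistics are counts of adjacent word pairs, so `k11` is the number of times t1 is immediately followed by t2. The slides do not say which counting unit is used. The function names are hypothetical, and the statistic computed is the standard G² form of -2 log(likelihood ratio) for a 2x2 table.

```python
import math

def contingency_table(tokens, t1, t2):
    """Count adjacent word pairs (a, b) by whether a == t1 and whether b == t2.

    Returns (k00, k01, k10, k11); k11 counts t1 immediately followed by t2.
    """
    k = [[0, 0], [0, 0]]
    for a, b in zip(tokens, tokens[1:]):
        k[int(a == t1)][int(b == t2)] += 1
    return k[0][0], k[0][1], k[1][0], k[1][1]

def log_likelihood_ratio(k00, k01, k10, k11):
    """Return -2 log(lambda) for independence in a 2x2 table (the G^2 statistic).

    Larger values mean stronger evidence against the null hypothesis.
    """
    n = k00 + k01 + k10 + k11
    rows = (k00 + k01, k10 + k11)
    cols = (k00 + k10, k01 + k11)
    observed = ((k00, k01), (k10, k11))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n  # count expected under independence
                g2 += 2.0 * o * math.log(o / expected)
    return g2

tokens = "new york is big and new york is old and the new car is new".split()
table = contingency_table(tokens, "new", "york")
score = log_likelihood_ratio(*table)
```

Candidate pairs would be ranked by `score` and those above a chosen confidence threshold kept as phrases. The toy corpus is far too small for a meaningful test and only shows the mechanics.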

Corpus derived phrase dictionary (contd.)

  • Hypotheses
    • Null hypothesis
    • Alternative hypothesis
    • Likelihood ratio
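
One common way to write these three items is sketched here, assuming a multinomial model over the four cells of a 2x2 contingency table. The notation is an assumption for illustration: $k_{ij}$ is the observed count and $p_{ij}$ the probability of cell $(i, j)$, where $i = 1$ means the first term is present and $j = 1$ means the second term is present.

```latex
\begin{aligned}
H_0 &: \; p_{ij} = p_{i\cdot}\, p_{\cdot j} \quad \text{for all } i, j \in \{0, 1\} \\
H_1 &: \; p_{00}, p_{01}, p_{10}, p_{11} \ge 0 \text{ arbitrary, with } \textstyle\sum_{i,j} p_{ij} = 1 \\
L(p; k) &\propto \prod_{i,j} p_{ij}^{\,k_{ij}} \\
\lambda &= \frac{\max_{p \in H_0} L(p; k)}{\max_{p \in H_1} L(p; k)}, \qquad 0 \le \lambda \le 1
\end{aligned}
```

Here $p_{i\cdot}$ and $p_{\cdot j}$ are the row and column marginals. Under the null hypothesis, $-2 \log \lambda$ is asymptotically $\chi^2$-distributed with one degree of freedom, so a large value marks the pair as a likely phrase.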

Approximate string matching

  • Non-uniformity of word spellings
    • dialects of English
    • transliteration from other languages
  • Two ways to reduce this problem:
    • Use an aggressive conflation mechanism to collapse variant spellings into the same token
    • Decompose terms into q-grams, that is, sequences of q consecutive characters
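
A minimal sketch of q-gram decomposition and of matching terms by q-gram overlap. The `#` padding at word boundaries and the Jaccard overlap measure are assumptions chosen for the example; the slides do not specify either.

```python
def qgrams(term, q=3):
    """Return the set of q-grams of `term`, padded with '#' at both ends."""
    padded = "#" * (q - 1) + term.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def qgram_similarity(a, b, q=3):
    """Jaccard overlap between the q-gram sets of two terms (assumes q >= 2)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)

close = qgram_similarity("colour", "color", q=2)
far = qgram_similarity("colour", "cooler", q=2)
```

Variant spellings such as "colour" and "color" share most of their q-grams, so `close` comes out higher than `far`. An index keyed on q-grams can use this overlap to retrieve candidate terms that approximately match a query term.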
