Web search engines


Date: 03.11.2023

The vector space model

  • Documents represented as vectors in a multi-dimensional Euclidean space
    • Each axis = a term (token)
  • Coordinate of document d in direction of term t determined by:
    • Term frequency TF(d,t)
    • Inverse document frequency IDF(t)
      • to scale down the coordinates of terms that occur in many documents
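The idea that each axis is a term can be sketched in a few lines. This is a minimal illustration, not the slides' own method: the tokenizer (lowercase, keep runs of letters and digits) and the function names `tokenize`, `build_axes` and `count_vector` are hypothetical. Coordinates here are raw occurrence counts; TF and IDF then rescale them.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())

def build_axes(docs):
    # One axis per distinct term in the collection, in sorted order
    return sorted({t for d in docs for t in tokenize(d)})

def count_vector(text, axes):
    # Coordinate along each axis = raw count n(d, t), before any TF/IDF scaling
    counts = Counter(tokenize(text))
    return [counts[t] for t in axes]

docs = ["the cat sat", "the cat and the dog"]
axes = build_axes(docs)
print(axes)                                   # ['and', 'cat', 'dog', 'sat', 'the']
print([count_vector(d, axes) for d in docs])  # [[0, 1, 0, 1, 1], [1, 1, 1, 0, 2]]
```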

Term frequency
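A minimal sketch of TF(d,t), assuming it is the raw count n(d,t) normalized by document length. Two common normalizations are shown: dividing by the total number of tokens in d (`"sum"`), or by the count of the most frequent term in d (`"max"`). Which one the slides intend is not recoverable here, and the function name is hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequency(text, norm="sum"):
    # n(d, t): number of occurrences of term t in document d
    counts = Counter(tokenize(text))
    if not counts:
        return {}
    # Assumed normalizations: total token count ("sum") or max term count ("max")
    if norm == "sum":
        denom = sum(counts.values())
    elif norm == "max":
        denom = max(counts.values())
    else:
        raise ValueError("norm must be 'sum' or 'max'")
    return {t: n / denom for t, n in counts.items()}

tf = term_frequency("the cat and the dog")
print(tf["the"], tf["cat"])                                  # 0.4 0.2
print(term_frequency("the cat and the dog", norm="max")["cat"])  # 0.5
```

Normalizing by length keeps a long document from getting large coordinates simply because it repeats every term more often.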

Inverse document frequency

  • Given
    • $D$ is the document collection and $D_t$ is the set of documents containing t
  • Formulae
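IDF is typically a dampened (logarithmic) function of $|D|/|D_t|$, where $|D|$ is the collection size and $|D_t|$ is the number of documents containing t. The sketch below assumes the variant $\mathrm{IDF}(t) = \log\frac{1 + |D|}{|D_t|}$; other forms such as $\log\frac{|D|}{|D_t|}$ are also in use, and the function name is hypothetical.

```python
import math
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())

def inverse_document_frequency(docs):
    # |D_t|: number of documents that contain term t at least once
    doc_freq = {}
    for d in docs:
        for t in set(tokenize(d)):
            doc_freq[t] = doc_freq.get(t, 0) + 1
    n_docs = len(docs)  # |D|
    # Assumed variant: IDF(t) = log((1 + |D|) / |D_t|)
    return {t: math.log((1 + n_docs) / df) for t, df in doc_freq.items()}

docs = ["the cat sat", "the cat and the dog", "the dog barked"]
idf = inverse_document_frequency(docs)
print(round(idf["the"], 3), round(idf["cat"], 3), round(idf["sat"], 3))  # 0.288 0.693 1.386
```

"the" occurs in every document and gets the smallest weight, while "sat" occurs in only one and gets the largest. This is the scaling-down effect described for terms that occur in many documents.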

Vector space model

  • Coordinate of document d in axis t: $d_t = \mathrm{TF}(d,t)\,\mathrm{IDF}(t)$
  • Query q
    • Interpreted as a document
    • Transformed to $\vec{q}$ in the same TFIDF space as $\vec{d}$
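The steps above can be sketched end to end: each document gets coordinates $d_t = \mathrm{TF}(d,t)\,\mathrm{IDF}(t)$, and the query is treated as a small document and mapped with the same IDF table. Assumptions: TF is the count divided by total tokens, IDF is $\log\frac{1+|D|}{|D_t|}$, vectors are stored sparsely as `{term: weight}` dicts, and query terms absent from the collection are dropped because they have no IDF. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())

def fit_idf(docs):
    # |D_t| counted over the collection; assumed IDF(t) = log((1 + |D|) / |D_t|)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + len(docs)) / n) for t, n in df.items()}

def tfidf_vector(text, idf):
    # Coordinate on axis t: TF(d, t) * IDF(t), with TF = n(d, t) / total tokens.
    # TF uses all of the text's tokens; terms with no IDF entry are then dropped.
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {t: (n / total) * idf[t] for t, n in counts.items() if t in idf}

docs = ["the cat sat", "the cat and the dog", "the dog barked"]
idf = fit_idf(docs)
doc_vecs = [tfidf_vector(d, idf) for d in docs]
# The query is interpreted as a document and mapped with the same IDF table
q_vec = tfidf_vector("cat zebra", idf)
print(q_vec)
```

Reusing the collection's IDF table for the query is what puts $\vec{q}$ and $\vec{d}$ on the same axes, so the proximity measures that follow can compare them directly.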

Measures of proximity

  • Distance measure
    • Magnitude of the vector difference
      • $|\vec{d} - \vec{q}|$
    • Document vectors must be normalized to unit ($L_1$ or $L_2$) length
      • Else shorter documents dominate (since queries are short)
  • Cosine similarity: $\cos(\vec{d}, \vec{q}) = \frac{\vec{d} \cdot \vec{q}}{|\vec{d}|\,|\vec{q}|}$
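Both proximity measures can be sketched on sparse `{term: weight}` vectors. The sketch uses $L_2$ normalization (one of the two options above), and the helper names are hypothetical. For unit-length vectors $|\vec{d} - \vec{q}|^2 = 2 - 2\cos(\vec{d}, \vec{q})$, so ranking by smallest distance and ranking by largest cosine give the same order.

```python
import math

def l2_normalize(v):
    # Scale a sparse vector {term: weight} to unit Euclidean (L2) length
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm > 0 else dict(v)

def dot(u, v):
    # Inner product; axes missing from v contribute zero
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def euclidean_distance(u, v):
    # |u - v| over the union of both vectors' axes
    axes = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in axes))

def cosine_similarity(u, v):
    # Dot product divided by the product of the two lengths (0 for a zero vector)
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0

d = {"cat": 3.0, "sat": 4.0}
q = {"cat": 1.0}
dn, qn = l2_normalize(d), l2_normalize(q)
print(cosine_similarity(d, q))     # 0.6
print(euclidean_distance(dn, qn))  # about 0.894, i.e. sqrt(2 - 2 * 0.6)
```

Cosine similarity is unaffected by vector length, so it needs no separate normalization step. The distance measure only behaves this way after the vectors are normalized.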

Relevance feedback
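The contents of this slide were not recovered. One widely used relevance-feedback technique is Rocchio's method, sketched here as an illustration that is not taken from the slides. It moves the query vector toward the centroid of documents the user marked relevant and away from the centroid of those marked non-relevant: $\vec{q}' = \alpha\vec{q} + \beta\,\overline{\vec{d}}_{rel} - \gamma\,\overline{\vec{d}}_{nonrel}$. The weights $\alpha = 1$, $\beta = 0.75$, $\gamma = 0.15$ and the clipping of non-positive weights are assumptions, and the function name is hypothetical.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Sparse vectors as {term: weight}; alpha, beta, gamma are assumed values
    def centroid(vectors):
        # Component-wise mean of a list of sparse vectors (empty list -> {})
        c = {}
        for v in vectors:
            for t, w in v.items():
                c[t] = c.get(t, 0.0) + w / len(vectors)
        return c

    pos = centroid(relevant)
    neg = centroid(nonrelevant)
    new_q = {}
    for t in set(query) | set(pos) | set(neg):
        w = alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        if w > 0:  # drop terms whose weight falls to zero or below
            new_q[t] = w
    return new_q

q = {"cat": 1.0}
rel = [{"cat": 0.5, "kitten": 0.5}]
nonrel = [{"cat": 0.2, "tractor": 0.8}]
print(rocchio(q, rel, nonrel))
```

Here "kitten" enters the query from the relevant document, and "tractor" is excluded because its weight becomes negative.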

