- Set-valued response
- Response set may be very large
- (E.g., by recent estimates, over 12 million Web pages contain the word java.)
- Demanding a more selective query from the user
- Guessing the user's information need and ranking responses
- Evaluating rankings
Evaluation procedure - Given a benchmark: a corpus, a set of queries, and for each query q the set D_q of relevant documents
- Query q is submitted to the system
- A ranked list of documents (d_1, d_2, ..., d_n) is retrieved
- Compute a 0/1 relevance list (r_1, r_2, ..., r_n), where r_i = 1 if d_i is in D_q and 0 otherwise
- Recall at rank k
- Fraction of all relevant documents included in (d_1, ..., d_k).
- recall(k) = (1/|D_q|) · Σ_{1 ≤ i ≤ k} r_i
- Precision at rank k
- Fraction of the top k responses that are actually relevant.
- precision(k) = (1/k) · Σ_{1 ≤ i ≤ k} r_i
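Below is a minimal Python sketch of these two rank-based measures, assuming the relevance list is held as a list of 0/1 integers and that num_relevant is |D_q| for the query; the function and variable names are illustrative, not from the source.

```python
def recall_at_k(relevance, num_relevant, k):
    """Fraction of all |D_q| relevant documents that appear in the top k results."""
    return sum(relevance[:k]) / num_relevant

def precision_at_k(relevance, k):
    """Fraction of the top k results that are actually relevant."""
    return sum(relevance[:k]) / k

# Example: 0/1 relevance list for one query with |D_q| = 4 relevant documents.
r = [1, 0, 1, 1, 0, 0, 1, 0]
print(recall_at_k(r, 4, 5))   # 3/4 = 0.75
print(precision_at_k(r, 5))   # 3/5 = 0.6
```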
Other measures - Average precision
- Sum of precision at each relevant hit position in the response list, divided by the total number of relevant documents
- avg.precision = (1/|D_q|) · Σ_{1 ≤ k ≤ n} r_k · precision(k)
- avg.precision =1 iff engine retrieves all relevant documents and ranks them ahead of any irrelevant document
- Interpolated precision
- To combine precision values from multiple queries
- Gives precision-vs.-recall curve for the benchmark.
- For each query, take the maximum precision obtained for that query at any recall greater than or equal to ρ
- Average these together over all queries to get the interpolated precision at recall level ρ (see the sketch after this list)
- Others, such as measures of authority, prestige, etc.
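As a hedged illustration of average precision and interpolated precision, the sketch below builds on the helpers above; the function names and the way queries are bundled into (relevance list, |D_q|) pairs are assumptions for illustration only.

```python
def average_precision(relevance, num_relevant):
    """Sum precision(k) at every rank k holding a relevant document, then
    divide by the total number of relevant documents |D_q|."""
    total = 0.0
    for k, r_k in enumerate(relevance, start=1):
        if r_k:
            total += precision_at_k(relevance, k)
    return total / num_relevant

def interpolated_precision(relevance, num_relevant, rho):
    """Maximum precision obtained at any rank whose recall is >= rho (one query)."""
    best = 0.0
    for k in range(1, len(relevance) + 1):
        if recall_at_k(relevance, num_relevant, k) >= rho:
            best = max(best, precision_at_k(relevance, k))
    return best

def benchmark_interpolated_precision(queries, rho):
    """Average the per-query interpolated precision over all benchmark queries;
    queries is a list of (relevance_list, num_relevant) pairs."""
    return sum(interpolated_precision(r, n, rho) for r, n in queries) / len(queries)
```

Evaluating benchmark_interpolated_precision at recall levels 0.0, 0.1, ..., 1.0 yields the precision-vs.-recall curve for the benchmark mentioned above.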
Precision-Recall tradeoff - Interpolated precision cannot increase with recall
- Interpolated precision at recall level 0 may be less than 1
- At level k = 0
- Precision (by convention) = 1, Recall = 0
- Inspecting more documents
- Can increase recall
- Precision may decrease
- Because we start encountering more and more irrelevant documents
- A search engine with a good ranking function will generally show a negative relation between recall and precision.
- Figure: precision and interpolated precision plotted against recall for the given relevance vector; missing r_k are zeroes.
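The tradeoff can be made concrete by sweeping the rank k on a single relevance vector and printing recall, precision, and interpolated precision side by side; this continues the earlier sketches and uses a made-up relevance vector, not data from the source.

```python
# Illustrative 0/1 relevance vector for one query, |D_q| = 4.
r = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
n_rel = 4

for k in range(1, len(r) + 1):
    rec = recall_at_k(r, n_rel, k)
    prec = precision_at_k(r, k)
    interp = interpolated_precision(r, n_rel, rec)
    # Recall never decreases as k grows; precision tends to fall once irrelevant
    # documents dominate the prefix; interpolated precision is non-increasing in
    # recall by construction, since it is a maximum over all recall levels >= rec.
    print(f"k={k:2d}  recall={rec:.2f}  precision={prec:.2f}  interpolated={interp:.2f}")
```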