Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently


Download 445 b.
Sana08.12.2017
Hajmi445 b.
#21768


Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently

  • Xiaochun Yang, Bin Wang Chen Li


Approximate selection queries



Performance is a big issue

  • Answer queries interactively

  • Many queries on a server



Outline



q-grams



q-gram inverted lists



Query processing

  • ED(bingon, ?)≤1



VGRAM: variable-length grams [VLDB07]



Adopting VGRAM in algorithms



Contributions of this study



Outline

  • Motivation

  • Tightening lower bound of common strings

  • Effects of adding a gram on index and queries

  • Cost-based construction of gram dictionary

  • Experiments



Calculating lower bound



Calculating lower bound



Too pessimistic?



Tightening lower bound

  • Dynamic programming: tightening NAG(s,k)

    • Subproblems: NAG(s[1,j], i)


Dynamic programming

  • Recurrence function



Dynamic programming



Outline

  • Motivation

  • Tightening lower bound of common strings

  • Effects of adding a gram on index and queries

  • Cost-based construction of gram dictionary

  • Experiments



Effects on inverted lists



Effects on query performance

  • Decrease query’s inverted list

  • Change lower bound

  • Change # of candidates



Effects on query’s inverted lists



Effects on lower bound

  • Query: Q, ED(Q, ?)≤1



Effects on # of candidates

  • Change lower bound  change # of candidates



Outline

  • Motivation

  • Tightening lower bound of common strings

  • Effects of adding a gram on index and queries

  • Cost-based construction of gram dictionary

  • Experiments



Construct a gram dictionary [VLDB07]



Cost-base construction



Outline

  • Motivation

  • Tightening lower bound of common strings

  • Effects of adding a gram on index and queries

  • Cost-based construction of gram dictionary

  •  Experiments



Data sets



Effect of Tightening Lower Bound



Comparison with algorithm Prune [VLDB07]



Choosing qmin



Conclusions

  • Tightening lower bound

    • Dynamic programming
  • Analysis of adding a gram affects

  • Efficient algorithm

    • Automatically generating a high-quality gram dictionary


Thank you



Related work

  • Approximate String Matching

    • q-Grams, q-Samples
    • Inside DBMS
    • Substring matching
  • Set similarity join

  • Estimation

    • Selectivity of SQL LIKE substring queries
    • Approximate string answers


Download 445 b.

Do'stlaringiz bilan baham:




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©fayllar.org 2024
ma'muriyatiga murojaat qiling