
Document Summarization

  • Madhavi Ganapathiraju Graduate Student Language Technologies Institute Carnegie Mellon University


In this talk…

  • What do we mean by Summarization

  • Expectations

  • Intuitive guesses on “how to”

  • Current approaches

  • One specific method in detail



Objective of Summarization

  • Reduce length of document

  • But preserve:

    • Key Information
    • Style of writing
  • Expected Qualities:

    • Cohesiveness
    • Coherence
    • Readability


How can we say a summary is good or bad?

  • Be able to answer questions

  • Compression ratio

  • Preserve Chronology

  • No Redundancy



How is it done manually?

  • Read document

  • Identify important “phrases”

  • Identify Chronology of events if any

  • Synthesize new sentences



How to do it automatically: Edmundson’s method

  • His work at IBM

    • In 1969!!
    • Forms a major component even in today’s systems!!!!


Edmundson’s method





Scoring schemes derived

  • Keyword-occurrence

  • Title-keyword

  • Location heuristic

  • Indicative phrases

    • “this report …”, “in conclusion…”
  • Short-length cutoff

  • Upper-case word feature
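
The scoring schemes above can be combined into a single sentence score. The following is a minimal sketch, assuming a plain weighted sum of the features; the weights, the cue-phrase list, the `MIN_WORDS` cutoff and the function names are hypothetical values chosen for illustration, not figures from the talk.

```python
import re

# Hypothetical cue phrases and weights, chosen only for illustration.
INDICATIVE_PHRASES = ("this report", "in conclusion")
WEIGHTS = {"keyword": 1.0, "title": 1.0, "location": 1.0, "cue": 1.0, "upper": 0.5}
MIN_WORDS = 4  # short-length cutoff: shorter sentences score zero


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_sentences(sentences, title, keywords):
    """Return one score per sentence as a weighted sum of simple features."""
    title_words = set(tokenize(title))
    keywords = {k.lower() for k in keywords}
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        if len(words) < MIN_WORDS:
            scores.append(0.0)  # short-length cutoff
            continue
        features = {
            # keyword-occurrence: how many tokens are known keywords
            "keyword": sum(w in keywords for w in words),
            # title-keyword: how many tokens also appear in the title
            "title": sum(w in title_words for w in words),
            # location heuristic (assumed): first and last sentences get a bonus
            "location": 1.0 if i in (0, last) else 0.0,
            # indicative phrases such as "in conclusion"
            "cue": float(any(p in sentence.lower() for p in INDICATIVE_PHRASES)),
            # upper-case word feature: fully capitalised words such as acronyms
            "upper": sum(w.isupper() and len(w) > 1 for w in sentence.split()),
        }
        scores.append(sum(WEIGHTS[name] * value for name, value in features.items()))
    return scores
```

In practice the weights would be tuned on documents with reference summaries rather than set by hand.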



Graph theoretic method



How to put key information together?

  • Synthesize new sentences?

    • Too difficult… to synthesize accurately
    • Systems exist
    • Undesirable
      • Original style of writing lost
      • Subtle information like tone of presentation lost


Summary = collection of sentences

  • Take the top-scoring sentences

  • Arrange them by descending score

  • Preserve chronology if it exists
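
The selection step can be sketched as follows, assuming each sentence already has a numeric score. The function name `build_summary` and the `preserve_chronology` flag are hypothetical; the flag switches between descending-score order and original document order.

```python
def build_summary(sentences, scores, k, preserve_chronology=True):
    """Pick the k highest-scoring sentences and order them for output."""
    # Rank sentence indices by descending score (ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    if preserve_chronology:
        # Restore the order in which the sentences appeared in the document.
        chosen = sorted(chosen)
    return [sentences[i] for i in chosen]
```

Restoring document order is a simple stand-in for chronology; it assumes the source text itself is written in chronological order.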



Redundancy

  • Edmundson’s procedure:

  • Novel methods to avoid redundancy

    • Maximum “marginal relevance” (MMR)


Similarity between sentences

  • Semester begins tomorrow

  • New semester is beginning on Monday

  • S1 = [Semester(1) begin(1) tomorrow(1)]

  • S2 = [New(1) semester(1) begin(1) Monday(1)]

  • Similarity 
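
One common way to compare two such term-count vectors is cosine similarity. The sketch below assumes that measure, since the formula itself is not shown here, and applies it to S1 and S2 with terms lowercased.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


s1 = Counter(["semester", "begin", "tomorrow"])
s2 = Counter(["new", "semester", "begin", "monday"])
similarity = cosine_similarity(s1, s2)  # 2 shared terms / (sqrt(3) * sqrt(4))
```

The two sentences share "semester" and "begin", so the value is 2 / (√3 · √4), roughly 0.58.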



MMR features

  • Clusters of sentences

  • Candidature of a sentence to be in summary:

    • Similarity to query.
    • Coverage of the passage
    • Content in the passage, e.g., proper nouns, dates, etc.
    • Time Sequence: more recent ones
  • Undesirable features in sentences:

    • Similarity to passages already included in the summary
    • Belonging to the cluster/document that has already contributed a sentence to the summary




MMR algorithm
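
A minimal sketch of greedy MMR selection, assuming cosine similarity over bag-of-words vectors for both relevance and redundancy. Each step picks the candidate maximizing λ · Sim(sentence, query) − (1 − λ) · max Sim(sentence, already selected). The names `bag`, `cosine`, `mmr_select` and the default `lam=0.7` are hypothetical, and the other features listed above (coverage, content, time sequence, cluster membership) are left out.

```python
import math
from collections import Counter


def bag(text):
    """Bag-of-words term counts for a whitespace-tokenized string."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k, lam=0.7):
    """Greedily pick k sentences that are relevant to the query yet not redundant."""
    vectors = [bag(s) for s in sentences]
    q = bag(query)
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(vectors[i], q)
            # Redundancy: highest similarity to anything already in the summary.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam` close to 1 the selection behaves like plain relevance ranking; lowering it penalizes sentences that repeat what the summary already contains.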



Future presentations on Summarization & Contact persons for research in this area:

  • Nikesh Garera (ng+@cs.cmu.edu)

  • Learning Methods

  • Ravindra G. (ravi@mmsl.serc.iisc.ernet.in)

  • Statistical Methods


