“Summaries as short as 17% of the full text length speed up decision making by a factor of two.”

[Mani et al., 2002]

Automatic Text Summarization: The Basics


Text summarization refers to the process of taking a text, extracting content from it, and presenting the most important content to the user in a condensed form and in a manner sensitive to the user’s or application’s needs [Mani, 2001].

When this is done by a computer, the process is called Automatic Text Summarization (ATS).

A summary can be defined as a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) [Hovy, 2005].

Related Disciplines

There are many disciplines related to automatic summarization.


In the Middle Ages an average person had to process, over an entire lifetime, about as much information as a single copy of the “Sunday Times” contains today. The Internet era has brought a vast amount of information spread across millions of web documents. This information is often published simultaneously on several media channels in different versions, e.g., a print newspaper, a web newspaper, an SMS digest, a radio podcast, and a spoken newspaper for the visually impaired.

It has become common to use Google to search through these documents and retrieve a list of thousands of web pages. To decide whether a result satisfies his or her topic of interest, the user has to download and read each document, which is tedious and time-consuming.

Automatic text summarization can automate this routine, or at least assist with it, by detecting the most relevant content in a source document and producing a draft summary.

The application areas for automatic text summarization are extensive. Here are some of them:

  • Summarization of news articles or email messages for delivery to SMS or WAP mobile devices, which requires reducing the content.
  • Summarization of a foreign-language document to obtain a short translated version and establish the document’s relevance.
  • Summarization of a text before an automatic speech synthesizer reads it, reducing the time needed to absorb the key facts in a document.
  • Search engines presenting a short description of each matching text.
  • Summarization of a user’s own writing, to check whether his or her main idea comes across in the summary.


ATS is an active research area that covers single-document and multi-document summarization tasks. In single-document summarization, a summary of one document is built; in multi-document summarization, a summary is built for a whole collection of documents (such as all of today’s news or all search results for a query).

The output of a summarization system may be an extract or an abstract.

Abstract Summarization

An abstract is an interpretation of the original text: a shorter text produced from an understanding of the main concepts in the original document. For example, “They visited Germany, Italy, France and Spain during their vacation.” can be abstracted to “They visited some European countries.” This kind of summarization requires a knowledge base, which makes producing a good summary hard.

Extract Summarization

An extractive summary is composed of sentences (or phrases, paragraphs, etc.) selected from the original text, usually presented to the user in their original order; in effect, it is a copy of the source text with most sentences omitted. The main problem in generating an extractive summary automatically is detecting the most relevant information in the source document.

Concerning the style of the output, a distinction is made between indicative and informative summaries.

Informative Summary

An informative summary is meant to represent, and often replace, the original document. It must therefore contain all the pertinent information needed to convey the core content, while omitting ancillary information.

Indicative Summary

An indicative summary’s main purpose is to suggest the contents of the article without giving away its details. It gives a brief idea of what the original text is about and serves to entice the user into retrieving the full document. Book jackets, card-catalog entries and movie trailers are examples of indicative summaries.

Finally, there are generic and query-based summaries.

Generic Summary

A generic summary can serve as a surrogate for the original text, since it tries to represent all the relevant features of the source. Generic summaries are text-driven and follow a bottom-up approach.

Query-based Summary

Query-based, or user-focused, summaries rely on a specification of the user’s information need, such as a topic or query. They follow a top-down approach.


Following Lin and Hovy (1997), text summarization can be decomposed into three main steps: topic identification, interpretation and summary generation.

Topic Identification

This step identifies the most prominent information in the text. Several different techniques are used for topic identification:

Position:

  • In some text genres, certain positions tend to hold the important content: the title is always important, as are the first and last sentences.

Cue Phrases:

  • Some words or phrases indicate where the essence of the text is, for example “in summary”, “in conclusion”, “to sum up”, “this paper” etc.

Word Frequency:

  • Some words, depending on the content of a text, tend to appear more often than others, and their frequency can indicate the topic.

Interpretation

The interpretation step is performed only for abstract summaries. Here, different subjects are merged into one general representation, redundancies are removed, and so on.

For example, “The students entered the class, sat down, opened their books and listened to the teacher.” may be summarized as “The students studied.”

Summary Generation

In this final step the summarizer uses text generation methods to produce an output.

One or more of the following four methods can be used (Hovy & Lin 1997):

Extraction:

  • The highest-scoring sentences and phrases are gathered into a summary.

Topic lists:

  • The most frequent keywords are output as a topic list.

Phrase concatenation:

  • Two or more similar phrases are merged together.

Sentence generation:

  • A sentence generator produces new sentences out of a list of concepts and their related topics.


Extraction is the most widely used method of producing summaries.

It relies on various statistical, surface-based and machine-learning techniques to determine which sentences are important. The first attempts at extractive summarization were made in the 1950s.

Surface-based summarization methods

  1. Term-based method:

    This technique relies on the assumption that sentences containing words that occur frequently in a text carry more weight than the rest. These sentences are considered the important ones and are the ones to be extracted.

    The importance of a word is calculated using statistical measures:

    • Term frequency:
      • Boolean frequency: tf(t,d) = 1 if t occurs in d and 0 otherwise.
      • Raw frequency: the frequency f(t,d) of term t in document d is the number of times that t occurs in d. Relevance, however, does not increase proportionally with raw frequency: it should grow with the number of occurrences, but not linearly. Log-frequency weighting achieves this.
      • Log-frequency (log normalization): tf(t,d) = 1 + log f(t,d), or 0 if f(t,d) is 0.
      • Augmented frequency (double normalization 0.5): prevents a bias towards longer documents: tf(t,d) = 0.5 + 0.5 · f(t,d) / max f(t′,d), i.e. the raw frequency divided by the maximum raw frequency of any term in the document, scaled and shifted.
    • TF-IDF: combines how frequent a term is in a document with how many documents in a corpus contain that term. The TF-IDF value increases with the number of times the term appears in the document and with the rarity of the term across the corpus:

      tf-idf(t,d) = tf(t,d) × log(N / n)

      with

      tf(t,d): frequency of term t in document d;

      N: total number of documents in the corpus;

      n: number of documents in which term t appears at least once.
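
    A minimal Python sketch of these weighting variants (the toy corpus, and the use of natural logarithms, are assumptions made for illustration; any log base works if used consistently):

      import math
      from collections import Counter

      def tf_variants(term, doc_tokens):
          # The term-frequency variants described above, for one document.
          counts = Counter(doc_tokens)
          raw = counts[term]
          return {
              "boolean": 1 if raw > 0 else 0,
              "raw": raw,
              "log": 1 + math.log(raw) if raw > 0 else 0,
              "augmented": 0.5 + 0.5 * raw / max(counts.values()),
          }

      def tf_idf(term, doc_tokens, corpus):
          # tf-idf(t,d) = tf(t,d) * log(N / n), as defined above.
          tf = Counter(doc_tokens)[term]
          n = sum(1 for doc in corpus if term in doc)
          return tf * math.log(len(corpus) / n) if n else 0.0

      # A toy corpus of three tokenized "documents".
      corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
      print(tf_variants("cat", corpus[0]))     # boolean 1, raw 1, log 1.0, augmented 1.0
      print(tf_idf("cat", corpus[0], corpus))  # 1 * log(3/2) = 0.405...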

    Before calculating term weights, a filtering step must be performed:

    • Stop-word elimination: common words with little semantic content, such as pronouns, prepositions and articles, have high frequencies in a text and would otherwise distort keyword extraction.
    • Stemming: the purpose of stemming is to obtain the stem of each word, grouping words with a similar basic meaning together and thus increasing the stem frequency.

    The algorithm of term-based summarization:

    1. Score all the words in the source according to the selected measure.
    2. Score all the sentences in the text by adding the scores of the words from these sentences.
    3. Extract the sentences with top N scores.
    4. Present the extracted sentences in the original order.
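
    A compact sketch of these four steps (the regex-based sentence splitter and the tiny stop-word list are simplifying assumptions):

      import re
      from collections import Counter

      STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "it"}  # toy list

      def term_based_summary(text, n=3):
          # Naive splitting at a terminator followed by whitespace.
          sentences = re.split(r"(?<=[.!?])\s+", text.strip())
          tokenize = lambda s: [w for w in re.findall(r"[a-z0-9]+", s.lower())
                                if w not in STOPWORDS]
          # 1. Score all words (raw frequency is the selected measure here).
          word_scores = Counter(w for s in sentences for w in tokenize(s))
          # 2. Score each sentence by summing the scores of its words.
          scored = [(sum(word_scores[w] for w in tokenize(s)), i, s)
                    for i, s in enumerate(sentences)]
          # 3. Extract the n sentences with the top scores ...
          top = sorted(scored, reverse=True)[:n]
          # 4. ... and present them in the original order.
          return " ".join(s for _, i, s in sorted(top, key=lambda t: t[1]))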

  2. Position-based method:

    It has been noticed that in some genres important sentences (or paragraphs) appear in predefined positions.

    • Newspapers: the first few sentences from the text are the most important.
    • Scientific papers: the first/last sentences in the paragraph are relevant for the topic of the paragraph.
    • Scientific papers: important information occurs in specific paragraphs (sections) of the document (introduction/conclusion).

  3. Title-based method:

    This method assumes that words occurring in headings or titles are positively relevant to summarization.

    Edmundson (1969) observed that increasing the scores of sentences which contain such words can improve performance by up to 8%.

  4. Cue words-based method:

    Cue words and phrases, such as "in conclusion", "important" or "in this paper", can be useful signals of relevance or irrelevance.

    Words or phrases classified as "positive" or "negative" may indicate topicality and thus the value of a sentence in the text:

    • positive: significant, purpose, in this paper, we show
    • negative: Figure 1, believe, hardly, impossible, pronouns

  5. Sentence length method:

    Sentences of average length are more likely to reflect the main idea of the text than very short or very long ones.

    The score given to a sentence reflects the number of words in the sentence, normalized by the length of the text.


Combination of different methods:

Current systems tend to take a hybrid approach and combine several of the techniques mentioned above, for example (a toy scoring function follows the list):

  • the cue words method + the position- and term-frequency-based methods;
  • or position + sentence-length weight + similarity of the sentences to the title;
  • or other combinations.
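
A minimal sketch of such a combined scorer (the particular cues and the 0.4/0.2/0.4 weights are illustrative assumptions, not the weights of any published system):

    def hybrid_score(sentence_tokens, position, total_sentences, title_tokens, avg_len):
        # Position cue: earlier sentences score higher (newspaper genre).
        position_score = (total_sentences - position) / total_sentences
        # Length cue: penalize sentences much shorter than the average length.
        length_score = min(len(sentence_tokens), avg_len) / avg_len
        # Title cue: overlap between the sentence and the title.
        title_score = len(set(sentence_tokens) & set(title_tokens)) / (len(title_tokens) or 1)
        return 0.4 * position_score + 0.2 * length_score + 0.4 * title_score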

Extracting sentences from a document using only a statistical keyword approach often causes a lack of cohesion in the final summary. To improve the result, some additional algorithms and methods may be used:

  1. Sentence Selection Function for Extraction
  2. Knowledge-Based Concept Counting
  3. Lexical Chain Methods
  4. Latent Semantic Analysis (LSA)
  5. Vector-Based Semantic Analysis using Random Indexing
  6. Pronoun Resolution
  7. Machine Learning Techniques


Summaries can be evaluated using intrinsic or extrinsic measures.

Intrinsic evaluation judges the results of a system directly, e.g. their quality and informativeness; it does not always give an accurate view of how useful the output is for another task.

Extrinsic evaluation measures how well summaries help accomplish some other specific task, e.g. filtering in information retrieval or report generation.

Qualities like comprehensibility, coherence and readability are genuinely difficult to evaluate.

The most widely used method is so-called target-based evaluation, which intrinsically compares the automatic summary with a human reference, or gold-standard, summary. The drawback of this method is that it requires a gold standard, which is usually not easy to produce. Annotated corpora are typically used as the gold standard; alternatively, human-provided summaries can be used in the evaluation.

Some automatic evaluation programs have been developed to facilitate the evaluation process: SEE, ROUGE, BE, etc.
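
As an illustration, here is a minimal sketch of ROUGE-1 recall, the unigram-overlap variant of ROUGE (the official toolkit adds stemming, multiple references and further n-gram and subsequence variants):

    from collections import Counter

    def rouge_1_recall(candidate_tokens, reference_tokens):
        # Fraction of reference unigrams also found in the candidate (clipped counts).
        cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
        overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
        return overlap / sum(ref.values())

    print(rouge_1_recall("the cat sat".split(), "the cat sat down".split()))  # 0.75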

Text summarization conferences: SUMMAC, DUC, NTCIR.



The Algorithm used in t-CONSPECTUS


Table of Contents

  1. Architecture
  2. Summarizer
  3. Evaluation

t-CONSPECTUS is a web-based single-document multilingual text summarizer that uses linguistic and statistical extraction methods to find the most informative sentences. It is implemented in Python 2.7 and targets newspaper articles in English, German or Russian, provided as plain text pasted into the text box, uploaded by the user as a txt file, or grabbed from a URL.

The whole process is done in three stages.

  1. Preprocessing
    1. Title Identification
    2. Splitting the Text into Paragraphs
    3. Decomposing Paragraphs into Sentences
    4. Tokenization
      • Converting Irregular Word Forms
      • Removing Stopwords
      • Stemming
  2. Scoring
    1. Term Weighting
    2. Sentence Weighting
  3. Generating
    1. Summary Generation

I. Preprocessing

During the preprocessing stage the summarizer goes through the input text and performs four main procedures:

  1. Defines the title of the article. The title is taken to be the string up to the first newline character, without a period at the end. A string ending with a period can still be treated as the title if it ends with an acronym or abbreviation (“U.S.”, “etc.”). Additionally, the string must be at most 17 tokens long.

    The title is used later for assigning extra weight to keywords. It is therefore highly recommended to submit articles with headings.

  2. Splits the text into paragraphs. The rest of the text is divided into paragraphs at newline characters.

    The summarizer needs paragraph boundaries in order to find each paragraph’s first and last sentence and apply position-based scoring.

  3. Splits paragraphs into sentences. This procedure is performed in two steps: initial sentence decomposition and post-splitting correction.

    During the first step the following is done:

    • All potential sentence terminators ('.', '!', '?', ':', ';', '…') are checked against regular expressions describing the left and right contexts of each terminator. For the '.' terminator, abbreviations are handled specially; for this purpose a list of common English abbreviations was compiled (e.g. Biol., coop., Sept.).

      Example: He adds that the government has been talking about making Mt. Kuanyin a national park for a long time.

    • Simple cases where a space is omitted between two sentences (…in it.The...) are also handled.

    During the second step, incorrectly split sentences are joined back together:

    • Example 1: If the 20-point limit is triggered after 1:30 p.m. Chicago time, it would remain in effect.
    • Example 2: The U.S. Geological Survey reported that the quake occurred at around 8:23 a.m. local time (1423 GMT) Sunday.
    • Example 3: Teleconference to be Held at 8:00 a.m. EDT / 8:00 p.m. Beijing Time on March 31.

    After this stage the system represents the input text as a Python list of paragraphs with nested lists of separate sentences.

  4. Tokenizes each sentence. The module splits sentences into words by matching against a regex pattern. While tokenizing, it also transforms irregular verb and noun forms into their base forms (e.g. did, done → do; mice → mouse), using precompiled lists of such nouns and verbs. At this stage, contractions like I’ve, you’d’ve, they’re, where’s and shouldn’t are reduced to their first part (I, you, they, where, shouldn).

    After tokenizing, each sentence is represented as a Python list of lowercase tokens (digits preserved) without punctuation marks.

    Next, tokens that are not in a stop-word list are stemmed with the Porter stemmer, producing a list of (stem, token) tuples. This data structure makes it easy to extract the keywords associated with frequent stems.

With the preprocessing stage complete, the input text is represented as one large Python list of paragraphs, each containing nested lists of tokenized and stemmed sentences cleared of stop words and punctuation, with irregular word forms normalized and contractions reduced.
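
The sketch below approximates this stage in Python (the stop-word and abbreviation lists are tiny stand-ins, the sentence splitter is far simpler than the context-sensitive rules described above, and NLTK's PorterStemmer stands in for the stemmer bundled with the system):

    import re
    from nltk.stem import PorterStemmer  # pip install nltk

    STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are"}  # stand-in list
    ABBREVIATIONS = {"U.S.", "etc.", "Mt.", "Dr."}                        # stand-in list
    stemmer = PorterStemmer()

    def find_title(text):
        # The first line is the title if it has no final period (or ends in a
        # known abbreviation) and is at most 17 tokens long.
        first = text.strip().split("\n", 1)[0].strip()
        ok_ending = not first.endswith(".") or first.split()[-1] in ABBREVIATIONS
        return first if ok_ending and len(first.split()) <= 17 else None

    def preprocess(text):
        # Returns a list of paragraphs; each paragraph is a list of sentences;
        # each sentence is a list of (stem, token) tuples without stop words.
        paragraphs = []
        for para in re.split(r"\n+", text.strip()):
            sentences = re.split(r"(?<=[.!?;:\u2026])\s+", para)  # naive splitter
            paragraphs.append([
                [(stemmer.stem(tok), tok)
                 for tok in re.findall(r"[a-z0-9]+", sent.lower())
                 if tok not in STOPWORDS]
                for sent in sentences
            ])
        return paragraphs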


II. Scoring

During the scoring stage the summarizer assigns weights to terms, dynamically building a dictionary of keywords, and then uses those keywords to weight the sentences of the article.

  1. Term Weighting

    A raw frequency count comes first. Stems whose frequencies are higher than the average frequency are kept for further weighting.

    TF-IDF was chosen for computing the importance of the selected stems. For the IDF part of the formula, a corpus of ANC and BNC written texts was compiled.

    At the last stage of term weighting, extra weights are added to terms to retrieve keywords:

    • A term’s weight is doubled if the term occurs in the title.
    • A term receives an extra weight if it occurs in the first or last sentence of a paragraph.
    • A term receives an extra weight if it occurs in an interrogative or exclamatory sentence.
    • A term receives an extra weight if it is marked as a proper name.

    Finally, terms whose weights are higher than the mean weight are selected into a list of keywords, sorted in descending order of weight. The resulting data structure is a Python list of (stem, weight) tuples.
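
    The sketch below illustrates this boosting step (the doubling of title terms follows the description above; the flat bonus for the other three cues is an assumed value):

      def boost_terms(weights, title_stems, edge_stems, emphatic_stems, proper_stems,
                      bonus=0.5):
          # weights maps stem -> tf-idf score; `bonus` is an assumption.
          boosted = {}
          for stem, w in weights.items():
              if stem in title_stems:
                  w *= 2            # title terms are doubled
              if stem in edge_stems:
                  w += bonus        # first/last sentence of a paragraph
              if stem in emphatic_stems:
                  w += bonus        # interrogative/exclamatory sentence
              if stem in proper_stems:
                  w += bonus        # proper name
              boosted[stem] = w
          mean = sum(boosted.values()) / max(len(boosted), 1)
          # Keywords: terms above the mean weight, in descending order of weight.
          return sorted(((s, w) for s, w in boosted.items() if w > mean),
                        key=lambda t: t[1], reverse=True)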

  2. Sentence Weighting

    In order to determine the importance of every sentence in a text, a method of symmetrical summarization is used.

    For a detailed description of the method, see: Yatsko V.A. Symmetric summarization: theoretical foundations and methodology // Nauchno-tekhnicheskaya informatsiya, Ser. 2, 2002, No. 5.

    The main principle of this method is the symmetric relation: if sentence X has n connections (that is, shared words) with sentence Y, then sentence Y has n connections with sentence X.

    Following this principle, the number of shared words is counted for every sentence. To apply the method, a text must be at least 3 sentences long. Sentences with a high number of connections can be treated as informative.
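
    A minimal sketch of counting these connections (representing sentences as sets of stems, so a repeated shared word counts once, is an assumption):

      def connection_counts(sentences):
          # sentences: a list of sets of stems. For each sentence, count its
          # symmetric connections (shared stems) with every other sentence.
          return [sum(len(s & other) for j, other in enumerate(sentences) if j != i)
                  for i, s in enumerate(sentences)]

      sents = [{"student", "enter", "class"}, {"student", "open", "book"},
               {"teacher", "enter", "class"}]
      print(connection_counts(sents))  # [3, 1, 2]: the first sentence is best connected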

    The algorithm of assigning weights to sentences:

    1. Summing up three weights:

      • Base weight: the number of symmetrical connections with other sentences.

      • Position weight: in newspaper texts the first line is the most important and gets the highest score. The position score is defined by the formula:

        Position score = (1/line number)×10

      • Total keywords weight: a sum of weights of the keywords contained in a sentence.

    2. Multiplying this sum by the log-normalized frequency of proper names and numerical values contained in the sentence.

    3. Applying ASL penalty to the resulting weight.

      Because the weights of all of a sentence’s keywords are added to its own weight, long sentences risk being ranked too high. To avoid this overweighting, the sentence weight is normalized: it is multiplied by the Average Sentence Length (ASL) and divided by the number of words in the sentence:

      ASL = WC / SC

      with

      WC = number of words in the text

      SC = number of sentences in text

      Final sentence weight = (ASL × sentence weight) / (number of words in sentence)


A new list is created containing tuples of sentences and their weights, sorted in descending order. To be selected into the list, a sentence must be at least 7 words long.
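
A small sketch of the complete weighting formula (reading "log-normalized frequency" as 1 + log(count), in line with the log-frequency definition earlier on this page; that reading is an assumption):

    import math

    def sentence_weight(connections, line_number, keyword_weight,
                        entity_count, words_in_sentence, asl):
        # 1. Sum the base, position and keyword weights.
        weight = connections + (1 / line_number) * 10 + keyword_weight
        # 2. Multiply by the log-normalized count of proper names/numbers.
        if entity_count > 0:
            weight *= 1 + math.log(entity_count)
        # 3. Apply the ASL penalty.
        return asl * weight / words_in_sentence

    asl = 300 / 15  # a 300-word, 15-sentence text: ASL = 20 words per sentence
    print(sentence_weight(5, 1, 3.2, 2, 25, asl))  # ~24.65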


III. Generating

At the third and final stage the summarizer selects the first n sentences from the previously generated list. The number of sentences used in the final summary depends on the user's settings; by default the compression rate is 20% of all sentences in the list.

Finally, the extracted sentences are ordered by their position in the original text to give the summary some degree of cohesion.
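
A sketch of this selection step (the (weight, position, sentence) tuple layout is an assumption):

    def generate_summary(ranked, total_sentences, rate=0.2):
        # ranked: (weight, original_position, sentence) tuples,
        # sorted by weight in descending order.
        n = max(1, round(total_sentences * rate))
        chosen = sorted(ranked[:n], key=lambda t: t[1])  # restore document order
        return " ".join(sentence for _, _, sentence in chosen)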

Depending on the settings chosen by the user, the final summary will contain:

  • only the extracted salient sentences;
  • the summary with keywords highlighted;
  • the summary, a table of keywords and some statistics, such as the compression rate, the total number of sentences and the weights of the keywords.


Evaluation of the summaries has not yet been carried out, due to the lack of gold-standard reference summaries.


Examples of summaries


Original text.

Number of sentences: 45

Compression rate 20%

The Brontosaurus Would Like to Know: What Is a Species, Really?

A new paper reverses the 1903 demotion of the beloved dinosaur genus—and calls into question the way we classify the natural world.

In 1989, the U.S. Postal Service released a collection of 25-cent commemorative postage stamps celebrating a series of dinosaurs. The stamps featured the tyrannosaurus, and the stegosaurus, and the pteranodon. They also featured, however, the brontosaurus, or the "thunder lizard"—which had been reclassified under the genus apatosaurus ("deceptive lizard") in 1903.

This was an egregious mistake, but an understandable one. The brontosaurus—the gentle giant that ate plants and sneezed on children—has spent the past century-plus as, if not an actual genus, then a cultural one. Tyrannosaurus, stegosaurus, triceratops, ... and brontosaurus. The sauropod was like the fourth Beatle, only more beloved. Sure, the long-necked lizard might not have technically existed; in another sense, though, the brontosaurus was more real in the human imagination than the apatosaurus ever was.

So it was big news, this week, when a new paper brought some redemption—for brontosaurus fans, for Linnaean taxonomy, for the U.S. Postal Service. A team of scientists, cross-referencing the digital scans of bones from hundreds of long-necked dinosaurs, is claiming that the brontosaurus deserves to be reinstated as a genus unto itself. Deceptive lizards here; thunder lizards there. As Roger Benson, one of the study's co-authors, explained to Wired: “It was a number of small differences that were important, but probably the most obvious features that would help distinguish the two is that the Apatosaurus has an extremely wide neck, where brontosaurus' is more high than wide.”

You could say a lot of things about that little taxonomic shift, but one of them is that the brontosaurus—something that existed, and then didn’t exist, but then existed in a broader sense, and now exists in the actual sense again—is a tidy reminder of the ongoing churn of scientific knowledge. The sun, the center of it all and then very much not. Caffeine, both healthful and harmful. Jurassic Park’s velociraptor, which was, in reality—as far as we know—much more like a feather duster than like an agile T. Rex. And, of course, Pluto. The late, lamented Pluto.

From a scientometric perspective, occasional updates to the rambling facts of our world are to be expected: Facts are contingent, and we’re always learning new things, and science is a discursive discipline, and all that. From a more human perspective, though, the dynamism of scientific truths can be jarring: If you can just, one day, take away the “planet” from “My Very Educated Mother Just Showed Us Nine Planets,” what else can you do?

The good and the bad news is: a lot. The whole Brontus interruptus saga may be close to the heart of every kid who had a dinosaur phase growing up—and every adult for whom that phase continues—and it may have been made more interesting because of its roots in the fascinating "bone wars" of the 19th century. Beyond that, though, it isn’t unusual. Taxonomies, despite their promise of easy categorization, can be wonderfully, and also somewhat terrifyingly, fluid. That thing we think about as an animal unto itself—Ursus arctos, Homo sapiens sapiens, Aptostichus angelinajolieae—is a human construction more than anything else.

“Species delineation is more than meets the eye,” Paul Sereno, a professor of paleontology at the University of Chicago and a National Geographic "explorer-in-residence," told me. That’s in part because of the fact that genetic diversity isn’t always expressed phenotypically, in terms of animals’ appearances. It’s also because animals don’t just evolve, over long stretches of time; they also, of course, grow and change—in size, in color, sometimes in sex—over the course of their lives. Which means that understanding how species relate to each other, as environment-sharers and Darwinian competitors, requires more than one-off encounters with them. The biggest challenge to understanding what a species is, ultimately, may simply be one of human exposure. According to one paper, there are an estimated 8.7 million species on Earth; some 86 percent of those, the paper claims, have yet to be described.

There's reason to believe, too, that the percentage of those mystery species will soon drop drastically. Our abilities to observe animals in their habitats, for one thing, are improving. Digital technologies in particular mean that there are more people than ever walking around with cameras to record biological diversity. We’re living in the age of the “Internet naturalist,” as Atlantic contributor Rose Eveleth put it, and that means more information, and more nuance, about the genetic diversity of the natural world.

And that means: more species. New taxonomies. Amended classifications. As Sereno puts it, “we’re in the midst of a mini-revolution in understanding how many discrete genetic packets and species are present today.” And all that means, in turn, that we can expect more news like the resurrection of the brontosaurus—for animals both long-dead and still-living. “We’re constantly revising,” Sereno says, “because material is constantly being found.”


Summary.

Number of sentences: 8

A new paper reverses the 1903 demotion of the beloved dinosaur genus—and calls into question the way we classify the natural world.

In 1989, the U.S. Postal Service released a collection of 25-cent commemorative postage stamps celebrating a series of dinosaurs.

They also featured, however, the brontosaurus, or the "thunder lizard"—which had been reclassified under the genus apatosaurus ("deceptive lizard") in 1903.

Sure, the long-necked lizard might not have technically existed; in another sense, though, the brontosaurus was more real in the human imagination than the apatosaurus ever was.

So it was big news, this week, when a new paper brought some redemption—for brontosaurus fans, for Linnaean taxonomy, for the U.S. Postal Service.

A team of scientists, cross-referencing the digital scans of bones from hundreds of long-necked dinosaurs, is claiming that the brontosaurus deserves to be reinstated as a genus unto itself.

“Species delineation is more than meets the eye,” Paul Sereno, a professor of paleontology at the University of Chicago and a National Geographic "explorer-in-residence," told me.

According to one paper, there are an estimated 8.7 million species on Earth; some 86 percent of those, the paper claims, have yet to be described.

Original text.

Number of sentences: 22

Compression rate 20%

El Niño on its way to Australia, says Bureau of Meteorology

Australia will be hit by a “substantial” El Niño event for the first time in five years, heightening the chances of widespread drought and warmer temperatures, the Bureau of Meteorology has confirmed.

The BoM said the El Niño phase will become the “dominant influence on Australian climate during the second half of the year”. The tropical Pacific is already in the early stages of the event.

El Niño is a periodic climatic event that occurs when tropical Pacific waters warm, affecting wind circulation patterns. The winds push warm waters westwards, towards Australia.

Although no El Niño periods are the same, in Australia they are generally associated with less rainfall, warmer temperatures, shallower snow depths and higher fire risk. Of the 26 El Niño events since 1900, 17 have resulted in widespread drought.

The warmer ocean waters also pose a risk to the corals of the Great Barrier Reef, which can bleach and die in extreme temperatures.

“This will be quite a substantial El Niño event,” said David Jones, a climatologist at BoM. “This isn’t a weak one or a near miss as we saw last year.”

In 2014, the BoM anticipated an El Niño period due to warming oceans, but the atmospheric conditions did not tally. The confirmation of a full El Niño event is the first in Australia since March 2010.

“The most obvious thing we know is that El Niño events tend to lead to drier winter and spring periods,” Jones said. “There is an increased risk of drought which obviously isn’t good for people already in drought.

“Australian temperatures are already warming and El Niño tends to give those temperatures a boost, so we’d expect winter, spring and even early summer to have well above average daytime temperatures.”

El Niños have raised daytime temperatures on average by 1.5C in Australia in the past. However, they also bring cooler nights which means there are more frosts.

The pattern also makes floods less likely and it is expected to reduce the number of cyclones expected in northern Australia.

“We’d expect this El Niño to peak in intensity around spring and early summer, lasting to February perhaps,” Jones said. “We’d hope for a one-in-10 scenario where we’d fluke a bit of rain but it’s likely that dry conditions will emerge.”

Jones said that despite the El Niño event, it was “very unlikely” that 2015 would be Australia’s hottest year, due to cooler months in the first half of the year.

However, he said there was a “significant probability” that the world would have its warmest year, beating the mark set in 2014.


Summary.

Number of sentences: 5

Australia will be hit by a “substantial” El Niño event for the first time in five years, heightening the chances of widespread drought and warmer temperatures, the Bureau of Meteorology has confirmed.

Of the 26 El Niño events since 1900, 17 have resulted in widespread drought.

“This will be quite a substantial El Niño event,” said David Jones, a climatologist at BoM.

The confirmation of a full El Niño event is the first in Australia since March 2010.

El Niños have raised daytime temperatures on average by 1.5C in Australia in the past.

Original text.

Number of sentences: 35

Compression rate 20%

The Foreign Office has brushed off Russia's complaints that a remark by Prince Charles comparing Vladimir Putin to Adolf Hitler over Ukraine was "outrageous" and "low".

A British official told the Russian deputy ambassador, Alexander Kramarenko, at a meeting on Thursday afternoon that "the Foreign Office could not be expected to comment upon reports of private conversations".

The brevity of the response to what the Kremlin described as "unacceptable" remarks was underscored by the selection of a mid-ranking official – Sian MacLeod, the FCO's additional director for eastern Europe and central Asia – to deliver the message. No ministers were involved.

Russia had wanted clarification on exactly what Charles said, but instead MacLeod restated the government's hope that ahead of the Ukrainian presidential elections this weekend Russia would step back from comment or actions provoking instability in Ukraine.

The meeting had been called by the Russians over what the Kremlin described as "outrageous remarks made by Prince Charles in Canada".

The prince is reported to have made his comments during a private conversation with a Jewish survivor of the second world war about the dispute over Russia's annexation of Crimea. "Now Putin is doing just about the same as Hitler," he reportedly told Marianne Ferguson, a volunteer at the Canadian Museum of Immigration in Halifax.

Moscow's foreign ministry spokesman, Alexander Lukashevich, said: "If these words were truly spoken, then without doubt, they do not reflect well on the future British monarch. We view the use of the western press by members of the British royal family to spread the propaganda campaign against Russia on a pressing issue –that is, the situation in Ukraine – as unacceptable, outrageous and low."

Aides at Prince Charles's London home again declined to comment on the remarks, saying they were part of a private conversation.

Charles returned from the three-day royal visit to Canada on Wednesday, and his next engagement comes on Saturday at a concert in a church near his Gloucestershire manor house.

British diplomats played down the seriousness of the situation, suggesting the Kremlin was capitalising on the remarks to distract from the crisis in Ukraine.

Mark Malloch-Brown, a former Foreign Office minister, said he was sure there was "some eye-rolling in the Foreign Office" about Charles's remark but said that "this doesn't rise up the league of genuinely serious diplomatic incidents".

"It suits the Russian position to make this about antiquated bits of the British political system and distract from the real issue, which is their behaviour in Ukraine," he said. "An off-the-cuff comment to an elderly lady is not a public statement."

Tony Brenton, the former UK ambassador to Moscow, said Russia's outspoken reaction was predictable but the incident would not affect British-Russian relations.

"You can't say anything ruder about a Russian leader than comparing him to Hitler given what happened in the second world war," he said. "They were bound to have to make a lot of noise publicly. Their own people will have expected some sort of response. But the professionals in the FCO and the Kremlin know the Prince of Wales wasn't speaking for the government."

The Foreign Office showed no sign of being distracted following the meeting with Kramarenko.

"This weekend, the Ukrainian people will vote in one of the most important elections in their history," a spokesperson said. "As the Foreign Secretary has repeatedly made clear, they have the right to choose their own government in a free and fair election and Russia must exercise its influence to restrain those responsible for violence and disorder."

Until Thursday Russian officials had not responded publicly to the remarks, and Russian TV channels had remained unusually quiet on the issue.

After president Putin's spokesman Dmitry Peskov declined to comment on Prince Charles' statement, the incident did not even make the evening television news in Russia.

Many prominent pro-Kremlin pundits did not mention the matter, with editor-in-chief of the government channel Russia Today, Margarita Simonyan, tweeting that the media would do better to report on the recent release of a British RT contributor by Ukrainian authorities than Prince Charles.

The popular Russian daily paper Moskovskij Komsomolets said the remarks risked "triggering an international scandal" and complicating "clouded" UK-Russian relations.

Internet entrepreneur and former MP Konstantin Rykov tweeted the infamous photo of Prince Harry at a costume party with a Nazi armband and the caption: "From childhood, Prince Charles instilled in his son good manners and an intolerance for fascism."

In the popular newspaper Moskovsky Komsomolets, political columnist Mikhail Rostovsky argued that Prince Charles' comment was not just an example of the "stupid moves" the monarch is known for, but rather an indication of the West's diminished view of Russia.

"In demonstrating his sharply negative view of Russian policy, Prince Charles expressed an opinion that unfortunately is not only his … From the point of view of most political circles in the west, the incorporation of Crimea into Russia was a howling violation of international law," Rostovsky wrote.

"As long as a breakthrough in these relations hasn't happened, Ukraine's political crisis will always be with us, regardless of whether the action is taking place in Moscow, Beijing, Halifax or London," he concluded.


Summary.

Number of sentences: 7

The Foreign Office has brushed off Russia's complaints that a remark by Prince Charles comparing Vladimir Putin to Adolf Hitler over Ukraine was "outrageous" and "low".

Russia had wanted clarification on exactly what Charles said, but instead MacLeod restated the government's hope that ahead of the Ukrainian presidential elections this weekend Russia would step back from comment or actions provoking instability in Ukraine.

The meeting had been called by the Russians over what the Kremlin described as "outrageous remarks made by Prince Charles in Canada".

Aides at Prince Charles's London home again declined to comment on the remarks, saying they were part of a private conversation.

After president Putin's spokesman Dmitry Peskov declined to comment on Prince Charles' statement, the incident did not even make the evening television news in Russia.

In the popular newspaper Moskovsky Komsomolets, political columnist Mikhail Rostovsky argued that Prince Charles' comment was not just an example of the "stupid moves" the monarch is known for, but rather an indication of the West's diminished view of Russia.

"In demonstrating his sharply negative view of Russian policy, Prince Charles expressed an opinion that unfortunately is not only his …

Original text.

Number of sentences: 13

Compression rate 20%

Single Americans Now Comprise More Than Half the U.S. Population

Single Americans make up more than half of the adult population for the first time since the government began compiling such statistics in 1976.

Some 124.6 million Americans were single in August, 50.2 percent of those who were 16 years or older, according to data used by the Bureau of Labor Statistics in its monthly job-market report. That percentage had been hovering just below 50 percent since about the beginning of 2013 before edging above it in July and August. In 1976, it was 37.4 percent and has been trending upward since.

In a report to clients entitled “Selfies,” economist Edward Yardeni flagged the increase in the proportion of singles to more than 50 percent, calling it “remarkable.” The president of Yardeni Research Inc. in New York said the rise has “implications for our economy, society and politics.”

Singles, particularly younger ones, are more likely to rent than to own their dwellings. Never-married young singles are less likely to have children and previously married older ones, many of whom have adult children, are unlikely to have young kids, Yardeni wrote. That will influence how much money they spend and what they buy.

He argued the increase in single-person households also is exaggerating income inequality in the U.S.

“While they have less household earnings than married people, they also have fewer expenses, especially if there are no children in their households,” Yardeni wrote.

The percentage of adult Americans who have never married has risen to 30.4 percent from 22.1 percent in 1976, while the proportion that are divorced, separated or widowed increased to 19.8 percent from 15.3 percent, according to the economist.

Yardeni is known in the financial markets for coining the phrase “bond vigilantes” in the 1980s to describe investors who were selling Treasury securities because of fears about big U.S. budget deficits.


Summary.

Number of sentences: 3

Some 124.6 million Americans were single in August, 50.2 percent of those who were 16 years or older, according to data used by the Bureau of Labor Statistics in its monthly job-market report.

That percentage had been hovering just below 50 percent since about the beginning of 2013 before edging above it in July and August.

The percentage of adult Americans who have never married has risen to 30.4 percent from 22.1 percent in 1976, while the proportion that are divorced, separated or widowed increased to 19.8 percent from 15.3 percent, according to the economist.

Original text.

Number of sentences: 35

Compression rate 20%

File sync services provide covert way to control hacked computers

File synchronization services, used to accommodate roaming employees inside organizations, can also be a weak point that attackers could exploit to remain undetected inside compromised networks.

Researchers from security firm Imperva found that attackers could easily hijack user accounts for services from Dropbox, Google Drive, Microsoft OneDrive and Box if they gain limited access to computers where such programs run -- without actually stealing user names and passwords.

Once the accounts are hijacked, attackers could use them to grab the data stored in them, and to remotely control the compromised computers without using any malware programs that could be detected by antivirus and other security products.

The Imperva researchers found that all of the file synchronization applications they looked at provide continued access to users' cloud storage accounts via access tokens that are generated after users log in for the first time. These tokens are stored on users' computers in special files, in the Windows registry or in the Windows Credential Manager, depending on the application.

The researchers developed a simple tool they dubbed Switcher, whose role is to perform what they call a "double-switch" attack.

Switcher can be deployed on the system through a malicious email attachment or a drive-by download exploit that takes advantage of a vulnerability in a browser plug-in. If an exploit is used, the program doesn't even have to be written to disk. It can be loaded directly into the computer's memory and doesn't need high-level privileges to execute its routine.

The Switcher first makes a copy of the user's access token for the targeted file synchronization app and replaces it with one that corresponds to an account controlled by the attacker. It then restarts the application so that it synchronizes with the attacker's account.

The previously saved user token is copied to the synchronized folder so that the attacker receives a copy and then the Switcher app restores it back, forcing the app to be linked back to the user's real account -- hence the double-switch name.

However, since the attacker now has a copy of the user's access token, he can use the Switcher on his own computer and synchronize it with the user's real account, getting a copy of all of the files stored in it.

The attack can be taken to the next step by having the Switcher create a scheduled task or a Windows Management Instrumentation (WMI) event that would be triggered when a specific file appears in the synchronized folder. That file could be created by the attacker and could contain commands to be executed by the scheduled task.

This mechanism would give the attacker persistent remote access to the computer even after Switcher deletes itself or is removed from memory. After executing a command and saving its output to the synchronized folder, the attacker could delete it, as well as the trigger file in order to cover his tracks.

If the attacker is not looking for stealthiness and persistence, another possible attack scenario would be to encrypt all of the files in the user's account and ask for a ransom to decrypt them -- an approach used successfully in recent years by ransomware programs.

According to Amichai Shulman, the chief technology officer at Imperva, these attacks against file synchronization services would be very hard to detect by antivirus programs, because the Switcher is not performing any unusual activity that could be interpreted as malware behavior.

The program is made up of just 10 lines of code that read and write to files and registry keys that other applications also modify, he said. The WMI task that gets left behind is not unusual either because a lot of other applications create WMI tasks for various reasons, he added.

In addition, the Switcher might not even get stored on disk and would remove itself after setting up the conditions for the attack.

Security products operating at the network perimeter wouldn't be able to block the traffic because it's encrypted by default and it's generated by known, legitimate file synchronization applications organizations have approved.

Right now none of the tested services notify users that their accounts have been accessed from a new location, like some websites do. Some of them allow users to view the recent activity for their accounts which could reveal the unauthorized access from an unusual location or IP address, but they don't actually alert users via email when that happens, according to the Imperva researchers.

Even if such a compromise would be detected, recovering from it could be problematic because in some cases the access tokens remain valid even if users change their passwords. The only way to recover in those situations is to actually delete the account and create a new one, the researchers said in a report that will be released Wednesday at the Black Hat security conference in Las Vegas.

Attackers have already shown an interest in abusing trusted cloud services or social media sites, both to exfiltrate data and for command and control. In December, security researchers from Blue Coat reported an attack campaign against military, diplomatic and business targets that used a Swedish file synchronization service called CloudMe for command and control. FireEye recently reported that a Russian cyberespionage group known as Hammertoss used cloud storage services to exfiltrate data from organizations.

At the BSides security conference this week, also in Las Vegas, software developers Gabriel Butterick, Dakota Nelson and Byron Wasti released a framework that can create an encrypted covert communication channel for malware by using images, audio clips and text messages posted on social media sites like Twitter, SoundCloud and Tumblr.

Maybe some of the cloud storage providers will improve things in the future, but that doesn't change the underlying issue: Whatever is useful for users can also be useful for attackers, Shulman said. Attackers will eventually find a way to compromise endpoint systems, but most of the time their goal will be to use them as launchpads for attacks against the organization's databases and file servers, where the interesting information is stored. Because of that, it's important for companies to monitor and strictly control access to their important data, he said.


Summary.

Number of sentences: 8

Researchers from security firm Imperva found that attackers could easily hijack user accounts for services from Dropbox, Google Drive, Microsoft OneDrive and Box if they gain limited access to computers where such programs run -- without actually stealing user names and passwords.

These tokens are stored on users' computers in special files, in the Windows registry or in the Windows Credential Manager, depending on the application.

The Switcher first makes a copy of the user's access token for the targeted file synchronization app and replaces it with one that corresponds to an account controlled by the attacker.

However, since the attacker now has a copy of the user's access token, he can use the Switcher on his own computer and synchronize it with the user's real account, getting a copy of all of the files stored in it.

The attack can be taken to the next step by having the Switcher create a scheduled task or a Windows Management Instrumentation (WMI) event that would be triggered when a specific file appears in the synchronized folder.

If the attacker is not looking for stealthiness and persistence, another possible attack scenario would be to encrypt all of the files in the user's account and ask for a ransom to decrypt them -- an approach used successfully in recent years by ransomware programs.

According to Amichai Shulman, the chief technology officer at Imperva, these attacks against file synchronization services would be very hard to detect by antivirus programs, because the Switcher is not performing any unusual activity that could be interpreted as malware behavior.

In December, security researchers from Blue Coat reported an attack campaign against military, diplomatic and business targets that used a Swedish file synchronization service called CloudMe for command and control.