text search: restricting the number of parsed words in headline generation


From	Sushant Sinha
Subject	text search: restricting the number of parsed words in headline generation
Date	23 August 2011, 17:19:35
Msg-id	1314117620.3700.12.camel@dragflick
Responses	Re: text search: restricting the number of parsed words in headline generation (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers


Given a document and a query, the goal of headline generation is to
produce text excerpts in which the query appears. Currently, headline
generation in Postgres proceeds in the following steps:

1. Tokenize the document and obtain the lexemes
2. Decide which lexemes should be part of the headline
3. Generate the headline
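
As a rough illustration of the three steps above, here is a toy Python sketch. It is not PostgreSQL's parser or its headline code: the function names (tokenize, pick_cover, make_headline) are made up for this example, lowercasing stands in for real lexeme normalization, and the window selection is a deliberately naive stand-in for the real cover search.

```python
import re

def tokenize(document):
    """Step 1: split the document into (word, lexeme) pairs.
    Lowercasing is a stand-in for real dictionary normalization."""
    return [(w, w.lower()) for w in re.findall(r"\w+", document)]

def pick_cover(tokens, query_lexemes, max_words=5):
    """Step 2: choose a window of at most max_words tokens that
    contains the most query lexemes (the first best window wins)."""
    best_start, best_hits = 0, -1
    for start in range(max(1, len(tokens) - max_words + 1)):
        window = tokens[start:start + max_words]
        hits = sum(1 for _, lex in window if lex in query_lexemes)
        if hits > best_hits:
            best_start, best_hits = start, hits
    return best_start, best_start + max_words

def make_headline(document, query, max_words=5):
    """Step 3: render the chosen window, marking query matches."""
    tokens = tokenize(document)
    query_lexemes = {q.lower() for q in query.split()}
    start, end = pick_cover(tokens, query_lexemes, max_words)
    return " ".join(
        f"<b>{w}</b>" if lex in query_lexemes else w
        for w, lex in tokens[start:end]
    )
```

Even in this toy version, step 1 walks the whole document before steps 2 and 3 can do anything, which is where the dependence on document size comes from.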

So the time taken by headline generation depends directly on the size
of the document: the longer the document, the more time it takes to
tokenize and the more lexemes there are to operate on.

Most of the time is spent in the tokenization phase, and for very big
documents headline generation is very expensive.

Here is a simple patch that limits the number of words parsed during the
tokenization phase and so puts an upper bound on the cost of headline
generation. The headline function takes a parameter, MaxParsedWords. If
this parameter is negative or not supplied, the entire document is
tokenized and operated on (the current behavior). However, if the
supplied MaxParsedWords is a positive number, tokenization stops after
MaxParsedWords words have been obtained. The rest of headline
generation then operates on the tokens obtained up to that point.
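
The intended effect of the parameter can be sketched in a few lines of toy Python. This is not the patch itself: the helper names are hypothetical, only MaxParsedWords comes from the description above, and the handling of zero is an assumption because the description covers only negative and positive values.

```python
import re
from itertools import islice

def iter_tokens(document):
    """Lazily yield words one at a time, so the caller can stop early."""
    for match in re.finditer(r"\w+", document):
        yield match.group(0)

def parse_for_headline(document, max_parsed_words=-1):
    """Return the tokens the headline code would operate on.

    A negative (or omitted) limit parses the whole document, which is
    the current behavior; a positive limit stops tokenization once
    that many words have been produced. Zero is treated as "no limit"
    here, which is an assumption of this sketch.
    """
    tokens = iter_tokens(document)
    if max_parsed_words > 0:
        tokens = islice(tokens, max_parsed_words)
    return list(tokens)
```

With a positive limit the work is bounded by the limit rather than by the document length. The trade-off in this sketch is that query words occurring only after the cutoff can never show up in the headline.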

The current patch can be applied to 9.1rc1. It lacks changes to the
documentation and test cases. I will add them if you folks agree on the
functionality.

-Sushant.

Attachments

9.1rc1_max_parsed_words.patch
