Re: Performance issues when the number of records are around 10 Million

Поиск
Список
Период
Сортировка
От venu madhav
Тема Re: Performance issues when the number of records are around 10 Million
Дата
Msg-id AANLkTiliM8miKiQsd5eqtw8JhN7wAm7bR0z40zUweI1D@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Performance issues when the number of records are around 10 Million  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance


On Wed, May 12, 2010 at 5:25 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
venu madhav  wrote:

>>> AND e.timestamp >= '1270449180'
>>> AND e.timestamp < '1273473180'
>>> ORDER BY.
>>> e.cid DESC,
>>> e.cid DESC
>>> limit 21
>>> offset 10539780

> The second column acts as a secondary key for sorting if the
> primary sorting key is a different column. For this query both of
> them are same.

Any chance you could just leave the second one off in that case?
[Venu] Yes, that can be ignored. But am not sure that removing it would reduce the time drastically.

> This query is part of an application which allows user to select
> time ranges and retrieve the data in that interval. Hence the time
> stamp.

Which, of course, is going to affect the number of rows.  Which
leaves me wondering how you know that once you select and sequence
the result set you need to read past and ignore exactly 10539780
rows to get to the last page.
[Venu]For Ex:  My database has 10539793 records. My application first calculates the count of number of records in that interval. And then based on user request to display 10/20/30/40 records in one page, it calculates how many records to be displayed when the last link is clicked.

> To have it in some particular order we're doing order by.

Which will affect which rows are at any particular offset.
[Venu]Yes, by default it has the primary key for order by.

> If the records are more in the interval,

How do you know that before you run your query?
 [Venu] I calculate the count first.

> we display in sets of 20/30 etc. The user also has the option to
> browse through any of those records hence the limit and offset.

Have you considered alternative techniques for paging?  You might
use values at the edges of the page to run a small query (limit, no
offset) when they page.  You might generate all the pages on the
first pass and cache them for a while.

[Venu] If generate all the pages at once, to retrieve all the 10 M records at once, it would take much longer time and since the request from the browser, there is a chance of browser getting timed out.
> When the user asks for the last set of 20 records, this query gets
> executed.

The DESC on the ORDER BY makes it look like you're trying to use the
ORDER BY to get to the end, but then your offset tells PostgreSQL to
skip the 10.5 million result rows with the highest keys.  Is the
"last page" the one with the highest or lowest values for cid?

[Venu] The last page contains the lowest values of cid. By default we get the records in the decreasing order of cid and then get the last 10/20.

Thank you,
Venu.
-Kevin



В списке pgsql-performance по дате отправления:

Предыдущее
От: venu madhav
Дата:
Сообщение: Re: Performance issues when the number of records are around 10 Million
Следующее
От: venu madhav
Дата:
Сообщение: Re: Performance issues when the number of records are around 10 Million