Обсуждение: Index to help ordering?

Поиск
Список
Период
Сортировка

Index to help ordering?

От
James David Smith
Дата:
Dear all,

I have a table of some 100m rows of data. There are 5 columns of data.
When I want to look at this data I typically want to sort it by ppid
(numeric) an then by time (timestamp). This simple select can often
take 15-20 seconds. Would adding an index to these two columns make it
quicker? What is the best way to achieve increased speed for this
common select query?

Thanks

James


Re: Index to help ordering?

От
Daniel Staal
Дата:
--As of January 17, 2014 4:41:26 PM +0000, James David Smith is alleged to
have said:

> I have a table of some 100m rows of data. There are 5 columns of data.
> When I want to look at this data I typically want to sort it by ppid
> (numeric) an then by time (timestamp). This simple select can often
> take 15-20 seconds. Would adding an index to these two columns make it
> quicker? What is the best way to achieve increased speed for this
> common select query?

--As for the rest, it is mine.

An index is likely to make it faster, although I wouldn't quite want to
guarantee it.  (I assume you are pulling in all the data?  Otherwise an
index on your *conditions* might make sense.  Sorting the data likely takes
less time than retrieving all of it.)  Luckily enough it's easy to just try
it and check.

Since we are working on that one query in specific, I'd probably create an
index on those two columns, in that order: `CREATE INDEX ixd_name ON table
(ppid, time)`.  I'm of course assuming that ppid's aren't unique.

But again, if you are retrieving the entire table this may not help - what
would help more is to limit the number of records you are retrieving and
put an index on that condition.  (Of course, if you are using LIMIT, then
the index will definitely help.)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------


Re: Index to help ordering?

От
Sameer Kumar
Дата:

On Sat, Jan 18, 2014 at 4:47 AM, Daniel Staal <DStaal@usa.net> wrote:
--As of January 17, 2014 4:41:26 PM +0000, James David Smith is alleged to have said:

I have a table of some 100m rows of data. There are 5 columns of data.
When I want to look at this data I typically want to sort it by ppid
(numeric) an then by time (timestamp). This simple select can often
take 15-20 seconds. Would adding an index to these two columns make it
quicker? What is the best way to achieve increased speed for this
common select query?

--As for the rest, it is mine.

An index is likely to make it faster, although I wouldn't quite want to guarantee it.  (I assume you are pulling in all the data?  Otherwise an index on your *conditions* might make sense.  Sorting the data likely takes less time than retrieving all of it.)  Luckily enough it's easy to just try it and check.

Since we are working on that one query in specific, I'd probably create an index on those two columns, in that order: `CREATE INDEX ixd_name ON table (ppid, time)`.  I'm of course assuming that ppid's aren't unique.

But again, if you are retrieving the entire table this may not help - what would help more is to limit the number of records you are retrieving and put an index on that condition.  (Of course, if you are using LIMIT, then the index will definitely help.)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------



 
In addition to what has been already said, CLUSTERing the table by given index could be of some more help. 



Best Regards,
Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M : +65 8110 0350 T: +65 6438 3504 | www.ashnik.com
www.facebook.com/ashnikbiz | www.twitter.com/ashnikbiz

email patch

This email may contain confidential, privileged or copyright material and is solely for the use of the intended recipient(s).

Вложения

Re: Index to help ordering?

От
James David Smith
Дата:
On 17 January 2014 20:47, Daniel Staal <DStaal@usa.net> wrote:
> --As of January 17, 2014 4:41:26 PM +0000, James David Smith is alleged to
> have said:
>
>> I have a table of some 100m rows of data. There are 5 columns of data.
>> When I want to look at this data I typically want to sort it by ppid
>> (numeric) an then by time (timestamp). This simple select can often
>> take 15-20 seconds. Would adding an index to these two columns make it
>> quicker? What is the best way to achieve increased speed for this
>> common select query?
>
>
> --As for the rest, it is mine.
>
> An index is likely to make it faster, although I wouldn't quite want to
> guarantee it.  (I assume you are pulling in all the data?  Otherwise an
> index on your *conditions* might make sense.  Sorting the data likely takes
> less time than retrieving all of it.)  Luckily enough it's easy to just try
> it and check.
>
> Since we are working on that one query in specific, I'd probably create an
> index on those two columns, in that order: `CREATE INDEX ixd_name ON table
> (ppid, time)`.  I'm of course assuming that ppid's aren't unique.
>
> But again, if you are retrieving the entire table this may not help - what
> would help more is to limit the number of records you are retrieving and put
> an index on that condition.  (Of course, if you are using LIMIT, then the
> index will definitely help.)

Hi Daniel,

ppid is not unique.
I am often not pulling in all of the day. Far from it.

Basically I have about 75,000 people (ppid) and a record for each
minute of the day for each person. So 24 hours x 60 minutes x 75,000
people.

I 'normally' want to just view one person's 'day' at a time. So I
might do something like this:

SELECT ppid, point_time, mode, concentration FROM table WHERE ppid =
'43' ORDER BY point_time;

Does this revise how you think I should go about speeding up this query?

Thanks for your help,

James


Re: Index to help ordering?

От
Daniel Staal
Дата:
--As of January 20, 2014 10:35:41 AM +0000, James David Smith is alleged to
have said:

> Basically I have about 75,000 people (ppid) and a record for each
> minute of the day for each person. So 24 hours x 60 minutes x 75,000
> people.
>
> I 'normally' want to just view one person's 'day' at a time. So I
> might do something like this:
>
> SELECT ppid, point_time, mode, concentration FROM table WHERE ppid =
> '43' ORDER BY point_time;
>
> Does this revise how you think I should go about speeding up this query?

--As for the rest, it is mine.

Quite a bit.  ;)  First off: If you only want to view one day at a time,
I'd put that into the WHERE clause.  This could be done several ways:
BETWEEN start and end time of the day, or you could reformat to exclude the
hours/minutes, etc.  (If it's only the most recent day, easiest is probably
to ask for anything newer than midnight last night - or whenever your day
begins.)

Secondly, I'd probably advise two indexes: One on ppid, and one on however
you decide to select the time.  (Note that you can create an index on a
function: if you wanted to use 'date_trunc' to get the day, you could
create an index on `date_trunc('day', point_time)`, and a WHERE clause
would use that index instead of computing it for each item at query time.)

Once I'd done that (and run EXPLAIN, etc. to see how things are going.), if
things were still too slow I might think about other options.  You might
think about partitioning the table on either the ppid or the point_time,
depending on how easy it is to set up and how good you are at automating
the partitioning process.  (You have to create a sub-table for each
partition, and how you divide things best would take some fiddling.)

But I think if you put the time into the query and create those two
indexes, you'll find things are likely fast enough.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------