Обсуждение: query by partial timestamp

Поиск
Список
Период
Сортировка

query by partial timestamp

От
Kirk Wythers
Дата:
I have a column of type TIMESTAMP, I'd like to query all records from 2011. If it were text I could use a partial such
as:

WHERE
    text ~ '2011'

There must be a simple way to pull the year part out of a timestamp format. Thanks in advance.




Re: query by partial timestamp

От
Raymond O'Donnell
Дата:
On 08/01/2013 22:19, Kirk Wythers wrote:
> I have a column of type TIMESTAMP, I'd like to query all records from
> 2011. If it were text I could use a partial such as:
>
> WHERE text ~ '2011'
>
> There must be a simple way to pull the year part out of a timestamp
> format. Thanks in advance.

You want the extract() function.

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: query by partial timestamp

От
Raymond O'Donnell
Дата:
On 08/01/2013 22:26, Raymond O'Donnell wrote:
> On 08/01/2013 22:19, Kirk Wythers wrote:
>> I have a column of type TIMESTAMP, I'd like to query all records from
>> 2011. If it were text I could use a partial such as:
>>
>> WHERE text ~ '2011'
>>
>> There must be a simple way to pull the year part out of a timestamp
>> format. Thanks in advance.
>
> You want the extract() function.

Sorry, meant to include the reference:

http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: query by partial timestamp

От
Rob Sargent
Дата:
On 01/08/2013 03:39 PM, Raymond O'Donnell wrote:
> On 08/01/2013 22:26, Raymond O'Donnell wrote:
>> On 08/01/2013 22:19, Kirk Wythers wrote:
>>> I have a column of type TIMESTAMP, I'd like to query all records from
>>> 2011. If it were text I could use a partial such as:
>>>
>>> WHERE text ~ '2011'
>>>
>>> There must be a simple way to pull the year part out of a timestamp
>>> format. Thanks in advance.
>>
>> You want the extract() function.
>
> Sorry, meant to include the reference:
>
> http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
>
> Ray.
>
Ray, I thought you were simply encouraging the OP to learn to fish :)


Re: query by partial timestamp

От
Steve Crawford
Дата:
On 01/08/2013 02:19 PM, Kirk Wythers wrote:
> I have a column of type TIMESTAMP, I'd like to query all records from 2011. If it were text I could use a partial
suchas: 
>
> WHERE
>     text ~ '2011'
>
> There must be a simple way to pull the year part out of a timestamp format. Thanks in advance.
>
>
>
>
As others pointed out, you can do this with "extract(...).

BUT, if your timestamp column is indexed (and if the index will
constrain your records to a sufficiently small subset of the table that
use of indexes is warranted) you may be better off using date
comparisons. I doubt the planner will use the indexes otherwise.

Cheers,
Steve



Re: query by partial timestamp

От
Gavan Schneider
Дата:
On Tuesday, January 8, 2013 at 09:26, Raymond O'Donnell wrote:

>On 08/01/2013 22:19, Kirk Wythers wrote:
>>I have a column of type TIMESTAMP, I'd like to query all records from
>>2011. If it were text I could use a partial such as:
>>
>>WHERE text ~ '2011'
>>
>>There must be a simple way to pull the year part out of a timestamp
>>format. Thanks in advance.
>
>You want the extract() function.
>
 From my perspective there are at least three ways to attack
this problem:

(I have not tested these, so apologies for the stupid syntax errors.)

1.  SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp);

2.  SELECT ... WHERE
         '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
     AND                            col_of_type_timestamp <= '2011-12-31'::TIMESTAMP;

3.  SELECT ... WHERE
         (col_of_type_timestamp, col_of_type_timestamp) OVERLAPS
         (DATE '2011-01-01', DATE '2012-01-01');

Is this the full list?

So... generalizing the original question: which approach would
yield the best performance and/or compliance with SQL standards?

I note Steve Crawford has (strongly) hinted that direct date
comparison is more likely to use an index (when available) so I
suspect this is the way to go, but would an index based on
extract(YEAR...) negate this difference?

Regards
Gavan Schneider



Re: query by partial timestamp

От
Tom Lane
Дата:
Gavan Schneider <pg-gts@snkmail.com> writes:
> From my perspective there are at least three ways to attack
> this problem:

> (I have not tested these, so apologies for the stupid syntax errors.)

> 1.  SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp);

> 2.  SELECT ... WHERE
>          '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
>      AND                            col_of_type_timestamp <= '2011-12-31'::TIMESTAMP;

> 3.  SELECT ... WHERE
>          (col_of_type_timestamp, col_of_type_timestamp) OVERLAPS
>          (DATE '2011-01-01', DATE '2012-01-01');

> Is this the full list?

Another possibility is date_trunc, viz

4. SELECT ... WHERE date_trunc(col_of_type_timestamp, 'year') = '2011-01-01'

You could also use BETWEEN, but that's just syntactic sugar for method 2.

Note that in both methods 2 and 3 it's easy to get the edge cases wrong;
in particular I think your version of method 2 gives the wrong answer
for later-than-midnight times on 2011-12-31, while #3 might (not sure)
give the wrong answer for exactly midnight on 2012-01-01.  These things
are fixable of course with a bit of care.  Personally I'd go with

    col >= '2011-01-01' AND col < '2012-01-01'

> So... generalizing the original question: which approach would
> yield the best performance and/or compliance with SQL standards?

> I note Steve Crawford has (strongly) hinted that direct date
> comparison is more likely to use an index (when available) so I
> suspect this is the way to go, but would an index based on
> extract(YEAR...) negate this difference?

Method 3 is not indexable at all and is unlikely to become so --- the
SQL standard's definition of OVERLAPS is squirrely enough that people
haven't bothered to think about optimizing it.  Method 2 works well with
a plain btree index on the timestamp column.  You can get method 1 to be
indexed if you create a functional index on "extract(year from col)";
but since the index would have pretty much no other use than answering
this exact type of query, that's not a very attractive alternative.
Method 4 is like method 1 --- you'd need a specialized index.

Note that in any case an index is not going to be helpful if the query
would need to fetch more than a few percent of the table.  The OP didn't
suggest how many years his data covers, but it's quite possible that
pulling a full year's worth of data will read enough of the table that
there's no point in worrying about whether an index could be used
anyway.

Another thing to think about is whether you'll have related sorts of
queries that aren't about full years --- maybe sometimes you need a
month's worth of data, for example.  The BETWEEN-style query and a btree
index will adapt easily to non-year intervals, while the EXTRACT
approach will not, and date_trunc is rather limited as well.

            regards, tom lane


Re: query by partial timestamp

От
Kirk Wythers
Дата:

On Jan 8, 2013, at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The OP didn't
suggest how many years his data covers, but it's quite possible that
pulling a full year's worth of data will read enough of the table that
there's no point in worrying about whether an index could be used
anyway.

There are only a few years worth of data, 2008 - 2012. However, the data consists of 15 min measurements and when renormalized (un-pivoted) is several hundred million records. It is conceivable that someone will want to query by month, or even hour of the day. 

Re: query by partial timestamp

От
Nathan Clayton
Дата:


On Jan 8, 2013 6:15 PM, "Kirk Wythers" <wythe001@umn.edu> wrote:
>
>
> On Jan 8, 2013, at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> The OP didn't
>> suggest how many years his data covers, but it's quite possible that
>> pulling a full year's worth of data will read enough of the table that
>> there's no point in worrying about whether an index could be used
>> anyway.
>
>
> There are only a few years worth of data, 2008 - 2012. However, the data consists of 15 min measurements and when renormalized (un-pivoted) is several hundred million records. It is conceivable that someone will want to query by month, or even hour of the day. 

If that's the case, you may want to look at creating a date dimension and possibly a time dimension for your data analysis (there's a good one to start with on the PostgreSQL wiki). I would highly recommend that you take a look at some dimensional modeling concepts (Kimball is a good place to start).

Also, you may want to look at partitioning the data if it's several hundred million rows.

Re: query by partial timestamp

От
Steve Crawford
Дата:
On 01/08/2013 06:15 PM, Kirk Wythers wrote:

On Jan 8, 2013, at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The OP didn't
suggest how many years his data covers, but it's quite possible that
pulling a full year's worth of data will read enough of the table that
there's no point in worrying about whether an index could be used
anyway.

There are only a few years worth of data, 2008 - 2012. However, the data consists of 15 min measurements and when renormalized (un-pivoted) is several hundred million records. It is conceivable that someone will want to query by month, or even hour of the day. 

As another poster mentioned, you may want to consider partitioning the table not only for performance but also for eventual archiving/purging of the data.

As long as we are looking at a variety of alternatives, appropriate construction of partial indexes and the query *might* be of value but at Tom and I mentioned previously, indexes become more of a hindrance than a help once you start writing queries that access too much of the table so the planner won't use them in those cases.

Cheers,
Steve

Re: query by partial timestamp

От
Michael Nolan
Дата:
On 1/8/13, Gavan Schneider <pg-gts@snkmail.com> wrote:

> 2.  SELECT ... WHERE
>          '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
>      AND                            col_of_type_timestamp <=
> '2011-12-31'::TIMESTAMP;

This won't  quite work, because '2011-12-31'::TIMESTAMP
is the same as 2011-12-31 00:00:00.00000
so records timestamped later in the day on the 31st would not get selected

  SELECT ... WHERE
         '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
     AND                 col_of_type_timestamp < '2012-01:01'::TIMESTAMP;

would get all records with a 2011 timestamp.
--
Mike Nolan


Re: query by partial timestamp

От
Michael Nolan
Дата:
It is probably not the most efficient, but I often use this syntax,
which reads better.

Select ..... where col_type_timestamp::date between '2011-01-01' and
'2011-12-31'

This will use a timestamp index.
--
Mike Nolan


Re: query by partial timestamp

От
hubert depesz lubaczewski
Дата:
On Tue, Jan 08, 2013 at 04:19:59PM -0600, Kirk Wythers wrote:
> I have a column of type TIMESTAMP, I'd like to query all records from 2011. If it were text I could use a partial
suchas: 
>
> WHERE
>     text ~ '2011'
>
> There must be a simple way to pull the year part out of a timestamp format. Thanks in advance.

using partial checks (like extract, date_part, or even casting field to
date) will have problem with index usage.
the best way to handle it, is to write the parameters using date
arithmetic.
like:
where column >= '2011-01-01' and column < '2012-01-01'

do not be tempted to do:
where column >= '2011-01-01' and column <='2011-12-31'
which is very bad idea, and will cause data loss.

More on index usage:
http://www.depesz.com/2010/09/09/why-is-my-index-not-being-used/

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/


Re: query by partial timestamp

От
Gavan Schneider
Дата:
On Wednesday, January 9, 2013 at 04:42, Michael Nolan wrote:

>On 1/8/13, Gavan Schneider wrote:
>
>>2.  SELECT ... WHERE
>>'2011-01-01'::TIMESTAMP <= col_of_type_timestamp
>>AND                            col_of_type_timestamp <=
>>'2011-12-31'::TIMESTAMP;
>
>This won't  quite work, because '2011-12-31'::TIMESTAMP
>is the same as 2011-12-31 00:00:00.00000
>so records timestamped later in the day on the 31st would not get selected
>
>SELECT ... WHERE
>'2011-01-01'::TIMESTAMP <= col_of_type_timestamp
>AND                 col_of_type_timestamp < '2012-01:01'::TIMESTAMP;
>
>would get all records with a 2011 timestamp.
>
Thank you. I was wondering where Tom and Depesz were coming from
when they both said less than or equal to the
'2011-12-31'::TIMESTAMP would miss data. I was giving it a rest
before re-reading, testing, and/or asking 'the right question'.
You have supplied the missing part to my puzzle.

Mostly I use DATE so have not had much practice wrestling the
TIMESTAMP edge cases. I also prefer the closed-open equality
tests as you suggest especially as they are the 'only way to go'
when grouping data on a monthly basis. My only 'defense' is that
I tried to craft my examples as close as possible to the OP
statement and not introduce the 'next year' unless forced...
lame I know. :)

Regards
Gavan Schneider