Filtering before join with date_trunc()

Поиск

Список

Период

Сортировка

От	Phil Endecott
Тема	Filtering before join with date_trunc()
Дата	15 октября 2018 г. 18:57:41
Msg-id	1539619061614@dmwebmail.dmwebmail.chezphil.org обсуждение исходный текст
Ответы	Re: Filtering before join with date_trunc() (Francisco Olarte <folarte@peoplecall.com>) Re: Filtering before join with date_trunc() (Adrian Klaver <adrian.klaver@aklaver.com>) Re: Filtering before join with date_trunc() (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-general

Дерево обсуждения

Dear Experts,

I have a few tables with "raw" timestamsps like this:

+-------------------------------+----------+
|             time              | pressure |
+-------------------------------+----------+
| 2018-09-14 00:00:07.148378+00 |  1007.52 |
| 2018-09-14 00:10:07.147506+00 |  1007.43 |
| 2018-09-14 00:20:07.147533+00 |  1007.28 |
+-------------------------------+----------+

For each of these tables I have a view which rounds the timestamp 
to the nearest minute, and ensures there is only one row per minute:

 SELECT date_trunc('minute'::text, tbl."time") AS "time",
    max(tbl.pressure) AS pressure
   FROM tbl
  GROUP BY (date_trunc('minute'::text, tbl."time"))
  ORDER BY (date_trunc('minute'::text, tbl."time"));

I then join these tables on the rounded time:

 SELECT COALESCE(rain."time", pressures."time", temperatures."time") AS "time",
    rain.rain,
    pressures.pressure,
    temperatures.temperature
   FROM rain
     FULL JOIN pressures USING ("time")
     FULL JOIN temperatures USING ("time");

+------------------------+------+----------+-------------+
|          time          | rain | pressure | temperature |
+------------------------+------+----------+-------------+
| 2018-09-14 00:00:00+00 |    0 |  1007.52 |      11.349 |
| 2018-09-14 00:10:00+00 |    0 |  1007.43 |     11.2317 |
| 2018-09-14 00:20:00+00 |    0 |  1007.28 |     11.2317 |
+------------------------+------+----------+-------------+

The COALESCE for time and the full joins are needed because some 
columns may be missing for some minutes.

Now I'd like to find the values for a particular short time period:

SELECT * FROM readings
WHERE "time" BETWEEN '2018-10-01T00:00:00' AND '2018-10-01T24:00:00'

This works, but it is inefficient; it seems to create all the rounded 
data, do the join on all of it, and then filter on the time period.  
Ideally it would filter the raw data, and then need to round and join 
far fewer rows.

It would not be difficult for me to round the timestamps when inserting 
the data, and also ensure that there is only one row for each minute. 
But I've done some experiments and even if I remove all the rounding and 
replace the full joins with regular joins, it still does sequential 
scans on at least one of the tables:

Nested Loop  (cost=12.95..144.99 rows=135 width=20)
   Join Filter: (x_rain."time" = x_pressures."time")
   ->  Hash Join  (cost=12.67..97.83 rows=135 width=24)
         Hash Cond: (x_temperatures."time" = x_rain."time")
         ->  Seq Scan on x_temperatures  (cost=0.00..67.50 rows=4350 width=12)
         ->  Hash  (cost=10.98..10.98 rows=135 width=12)
               ->  Index Scan using x_rain_by_time on x_rain  (cost=0.28..10.98 rows=135 width=12)
                     Index Cond: (("time" >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND ("time" <=
'2018-10-0200:00:00+00'::timestamp with time zone))
 
   ->  Index Scan using x_pressures_by_time on x_pressures  (cost=0.28..0.34 rows=1 width=12)
         Index Cond: ("time" = x_temperatures."time")

Maybe that is because the tables are currently relatively small (a 
few thousands rows) and it believes that sequential scans are faster. 
(I have sometimes wished for an "explain" variant that tells me what 
query plan it would choose if all the tables were 100X larger.)

Is there anything I can do to make this more efficient when the tables 
are larger?


Thanks for any suggestions.


Regards, Phil.

В списке pgsql-general по дате отправления:

Предыдущее

От: Durgamahesh Manne
Дата: 15 октября 2018 г., 18:56:12
Сообщение: Re: Regarding varchar max length in postgres

Следующее

От: Benoit Lobréau
Дата: 15 октября 2018 г., 19:09:55
Сообщение: Re: Setting up continuous archiving

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Filtering before join with date_trunc()

Предыдущее

Следующее