Обсуждение: [9.4] SELECT repeatedly rewrites a table

Поиск
Список
Период
Сортировка

[9.4] SELECT repeatedly rewrites a table

От
Marko Kreen
Дата:
We recently upgraded 9.3 to 9.4 in live environment and one batch-job
query floods system with write requests.  Query itself is basically
seq-scan on insert-only table.

DB: Postgres 9.4.4
OS: Ubuntu 12.04, 64-bit
HW: AWS EC2, EBS, 16G RAM

Story:

1) New VM is created, fresh initdb
2) Data is replicated in, with COPY (Londiste), indexes are created.
3) ANALYZE
4) Let machine run under normal load couple hours.  It is not reproducible
   immediately on fresh VM.
5) Run query (with DECLARE data_maint_cur NO SCROLL CURSOR WITH HOLD FOR).
   The table has seen only inserts thus far.  Here is EXPLAIN of normal run:

    chatdb=# explain analyze select account_id, max(event_nr) as first_event_nr from account.event where store_time <
now()- '1 month'::interval group by 1; 
                                                             QUERY PLAN
         

-----------------------------------------------------------------------------------------------------------------------------
     HashAggregate  (cost=843632.92..843634.31 rows=139 width=24) (actual time=11938.390..12201.234 rows=285135
loops=1)
       Group Key: account_id
       ->  Seq Scan on event  (cost=0.00..837636.04 rows=1199378 width=24) (actual time=166.772..10883.828 rows=1049812
loops=1)
             Filter: (store_time < (now() - '1 mon'::interval))
             Rows Removed by Filter: 24698500
     Planning time: 1.194 ms
     Execution time: 12406.133 ms

6) So good run is 12s, bad run is ~1h with huge write load.  The table is ~3G,
   but Postgres seems to repeatedly rewrite those files.

7) It seems when I cancel bad run and VACUUM the table, it will proceed normally.


Attached is strace during bad run and backtrace.

fds for strace:

    lrwx------ 1 postgres postgres 64 Jul 16 04:01 17 -> /var/lib/postgresql/9.4/main/base/16385/19297
    lrwx------ 1 postgres postgres 64 Jul 16 04:01 18 -> /var/lib/postgresql/9.4/main/base/16385/19297.1
    lrwx------ 1 postgres postgres 64 Jul 16 04:01 19 -> /var/lib/postgresql/9.4/main/base/16385/19297.2

--
marko


Вложения

Re: [9.4] SELECT repeatedly rewrites a table

От
Marko Kreen
Дата:
On Fri, Jul 17, 2015 at 02:22:59PM +0300, Marko Kreen wrote:
> We recently upgraded 9.3 to 9.4 in live environment and one batch-job
> query floods system with write requests.  Query itself is basically
> seq-scan on insert-only table.

I managed to get EXPLAIN of bad run.  Seems bug is dependant on "DECLARE
.. NO SCROLL CURSOR WITH HOLD FOR ...".

1) On evening I launced 2 VM's, let's call then VMSELECT and VMCURSOR.
2) In the morning I run "EXPLAIN SELECT .." on VMSELECT twice - one after other.
   And on VMCURSOR "EXPLAIN DECLATE CURSOR .." twice - one after other.

VMCURSOR run #1 - the bad run
-----------------------------

chatdb=# explain (analyze true, buffers true, timing true)
chatdb-# DECLARE data_maint_cur NO SCROLL CURSOR WITH HOLD FOR
chatdb-# select account_id, max(event_nr) as first_event_nr from account.event where store_time < now() - '1
month'::intervalgroup by 1; 
                                                                   QUERY PLAN
                        

------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..8965079.90 rows=82 width=24) (actual time=37.752..9941575.182 rows=301857 loops=1)
   Group Key: account_id
   Buffers: shared hit=1417642 read=16900923 dirtied=15670349 written=11141824
   I/O Timings: read=1826689.835 write=8017738.089
   ->  Index Scan using event_pkey on event  (cost=0.56..8960823.53 rows=851110 width=24) (actual
time=30.470..9940359.172rows=859710 loops=1) 
         Filter: (store_time < (now() - '1 mon'::interval))
         Rows Removed by Filter: 28963464
         Buffers: shared hit=1417642 read=16900923 dirtied=15670349 written=11141824
         I/O Timings: read=1826689.835 write=8017738.089
 Planning time: 1.453 ms
 Execution time: 9941803.330 ms
(11 rows)

VMCURSOR run #2 - no so bad
---------------------------

chatdb=# explain (analyze true, buffers true, timing true) DECLARE data_maint_cur NO SCROLL CURSOR WITH HOLD FOR
chatdb-# select account_id, max(event_nr) as first_event_nr from account.event where store_time < now() - '1
month'::intervalgroup by 1; 
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..9291947.71 rows=89 width=24) (actual time=0.055..126290.079 rows=304188 loops=1)
   Group Key: account_id
   Buffers: shared hit=1803475 read=16580860 dirtied=13723 written=10176
   I/O Timings: read=82540.306 write=1187.116
   ->  Index Scan using event_pkey on event  (cost=0.56..9287343.81 rows=920602 width=24) (actual
time=0.040..125176.023rows=957393 loops=1) 
         Filter: (store_time < (now() - '1 mon'::interval))
         Rows Removed by Filter: 28999419
         Buffers: shared hit=1803475 read=16580860 dirtied=13723 written=10176
         I/O Timings: read=82540.306 write=1187.116
 Planning time: 0.071 ms
 Execution time: 126510.051 ms
(11 rows)

VMSELECT run #1
---------------

For some reason it rewrites the table, once.

chatdb=# explain (analyze true, buffers true, timing true)
chatdb-# select account_id, max(event_nr) as first_event_nr from account.event
chatdb-# where store_time < now() - '1 month'::interval group by 1;
                                                         QUERY PLAN
    

----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=974789.39..974790.27 rows=88 width=24) (actual time=276382.073..276652.235 rows=301857 loops=1)
   Group Key: account_id
   Buffers: shared hit=358 read=448265 dirtied=418624 written=418225
   I/O Timings: read=246250.610 write=15802.735
   ->  Seq Scan on event  (cost=0.00..970328.57 rows=892164 width=24) (actual time=627.299..275438.301 rows=855133
loops=1)
         Filter: (store_time < (now() - '1 mon'::interval))
         Rows Removed by Filter: 28955031
         Buffers: shared hit=358 read=448265 dirtied=418624 written=418225
         I/O Timings: read=246250.610 write=15802.735
 Planning time: 1.301 ms
 Execution time: 276869.555 ms
(11 rows)

VMSELECT run #2
---------------

chatdb=# explain (analyze true, buffers true, timing true)
select account_id, max(event_nr) as first_event_nr from account.event
where store_time < now() - '1 month'::interval group by 1;
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=975156.40..975157.28 rows=88 width=24) (actual time=26036.503..26311.292 rows=301857 loops=1)
   Group Key: account_id
   Buffers: shared hit=553 read=448232 dirtied=117
   I/O Timings: read=14424.903
   ->  Seq Scan on event  (cost=0.00..970678.96 rows=895488 width=24) (actual time=833.044..25130.745 rows=858416
loops=1)
         Filter: (store_time < (now() - '1 mon'::interval))
         Rows Removed by Filter: 28962467
         Buffers: shared hit=553 read=448232 dirtied=117
         I/O Timings: read=14424.903
 Planning time: 0.102 ms
 Execution time: 26518.603 ms

VMSELECT run #3 - now with cursor here too - looks ok
-----------------------------------------------------

chatdb=# explain (analyze true, buffers true, timing true) DECLARE data_maint_cur NO SCROLL CURSOR WITH HOLD FOR
chatdb-# select account_id, max(event_nr) as first_event_nr from account.event where store_time < now() - '1
month'::intervalgroup by 1; 
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..9230874.03 rows=94 width=24) (actual time=0.220..444906.246 rows=304184 loops=1)
   Group Key: account_id
   Buffers: shared hit=1790170 read=16588868 dirtied=78190 written=32718
   I/O Timings: read=398411.728 write=2608.564
   ->  Index Scan using event_pkey on event  (cost=0.56..9226058.00 rows=963019 width=24) (actual
time=0.183..443823.884rows=946397 loops=1) 
         Filter: (store_time < (now() - '1 mon'::interval))
         Rows Removed by Filter: 29000509
         Buffers: shared hit=1790170 read=16588868 dirtied=78190 written=32718
         I/O Timings: read=398411.728 write=2608.564
 Planning time: 12.062 ms
 Execution time: 445120.327 ms
(11 rows)


Does this ring any bells?  What more info is needed?

--
marko