Discussion: The fastest way to update thousands of rows in moderately sized table
Hello, I have a table with 30 million records in which I need to update a single column for a couple of thousand rows, let's say 10 000. The new column value is identical for all matching rows. Doing
update "TRANSLATION" set fk_assignment where fk_job = 1000;
takes 45 seconds. I understand that UPDATE is basically an INSERT followed by a DELETE, but I was hoping I could do better than that. I found a suggestion to use a temporary table to speed things up, so now I have this:

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;
update "temp_table" set fk_assignment = null;
update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;
drop table "temp_table";
This got me to about 37 seconds. Still pretty slow. The TRANSLATION table has an index and a foreign key constraint on fk_assignment. Removing the constraint brought very little benefit. Removing the index is probably out of the question, as these kinds of operations are very frequent and the table itself is used heavily, including the index. Execution plan:

Update on "TRANSLATION" _target (cost=0.56..116987.76 rows=13983 width=405) (actual time=43262.266..43262.266 rows=0 loops=1)
-> Nested Loop (cost=0.56..116987.76 rows=13983 width=405) (actual time=0.566..146.084 rows=8920 loops=1)
-> Seq Scan on temp_segs _source (cost=0.00..218.83 rows=13983 width=22) (actual time=0.457..13.994 rows=8920 loops=1)
-> Index Scan using "TRANSLATION_pkey" on "TRANSLATION" _target (cost=0.56..8.34 rows=1 width=391) (actual time=0.009..0.011 rows=1 loops=8920)
Index Cond: (id = _source.id)
Planning time: 1.167 ms
Execution time: 43262.577 ms
Is there anything else worth trying? Are these numbers to be expected, in your experience?
I have Postgres 9.4, the database is on SSD.
Thank you very much for any suggestions.
Standa
On 07/23/2015 01:17 PM, twoflower wrote:
> [...]

Have you tried wrapping the above in a BEGIN/COMMIT block? So:

BEGIN;

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "temp_table" set fk_assignment = null;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;

drop table "temp_table";

COMMIT;

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver-4 wrote:
> Have you tried wrapping the above in a BEGIN/COMMIT block?

Yes, I am running the tests inside a BEGIN TRANSACTION / ROLLBACK block.
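For completeness, a minimal sketch of testing this way in a psql session (\timing is a psql meta-command; the statement matches the original UPDATE):

begin;
\timing on
-- The UPDATE does all of its normal work (new row versions, WAL, index
-- maintenance) even though it is rolled back, so the timing is representative.
update "TRANSLATION" set fk_assignment = null where fk_job = 1000;
rollback;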
Try creating an index on TRANSLATION fk_job.
From the execution plan you posted,
-> (actual time=43262.266..43262.266 rows=0 loops=1)
there are actually no rows to be updated.
So, if you have an index on TRANSLATION fk_job, an update like the one below should finish within several seconds:
-> update "TRANSLATION" set fk_assignment = null where fk_job = 1000;
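A minimal sketch of the suggested index (the index name is an illustrative assumption):

create index translation_fk_job_idx on "TRANSLATION" (fk_job);

Since the table is heavily used, CREATE INDEX CONCURRENTLY is an option that avoids blocking writes while the index is built.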
林士博 wrote:
> Try creating an index on TRANSLATION fk_job.

The index is already there.
Sorry, my mistake.
The execution plan of an UPDATE always seems to show rows=0.
Can you post the execution plan of the original update SQL?
EXPLAIN (ANALYZE ON, BUFFERS ON) update "TRANSLATION" set fk_assignment = null where fk_job = 1000;
林士博 wrote:
> Can you post the execution plan of the original update SQL?
> EXPLAIN (ANALYZE ON, BUFFERS ON) update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

Here it is:
Update on "TRANSLATION" (cost=0.56..9645.13 rows=3113 width=391) (actual time=35091.036..35091.036 rows=0 loops=1)
Buffers: shared hit=74842343 read=7242 dirtied=7513
-> Index Scan using "TRANSLATION_idx_composite_job_last_revision" on "TRANSLATION" (cost=0.56..9645.13 rows=3113 width=391) (actual time=0.042..24.147 rows=8920 loops=1)
Index Cond: (fk_job = 59004)
Buffers: shared hit=626
Planning time: 0.362 ms
Execution time: 35091.192 ms
OK.
In your case I cannot see any reason why using a temp table for a join-based update would be faster than updating directly.
And from the execution plan, the index scan takes very little time; most of the time is spent doing the insert/delete.
Since, as you mentioned, fk_assignment is updated frequently and disabling the indexes before updating is not an option for you, try setting the fillfactor of this table to a smaller value.
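For illustration, a minimal sketch of lowering the fillfactor; the value 70 is only an example, not a figure from this thread:

-- Leave roughly 30% of each heap page free so new row versions can be placed
-- on the same page; when none of the updated columns are indexed this enables
-- HOT updates and avoids extra index maintenance.
alter table "TRANSLATION" set (fillfactor = 70);

-- The new setting only affects pages written afterwards; a table rewrite
-- (e.g. VACUUM FULL, which takes an exclusive lock) applies it to existing pages.
vacuum full "TRANSLATION";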
Maybe you can also check how long it takes to insert 8920 rows into this table.
If that is far less than 35 s, you can consider doing the update this way (a sketch of steps 2 and 3 follows below):
1. Create a temp table with all the columns of "TRANSLATION", like:
create unlogged table "temp_table" as
select id, ..... , 1000 as fk_job
from "TRANSLATION"
where fk_job = 1000;
2. Delete the matching rows from the original table.
3. Insert the rows from the temp table back into the original table.
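Under the same assumptions, a minimal sketch of steps 2 and 3, wrapped in a single transaction so other sessions never observe the rows as missing (this assumes "temp_table" was created with the same column list and order as "TRANSLATION"):

begin;
-- Step 2: remove the affected rows from the original table.
delete from "TRANSLATION" where fk_job = 1000;
-- Step 3: re-insert the prepared rows from the temp table.
insert into "TRANSLATION" select * from "temp_table";
drop table "temp_table";
commit;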
Finally, maybe you can check whether PostgreSQL can start inserting from the position of the HWM (high water mark).
Thank you, I will look into those suggestions. Meanwhile, I started experimenting with partitioning the table into smaller tables, each holding rows whose IDs span 1 million values, and with this approach the UPDATE takes 300 ms. I have to check whether all the SELECTs I am issuing against the original table keep their performance, but so far it seems they do, as long as the appropriate indexes are present on the child tables. I was worried about the overhead of each query having to go through all (currently) 58 partition tables, but it seems it's not that big of a deal.
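For reference, a rough sketch of what this kind of range partitioning by id looks like on PostgreSQL 9.4 using table inheritance; the child table names, bounds, and indexes are illustrative assumptions, not the actual schema:

-- Each child table holds a non-overlapping 1-million-wide id range.
create table "TRANSLATION_p0" (
    check (id >= 0 and id < 1000000)
) inherits ("TRANSLATION");

create table "TRANSLATION_p1" (
    check (id >= 1000000 and id < 2000000)
) inherits ("TRANSLATION");

-- Child tables do not inherit the parent's indexes, so each needs its own.
create index on "TRANSLATION_p0" (id);
create index on "TRANSLATION_p0" (fk_job);
create index on "TRANSLATION_p1" (id);
create index on "TRANSLATION_p1" (fk_job);

-- With the default constraint_exclusion = partition, queries and updates that
-- filter on id only touch the children whose CHECK constraints can match.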
Yes, partitioning by fk_job can significantly improve the performance of this update, and all the SELECTs with a specified fk_job can be faster as well.
What you should check carefully are the SELECTs that do not specify fk_job.
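One simple way to check such queries is to look at which child tables appear in their plans (some_other_column below is a hypothetical stand-in for any predicate that does not include the partitioning key):

explain select * from "TRANSLATION" where some_other_column = 42;
-- If every child table shows up in the plan, constraint exclusion cannot
-- prune partitions for this query and the per-child indexes have to carry it.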
In fact I did not partition by fk_job but by id (the primary key), even though partitioning by fk_job was my first idea.
I use various columns when querying the table; fk_job is not always among them.
Thanks for sharing the results of your experiment!