Discussion: are there any methods to disable updating index before inserting large number tuples?
hi, friends,
Every hour I need to insert about 20 million tuples into table A, which has a btree index.
How can I disable index updates before inserting those 20 million tuples, and then re-enable them after the tuples have been inserted? Something like the following:
>disable index update
>insert into A ....//here will insert 20 millions tuples
> enable index update
The purpose is to improve performance.
Thanks!
peng
Re: are there any methods to disable updating index before inserting large number tuples?
From: Andrew Sullivan
On Tue, Nov 22, 2011 at 11:47:15PM +0800, sunpeng wrote:
> >disable index update
> >insert into A ....//here will insert 20 millions tuples
> > enable index update

BEGIN;
DROP INDEX;
INSERT INTO A;
CREATE INDEX;

But I think performance on that table is going to be pretty bad. I suspect that COPY is going to be your friend here.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
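A minimal sketch of the drop-and-recreate pattern Andrew describes, combined with his COPY suggestion. The table, column, index, and file names here are hypothetical stand-ins; the real DDL must name your actual objects:

```sql
BEGIN;
-- Drop the index so the bulk load writes only heap pages and WAL,
-- with no per-row btree maintenance.
DROP INDEX IF EXISTS a_key_idx;

-- Bulk load; COPY is typically much faster than row-by-row INSERT.
COPY a (key, val) FROM '/path/to/batch.csv' WITH (FORMAT csv);

-- Rebuild the index once, after all rows are in place.
CREATE INDEX a_key_idx ON a (key);
COMMIT;
```

One caveat: DROP INDEX takes an ACCESS EXCLUSIVE lock on the table, so wrapping this in a transaction blocks all concurrent readers of the table until COMMIT.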
Re: are there any methods to disable updating index before inserting large number tuples?
From: John R Pierce
On 11/22/11 7:52 AM, Andrew Sullivan wrote:
> But I think performance on that table is going to be pretty bad. I
> suspect that COPY is going to be your friend here.

indeed. 20M rows/hour is 5500 rows/second. you'd better have a seriously fast disk system, say, 20 15k RPM SAS drives in a RAID10 with a decent SAS raid controller that has 1GB of writeback battery-or-flash backed cache.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
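If the load is driven from a client machine rather than the server, psql's `\copy` meta-command uses the same bulk protocol as COPY but reads the file on the client side, so no server filesystem access is needed. The file and column names here are hypothetical:

```sql
-- Client-side equivalent of COPY ... FROM; the file lives where psql runs.
\copy a (key, val) from 'batch.csv' with (format csv)
```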
Re: are there any methods to disable updating index before inserting large number tuples?
From: Andres Freund
Hi,

On Tuesday 22 Nov 2011 19:01:02 John R Pierce wrote:
> On 11/22/11 7:52 AM, Andrew Sullivan wrote:
> > But I think performance on that table is going to be pretty bad. I
> > suspect that COPY is going to be your friend here.
>
> indeed. 20M rows/hour is 5500 rows/second. you'd better have a
> seriously fast disk system, say, 20 15k RPM SAS drives in a RAID10 with
> a decent SAS raid controller that has 1GB of writeback battery-or-flash
> backed cache.

20M rows inserted inside one transaction doesn't cause *that* many writes. I guess the bigger problem than the actual disk throughput because of heap/wal writes will be the index size once the table gets bigger. As soon as that reaches a size bigger than the available shared buffers the performance will suffer rather badly.

For that you probably need a sensible partitioning strategy, which is likely to be important anyway to be able to throw away old data efficiently.

Using COPY is advantageous compared to INSERT because it can do some operations in bulk mode which INSERT cannot.

How wide will those rows be, how long do you plan to store the data, and how are you querying it?

Andres
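The partitioning idea Andres raises can be sketched as follows, using declarative partitioning syntax (PostgreSQL 10 and later; at the time of this thread the same idea required table inheritance plus triggers). All table, column, and partition names are hypothetical:

```sql
-- Parent table partitioned by time; each hourly/daily batch lands in
-- its own partition, so each partition's btree stays small enough to
-- remain resident in shared buffers.
CREATE TABLE measurements (
    ts  timestamptz NOT NULL,
    key integer,
    val text
) PARTITION BY RANGE (ts);

CREATE TABLE measurements_2011_11_22 PARTITION OF measurements
    FOR VALUES FROM ('2011-11-22') TO ('2011-11-23');

-- Each partition carries its own (smaller) index.
CREATE INDEX ON measurements_2011_11_22 (key);

-- Old data is discarded cheaply by dropping a partition,
-- instead of a slow DELETE plus VACUUM on one huge table.
DROP TABLE measurements_2011_11_01;
```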
Re: are there any methods to disable updating index before inserting large number tuples?
From: John R Pierce
On 11/22/11 10:53 AM, Andres Freund wrote:
> 20M rows inserted inside one transaction doesn't cause *that* many writes.

indeed, I just ran a test on a fairly beefy server, a 2U HP DL180G6 with dual Sandy Bridge E5660 CPUs (12 cores, 24 threads), 48GB, and 20 x 15k SAS RAID10 on a 1GB SAS2 raid card for the database... its capable of fairly high IOPS.

    => create table x (id serial primary key, key integer, val text);
    NOTICE:  CREATE TABLE will create implicit sequence "x_id_seq" for serial column "x.id"
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "x_pkey" for table "x"
    CREATE TABLE
    => \timing
    Timing is on.
    => insert into x (key,val) values (generate_series(1,20000000), to_char(trunc(random()*1000000),'999999'));
    INSERT 0 20000000
    Time: 160205.678 ms
    => select * from x limit 10;
     id | key |   val
    ----+-----+---------
      1 |   1 |  211882
      2 |   2 |  161866
      3 |   3 |  785931
      4 |   4 |  673363
      5 |   5 |  342551
      6 |   6 |  753181
      7 |   7 |  474393
      8 |   8 |  309599
      9 |   9 |  958952
     10 |  10 |  382413
    (10 rows)
    Time: 0.123 ms
    => select count(*) from x;
      count
    ----------
     20000000
    (1 row)
    Time: 2014.837 ms
    => select count(*) from x;
      count
    ----------
     20000000
    (1 row)
    Time: 1590.407 ms
    => create index on x (key);
    CREATE INDEX
    Time: 25392.231 ms

and just for the heck of it some more inserts with and without the 2nd index...
    => insert into x (key,val) values (generate_series(1,20000000), to_char(trunc(random()*1000000),'999999'));
    INSERT 0 20000000
    Time: 227619.123 ms
    => drop index x_key_idx;
    DROP INDEX
    Time: 1217.443 ms
    => insert into x (key,val) values (generate_series(1,20000000), to_char(trunc(random()*1000000),'999999'));
    INSERT 0 20000000
    Time: 165521.898 ms
    => insert into x (key,val) values (generate_series(1,20000000), to_char(trunc(random()*1000000),'999999'));
    INSERT 0 20000000
    Time: 164767.536 ms
    => select count(*) from x;
      count
    ----------
     80000000
    (1 row)
    Time: 9048.421 ms
    => create index on x (key);
    CREATE INDEX
    Time: 116479.548 ms
    => insert into x (key,val) values (generate_series(1,20000000), to_char(trunc(random()*1000000),'999999'));
    INSERT 0 20000000
    Time: 254630.371 ms

so, with one index (primary key), 20M inserts of a couple small fields is taking 160 seconds. with two indexes, 225-250 seconds... creating that 2nd index is taking 115 seconds now. after this last step, the table is 9800MB, of which about 2GB is the PK and 3.4GB is the 2nd index.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast