Discussion: Are there any methods to disable index updates before inserting a large number of tuples?


Are there any methods to disable index updates before inserting a large number of tuples?

From:
sunpeng
Date:
hi, friends,
  Each hour I want to insert about 20 million tuples into table A, which has a btree index.
  How can I disable index updates before inserting those 20 million tuples, and then re-enable them after the tuples have been inserted? Something like the following:
  > disable index update
  > insert into A ....  // 20 million tuples inserted here
  > enable index update

  The purpose is to improve performance.
  Thanks!
peng

Re: Are there any methods to disable index updates before inserting a large number of tuples?

From:
Andrew Sullivan
Date:
On Tue, Nov 22, 2011 at 11:47:15PM +0800, sunpeng wrote:
>   > disable index update
>   > insert into A ....  // 20 million tuples inserted here
>   > enable index update
>

BEGIN;
DROP INDEX a_key_idx;                -- index name is illustrative
INSERT INTO A ...;                   -- bulk-load the 20 million tuples
CREATE INDEX a_key_idx ON A (key);   -- rebuild once the load is done
COMMIT;
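Note that DROP INDEX takes an ACCESS EXCLUSIVE lock on the table, so
other sessions touching A will block until this transaction commits.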

But I think performance on that table is going to be pretty bad.  I
suspect that COPY is going to be your friend here.
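As a minimal sketch, assuming the data arrives as a CSV file (the path
and column list here are invented for illustration):

COPY A (key, val) FROM '/path/to/hourly_batch.csv' WITH CSV;

When loading from the client side, psql's \copy with the same arguments
avoids the server-side file access (and superuser requirement) of plain
COPY.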

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: Are there any methods to disable index updates before inserting a large number of tuples?

From:
John R Pierce
Date:
On 11/22/11 7:52 AM, Andrew Sullivan wrote:
> But I think performance on that table is going to be pretty bad.  I
> suspect that COPY is going to be your friend here.

Indeed. 20M rows/hour is about 5,500 rows/second. You'd better have a
seriously fast disk system: say, 20 15k RPM SAS drives in a RAID 10
behind a decent SAS RAID controller with 1 GB of battery- or
flash-backed write-back cache.



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast


Re: Are there any methods to disable index updates before inserting a large number of tuples?

From:
Andres Freund
Date:
Hi,

On Tuesday 22 Nov 2011 19:01:02 John R Pierce wrote:
> On 11/22/11 7:52 AM, Andrew Sullivan wrote:
> > But I think performance on that table is going to be pretty bad.  I
> > suspect that COPY is going to be your friend here.
>
> indeed.  20M rows/hour is 5500 rows/second.  you'd better have a
> seriously fast disk system, say, 20 15k RPM SAS drives in a RAID10 with
> a decent SAS raid controller that has 1GB of writeback battery-or-flash
> backed cache.
20M rows inserted inside one transaction doesn't cause *that* many writes. I
suspect the bigger problem will not be the actual disk throughput from
heap/WAL writes but the index size once the table gets bigger: as soon as
the index grows past the available shared buffers, performance will suffer
quite a bit.
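As a rough way to watch for that crossover, one can compare the two
directly (the index name here is hypothetical):

SHOW shared_buffers;
SELECT pg_size_pretty(pg_relation_size('a_key_idx'));  -- on-disk index size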
For that you probably need a sensible partitioning strategy, which is likely
to be important anyway so that old data can be thrown away efficiently.
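A minimal sketch of time-based partitioning via table inheritance, the
mechanism PostgreSQL offered at the time (table and column names are
invented for illustration):

-- parent table holds no rows itself
CREATE TABLE a_parent (ts timestamptz NOT NULL, key integer, val text);

-- one child per day; the CHECK constraint enables constraint exclusion
CREATE TABLE a_2011_11_22
    (CHECK (ts >= '2011-11-22' AND ts < '2011-11-23'))
    INHERITS (a_parent);

-- bulk loads go straight into the child; expiring a day of data
-- is then just DROP TABLE a_2011_11_22;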

Using COPY is advantageous compared to INSERT because it can perform some
operations in bulk that INSERT cannot: the statement is parsed once and the
rows are streamed in, avoiding per-row statement overhead.

How wide will those rows be, how long do you plan to store the data, and how
are you querying it?
Andres

Re: Are there any methods to disable index updates before inserting a large number of tuples?

From:
John R Pierce
Date:
On 11/22/11 10:53 AM, Andres Freund wrote:
> 20M rows inserted inside one transaction doesn't cause *that* many writes.

Indeed. I just ran a test on a fairly beefy server, a 2U HP DL180 G6 with
dual Sandy Bridge E5660 CPUs (12 cores, 24 threads), 48 GB of RAM, and 20 x
15k SAS drives in RAID 10 on a 1 GB SAS2 RAID card for the database... it's
capable of fairly high IOPS.


=> create table x (id serial primary key, key integer, val text);
NOTICE:  CREATE TABLE will create implicit sequence "x_id_seq" for
serial column "x.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "x_pkey"
for table "x"
CREATE TABLE

=> \timing
Timing is on.

=> insert into x (key,val) values (generate_series(1,20000000),
to_char(trunc(random()*1000000),'999999'));
INSERT 0 20000000
Time: 160205.678 ms

=> select * from x limit 10;
  id | key |   val
----+-----+---------
   1 |   1 |  211882
   2 |   2 |  161866
   3 |   3 |  785931
   4 |   4 |  673363
   5 |   5 |  342551
   6 |   6 |  753181
   7 |   7 |  474393
   8 |   8 |  309599
   9 |   9 |  958952
  10 |  10 |  382413
(10 rows)

Time: 0.123 ms

=>     select count(*) from x;
   count
----------
  20000000
(1 row)

Time: 2014.837 ms

=>     select count(*) from x;
   count
----------
  20000000
(1 row)

Time: 1590.407 ms

=> create index on x (key);
CREATE INDEX
Time: 25392.231 ms


And just for the heck of it, some more inserts with and without the 2nd
index...

=> insert into x (key,val) values (generate_series(1,20000000),
to_char(trunc(random()*1000000),'999999'));
INSERT 0 20000000
Time: 227619.123 ms

=> drop index x_key_idx;
DROP INDEX
Time: 1217.443 ms

=> insert into x (key,val) values (generate_series(1,20000000),
to_char(trunc(random()*1000000),'999999'));
INSERT 0 20000000
Time: 165521.898 ms
=> insert into x (key,val) values (generate_series(1,20000000),
to_char(trunc(random()*1000000),'999999'));
INSERT 0 20000000
Time: 164767.536 ms
=> select count(*) from x;
   count
----------
  80000000
(1 row)

Time: 9048.421 ms
=> create index on x (key);
CREATE INDEX
Time: 116479.548 ms
=> insert into x (key,val) values (generate_series(1,20000000),
to_char(trunc(random()*1000000),'999999'));
INSERT 0 20000000
Time: 254630.371 ms


So, with one index (the primary key), 20M inserts of a couple of small
fields take 160 seconds; with two indexes, 225-250 seconds. Creating that
2nd index now takes 115 seconds. After this last step, the table is 9800 MB,
of which about 2 GB is the PK and 3.4 GB is the 2nd index.
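For scale, that first run works out to 20M rows / 160 s, roughly 125,000
rows/second, and even the slowest two-index load (about 255 s) is around
78,000 rows/second, far above the ~5,500 rows/second the original hourly
requirement implies.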


--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast