Thread: slow inserts and updates on large tables


slow inserts and updates on large tables

From: jim@reptiles.org (Jim Mercer)
Date:
i'm developing a database for one of my clients.

unfortunately, the combination of PostgreSQL 6.4 and FreeBSD 3.0 doesn't
seem to be giving me the performance i would expect.

i have a table which has some 60 fields, largely fixed length strings
(of a variety of sizes) and indexed on a 20 character string field.

currently i have some 5,866,667 records in the table and when i use
a c program to insert records, i get a max of 102 inserts per second.

updates average about 40/second, sometimes much slower.

i intend to have a sliding group of about 10-15 million records in this
table (deleting some each night, while constantly adding new records).

if 100 inserts/second and 40 updates/second is all i can realistically
expect, then i will have to investigate alternate databases (likely
commercial ones like Progress on SCO, yech!).

if anyone has any pointers as to why this is so slow, lemme know.

the system is:
FreeBSD 3.0-RELEASE
CPU: Pentium II (299.17-MHz 686-class CPU)
avail memory = 62836736 (61364K bytes)
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 15 on pci1.4.0
da1: <IBM DDRS-34560W S71D> Fixed Direct Access SCSI2 device
da1: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: <IBM DDRS-34560W S71D> Fixed Direct Access SCSI2 device
da2: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da3: <QUANTUM VIKING 4.5 WSE 880P> Fixed Direct Access SCSI2 device
da3: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da4: <QUANTUM VIKING 4.5 WSE 880P> Fixed Direct Access SCSI2 device
da4: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled

postgres v6.4 (not 6.4.2 yet)

the databases live on a 16 gig striped (not RAID) array across the
above 4 drives.

the operating system lives on a separate drive.

we are going to upgrade to 512M RAM soon, but i don't think that RAM is the
issue on this beast.

bonnie and iozone show that the array is kicking ass as far as throughput goes.

--
[ Jim Mercer    Reptilian Research      jim@reptiles.org   +1 416 410-5633 ]
[ The telephone, for those of you who  have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.            ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

Re: [GENERAL] slow inserts and updates on large tables

From: Vadim Mikheev
Date:
Jim Mercer wrote:
>
> i'm developing a database for one of my clients.
>
> unfortunately, the combination of PostgreSQL 6.4 and FreeBSD 3.0 doesn't
> seem to be giving me the performance i would expect.
>
> i have a table which has some 60 fields, largely fixed length strings
> (of a variety of sizes) and indexed on a 20 character string field.
>
> currently i have some 5,866,667 records in the table and when i use
> a c program to insert records, i get a max of 102 inserts per second.
>
> updates average about 40/second, sometimes much slower.
>
> i intend to have a sliding group of about 10-15 million records in this
> table (deleting some each night, while constantly adding new records).
>
> if 100 inserts/second and 40 updates/second is all i can realistically
> expect, then i will have to investigate alternate databases (likely
> commercial ones like Progress on SCO, yech!).
>
> if anyone has any pointers as to why this is so slow, lemme know.

What's -B ?
Do you use BEGIN/END to run _many_ updates/inserts in
_single_ transaction ?

Vadim


Re: [GENERAL] slow inserts and updates on large tables

From: jim@reptiles.org (Jim Mercer)
Date:
> > if 100 inserts/second and 40 updates/second is all i can realistically
> > expect, then i will have to investigate alternate databases (likely
> > commercial ones like Progress on SCO, yech!).
> >
> > if anyone has any pointers as to why this is so slow, lemme know.
>
> What's -B ?

it is whatever the default is.
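
(for reference: -B sets the number of shared disk buffers the postmaster
allocates, and the 6.4 default is quite small, on the order of a few dozen
8 kB buffers.  a larger setting is usually one of the first things to try,
e.g. starting the backend with "postmaster -B 256 -D /usr/local/pgsql/data";
the buffer count and data directory here are only illustrative.)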

> Do you use BEGIN/END to run _many_ updates/inserts in
> _single_ transaction ?

i tried that and it didn't seem to make much difference.

--
[ Jim Mercer    Reptilian Research      jim@reptiles.org   +1 416 410-5633 ]
[ The telephone, for those of you who  have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.            ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

Re: [GENERAL] slow inserts and updates on large tables

From: Herouth Maoz
Date:
At 5:02 +0200 on 17/2/99, Jim Mercer wrote:


> if anyone has any pointers as to why this is so slow, lemme know.

Have you checked the usual stuff:

1) Each insert and update creates its own transaction. That is, an
   insert is in fact:

   BEGIN TRANSACTION;
   INSERT...;
   COMMIT;

   So, to make things faster you should BEGIN TRANSACTION explicitly
   before all the inserts and COMMIT after them, or separate them into
   manageable batches if you run into a memory problem (see the libpq
   sketch after this list).

2) Indexing a table means faster SELECTs at the expense of slower INSERTs
   and UPDATEs. There is no magic. The reasoning is that normally you
   query the data a lot more than you change it.

   Thus, it is preferable, before doing bulk inserts, to drop the
   indices and recreate them afterwards. This is true when you are not
   expecting the database to be queried at the same time the inserts are
   made.

   As for updates, it's trickier, because you actually use the index
   for the WHERE part of the update. If speed is of the essence, I would
   probably try the following:

   SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
   DELETE FROM your_table WHERE update_condition;
   DROP INDEX...;
   INSERT INTO your_table SELECT ... FROM temp_table; -- update within select
   CREATE INDEX...;

3) Back to the issue of INSERTS - copies are faster. If you can transform
   the data into tab-delimited format as required by COPY, you save a lot
   of time on parsing, planning etc.
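
   (A minimal sketch of point 1 in libpq terms, assuming an open
   connection "con", a preallocated "command" buffer and placeholder
   table and column names, would be:

   PQexec( con, "BEGIN" );

   while (data_still_coming) {
       sprintf( command,
                "INSERT INTO testtable (fieldname1, fieldname2) VALUES ('%s', '%s')",
                item1, item2 );
       PQexec( con, command );
   }

   PQexec( con, "COMMIT" );

   All the inserts then share a single COMMIT instead of paying for one
   transaction per row.)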

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



Re: [GENERAL] slow inserts and updates on large tables

From: jim@reptiles.org (Jim Mercer)
Date:
> > if anyone has any pointers as to why this is so slow, lemme know.
>
> Have you checked the usual stuff:
>
> 1) Each insert and update creates its own transaction. That is, an
>    insert is in fact:
>
>    BEGIN TRANSACTION;
>    INSERT...;
>    COMMIT;
>
>    So, to make things faster you should BEGIN TRANSACTION explicitly
>    before all the inserts and COMMIT after them. Or separate into
>    manageable bulks if you run into a memory problem.

i have a script which times various methods:

------
data is 6047 records
insertama time was 56 seconds
psqlinsert time was 51 seconds
traninsert time was 51 seconds
psqlselect time was 0
------

the script drops and recreates the testtable before each test.

insertama reads the data from a text file and uses PQexec to insert each record.
it does not use begin/commit.

psqlinsert is "psql -f insert.sql" where insert.sql is a file of insert
commands (created by printf'ing the PQexec args from insertama).

traninsert is the same as psqlinsert, except it has "begin transaction" and
"commit transaction" at the start and end.

psqlselect is a 'psql -c "select count(callid) from testtable"'.

so, according to my tests, the begin/commit stuff doesn't do me any good.

> 2) Indexing a table means faster SELECTs at the expense of slower INSERTs
>    and UPDATEs. There is no magic. The reasoning is that normally you
>    query the data a lot more than you change it.
>
>    Thus, it is preferable, before doing bulk inserts, to drop the
>    indices and recreate them afterwards. This is true when you are not
>    expecting the database to be queried at the same time the inserts are
>    made.

the production database currently has over 5 million records, and is expected
to run at 10-15 million in full production.

the inserts happen in batches, every 10 minutes or so.

recreating the index takes some 30 minutes as it is.

however, if i use the above bench test script, but with the index removed,
i get:

------
data is 6047 records
insertama time was 53 seconds
psqlinsert time was 47 seconds
traninsert time was 48 seconds
psqlselect time was 0 seconds
------

as you can see, it didn't make much difference.

>    As for updates, it's trickier, because you actually use the index
>    for the WHERE part of the update. If speed is of the essence, I would
>    probably try the following:
>
>    SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
>    DELETE FROM your_table WHERE update_condition;
>    DROP INDEX...;
>    INSERT INTO your_table SELECT ... FROM temp_table; -- update within select
>    CREATE INDEX...;

as stated above, the size of the table means i can't do this on every update.

> 3) Back to the issue of INSERTS - copies are faster. If you can transform
>    the data into tab-delimited format as required by COPY, you save a lot
>    of time on parsing, planning etc.

this sorta defeats the purpose of putting the data in an SQL database. 8^)

--
[ Jim Mercer    Reptilian Research      jim@reptiles.org   +1 416 410-5633 ]
[ The telephone, for those of you who  have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.            ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

Re: [GENERAL] slow inserts and updates on large tables

From: Herouth Maoz
Date:
At 16:10 +0200 on 17/2/99, Jim Mercer wrote:


>
> > 3) Back to the issue of INSERTS - copies are faster. If you can transform
> >    the data into tab-delimited format as required by COPY, you save a lot
> >    of time on parsing, planning etc.
>
> this sorta defeats the purpose of putting the data in an SQL database. 8^)

You probably didn't understand me. If you convert it to tab delimited text
and then use COPY table_name FROM filename/stdin instead of INSERT, it will
be much faster, because you don't have to do the parsing and planning on
each line, but only on the whole copy.

I didn't tell you to use the data directly from those text files...

In fact, it doesn't require using text files at all, just reformatting your
program. If until now it did

- - - -

while (data_still_coming) {

    sprintf( command, "INSERT INTO table1 VALUES( %s, %s, %s )",
             item1, item2, item3 );
    PQexec( con, command );
}

- - - -

Now you have to do instead

- - - -

PQexec( con, "COPY table1 FROM stdin" );

while (data_still_coming) {

   sprintf( line, "%s\t%s\t%s\n" , item1, item2, item3 );
   PQputline( con, line );

}

PQputline( con, ".\n" );
PQendcopy(con);

- - - -

It's simply a different formatting to your data insertion.
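
(One detail worth checking in a real program: PQendcopy() returns 0 when
the COPY terminated cleanly and nonzero otherwise, so right after the loop
is the natural place to look for trouble, e.g.

    if ( PQendcopy( con ) != 0 )
        fprintf( stderr, "COPY failed: %s", PQerrorMessage( con ) );

This error handling is only a sketch, not part of the minimal example above.)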

Herouth



Re: [GENERAL] slow inserts and updates on large tables

From: jim@reptiles.org (Jim Mercer)
Date:
> At 16:10 +0200 on 17/2/99, Jim Mercer wrote:
> You probably didn't understand me. If you convert it to tab delimited text
> and then use COPY table_name FROM filename/stdin instead of INSERT, it will
> be much faster, because you don't have to do the parsing and planning on
> each line, but only on the whole copy.
>
> I didn't tell you to use the data directly from those text files...
>
> PQexec( con, "COPY table1 FROM stdin" );
>
> while (data_still_coming) {
>
>    sprintf( line, "%s\t%s\t%s\n" , item1, item2, item3 );
>    PQputline( con, line );
>
> }
>
> PQputline( con, "\\.\n" );
> PQendcopy(con);

i will test this with my insertama program, but i see some problems with this.

firstly, it assumes that all of your application programs are updated each
time you modify the structure of the table.

i am using "insert into testtable (fieldname1, fieldname2) values ('1', '2');"

this allows the applications to remain unchanged if new fields are added.

also, it doesn't seem to address the issue of updates, which suffer from worse
performance than inserts.

what is a realistic number of inserts per second under postgresql, with or
without an index?

--
[ Jim Mercer    Reptilian Research      jim@reptiles.org   +1 416 410-5633 ]
[ The telephone, for those of you who  have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.            ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

Re: [GENERAL] slow inserts and updates on large tables

From: Herouth Maoz
Date:
At 16:47 +0200 on 17/2/99, Jim Mercer wrote:


> i will test this with my insertama program, but i see some problems with
>this.
>
> firstly, it assumes that all of your application programs are updated each
> time you modify the structure of the table.

This is true. That's the sacrifice you get for COPY's fast transfers.

> also, it doesn't seem to address the issue of updates, which suffer
>from worse performance than inserts.

Did you try my trick, but without removing the indices? Move the data over
to a temporary table, delete from the original, insert updated data back?
(Assuming you don't have a separate update for each line).
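
Roughly, keeping the indices in place and reusing the placeholder names
from the earlier SQL sketch ("your_table", "temp_table", "update_condition"),
the sequence from a libpq program would be something like:

    PQexec( con, "BEGIN" );
    PQexec( con, "SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition" );
    PQexec( con, "DELETE FROM your_table WHERE update_condition" );
    PQexec( con, "INSERT INTO your_table SELECT ... FROM temp_table" );
    PQexec( con, "COMMIT" );
    PQexec( con, "DROP TABLE temp_table" );

where the SELECT list in the INSERT is the place to apply the actual
changes to the rows.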

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



