Discussion: Postgres batch write very slow - what to do
Folks!

I have a batch application that writes approx. 4 million rows into a narrow table. I am using JDBC addBatch/executeBatch with auto commit turned off. Batch size is 100. So far I am seeing Postgres take roughly five times the time it takes to do this in Oracle.

I have played with many parameters. The only one that seems to have any effect is fsync - but that's only 10% or so. Initially I got the warning that checkpoints were happening too often, so I increased checkpoint_segments to 24. The warnings stopped, but there was no real improvement in performance.

Is Postgres really that slow? What am I missing?

Here are the changes to my postgresql.conf file:

shared_buffers = 768MB
work_mem = 256MB
maintenance_work_mem = 128MB
fsync = off
checkpoint_segments = 24
autovacuum = on

Thank you,

-Sanjay

--
View this message in context: http://www.nabble.com/Postgres-batch-write-very-slow---what-to-do-tf3395195.html#a9452092
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
femski wrote:
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/ExecuteBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in the Oracle.

The usual tricks are:
- Drop indexes before loading, and rebuild them afterwards.
- Disable foreign key references while loading.
- Use COPY instead of JDBC.

Is the loading I/O or CPU bound? How long does the load actually take? Are you running the latest version of PostgreSQL and the JDBC driver?

Have you read the "Populating a Database" chapter in the manual:
http://www.postgresql.org/docs/8.2/interactive/populate.html

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 3/13/07, femski <hypertree@yahoo.com> wrote:
> Folks !
>
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/ExecuteBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in the Oracle.
>
> I have played with many parameters. Only one that seems to have any affect
> is fsync - but thats only 10% or so.
> Initially I got the warning that checkpoints were happening too often so I
> increased the segments to 24. Warnings stopped, but no real improvement in
> performance.
>
> Is postgres really that slow ? What am I missing ?

How many inserts/sec are you getting, approximately? Maybe you have some type of network issue.

merlin
I am running Postgres 8.2 on OpenSuse 10.2 with the latest JDBC driver. I moved the app to be co-located with the server. Oracle takes 60 sec, Postgres 275 sec, for 4.7 million rows.

There are 4 CPUs on the server and one is running close to 100% during inserts. Network history shows spikes of up to 60% of the bandwidth (Gnome System Monitor graph). I have a gigabit card - but it should not enter into the picture since it's on the local host.

thanks,

-Sanjay
femski <hypertree@yahoo.com> writes:
> I am running Postgres 8.2 on OpenSuse 10.2 with the latest jdbc driver. I
> moved the app to be co-located with the server. Oracle takes 60 sec,
> Postgres 275 sec, for 4.7 million rows.
> There are 4 CPUs on the server and one is running close to 100% during
> inserts.
> Network history shows spikes of up to 60% of the bandwidth (Gnome System
> monitor graph).

It sounds like you're incurring a network round trip for each row, which will be expensive even for a co-located application. Perhaps Oracle's JDBC driver is smart enough to avoid that. I'm not sure what tricks are available for bulk loading with our JDBC driver --- the page Heikki mentioned explains things from a server perspective but I dunno how that translates into JDBC. The folks who hang out on pgsql-jdbc could probably give you some tips.

regards, tom lane
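Tom's round-trip point is easy to quantify. A quick sketch of the arithmetic (the row count and timings are the ones reported in this thread; the class and method names are made up for illustration):

```java
public class BatchMath {
    // Number of executeBatch round trips needed for a load,
    // i.e. ceiling(rows / batchSize).
    static long roundTrips(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Average insert rate over the whole load.
    static long rowsPerSecond(long rows, long seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        long rows = 4_700_000L;
        System.out.println(roundTrips(rows, 100));    // 47000 batches on the wire
        System.out.println(rowsPerSecond(rows, 275)); // ~17k rows/sec (PostgreSQL, as reported)
        System.out.println(rowsPerSecond(rows, 60));  // ~78k rows/sec (Oracle, as reported)
    }
}
```

Even batched, 4.7 million rows at a batch size of 100 still means 47,000 client/server exchanges, which is why bigger batches, multi-row statements, or COPY help.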
Tom Lane wrote:
> It sounds like you're incurring a network round trip for each row, which
> will be expensive even for a co-located application. Perhaps Oracle's
> JDBC driver is smart enough to avoid that. I'm not sure what tricks are
> available for bulk loading with our JDBC driver --- the page Heikki
> mentioned explains things from a server perspective but I dunno how that
> translates into JDBC. The folks who hang out on pgsql-jdbc could
> probably give you some tips.

The OP said he's using addBatch/executeBatch with a batch size of 100. The JDBC driver sends the whole batch before waiting for responses.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> OP said he's using addBatch/executeBatch with a batch size of 100. The
> JDBC driver sends the whole batch before waiting for responses.

Perhaps a bit of checking with a packet sniffer would be warranted. If it's really working like that, he shouldn't see the network utilization load he reported...

regards, tom lane
Ok, I turned off XDMCP and network bandwidth utilization dropped to less than 5%. Timings remained the same.

Curiously, the five-times-faster time for Oracle came from a client running on a different host than the server. To make things worse for Postgres, when I replace "hostname" in the JDBC string with "localhost" or 127.0.0.1, it runs another 60% slower (446 sec vs 275 sec). Strange.

Before I take this discussion to the jdbc list: why is CPU utilization 100% during insert? Could that be a bottleneck? How do I eliminate it? These are Intel Woodcrest 5110 Xeon cores.

thank you

-Sanjay

femski wrote:
> I am running Postgres 8.2 on OpenSuse 10.2 with the latest jdbc driver. I
> moved the app to be co-located with the server. Oracle takes 60 sec,
> Postgres 275 sec, for 4.7 million rows.
>
> There are 4 CPUs on the server and one is running close to 100% during
> inserts.
> Network history shows spikes of up to 60% of the bandwidth (Gnome System
> monitor graph). I have a gigabit card - but it should not enter into the
> picture since it's on the local host.
>
> thanks,
>
> -Sanjay
On 3/14/07, femski <hypertree@yahoo.com> wrote:
> Ok, I turned off XDMCP and network bandwidth utilization dropped to less
> than 5%. Timings remained the same.
>
> Curiously, the five-times-faster time for Oracle came from a client running
> on a different host than the server.
> To make things worse for Postgres, when I replace "hostname" in the jdbc
> string with "localhost" or 127.0.0.1 it runs another 60% slower (446 sec vs
> 275 sec). Strange.
>
> Before I take this discussion to the jdbc list, why is CPU utilization 100%
> during insert? Could that be a bottleneck? How do I eliminate it? These are
> Intel Woodcrest 5110 Xeon cores.

When loading into Oracle, does it utilize more than one core? It seems to me your best bet would be to split the load process across 4+ backends...

merlin
I am using Oracle XE so it's using only one core and doing just fine. How do I split the backend into 4+ processes? I don't want to write a multithreaded loader app. And I didn't think Postgres could utilize multiple cores for the same insert statement.

thanks,

-Sanjay

Merlin Moncure wrote:
> when loading to oracle, does it utilize more than one core?
> istm your best bet would be to split load process to 4+ backends...
>
> merlin
On 3/15/07, femski <hypertree@yahoo.com> wrote:
> I am using Oracle XE so its using only one core and doing just fine.
> How do I split backend to 4+ processes ?
> I don't want to write a multithreaded loader app.
> And I didn't think Postgres can utilize multiple cores for the
> same insert statement.

Well, what SQL is the JDBC driver creating exactly? It is probably running inserts in a transaction. Your load is about 17k inserts/sec, which is about right for Postgres on your hardware. You have the following options to increase insert performance:

* Tweak postgresql.conf:
  - fsync: off, if it is not already
  - checkpoint_segments: bump to at least 24 or so
  - maintenance_work_mem: if you create keys after the insert, bump this high (it speeds CREATE INDEX)
  - bgwriter settings: you can play with these; try disabling the bgwriter first (maxpages=0)
  - full_page_writes = off might help; not 100% sure about this

* Distribute the load: make the load app multi-threaded.

* Use COPY for bulk load. [Is there a way to make the JDBC driver do this?]

* Use multi-row inserts (at least 10 rows/insert)... nearly as fast as COPY.

* If the JDBC driver is not already doing so, prepare your statements and then execute.

merlin
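Pulled together as a postgresql.conf fragment, the tweaks above would look roughly like this (8.2-era parameter names; values are illustrative, and fsync = off / full_page_writes = off are only safe for a load you can rerun from scratch):

```
fsync = off                     # unsafe outside of a repeatable bulk load
checkpoint_segments = 24        # fewer forced checkpoints during the load
maintenance_work_mem = 512MB    # speeds up CREATE INDEX after loading
bgwriter_lru_maxpages = 0       # effectively disables the background writer
bgwriter_all_maxpages = 0
full_page_writes = off          # "not 100% sure about this", as noted above
```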
I tried maxpages = 0 and full_page_writes = off and it seemed to take forever. All the other tricks I have already tried.

At this point I'm wondering if it's a JDBC client-side issue - I am using the latest driver, 8.1. (As I said in an earlier post, I am using addBatch with a batch size of 100.) But just in case, I may be missing something.

If 17k records/sec is right around what's expected, then I must say I am a little disappointed in the "most advanced open source database".

thanks for all your help.

-Sanjay

Merlin Moncure-2 wrote:
> On 3/15/07, femski <hypertree@yahoo.com> wrote:
> well, what sql is the jdbc driver creating exactly? It is probably
> running inserts in a transaction. your load is about 17k inserts/sec
> which is about right for postgres on your hardware. you have the
> following options to increase insert performance:
>
> * tweak postgresql.conf
> fsync: off if it is not already
> checkpoint_segments: bump to at least 24 or so
> maintenance_work_mem: if you create keys after insert, bump this high
> (it speeds create index)
> bgwriter settings: you can play with these, try disabling bgwriter
> first (maxpages=0)
> full_page_writes=off might help, not 100% sure about this
>
> * distribute load
> make load app multi-threaded.
>
> * use copy for bulk load
> [is there a way to make jdbc driver do this?]
>
> * use multi-line inserts (at least 10 rows/insert)...nearly as fast as copy
>
> * if jdbc driver is not already doing so, prepare your statements and
> execute.
>
> merlin
femski <hypertree@yahoo.com> writes:
> If 17k record/sec is right around expected then I must say I am little
> disappointed from the "most advanced open source database".

Well, the software is certainly capable of much more than that; for instance, on a not-too-new Dell x86_64 machine:

regression=# \timing
Timing is on.
regression=# create table t1(f1 int);
CREATE TABLE
Time: 3.614 ms
regression=# insert into t1 select * from generate_series(1,1000000);
INSERT 0 1000000
Time: 3433.483 ms

which works out to something a little shy of 300K rows/sec. Of course, the main difference from what I think you're trying to do is the lack of any per-row round trips to the client code. But you need to look into where the bottleneck is, not just assume it's insoluble.

regards, tom lane
On 3/15/07, femski <hypertree@yahoo.com> wrote:
> I tried maxpages = 0 and full_page_writes=off and it seemed to be taking
> forever. All other tricks I have already tried.
>
> At this point I wondering if its a jdbc client side issue - I am using the
> latest 8.1.
> (as I said in an earlier post - I am using addBatch with batch size of 100).
> But just in case - I am missing something.
>
> If 17k record/sec is right around expected then I must say I am little
> disappointed from the "most advanced open source database".

Be careful... you are just testing one very specific thing, and it is extremely possible that the Oracle JDBC batch insert is more optimized than PostgreSQL's. On my little Pentium 4 workstation, by inserting 10 rows per insert:

insert values ([...]), ([...]), [8 more rows];

I got a 5x speedup in insert performance using this feature (which is unfortunately new in 8.2). Oracle is most likely pulling similar tricks inside the driver. PostgreSQL is much faster than you think...

merlin
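When the driver won't generate the multi-row form for you, it can be built client-side. A minimal sketch (table and column names are invented; the values are assumed to be already-safe SQL literals - a real loader should bind parameters or escape properly rather than splice strings):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MultiRowInsert {
    // Build one INSERT statement carrying many rows: the 8.2+
    // "INSERT ... VALUES (...), (...), ..." form discussed above.
    // Each element of `values` becomes one row tuple.
    static String buildInsert(String table, String column, List<String> values) {
        String tuples = values.stream()
                .map(v -> "(" + v + ")")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + column + ") VALUES " + tuples;
    }

    public static void main(String[] args) {
        // 3 rows in one statement instead of 3 round trips.
        System.out.println(buildInsert("foo", "bar", List.of("1", "2", "3")));
        // prints: INSERT INTO foo (bar) VALUES (1), (2), (3)
    }
}
```

Batching 10-100 rows per statement this way is what produced the ~5x speedup reported above.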
On 3/16/07, Bob Dusek <bob@copienttech.com> wrote:
> This may or may not be related to what you're seeing... but, when we
> changed from Postgres 7.4.2 to 7.4.8, our batch processing slowed down
> fairly significantly.
>
> Here's what we were doing:
>
> Step 1) Build a large file full of SQL insert statements.
> Step 2) Feed the file directly to "psql" using "psql dbname < insertfile".
>
> The time of execution for step 2 seemed to nearly double from
> 7.4.2 to 7.4.8, for whatever reason (it could have been the way Suse
> compiled the binaries). Perhaps the slowdown was something we could
> have/should have tweaked with config options.
> At any rate, what we did to speed it up was to wrap the entire file in a
> transaction, as such: "BEGIN; ..filecontents.. COMMIT;"
>
> Apparently the autocommit stuff in the version of 7.4.8 we were using
> was just *doggedly* slow.
>
> Perhaps you're already using a transaction for your batch, though. Or,
> maybe the problem isn't with Postgres. Just thought I'd share.

If you are inserting records one by one without a transaction (and no fsync), I/O is going to determine your insertion speed. Not really sure what was happening in your case... it looks like quite a different type of issue from the OP's.

Anyway, to the OP: some quick googling regarding the PostgreSQL JDBC driver showed that the batch insert case is just not as optimized (in the driver) as it could be. The driver could do multi-statement inserts or use the libpq copy API, either of which would result in a huge performance gain.

merlin
On 3/16/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> Be careful...you are just testing one very specific thing and it is
> extremely possible that the Oracle JDBC batch insert is more optimized
> than PostgreSQL's. On my little Pentium 4 workstation, by inserting
> 10 rows per insert:
> insert values ([...]), ([...]), [8 more rows];

A small correction here: I actually went and looked at the JDBC API and realized 'addBatch' means running multiple statements at once, not batch inserting. femski, your best bet is to lobby the JDBC folks to build support for 'copy' into the driver for faster bulk loads (or help out in that regard).

merlin
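For what it's worth, later versions of the PostgreSQL JDBC driver did grow exactly this: org.postgresql.copy.CopyManager. A sketch of the idea, with the driver-specific call shown in comments since it needs a live server (the connection string and table name are hypothetical); the runnable part is just a COPY text-format row formatter:

```java
public class CopyLoad {
    // Formats one row in COPY text format: tab-separated fields,
    // newline-terminated. Assumes the values contain no tabs, newlines,
    // or backslashes; a real loader must escape those per the COPY
    // text-format rules.
    static String copyLine(Object... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(fields[i]);
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        String data = copyLine(1, "alpha") + copyLine(2, "beta");
        System.out.print(data);

        // With the pgjdbc driver on the classpath and an open Connection
        // `conn`, the actual bulk load looks like (table name made up):
        //
        //   org.postgresql.copy.CopyManager copy =
        //       conn.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
        //   copy.copyIn("COPY narrow_table (id, val) FROM STDIN",
        //               new java.io.StringReader(data));
        //
        // One stream, no per-statement round trips.
    }
}
```

This is the approach merlin is lobbying for: the data travels as one COPY stream rather than tens of thousands of batched INSERT exchanges.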
femski wrote:
> Folks !
>
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/ExecuteBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in the Oracle.

If you are using 8.2, could you try it with the multi-value inserts?

insert into foo(bar) values (bang) (bong) (bing) ...?

> I have played with many parameters. Only one that seems to have any affect
> is fsync - but thats only 10% or so.
> Initially I got the warning that checkpoints were happening too often so I
> increased the segments to 24. Warnings stopped, but no real improvement in
> performance.
>
> Is postgres really that slow ? What am I missing ?
>
> Here are the changes to my postgressql.cong file.
>
> shared_buffers = 768MB
> work_mem = 256MB
> maintenance_work_mem = 128MB
> fsync = off
> checkpoint_segments = 24
> autovacuum = on
>
> Thank you,
>
> -Sanjay

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> insert into foo(bar) values (bang) (bong) (bing) ...?

Nit pick (with a "correct me if I'm wrong" disclaimer :-)):

Wouldn't that be (bang), (bong), (bing) .... ??

Carlos
Carlos Moreno wrote:
> Joshua D. Drake wrote:
>> insert into foo(bar) values (bang) (bong) (bing) ...?
>
> Nit pick (with a "correct me if I'm wrong" disclaimer :-)):
>
> Wouldn't that be (bang), (bong), (bing) .... ??

Yes.

J
On 3/13/07, femski <hypertree@yahoo.com> wrote:
> I have a batch application that writes approx. 4 million rows into a narrow
> table. I am using JDBC addBatch/ExecuteBatch with auto commit turned off.
> Batch size is 100. So far I am seeing Postgres take roughly five times the
> time it takes to do this in the Oracle.

You can try pg_bulkload. Since it is invoked as a standard function, you shouldn't have problems with JDBC. And it's apparently fast.

depesz

http://pgfoundry.org/projects/pgbulkload/