Discussion: Hardware performance for large updates
Folks,

I'm doing some massive data transformations on PostgreSQL, and they're a lot slower than they should be. I'm looking for some tips on improving things. If the PGSQL-PERFORMANCE list was ever created, please tell me and I'll go over there.

The update: a series of 7 UPDATE statements which cull data from a 1.5 million row table to update a 120,000 row table.

The machine: a dual-processor RAID 5 UW SCSI server.

The postgresql.conf settings:
  Connections: 128
  Shared Buffers: 256
  Sort Mem: 1024
  Checkpoint Segments: 16
  Stats on. Light debug logging.

The problem: the update series (done as a function) takes 10-15 minutes. During this time, the CPU is never more than 31% busy, only 256 MB of 512 is in use, and the disk channel is only 25%-50% saturated. As such, it seems like we could run things faster.

What does everybody suggest tweaking?

-Josh Berkus
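For reference, the settings listed above correspond roughly to a postgresql.conf fragment like the following (a sketch; parameter names as used in PostgreSQL configs of this era, and the exact stats/debug lines are assumptions):

```
max_connections = 128
shared_buffers = 256          # in 8 KB buffers, i.e. only ~2 MB
sort_mem = 1024               # in KB, per sort operation
checkpoint_segments = 16
stats_start_collector = true  # assumed form of "stats on"
```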
Josh Berkus wrote:
> The problem: the update series (done as a function) takes 10-15
> minutes. During this time, the CPU is never more than 31% busy, only
> 256 MB of 512 is in use, and the disk channel is only 25%-50%
> saturated. As such, it seems like we could run things faster.
>
> What does everybody suggest tweaking?

I think we'd need more information to be of any help -- schema, functions, EXPLAIN output, etc.

I do think you probably could increase Shared Buffers, as 256 is pretty small. There's been a lot of debate over the best setting. The usual guidance is to start at 25% of physical RAM (16384 == 128MB if you have 512MB RAM), then tweak to optimize performance for your application and hardware. You might also bump sort mem up a bit (maybe to 2048). Again, I would test using my app and hardware to get the best value.

Are you on a Linux server? If so, I found that fdatasync works better than (the default) fsync for wal_sync_method.

HTH,

Joe
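The 25%-of-RAM arithmetic above can be sketched out like this, assuming the default 8 KB buffer size (the RAM figure is this machine's 512 MB):

```shell
# Back-of-envelope sizing: start shared_buffers at 25% of physical RAM.
ram_mb=512                          # physical RAM in MB (example value)
target_mb=$((ram_mb / 4))           # 25% of RAM -> 128 MB
buffers=$((target_mb * 1024 / 8))   # MB -> KB, then 8 KB per buffer
echo "shared_buffers = $buffers"    # -> shared_buffers = 16384
```

So the jump being suggested is from 256 buffers (about 2 MB) to 16384 buffers (128 MB), a substantial difference.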
Joe,

> I think we'd need more information to be of any help -- schema,
> functions, explain output, etc.

Yeah, I know. I'm just looking for general tips here ... I need to do the actual optimization interactively.

Particularly, the difficulty is that this application gets many small requests during the day (100 simultaneous users) and shares a server with Apache. So I have to be concerned about how much memory each connection soaks up during the day. At night, the maintenance tasks run a few really massive procedures.

So I should probably restart Postgres with different settings at night, hey?

> I do think you probably could increase Shared Buffers, as 256 is
> pretty small. There's been a lot of debate over the best setting. The
> usual guidance is start at 25% of physical RAM (16384 == 128MB if you
> have 512MB RAM), then tweak to optimize performance for your
> application and hardware.

Hmmm ... how big is a shared buffer, anyway? I'm having trouble finding actual numbers in the docs.

> You might also bump sort mem up a bit
> (maybe to 2048). Again, I would test using my app and hardware to get
> the best value.
> Are you on a Linux server -- if so I found that
> fdatasync works better than (the default) fsync for wal_sync_method.

Yes, I am. Any particular reason why fdatasync works better?

Thanks a lot!

-Josh Berkus
Josh Berkus wrote:
> Particularly, the difficulty is that this application gets many small
> requests during the day (100 simultaneous users) and shares a server
> with Apache. So I have to be concerned about how much memory each
> connection soaks up during the day. At night, the maintenance tasks
> run a few really massive procedures.
>
> So I should probably restart Postgres with different settings at night,
> hey?

Actually, if you can afford the twice-daily changes, it sounds like a great idea. I think you can get new conf settings to take by sending a SIGHUP to the postmaster, so you don't even really need any downtime to do it. Yup, here it is: http://www.postgresql.org/idocs/index.php?runtime-config.html

>> I do think you probably could increase Shared Buffers, as 256 is
>> pretty small. There's been a lot of debate over the best setting. The
>> usual guidance is start at 25% of physical RAM (16384 == 128MB if you
>> have 512MB RAM), then tweak to optimize performance for your
>> application and hardware.
>
> Hmmm ... how big is a shared buffer, anyway? I'm having trouble
> finding actual numbers in the docs.

By default it is 8K. It's mentioned here: http://www.postgresql.org/idocs/index.php?kernel-resources.html

So, as I mentioned above, Shared Buffers of 16384 == 128MB if you have the default 8K block size.

>> Are you on a Linux server -- if so I found that
>> fdatasync works better than (the default) fsync for wal_sync_method.
>
> Yes, I am. Any particular reason why fdatasync works better?

I can't remember the technical reason (although I've seen one on the list before), but I have determined it empirically true, at least for my setup. Ahh, here we go: http://archives.postgresql.org/pgsql-hackers/1998-04/msg00326.php

Joe
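The SIGHUP approach could be sketched as below (the data directory path and the `postgresql.conf.night` file name are assumptions). One caveat worth knowing: settings such as shared_buffers and max_connections are only read at postmaster start, so changing those still requires a full restart rather than a SIGHUP.

```shell
#!/bin/sh
# Sketch: swap in a night-time config and signal the postmaster to
# re-read it. Paths and the .night file name are hypothetical.
PGDATA=${PGDATA:-/var/lib/pgsql/data}

reload_night_config() {
    # Put the night-time settings in place (hypothetical file name).
    cp "$PGDATA/postgresql.conf.night" "$PGDATA/postgresql.conf"
    # The first line of postmaster.pid holds the postmaster's PID;
    # SIGHUP makes it re-read postgresql.conf without dropping
    # client connections.
    kill -HUP "$(head -n 1 "$PGDATA/postmaster.pid")"
}
```

A cron job could call this at the start of the maintenance window and swap the daytime config back in afterwards.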