Discussion: PostgreSQL underestimates sorting
Hi,

PostgreSQL 8.1 (and, back then, 7.4) has a tendency to underestimate the
costs of sort operations, compared to index scans.

The backend allocates gigabytes of memory (we've set sort_mem to 1 GB), and
then starts spilling further gigabytes of temporary data to disk. So the
execution ends up, in the end, much slower than an index scan, and wastes
lots of disk space.

We did not manage to tune the config values appropriately, at least not
without causing other query plans to suffer badly.

Are there any good ideas on how to shift the planner's preferences slightly
towards index scans, without affecting other queries?

There's one thing that most of those queries have in common: they include
TOAST data (large strings, PostGIS geometries etc.), and I remember that
there are known problems with estimating the TOAST costs. This may be part
of the problem, or may be irrelevant.

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
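[Editorial note: the planner knobs usually suggested for this kind of bias
are the cost constants. This is a hedged sketch, not advice from this
thread, and the values are purely illustrative:]

```sql
-- Illustrative only: these cost constants bias the 8.1 planner between
-- index scans and sort-based plans. A lower random_page_cost makes the
-- random I/O of index scans look cheaper; effective_cache_size (measured
-- in 8 kB pages in 8.1) tells the planner how much data is likely cached.
SET random_page_cost = 2.0;           -- default is 4.0
SET effective_cache_size = 1310720;   -- roughly 10 GB in 8 kB pages
-- As a last resort, per session only, discourage explicit sorts entirely:
SET enable_sort = off;
```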
On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
> The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
> then starts spilling out more Gigs of temporary data to the disk.

How much RAM is in the server? Remember that sort_mem is _per sort_, so if
you have multiple sorts, it might allocate several multiples of the amount
you set up.

/* Steinar */
--
Homepage: http://www.sesse.net/
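[Editorial note: Steinar's per-sort point can be made concrete with
back-of-envelope arithmetic. The counts below are hypothetical, not taken
from the thread:]

```python
# sort_mem is a per-sort-operation limit, not a per-server one, so the
# worst-case allocation is sorts-per-query times concurrent backends
# times sort_mem. All numbers below are illustrative.
sort_mem_gb = 1          # the 1 GB setting mentioned above
sorts_per_query = 2      # e.g. a sort feeding a merge join plus an ORDER BY
concurrent_backends = 8  # hypothetical number of busy connections

worst_case_gb = sort_mem_gb * sorts_per_query * concurrent_backends
print(worst_case_gb)  # 16 -- the machine's entire 16 GB of RAM
```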
Hi, Steinar,

Steinar H. Gunderson wrote:
> On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
>> The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
>> then starts spilling out more Gigs of temporary data to the disk.
>
> How much RAM is in the server? Remember that sort_mem is _per sort_, so if
> you have multiple sorts, it might allocate several multiples of the amount
> you set up.

That machine has 16 GB of RAM, and about 10 GB tend to be "free" / part of
the Linux block-layer cache.

The temporary data is not swapping; it's the Postgres on-disk sort
algorithm.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
Hi, Frank,

Frank Wiles wrote:
>> The temporary data is not swapping, it's the Postgres on-disk sort
>> algorithm.
>
> Are you actually running a query where you have a GB of data
> you need to sort? If not I fear you may be causing the system
> to swap by setting it this high.

Yes, the table itself is about 40 GB in size, thus much larger than the
memory. The machine has 16 GB of RAM, and 10-12 GB are available for
PostgreSQL + disk cache. There's no swapping; only 23 MB of swap are used
(40 GB are available).

That's one example configuration; there are others on different machines
where it turns out that forcing index usage leads to faster queries and
less overall resource consumption. (Or, at least, faster delivery of the
first part of the result, so the application can begin to process it
asynchronously.)

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
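[Editorial note: the "faster delivery of the first part of the result"
point is about startup cost. A hedged sketch with a hypothetical table and
index, not from the thread:]

```sql
-- Hypothetical schema, for illustration only. With an index on the sort
-- key, the index-scan plan can stream the first rows immediately, while
-- a sort-based plan must read and sort the whole 40 GB table before
-- returning anything:
CREATE INDEX tracks_geom_id_idx ON tracks (geom_id);

EXPLAIN ANALYZE
SELECT * FROM tracks ORDER BY geom_id LIMIT 100;
-- An "Index Scan using tracks_geom_id_idx" plan has near-zero startup
-- cost; a "Sort" node's startup cost includes sorting the entire input.
```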
On Wed, 22 Nov 2006 15:28:12 +0100
Markus Schaber <schabi@logix-tt.com> wrote:

> Hi, Steinar,
>
> Steinar H. Gunderson wrote:
> > On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
> >> The Backend allocates gigs of memory (we've set sort_mem to 1
> >> gig), and then starts spilling out more Gigs of temporary data to
> >> the disk.
> >
> > How much RAM is in the server? Remember that sort_mem is _per
> > sort_, so if you have multiple sorts, it might allocate several
> > multiples of the amount you set up.
>
> That one machine has 16 Gigs of ram, and about 10 Gigs tend to be
> "free" / part of the Linux blocklayer cache.
>
> The temporary data is not swapping, it's the Postgres on-disk sort
> algorithm.

Are you actually running a query where you have a GB of data you need to
sort? If not, I fear you may be causing the system to swap by setting it
this high.

---------------------------------
  Frank Wiles <frank@wiles.org>
  http://www.wiles.org
---------------------------------
On Wed, 2006-11-22 at 11:17 +0100, Markus Schaber wrote:
> PostgreSQL 8.1 (and, back then, 7.4) have the tendency to underestimate
> the costs of sort operations, compared to index scans.
>
> The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
> then starts spilling out more Gigs of temporary data to the disk. So the
> execution gets - in the end - much slower compared to an index scan, and
> wastes lots of disk space.
>
> We did not manage to tune the config values appropriately, at least not
> without causing other query plans to suffer badly.

8.2 has substantial changes to the sort code, so you may want to give the
beta version a try to check how much better it works.

That's another way of saying that sort in 8.1 and before has some
performance problems when you are sorting more than 6 * 2 * work_mem (on
randomly sorted data), and the cost model doesn't get this right, as you
observe.

Please try enabling trace_sort (available in both 8.1 and 8.2) and post
the results here; it would be very useful to have results from such a
large real-world sort.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
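[Editorial note: trace_sort can be enabled per session; a hedged sketch
(the exact log wording may vary by version):]

```sql
-- trace_sort emits its messages at LOG level into the server log.
-- Enabling it only in the current session avoids noise from other
-- backends:
SET trace_sort = on;
SET client_min_messages = log;  -- also echo the messages to this session

-- ...run the slow sorting query here...
-- The output reports the sort's progress and the number of runs/tapes
-- used for the external merge, which is what would be useful to post.
```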