Обсуждение: Re: [HACKERS] Index creation takes for ever
On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >ohp@pyrenet.fr writes: >> it took 69 minutes to finish, 75% of this time was devoted to create 2 >> indexes on varchar(2) with value being 'O', 'N' or null; > >I still say it's either strcoll or qsort's fault. If qsort is to blame, then maybe this patch could help. It sorts equal key values on item pointer. And if it doesn't help index creation speed, at least the resulting index has better correlation. Test script: CREATE TABLE t (i int NOT NULL, t text NOT NULL); INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf'); INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf'); INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; INSERT INTO t SELECT * FROM t; ANALYZE t; CREATE INDEX t_i ON t(i); SET enable_seqscan = 0; SELECT ctid FROM t WHERE i=100 LIMIT 10; Result without patch: ctid ---------- (153,14) (306,23) (305,80) (152,91) (76,68) (38,34) (153,34) (305,50) (9,62) (305,40) (10 rows) Result with patch: ctid -------- (0,5) (0,10) (0,15) (0,20) (0,25) (0,30) (0,35) (0,40) (0,45) (0,50) (10 rows) For testing purposes I have made a second patch that provides a boolean GUC variable sort_index. It is available here: http://www.pivot.at/pg/23.test-IdxTupleSort.diff Servus Manfred diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200 +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200 @@ -2071,6 +2071,33 @@ (errcode(ERRCODE_UNIQUE_VIOLATION), errmsg("could not create unique index"), errdetail("Table contains duplicated values."))); + else + { + /* + * If key values are equal, we sort on ItemPointer. This might help + * for some bad qsort implementation having performance problems + * with many equal items. OTOH I wouldn't trust such a weak qsort + * to handle pre-sorted sequences very well ... + * + * Anyway, this code doesn't hurt much, and it helps produce indices + * with better index correlation which is a good thing per se. + */ + ItemPointer tid1 = &tuple1->t_tid; + ItemPointer tid2 = &tuple2->t_tid; + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1); + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2); + + if (blk1 != blk2) + return (blk1 < blk2) ? -1 : 1; + else + { + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1); + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2); + + if (pos1 != pos2) + return (pos1 < pos2) ? -1 : 1; + } + } return 0; }
I assume this completes this TODO: * Order duplicate index entries by tid for faster heap lookups and you will submit it for 7.5? If you want to post it now, I can get it into the 7.5 queue so we don't forget it. --------------------------------------------------------------------------- Manfred Koizar wrote: > On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl@sss.pgh.pa.us> > wrote: > >ohp@pyrenet.fr writes: > >> it took 69 minutes to finish, 75% of this time was devoted to create 2 > >> indexes on varchar(2) with value being 'O', 'N' or null; > > > >I still say it's either strcoll or qsort's fault. > > If qsort is to blame, then maybe this patch could help. It sorts > equal key values on item pointer. And if it doesn't help index > creation speed, at least the resulting index has better correlation. > > Test script: > CREATE TABLE t (i int NOT NULL, t text NOT NULL); > INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf'); > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf'); > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > ANALYZE t; > CREATE INDEX t_i ON t(i); > SET enable_seqscan = 0; > SELECT ctid FROM t WHERE i=100 LIMIT 10; > > Result without patch: > ctid > ---------- > (153,14) > (306,23) > (305,80) > (152,91) > (76,68) > (38,34) > (153,34) > (305,50) > (9,62) > (305,40) > (10 rows) > > Result with patch: > ctid > -------- > (0,5) > (0,10) > (0,15) > (0,20) > (0,25) > (0,30) > (0,35) > (0,40) > (0,45) > (0,50) > (10 rows) > > For testing purposes I have made a second patch that provides a > boolean GUC variable sort_index. It is available here: > http://www.pivot.at/pg/23.test-IdxTupleSort.diff > > Servus > Manfred > diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c > --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200 > +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200 > @@ -2071,6 +2071,33 @@ > (errcode(ERRCODE_UNIQUE_VIOLATION), > errmsg("could not create unique index"), > errdetail("Table contains duplicated values."))); > + else > + { > + /* > + * If key values are equal, we sort on ItemPointer. This might help > + * for some bad qsort implementation having performance problems > + * with many equal items. OTOH I wouldn't trust such a weak qsort > + * to handle pre-sorted sequences very well ... > + * > + * Anyway, this code doesn't hurt much, and it helps produce indices > + * with better index correlation which is a good thing per se. > + */ > + ItemPointer tid1 = &tuple1->t_tid; > + ItemPointer tid2 = &tuple2->t_tid; > + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1); > + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2); > + > + if (blk1 != blk2) > + return (blk1 < blk2) ? -1 : 1; > + else > + { > + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1); > + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2); > + > + if (pos1 != pos2) > + return (pos1 < pos2) ? -1 : 1; > + } > + } > > return 0; > } > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I assume this completes this TODO: > * Order duplicate index entries by tid for faster heap lookups I don't know why that TODO entry exists, but I think the idea is counterproductive. The existing btree code will tend to put newer versions of a row earlier (because it puts a new entry in front of any with duplicate keys), which usually reduces the time spent skipping dead rows. Forcing tid ordering will cost us more in dead-row skipping than it's likely to save elsewhere. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I assume this completes this TODO: > > * Order duplicate index entries by tid for faster heap lookups > > I don't know why that TODO entry exists, but I think the idea is > counterproductive. The existing btree code will tend to put newer > versions of a row earlier (because it puts a new entry in front of any > with duplicate keys), which usually reduces the time spent skipping dead > rows. Forcing tid ordering will cost us more in dead-row skipping than > it's likely to save elsewhere. I assume you are talking about a unique index that probably only has a few non-expired rows (in which case the newer rows first is better). The TODO deals with cases where you have lots of valid duplicate index rows, and you want to spin through all the matching rows in heap order rather than randomly. This is related to our CLUSTER capability. The idea originally came from Vadim. At what point does this patch do the sorting? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >>> * Order duplicate index entries by tid for faster heap lookups > >> I don't know why that TODO entry exists, but I think the idea is >> counterproductive. > I assume you are talking about a unique index that probably only has a > few non-expired rows (in which case the newer rows first is better). > The TODO deals with cases where you have lots of valid duplicate index > rows, and you want to spin through all the matching rows in heap order > rather than randomly. Maybe so, but it would degrade the performance in the unique-index case if we do it as the TODO is worded. My own opinion is that the bitmap-index-lookup approach will be superior to trying to keep the index entries in TID order. (That's the idea we've been discussing for awhile of separating the heap-fetch stage from the index-scan stage: scan the index, make a sparse bitmap of the TIDs we need to visit, possibly AND or OR this bitmap with maps derived from other indexes, and finally visit the rows in heap order.) regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >>> * Order duplicate index entries by tid for faster heap lookups > > > >> I don't know why that TODO entry exists, but I think the idea is > >> counterproductive. > > > I assume you are talking about a unique index that probably only has a > > few non-expired rows (in which case the newer rows first is better). > > The TODO deals with cases where you have lots of valid duplicate index > > rows, and you want to spin through all the matching rows in heap order > > rather than randomly. > > Maybe so, but it would degrade the performance in the unique-index case > if we do it as the TODO is worded. Yes, the wording is just a guide. > My own opinion is that the bitmap-index-lookup approach will be superior > to trying to keep the index entries in TID order. (That's the idea > we've been discussing for awhile of separating the heap-fetch stage from > the index-scan stage: scan the index, make a sparse bitmap of the TIDs > we need to visit, possibly AND or OR this bitmap with maps derived from > other indexes, and finally visit the rows in heap order.) Oh, yes. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sun, 7 Sep 2003 11:43:42 -0400 (EDT), Bruce Momjian <pgman@candle.pha.pa.us> wrote: >I assume this completes this TODO: > > * Order duplicate index entries by tid for faster heap lookups I don't think so, because the patch does nothing to keep the sort order once the index is initially created. > If you want to post it now, [...] I did already post it. It's only the last page or so of the original message. The link in that message points to a testing aid which is not part of what I would like to see committed. Servus Manfred
On Sun, 07 Sep 2003 12:23:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Maybe so, but it would degrade the performance in the unique-index case >if we do it as the TODO is worded. The patch would only hurt with a unique index, if there are lots of duplicate tuples at CREATE INDEX time. >My own opinion is that the bitmap-index-lookup approach will be superior So is mine, but I was not able to do this in 30 lines. Sorry ;-) >to trying to keep the index entries in TID order. ^^^^ ... which the patch does not. I see its main advantage in creating better b-tree indices when you restore a large database with many duplicate index entries. It is intended to be only effective at CREATE INDEX or REINDEX time. I don't believe it is activated when you insert a single new entry, otherwise it wouldn't pass regression tests ... Servus Manfred
Manfred Koizar wrote: > On Sun, 7 Sep 2003 11:43:42 -0400 (EDT), Bruce Momjian > <pgman@candle.pha.pa.us> wrote: > >I assume this completes this TODO: > > > > * Order duplicate index entries by tid for faster heap lookups > > I don't think so, because the patch does nothing to keep the sort > order once the index is initially created. As Tom mentioned, we might not want to keep the tid's in order after the index is created because he wants the most recent tid's first, so the expired ones migrate to the end. Seems this patch does what we want currently because it only affects the initial index structure. > > If you want to post it now, [...] > > I did already post it. It's only the last page or so of the original > message. The link in that message points to a testing aid which is > not part of what I would like to see committed. OK, in 7.5 queue: http:/momjian.postgresql.org/cgi-bin/pgpatches2 -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Manfred Koizar wrote: > On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl@sss.pgh.pa.us> > wrote: > >ohp@pyrenet.fr writes: > >> it took 69 minutes to finish, 75% of this time was devoted to create 2 > >> indexes on varchar(2) with value being 'O', 'N' or null; > > > >I still say it's either strcoll or qsort's fault. > > If qsort is to blame, then maybe this patch could help. It sorts > equal key values on item pointer. And if it doesn't help index > creation speed, at least the resulting index has better correlation. > > Test script: > CREATE TABLE t (i int NOT NULL, t text NOT NULL); > INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf'); > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf'); > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > ANALYZE t; > CREATE INDEX t_i ON t(i); > SET enable_seqscan = 0; > SELECT ctid FROM t WHERE i=100 LIMIT 10; > > Result without patch: > ctid > ---------- > (153,14) > (306,23) > (305,80) > (152,91) > (76,68) > (38,34) > (153,34) > (305,50) > (9,62) > (305,40) > (10 rows) > > Result with patch: > ctid > -------- > (0,5) > (0,10) > (0,15) > (0,20) > (0,25) > (0,30) > (0,35) > (0,40) > (0,45) > (0,50) > (10 rows) > > For testing purposes I have made a second patch that provides a > boolean GUC variable sort_index. It is available here: > http://www.pivot.at/pg/23.test-IdxTupleSort.diff > > Servus > Manfred > diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c > --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200 > +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200 > @@ -2071,6 +2071,33 @@ > (errcode(ERRCODE_UNIQUE_VIOLATION), > errmsg("could not create unique index"), > errdetail("Table contains duplicated values."))); > + else > + { > + /* > + * If key values are equal, we sort on ItemPointer. This might help > + * for some bad qsort implementation having performance problems > + * with many equal items. OTOH I wouldn't trust such a weak qsort > + * to handle pre-sorted sequences very well ... > + * > + * Anyway, this code doesn't hurt much, and it helps produce indices > + * with better index correlation which is a good thing per se. > + */ > + ItemPointer tid1 = &tuple1->t_tid; > + ItemPointer tid2 = &tuple2->t_tid; > + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1); > + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2); > + > + if (blk1 != blk2) > + return (blk1 < blk2) ? -1 : 1; > + else > + { > + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1); > + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2); > + > + if (pos1 != pos2) > + return (pos1 < pos2) ? -1 : 1; > + } > + } > > return 0; > } > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> If qsort is to blame, then maybe this patch could help. It sorts >> equal key values on item pointer. And if it doesn't help index >> creation speed, at least the resulting index has better correlation. > I will try to apply it within the next 48 hours. I think this is a *very* dubious idea. It introduces a low-level implementation dependency into our sort behavior on the strength of no more than an unfounded speculation that some platform's broken qsort might run faster. Even if the speculation were proven true, I'd be hesistant to apply it. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> If qsort is to blame, then maybe this patch could help. It sorts > >> equal key values on item pointer. And if it doesn't help index > >> creation speed, at least the resulting index has better correlation. > > > I will try to apply it within the next 48 hours. > > I think this is a *very* dubious idea. It introduces a low-level > implementation dependency into our sort behavior on the strength of no > more than an unfounded speculation that some platform's broken qsort > might run faster. Even if the speculation were proven true, I'd be > hesistant to apply it. Roger --- patch removed. Thanks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Patch removed from queue. --------------------------------------------------------------------------- Manfred Koizar wrote: > On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl@sss.pgh.pa.us> > wrote: > >ohp@pyrenet.fr writes: > >> it took 69 minutes to finish, 75% of this time was devoted to create 2 > >> indexes on varchar(2) with value being 'O', 'N' or null; > > > >I still say it's either strcoll or qsort's fault. > > If qsort is to blame, then maybe this patch could help. It sorts > equal key values on item pointer. And if it doesn't help index > creation speed, at least the resulting index has better correlation. > > Test script: > CREATE TABLE t (i int NOT NULL, t text NOT NULL); > INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf'); > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf'); > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > INSERT INTO t SELECT * FROM t; > ANALYZE t; > CREATE INDEX t_i ON t(i); > SET enable_seqscan = 0; > SELECT ctid FROM t WHERE i=100 LIMIT 10; > > Result without patch: > ctid > ---------- > (153,14) > (306,23) > (305,80) > (152,91) > (76,68) > (38,34) > (153,34) > (305,50) > (9,62) > (305,40) > (10 rows) > > Result with patch: > ctid > -------- > (0,5) > (0,10) > (0,15) > (0,20) > (0,25) > (0,30) > (0,35) > (0,40) > (0,45) > (0,50) > (10 rows) > > For testing purposes I have made a second patch that provides a > boolean GUC variable sort_index. It is available here: > http://www.pivot.at/pg/23.test-IdxTupleSort.diff > > Servus > Manfred > diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c > --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200 > +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200 > @@ -2071,6 +2071,33 @@ > (errcode(ERRCODE_UNIQUE_VIOLATION), > errmsg("could not create unique index"), > errdetail("Table contains duplicated values."))); > + else > + { > + /* > + * If key values are equal, we sort on ItemPointer. This might help > + * for some bad qsort implementation having performance problems > + * with many equal items. OTOH I wouldn't trust such a weak qsort > + * to handle pre-sorted sequences very well ... > + * > + * Anyway, this code doesn't hurt much, and it helps produce indices > + * with better index correlation which is a good thing per se. > + */ > + ItemPointer tid1 = &tuple1->t_tid; > + ItemPointer tid2 = &tuple2->t_tid; > + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1); > + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2); > + > + if (blk1 != blk2) > + return (blk1 < blk2) ? -1 : 1; > + else > + { > + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1); > + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2); > + > + if (pos1 != pos2) > + return (pos1 < pos2) ? -1 : 1; > + } > + } > > return 0; > } > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Mon, 1 Dec 2003 00:02:54 -0500 (EST), Bruce Momjian <pgman@candle.pha.pa.us> wrote: >Tom Lane wrote: >> Bruce Momjian <pgman@candle.pha.pa.us> writes: >> >> And if it doesn't help index >> >> creation speed, at least the resulting index has better correlation. ... which has been shown by the example in the original message: > Result without patch: > ctid > ---------- > (153,14) > (306,23) > (305,80) > (152,91) > (76,68) > (38,34) > (153,34) > (305,50) > (9,62) > (305,40) > (10 rows) > > Result with patch: > ctid > -------- > (0,5) > (0,10) > (0,15) > (0,20) > (0,25) > (0,30) > (0,35) > (0,40) > (0,45) > (0,50) > (10 rows) And I think we all agree, that better index correlation leads to better performance. >> I think this is a *very* dubious idea. It introduces a low-level >> implementation dependency into our sort behavior The patch modifies the static function comparetup_index() in tuplesort.c. The comment above this function says /* * Routines specialized for IndexTuple case * * NOTE: actually, these are specialized for the btree case; [...] */ comparetup_index() compares two IndexTuples. The structure IndexTupleData consists basically of not much more than an ItemPointer, and the patch is not much more than adding a comparison of two ItemPointers. So how does the patch introduce a new low level implementation dependency? >Roger --- patch removed. Thanks. Could we agree on only removing the first five a half lines of the comment in the patch? Servus Manfred
Manfred Koizar <mkoi-pg@aon.at> writes: > comparetup_index() compares two IndexTuples. The structure > IndexTupleData consists basically of not much more than an ItemPointer, > and the patch is not much more than adding a comparison of two > ItemPointers. So how does the patch introduce a new low level > implementation dependency? Because it sorts on tuple position, which is certainly about as low level as you can get. More to the point, though, no evidence has been provided that this is a good idea. regards, tom lane
On Mon, 01 Dec 2003 13:32:10 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Manfred Koizar <mkoi-pg@aon.at> writes: >> comparetup_index() compares two IndexTuples. The structure >> IndexTupleData consists basically of not much more than an ItemPointer, >> and the patch is not much more than adding a comparison of two >> ItemPointers. So how does the patch introduce a new low level >> implementation dependency? > >Because it sorts on tuple position, which is certainly about as low >level as you can get. The patch affects only the sort during index creation. Mapping key values to tuple positions is the sole purpose of an index. The notion that an index should not care for tuple positions looks a bit strange to me. > More to the point, though, no evidence has been >provided that this is a good idea. The test script I posted with the patch shows that the patch produces more efficient b-tree indices when there are lots of duplicates. Servus Manfred
Where are we on this? It seems like a win to me. --------------------------------------------------------------------------- Manfred Koizar wrote: > On Mon, 01 Dec 2003 13:32:10 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >Manfred Koizar <mkoi-pg@aon.at> writes: > >> comparetup_index() compares two IndexTuples. The structure > >> IndexTupleData consists basically of not much more than an ItemPointer, > >> and the patch is not much more than adding a comparison of two > >> ItemPointers. So how does the patch introduce a new low level > >> implementation dependency? > > > >Because it sorts on tuple position, which is certainly about as low > >level as you can get. > > The patch affects only the sort during index creation. Mapping key > values to tuple positions is the sole purpose of an index. The notion > that an index should not care for tuple positions looks a bit strange to > me. > > > More to the point, though, no evidence has been > >provided that this is a good idea. > > The test script I posted with the patch shows that the patch produces > more efficient b-tree indices when there are lots of duplicates. > > Servus > Manfred > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Here is more detail on the patch. --------------------------------------------------------------------------- Manfred Koizar wrote: > On Mon, 1 Dec 2003 00:02:54 -0500 (EST), Bruce Momjian > <pgman@candle.pha.pa.us> wrote: > >Tom Lane wrote: > >> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> >> And if it doesn't help index > >> >> creation speed, at least the resulting index has better correlation. > > ... which has been shown by the example in the original message: > > Result without patch: > > ctid > > ---------- > > (153,14) > > (306,23) > > (305,80) > > (152,91) > > (76,68) > > (38,34) > > (153,34) > > (305,50) > > (9,62) > > (305,40) > > (10 rows) > > > > Result with patch: > > ctid > > -------- > > (0,5) > > (0,10) > > (0,15) > > (0,20) > > (0,25) > > (0,30) > > (0,35) > > (0,40) > > (0,45) > > (0,50) > > (10 rows) > > And I think we all agree, that better index correlation leads to better > performance. > > >> I think this is a *very* dubious idea. It introduces a low-level > >> implementation dependency into our sort behavior > > The patch modifies the static function comparetup_index() in > tuplesort.c. > The comment above this function says > /* > * Routines specialized for IndexTuple case > * > * NOTE: actually, these are specialized for the btree case; [...] > */ > > comparetup_index() compares two IndexTuples. The structure > IndexTupleData consists basically of not much more than an ItemPointer, > and the patch is not much more than adding a comparison of two > ItemPointers. So how does the patch introduce a new low level > implementation dependency? > > >Roger --- patch removed. Thanks. > > Could we agree on only removing the first five a half lines of the > comment in the patch? > > Servus > Manfred > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Where are we on this? It seems like a win to me. I thought it was a bad idea, although I no longer remember the details. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Where are we on this? It seems like a win to me. > > I thought it was a bad idea, although I no longer remember the details. If I remember correctly, you didn't like the index routines reading the tuple information, or something like that, but there was a performance benefit for duplicate keys, so I think we should re-investigate this. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > If I remember correctly, you didn't like the index routines reading the > tuple information, or something like that, but there was a performance > benefit for duplicate keys, so I think we should re-investigate this. I don't see the actual patch either in the hackers or patches archives, nor on your to-apply pages, making it a bit difficult to re-investigate. Where was it posted anyway? regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > If I remember correctly, you didn't like the index routines reading the > > tuple information, or something like that, but there was a performance > > benefit for duplicate keys, so I think we should re-investigate this. > > I don't see the actual patch either in the hackers or patches archives, > nor on your to-apply pages, making it a bit difficult to re-investigate. > Where was it posted anyway? Found it: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200312010450.hB14ovH16330%40candle.pha.pa.us&rnum=8 Personally, because frequently accessed duplicates appear more forward in the duplicate index, I think the sorting is only valuable when creating a new index. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> Where was it posted anyway? > Found it: > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200312010450.hB14ovH16330%40candle.pha.pa.us&rnum=8 Thanks. The original patch is much older than I thought --- I was looking in the November/December part of the archives. > Personally, because frequently accessed duplicates appear more forward > in the duplicate index, I think the sorting is only valuable when > creating a new index. Yes, and that's what this does. Looking back, the original discussion got a little confused because the TODO item about "order duplicate index entries by tid" got brought into the mix. Actually this patch has nothing to do with that, because it only acts during btree creation not during index updates. On inspection I have no problem with the patch, only with the comments ;-) If you like I'll revise the comments and apply. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> Where was it posted anyway? > > > Found it: > > > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200312010450.hB14ovH16330%40candle.pha.pa.us&rnum=8 > > Thanks. The original patch is much older than I thought --- I was > looking in the November/December part of the archives. > > > Personally, because frequently accessed duplicates appear more forward > > in the duplicate index, I think the sorting is only valuable when > > creating a new index. > > Yes, and that's what this does. Looking back, the original discussion > got a little confused because the TODO item about "order duplicate index > entries by tid" got brought into the mix. Actually this patch has > nothing to do with that, because it only acts during btree creation not > during index updates. > > On inspection I have no problem with the patch, only with the comments ;-) > If you like I'll revise the comments and apply. Great. Seems harmless and he showed good performance with it. I agree the discussion got confused, and that is why I kept it in my mailbox to revisit. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073