Обсуждение: [PGSQL 8.3.5] Use of a partial indexes
HI all. I have a 8M+ rows table over which I run a query with a and-only WHERE condition. The table has been periodically VACUUMed and ANALYZEd. In the attempt of speeding that up I added a partial index in order to limit the size of the index. Of course that index is modeled after a "slowly variable" part of the WHERE condition I have in my query. And timings actually dropped dramatically (I do know the problems with caching etc. and I paid attention to that) to about 1/20th (from about 800ms to average 40ms, actually). So I turned to EXPLAIN to see how the partial index was used. Incredibly, the partial index was not used! So I tried to drop the new index and incredibly the performances where still very good. While I can understand that the planner can decide not to use a partial index (despite in my mind it'd make a lot of sense), I'd like to understand how it comes that I get benefits from an enhancement not used! What'd be the explanation (if any) for this behavior? Thanks. -- Fahrbahn ist ein graues Band weisse Streifen, grüner Rand
Not sure if this applies to your case, but I've seen cases where an initial run of a particular query is a lot slower thansubsequent runs even though no changes were made between the two. I suspect that the initial run did all the disk IOneeded to get the data (slow), and that the subsequent runs were just reading the data out of memory (fast) as it was leftover in the PG data buffer cache, the server's caches, the disk server's cache, etc... . Try the same query only with different search criteris. IOW, force it to go back out to disk. You may find that the slowperformance returns. Good Luck ! -dave -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Reg Me Please Sent: Monday, December 29, 2008 9:09 AM To: pgsql-general@postgresql.org Subject: [GENERAL] [PGSQL 8.3.5] Use of a partial indexes HI all. I have a 8M+ rows table over which I run a query with a and-only WHERE condition. The table has been periodically VACUUMed and ANALYZEd. In the attempt of speeding that up I added a partial index in order to limit the size of the index. Of course that index is modeled after a "slowly variable" part of the WHERE condition I have in my query. And timings actually dropped dramatically (I do know the problems with caching etc. and I paid attention to that) to about 1/20th (from about 800ms to average 40ms, actually). So I turned to EXPLAIN to see how the partial index was used. Incredibly, the partial index was not used! So I tried to drop the new index and incredibly the performances where still very good. While I can understand that the planner can decide not to use a partial index (despite in my mind it'd make a lot of sense), I'd like to understand how it comes that I get benefits from an enhancement not used! What'd be the explanation (if any) for this behavior? Thanks. -- Fahrbahn ist ein graues Band weisse Streifen, grüner Rand -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hi. The WHERE condition can be divided into a "slowly changing" part and in a "random" one. The random part is the one I change at every query to avoid result caching. The planner seems to be smart enough to "learn" while working but then I should see a change in the EXPLAIN output, which never happens. I also tried to restart PostgreSQL in order to force a cache flush, but again, once the new performances are in the don't get out! Disk cache could explain the thing, but then why I got the high performances after the partial index has been created? By chance? On Monday December 29 2008 15:24:33 Gauthier, Dave wrote: > Not sure if this applies to your case, but I've seen cases where an initial > run of a particular query is a lot slower than subsequent runs even though > no changes were made between the two. I suspect that the initial run did > all the disk IO needed to get the data (slow), and that the subsequent runs > were just reading the data out of memory (fast) as it was left over in the > PG data buffer cache, the server's caches, the disk server's cache, etc... > . > > Try the same query only with different search criteris. IOW, force it to > go back out to disk. You may find that the slow performance returns. > > Good Luck ! > > -dave > > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Reg Me Please > Sent: Monday, December 29, 2008 9:09 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] [PGSQL 8.3.5] Use of a partial indexes > > HI all. > > I have a 8M+ rows table over which I run a query with a and-only WHERE > condition. > The table has been periodically VACUUMed and ANALYZEd. > In the attempt of speeding that up I added a partial index in order to > limit the size of the index. Of course that index is modeled after a > "slowly variable" part of the WHERE condition I have in my query. > > And timings actually dropped dramatically (I do know the problems with > caching etc. and I paid attention to that) to about 1/20th (from about > 800ms to average 40ms, actually). > So I turned to EXPLAIN to see how the partial index was used. > Incredibly, the partial index was not used! > So I tried to drop the new index and incredibly the performances where > still very good. > > While I can understand that the planner can decide not to use a partial > index (despite in my mind it'd make a lot of sense), I'd like to understand > how it comes that I get benefits from an enhancement not used! > What'd be the explanation (if any) for this behavior? > > Thanks. > > -- > Fahrbahn ist ein graues Band > weisse Streifen, grüner Rand -- Fahrbahn ist ein graues Band weisse Streifen, grüner Rand
On Mon, Dec 29, 2008 at 7:41 AM, Reg Me Please <regmeplease@gmail.com> wrote: > Hi. > > The WHERE condition can be divided into a "slowly changing" part and in > a "random" one. The random part is the one I change at every query to avoid > result caching. > > The planner seems to be smart enough to "learn" while working but then > I should see a change in the EXPLAIN output, which never happens. > > I also tried to restart PostgreSQL in order to force a cache flush, but > again, once the new performances are in the don't get out! I'm guessing that what happened is that when you created the partial index, pgsql read the whole table in, and the OS cached it. Subsequent accesses hit either pgsql's shared_buffers or the OS cache. You could try unmounting and remounting the partition in addition to restarting pgsql and see if that helps, or for sure, reboot. Also, for the partial index to be chosen, it has to match pretty much exactly the where clause.
> The WHERE condition can be divided into a "slowly changing" part and in > a "random" one. The random part is the one I change at every query to avoid > result caching. The first query will leave in cache at least many of the index pages needed by the second query, and likely actual rows needed by the second query. > Disk cache could explain the thing, but then why I got the high performances > after the partial index has been created? By chance? Creating the partial index reads rows, and the pages are left in the disk cache. The only way to do proper comparisons is to reboot between trials in order to compare queries with cold caches, or use the latter of multiple runs in order to compare queries with hot caches. -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
On Mon, Dec 29, 2008 at 8:36 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote: > Creating the partial index reads rows, and the pages are left in the disk > cache. The only way to do proper comparisons is to reboot between trials in > order to compare queries with cold caches, or use the latter of multiple > runs in order to compare queries with hot caches. There are two other ways, one is to unmount and remount the partition on which pgsql is running. On many db servers this is possible because pgsql gets its own disk array / mount point. The other is to use drop caches: smarlowe@abasin:/home/smarlowe$ cd /proc/sys/vm smarlowe@abasin:/proc/sys/vm$ free total used free shared buffers cached Mem: 4016300 2257688 1758612 0 152060 1106400 -/+ buffers/cache: 999228 3017072 Swap: 1068280 45712 1022568 smarlowe@abasin:/proc/sys/vm$ echo 1|sudo tee drop_caches 1 smarlowe@abasin:/proc/sys/vm$ free total used free shared buffers cached Mem: 4016300 1046788 2969512 0 188 94628 -/+ buffers/cache: 951972 3064328 Swap: 1068280 45712 1022568 voila! cache dumped.
> voila! cache dumped. What about read caches in the disk devices themselves? -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
On Mon, Dec 29, 2008 at 9:28 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote: >> voila! cache dumped. > > What about read caches in the disk devices themselves? Given that most drives have caches that are in the 16 to 32Meg range, I doubt it makes a big difference. But you can always just dd a file both ways that are 100Meg or so and dump it out.
Only one question remains in my mind: why the planner is not using the partial index? The partial index is covering 2 predicates out of the 3 used in the where condition. Actually there is a boolean flag (to exclude "disabled" rows), a timestamp (for row age) and an int8 (a FK to another table). The first two are in the partial index in order to exclude "disabled" and older rows. The int8 is the "random" key I mentioned earlier. So the WHERE condition reads like: flag AND tstz >= '2008-01-01'::timestamptz and thekey=42 I can see in the EXPLAIN that there is no mention to the partial index. Please keep in mind that the table has 8+M rows, few of which are flagged, about 70% don't match the age limit and few dozens match the key. In my opinion the partial index should help a lot. -- Fahrbahn ist ein graues Band weisse Streifen, grüner Rand On Monday December 29 2008 16:36:49 Scott Ribe wrote: > > The WHERE condition can be divided into a "slowly changing" part and in > > a "random" one. The random part is the one I change at every query to > > avoid result caching. > > The first query will leave in cache at least many of the index pages needed > by the second query, and likely actual rows needed by the second query. > > > Disk cache could explain the thing, but then why I got the high > > performances after the partial index has been created? By chance? > > Creating the partial index reads rows, and the pages are left in the disk > cache. The only way to do proper comparisons is to reboot between trials in > order to compare queries with cold caches, or use the latter of multiple > runs in order to compare queries with hot caches.
On Tue, Dec 30, 2008 at 2:02 AM, Reg Me Please <regmeplease@gmail.com> wrote: > Only one question remains in my mind: > > why the planner is not using the partial index? > > The partial index is covering 2 predicates out of the 3 used in the where > condition. Actually there is a boolean flag (to exclude "disabled" rows), > a timestamp (for row age) and an int8 (a FK to another table). > The first two are in the partial index in order to exclude "disabled" and > older rows. The int8 is the "random" key I mentioned earlier. > > So the WHERE condition reads like: > > flag AND tstz >= '2008-01-01'::timestamptz and thekey=42 > > I can see in the EXPLAIN that there is no mention to the partial index. > Please keep in mind that the table has 8+M rows, few of which are flagged, > about 70% don't match the age limit and few dozens match the key. > In my opinion the partial index should help a lot. Can you show us the DDL for the index creation and the select query as well?
Reg Me Please wrote: > Only one question remains in my mind: > > why the planner is not using the partial index? > > The partial index is covering 2 predicates out of the 3 used in the where > condition. Actually there is a boolean flag (to exclude "disabled" rows), > a timestamp (for row age) and an int8 (a FK to another table). > The first two are in the partial index in order to exclude "disabled" and > older rows. The int8 is the "random" key I mentioned earlier. > > So the WHERE condition reads like: > > flag AND tstz >= '2008-01-01'::timestamptz and thekey=42 > > I can see in the EXPLAIN that there is no mention to the partial index. > Please keep in mind that the table has 8+M rows, few of which are flagged, > about 70% don't match the age limit and few dozens match the key. > In my opinion the partial index should help a lot. > > -- > Fahrbahn ist ein graues Band > weisse Streifen, grüner Rand > > For an index to be used the where clause must match the index. As the index gets more complicated its less likely to be used. I have 5 indexes on one table to answer the 5 possible ways the where clause can look like.
Here it comes: -- DDL CREATE TABLE gm_t_movimenti_magazzini ( gm_movi_unic INT8 NOT NULL REFERENCES gm_t_movimenti, gm_moti_unic TEXT NOT NULL REFERENCES gm_t_movimenti_tipi, ap_prod_unic INT8 NOT NULL REFERENCES ap_t_prodotti, gm_maga_unic TEXT NOT NULL REFERENCES gm_t_magazzini, gm_moma_qnta NUMERIC NOT NULL, gm_moma_flag BOOL NOT NULL, gm_moma_vali TIMESTAMPTZ NOT NULL ); CREATE INDEX i_gm_t_movimenti_magazzini_gm_movi_unic ON gm_t_movimenti_magazzini( gm_movi_unic ); CREATE INDEX i_gm_t_movimenti_magazzini_gm_moti_unic ON gm_t_movimenti_magazzini( gm_moti_unic ); CREATE INDEX i_gm_t_movimenti_magazzini_ap_prod_unic ON gm_t_movimenti_magazzini( ap_prod_unic ); CREATE INDEX i_gm_t_movimenti_magazzini_gm_maga_unic ON gm_t_movimenti_magazzini( gm_maga_unic ); CREATE INDEX i_gm_t_movimenti_magazzini_gm_moma_flag ON gm_t_movimenti_magazzini( gm_moma_flag ); CREATE INDEX i_gm_t_movimenti_magazzini_gm_moma_vali ON gm_t_movimenti_magazzini( gm_moma_vali ); CREATE INDEX i_gm_t_movimenti_magazzini_partial ON gm_t_movimenti_magazzini( (gm_moma_flag AND gm_moma_vali>='2008-01-01'::TIMESTAMPTZ) ) WHERE gm_moma_flag AND gm_moma_vali>='2008-01-01'::TIMESTAMPTZ; -- DML SELECT SUM( gm_moma_qnta ) FROM gm_t_movimenti_magazzini WHERE gm_moma_flag AND gm_moma_vali >= '2008-01-01'::TIMESTAMPTZ AND ap_prod_unic = <a value> ; where <a value> changes from query to query. -- Fahrbahn ist ein graues Band weisse Streifen, grüner Rand On Tuesday December 30 2008 15:12:33 Scott Marlowe wrote: > On Tue, Dec 30, 2008 at 2:02 AM, Reg Me Please <regmeplease@gmail.com> wrote: > > Only one question remains in my mind: > > > > why the planner is not using the partial index? > > > > The partial index is covering 2 predicates out of the 3 used in the where > > condition. Actually there is a boolean flag (to exclude "disabled" rows), > > a timestamp (for row age) and an int8 (a FK to another table). > > The first two are in the partial index in order to exclude "disabled" and > > older rows. The int8 is the "random" key I mentioned earlier. > > > > So the WHERE condition reads like: > > > > flag AND tstz >= '2008-01-01'::timestamptz and thekey=42 > > > > I can see in the EXPLAIN that there is no mention to the partial index. > > Please keep in mind that the table has 8+M rows, few of which are > > flagged, about 70% don't match the age limit and few dozens match the > > key. In my opinion the partial index should help a lot. > > Can you show us the DDL for the index creation and the select query as > well?
Hello, Why do you index a boolean of your condition ? Isn't it better to have the partial index like the following ? : CREATE INDEX i_gm_t_movimenti_magazzini_partial ON gm_t_movimenti_magazzini( ap_prod_unic ) WHERE gm_moma_flag AND gm_moma_vali>='2008-01-01'::TIMESTAMPTZ; Hope this helps, Vincent > -----Message d'origine----- > De : pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] De la part de Reg > Me Please > Envoyé : mardi 30 décembre 2008 17:09 > À : Scott Marlowe > Cc : Scott Ribe; Gauthier, Dave; pgsql-general@postgresql.org > Objet : Re: [GENERAL] [PGSQL 8.3.5] Use of a partial indexes > > Here it comes: > > -- DDL > > CREATE TABLE gm_t_movimenti_magazzini > ( > gm_movi_unic INT8 NOT NULL REFERENCES gm_t_movimenti, > gm_moti_unic TEXT NOT NULL REFERENCES gm_t_movimenti_tipi, > ap_prod_unic INT8 NOT NULL REFERENCES ap_t_prodotti, > gm_maga_unic TEXT NOT NULL REFERENCES gm_t_magazzini, > gm_moma_qnta NUMERIC NOT NULL, > gm_moma_flag BOOL NOT NULL, > gm_moma_vali TIMESTAMPTZ NOT NULL > ); > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_movi_unic > ON gm_t_movimenti_magazzini( gm_movi_unic ); > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_moti_unic > ON gm_t_movimenti_magazzini( gm_moti_unic ); > > CREATE INDEX i_gm_t_movimenti_magazzini_ap_prod_unic > ON gm_t_movimenti_magazzini( ap_prod_unic ); > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_maga_unic > ON gm_t_movimenti_magazzini( gm_maga_unic ); > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_moma_flag > ON gm_t_movimenti_magazzini( gm_moma_flag ); > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_moma_vali > ON gm_t_movimenti_magazzini( gm_moma_vali ); > > CREATE INDEX i_gm_t_movimenti_magazzini_partial > ON gm_t_movimenti_magazzini( (gm_moma_flag AND > gm_moma_vali>='2008-01-01'::TIMESTAMPTZ) ) > WHERE gm_moma_flag AND gm_moma_vali>='2008-01-01'::TIMESTAMPTZ; > > -- DML > > SELECT SUM( gm_moma_qnta ) > FROM gm_t_movimenti_magazzini > WHERE > gm_moma_flag AND > gm_moma_vali >= '2008-01-01'::TIMESTAMPTZ AND > ap_prod_unic = <a value> > ; > > where <a value> changes from query to query. > > -- > Fahrbahn ist ein graues Band > weisse Streifen, grüner Rand > > On Tuesday December 30 2008 15:12:33 Scott Marlowe wrote: > > On Tue, Dec 30, 2008 at 2:02 AM, Reg Me Please > <regmeplease@gmail.com> > wrote: > > > Only one question remains in my mind: > > > > > > why the planner is not using the partial index? > > > > > > The partial index is covering 2 predicates out of the 3 > used in the > > > where condition. Actually there is a boolean flag (to exclude > > > "disabled" rows), a timestamp (for row age) and an int8 > (a FK to another table). > > > The first two are in the partial index in order to exclude > > > "disabled" and older rows. The int8 is the "random" key I > mentioned earlier. > > > > > > So the WHERE condition reads like: > > > > > > flag AND tstz >= '2008-01-01'::timestamptz and thekey=42 > > > > > > I can see in the EXPLAIN that there is no mention to the > partial index. > > > Please keep in mind that the table has 8+M rows, few of which are > > > flagged, about 70% don't match the age limit and few dozens match > > > the key. In my opinion the partial index should help a lot. > > > > Can you show us the DDL for the index creation and the > select query as > > well? > > > > -- > Sent via pgsql-general mailing list > (pgsql-general@postgresql.org) To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Well it should look like the one you suggest. But maybe I've missed some important concept in the partial indexes theory! :-) As soon as I read your posting I understood the problem. I was thinking to create a big cut on the index containing the flag and the timestamp, while the concept is to cut the other index using the flag+timestamp part of the condition. Thanks a lot for the "satori". -- Fahrbahn ist ein graues Band weisse Streifen, grüner Rand On Tuesday December 30 2008 17:20:05 Picavet Vincent wrote: > Hello, > Why do you index a boolean of your condition ? > Isn't it better to have the partial index like the following ? > > CREATE INDEX i_gm_t_movimenti_magazzini_partial > ON gm_t_movimenti_magazzini( ap_prod_unic ) > WHERE gm_moma_flag AND gm_moma_vali>='2008-01-01'::TIMESTAMPTZ; > > Hope this helps, > Vincent > > > -----Message d'origine----- > > De : pgsql-general-owner@postgresql.org > > [mailto:pgsql-general-owner@postgresql.org] De la part de Reg > > Me Please > > Envoyé : mardi 30 décembre 2008 17:09 > > À : Scott Marlowe > > Cc : Scott Ribe; Gauthier, Dave; pgsql-general@postgresql.org > > Objet : Re: [GENERAL] [PGSQL 8.3.5] Use of a partial indexes > > > > Here it comes: > > > > -- DDL > > > > CREATE TABLE gm_t_movimenti_magazzini > > ( > > gm_movi_unic INT8 NOT NULL REFERENCES gm_t_movimenti, > > gm_moti_unic TEXT NOT NULL REFERENCES gm_t_movimenti_tipi, > > ap_prod_unic INT8 NOT NULL REFERENCES ap_t_prodotti, > > gm_maga_unic TEXT NOT NULL REFERENCES gm_t_magazzini, > > gm_moma_qnta NUMERIC NOT NULL, > > gm_moma_flag BOOL NOT NULL, > > gm_moma_vali TIMESTAMPTZ NOT NULL > > ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_movi_unic > > ON gm_t_movimenti_magazzini( gm_movi_unic ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_moti_unic > > ON gm_t_movimenti_magazzini( gm_moti_unic ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_ap_prod_unic > > ON gm_t_movimenti_magazzini( ap_prod_unic ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_maga_unic > > ON gm_t_movimenti_magazzini( gm_maga_unic ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_moma_flag > > ON gm_t_movimenti_magazzini( gm_moma_flag ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_gm_moma_vali > > ON gm_t_movimenti_magazzini( gm_moma_vali ); > > > > CREATE INDEX i_gm_t_movimenti_magazzini_partial > > ON gm_t_movimenti_magazzini( (gm_moma_flag AND > > gm_moma_vali>='2008-01-01'::TIMESTAMPTZ) ) > > WHERE gm_moma_flag AND gm_moma_vali>='2008-01-01'::TIMESTAMPTZ; > > > > -- DML > > > > SELECT SUM( gm_moma_qnta ) > > FROM gm_t_movimenti_magazzini > > WHERE > > gm_moma_flag AND > > gm_moma_vali >= '2008-01-01'::TIMESTAMPTZ AND > > ap_prod_unic = <a value> > > ; > > > > where <a value> changes from query to query. > > > > -- > > Fahrbahn ist ein graues Band > > weisse Streifen, grüner Rand > > > > On Tuesday December 30 2008 15:12:33 Scott Marlowe wrote: > > > On Tue, Dec 30, 2008 at 2:02 AM, Reg Me Please > > > > <regmeplease@gmail.com> > > > > wrote: > > > > Only one question remains in my mind: > > > > > > > > why the planner is not using the partial index? > > > > > > > > The partial index is covering 2 predicates out of the 3 > > > > used in the > > > > > > where condition. Actually there is a boolean flag (to exclude > > > > "disabled" rows), a timestamp (for row age) and an int8 > > > > (a FK to another table). > > > > > > The first two are in the partial index in order to exclude > > > > "disabled" and older rows. The int8 is the "random" key I > > > > mentioned earlier. > > > > > > So the WHERE condition reads like: > > > > > > > > flag AND tstz >= '2008-01-01'::timestamptz and thekey=42 > > > > > > > > I can see in the EXPLAIN that there is no mention to the > > > > partial index. > > > > > > Please keep in mind that the table has 8+M rows, few of which are > > > > flagged, about 70% don't match the age limit and few dozens match > > > > the key. In my opinion the partial index should help a lot. > > > > > > Can you show us the DDL for the index creation and the > > > > select query as > > > > > well? > > > > -- > > Sent via pgsql-general mailing list > > (pgsql-general@postgresql.org) To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-general