Обсуждение: Enhanced containment selectivity function

Поиск
Список
Период
Сортировка

Enhanced containment selectivity function

От
Matteo Beccati
Дата:
Hi,

I've recently had problems with slow queries caused by the selectivity
of the <@ ltree operator, as you may see in my post here:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php


Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
restriction selectivity function for <@ is contsel, which returns a
constant value of 0.001. So I started digging in the source code trying
to understand how the default behaviour could be enhanced, and ended up
writing a little patch which adds an alternative containment selectivity
function (called "contstatsel") which is able to deliver better results.

This first version is based on the eqsel function and uses only
histogram values to calculate the selectivity and uses the 0.001
constant as a fallback.

This also made me think: is there a reason why geometric selectivity
functions return constant values rather than checking statistics for a
better result?

Attached you will find a patch suitable for current CVS HEAD. My C
skills are a bit rusty and my knowledge of pg internals are very poor,
so I'm sure it could be improved and modified to better fit the pg
coding standards.


Here are the results on a slow query:

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <@ '1041' AND t_stamp > '2005-07-01';

QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..553.02 rows=8 width=364) (actual
time=2.423..19787.259 rows=6785 loops=1)
    ->  Index Scan using gw_users_gisttree_key on gw_users
(cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434
rows=4696 loops=1)
          Index Cond: (tree <@ '1041'::ltree)
    ->  Index Scan using gw_batches_t_stamp_u_id_key on gw_batches
(cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1
loops=4696)
          Index Cond: ((gw_batches.t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone) AND ("outer".u_id =
gw_batches.u_id))
  Total runtime: 19805.447 ms
(6 rows)

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <<@ '1041' AND t_stamp > '2005-07-01';

QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=245.26..1151.80 rows=7671 width=364) (actual
time=69.562..176.966 rows=6785 loops=1)
    Hash Cond: ("outer".u_id = "inner".u_id)
    ->  Bitmap Heap Scan on gw_batches  (cost=57.74..764.39 rows=8212
width=212) (actual time=8.330..39.542 rows=7819 loops=1)
          Recheck Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp
with time zone)
          ->  Bitmap Index Scan on gw_batches_t_stamp_u_id_key
(cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819
loops=1)
                Index Cond: (t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone)
    ->  Hash  (cost=175.79..175.79 rows=4692 width=156) (actual
time=61.046..61.046 rows=4696 loops=1)
          ->  Seq Scan on gw_users  (cost=0.00..175.79 rows=4692
width=156) (actual time=0.083..34.200 rows=4696 loops=1)
                Filter: (tree <<@ '1041'::ltree)
  Total runtime: 194.621 ms
(10 rows)

The second query uses a custom <<@ operator I added to test the
alternative selectivity function:

CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS
double precision AS 'contstatsel' LANGUAGE internal;
CREATE OPERATOR <<@ (
          LEFTARG = ltree,
          LEFTARG = ltree,
          PROCEDURE = ltree_risparent,
          COMMUTATOR = '@>',
          RESTRICT = contstatsel,
          JOIN = contjoinsel
);


Of course any comments/feedback are welcome.


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.187
diff -r1.187 selfuncs.c
1309a1310,1433
>  *            contstatsel             - Selectivity of containment for any data types.
>  */
> Datum
> contstatsel(PG_FUNCTION_ARGS)
> {
>       PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
>       Oid                     operator = PG_GETARG_OID(1);
>       List       *args = (List *) PG_GETARG_POINTER(2);
>       int                     varRelid = PG_GETARG_INT32(3);
>       VariableStatData vardata;
>       Node       *other;
>       bool            varonleft;
>       Datum      *values;
>       int                     nvalues;
>       double          selec = 0.0;
>
>       /*
>        * If expression is not variable = something or something = variable,
>        * then punt and return a default estimate.
>        */
>       if (!get_restriction_variable(root, args, varRelid,
>                                                                 &vardata, &other, &varonleft))
>               PG_RETURN_FLOAT8(0.001);
>
>       /*
>        * If the something is a NULL constant, assume operator is strict and
>        * return zero, ie, operator will never return TRUE.
>        */
>       if (IsA(other, Const) &&
>               ((Const *) other)->constisnull)
>       {
>               ReleaseVariableStats(vardata);
>               PG_RETURN_FLOAT8(0.0);
>       }
>
>       if (HeapTupleIsValid(vardata.statsTuple))
>       {
>               Form_pg_statistic stats;
>
>               stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
>
>               if (IsA(other, Const))
>               {
>                       /* Variable is being compared to a known non-null constant */
>                       Datum           constval = ((Const *) other)->constvalue;
>                       bool            match = false;
>                       int                     i;
>
> elog(INFO, "Checking histogram");
>
>                       /*
>                        * Is the constant "=" to any of the column's most common
>                        * values?      (Although the given operator may not really be
>                        * "=", we will assume that seeing whether it returns TRUE is
>                        * an appropriate test.  If you don't like this, maybe you
>                        * shouldn't be using eqsel for your operator...)
>                        */
>                       if (get_attstatsslot(vardata.statsTuple,
>                                                                vardata.atttype, vardata.atttypmod,
>                                                                STATISTIC_KIND_HISTOGRAM, InvalidOid,
>                                                                &values, &nvalues,
>                                                                NULL, NULL))
>                       {
>                               FmgrInfo        contproc;
>
>                               fmgr_info(get_opcode(operator), &contproc);
>
> elog(INFO, "Found %d values", nvalues);
>
>                               for (i = 0; i < nvalues; i++)
>                               {
>                                       /* be careful to apply operator right way 'round */
>                                       if (varonleft)
>                                               match = DatumGetBool(FunctionCall2(&contproc,
>
values[i],
>
constval));
>                                       else
>                                               match = DatumGetBool(FunctionCall2(&contproc,
>
constval,
>
values[i]));
>                                       if (match)
>                                               selec++;
>                               }
>
>                               if (selec > 0.0 && nvalues > 0)
>                               {
>                                       selec /= nvalues;
>                               }
>
> elog(INFO, "Computed selectivity: %04.3f", selec);
>
>                       }
>                       else
>                       {
> elog(INFO, "No histogram info");
>
>                               /* no most-common-value info available */
>                               values = NULL;
>                               i = nvalues = 0;
>                       }
>
>                       if (!selec)
>                               selec = 0.001;
>
>                       free_attstatsslot(vardata.atttype, values, nvalues,
>                                                         NULL, 0);
>               }
>               else
>                       selec = 0.001;
>       }
>       else
>               selec = 0.001;
>
>       ReleaseVariableStats(vardata);
>
>       /* result should be in range, but make sure... */
>       CLAMP_PROBABILITY(selec);
>
> elog(INFO, "Returned selectivity: %04.3f", selec);
>
>       PG_RETURN_FLOAT8((float8) selec);
> }
>
> /*
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.380
diff -r1.380 pg_proc.h
3752a3753,3755
> DATA(insert OID = 2600 (  contstatsel            PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_
_null_       contstatsel - _null_ )); 
> DESCR("enhanced restriction selectivity for containment comparison operators");
>
Index: src/include/utils/selfuncs.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v
retrieving revision 1.23
diff -r1.23 selfuncs.h
97a98,99
> extern Datum contstatsel(PG_FUNCTION_ARGS);
>

Re: Enhanced containment selectivity function

От
Tom Lane
Дата:
Matteo Beccati <php@beccati.com> writes:
> This also made me think: is there a reason why geometric selectivity 
> functions return constant values rather than checking statistics for a 
> better result?

Because no one's ever bothered to work on them.  You should talk to
the PostGIS guys, however, because they *have* been working on real
statistics and real estimation functions for geometric types.  It'd
be nice to get some of that work back-ported into the core database.

http://www.postgis.org/
        regards, tom lane


Re: Enhanced containment selectivity function

От
Tom Lane
Дата:
Matteo Beccati <php@beccati.com> writes:
> Someone on IRC (AndrewSN if I'm not wrong) pointed out that the 
> restriction selectivity function for <@ is contsel, which returns a 
> constant value of 0.001. So I started digging in the source code trying 
> to understand how the default behaviour could be enhanced, and ended up 
> writing a little patch which adds an alternative containment selectivity 
> function (called "contstatsel") which is able to deliver better results.

After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment.  In most of the core
geometric types, the "histogram" ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.

The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree < operator.

So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core.  It might be best to call it something like
"parentsel", too, to avoid giving the impression that it has something
to do with 2-D containment.

Also, you should think about using the most-common-values list as well
as the histogram.  I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population.  Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.
        regards, tom lane


Re: Enhanced containment selectivity function

От
Matteo Beccati
Дата:
Tom Lane wrote:
> After looking at this a little, it doesn't seem like it has much to do
> with the ordinary 2-D notion of containment.  In most of the core
> geometric types, the "histogram" ordering is based on area, and so
> testing the histogram samples against the query doesn't seem like it's
> able to give very meaningful containment results --- the items shown
> in the histogram could have any locations whatever.
> 
> The approach might be sensible for ltree's isparent operator --- I don't
> have a very good feeling for the behavior of that operator, but it looks
> like it has at least some relationship to the ordering induced by the
> ltree < operator.

Actually, this was one of my doubts. The custom function seem to work 
well with ltree, but this also could be dependant from the way my 
dataset is organized.


> So my thought is that (assuming Oleg and Teodor agree this is sensible
> for ltree) we should put the selectivity function into contrib/ltree,
> not directly into the core.  It might be best to call it something like
> "parentsel", too, to avoid giving the impression that it has something
> to do with 2-D containment.
> 
> Also, you should think about using the most-common-values list as well
> as the histogram.  I would guess that many ltree applications would have
> enough duplicate entries that the MCV list represents a significant
> fraction of the total population.  Keep in mind when thinking about this
> that the histogram describes the population of data *exclusive of the
> MCV entries*.

I also agree that "parentsel" would better fit its purpose.

My patch was originally using MCV without good results, until I realized 
that MCV was empty because the column contains unique values :)
I'll look into adding a MCV check to it.

Moving it in contrib/ltree would be more difficult to me because it 
depends on other functions declared in selfuncs.c 
(get_restriction_variable, etc).

Thank you for your feedback


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/


Re: Enhanced containment selectivity function

От
Tom Lane
Дата:
Matteo Beccati <php@beccati.com> writes:
> Moving it in contrib/ltree would be more difficult to me because it 
> depends on other functions declared in selfuncs.c 
> (get_restriction_variable, etc).

I'd be willing to consider exporting those functions from selfuncs.c.
        regards, tom lane


Re: Enhanced containment selectivity function

От
Matteo Beccati
Дата:
Hi,

>>Moving it in contrib/ltree would be more difficult to me because it
>>depends on other functions declared in selfuncs.c
>>(get_restriction_variable, etc).
>
> I'd be willing to consider exporting those functions from selfuncs.c.

In the meanwhile here is the latest patch which uses both mcv and
histogram values.


BTW, when restoring my test database I've found out that there were many
errors on ALTER INDEX "something" OWNER TO ... :

ERROR:  "something" is not a table, view, or sequence

This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be
wrong, but I didn't get those errors a few days ago (some cvs updates ago).


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Index: contrib/ltree/ltree.sql.in
===================================================================
RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v
retrieving revision 1.9
diff -c -r1.9 ltree.sql.in
*** contrib/ltree/ltree.sql.in  30 Mar 2004 15:45:32 -0000      1.9
--- contrib/ltree/ltree.sql.in  6 Aug 2005 13:10:35 -0000
***************
*** 230,236 ****
        RIGHTARG = ltree,
        PROCEDURE = ltree_isparent,
          COMMUTATOR = '<@',
!         RESTRICT = contsel,
        JOIN = contjoinsel
  );

--- 230,236 ----
        RIGHTARG = ltree,
        PROCEDURE = ltree_isparent,
          COMMUTATOR = '<@',
!         RESTRICT = parentsel,
        JOIN = contjoinsel
  );

***************
*** 248,254 ****
        RIGHTARG = ltree,
        PROCEDURE = ltree_risparent,
          COMMUTATOR = '@>',
!         RESTRICT = contsel,
        JOIN = contjoinsel
  );

--- 248,254 ----
        RIGHTARG = ltree,
        PROCEDURE = ltree_risparent,
          COMMUTATOR = '@>',
!         RESTRICT = parentsel,
        JOIN = contjoinsel
  );

Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.187
diff -c -r1.187 selfuncs.c
*** src/backend/utils/adt/selfuncs.c    21 Jul 2005 04:41:43 -0000      1.187
--- src/backend/utils/adt/selfuncs.c    6 Aug 2005 13:10:46 -0000
***************
*** 1306,1311 ****
--- 1306,1488 ----
        return (Selectivity) selec;
  }

+ #define DEFAULT_PARENT_SEL 0.001
+
+ /*
+  *            parentsel               - Selectivity of parent relationship for ltree data types.
+  */
+ Datum
+ parentsel(PG_FUNCTION_ARGS)
+ {
+       PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+       Oid                     operator = PG_GETARG_OID(1);
+       List       *args = (List *) PG_GETARG_POINTER(2);
+       int                     varRelid = PG_GETARG_INT32(3);
+       VariableStatData vardata;
+       Node       *other;
+       bool            varonleft;
+       Datum      *values;
+       int                     nvalues;
+       float4     *numbers;
+       int                     nnumbers;
+       double          selec = 0.0;
+
+       /*
+        * If expression is not variable <@ something or something <@ variable,
+        * then punt and return a default estimate.
+        */
+       if (!get_restriction_variable(root, args, varRelid,
+                                                                 &vardata, &other, &varonleft))
+               PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
+
+       /*
+        * If the something is a NULL constant, assume operator is strict and
+        * return zero, ie, operator will never return TRUE.
+        */
+       if (IsA(other, Const) &&
+               ((Const *) other)->constisnull)
+       {
+               ReleaseVariableStats(vardata);
+               PG_RETURN_FLOAT8(0.0);
+       }
+
+       if (HeapTupleIsValid(vardata.statsTuple))
+       {
+               Form_pg_statistic stats;
+               double          mcvsum = 0.0;
+               double          mcvsel = 0.0;
+               double          hissel = 0.0;
+
+               stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
+
+               if (IsA(other, Const))
+               {
+                       /* Variable is being compared to a known non-null constant */
+                       Datum           constval = ((Const *) other)->constvalue;
+                       bool            match = false;
+                       int                     i;
+
+                       /*
+                        * Is the constant "<@" to any of the column's most common
+                        * values?
+                        */
+                       if (get_attstatsslot(vardata.statsTuple,
+                                                                vardata.atttype, vardata.atttypmod,
+                                                                STATISTIC_KIND_MCV, InvalidOid,
+                                                                &values, &nvalues,
+                                                                &numbers, &nnumbers))
+                       {
+                               FmgrInfo        contproc;
+
+                               fmgr_info(get_opcode(operator), &contproc);
+
+                               for (i = 0; i < nvalues; i++)
+                               {
+                                       /* be careful to apply operator right way 'round */
+                                       if (varonleft)
+                                               match = DatumGetBool(FunctionCall2(&contproc,
+
values[i],
+
constval));
+                                       else
+                                               match = DatumGetBool(FunctionCall2(&contproc,
+
constval,
+
values[i]));
+
+                                       /* calculate total selectivity of all most-common-values */
+                                       mcvsum += numbers[i];
+
+                                       /* calculate selectivity of matching most-common-values */
+                                       if (match)
+                                               mcvsel += numbers[i];
+                               }
+                       }
+                       else
+                       {
+                               /* no most-common-values info available */
+                               values = NULL;
+                               numbers = NULL;
+                               i = nvalues = nnumbers = 0;
+                       }
+
+                       free_attstatsslot(vardata.atttype, values, nvalues,
+                                                         NULL, 0);
+
+
+                       /*
+                        * Is the constant "<@" to any of the column's histogram
+                        * values?
+                        */
+                       if (get_attstatsslot(vardata.statsTuple,
+                                                                vardata.atttype, vardata.atttypmod,
+                                                                STATISTIC_KIND_HISTOGRAM, InvalidOid,
+                                                                &values, &nvalues,
+                                                                NULL, NULL))
+                       {
+                               FmgrInfo        contproc;
+
+                               fmgr_info(get_opcode(operator), &contproc);
+
+                               for (i = 0; i < nvalues; i++)
+                               {
+                                       /* be careful to apply operator right way 'round */
+                                       if (varonleft)
+                                               match = DatumGetBool(FunctionCall2(&contproc,
+
values[i],
+
constval));
+                                       else
+                                               match = DatumGetBool(FunctionCall2(&contproc,
+
constval,
+
values[i]));
+                                       /* count matching histogram values */
+                                       if (match)
+                                               hissel++;
+                               }
+
+                               if (hissel > 0.0)
+                               {
+                                       /*
+                                        * some matching values found inside histogram, divide matching entries number
+                                        * by total histogram entries to get the histogram related selectivity
+                                        */
+                                       hissel /= nvalues;
+                               }
+                       }
+                       else
+                       {
+                               /* no histogram info available */
+                               values = NULL;
+                               i = nvalues = 0;
+                       }
+
+                       free_attstatsslot(vardata.atttype, values, nvalues,
+                                                         NULL, 0);
+
+
+                       /*
+                        * calculate selectivity based on MCV and histogram result
+                        * histogram selectivity needs to be scaled down if there are any most-common-values
+                        */
+                       selec = mcvsel + hissel * (1.0 - mcvsum);
+
+                       /* don't return 0.0 selectivity unless all table values are inside mcv */
+                       if (selec == 0.0 && mcvsum != 1.0)
+                               selec = DEFAULT_PARENT_SEL;
+               }
+               else
+                       selec = DEFAULT_PARENT_SEL;
+       }
+       else
+               selec = DEFAULT_PARENT_SEL;
+
+
+       ReleaseVariableStats(vardata);
+
+       /* result should be in range, but make sure... */
+       CLAMP_PROBABILITY(selec);
+
+       PG_RETURN_FLOAT8((float8) selec);
+ }
+
  /*
   *            eqjoinsel               - Join selectivity of "="
   */
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.380
diff -c -r1.380 pg_proc.h
*** src/include/catalog/pg_proc.h       2 Aug 2005 16:11:57 -0000       1.380
--- src/include/catalog/pg_proc.h       6 Aug 2005 13:10:59 -0000
***************
*** 3750,3755 ****
--- 3750,3758 ----
  DATA(insert OID = 2592 (  gist_circle_compress        PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_
gist_circle_compress- _null_ )); 
  DESCR("GiST support");

+ DATA(insert OID = 2600 (  parentsel              PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_
_null_       parentsel - _null_ )); 
+ DESCR("enhanced restriction selectivity for ltree isparent comparison operators");
+

  /*
   * Symbolic values for provolatile column: these indicate whether the result
Index: src/include/utils/selfuncs.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v
retrieving revision 1.23
diff -c -r1.23 selfuncs.h
*** src/include/utils/selfuncs.h        5 Jun 2005 22:32:58 -0000       1.23
--- src/include/utils/selfuncs.h        6 Aug 2005 13:11:00 -0000
***************
*** 95,100 ****
--- 95,102 ----
  extern Datum nlikesel(PG_FUNCTION_ARGS);
  extern Datum icnlikesel(PG_FUNCTION_ARGS);

+ extern Datum parentsel(PG_FUNCTION_ARGS);
+
  extern Datum eqjoinsel(PG_FUNCTION_ARGS);
  extern Datum neqjoinsel(PG_FUNCTION_ARGS);
  extern Datum scalarltjoinsel(PG_FUNCTION_ARGS);


Re: Enhanced containment selectivity function

От
Bruce Momjian
Дата:
This has been saved for the 8.2 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Matteo Beccati wrote:
> Hi,
> 
> >>Moving it in contrib/ltree would be more difficult to me because it 
> >>depends on other functions declared in selfuncs.c 
> >>(get_restriction_variable, etc).
> > 
> > I'd be willing to consider exporting those functions from selfuncs.c.
> 
> In the meanwhile here is the latest patch which uses both mcv and 
> histogram values.
> 
> 
> BTW, when restoring my test database I've found out that there were many 
> errors on ALTER INDEX "something" OWNER TO ... :
> 
> ERROR:  "something" is not a table, view, or sequence
> 
> This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be 
> wrong, but I didn't get those errors a few days ago (some cvs updates ago).
> 
> 
> Best regards
> --
> Matteo Beccati
> http://phpadsnew.com/
> http://phppgads.com/

> Index: contrib/ltree/ltree.sql.in
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v
> retrieving revision 1.9
> diff -c -r1.9 ltree.sql.in
> *** contrib/ltree/ltree.sql.in  30 Mar 2004 15:45:32 -0000      1.9
> --- contrib/ltree/ltree.sql.in  6 Aug 2005 13:10:35 -0000
> ***************
> *** 230,236 ****
>         RIGHTARG = ltree,
>         PROCEDURE = ltree_isparent,
>           COMMUTATOR = '<@',
> !         RESTRICT = contsel,
>         JOIN = contjoinsel
>   );
>   
> --- 230,236 ----
>         RIGHTARG = ltree,
>         PROCEDURE = ltree_isparent,
>           COMMUTATOR = '<@',
> !         RESTRICT = parentsel,
>         JOIN = contjoinsel
>   );
>   
> ***************
> *** 248,254 ****
>         RIGHTARG = ltree,
>         PROCEDURE = ltree_risparent,
>           COMMUTATOR = '@>',
> !         RESTRICT = contsel,
>         JOIN = contjoinsel
>   );
>   
> --- 248,254 ----
>         RIGHTARG = ltree,
>         PROCEDURE = ltree_risparent,
>           COMMUTATOR = '@>',
> !         RESTRICT = parentsel,
>         JOIN = contjoinsel
>   );
>   
> Index: src/backend/utils/adt/selfuncs.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
> retrieving revision 1.187
> diff -c -r1.187 selfuncs.c
> *** src/backend/utils/adt/selfuncs.c    21 Jul 2005 04:41:43 -0000      1.187
> --- src/backend/utils/adt/selfuncs.c    6 Aug 2005 13:10:46 -0000
> ***************
> *** 1306,1311 ****
> --- 1306,1488 ----
>         return (Selectivity) selec;
>   }
>   
> + #define DEFAULT_PARENT_SEL 0.001
> + 
> + /*
> +  *            parentsel               - Selectivity of parent relationship for ltree data types.
> +  */
> + Datum
> + parentsel(PG_FUNCTION_ARGS)
> + {
> +       PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
> +       Oid                     operator = PG_GETARG_OID(1);
> +       List       *args = (List *) PG_GETARG_POINTER(2);
> +       int                     varRelid = PG_GETARG_INT32(3);
> +       VariableStatData vardata;
> +       Node       *other;
> +       bool            varonleft;
> +       Datum      *values;
> +       int                     nvalues;
> +       float4     *numbers;
> +       int                     nnumbers;
> +       double          selec = 0.0;
> + 
> +       /*
> +        * If expression is not variable <@ something or something <@ variable,
> +        * then punt and return a default estimate.
> +        */
> +       if (!get_restriction_variable(root, args, varRelid,
> +                                                                 &vardata, &other, &varonleft))
> +               PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
> + 
> +       /*
> +        * If the something is a NULL constant, assume operator is strict and
> +        * return zero, ie, operator will never return TRUE.
> +        */
> +       if (IsA(other, Const) &&
> +               ((Const *) other)->constisnull)
> +       {
> +               ReleaseVariableStats(vardata);
> +               PG_RETURN_FLOAT8(0.0);
> +       }
> + 
> +       if (HeapTupleIsValid(vardata.statsTuple))
> +       {
> +               Form_pg_statistic stats;
> +               double          mcvsum = 0.0;
> +               double          mcvsel = 0.0;
> +               double          hissel = 0.0;
> + 
> +               stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
> + 
> +               if (IsA(other, Const))
> +               {
> +                       /* Variable is being compared to a known non-null constant */
> +                       Datum           constval = ((Const *) other)->constvalue;
> +                       bool            match = false;
> +                       int                     i;
> + 
> +                       /*
> +                        * Is the constant "<@" to any of the column's most common
> +                        * values?
> +                        */
> +                       if (get_attstatsslot(vardata.statsTuple,
> +                                                                vardata.atttype, vardata.atttypmod,
> +                                                                STATISTIC_KIND_MCV, InvalidOid,
> +                                                                &values, &nvalues,
> +                                                                &numbers, &nnumbers))
> +                       {
> +                               FmgrInfo        contproc;
> + 
> +                               fmgr_info(get_opcode(operator), &contproc);
> + 
> +                               for (i = 0; i < nvalues; i++)
> +                               {
> +                                       /* be careful to apply operator right way 'round */
> +                                       if (varonleft)
> +                                               match = DatumGetBool(FunctionCall2(&contproc,
> +
values[i],
> +
constval));
> +                                       else
> +                                               match = DatumGetBool(FunctionCall2(&contproc,
> +
constval,
> +
values[i]));
> + 
> +                                       /* calculate total selectivity of all most-common-values */
> +                                       mcvsum += numbers[i];
> + 
> +                                       /* calculate selectivity of matching most-common-values */
> +                                       if (match)
> +                                               mcvsel += numbers[i];
> +                               }
> +                       }
> +                       else
> +                       {
> +                               /* no most-common-values info available */
> +                               values = NULL;
> +                               numbers = NULL;
> +                               i = nvalues = nnumbers = 0;
> +                       }
> + 
> +                       free_attstatsslot(vardata.atttype, values, nvalues,
> +                                                         NULL, 0);
> + 
> + 
> +                       /*
> +                        * Is the constant "<@" to any of the column's histogram
> +                        * values?
> +                        */
> +                       if (get_attstatsslot(vardata.statsTuple,
> +                                                                vardata.atttype, vardata.atttypmod,
> +                                                                STATISTIC_KIND_HISTOGRAM, InvalidOid,
> +                                                                &values, &nvalues,
> +                                                                NULL, NULL))
> +                       {
> +                               FmgrInfo        contproc;
> + 
> +                               fmgr_info(get_opcode(operator), &contproc);
> + 
> +                               for (i = 0; i < nvalues; i++)
> +                               {
> +                                       /* be careful to apply operator right way 'round */
> +                                       if (varonleft)
> +                                               match = DatumGetBool(FunctionCall2(&contproc,
> +
values[i],
> +
constval));
> +                                       else
> +                                               match = DatumGetBool(FunctionCall2(&contproc,
> +
constval,
> +
values[i]));
> +                                       /* count matching histogram values */
> +                                       if (match)
> +                                               hissel++;
> +                               }
> + 
> +                               if (hissel > 0.0)
> +                               {
> +                                       /*
> +                                        * some matching values found inside histogram, divide matching entries
number
 
> +                                        * by total histogram entries to get the histogram related selectivity
> +                                        */
> +                                       hissel /= nvalues;
> +                               }
> +                       }
> +                       else
> +                       {
> +                               /* no histogram info available */
> +                               values = NULL;
> +                               i = nvalues = 0;
> +                       }
> + 
> +                       free_attstatsslot(vardata.atttype, values, nvalues,
> +                                                         NULL, 0);
> + 
> + 
> +                       /*
> +                        * calculate selectivity based on MCV and histogram result
> +                        * histogram selectivity needs to be scaled down if there are any most-common-values
> +                        */
> +                       selec = mcvsel + hissel * (1.0 - mcvsum);
> + 
> +                       /* don't return 0.0 selectivity unless all table values are inside mcv */
> +                       if (selec == 0.0 && mcvsum != 1.0)
> +                               selec = DEFAULT_PARENT_SEL;
> +               }
> +               else
> +                       selec = DEFAULT_PARENT_SEL;
> +       }
> +       else
> +               selec = DEFAULT_PARENT_SEL;
> + 
> + 
> +       ReleaseVariableStats(vardata);
> + 
> +       /* result should be in range, but make sure... */
> +       CLAMP_PROBABILITY(selec);
> + 
> +       PG_RETURN_FLOAT8((float8) selec);
> + }
> + 
>   /*
>    *            eqjoinsel               - Join selectivity of "="
>    */
> Index: src/include/catalog/pg_proc.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
> retrieving revision 1.380
> diff -c -r1.380 pg_proc.h
> *** src/include/catalog/pg_proc.h       2 Aug 2005 16:11:57 -0000       1.380
> --- src/include/catalog/pg_proc.h       6 Aug 2005 13:10:59 -0000
> ***************
> *** 3750,3755 ****
> --- 3750,3758 ----
>   DATA(insert OID = 2592 (  gist_circle_compress        PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_
 gist_circle_compress - _null_ ));
 
>   DESCR("GiST support");
>   
> + DATA(insert OID = 2600 (  parentsel              PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_
_null_       parentsel - _null_ ));
 
> + DESCR("enhanced restriction selectivity for ltree isparent comparison operators");
> + 
>   
>   /*
>    * Symbolic values for provolatile column: these indicate whether the result
> Index: src/include/utils/selfuncs.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v
> retrieving revision 1.23
> diff -c -r1.23 selfuncs.h
> *** src/include/utils/selfuncs.h        5 Jun 2005 22:32:58 -0000       1.23
> --- src/include/utils/selfuncs.h        6 Aug 2005 13:11:00 -0000
> ***************
> *** 95,100 ****
> --- 95,102 ----
>   extern Datum nlikesel(PG_FUNCTION_ARGS);
>   extern Datum icnlikesel(PG_FUNCTION_ARGS);
>   
> + extern Datum parentsel(PG_FUNCTION_ARGS);
> + 
>   extern Datum eqjoinsel(PG_FUNCTION_ARGS);
>   extern Datum neqjoinsel(PG_FUNCTION_ARGS);
>   extern Datum scalarltjoinsel(PG_FUNCTION_ARGS);
> 

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Enhanced containment selectivity function

От
Matteo Beccati
Дата:
Bruce Momjian ha scritto:
> Your patch has been added to the PostgreSQL unapplied patches list at:
> 
>     http://momjian.postgresql.org/cgi-bin/pgpatches
> 
> It will be applied as soon as one of the PostgreSQL committers reviews
> and approves it.

Great. I would just like to remind that Tom said:

> I'd be willing to consider exporting those functions from selfuncs.c.

so that the selector function could be moved to contrib/ltree, which is 
its natural place.

It could also be noted that a similar feature could be useful outside 
ltree: I guess there are plenty of cases when scanning statistics would 
give a better result than using a constant selectivity.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com


Re: Enhanced containment selectivity function

От
Bruce Momjian
Дата:
Matteo Beccati wrote:
> Bruce Momjian ha scritto:
> > Your patch has been added to the PostgreSQL unapplied patches list at:
> > 
> >     http://momjian.postgresql.org/cgi-bin/pgpatches
> > 
> > It will be applied as soon as one of the PostgreSQL committers reviews
> > and approves it.
> 
> Great. I would just like to remind that Tom said:
> 
> > I'd be willing to consider exporting those functions from selfuncs.c.
> 
> so that the selector function could be moved to contrib/ltree, which is 
> its natural place.
> 
> It could also be noted that a similar feature could be useful outside 
> ltree: I guess there are plenty of cases when scanning statistics would 
> give a better result than using a constant selectivity.

Yes, I talked to Tom today about moving that stuff into /contrib.  I
will work on the patch to do that.

--  Bruce Momjian   http://candle.pha.pa.us SRA OSS, Inc.   http://www.sraoss.com
 + If your life is a hard drive, Christ can be your backup. +