Обсуждение: selectivity function

Поиск
Список
Период
Сортировка

selectivity function

От
Greg Hennessy
Дата:

I'm trying to include a sensitivity operator in a function. My issue is that when I have my function, I get a call to SupportRequestSimplify, but not SupportRequestSensitivity. It is not obvious what I am doing that is incorrect.

My c (stub) function is:

PG_FUNCTION_INFO_V1(pgq3c_join_selectivity);
Datum pgq3c_join_selectivity(PG_FUNCTION_ARGS)
{  Node     *rawreq = (Node *) PG_GETARG_POINTER(0);  Node     *ret = NULL;
  elog(WARNING,"in pgq3c_join_selectivity %d %d %d",       rawreq->type,T_SupportRequestSelectivity,T_SupportRequestSimplify);
    if (IsA(rawreq, SupportRequestSelectivity))    {      elog(WARNING,"found SupportRequestSelectivity");    }  if (IsA(rawreq, SupportRequestSimplify))    {      elog(WARNING,"found SupportRequestSimplify");    }
  PG_RETURN_POINTER(ret);
}

my sql function code is:

-- a selectivity function for the q3c join function
CREATE OR REPLACE FUNCTION q3c_join_selectivity(internal)       RETURNS internal       AS 'MODULE_PATHNAME', 'pgq3c_join_selectivity'       LANGUAGE C IMMUTABLE STRICT ;

and my function definition is:

    CREATE OR REPLACE FUNCTION q3c_join(leftra double precision, leftdec double precision,                    rightra double precision, rightdec double precision,                    radius double precision)        RETURNS boolean AS
'
SELECT (((q3c_ang2ipix($3,$4)>=(q3c_nearby_it($1,$2,$5,0))) AND (q3c_ang2ipix($3,$4)<=(q3c_nearby_it($1,$2,$5,1))))    OR ((q3c_ang2ipix($3,$4)>=(q3c_nearby_it($1,$2,$5,2))) AND (q3c_ang2ipix($3,$4)<=(q3c_nearby_it($1,$2,$5,3))))    OR ((q3c_ang2ipix($3,$4)>=(q3c_nearby_it($1,$2,$5,4))) AND (q3c_ang2ipix($3,$4)<=(q3c_nearby_it($1,$2,$5,5))))    OR ((q3c_ang2ipix($3,$4)>=(q3c_nearby_it($1,$2,$5,6))) AND (q3c_ang2ipix($3,$4)<=(q3c_nearby_it($1,$2,$5,7)))))     AND q3c_sindist($1,$2,$3,$4)<POW(SIN(RADIANS($5)/2),2)    AND ($5::double precision ==<<>>== ($1,$2,$3,$4)::q3c_type)
' LANGUAGE SQL IMMUTABLE COST 10 SUPPORT q3c_join_selectivity;

When I run my function, I get:

(base) [greg.hennessy@localhost ~]$ psql q3c_test
Timing is on.
Output format is unaligned.
psql (13.4)
Type "help" for help.

q3c_test=# select count(*) from test as a, test1 as b where  q3c_join(a.ra,a.dec,b.ra,b.dec,.01);
WARNING:  in pgq3c_join_selectivity 417 418 417
WARNING:  found SupportRequestSimplify
count
153
(1 row)
Time: 9701.717 ms (00:09.702)
q3c_test=# 

So, I see a call where I am asked for a SupportRequestSimplify, but not a SupportRequestSelectivity.

I admit to not being an expert in postgres internals hacking. Is there something obvious I am doing incorrect? How do I ensure my support Function is asked for a SupportRequestSelectivity?

Re: selectivity function

От
Tom Lane
Дата:
Greg Hennessy <greg.hennessy@gmail.com> writes:
> I'm trying to include a sensitivity operator in a function. My issue is
> that when I have my function, I get a call to SupportRequestSimplify, but
> not SupportRequestSensitivity. It is not obvious what I am doing that is
> incorrect.

Attaching a support function to a SQL-language function seems pretty
weird to me.  I think probably what is happening is that the SQL
function is getting inlined and thus there is nothing left to apply
the selectivity hook to.  simplify_function() will try the
SupportRequestSimplify hook before it tries inlining, so the fact
that that one registers isn't at odds with this theory.

            regards, tom lane



Re: selectivity function

От
Greg Hennessy
Дата:
On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Hennessy <greg.hennessy@gmail.com> writes:
> I'm trying to include a sensitivity operator in a function. My issue is
> that when I have my function, I get a call to SupportRequestSimplify, but
> not SupportRequestSensitivity. It is not obvious what I am doing that is
> incorrect.
 
On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: 
Attaching a support function to a SQL-language function seems pretty
weird to me.  I think probably what is happening is that the SQL
function is getting inlined and thus there is nothing left to apply
the selectivity hook to.  simplify_function() will try the
SupportRequestSimplify hook before it tries inlining, so the fact
that that one registers isn't at odds with this theory.

Is there a way to set the selectivity of a SQL-language function?  My use
case is I'm an astronomer, matching large star catalogs, and if I have
a 1e6 star catalog joined with a 1e6 star catalog, the planner estimates
about 1e12 rows, even though the selectivity is about 1e-9 or so.
I don't see a way to define a selectivity function. One of the indexed functions
does have a RESTRICT line with some about of selectivity in the function, but
it isn't apparent it is being referenced.

My issue is that when I have small and medium sized star catalogs, the join
I'm using uses the index, but at a certain large size it stops using the index
and starts using sequential scans, due to the cost of the sequential scan
being smaller than the cost of using the index. I surmise that the cost of
reading in the index, and the use of random_page_cost = 1.2 makes the
sequential scan seem cheaper/faster, even though as a human I know
that using the index scan would be faster. I'm just not sure how to convince
postgresql to calculate the costs properly.


Re: selectivity function

От
Tom Lane
Дата:
Greg Hennessy <greg.hennessy@gmail.com> writes:
> On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attaching a support function to a SQL-language function seems pretty
>> weird to me.

> Is there a way to set the selectivity of a SQL-language function?

I think it'd work if you prevented inlining, but doing so would defeat
most of the value of writing it as a SQL function as opposed to (say)
plpgsql.  Can you do anything useful with attaching selectivity estimates
to the functions it references, instead?

            regards, tom lane



Re: selectivity function

От
Greg Hennessy
Дата:
On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you do anything useful with attaching selectivity estimates
> to the functions it references, instead?
I may have been doing down a bad path before. The function I'm
working to improve has five argument, the last being "degrees", which
is the match radius. Obviously a larger match radius  should cause more
matches.

For a small value of a match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as 
b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual 
time=7.799..10758.566 rows=31 loops=1)
   Buffers: shared hit=8005684
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) 
(actual time=0.008..215.570 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 
width=16) (actual time=0.009..0.009 rows=0 loops=1000000)

(note: I deleted some of the output, since I think I'm keeping the 
important bits)

So, the cost of the query is calculated as 2e10, where it expect five rows,
found 31, and a hot cache of reading 8 million units of disk space, I'd have
to check the fine manual to remind myself of the units of that.

When I do the same sort of query on a much larger match radius (5 deg) I 
get:
q3c_test=# explain (analyze, buffers) select * from test as a, test1 as 
b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual 
time=0.086..254995.691 rows=38051626 loops=1)
   Buffers: shared hit=104977026
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) 
(actual time=0.008..261.425 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 
width=16) (actual time=0.053..0.247 rows=38 loops=1000000)

The "total cost" is the same identical 2e10, this time the number of 
rows expectd
is 4.7 million, the number of rows delivered is 38 million (so the 
calculation is off
by a factor of 8, I'm not sure that is important), but the io is now 104 
million units.
So while we are doing a lot more IO, and dealing with a lot more rows, the
calculated cost is identical. That seems strange me me. Is that a normal 
thing?
Is it possible that the cost calculation isn't including the selectivity 
calculation?

Greg