Обсуждение: User Defined Functions/AM's inherently slow?

Поиск
Список
Период
Сортировка

User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
I've created a stub AM that literally does nothing.  It indexes 
nothing.  It scans for nothing.  Nadda.  It does just enough work to 
specify return values that prevent PG from dumping core.

What I've found is that this stub AM, compiled with the same options as 
postgres itself (-O2 -fno-strict-aliasing), is roughly 4 times slower 
than the built in btree AM that actually does something useful!

The test table contains 1 column, and 1 row:

My stub AM:explain analyze select * from test where a ==> '1';Total runtime: 0.254 ms

builtin btree AM:explain analyze select * from test where a = '1';Total runtime: 0.058 ms

(I ran each one a number times, with basically the same results).

What gives?  *scratches head*  I know btree's are efficient, but geez, 
can they really be more efficient than O(zero)?  :)  Looking at the 
backtrace from the beginscan function of each AM, PG doesn't appear to 
do anything different for user-provided AM's.

My platform is OS X 10.3.2, using PG 7.4, gcc version 3.3 (Apple's 
build #1495).

Any insight would be greatly appreciated.

Thanks!

eric



Re: User Defined Functions/AM's inherently slow?

От
Tom Lane
Дата:
Eric Ridge <ebr@tcdi.com> writes:
> I've created a stub AM that literally does nothing.

It's not possible for an index AM to "do nothing", at least not for an
indexscan.  It has to return tuple pointers.  What are you doing for
that?
        regards, tom lane


Re: User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
On Jan 17, 2004, at 10:22 PM, Tom Lane wrote:

> Eric Ridge <ebr@tcdi.com> writes:
>> I've created a stub AM that literally does nothing.
>
> It's not possible for an index AM to "do nothing", at least not for an
> indexscan.  It has to return tuple pointers.  What are you doing for
> that?

costestimate: {PG_RETURN_VOID();
}

beginscan: {Relation index = (Relation) PG_GETARG_POINTER(0);int keysz = PG_GETARG_INT32(1);ScanKey    scankey =
(ScanKey)PG_GETARG_POINTER(2);IndexScanDesc scan = RelationGetIndexScan(index, keysz, scankey);
 
PG_RETURN_POINTER(scan);
}

rescan: {PG_RETURN_VOID();
}

gettuple: {PG_RETURN_BOOL(false);
}

endscan: {PG_RETURN_VOID();
}

I think the above is about as close to "nothing" as one can get.

In trying to track down some performance issues with my real AM, I 
decided to make this stub AM just to see what the overhead is... then I 
started seeing these crazy results.

eric



Re: User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
On Jan 17, 2004, at 10:22 PM, Tom Lane wrote:

> Eric Ridge <ebr@tcdi.com> writes:
>> I've created a stub AM that literally does nothing.
>
> It's not possible for an index AM to "do nothing", at least not for an
> indexscan.  It has to return tuple pointers.  What are you doing for
> that?

I should have included the entire explain output:

stub AM: Index Scan using idxa_stub on test2  (cost=0.00..2.68 rows=1 width=5) 
(actual time=0.002..0.002 rows=0 loops=1)   Index Cond: (a ==> '1'::text) Total runtime: 0.247 ms

builtin btree AM:
Index Scan using idxa_btree on test2  (cost=0.00..4.68 rows=1 width=5) 
(actual time=0.024..0.026 rows=1 loops=1)   Index Cond: (a = '1'::text) Total runtime: 0.060 ms

If the "actual time" numbers are really a measure of the amount of time 
spent in (at least) the index, it seems the stub should report a 
smaller "total runtime", but alas, it doesn't.

eric



Re: User Defined Functions/AM's inherently slow?

От
Tom Lane
Дата:
Eric Ridge <ebr@tcdi.com> writes:
> costestimate: {
>     PG_RETURN_VOID();
> }

This at least needs to set some values into the output parameters ---
zeroes are okay, not setting them at all isn't.  I'm surprised the
planner doesn't go nuts.  It looks from your EXPLAIN results like
the values are coming up zero anyway, but seeing that cost_index()
doesn't do anything to initialize those local variables, I'd expect
fairly unpredictable behavior.

The other stubs look fine on quick review.
        regards, tom lane


Re: User Defined Functions/AM's inherently slow?

От
Tom Lane
Дата:
Eric Ridge <ebr@tcdi.com> writes:
> I should have included the entire explain output:

> stub AM:
>   Index Scan using idxa_stub on test2  (cost=0.00..2.68 rows=1 width=5) 
> (actual time=0.002..0.002 rows=0 loops=1)
>     Index Cond: (a ==> '1'::text)
>   Total runtime: 0.247 ms

> builtin btree AM:
> Index Scan using idxa_btree on test2  (cost=0.00..4.68 rows=1 width=5) 
> (actual time=0.024..0.026 rows=1 loops=1)
>     Index Cond: (a = '1'::text)
>   Total runtime: 0.060 ms

> If the "actual time" numbers are really a measure of the amount of time 
> spent in (at least) the index, it seems the stub should report a 
> smaller "total runtime", but alas, it doesn't.

The difference between "total runtime" and the top plan node's runtime
has to represent plan startup/shutdown time.  I'm suspicious that your
stubs are somehow not initializing something, though on first glance I
do not see what.

Theory B would be that there's some huge overhead in calling non-built-in
functions on your platform.  We do know that looking up a "C" function is
significantly more expensive than looking up a "builtin" function, but
there should only be half a dozen such calls involved in this test case;
it's hard to credit that that takes 200 msec.  Does the time drop at all
on second and subsequent repetitions in a single backend run?
        regards, tom lane


Re: User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
On Jan 17, 2004, at 11:27 PM, Tom Lane wrote:

> Eric Ridge <ebr@tcdi.com> writes:
>> costestimate: {
>>     PG_RETURN_VOID();
>> }
>
> This at least needs to set some values into the output parameters ---
> zeroes are okay, not setting them at all isn't.  I'm surprised the
> planner doesn't go nuts.  It looks from your EXPLAIN results like
> the values are coming up zero anyway, but seeing that cost_index()
> doesn't do anything to initialize those local variables, I'd expect
> fairly unpredictable behavior.

I have tried setting them all to zero, and even using the 
backend/utils/adt/selfuncs.c:genericcostestimate() code (it's not 
exported, it seems), but no matter what sane (or crazy!) numbers I 
provide for the cost estimate, the "fact" remains, my AM, which does 
"nothing" is slower than the builtin btree AM.

Could this just be related to the fact that my AM is in a .so, and 
there's just some operating system/C runtime overhead in calling 
functions in dynamically loaded libraries?

eric



Re: User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
On Jan 17, 2004, at 11:33 PM, Tom Lane wrote:
> The difference between "total runtime" and the top plan node's runtime
> has to represent plan startup/shutdown time.  I'm suspicious that your
> stubs are somehow not initializing something, though on first glance I
> do not see what.

I can't see anything either... which is why I brought it up.  I'm still 
a noob with this stuff, and thought maybe I was just missing something.

> Theory B would be that there's some huge overhead in calling 
> non-built-in
> functions on your platform.  We do know that looking up a "C" function 
> is
> significantly more expensive than looking up a "builtin" function, but
> there should only be half a dozen such calls involved in this test 
> case;
> it's hard to credit that that takes 200 msec.  Does the time drop at 
> all
> on second and subsequent repetitions in a single backend run?

Yes, it drops from about .680ms to the .250ish that I posted.

I suppose I should try compiling this little stub into postgres, eh?

eric





Re: User Defined Functions/AM's inherently slow?

От
Christopher Kings-Lynne
Дата:
>> Theory B would be that there's some huge overhead in calling non-built-in
>> functions on your platform.  We do know that looking up a "C" function is
>> significantly more expensive than looking up a "builtin" function, but
>> there should only be half a dozen such calls involved in this test case;
>> it's hard to credit that that takes 200 msec.  Does the time drop at all
>> on second and subsequent repetitions in a single backend run?
> 
> 
> Yes, it drops from about .680ms to the .250ish that I posted.
> 
> I suppose I should try compiling this little stub into postgres, eh?

What if you try the new preload_libraries (or whatever it's called) 
config variable in postgresql.conf in the 7.4 release?

Chris


Re: User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
On Jan 18, 2004, at 1:34 AM, Christopher Kings-Lynne wrote:

>>> Theory B would be that there's some huge overhead in calling 
>>> non-built-in
>>> functions on your platform.  We do know that looking up a "C" 
>>> function is
>>> significantly more expensive than looking up a "builtin" function, 
>>> but
>>> there should only be half a dozen such calls involved in this test 
>>> case;
>>> it's hard to credit that that takes 200 msec.  Does the time drop at 
>>> all
>>> on second and subsequent repetitions in a single backend run?
>> Yes, it drops from about .680ms to the .250ish that I posted.
>> I suppose I should try compiling this little stub into postgres, eh?
>
> What if you try the new preload_libraries (or whatever it's called) 
> config variable in postgresql.conf in the 7.4 release?

yes, that takes care of the initial load time of the library itself 
(bringing the initial .680ms back to .250ish for the first run), but 
this is not the problem.

eric



Re: User Defined Functions/AM's inherently slow?

От
Tom Lane
Дата:
>> Theory B would be that there's some huge overhead in calling 
>> non-built-in functions on your platform.

I've done some profiling and convinced myself that indeed there's pretty
steep overhead involved in fmgr_info() for a "C"-language function.
Much of it isn't platform-dependent either --- as best I can tell,
the lion's share of the time is being eaten in
expand_dynamic_library_name().  In scenarios where a function is called
many times per query, we cache the results of fmgr_info() ... but we do
not do so for operations like ambeginscan that are done just once per
query.

Every other function language uses shortcuts or caching to reduce the
cost of fmgr_info() lookup; external C language is the only one that
hasn't been optimized in this way.  I shall see what I can do about that.
ISTM we can have a hash table that maps function OID to function address
using the same sorts of techniques that plpgsql et al use.
        regards, tom lane


Re: User Defined Functions/AM's inherently slow?

От
Eric B.Ridge
Дата:
On Jan 18, 2004, at 7:28 PM, Tom Lane wrote:

>>> Theory B would be that there's some huge overhead in calling
>>> non-built-in functions on your platform.
>
> I've done some profiling and convinced myself that indeed there's 
> pretty
> steep overhead involved in fmgr_info() for a "C"-language function.
> Much of it isn't platform-dependent either --- as best I can tell,
> the lion's share of the time is being eaten in
> expand_dynamic_library_name().  In scenarios where a function is called
> many times per query, we cache the results of fmgr_info() ... but we do
> not do so for operations like ambeginscan that are done just once per
> query.

Wow, thanks for spending the time on this.  What about for gettuple?  
Do calls to it take advantage of the cache?  If not, this likely 
explains some of my custom am's performance troubles.

> Every other function language uses shortcuts or caching to reduce the
> cost of fmgr_info() lookup; external C language is the only one that
> hasn't been optimized in this way.  I shall see what I can do about 
> that.
> ISTM we can have a hash table that maps function OID to function 
> address
> using the same sorts of techniques that plpgsql et al use.

If there's anything I can do to help, let me know.  I'll be happy to 
test any patches you might come up with too.

eric



Re: User Defined Functions/AM's inherently slow?

От
Tom Lane
Дата:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> Wow, thanks for spending the time on this.  What about for gettuple?  
> Do calls to it take advantage of the cache?  If not, this likely 
> explains some of my custom am's performance troubles.

gettuple is looked up once at the start of a scan, so there's no
per-tuple overhead involved there.  As I said before, we're usually
pretty good about avoiding per-tuple overheads --- the cost you
identified here is a per-query overhead.

> If there's anything I can do to help, let me know.  I'll be happy to 
> test any patches you might come up with too.

I have committed a patch into CVS HEAD --- give it a try.
        regards, tom lane


Re: User Defined Functions/AM's inherently slow?

От
Eric Ridge
Дата:
I don't have the original thread in my inbox anymore, so for
reference:http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF
 
-8&selm=20129.1074484755%40sss.pgh.pa.us

Tom Lane last said:
> I have committed a patch into CVS HEAD --- give it a try.

It took me awhile, but I finally got around to trying this (by plopping  
it into my 7.4 sources), and yes, the hashlookup does significantly  
improve the performance of calling "C" functions.  I agree with your  
commit message that they're now roughly on par with built-in functions.

thanks!

eric