Thread: GiST index performance
Revisiting the thread from a month or so back, I'm still investigating performance problems with GiST indexes in Postgres.

Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd like to clarify the contrib/seg issue. Contrib/seg is vulnerable to pathological behaviour which is fixed by my second patch, which can be viewed as complete. Contrib/cube, being multi-dimensional, is not affected to any significant degree, so should not need alteration.

A second, quite distinct issue is the general performance of GiST indexes, which is also mentioned in the old thread linked from Open Items. For that, we have a test case at http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for btree_gist indexes. I have a similar example with the bioseg GiST index. I have completely reimplemented the same algorithms in Java for algorithm investigation and instrumentation purposes, and it runs about a hundred times faster than in Postgres. I think this is a problem, and I'm willing to do some investigation to try to solve it.

Do you have a recommendation for how to go about profiling Postgres, what profiler to use, etc? I'm running on Debian Linux x86_64.

Matthew

--
Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
Quark:  That's the 35th.
Jadzia: Oh yes, that's right. What's the 34th again?
Quark:  War is good for business. It's easy to get them mixed up.
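As background for readers, the bioseg type's `&&` operator used in the test cases referenced above is an interval-overlap test. A minimal sketch of the predicate the index has to evaluate (illustrative Python with hypothetical names; the real contrib module is C code implementing a GiST opclass):

```python
# Illustrative sketch of the interval-overlap test that a segment "&&"
# operator performs. Closed intervals overlap iff each one starts at or
# before the point where the other ends.

def overlaps(a_start, a_end, b_start, b_end):
    return a_start <= b_end and b_start <= a_end

# Example pairs of segments:
pairs = [((1, 10), (5, 20)),   # overlapping
         ((1, 4), (5, 9))]     # disjoint
results = [overlaps(a[0], a[1], b[0], b[1]) for a, b in pairs]  # [True, False]
```

The join in the test case asks for every pair of rows whose segments satisfy this predicate, which is why the GiST consistent function gets called so many times.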
On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling <matthew@flymine.org> wrote:
> Do you have a recommendation for how to go about profiling Postgres, what
> profiler to use, etc? I'm running on Debian Linux x86_64.

I mostly compile with --enable-profiling and use gprof. I know Tom Lane has had success with oprofile for quick and dirty measurements, but I haven't quite figured out how to make all the bits work for that yet.

...Robert
On Fri, 5 Jun 2009, Robert Haas wrote:
> On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling <matthew@flymine.org> wrote:
>> Do you have a recommendation for how to go about profiling Postgres, what
>> profiler to use, etc? I'm running on Debian Linux x86_64.
>
> I mostly compile with --enable-profiling and use gprof. I know Tom
> Lane has had success with oprofile for quick and dirty measurements
> but I haven't quite figured out how to make all the bits work for that
> yet.

Okay, I recompiled Postgres 8.4 beta 2 with profiling, installed btree_gist and bioseg, and ran a large multi-column (btree_gist, bioseg) index search:

EXPLAIN ANALYSE SELECT *
FROM location l1, location l2
WHERE l1.objectid = l2.objectid
  AND bioseg_create(l1.intermine_start, l1.intermine_end)
   && bioseg_create(l2.intermine_start, l2.intermine_end);

                                   QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.01..9292374.77 rows=19468831 width=130)
              (actual time=0.337..24538315.569 rows=819811624 loops=1)
   ->  Seq Scan on location l1  (cost=0.00..90092.17 rows=4030117 width=65)
              (actual time=0.033..2561.142 rows=4030122 loops=1)
   ->  Index Scan using location_object_bioseg on location l2
              (cost=0.01..1.58 rows=35 width=65)
              (actual time=0.467..5.990 rows=203 loops=4030122)
         Index Cond: ((l2.objectid = l1.objectid) AND
              (bioseg_create(l1.intermine_start, l1.intermine_end) &&
               bioseg_create(l2.intermine_start, l2.intermine_end)))
 Total runtime: 24613918.894 ms
(5 rows)

Here is the top of the profiling result:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative    self               self     total
 time    seconds   seconds     calls  Ks/call  Ks/call  name
 35.41    2087.04  2087.04  823841746    0.00     0.00  gistnext
 15.36    2992.60   905.56 8560743382    0.00     0.00  fmgr_oldstyle
  8.65    3502.37   509.77 3641723296    0.00     0.00  FunctionCall5
  7.08    3919.87   417.50 3641723296    0.00     0.00  gistdentryinit
  5.03    4216.59   296.72       6668    0.00     0.00  DirectFunctionCall1
  3.84    4443.05   226.46 3641724371    0.00     0.00  FunctionCall1
  2.32    4579.94   136.89 1362367466    0.00     0.00  hash_search_with_hash_value
  1.89    4691.15   111.21  827892583    0.00     0.00  FunctionCall2
  1.83    4799.27   108.12                              FunctionCall6
  1.77    4903.56   104.30 2799321398    0.00     0.00  LWLockAcquire
  1.45    4989.24    85.68 1043922430    0.00     0.00  PinBuffer
  1.37    5070.15    80.91  823844102    0.00     0.00  index_getnext
  1.33    5148.29    78.15 1647683886    0.00     0.00  slot_deform_tuple
  0.95    5204.36    56.07  738604164    0.00     0.00  heap_page_prune_opt

The final cumulative time is 5894.06 seconds, which doesn't seem to match the query run time at all. Also, no calls to anything with "bioseg" in the name are recorded, although they are definitely called as the GiST support functions for that data type. Could someone give me a hand deciphering this result?

It seems from this that the time is spent in the gistnext function itself (in src/backend/access/gist/gistget.c) and not its children. However, it's quite a large function containing several loops - is there a way to make the profiling result more specific?

Matthew
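As a reading aid for the flat profile above: "self seconds" is the time attributed to a function's own code, and "cumulative seconds" is simply a running total down the rows, so the final cumulative value is all the time gprof accounted for anywhere. A small sketch of that relationship, using rows copied from the profile (illustrative Python, not part of gprof):

```python
# The "cumulative" column of a gprof flat profile is the running sum of
# the "self" column. That is why the last cumulative value (5894.06 s)
# represents total accounted time, and why its mismatch with the 24613 s
# query runtime suggests a tick-rate or attribution problem.

rows = [  # (% time, cumulative seconds, self seconds, name)
    (35.41, 2087.04, 2087.04, "gistnext"),
    (15.36, 2992.60,  905.56, "fmgr_oldstyle"),
    ( 8.65, 3502.37,  509.77, "FunctionCall5"),
]

running = 0.0
for pct, cumulative, self_s, name in rows:
    running += self_s
    assert abs(running - cumulative) < 0.01, name  # invariant holds per row
```

Time a function spends in its callees does not appear in its flat-profile "self" column at all; that only shows up in gprof's call-graph section, which is relevant to the question of whether gistnext or its children are hot.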
Matthew Wakeling <matthew@flymine.org> writes:
> The final cumulative time is 5894.06 seconds, which doesn't seem to match
> the query run time at all.

Depending on the platform you're using, gprof might have the wrong idea about the kernel's tick rate, leading to its numbers being some multiple or fraction of the true elapsed time. I had that problem a while back:
https://bugzilla.redhat.com/show_bug.cgi?id=151763
although I'd like to think it's fixed on all recent Linuxen.

However, your bigger problem is here:

> Also, no calls to anything including "bioseg"
> in the name are recorded, although they are definitely called as the GiST
> support functions for that data type.

I have never had any success in getting gprof to profile functions that are in loadable libraries, which of course is exactly what you need to do here. I have found hints on the web claiming that it's possible, but they don't work for me. gprof just ignores both the functions and the time spent in them.

Personally I generally use oprofile these days, because it doesn't have that problem and also doesn't require any special build options. If you don't have that available to you, the best bet is probably to make a test build of Postgres in which your functions are linked directly into the main postgres executable.

			regards, tom lane
On Wed, 10 Jun 2009, Tom Lane wrote:
>> Also, no calls to anything including "bioseg"
>> in the name are recorded, although they are definitely called as the GiST
>> support functions for that data type.
>
> I have never had any success in getting gprof to profile functions that
> are in loadable libraries, which of course is exactly what you need to do
> here.

That sucks. However, as another observation, no calls to "gistfindnext" are recorded in the profile either, and that's in the same source file as "gistnext", which is recorded. Could it have been inlined? Shouldn't inlining be switched off in a profiling build?

> ...the best bet is probably to make a test build of Postgres in which
> your functions are linked directly into the main postgres executable.

I'll give that a try. Oprofile scares me with the sheer number of options.

Matthew

--
Prolog doesn't have enough parentheses. -- Computer Science Lecturer
Matthew Wakeling <matthew@flymine.org> writes:
> That sucks. However, as another observation, no calls to "gistfindnext"
> are recorded in the profile either, and that's in the same source file as
> "gistnext" which is recorded. Could it have been inlined?

Probably.

> Shouldn't inlining be switched off on a profiling build?

Why? You generally want to profile the code's actual behavior, or as near as you can get to observing that. Defeating compiler optimizations doesn't sound like something that -pg should do on its own. If you really want it, there's a switch for it.

> Oprofile scares me with the sheer number of options.

You can ignore practically all of them; the defaults are pretty sane. The recipe I usually follow is:

Initial setup (only needed once per system boot):

	sudo opcontrol --init
	sudo opcontrol --setup --no-vmlinux

(If you need detail about what the kernel is doing, you need kernel debug symbols, and then specify them on the previous line.)

Start/stop profiling:

	sudo opcontrol --start
	sudo opcontrol --reset
	... exercise your debug-enabled program here ...
	sudo opcontrol --dump ; sudo opcontrol --shutdown

The test case should run at least a minute or two to get numbers with reasonable accuracy.

Analysis:

	opreport --long-filenames | more
	opreport -l image:/path/to/postgres | more

If you really want detail:

	opannotate --source /path/to/postgres >someplace

			regards, tom lane
On Thu, 11 Jun 2009, Tom Lane wrote:
>> Oprofile scares me with the sheer number of options.
>
> You can ignore practically all of them; the defaults are pretty sane.

Thanks, that was helpful. Here is the top of opreport --long-filenames:

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
  5114464 61.5404 /lib/libc-2.9.so
  1576829 18.9734 /changeable/pgsql_8.4_profiling/bin/postgres
	CPU_CLK_UNHALT...|
	  samples|      %|
	------------------
	  1572346 99.7157 /changeable/pgsql_8.4_profiling/bin/postgres
	     4482  0.2842 [vdso] (tgid:13593 range:0x7fff8dbff000-0x7fff8dc00000)
	        1 6.3e-05 [vdso] (tgid:13193 range:0x7fff8dbff000-0x7fff8dc00000)
   409534  4.9278 /no-vmlinux
   309990  3.7300 /changeable/pgsql_8.4_profiling/lib/btree_gist.so
   203684  2.4509 /changeable/pgsql_8.4_profiling/lib/bioseg.so

So it seems that btree_gist and bioseg are not using that much CPU at all, compared to the core postgres code. In fact, the majority of time seems to be spent in libc. Unfortunately my libc doesn't have any debugging symbols.

Anyway, running opannotate seems to make it clear that time *is* spent in the gistnext function, but almost all of that is in children of the function. Lots of time is actually spent in fmgr_oldstyle though.
Here is the top of opreport -l:

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name      app name        symbol name
5114464  61.5404  libc-2.9.so     libc-2.9.so     (no symbols)
 496215   5.9708  postgres        postgres        gistnext
 409534   4.9278  no-vmlinux      no-vmlinux      (no symbols)
 404037   4.8616  postgres        postgres        fmgr_oldstyle
 170079   2.0465  btree_gist.so   btree_gist.so   gbt_int4_consistent
 160016   1.9254  postgres        postgres        gistdentryinit
 153266   1.8442  nvidia_drv.so   nvidia_drv.so   (no symbols)
 152463   1.8345  postgres        postgres        FunctionCall5
 149374   1.7974  postgres        postgres        FunctionCall1
 131112   1.5776  libxul.so       libxul.so       (no symbols)
 120871   1.4544  postgres        postgres        .plt
  94506   1.1372  bioseg.so       bioseg.so       bioseg_gist_consistent

I'm guessing my next step is to install a version of libc with debugging symbols?

Matthew

--
Some people, when confronted with a problem, think "I know, I'll use regular
expressions." Now they have two problems.                  -- Jamie Zawinski
Matthew Wakeling <matthew@flymine.org> writes:
> So it seems that btree_gist and bioseg are not using that much CPU at all,
> compared to core postgres code. In fact, the majority of time seems to be
> spent in libc. Unfortunately my libc doesn't have any debugging symbols.

hmm ... memcpy or qsort maybe?

> Anyway, running opannotate seems to make it clear that time *is* spent in
> the gistnext function, but almost all of that is in children of the
> function. Lots of time is actually spent in fmgr_oldstyle though.

So it'd be worth converting your functions to V1 style.

> I'm guessing my next step is to install a version of libc with debugging
> symbols?

Yeah, if you want to find out what's happening in libc, that's what you need.

			regards, tom lane
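For context on the V0/V1 distinction raised here: an old-style (V0) function takes bare C arguments, so every invocation goes through the fmgr_oldstyle trampoline that shows up in the profiles, while a V1 function accepts the generic call-info structure directly. A rough Python analogy of the dispatch difference (stand-in names only, not the real fmgr code):

```python
# Analogy: V0 functions know nothing of the generic call-info struct, so
# a trampoline (like fmgr_oldstyle) must unpack it on every call; a V1
# function takes the struct itself, removing one indirection per call.

def old_style_fn(a, b):           # "V0": plain arguments
    return a + b

def fmgr_oldstyle(fcinfo):        # trampoline: unpack, then call
    args = fcinfo["args"]
    return old_style_fn(args[0], args[1])

def v1_style_fn(fcinfo):          # "V1": receives call-info directly
    a, b = fcinfo["args"]
    return a + b

fcinfo = {"args": (2, 3)}
```

With hundreds of millions of consistent-function calls per query, as in the profiles above, shaving that per-call indirection is plausibly worthwhile, which is presumably why the V1 conversion is suggested.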
On Thu, 11 Jun 2009, Tom Lane wrote:
> So it'd be worth converting your functions to V1 style.

Does that produce a significant reduction in overhead? (You'll probably say "yes, that's the whole point.")

> hmm ... memcpy or qsort maybe?

Surprise:

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name      app name        symbol name
3005354  40.2868  libc-2.9.so     libc-2.9.so     __mcount_internal
1195592  16.0269  libc-2.9.so     libc-2.9.so     mcount
 549998   7.3727  postgres        postgres        gistnext
 420465   5.6363  postgres        postgres        fmgr_oldstyle
 376333   5.0447  no-vmlinux      no-vmlinux      (no symbols)
 210984   2.8282  postgres        postgres        FunctionCall5
 182509   2.4465  postgres        postgres        gistdentryinit
 174356   2.3372  btree_gist.so   btree_gist.so   gbt_int4_consistent
 142829   1.9146  postgres        postgres        FunctionCall1
 129800   1.7400  postgres        postgres        .plt
 119180   1.5976  nvidia_drv.so   nvidia_drv.so   (no symbols)
  96351   1.2916  libxul.so       libxul.so       (no symbols)
  91726   1.2296  btree_gist.so   btree_gist.so   gbt_num_consistent

A quick grep in the postgres source for mcount reveals no hits. No idea what it does - there is no man page for it.

Matthew

--
I pause for breath to allow you to get over your shock that I really did
cover all that in only five minutes...   -- Computer Science Lecturer
On Thu, 11 Jun 2009, Matthew Wakeling wrote:
> A quick grep in the postgres source for mcount reveals no hits. No idea what
> it does - there is no man page for it.

Ah - that's part of gprof. I'll recompile without --enable-profiling and try again. Duh.

Matthew

--
What goes up must come down. Ask any system administrator.
Okay, I don't know quite what's happening here. Tom, perhaps you could advise. Running opannotate --source, I get this sort of stuff:

/*
 * Total samples for file :
 * ".../postgresql-8.4beta2/src/backend/access/gist/gistget.c"
 *
 *   6880  0.2680
 */

and then:

               :static int64
               :gistnext(IndexScanDesc scan, TIDBitmap *tbm)
    81  0.0032 :{ /* gistnext total: 420087 16.3649 */
               :        Page            p;

The gistnext total doesn't seem to correspond to the amount I get by adding up all the individual lines in gistnext. Moreover, it is greater than the total samples attributed to the whole file, and greater than the samples assigned to all the lines where gistnext is called.

However, yes, it does seem like fmgr.c accounts for a large proportion of samples. Also, I still seem to be getting mcount, even after recompiling without --enable-profiling.

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name      app name        symbol name
 460213  17.9280  postgres        postgres        fmgr_oldstyle
 420087  16.3649  postgres        postgres        gistnext
 254975   9.9328  postgres        postgres        FunctionCall5
 239572   9.3327  libc-2.9.so     libc-2.9.so     mcount
 219963   8.5689  libc-2.9.so     libc-2.9.so     __mcount_internal
 125674   4.8957  no-vmlinux      no-vmlinux      (no symbols)
 117184   4.5650  postgres        postgres        gistdentryinit
 106967   4.1670  btree_gist.so   btree_gist.so   gbt_int4_consistent
  95677   3.7272  postgres        postgres        FunctionCall1
  75397   2.9372  bioseg.so       bioseg.so       bioseg_gist_consistent
  58832   2.2919  btree_gist.so   btree_gist.so   gbt_num_consistent
  39128   1.5243  bioseg.so       bioseg.so       bioseg_overlap
  33874   1.3196  libxul.so       libxul.so       (no symbols)
  32008   1.2469  bioseg.so       bioseg.so       bioseg_gist_leaf_consistent
  20890   0.8138  nvidia_drv.so   nvidia_drv.so   (no symbols)
  19321   0.7527  bioseg.so       bioseg.so       bioseg_gist_decompress
  17365   0.6765  libmozjs.so.1d  libmozjs.so.1d  (no symbols)

Matthew

--
A good programmer is one who looks both ways before crossing a one-way street.
Considering the quality and quantity of one-way streets in Cambridge, it should be no surprise that there are so many good programmers there.
Matthew Wakeling <matthew@flymine.org> writes:
> The gistnext total doesn't seem to correspond to the amount I get by
> adding up all the individual lines in gistnext.

Hmm, hadn't you determined that some other stuff was being inlined into gistnext? I'm not really sure how opannotate displays such cases, but this might be an artifact of that.

> However, yes it does seem like fmgr.c accounts for a large proportion of
> samples. Also, I still seem to be getting mcount, even after recompiling
> without --enable-profiling.

You must still have some -pg code in there someplace. Maybe you didn't recompile bioseg.so, or even psql? Remember the libc counts you are looking at are for system-wide usage of libc.

			regards, tom lane
Matthew Wakeling wrote:
> Okay, I don't know quite what's happening here. Tom, perhaps you could
> advise. Running opannotate --source, I get this sort of stuff:
>
> /*
>  * Total samples for file :
>  * ".../postgresql-8.4beta2/src/backend/access/gist/gistget.c"
>  *
>  *   6880  0.2680
>  */
>
> and then:
>
>                :static int64
>                :gistnext(IndexScanDesc scan, TIDBitmap *tbm)
>     81  0.0032 :{ /* gistnext total: 420087 16.3649 */
>                :        Page            p;
>
> The gistnext total doesn't seem to correspond to the amount I get by
> adding up all the individual lines in gistnext. Moreover, it is greater
> than the total samples attributed to the whole file, and greater than
> the samples assigned to all the lines where gistnext is called.

There's another alternative for profiling that you might try if you can't get sensible results out of oprofile: cachegrind, which is part of the valgrind toolset.

Basically it runs the code in an emulated environment, but records every access (reads/writes/CPU cycles/cache hits/misses/etc). It's *extremely* good at finding hotspots, even when they are due to 'cache flushing' behavior in your code (for example, trawling a linked list touches a bunch of pages and effectively blows your CPU cache).

There's an associated graphical tool called kcachegrind which takes the dumped output and lets you drill down, even to the source code level (with cycle count/percentage annotations on the source lines).

All you need to do is compile postgres with debug symbols (full optimization ON, otherwise you end up reaching the wrong conclusions). There's an example of running valgrind on postgres here:

http://blog.cleverelephant.ca/2008/08/valgrinding-postgis.html

For cachegrind, you basically need to use 'cachegrind' instead of 'valgrind', and don't disable optimization when you build.
Tom Lane wrote:
> Matthew Wakeling <matthew@flymine.org> writes:
>> I'm guessing my next step is to install a version of libc with debugging
>> symbols?
>
> Yeah, if you want to find out what's happening in libc, that's what you
> need.

Getting callgraph information from oprofile would also help. Although it won't directly tell you what function in libc is being called, you would see where the calls are coming from, which is usually enough to guess the libc function. You can also get the oprofile data, including the callgraph, into kcachegrind, which is *very* helpful. Here's a script I use:
http://roberts.vorpus.org/~njs/op2calltree.py

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Thu, 11 Jun 2009, Tom Lane wrote:
> Matthew Wakeling <matthew@flymine.org> writes:
>> Oprofile scares me with the sheer number of options.
>
> You can ignore practically all of them; the defaults are pretty sane.
> The recipe I usually follow is:

An excellent brain dump from Tom, and lots of other good stuff in this thread. I just dumped a summary of all the profiling lore discussed onto http://wiki.postgresql.org/wiki/Profiling_with_OProfile as I don't know that I've ever seen a concise intro to this subject before.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Fri, Jun 26, 2009 at 10:33 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Thu, 11 Jun 2009, Tom Lane wrote:
>> Matthew Wakeling <matthew@flymine.org> writes:
>>> Oprofile scares me with the sheer number of options.
>>
>> You can ignore practically all of them; the defaults are pretty sane.
>> The recipe I usually follow is:
>
> An excellent brain dump from Tom and lots of other good stuff in this
> thread. I just dumped a summary of all the profiling lore discussed onto
> http://wiki.postgresql.org/wiki/Profiling_with_OProfile as I don't know that
> I've ever seen a concise intro to this subject before.

Nice, thanks!

...Robert
Was there ever any conclusion on this issue?

---------------------------------------------------------------------------

Matthew Wakeling wrote:
>
> Revisiting the thread a month back or so, I'm still investigating
> performance problems with GiST indexes in Postgres.
>
> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
> pathological behaviour which is fixed by my second patch, which can be
> viewed as complete. Contrib/cube, being multi-dimensional, is not affected
> to any significant degree, so should not need alteration.
>
> A second quite distinct issue is the general performance of GiST indexes
> which is also mentioned in the old thread linked from Open Items. For
> that, we have a test case at
> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
> btree_gist indexes. I have a similar example with the bioseg GiST index. I
> have completely reimplemented the same algorithms in Java for algorithm
> investigation and instrumentation purposes, and it runs about a hundred
> times faster than in Postgres. I think this is a problem, and I'm willing
> to do some investigation to try and solve it.
>
> Do you have a recommendation for how to go about profiling Postgres, what
> profiler to use, etc? I'm running on Debian Linux x86_64.
>
> Matthew
>
> --
> Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
> Quark: That's the 35th.
> Jadzia: Oh yes, that's right. What's the 34th again?
> Quark: War is good for business. It's easy to get them mixed up.
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +
On Thu, Feb 25, 2010 at 6:42 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Was there ever any conclusion on this issue?

I don't think so.

...Robert

> ---------------------------------------------------------------------------
>
> Matthew Wakeling wrote:
>>
>> Revisiting the thread a month back or so, I'm still investigating
>> performance problems with GiST indexes in Postgres.
>>
>> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
>> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
>> pathological behaviour which is fixed by my second patch, which can be
>> viewed as complete. Contrib/cube, being multi-dimensional, is not affected
>> to any significant degree, so should not need alteration.
>>
>> A second quite distinct issue is the general performance of GiST indexes
>> which is also mentioned in the old thread linked from Open Items. For
>> that, we have a test case at
>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
>> btree_gist indexes. I have a similar example with the bioseg GiST index. I
>> have completely reimplemented the same algorithms in Java for algorithm
>> investigation and instrumentation purposes, and it runs about a hundred
>> times faster than in Postgres. I think this is a problem, and I'm willing
>> to do some investigation to try and solve it.
>>
>> Do you have a recommendation for how to go about profiling Postgres, what
>> profiler to use, etc? I'm running on Debian Linux x86_64.
>>
>> Matthew
>>
>> --
>> Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
>> Quark: That's the 35th.
>> Jadzia: Oh yes, that's right. What's the 34th again?
>> Quark: War is good for business. It's easy to get them mixed up.
On Thu, 25 Feb 2010, Bruce Momjian wrote:
> Was there ever any conclusion on this issue?

Not really. Comments inline:

> Matthew Wakeling wrote:
>> Revisiting the thread a month back or so, I'm still investigating
>> performance problems with GiST indexes in Postgres.
>>
>> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
>> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
>> pathological behaviour which is fixed by my second patch, which can be
>> viewed as complete. Contrib/cube, being multi-dimensional, is not affected
>> to any significant degree, so should not need alteration.

This issue is addressed by my patch, which AFAIK no one has reviewed. However, that patch was derived from a patch that I applied to bioseg, which is itself a derivative of seg. That patch works very well indeed, and gave an approximately 100 times speed improvement in the one test I ran.

So you could say that the sister patch of the one I submitted is tried and tested in production.

>> A second quite distinct issue is the general performance of GiST indexes
>> which is also mentioned in the old thread linked from Open Items. For
>> that, we have a test case at
>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
>> btree_gist indexes. I have a similar example with the bioseg GiST index. I
>> have completely reimplemented the same algorithms in Java for algorithm
>> investigation and instrumentation purposes, and it runs about a hundred
>> times faster than in Postgres. I think this is a problem, and I'm willing
>> to do some investigation to try and solve it.

I have not made any progress on this issue. I think Oleg and Teodor would be better placed to work it out. All I can say is that I implemented the exact same indexing algorithm in Java, and it performed 100 times faster than Postgres.
Now, Postgres has to do a lot of additional work, like mapping the index onto disc, locking pages, and abstracting to plugin user functions, so I would expect some difference - I'm just not sure a factor of 100 is reasonable. I tried to do some profiling, but couldn't see any one section of code that was taking too much time. I'm not sure what more I can do.

Matthew

--
Some people, when confronted with a problem, think "I know, I'll use regular
expressions." Now they have two problems.                  -- Jamie Zawinski
On Mon, Mar 15, 2010 at 11:58 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> On Thu, 25 Feb 2010, Bruce Momjian wrote:
>> Was there ever any conclusion on this issue?
>
> Not really. Comments inline:
>
>> Matthew Wakeling wrote:
>>> Revisiting the thread a month back or so, I'm still investigating
>>> performance problems with GiST indexes in Postgres.
>>>
>>> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
>>> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
>>> pathological behaviour which is fixed by my second patch, which can be
>>> viewed as complete. Contrib/cube, being multi-dimensional, is not
>>> affected to any significant degree, so should not need alteration.
>
> This issue is addressed by my patch, which AFAIK no one has reviewed.
> However, that patch was derived from a patch that I applied to bioseg,
> which is itself a derivative of seg. This patch works very well indeed,
> and gave an approximate 100 times speed improvement in the one test I ran.
>
> So you could say that the sister patch of the one I submitted is tried and
> tested in production.

We rely fairly heavily on the commitfest app to track which patches need review; perhaps it would be good to add it here:

https://commitfest.postgresql.org/action/commitfest_view/open

I seem to recall thinking that this patch wasn't ready to apply for some reason, but I'm not sure why I thought that.

...Robert
Matthew Wakeling wrote:
>> Matthew Wakeling wrote:
>>> A second quite distinct issue is the general performance of GiST indexes
>>> which is also mentioned in the old thread linked from Open Items. For
>>> that, we have a test case at
>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>> for btree_gist indexes. I have a similar example with the bioseg GiST
>>> index. I have completely reimplemented the same algorithms in Java for
>>> algorithm investigation and instrumentation purposes, and it runs about
>>> a hundred times faster than in Postgres. I think this is a problem, and
>>> I'm willing to do some investigation to try and solve it.
>
> I have not made any progress on this issue. I think Oleg and Teodor
> would be better placed working it out. All I can say is that I
> implemented the exact same indexing algorithm in Java, and it
> performed 100 times faster than Postgres. Now, Postgres has to do a
> lot of additional work, like mapping the index onto disc, locking
> pages, and abstracting to plugin user functions, so I would expect
> some difference - I'm not sure 100 times is reasonable though. I tried
> to do some profiling, but couldn't see any one section of code that
> was taking too much time. Not sure what I can further do.

Hello Matthew and list,

A lot of time is spent in the gistget.c code, and a lot of FunctionCall5's go to the GiST consistent function, which is out of sight for gprof.

Something different but related, since it also concerns GiST: we noticed before that GiST indexes which use a compressed form for index entries suffer from repeated compress calls on query operands (see http://archives.postgresql.org/pgsql-hackers/2009-05/msg00078.php).

The btree_gist int4 compress function calls the generic gbt_num_compress, which does a palloc. Maybe this palloc is also hit a lot when scanning the index, because the constants that are queried with are repeatedly compressed and palloced.

regards,
Yeb Havinga
Yeb Havinga wrote: > Matthew Wakeling wrote: >>> Matthew Wakeling wrote: >>>> A second quite distinct issue is the general performance of GiST >>>> indexes >>>> which is also mentioned in the old thread linked from Open Items. For >>>> that, we have a test case at >>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php >>>> for >>>> btree_gist indexes. I have a similar example with the bioseg GiST >>>> index. I >>>> have completely reimplemented the same algorithms in Java for >>>> algorithm >>>> investigation and instrumentation purposes, and it runs about a >>>> hundred >>>> times faster than in Postgres. I think this is a problem, and I'm >>>> willing >>>> to do some investigation to try and solve it. >> I have not made any progress on this issue. I think Oleg and Teodor >> would be better placed working it out. All I can say is that I >> implemented the exact same indexing algorithm in Java, and it >> performed 100 times faster than Postgres. Now, Postgres has to do a >> lot of additional work, like mapping the index onto disc, locking >> pages, and abstracting to plugin user functions, so I would expect >> some difference - I'm not sure 100 times is reasonable though. I >> tried to do some profiling, but couldn't see any one section of code >> that was taking too much time. Not sure what I can further do. > Hello Mathew and list, > > A lot of time spent in gistget.c code and a lot of functioncall5's to > the gist's consistent function which is out of sight for gprof. > Something different but related since also gist: we noticed before > that gist indexes that use a compressed form for index entries suffer > from repeated compress calls on query operands (see > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00078.php). > > The btree_gist int4 compress function calls the generic > gbt_num_compress, which does a palloc. 
Maybe this palloc is also hit > a lot when scanning the index, because the constants that are queried > with are repeatedly compressed and palloced. Looked in the code a bit more - only the index nodes are compressed at index creation; the consistent function does not compress queries, so no pallocs there. However, when running Matthew's example from http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php with the gist index, the coverage shows in gistget.c: 1000000 palloc0's of gistsearchstack at line 152 and 2010982 palloc's, also of the gistsearchstack, on line 342. Two pfrees are also hit a lot: line 195: 1010926 of a stackentry and line 293: 200056 times. My $0.02 is that the pain is here. My knowledge of gistget or the other sources in access/gist is zero, but wouldn't it be possible to determine the maximum needed size of the stack, allocate it at once, and use a pop/push kind of API? regards, Yeb Havinga > > regards, > Yeb Havinga > >
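The suggestion above can be sketched as a toy model (plain Python purely for illustration; this is not the actual gistget.c code and all names are invented): a linked stack that allocates a node per push, versus one block allocated up front where push/pop only move a top-of-stack index.

```python
class LinkedStack:
    """Allocates a fresh node per push, like the palloc'd
    gistsearchstack entries counted above."""
    def __init__(self):
        self.head = None
        self.allocations = 0

    def push(self, item):
        self.allocations += 1              # one "palloc" per push
        self.head = (item, self.head)

    def pop(self):
        item, self.head = self.head
        return item

class PreallocatedStack:
    """One up-front allocation sized to the maximum depth; push/pop
    just move an index -- the alternative proposed above."""
    def __init__(self, max_entries):
        self.slots = [None] * max_entries  # the single "palloc"
        self.allocations = 1
        self.top = 0

    def push(self, item):
        self.slots[self.top] = item
        self.top += 1

    def pop(self):
        self.top -= 1
        return self.slots[self.top]

linked, prealloc = LinkedStack(), PreallocatedStack(max_entries=64)
for page in range(50):                     # simulate pushing 50 index pages
    linked.push(page)
    prealloc.push(page)
print(linked.allocations, prealloc.allocations)   # 50 allocations vs 1
```

The catch, as in the real index, is knowing `max_entries` in advance; a growable array that doubles on overflow would keep allocations logarithmic rather than linear in the number of pushes.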
Yeb Havinga wrote: > Yeb Havinga wrote: >> Matthew Wakeling wrote: >>>> Matthew Wakeling wrote: >>>>> A second quite distinct issue is the general performance of GiST >>>>> indexes >>>>> which is also mentioned in the old thread linked from Open Items. For >>>>> that, we have a test case at >>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php >>>>> for >>>>> btree_gist indexes. I have a similar example with the bioseg GiST >>>>> index. I >>>>> have completely reimplemented the same algorithms in Java for >>>>> algorithm >>>>> investigation and instrumentation purposes, and it runs about a >>>>> hundred >>>>> times faster than in Postgres. I think this is a problem, and I'm >>>>> willing >>>>> to do some investigation to try and solve it. >>> I have not made any progress on this issue. I think Oleg and Teodor >>> would be better placed working it out. All I can say is that I >>> implemented the exact same indexing algorithm in Java, and it >>> performed 100 times faster than Postgres. Now, Postgres has to do a >>> lot of additional work, like mapping the index onto disc, locking >>> pages, and abstracting to plugin user functions, so I would expect >>> some difference - I'm not sure 100 times is reasonable though. I >>> tried to do some profiling, but couldn't see any one section of code >>> that was taking too much time. Not sure what I can further do. > Looked in the code a bit more - only the index nodes are compressed at > index creation, the consistent functions does not compress queries, so > not pallocs there. However when running Mathews example from > http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php > with the gist index, the coverage shows in gistget.c: 1000000 palloc0 > 's of gistsearchstack at line 152 and 2010982 palloc's also of the > gistsearchstack on line 342. Two pfrees are also hit a lot: line 195: > 1010926 of a stackentry and line 293: 200056 times. My $0.02 cents is > that the pain is here. 
My knowledge of gistget or the other sources in > access/gist is zero, but wouldn't it be possible to determine the > maximum needed size of the stack, allocate it at once, and use > a pop/push kind of API? Wasted some time today on a ghost chase... I thought that removing the millions of pallocs would help, so I wrote an alternative of the gistsearchstack-stack, only to find out that it was not the index scanning itself that caused millions of pallocs, but the scan being in the inner loop that was called 1000000 times. The actual scanning time was not changed significantly. In my VM it is (actual time=0.006..0.008) for btree and (actual time=0.071..0.074) for gist. An error in my searchstack alternative caused pages to be scanned twice, returning twice the number of rows (6 instead of 3 each time). This resulted in a likewise increase of ms (actual time=0.075..0.150). Somewhere I hit something that costs ~= 0.070 ms, twice. For a single index scan, 0.070ms startup time for gist vs 0.006 for btree doesn't seem like a big problem, but yeah, when calling it a million times... regards, Yeb Havinga
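Back-of-the-envelope arithmetic on the reported times makes the point concrete (values taken from the actual times quoted above):

```python
loops = 1_000_000            # inner-loop invocations of the index scan
btree_ms = 0.006             # per-scan actual time for btree, from above
gist_ms = 0.071              # per-scan actual time for gist, from above

btree_total_s = loops * btree_ms / 1000
gist_total_s = loops * gist_ms / 1000
print(round(btree_total_s), round(gist_total_s))   # roughly 6 s vs 71 s
```

At a million iterations, the 0.065 ms difference in per-scan startup cost alone accounts for about 65 seconds, which is consistent with the ~70 second scan times reported further down the thread.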
Yeb Havinga wrote: >>>>> Matthew Wakeling wrote: >>>>>> A second quite distinct issue is the general performance of GiST >>>>>> indexes >>>>>> which is also mentioned in the old thread linked from Open Items. >>>>>> For >>>>>> that, we have a test case at >>>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php >>>>>> for >>>>>> btree_gist indexes. I have a similar example with the bioseg GiST >>>>>> index. I >>>>>> have completely reimplemented the same algorithms in Java for >>>>>> algorithm >>>>>> investigation and instrumentation purposes, and it runs about a >>>>>> hundred >>>>>> times faster than in Postgres. I think this is a problem, and I'm >>>>>> willing >>>>>> to do some investigation to try and solve it. More GiST performance. Some background: I've been involved in developing several datatypes that make use of GiST indexes (we just published a paper on it, see http://arxiv.org/abs/1003.3370); that's the main reason I'm very much interested in possible improvements in GiST index speed. One of the datatypes was 'very badly' indexable: non-leaf pages were getting very general keys, so in the end we could see from the scanning times compared to sequential scans that the whole index was being scanned. One thing I remember was the idea that it would be nice if the depth of the GiST tree could be fiddled with: in that case keys of non-leaf nodes would be more specific again. In the original Guttman R-tree paper there was mention of a parameter that determined the size of entries in nodes and thereby indirectly the depth of the tree. I missed that in the PostgreSQL GiST API. One thing GiST scanning does very often is key comparisons. So another approach is to try to limit those, and this might be possible by increasing the depth / decreasing the number of entries per page. I just did a test where in gistfitpage the gistpagesize was divided by 5, and something similar in gistnospace. Scantime before adjustment: about 70 seconds.
After adjustment: 45 seconds. With gist_stat from the gevel package I confirmed that the depth was now 4 (originally 3). Drawback: bigger index size, because pages are not filled completely anymore. The explain shows (actual time=0.030..0.032) for the inner loop times, which is less than half of the original scan time. Since the gistpagesize is derived from the database blocksize, it might be wise to set the blocksize low for this case; I'm going to play with this a bit more. regards, Yeb Havinga
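A first-order model of the depth/fanout trade-off (illustrative Python only; it assumes perfectly full pages and a linear scan of each page's entries, so it will not reproduce gist_stat's exact numbers):

```python
import math

def tree_cost(n_leaf_tuples, fanout):
    """Estimated tree depth and key comparisons for one root-to-leaf
    descent, assuming full pages and a linear scan per visited page."""
    depth = max(1, math.ceil(math.log(n_leaf_tuples, fanout)))
    comparisons = depth * fanout
    return depth, comparisons

# ~130 entries/page (bioseg at 8KB, mentioned later in the thread)
# versus roughly a fifth of that, as in the gistfitpage experiment.
for fanout in (130, 26):
    depth, cmps = tree_cost(1_000_000, fanout)
    print(f"fanout {fanout}: depth {depth}, ~{cmps} comparisons per descent")
```

With a million leaf tuples this gives depth 3 and ~390 comparisons at fanout 130, versus depth 5 and ~130 comparisons at fanout 26: a deeper tree, but noticeably less comparison work per descent, consistent with the measured speedup (real fill factors explain why gist_stat reported depth 4 rather than 5).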
Yeb Havinga wrote: > > Since the gistpagesize is derived from the database blocksize, it > might be wise to set the blocksize low for this case, I'm going to > play with this a bit more. Ok, one last mail before it turns into spam: with a 1KB database blocksize, the query now runs in 30 seconds (original 70 on my machine, shared buffers 240MB). The per-inner-loop access time is now 24 microseconds, compared to the original 74 microseconds at the 8KB blocksize on my machine, and 8 microseconds for the btree scan. Not a bad speedup with such a simple parameter :-)

postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b + 2;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual time=0.184..29540.355 rows=2999997 loops=1)
   ->  Seq Scan on b  (cost=0.00..47037.62 rows=999962 width=4) (actual time=0.024..1783.484 rows=1000000 loops=1)
   ->  Index Scan using a_a on a  (cost=0.00..2224.78 rows=111114 width=4) (actual time=0.021..0.024 rows=3 loops=1000000)
         Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
 Total runtime: 30483.303 ms
(5 rows)

postgres=# select gist_stat('a_a');
                 gist_stat
-------------------------------------------
 Number of levels:          5              +
 Number of pages:           47618          +
 Number of leaf pages:      45454          +
 Number of tuples:          1047617        +
 Number of invalid tuples:  0              +
 Number of leaf tuples:     1000000        +
 Total size of tuples:      21523756 bytes+
 Total size of leaf tuples: 20545448 bytes+
 Total size of index:       48760832 bytes+
(1 row)
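As a sanity check, simple arithmetic on the plan's own numbers shows the inner index scans account for nearly all of the 30.5 s runtime:

```python
loops = 1_000_000            # loops=1000000 on the inner index scan above
inner_ms = 0.024             # per-loop actual time with the 1KB blocksize
seq_scan_ms = 1783.484       # outer seq scan total, from the plan

inner_total_s = loops * inner_ms / 1000
print(round(inner_total_s, 1))                       # ~24 s of index scanning
print(round(inner_total_s + seq_scan_ms / 1000, 1))  # ~25.8 s accounted for
```

The remaining few seconds are join and executor overhead outside the two scan nodes.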
Hi Yeb, I have not looked at the gist code, but would it be possible to make virtual pages that have a size that is 1/power-of-2 * blocksize? Then the leaf node could be 1/8 or even 1/16 the size of the full pagesize. Regards, Ken On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote: > Yeb Havinga wrote: >> >> Since the gistpagesize is derived from the database blocksize, it might be >> wise to set the blocksize low for this case, I'm going to play with this a >> bit more. > Ok, one last mail before it turns into spam: with a 1KB database blocksize, > the query now runs in 30 seconds (original 70 on my machine, shared buffers > 240MB). > The per inner loop access time now 24 microsecs compared to on my machine > original 74 microsecs with 8KB size and 8 for the btree scan. Not a bad > speedup with such a simple parameter :-) > > postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b > + 2; > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual > time=0.184..29540.355 rows=2999997 loops=1) > -> Seq Scan on b (cost=0.00..47037.62 rows=999962 width=4) (actual > time=0.024..1783.484 rows=1000000 loops=1) > -> Index Scan using a_a on a (cost=0.00..2224.78 rows=111114 width=4) > (actual time=0.021..0.024 rows=3 loops=1000000) > Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2))) > Total runtime: 30483.303 ms > (5 rows) > > > postgres=# select gist_stat('a_a'); > gist_stat > ------------------------------------------- > Number of levels: 5 + > Number of pages: 47618 + > Number of leaf pages: 45454 + > Number of tuples: 1047617 + > Number of invalid tuples: 0 + > Number of leaf tuples: 1000000 + > Total size of tuples: 21523756 bytes+ > Total size of leaf tuples: 20545448 bytes+ > Total size of index: 48760832 bytes+ > (1 row) > > > -- > Sent via pgsql-performance mailing list
(pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
On Fri, Mar 19, 2010 at 10:16 PM, Kenneth Marshall <ktm@rice.edu> wrote:
Hello Ken,
The GiST virtual pages would then more closely match the original blocksizes that were used in Guttman's R-tree paper (first Google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys has not changed, it might be that, with disk pages getting larger, GiST indexing has become unexpectedly inefficient.
I am not really into the core GiST code either, but I do have a motivation to dive into it (more than 200% performance increase in Matthew's test case). However, I'd like to check for community support before working on it.
Maybe Teodor or Oleg could say something about how easy or hard it would be to do?
regards,
Yeb Havinga
Hi Yeb,
I have not looked at the gist code, but would it be possible to
make virtual pages that have a size that is 1/power-of-2 * blocksize?
Then the leaf node could be 1/8 or even 1/16 the size of the full
pagesize.
Regards,
Ken
On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote:
> Yeb Havinga wrote:
>>
>> Since the gistpagesize is derived from the database blocksize, it might be
>> wise to set the blocksize low for this case, I'm going to play with this a
>> bit more.
> Ok, one last mail before it turns into spam: with a 1KB database blocksize,
> the query now runs in 30 seconds (original 70 on my machine, shared buffers
> 240MB).
> The per inner loop access time now 24 microsecs compared to on my machine
> original 74 microsecs with 8KB size and 8 for the btree scan. Not a bad
> speedup with such a simple parameter :-)
>
> postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b
> + 2;
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual
> time=0.184..29540.355 rows=2999997 loops=1)
> -> Seq Scan on b (cost=0.00..47037.62 rows=999962 width=4) (actual
> time=0.024..1783.484 rows=1000000 loops=1)
> -> Index Scan using a_a on a (cost=0.00..2224.78 rows=111114 width=4)
> (actual time=0.021..0.024 rows=3 loops=1000000)
> Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
> Total runtime: 30483.303 ms
> (5 rows)
>
>
> postgres=# select gist_stat('a_a');
> gist_stat
> -------------------------------------------
> Number of levels: 5 +
> Number of pages: 47618 +
> Number of leaf pages: 45454 +
> Number of tuples: 1047617 +
> Number of invalid tuples: 0 +
> Number of leaf tuples: 1000000 +
> Total size of tuples: 21523756 bytes+
> Total size of leaf tuples: 20545448 bytes+
> Total size of index: 48760832 bytes+
> (1 row)
>
On Sat, 20 Mar 2010, Yeb Havinga wrote: > The gist virtual pages would then match more the original blocksizes that > were used in Guttman's R-tree paper (first google result, then figure 4.5). > Since the nature/characteristics of the underlying datatypes and keys is not > changed, it might be that with the disk pages getting larger, gist indexing > has therefore become unexpectedly inefficient. Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work. > But I am also not really into the core-gist code, but do have a motivation > to dive into it (more than 200% performance increase in Mathew's test case). > However I'd like to verify for community support before working on it. I'd also love to dive into the core gist code, but am rather daunted by it. I believe that there is something there that is taking more time than I can account for. The indexing algorithm itself is good. Matthew -- "The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary." - James Nicoll
Matthew Wakeling wrote: > On Sat, 20 Mar 2010, Yeb Havinga wrote: >> The gist virtual pages would then match more the original blocksizes >> that >> were used in Guttman's R-tree paper (first google result, then figure >> 4.5). >> Since the nature/characteristics of the underlying datatypes and keys >> is not >> changed, it might be that with the disk pages getting larger, gist >> indexing >> has therefore become unexpectedly inefficient. > > Yes, that is certainly a factor. For example, the page size for bioseg > which we use here is 130 entries, which is very excessive, and doesn't > allow very deep trees. On the other hand, it means that a single disc > seek performs quite a lot of work. Yeah, I only did in-memory fitting tests and wondered about increased I/Os. However, I bet that even for bigger-than-RAM databases, the benefit of having to fan out to fewer pages still outweighs the over-general non-leaf nodes and might still result in fewer disk I/Os. I redid some earlier benchmarking with other datatypes with a 1kB block size and also multicolumn gist, and the multicolumn variant had an even greater benefit than the single column indexes, for both equality and range scans (execution times down to 20% of the original). If gist is important to you, I really recommend doing a test with 1kB blocks. regards, Yeb Havinga
On Mon, 22 Mar 2010, Yeb Havinga wrote: >> Yes, that is certainly a factor. For example, the page size for bioseg >> which we use here is 130 entries, which is very excessive, and doesn't >> allow very deep trees. On the other hand, it means that a single disc seek >> performs quite a lot of work. > Yeah, I only did in-memory fitting tests and wondered about increased io's. > However I bet that even for bigger than ram db's, the benefit of having to > fan out to less pages still outweighs the over-general non leaf nodes and > might still result in less disk io's. I redid some earlier benchmarking with > other datatypes with a 1kB block size and also multicolumn gist and the > multicolumn variant had an ever greater benefit than the single column > indexes, both equality and range scans. (Like execution times down to 20% of > original). If gist is important to you, I really recommend doing a test with > 1kB blocks. Purely from a disc seek count point of view, assuming an infinite CPU speed and infinite disc transfer rate, the larger the index pages the better. The number of seeks per fetch will be equivalent to the depth of the tree. If you take disc transfer rate into account, the break-even point is when you spend an equal time transferring as seeking, which places the page size around 500kB on a modern disc, assuming RAID stripe alignment doesn't make that into two seeks instead of one. However, for efficient CPU usage, the ideal page size for a tree index is much smaller - between two and ten entries, depending on the type of the data. There may be some mileage in reorganising indexes into a two-level system. That is, have an index format where the page size is 512kB or similar, but each page is internally a CPU-efficient tree itself. However, this is beyond the scope of the problem of speeding up gist. Matthew -- If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks
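The 500kB break-even figure above follows from equating seek time with transfer time; with assumed 2010-era disc numbers (~8 ms average seek, ~60 MB/s sustained transfer, both assumptions rather than measurements from this thread):

```python
seek_s = 0.008                 # ~8 ms average seek (assumed)
bytes_per_s = 60e6             # ~60 MB/s sustained transfer (assumed)

# Page size at which transfer time equals seek time: pages smaller than
# this spend more time seeking than transferring per fetch.
break_even_bytes = seek_s * bytes_per_s
print(round(break_even_bytes / 1024))   # ~469 kB, i.e. roughly 500kB
```

Faster transfer rates or slower seeks both push the break-even page size up, which is why the figure is quoted only as an order of magnitude.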