Thread: Index tuple killing code committed
Per previous discussion, I have committed changes that cause the btree and hash index methods to mark index tuples "killed" the first time they are fetched after becoming globally dead. Subsequently the killed entries are not returned out of indexscans, saving useless heap fetches. (I haven't changed rtree and gist yet; they will need some internal restructuring to do this efficiently. Perhaps Oleg or Teodor would like to take that on.)

This seems to make a useful improvement in pgbench results. Yesterday's CVS tip gave me these results:

(Running the postmaster with "-i -F -B 1024", other parameters at defaults, and pgbench initialized with "pgbench -i -s 10 bench")

    $ time pgbench -c 5 -t 1000 -n bench
    tps = 26.428787 (including connections establishing)
    tps = 26.443410 (excluding connections establishing)
    real 3:09.74
    $ time pgbench -c 5 -t 1000 -n bench
    tps = 18.838304 (including connections establishing)
    tps = 18.846281 (excluding connections establishing)
    real 4:26.41
    $ time pgbench -c 5 -t 1000 -n bench
    tps = 13.541641 (including connections establishing)
    tps = 13.545646 (excluding connections establishing)
    real 6:10.19

Note the "-n" switches here to prevent vacuums between runs; the point is to observe the degradation as more and more dead tuples accumulate.

With the just-committed changes I get (starting from a fresh start):

    $ time pgbench -c 5 -t 1000 -n bench
    tps = 28.393271 (including connections establishing)
    tps = 28.410117 (excluding connections establishing)
    real 2:56.53
    $ time pgbench -c 5 -t 1000 -n bench
    tps = 23.498645 (including connections establishing)
    tps = 23.510134 (excluding connections establishing)
    real 3:33.89
    $ time pgbench -c 5 -t 1000 -n bench
    tps = 18.773239 (including connections establishing)
    tps = 18.780936 (excluding connections establishing)
    real 4:26.84

The remaining degradation is actually in seqscan performance, not indexscan --- unless one uses a much larger -s setting, the planner will think it ought to use seqscans for updating the "branches" and "tellers" tables, since those nominally have just a few rows; and there's no way to avoid scanning lots of dead tuples in a seqscan. Forcing indexscans helps some in the former CVS tip:

    $ PGOPTIONS="-fs" time pgbench -c 5 -t 1000 -n bench
    tps = 28.840678 (including connections establishing)
    tps = 28.857442 (excluding connections establishing)
    real 2:53.9
    $ PGOPTIONS="-fs" time pgbench -c 5 -t 1000 -n bench
    tps = 25.670674 (including connections establishing)
    tps = 25.684493 (excluding connections establishing)
    real 3:15.7
    $ PGOPTIONS="-fs" time pgbench -c 5 -t 1000 -n bench
    tps = 22.593429 (including connections establishing)
    tps = 22.603928 (excluding connections establishing)
    real 3:42.7

and with the changes I get:

    $ PGOPTIONS=-fs time pgbench -c 5 -t 1000 -n bench
    tps = 29.445004 (including connections establishing)
    tps = 29.463948 (excluding connections establishing)
    real 2:50.3
    $ PGOPTIONS=-fs time pgbench -c 5 -t 1000 -n bench
    tps = 30.277968 (including connections establishing)
    tps = 30.301363 (excluding connections establishing)
    real 2:45.6
    $ PGOPTIONS=-fs time pgbench -c 5 -t 1000 -n bench
    tps = 30.209377 (including connections establishing)
    tps = 30.230646 (excluding connections establishing)
    real 2:46.0

This is the first time I have ever seen repeated pgbench runs without substantial performance degradation. Not a bad result for a Friday afternoon...

regards, tom lane
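To make the mechanism concrete, here is a minimal C sketch of the idea. It is illustrative only, not the actual PostgreSQL btree/hash code: the types and the helpers next_matching_item and heap_tuple_globally_dead are hypothetical stand-ins for the real access-method internals.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { unsigned block; unsigned offset; } ItemPointer;
    typedef struct IndexScan IndexScan;          /* opaque for this sketch */

    typedef struct IndexItem {
        bool        killed;     /* set once the heap tuple is globally dead */
        ItemPointer heap_tid;   /* where the heap tuple lives */
    } IndexItem;

    /* Hypothetical helpers standing in for the real internals. */
    extern IndexItem *next_matching_item(IndexScan *scan);
    extern bool heap_tuple_globally_dead(ItemPointer tid);

    /* Return the next index entry pointing at a possibly-live tuple.
     * The first scan that finds a globally dead tuple pays for the heap
     * fetch once and sets the kill flag; every later scan skips the
     * entry without touching the heap at all. */
    IndexItem *index_scan_next(IndexScan *scan)
    {
        IndexItem *item;

        while ((item = next_matching_item(scan)) != NULL)
        {
            if (item->killed)
                continue;                 /* cheap skip, no heap fetch */
            if (heap_tuple_globally_dead(item->heap_tid))
            {
                item->killed = true;      /* remember for future scans */
                continue;
            }
            return item;
        }
        return NULL;
    }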
> This is the first time I have ever seen repeated pgbench runs without
> substantial performance degradation. Not a bad result for a Friday
> afternoon...

Congratulations :-) This sounds great!!!

Andreas
Tom Lane wrote:
> The remaining degradation is actually in seqscan performance, not
> indexscan --- unless one uses a much larger -s setting, the planner will
> think it ought to use seqscans for updating the "branches" and "tellers"
> tables, since those nominally have just a few rows; and there's no way
> to avoid scanning lots of dead tuples in a seqscan. Forcing indexscans
> helps some in the former CVS tip:

This may qualify as a "way out there" idea, or more trouble than it's worth, but what about a table option which provides a bitmap index of tuple status -- i.e. tuple dead t/f? If available, a seqscan in between vacuums could maybe gain some of the same efficiency.

> This is the first time I have ever seen repeated pgbench runs without
> substantial performance degradation. Not a bad result for a Friday
> afternoon...

Nice work!

Joe
On Sat, 2002-05-25 at 02:38, Joe Conway wrote:
> Tom Lane wrote:
> > The remaining degradation is actually in seqscan performance, not
> > indexscan [...]
>
> This may qualify as a "way out there" idea, or more trouble than it's
> worth, but what about a table option which provides a bitmap index of
> tuple status -- i.e. tuple dead t/f? If available, a seqscan in between
> vacuums could maybe gain some of the same efficiency.

I guess this would only be useful if it is a bitmap of dead _pages_, not tuples (page reading is what is most expensive, plus there is no way to know how many tuples are on a page), but for worst cases (a small table with lots of updates) this could be a great thing that can postpone fixing the optimiser to account for dead tuples.

One 8K page can hold bits for 8192*8 = 65536 pages = 512 Mbytes, and if a seqscan could skip the first 500 of them it would definitely be worth it ;)

> This is the first time I have ever seen repeated pgbench runs without
> substantial performance degradation. Not a bad result for a Friday
> afternoon...

Really good news!

-----------
Hannu
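Hannu's arithmetic checks out: one bit per heap page means a single 8K bitmap page covers 8192*8 = 65536 heap pages, i.e. 65536 * 8 KB = 512 MB of table. A minimal C sketch of such a dead-page bitmap follows; it is purely hypothetical (no such structure existed in the backend, and the names are invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE 8192
    /* One bitmap page covers PAGE_SIZE * 8 = 65536 heap pages,
     * and 65536 pages * 8 KB/page = 512 MB of heap. */
    #define HEAP_PAGES_PER_BITMAP_PAGE (PAGE_SIZE * 8)

    typedef struct {
        uint8_t bits[PAGE_SIZE];   /* one bit per heap page: 1 = all dead */
    } DeadPageBitmap;

    static bool page_is_all_dead(const DeadPageBitmap *bm, uint32_t pageno)
    {
        return (bm->bits[pageno / 8] >> (pageno % 8)) & 1;
    }

    static void mark_page_all_dead(DeadPageBitmap *bm, uint32_t pageno)
    {
        bm->bits[pageno / 8] |= (uint8_t) (1 << (pageno % 8));
    }

A seqscan would then test page_is_all_dead(bm, pageno) before reading each heap page and skip the I/O entirely when the bit is set.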
Joe Conway <mail@joeconway.com> writes:
> This may qualify as a "way out there" idea, or more trouble than it's
> worth, but what about a table option which provides a bitmap index of
> tuple status -- i.e. tuple dead t/f? If available, a seqscan in between
> vacuums could maybe gain some of the same efficiency.

Hmm. I'm inclined to think that a separate bitmap index wouldn't be worth the trouble. Under most scenarios it'd just require extra I/O and not buy much.

However ... we could potentially take over the LP_DELETED flag bit of heap tuples for the same use as for index tuples: set it when the tuple is known dead for all transactions. This would save calling HeapTupleSatisfiesSnapshot in the inner loop of heapgettup, while not adding much expense for the normal case where the tuple's not dead.

regards, tom lane
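The saving Tom describes comes purely from test ordering: check a one-bit flag before running the full MVCC visibility test. A hedged C sketch of the shape of that inner-loop change (the helper names are invented for illustration; HeapTupleSatisfiesSnapshot is the real function being avoided, but it is not reimplemented here):

    #include <stdbool.h>

    typedef struct HeapTupleData HeapTupleData;   /* opaque for this sketch */
    typedef struct SnapshotData  SnapshotData;    /* opaque for this sketch */

    /* Hypothetical helpers for this sketch. */
    extern bool tuple_flagged_dead(const HeapTupleData *tup);    /* one-bit test */
    extern void flag_tuple_dead(HeapTupleData *tup);
    extern bool tuple_satisfies_snapshot(HeapTupleData *tup,
                                         SnapshotData *snap);    /* expensive */
    extern bool tuple_dead_for_all_xacts(const HeapTupleData *tup);

    bool seqscan_tuple_visible(HeapTupleData *tup, SnapshotData *snap)
    {
        /* Cheap test first: a tuple already known globally dead never
         * needs the full visibility check again. */
        if (tuple_flagged_dead(tup))
            return false;

        if (!tuple_satisfies_snapshot(tup, snap))
        {
            /* Remember a permanent verdict, mirroring the index kill bits. */
            if (tuple_dead_for_all_xacts(tup))
                flag_tuple_dead(tup);
            return false;
        }
        return true;
    }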
I found the following mail in my mail archive and tried the same test with 7.4 current. Contrary to my expectation, 7.4 showed some performance degradation with continuous pgbench runs:

    $ pgbench -c 5 -t 1000 -n test
    tps = 57.444037 (including connections establishing)
    tps = 57.455300 (excluding connections establishing)

    $ pgbench -c 5 -t 1000 -n test
    tps = 54.125785 (including connections establishing)
    tps = 54.134871 (excluding connections establishing)

    $ pgbench -c 5 -t 1000 -n test
    tps = 51.116465 (including connections establishing)
    tps = 51.124878 (excluding connections establishing)

    $ pgbench -c 5 -t 1000 -n test
    tps = 50.410659 (including connections establishing)
    tps = 50.420215 (excluding connections establishing)

    $ pgbench -c 5 -t 1000 test
    tps = 46.791980 (including connections establishing)
    tps = 46.799837 (excluding connections establishing)

Any idea? The data is initialized by "pgbench -i -s 10".
--
Tatsuo Ishii

> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: pgsql-hackers@postgresql.org
> Date: Fri, 24 May 2002 16:42:55 -0400
>
> Per previous discussion, I have committed changes that cause the btree
> and hash index methods to mark index tuples "killed" the first time they
> are fetched after becoming globally dead. Subsequently the killed
> entries are not returned out of indexscans, saving useless heap fetches.
> [rest of the quoted message is identical to the one at the top of this
> thread and has been trimmed]
Forgot to mention an important thing: I set enable_seqscan = off in postgresql.conf.

> I found the following mail in my mail archive and tried the same test with
> 7.4 current. Contrary to my expectation, 7.4 showed some performance
> degradation with continuous pgbench runs:
> [results and the re-quoted 2002 message trimmed; see the previous mail]
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I found the following mail in my mail archive and tried the same test with
> 7.4 current. Contrary to my expectation, 7.4 showed some performance
> degradation with continuous pgbench runs:

I did some profiling and found that with repeated runs, an increasing amount of time was being spent in _bt_check_unique, verifying that updates aren't inserting duplicate keys. This is not real surprising when running without vacuums between runs --- the number of dead tuples will increase from run to run. I am not sure why the effect was not visible in my old post that you quote; perhaps it was masked by some other inefficiency and is now visible in 7.4.

I was able to reduce the cost somewhat by changing the order of tests in _bt_check_unique. With CVS tip, using default postmaster settings (except fsync off) and "pgbench -i -s 10 bench" followed by a manual CHECKPOINT command, I get this for repeated runs of

    PGOPTIONS=-fs time pgbench -c 5 -t 1000 -n bench

    tps = 113.387182 (including connections establishing)
    tps = 113.433897 (excluding connections establishing)
    0.95user 0.86system 0:44.51elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 72.989851 (including connections establishing)
    tps = 73.010209 (excluding connections establishing)
    0.86user 0.89system 1:09.17elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 88.848571 (including connections establishing)
    tps = 88.878369 (excluding connections establishing)
    0.82user 0.92system 0:57.19elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 89.156320 (including connections establishing)
    tps = 89.186299 (excluding connections establishing)
    0.88user 0.92system 0:57.08elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 92.708733 (including connections establishing)
    tps = 92.742335 (excluding connections establishing)
    0.85user 0.83system 0:54.76elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 93.533412 (including connections establishing)
    tps = 93.566385 (excluding connections establishing)
    0.92user 0.87system 0:54.33elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 94.987538 (including connections establishing)
    tps = 95.021715 (excluding connections establishing)
    0.91user 0.88system 0:53.44elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 98.119179 (including connections establishing)
    tps = 98.157492 (excluding connections establishing)
    0.93user 0.94system 0:51.58elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 108.404186 (including connections establishing)
    tps = 108.448242 (excluding connections establishing)
    0.85user 0.90system 0:47.30elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 102.725845 (including connections establishing)
    tps = 102.767392 (excluding connections establishing)
    0.91user 0.90system 0:49.93elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

Not sure why the apparent up-trend in performance after the second run. Maybe we've done about as much btree index splitting as we need after a few runs? Odd.
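The fix Tom describes is a classic cheap-test-first reordering. A C sketch of the shape of the change (illustrative only; this is not the real _bt_check_unique, and the helper names are invented):

    #include <stdbool.h>

    typedef struct IndexEntry IndexEntry;   /* opaque for this sketch */

    /* Hypothetical helpers, ordered cheapest to most expensive. */
    extern bool entry_key_equal(const IndexEntry *e, const void *key);
    extern bool entry_marked_dead(const IndexEntry *e);   /* kill-bit test */
    extern bool heap_tuple_alive(const IndexEntry *e);    /* heap fetch */

    /* Returns true if inserting 'key' would duplicate a live tuple.
     * The point of the reordering: reject entries already known dead
     * via the cheap flag test before paying for a heap fetch, so that
     * accumulated dead entries cost as little as possible per check. */
    bool would_duplicate(IndexEntry **entries, int n, const void *key)
    {
        for (int i = 0; i < n; i++)
        {
            if (!entry_key_equal(entries[i], key))
                continue;
            if (entry_marked_dead(entries[i]))
                continue;                    /* cheap reject, no heap visit */
            if (heap_tuple_alive(entries[i]))
                return true;                 /* expensive check last */
        }
        return false;
    }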
Same conditions, but with a vacuum between runs, ie no "-n":

    tps = 112.266969 (including connections establishing)
    tps = 112.314017 (excluding connections establishing)
    0.91user 0.89system 0:44.62elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 77.936183 (including connections establishing)
    tps = 77.959810 (excluding connections establishing)
    0.90user 0.92system 1:07.20elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 81.669010 (including connections establishing)
    tps = 81.694586 (excluding connections establishing)
    0.88user 0.87system 1:02.23elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 86.493578 (including connections establishing)
    tps = 86.522430 (excluding connections establishing)
    0.87user 0.83system 0:59.61elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 100.759980 (including connections establishing)
    tps = 100.797922 (excluding connections establishing)
    0.90user 0.89system 0:50.10elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 77.215387 (including connections establishing)
    tps = 77.238131 (excluding connections establishing)
    0.95user 0.88system 1:06.35elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 83.298182 (including connections establishing)
    tps = 83.324866 (excluding connections establishing)
    0.88user 0.89system 1:01.23elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 90.534451 (including connections establishing)
    tps = 90.566022 (excluding connections establishing)
    0.95user 0.86system 0:56.40elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 84.085577 (including connections establishing)
    tps = 84.113060 (excluding connections establishing)
    0.93user 0.89system 1:05.35elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 95.637302 (including connections establishing)
    tps = 95.672293 (excluding connections establishing)
    0.92user 0.85system 0:52.80elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k

Same with vacuum and checkpoint between runs:

    tps = 107.715226 (including connections establishing)
    tps = 107.758258 (excluding connections establishing)
    0.86user 0.88system 0:46.51elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 78.816549 (including connections establishing)
    tps = 78.839575 (excluding connections establishing)
    0.87user 0.87system 1:03.68elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 90.150489 (including connections establishing)
    tps = 90.184264 (excluding connections establishing)
    0.90user 0.89system 0:55.74elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 81.648951 (including connections establishing)
    tps = 81.674276 (excluding connections establishing)
    0.93user 0.94system 1:01.54elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 86.850517 (including connections establishing)
    tps = 86.879127 (excluding connections establishing)
    0.84user 0.88system 0:57.86elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 85.776988 (including connections establishing)
    tps = 85.809491 (excluding connections establishing)
    0.83user 0.94system 0:58.53elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 80.293178 (including connections establishing)
    tps = 80.317318 (excluding connections establishing)
    0.90user 0.86system 1:02.49elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 80.691960 (including connections establishing)
    tps = 80.716409 (excluding connections establishing)
    0.84user 0.93system 1:02.19elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 85.067751 (including connections establishing)
    tps = 85.095110 (excluding connections establishing)
    0.92user 0.91system 0:59.02elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    tps = 80.134553 (including connections establishing)
    tps = 80.158595 (excluding connections establishing)
    0.87user 0.97system 1:02.61elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k

It appears that much of the run-to-run variability is accounted for by background checkpoints; forcing a checkpoint between runs helps but doesn't eliminate the noise. (Probably if I'd changed the checkpoint frequency settings to prevent intra-run checkpoints from happening, I could get stable numbers. I'm out of time to play with it for now though.)

regards, tom lane
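For anyone repeating the experiment, the "checkpoint frequency settings" Tom mentions are ordinary postgresql.conf GUCs of that era. Something along these lines should keep checkpoints out of the measurement window (the values are illustrative assumptions, not anything Tom stated):

    # postgresql.conf -- illustrative values only
    checkpoint_segments = 32    # more WAL segments between checkpoints
    checkpoint_timeout = 1800   # seconds between forced checkpoints

Combined with the manual CHECKPOINT between runs that Tom already issues, each run then starts from a clean slate with no checkpoint likely to fire mid-run.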