Обсуждение: A couple of proposed pgbench changes
Hi Tatsuo, Attached is a proposed patch that deals with a couple of pgbench issues. The change at line 490 updates doCustom's local variable "commands" after selecting a new file (command sequence). I think that the existing coding will cause the thing to use the first command of the old sequence in the remainder of the routine, which would be a bug. I have not tried to set up a test case to prove it, though. The other two changes cause doCustom to loop after processing a meta-command. This might be a bit controversial, but as the code is currently written, each meta-command "costs" one cycle of the outer select() loop. Thus, for example, with the default TPC-B script, once a backend returns "COMMIT" it will not receive a new command until four cycles of issuing commands to other backends have elapsed. (You can see this very easily by strace'ing pgbench under load.) ISTM it is bad to have backends sitting idle waiting for pgbench to give them a new command. On the other hand you could argue that finishing out several consecutive metacommands will delay issuance of new commands to other backends. In the test case I was running, making this change made for a small but noticeable improvement in total CPU usage, but I'm not entirely convinced it'd always be a win. I get the impression that pgbench itself is a bit of a bottleneck when running on multi-CPU machines. I can't get the total CPU usage to reach 90% with ten client threads on a dual HT-enabled Xeon, and the only explanation I can see is that there are too many backends waiting for pgbench to give them new commands instead of doing useful work. strace shows that each select() iteration usually finds *all ten* sockets read-ready, which is definitely bad. (I tried nice'ing pgbench to negative priority, but it did not improve matters. It could easily be that this is a Heisenproblem, though, with strace itself slowing pgbench enough to cause the behavior.) I wonder if it would help for pgbench to fork off multiple sub-processes and have each sub-process tend just one backend. Anyway, since I'm not sure of either of these changes, I'm passing them to you for review. regards, tom lane Index: pgbench.c =================================================================== RCS file: /cvsroot/pgsql/contrib/pgbench/pgbench.c,v retrieving revision 1.48 diff -c -r1.48 pgbench.c *** pgbench.c 23 Nov 2005 12:19:12 -0000 1.48 --- pgbench.c 29 Nov 2005 23:51:46 -0000 *************** *** 411,416 **** --- 411,417 ---- CState *st = &state[n]; Command **commands; + top: commands = sql_files[st->use_file]; if (st->listen) *************** *** 489,494 **** --- 490,496 ---- { st->state = 0; st->use_file = getrand(0, num_files - 1); + commands = sql_files[st->use_file]; } } *************** *** 572,577 **** --- 574,581 ---- free(val); st->listen = 1; } + + goto top; } }
> Hi Tatsuo, > > Attached is a proposed patch that deals with a couple of pgbench issues. > > The change at line 490 updates doCustom's local variable "commands" > after selecting a new file (command sequence). I think that the > existing coding will cause the thing to use the first command of the > old sequence in the remainder of the routine, which would be a bug. > I have not tried to set up a test case to prove it, though. Thanks for pointing it out. It's definitely a bug. To reproduce the problem, you have two SQL files, say, a.sql: SELECT 1; SELECT 2; b.sql: SELECT 300; SELECT 400; then run, pgbench -f a.sql -f b.sql and see the SQL trace by enabling log_statement=all, I see: LOG: statement: SELECT 300; LOG: statement: SELECT 400; LOG: statement: SELECT 300; LOG: statement: SELECT 2; LOG: statement: SELECT 1; LOG: statement: SELECT 400; LOG: statement: SELECT 300; LOG: statement: SELECT 400; LOG: statement: SELECT 300; LOG: statement: SELECT 2; LOG: statement: SELECT 1; : : #1 and #2 are ok, but #3(SELECT 300) and #4(SELECT 2) combo is supposed to be impossible if pgbench works correctly. > The other two changes cause doCustom to loop after processing a > meta-command. This might be a bit controversial, but as the code > is currently written, each meta-command "costs" one cycle of the > outer select() loop. Thus, for example, with the default TPC-B script, > once a backend returns "COMMIT" it will not receive a new command > until four cycles of issuing commands to other backends have elapsed. > (You can see this very easily by strace'ing pgbench under load.) > ISTM it is bad to have backends sitting idle waiting for pgbench to > give them a new command. On the other hand you could argue that > finishing out several consecutive metacommands will delay issuance of > new commands to other backends. In the test case I was running, > making this change made for a small but noticeable improvement in > total CPU usage, but I'm not entirely convinced it'd always be a win. Agreed. > I get the impression that pgbench itself is a bit of a bottleneck when > running on multi-CPU machines. I can't get the total CPU usage to reach > 90% with ten client threads on a dual HT-enabled Xeon, and the only > explanation I can see is that there are too many backends waiting for > pgbench to give them new commands instead of doing useful work. strace > shows that each select() iteration usually finds *all ten* sockets > read-ready, which is definitely bad. (I tried nice'ing pgbench to > negative priority, but it did not improve matters. It could easily be > that this is a Heisenproblem, though, with strace itself slowing pgbench > enough to cause the behavior.) I wonder if it would help for pgbench to > fork off multiple sub-processes and have each sub-process tend just one > backend. I'm not sure multiple sub-processes version of pgbench shows superior performance than current implementation because of process context switching overhead. Maybe threading is better? Mr. Yasuo Ohgaki implemented pthead version of pgbench. http://wiki.ohgaki.net/index.php?plugin=attach&pcmd=open&file=ppgbench-20050906-3.tar.bz2&refer=PostgreSQL%2Fppgbench You might want to try. > Anyway, since I'm not sure of either of these changes, I'm passing them > to you for review. Your patches seem fine. -- Tatsuo Ishii SRA OSS, Inc. Japan
I have commited your fixes into PostgreSQL 8.1 stable branches. -- Tatsuo Ishii SRA OSS, Inc. Japan > Hi Tatsuo, > > Attached is a proposed patch that deals with a couple of pgbench issues. > > The change at line 490 updates doCustom's local variable "commands" > after selecting a new file (command sequence). I think that the > existing coding will cause the thing to use the first command of the > old sequence in the remainder of the routine, which would be a bug. > I have not tried to set up a test case to prove it, though. > > The other two changes cause doCustom to loop after processing a > meta-command. This might be a bit controversial, but as the code > is currently written, each meta-command "costs" one cycle of the > outer select() loop. Thus, for example, with the default TPC-B script, > once a backend returns "COMMIT" it will not receive a new command > until four cycles of issuing commands to other backends have elapsed. > (You can see this very easily by strace'ing pgbench under load.) > ISTM it is bad to have backends sitting idle waiting for pgbench to > give them a new command. On the other hand you could argue that > finishing out several consecutive metacommands will delay issuance of > new commands to other backends. In the test case I was running, > making this change made for a small but noticeable improvement in > total CPU usage, but I'm not entirely convinced it'd always be a win. > > I get the impression that pgbench itself is a bit of a bottleneck when > running on multi-CPU machines. I can't get the total CPU usage to reach > 90% with ten client threads on a dual HT-enabled Xeon, and the only > explanation I can see is that there are too many backends waiting for > pgbench to give them new commands instead of doing useful work. strace > shows that each select() iteration usually finds *all ten* sockets > read-ready, which is definitely bad. (I tried nice'ing pgbench to > negative priority, but it did not improve matters. It could easily be > that this is a Heisenproblem, though, with strace itself slowing pgbench > enough to cause the behavior.) I wonder if it would help for pgbench to > fork off multiple sub-processes and have each sub-process tend just one > backend. > > Anyway, since I'm not sure of either of these changes, I'm passing them > to you for review. > > regards, tom lane > > > Index: pgbench.c > =================================================================== > RCS file: /cvsroot/pgsql/contrib/pgbench/pgbench.c,v > retrieving revision 1.48 > diff -c -r1.48 pgbench.c > *** pgbench.c 23 Nov 2005 12:19:12 -0000 1.48 > --- pgbench.c 29 Nov 2005 23:51:46 -0000 > *************** > *** 411,416 **** > --- 411,417 ---- > CState *st = &state[n]; > Command **commands; > > + top: > commands = sql_files[st->use_file]; > > if (st->listen) > *************** > *** 489,494 **** > --- 490,496 ---- > { > st->state = 0; > st->use_file = getrand(0, num_files - 1); > + commands = sql_files[st->use_file]; > } > } > > *************** > *** 572,577 **** > --- 574,581 ---- > free(val); > st->listen = 1; > } > + > + goto top; > } > } > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >