Обсуждение: [HACKERS] TAP tests take a long time

Поиск
Список
Период
Сортировка

[HACKERS] TAP tests take a long time

От
Andrew Dunstan
Дата:
I've complained about this before. Below are some timings from buildfarm
member nightjar as I test out the new client code.

This buildfarm run as you can see takes 33m32s, and the Tap tests take a
combined 19m52s of that time. That seems quite an inordinate amount of
time, when checking out the code, configuring, building and running all
the other checks in two locales in many cases consumes 13m40s. If people
want to incorporate these into the check-world targets and then
encourage developers to run check-world, there will be a massive fail if
you can take a shower and make and drink a cup of coffee in the time it
takes to run it.


cheers


andrew


Tue Apr 11 10:49:18 2017: buildfarm run for nightjar:HEAD starting
[10:49:18] checking out source ...
[10:49:34] checking if build run needed ...
[10:49:34] creating vpath build dir pgsql.build ...
[10:49:34] running configure ...
[10:49:59] running make ...
[10:50:43] running make check ...
[10:51:47] running make contrib ...
[10:51:53] running make testmodules ...
[10:51:54] running make install ...
[10:52:01] running make contrib install ...
[10:52:03] running make testmodules install ...
[10:52:03] checking pg_upgrade
[10:54:17] running make bin installcheck ...
[11:04:04] running make misc checks ...
[11:14:09] setting up db cluster (C)...
[11:14:15] starting db (C)...
[11:14:16] running make installcheck (C)...
[11:15:31] restarting db (C)...
[11:15:36] running make isolation check ...
[11:17:24] restarting db (C)...
[11:17:28] running make PL installcheck (C)...
[11:17:36] restarting db (C)...
[11:17:38] running make contrib installcheck (C)...
[11:18:24] restarting db (C)...
[11:18:27] running make test-modules installcheck (C)...
[11:18:39] stopping db (C)...
[11:18:56] setting up db cluster (cs_CZ.UTF-8)...
[11:19:01] starting db (cs_CZ.UTF-8)...
[11:19:02] running make installcheck (cs_CZ.UTF-8)...
[11:20:03] restarting db (cs_CZ.UTF-8)...
[11:20:08] running make PL installcheck (cs_CZ.UTF-8)...
[11:20:20] restarting db (cs_CZ.UTF-8)...
[11:20:22] running make contrib installcheck (cs_CZ.UTF-8)...
[11:21:06] restarting db (cs_CZ.UTF-8)...
[11:21:08] running make test-modules installcheck (cs_CZ.UTF-8)...
[11:21:24] stopping db (cs_CZ.UTF-8)...
[11:21:42] running make ecpg check ...
[11:22:12] running find_typedefs ...
[11:22:50] OK


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

От
Robert Haas
Дата:
On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> I've complained about this before. Below are some timings from buildfarm
> member nightjar as I test out the new client code.
>
> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
> combined 19m52s of that time. That seems quite an inordinate amount of
> time, when checking out the code, configuring, building and running all
> the other checks in two locales in many cases consumes 13m40s. If people
> want to incorporate these into the check-world targets and then
> encourage developers to run check-world, there will be a massive fail if
> you can take a shower and make and drink a cup of coffee in the time it
> takes to run it.

I don't think it's quite fair to complain about the TAP tests taking a
long time to run as a general matter.  Many people here have long
wanted a separate test-suite for long running tests that can be run to
really check everything possible, and apparently, TAP tests are it.

What I think would be more useful is to break down where the time is
getting spent.  It may be that some of those tests are not adding
value proportionate to their runtime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] TAP tests take a long time

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>> combined 19m52s of that time.

> I don't think it's quite fair to complain about the TAP tests taking a
> long time to run as a general matter.  Many people here have long
> wanted a separate test-suite for long running tests that can be run to
> really check everything possible, and apparently, TAP tests are it.

> What I think would be more useful is to break down where the time is
> getting spent.  It may be that some of those tests are not adding
> value proportionate to their runtime.

The other thing that might be useful here is to push on parallelizing
buildfarm runs.  Admittedly that will do nothing for the oldest and
slowest buildfarm critters, but for reasonably modern hardware the
serialization of the tests really handicaps you.  We seem to have
fixed that for manual application of "make check-world", at least
if you know the right magic incantations to parallelize it; but
AFAIK the buildfarm script is entirely serial.

BTW, speaking of "value proportionate to runtime", the hash_index
regression script is now visibly a bottleneck in the core regression
tests.  Could we take another look at whether that needs to run
quite so long?
        regards, tom lane



Re: [HACKERS] TAP tests take a long time

От
Peter Eisentraut
Дата:
On 4/11/17 12:08, Tom Lane wrote:
> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.

Especially the recovery and subscription tests do a lot of waiting for
one of the nodes to catch up.  You get by that by running several of
these tests in parallel, which is well supported now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] TAP tests take a long time

От
Andrew Dunstan
Дата:

On 04/11/2017 12:08 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.



Well, you can get a lot of information on timings in crake's latest
builds for example, or nightjar's.

Here's an interesting fact: almost all the time taken up in the scripts
test set seems to be consumed in running initdb over and over. The
actual tests take almost no time at all. Other test sets might have
similar stats, I am still digging.

+1 for doing some assessments on test value, and also for possibly
creating several test categories, based on both speed and criticality. I
could imaging having some very long running tests, using TAP or
pg_regress,  and optionally enabling them in a buildfarm critter.

I'm not at all opposed to having TAP tests - I have just put in quite a
bit of time extending and improving buildfarm coverage of them. And I'm
probably about to do some more work to enable similar functionality for
MSVC.


> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.  We seem to have
> fixed that for manual application of "make check-world", at least
> if you know the right magic incantations to parallelize it; but
> AFAIK the buildfarm script is entirely serial.


Yeah, it is, and changing that won't be simple. Say 3 steps run in
parallel fail? What do we do? I guess we can just pick the first (by
some definition) and discard the others. Parallelizing individual steps
might be simpler, I'd have to take a look.

Honestly, I'm not that concerned about how long it takes the buildfarm
client per se to run, although I am working on facilities for it where
timing will matter to some people.

And anyone who is concerned about how long it takes on their machine can
run without --enable-tap-tests.

Not sure what the magic incantation is other than "make -j nn"?

One reason I haven't supported "make -j nn" everywhere (it is supported
for making core, contrib and test_modules) is that the output tends to
get rather jumbled, and I didn't want to impose that extra burden in
people trying to decipher the results.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> The other thing that might be useful here is to push on parallelizing
>> buildfarm runs.

> One reason I haven't supported "make -j nn" everywhere (it is supported
> for making core, contrib and test_modules) is that the output tends to
> get rather jumbled, and I didn't want to impose that extra burden in
> people trying to decipher the results.

Agreed, that would be a mess.  I was thinking in terms of running steps in
parallel if they have independent output log files, so that that problem
wouldn't arise.  AFAIK, for example, we could run the per-subdirectory
"make check" tests in src/bin/ in parallel without incurring any
legibility issues.
        regards, tom lane



Re: [HACKERS] TAP tests take a long time

От
Andres Freund
Дата:
On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> >> The other thing that might be useful here is to push on parallelizing
> >> buildfarm runs.
> 
> > One reason I haven't supported "make -j nn" everywhere (it is supported
> > for making core, contrib and test_modules) is that the output tends to
> > get rather jumbled, and I didn't want to impose that extra burden in
> > people trying to decipher the results.
> 
> Agreed, that would be a mess.

Doesn't make's -Otarget largely solve this?

Andres



Re: [HACKERS] TAP tests take a long time

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
>> Agreed, that would be a mess.

> Doesn't make's -Otarget largely solve this?

Given a sufficiently new make (which would include exactly zero of
my own buildfarm critters), that would help.  I think it does nothing
much for "prove -j" though, and certainly not for MSVC.
        regards, tom lane



Re: [HACKERS] TAP tests take a long time

От
Andrew Dunstan
Дата:

On 04/11/2017 03:58 PM, Andres Freund wrote:
> On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
>> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>>>> The other thing that might be useful here is to push on parallelizing
>>>> buildfarm runs.
>>> One reason I haven't supported "make -j nn" everywhere (it is supported
>>> for making core, contrib and test_modules) is that the output tends to
>>> get rather jumbled, and I didn't want to impose that extra burden in
>>> people trying to decipher the results.
>> Agreed, that would be a mess.
> Doesn't make's -Otarget largely solve this?
>


Possibly, but I have just gone to a not inconsiderable amount of trouble
to disaggregate the TAP tests so we get not only the output but all the
logs for each step separately. Say there's a problem in pg_rewind.
Compare this

<https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2017-04-11%2017%3A54%3A34&stg=pg_rewind-installcheck>
with having to pick apart this:
<https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2017-04-08%2010%3A37%3A38&stg=bin-check>
So I don't really want the output from these run in parallel to be run
all together, even if they aren't jumbled.

Now it might make sense for other pieces to be run like that. I will
have a look. But the ones that are taking the most time are ones that
seem least suitable.

Afterthought: It might be possible to make it a runtime option.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

От
Craig Ringer
Дата:


On 12 Apr. 2017 03:14, "Andrew Dunstan" <andrew.dunstan@2ndquadrant.com> wrote:


On 04/11/2017 12:08 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.



Well, you can get a lot of information on timings in crake's latest
builds for example, or nightjar's.

Here's an interesting fact: almost all the time taken up in the scripts
test set seems to be consumed in running initdb over and over.


Is it worth adding an init(cache => 1) option to PostgresNode, where we stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ? 

Even default to caching and allow an option to disable the cached copy.

We're going to need to initdb a lot in the TAP tests. We often need a clean state. Tests also get harder to debug the more you pack into a single test run and more difficult to run individually to test some specific failure. So IMO the best thing is to try to make that repeat initdb as fast as possible.

It reduces our coverage of initdb only incredibly slightly - all that repeat runs will do is help find very uncommon intermittent failures. And we rerun the buildfarm all the time so it's not like there's a shortage of initdb runs anyway.

We should also have a debug --no-fsync option for initdb, or maybe allow it to take -o options to pass to child postgres so we can pass fsync=off .

Re: [HACKERS] TAP tests take a long time

От
Michael Paquier
Дата:
On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:
> Well, you can get a lot of information on timings in crake's latest
> builds for example, or nightjar's.
>
> Here's an interesting fact: almost all the time taken up in the scripts
> test set seems to be consumed in running initdb over and over.
>
> Is it worth adding an init(cache => 1) option to PostgresNode, where we
> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?

This looks like a good idea to me, but I don't like much the idea of
an option in the init() routine as that's hard to maintain. It would
make sense to me to be able to override the initialization with an
environment variable instead, or just make it the default and get the
base image stored in tmp_install/. Base backups are out of scope
though.

> It reduces our coverage of initdb only incredibly slightly - all that repeat
> runs will do is help find very uncommon intermittent failures. And we rerun
> the buildfarm all the time so it's not like there's a shortage of initdb
> runs anyway.

initdb is always run with the same set of options when init() is
called, so this is not something to worry about. And tests inherent to
initdb should happen in src/bin/initdb.

> We should also have a debug --no-fsync option for initdb, or maybe allow it
> to take -o options to pass to child postgres so we can pass fsync=off .

That's an idea as well...
-- 
Michael



Re: [HACKERS] TAP tests take a long time

От
Craig Ringer
Дата:
On 12 April 2017 at 08:22, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
> <craig.ringer@2ndquadrant.com> wrote:
>>
>> We should also have a debug --no-fsync option for initdb, or maybe allow it
>> to take -o options to pass to child postgres so we can pass fsync=off .
>
> That's an idea as well...

OK, looking at this, initdb already runs postgres -F then does its own
fsync()s at the end.

It already has a -N / --no-sync option too. Which is already used by
PostgresNode.pm . So as much as can be done has already been done
here.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training
&Services
 



Re: [HACKERS] TAP tests take a long time

От
Andrew Dunstan
Дата:

On 04/11/2017 08:12 PM, Craig Ringer wrote:
>
>
> On 12 Apr. 2017 03:14, "Andrew Dunstan"
> <andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>> wrote:
>
>
>
>     On 04/11/2017 12:08 PM, Tom Lane wrote:
>     > Robert Haas <robertmhaas@gmail.com
>     <mailto:robertmhaas@gmail.com>> writes:
>     >> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>     >> <andrew.dunstan@2ndquadrant.com
>     <mailto:andrew.dunstan@2ndquadrant.com>> wrote:
>     >>> This buildfarm run as you can see takes 33m32s, and the Tap
>     tests take a
>     >>> combined 19m52s of that time.
>     >> I don't think it's quite fair to complain about the TAP tests
>     taking a
>     >> long time to run as a general matter.  Many people here have long
>     >> wanted a separate test-suite for long running tests that can be
>     run to
>     >> really check everything possible, and apparently, TAP tests are it.
>     >> What I think would be more useful is to break down where the
>     time is
>     >> getting spent.  It may be that some of those tests are not adding
>     >> value proportionate to their runtime.
>
>
>
>     Well, you can get a lot of information on timings in crake's latest
>     builds for example, or nightjar's.
>
>     Here's an interesting fact: almost all the time taken up in the
>     scripts
>     test set seems to be consumed in running initdb over and over.
>
>
>
> Is it worth adding an init(cache => 1) option to PostgresNode, where
> we stash an initdb'd db in tmp_check/  and just do a simple fs copy of
> it ? 
>
> Even default to caching and allow an option to disable the cached copy.
>
> We're going to need to initdb a lot in the TAP tests. We often need a
> clean state. Tests also get harder to debug the more you pack into a
> single test run and more difficult to run individually to test some
> specific failure. So IMO the best thing is to try to make that repeat
> initdb as fast as possible.
>
> It reduces our coverage of initdb only incredibly slightly - all that
> repeat runs will do is help find very uncommon intermittent failures.
> And we rerun the buildfarm all the time so it's not like there's a
> shortage of initdb runs anyway.
>
> We should also have a debug --no-fsync option for initdb, or maybe
> allow it to take -o options to pass to child postgres so we can pass
> fsync=off .


Absolutely worth it I should say.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

От
Andres Freund
Дата:
On 2017-04-12 10:39:22 +0800, Craig Ringer wrote:
> On 12 April 2017 at 08:22, Michael Paquier <michael.paquier@gmail.com> wrote:
> > On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
> > <craig.ringer@2ndquadrant.com> wrote:
> >>
> >> We should also have a debug --no-fsync option for initdb, or maybe allow it
> >> to take -o options to pass to child postgres so we can pass fsync=off .
> >
> > That's an idea as well...
> 
> OK, looking at this, initdb already runs postgres -F then does its own
> fsync()s at the end.
> 
> It already has a -N / --no-sync option too. Which is already used by
> PostgresNode.pm . So as much as can be done has already been done
> here.

The time in initdb is largely spent in regprocin and server start/stop,
so that doesn't surprise me.

- Andres



Re: [HACKERS] TAP tests take a long time

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> The time in initdb is largely spent in regprocin and server start/stop,
> so that doesn't surprise me.

Yeah.  We had a plan for removing the regprocin overhead via preprocessing
of the .bki file, but I've not heard anything about that project for
months :-(
        regards, tom lane



Re: [HACKERS] TAP tests take a long time

От
Craig Ringer
Дата:
On 12 April 2017 at 08:22, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
> <craig.ringer@2ndquadrant.com> wrote:
>> Well, you can get a lot of information on timings in crake's latest
>> builds for example, or nightjar's.
>>
>> Here's an interesting fact: almost all the time taken up in the scripts
>> test set seems to be consumed in running initdb over and over.
>>
>> Is it worth adding an init(cache => 1) option to PostgresNode, where we
>> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?
>
> This looks like a good idea to me, but I don't like much the idea of
> an option in the init() routine as that's hard to maintain.

How so?

> It would
> make sense to me to be able to override the initialization with an
> environment variable instead

Yeah, that's reasonable.

Patch attached. It supports setting CACHE_INITDB=0 to disable the cache.

With cache: 3m55.063s (user 7.7s, sys 0m11.833s)

Without cache:  4m11.229s (user 0m16.963s, sys 0m12.384s)

so in a serial run it doesn't make a ton of difference.

Adding some timing on initdb runs shows that initdb takes about 1s, a
cached copy about 0.1s with our Perl based recursive copy code. So
it's hardly an overwhelming benefit, but might be more beneficial with
prove -j .


BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
collect more data from the buildfarm on what takes a while. Sample
output:

[13:20:47] t/001_stream_rep.pl .................. ok    50445 ms (
0.01 usr  0.00 sys +  2.16 cusr  3.57 csys =  5.74 CPU)
[13:21:38] t/002_archiving.pl ................... ok     5883 ms (
0.01 usr  0.00 sys +  0.19 cusr  0.45 csys =  0.65 CPU)
[13:21:44] t/003_recovery_targets.pl ............ ok    26197 ms (
0.00 usr  0.00 sys +  0.70 cusr  1.75 csys =  2.45 CPU)
[13:22:10] t/004_timeline_switch.pl ............. ok    10049 ms (
0.01 usr  0.00 sys +  0.24 cusr  0.53 csys =  0.78 CPU)
[13:22:20] t/005_replay_delay.pl ................ ok    10224 ms (
0.00 usr  0.01 sys +  0.18 cusr  0.43 csys =  0.62 CPU)
[13:22:30] t/006_logical_decoding.pl ............ ok     7570 ms (
0.00 usr  0.00 sys +  0.25 cusr  0.32 csys =  0.57 CPU)
[13:22:38] t/007_sync_rep.pl .................... ok    20693 ms (
0.00 usr  0.00 sys +  0.51 cusr  1.24 csys =  1.75 CPU)
[13:22:58] t/008_fsm_truncation.pl .............. ok    25399 ms (
0.01 usr  0.00 sys +  0.26 cusr  0.52 csys =  0.79 CPU)
[13:23:24] t/009_twophase.pl .................... ok    58531 ms (
0.01 usr  0.00 sys +  0.49 cusr  0.92 csys =  1.42 CPU)
[13:24:22] t/010_logical_decoding_timelines.pl .. ok     8135 ms (
0.00 usr  0.01 sys +  0.34 cusr  0.76 csys =  1.11 CPU)
[13:24:30] t/011_crash_recovery.pl .............. ok     4334 ms (
0.00 usr  0.00 sys +  0.12 cusr  0.16 csys =  0.28 CPU)


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] TAP tests take a long time

От
Craig Ringer
Дата:
On 12 April 2017 at 13:27, Craig Ringer <craig.ringer@2ndquadrant.com> wrote:
> On 12 April 2017 at 08:22, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
>> <craig.ringer@2ndquadrant.com> wrote:
>>> Well, you can get a lot of information on timings in crake's latest
>>> builds for example, or nightjar's.
>>>
>>> Here's an interesting fact: almost all the time taken up in the scripts
>>> test set seems to be consumed in running initdb over and over.
>>>
>>> Is it worth adding an init(cache => 1) option to PostgresNode, where we
>>> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?
>>
>> This looks like a good idea to me, but I don't like much the idea of
>> an option in the init() routine as that's hard to maintain.
>
> How so?
>
>> It would
>> make sense to me to be able to override the initialization with an
>> environment variable instead
>
> Yeah, that's reasonable.
>
> Patch attached. It supports setting CACHE_INITDB=0 to disable the cache.
>
> With cache: 3m55.063s (user 7.7s, sys 0m11.833s)
>
> Without cache:  4m11.229s (user 0m16.963s, sys 0m12.384s)
>
> so in a serial run it doesn't make a ton of difference.
>
> Adding some timing on initdb runs shows that initdb takes about 1s, a
> cached copy about 0.1s with our Perl based recursive copy code. So
> it's hardly an overwhelming benefit, but might be more beneficial with
> prove -j .

Of course, it'll need locking for prove -j. Added. Patch attached,
applies on top of prior patch.

With

make PROVE_FLAGS="--timer -j 9" check

I don't see much difference with/without caching initdb results -
saves about 4 seconds, from 74 to 70 seconds, but hard to tell with
the error margins.

So if we're going to do anything, defaulting to parallel prove seems a
good start, and I'm not sure caching initdb results is worth it.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] TAP tests take a long time

От
Tom Lane
Дата:
Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> With
> make PROVE_FLAGS="--timer -j 9" check
> I don't see much difference with/without caching initdb results -
> saves about 4 seconds, from 74 to 70 seconds, but hard to tell with
> the error margins.

> So if we're going to do anything, defaulting to parallel prove seems a
> good start, and I'm not sure caching initdb results is worth it.

Yeah, the results shown in this thread are not very good :-(.  I'm
suspicious that the results are basically I/O-bound in your environment.
There might be some value in the cached-initdb-tree solution for slower,
more CPU-bound machines, but cutting out the regprocin overhead would
probably do just as much good there.
        regards, tom lane



Re: [HACKERS] TAP tests take a long time

От
Andrew Dunstan
Дата:

On 04/12/2017 01:27 AM, Craig Ringer wrote:
>
> BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
> collect more data from the buildfarm on what takes a while. Sample
> output:
>


I'll add that to the commandline the buildfarm uses in the upcoming release.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

От
Amit Kapila
Дата:
On Tue, Apr 11, 2017 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>> <andrew.dunstan@2ndquadrant.com> wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.
>
> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.  We seem to have
> fixed that for manual application of "make check-world", at least
> if you know the right magic incantations to parallelize it; but
> AFAIK the buildfarm script is entirely serial.
>
> BTW, speaking of "value proportionate to runtime", the hash_index
> regression script is now visibly a bottleneck in the core regression
> tests.  Could we take another look at whether that needs to run
> quite so long?
>

I have looked into the tests and I think we can do some optimization
without losing much on code coverage.  First is we are doing both
Vacuum Full and Vacuum on hash_split_heap in the same test after
executing few statements, it seems to me that we can avoid doing
Vacuum Full.  Also, I think we can try by reducing the number of
inserts in hash_split_heap to see if we can achieve the code coverage
of split bucket code path.  Finally, I am not sure if Reindex Index
hash_split_index is required for code coverage, but we might need it
for the sake of testing that operation.

Mithun, as you are the original author of these tests, can you please
try some of the above optimizations and any others you can think of
and see if we can reduce the time for hash index tests?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] TAP tests take a long time

От
Mithun Cy
Дата:
On Wed, Apr 12, 2017 at 6:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I have looked into the tests and I think we can do some optimization
> without losing much on code coverage.  First is we are doing both
> Vacuum Full and Vacuum on hash_split_heap in the same test after
> executing few statements, it seems to me that we can avoid doing
> Vacuum Full.  Also, I think we can try by reducing the number of
> inserts in hash_split_heap to see if we can achieve the code coverage
> of split bucket code path.  Finally, I am not sure if Reindex Index
> hash_split_index is required for code coverage, but we might need it
> for the sake of testing that operation.
>
> Mithun, as you are the original author of these tests, can you please
> try some of the above optimizations and any others you can think of
> and see if we can reduce the time for hash index tests?

I have tried to optimize the tests maintaining the same coverage we were able to get with it.

Original coverage

Lines:      1466     2341      62.6 %
Functions: 79        101       78.2 %

After test changes

Lines:       1534    2341     65.5 %
Functions: 79         101     78.2 %

changes done
Earlier :

CREATE INDEX hash_split_index on hash_split_heap USING HASH (keycol);        
+ Time: 8.950 ms                                                                
INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 70000) a;
+ Time: 3135.098 ms (00:03.135)                                                
VACUUM FULL hash_split_heap;                                                  
+ Time: 2748.146 ms (00:02.748)
REINDEX INDEX hash_split_index;                                              
+ Time: 114.290 ms

Insertion and vacuum full were taking most of the time in tests. I have removed the vacuum full as suggested by you, which is taken care by vacuum statement at the end, And reduced the number of rows inserted to trigger a split, by inserting some records before creating hash index. This way we start with less number of buckets and thus splits will happen much early.

Now :

+ INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 500) a;          
+ Time: 2.946 ms                                                                
CREATE INDEX hash_split_index on hash_split_heap USING HASH (keycol);        
! Time: 8.933 ms                                                                
! INSERT INTO hash_split_heap SELECT 1 FROM generate_series(1, 5000) a;        
! Time: 81.674 ms

Also removed REINDEX INDEX as suggested which was not adding for coverage. With this, I think tests should run significantly faster now.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Вложения

Re: [HACKERS] TAP tests take a long time

От
Tom Lane
Дата:
Mithun Cy <mithun.cy@enterprisedb.com> writes:
> I have tried to optimize the tests maintaining the same coverage we were
> able to get with it.

This patch looks good to me: on my workstation, it reduces the total
runtime of the parallel regression tests from ~20.5 to ~16.5 seconds.
I concur that it doesn't look like it would reduce test coverage
significantly.  It gets rid of a VACUUM FULL and a REINDEX, both of
which should be equivalent to an ordinary index build so far as hash is
concerned, and it trims the volume of data being run through the test.

If no objections, I'll push this soon.
        regards, tom lane



Re: [HACKERS] TAP tests take a long time

От
Andrew Dunstan
Дата:

On 04/12/2017 08:40 AM, Andrew Dunstan wrote:
>
> On 04/12/2017 01:27 AM, Craig Ringer wrote:
>> BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
>> collect more data from the buildfarm on what takes a while. Sample
>> output:
>>
>
> I'll add that to the commandline the buildfarm uses in the upcoming release.
>


See
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2017-04-12%2013%3A54%3A22>

One thing I noticed was this
   [10:02:40] t/001_basic.pl ......... ok      320 ms   [10:02:59] t/002_pg_dump.pl ....... ok    19441 ms   [10:03:47]
t/010_dump_connstr.pl.. ok    47430 ms
 

Why does the last test set take 47s?

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

От
Stephen Frost
Дата:
Andrew,

* Andrew Dunstan (andrew.dunstan@2ndquadrant.com) wrote:
> On 04/12/2017 08:40 AM, Andrew Dunstan wrote:
> > On 04/12/2017 01:27 AM, Craig Ringer wrote:
> >> BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
> >> collect more data from the buildfarm on what takes a while. Sample
> >> output:
> >
> > I'll add that to the commandline the buildfarm uses in the upcoming release.
>
> See
> <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2017-04-12%2013%3A54%3A22>
>
> One thing I noticed was this
>
>     [10:02:40] t/001_basic.pl ......... ok      320 ms
>     [10:02:59] t/002_pg_dump.pl ....... ok    19441 ms
>     [10:03:47] t/010_dump_connstr.pl .. ok    47430 ms
>
> Why does the last test set take 47s?

It's doing a bunch of dropdb/createdb stuff, and also creates multiple
clusters.  Part of this is because it's generating random names for
databases to make sure they are handled correctly.  I'd certainly love
to see it take less time but I've not really done very much with it.

Thanks!

Stephen