Обсуждение: Improving test coverage of extensions with pg_dump

Поиск
Список
Период
Сортировка

Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
Hi all,

While investigating the issue that has been committed as ebd092b to
fix FK dependencies in pg_dump for tables in extensions, I developed a
small test case aimed for integration in src/test/modules to ensure
that this does not break again in the future. Note that as the
regression tests of this test module use TAP tests to run a set of
pg_dump commands and check the sanity of FK dependency tracking with
extensions, this has needed a one-line tweak of the target prove_check
to be able to install the contents of the current directory to the
temp install folder before running the tests.

I imagine that it would be nice to integrate this test case to improve
test coverage of pg_dump, extending at the same time TAP support for
extensions, so attached are patches for this purpose.

Those patches are really simple, but then perhaps there are better or
simpler ways than what is attached, so feel free to comment if you
have any ideas.
Regards,
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Tue, Mar 3, 2015 at 2:40 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Those patches are really simple, but then perhaps there are better or
> simpler ways than what is attached, so feel free to comment if you
> have any ideas.

Attached are new patches somewhat based on the comments provided by
Peter Eisentraut
(http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net).
- 0001 makes prove_check install the contents of the path located at
$(CURDIR)/t/extra if present, which would be the path that test
developers could use to store extensions, plugins or modules that
would then be usable by a given set of TAP tests as they are installed
in the temporary installation before launching the tests.
- 0002 is the test for pg_dump checking extensions containing FK
tables, this time integrated as a TAP test in src/bin/pg_dump, with
the extension in src/bin/pg_dump/t/extra.
IMO, this approach is scalable enough thinking long-term. And I think
that the location of those extra extensions is fine in t/ as an
hardcoded subfolder (any fresh idea being of course welcome) as this
makes the stuff in extra/ dedicated to only on set of TAP tests, and
it is even possible to set multiple extensions/modules.
Regards,
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Heikki Linnakangas
Дата:
On 03/07/2015 02:34 PM, Michael Paquier wrote:
> On Tue, Mar 3, 2015 at 2:40 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Those patches are really simple, but then perhaps there are better or
>> simpler ways than what is attached, so feel free to comment if you
>> have any ideas.
>
> Attached are new patches somewhat based on the comments provided by
> Peter Eisentraut
> (http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net).
> - 0001 makes prove_check install the contents of the path located at
> $(CURDIR)/t/extra if present, which would be the path that test
> developers could use to store extensions, plugins or modules that
> would then be usable by a given set of TAP tests as they are installed
> in the temporary installation before launching the tests.
> - 0002 is the test for pg_dump checking extensions containing FK
> tables, this time integrated as a TAP test in src/bin/pg_dump, with
> the extension in src/bin/pg_dump/t/extra.
> IMO, this approach is scalable enough thinking long-term. And I think
> that the location of those extra extensions is fine in t/ as an
> hardcoded subfolder (any fresh idea being of course welcome) as this
> makes the stuff in extra/ dedicated to only on set of TAP tests, and
> it is even possible to set multiple extensions/modules.

Hmm. I think it'd be better to put the tables_fk extension into 
src/test/modules, and the test case under src/test/tables_fk/t/. I'm 
inclined to think of this as a test case for an extension that contains 
a table, which includes testing that pg_dump/restore works, rather than 
as a test of pg_dump itself.

- Heikki



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Wed, Jul 8, 2015 at 1:32 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Hmm. I think it'd be better to put the tables_fk extension into
> src/test/modules, and the test case under src/test/tables_fk/t/. I'm
> inclined to think of this as a test case for an extension that contains a
> table, which includes testing that pg_dump/restore works, rather than as a
> test of pg_dump itself.

The first version of this patch actually did that:
http://www.postgresql.org/message-id/CAB7nPqQrxhy3+wvUmA69KJXiRPpV5qWJi-3cLn3ZJgByqe_BQQ@mail.gmail.com
The reason why it has been changed to this way of doing is that there
were concerns about bloating src/test/modules with many similar tests
in the future, like imagine pg_dump_test_1, pg_dump_test_2.

Attached is an updated version that can be used in src/test/modules as
well. Makefile needed also an extra EXTRA_INSTALL pointing to
src/test/modules/tables_fk. Nothing amazing. I have also reworded
one-two things at the same time while looking at that again.
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Andreas Karlsson
Дата:
I have reviewed this patch and it compiles runs and the new test case 
passes. The code is also clean and the test seems like a useful 
regression test.

What I do not like though is how the path src/test/tables_fk/t/ tells us 
nothing about what features are of PostgreSQL are tested here. For this 
I personally prefer the earlier versions where I think that was clear.

Another though: would it be worthwhile to also add an assertion to check 
if the data really was restored properly or would that just be redundant 
code?

-- 
Andreas



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Jul 30, 2015 at 5:54 AM, Andreas Karlsson <andreas@proxel.se> wrote:
> What I do not like though is how the path src/test/tables_fk/t/ tells us
> nothing about what features are of PostgreSQL are tested here. For this I
> personally prefer the earlier versions where I think that was clear.

+Simple module used to check data dump ordering of pg_dump with tables
+linked with foreign keys.
The README mentions that this is to test pg_dump, perhaps referring to
the TAP tests makes sense as well?

> Another thought: would it be worthwhile to also add an assertion to check if
> the data really was restored properly or would that just be redundant code?

That seems worth doing, hence added something for it. Thanks for the suggestion.

At the same time I have added an entry in .gitignore for the generated
tmp_check/.
Regards,
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Andreas Karlsson
Дата:
On 07/30/2015 04:48 AM, Michael Paquier wrote:
> On Thu, Jul 30, 2015 at 5:54 AM, Andreas Karlsson <andreas@proxel.se> wrote:
>> What I do not like though is how the path src/test/tables_fk/t/ tells us
>> nothing about what features are of PostgreSQL are tested here. For this I
>> personally prefer the earlier versions where I think that was clear.
>
> +Simple module used to check data dump ordering of pg_dump with tables
> +linked with foreign keys.
> The README mentions that this is to test pg_dump, perhaps referring to
> the TAP tests makes sense as well?

The comment is good, but what I personally do not like is that the path 
to the test suite is non-obvious and not self explanatory I would not 
expect src/test/tables_fk/t/ to test pg_dump and extensions. I find it 
to confusing to regard the test suite as testing an extension called 
"tables_fk" since in my mind that is just a necessary tool built to test 
something else.

This is obviously a subjective thing, and I see that for example Heikki 
likes it the way it is. I am fine with setting this as ready for 
committer and and let a committer weigh in on the naming.

>> Another thought: would it be worthwhile to also add an assertion to check if
>> the data really was restored properly or would that just be redundant code?
>
> That seems worth doing, hence added something for it. Thanks for the suggestion.
>
> At the same time I have added an entry in .gitignore for the generated
> tmp_check/.

Nice changes.

-- 
Andreas




Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Fri, Jul 31, 2015 at 6:41 AM, Andreas Karlsson <andreas@proxel.se> wrote:
> The comment is good, but what I personally do not like is that the path to
> the test suite is non-obvious and not self explanatory I would not expect
> src/test/tables_fk/t/ to test pg_dump and extensions. I find it to confusing
> to regard the test suite as testing an extension called "tables_fk" since in
> my mind that is just a necessary tool built to test something else.
>
> This is obviously a subjective thing, and I see that for example Heikki
> likes it the way it is. I am fine with setting this as ready for committer
> and and let a committer weigh in on the naming.

Well, honestly, I would just have it in src/test/modules and call it a
deal. Because it is now the only extension that has such interactions
with perl tests. And because if modules/ gets bloated later on, we
could extend prove_check to install modules from extra paths, just
that it does not seem worth it to do it now for one module, and there
is no guarantee that we'll have that many around. Of course there is
no guarantee either that modules/ is not going to get bloated.

Note as well that I will be fine with any decision taken by the
committer who picks up this, this test case is useful by itself in any
case.
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Jul 30, 2015 at 5:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Note as well that I will be fine with any decision taken by the
> committer who picks up this, this test case is useful by itself in any
> case.

And I just recalled that I actually did no tests for this thing on
Windows. As this uses the TAP facility, I think that it makes most
sense to run it with tapcheck instead of modulescheck in vcregress.pl
because of its dependency with IPC::Run. The compilation with MSVC is
fixed as well.
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Andres Freund
Дата:
On 2015-08-02 19:15:58 -0700, Michael Paquier wrote:
> +psql 'postgres', 'CREATE EXTENSION tables_fk';
> +
> +# Insert some data before running the dump, this is needed to check
> +# consistent data dump of tables with foreign key dependencies
> +psql 'postgres', 'INSERT INTO cc_tab_fkey VALUES (1)';
> +psql 'postgres', 'INSERT INTO bb_tab_fkey VALUES (1)';
> +psql 'postgres', 'INSERT INTO aa_tab_fkey VALUES (1)';
> +
> +# Create a table depending on a FK defined in the extension
> +psql 'postgres', 'CREATE TABLE dd_tab_fkey (id int REFERENCES bb_tab_fkey(id))';
> +psql 'postgres', 'INSERT INTO dd_tab_fkey VALUES (1)';
> +
> +# Take a dump then re-deploy it to a new database
> +command_ok(['pg_dump', '-d', 'postgres', '-f', "$tempdir/dump.sql"],
> +           'taken dump with installed extension');
> +command_ok(['createdb', 'foobar1'], 'created new database to redeploy dump');
> +command_ok(['psql', '--set=ON_ERROR_STOP=on', '-d', 'foobar1', '-f',
> +            "$tempdir/dump.sql"], 'dump restored');
> +
> +# And check its content
> +my $result = psql 'foobar1',
> +    q{SELECT id FROM aa_tab_fkey UNION ALL
> +SELECT id FROM bb_tab_fkey UNION ALL
> +SELECT id FROM cc_tab_fkey UNION ALL
> +SELECT id FROM dd_tab_fkey};
> +is($result, qq(1
> +1
> +1
> +1),
> +    'consistency of data restored');
> diff --git a/src/test/modules/tables_fk/tables_fk--1.0.sql b/src/test/modules/tables_fk/tables_fk--1.0.sql
> new file mode 100644
> index 0000000..e424610
> --- /dev/null
> +++ b/src/test/modules/tables_fk/tables_fk--1.0.sql
> @@ -0,0 +1,20 @@
> +/* src/test/modules/tables_fk/tables_fk--1.0.sql */
> +
> +-- complain if script is sourced in psql, rather than via CREATE EXTENSION
> +\echo Use "CREATE EXTENSION tables_fk" to load this file. \quit
> +
> +CREATE TABLE IF NOT EXISTS cc_tab_fkey (
> +    id int PRIMARY KEY
> +);
> +
> +CREATE TABLE IF NOT EXISTS bb_tab_fkey (
> +    id int PRIMARY KEY REFERENCES cc_tab_fkey(id)
> +);
> +
> +CREATE TABLE IF NOT EXISTS aa_tab_fkey (
> +    id int REFERENCES bb_tab_fkey(id)
> +);
> +
> +SELECT pg_catalog.pg_extension_config_dump('aa_tab_fkey', '');
> +SELECT pg_catalog.pg_extension_config_dump('bb_tab_fkey', '');
> +SELECT pg_catalog.pg_extension_config_dump('cc_tab_fkey', '');
> diff --git a/src/test/modules/tables_fk/tables_fk.control b/src/test/modules/tables_fk/tables_fk.control

Isn't a full test with a separate initdb, create extension etc. a really
heavyhanded way to test this? I mean that's a test where the setup takes
up to 10s, whereas the actual runtime is in the millisecond range?

Adding tests in this manner doesn't seem scalable to me.

I think we should rather add *one* test that does dump/restore over the
normal regression test database. Something similar to the pg_upgrade
tests. And then work at adding more content to the regression test
database - potentially sourcing from src/test/modules.

Greetings,

Andres Freund



Re: Improving test coverage of extensions with pg_dump

От
Alvaro Herrera
Дата:
Andres Freund wrote:

> Isn't a full test with a separate initdb, create extension etc. a really
> heavyhanded way to test this? I mean that's a test where the setup takes
> up to 10s, whereas the actual runtime is in the millisecond range?

I spent some time looking over this patch yesterday, and that was one
concern I had too.  I was thinking that if we turn this into a single
module that can contain several extensions, or several pg_dump-related
tests, and then test multiple features without requiring an initdb for
each, it would be more palatable.

> Adding tests in this manner doesn't seem scalable to me.

I was thinking in having this be renamed src/test/modules/extensions/
and then the extension contained here would be renamed ext001_fk_tables
or something like that; later we could ext002_something for testing some
other angle of extensions, not necessarily pg_dump related.

> I think we should rather add *one* test that does dump/restore over the
> normal regression test database. Something similar to the pg_upgrade
> tests. And then work at adding more content to the regression test
> database - potentially sourcing from src/test/modules.

That's another option, but we've had this idea for many years now and it
hasn't materialized.  As I recall, Andrew Dunstan has a module that
tests cross-version pg_upgrade and one thing he does is dump both and
compare; the problem is that there are differences, so he keeps a count
of how many lines he expect to differ between any two releases.  Or
something like that.  While it's a good enough start, I don't think it's
robust enough to be in core.  How would we do it?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improving test coverage of extensions with pg_dump

От
Andres Freund
Дата:
On 2015-09-02 14:30:33 -0300, Alvaro Herrera wrote:
> I was thinking in having this be renamed src/test/modules/extensions/
> and then the extension contained here would be renamed ext001_fk_tables
> or something like that; later we could ext002_something for testing some
> other angle of extensions, not necessarily pg_dump related.

The largest dataset we have for this is the current regression test
database, it seems a waste not to include it...

> That's another option, but we've had this idea for many years now and it
> hasn't materialized.

But that's just minimally more complex than what's currently done in
that test and in the pg_upgrade test?

> As I recall, Andrew Dunstan has a module that
> tests cross-version pg_upgrade and one thing he does is dump both and
> compare; the problem is that there are differences, so he keeps a count
> of how many lines he expect to differ between any two releases.

I'm not suggesting to do anything cross-release - that'd indeed be
another ballpark.

Just that instead of a pg_dump test that tests some individual things we
have one that tests the whole regression test output and then does a
diff?

Greetings,

Andres Freund



Re: Improving test coverage of extensions with pg_dump

От
Alvaro Herrera
Дата:
Andres Freund wrote:

> > As I recall, Andrew Dunstan has a module that
> > tests cross-version pg_upgrade and one thing he does is dump both and
> > compare; the problem is that there are differences, so he keeps a count
> > of how many lines he expect to differ between any two releases.
> 
> I'm not suggesting to do anything cross-release - that'd indeed be
> another ballpark.

Ah, you're right.

> Just that instead of a pg_dump test that tests some individual things we
> have one that tests the whole regression test output and then does a
> diff?

Sorry if I'm slow -- A diff against what?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 2:30 AM, Alvaro Herrera wrote:
> Andres Freund wrote:
>> Isn't a full test with a separate initdb, create extension etc. a really
>> heavyhanded way to test this? I mean that's a test where the setup takes
>> up to 10s, whereas the actual runtime is in the millisecond range?
>
> I spent some time looking over this patch yesterday, and that was one
> concern I had too.  I was thinking that if we turn this into a single
> module that can contain several extensions, or several pg_dump-related
> tests, and then test multiple features without requiring an initdb for
> each, it would be more palatable.

Yeah sure. That's a statement you could generalize for all the TAP
tests, per se for example pg_rewind tests.

>> Adding tests in this manner doesn't seem scalable to me.
>
> I was thinking in having this be renamed src/test/modules/extensions/
> and then the extension contained here would be renamed ext001_fk_tables
> or something like that; later we could ext002_something for testing some
> other angle of extensions, not necessarily pg_dump related.

So, if I am following here correctly, what you are suggesting is:
1) create src/test/modules/extensions/, and include fk_tables in it for now
2) offer the possibility to run make check in this path, doing
complicated tests on a single instance, pg_dump on an instance is one
of them.
Note that for now the TAP test machinery does not allow to share a
single cluster instance across multiple pl scripts, and it seems to me
that's what you are meaning here (in my opinion that's out of sight of
this patch, and I don't think either we should do it this will make
error handling a nightmare). And we can now have a single TAP script
doing all the tests on a single instance. So which one are you
suggesting?

The patch proposed here is actually doing the second, initializing an
instance and performing pg_dump on it. If we change it your way we
should just rename the test script to something like
001_test_extensions.pl and mention in it that new tests for extensions
should be included in it. Still I would imagine that each extension
are actually going to need slightly differ test patterns to work or
cover what they are intended to cover. See for example the set of
tests added in the CREATE EXTENSION CASCADE patch: those are not
adapted to run with the TAP machinery. I may be of course wrong, we
may have other extensions in the future using that, still I am seeing
none around for now.

If you are worrying about run time, we could as well have make check
add an EXTRA_INSTALL pointing to src/test/modules/extensions/, with an
additional test, say extensions.sql that creates data based on it.
Then we let pg_upgrade do the dump/restore checks. This way the run
time of make check-world is minimally impacted, and the code paths we
want tested are done by the buildfarm. That's ugly, still it won't
impact performance.

>> I think we should rather add *one* test that does dump/restore over the
>> normal regression test database. Something similar to the pg_upgrade
>> tests. And then work at adding more content to the regression test
>> database - potentially sourcing from src/test/modules.
>
> That's another option, but we've had this idea for many years now and it
> hasn't materialized.  As I recall, Andrew Dunstan has a module that
> tests cross-version pg_upgrade and one thing he does is dump both and
> compare; the problem is that there are differences, so he keeps a count
> of how many lines he expect to differ between any two releases.  Or
> something like that.  While it's a good enough start, I don't think it's
> robust enough to be in core.  How would we do it?

That does not seem plausible to me for an integration in core, you
would need to keep switching between branches in a given git tree if
the test is being done within a git repository, and that would be just
not doable from a raw tarball which just stores one given version.
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 12:38 AM, Andres Freund wrote:
> Isn't a full test with a separate initdb, create extension etc. a really
> heavyhanded way to test this? I mean that's a test where the setup takes
> up to 10s, whereas the actual runtime is in the millisecond range?
>
> Adding tests in this manner doesn't seem scalable to me.

How to include such kind of tests is in the heart of the discussion
since this patch has showed up. I think we are now close to 5
different opinions with 4 different hackers on the matter, the Riggs'
theorem applies nicely.

> I think we should rather add *one* test that does dump/restore over the
> normal regression test database. Something similar to the pg_upgrade
> tests. And then work at adding more content to the regression test
> database - potentially sourcing from src/test/modules.

If you are worrying about run time, pg_upgrade already exactly does
that. What would be the point to double the amount of time to do the
same thing in two different places?
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 10:35 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Sep 3, 2015 at 12:38 AM, Andres Freund wrote:
>> Isn't a full test with a separate initdb, create extension etc. a really
>> heavyhanded way to test this? I mean that's a test where the setup takes
>> up to 10s, whereas the actual runtime is in the millisecond range?
>>
>> Adding tests in this manner doesn't seem scalable to me.
>
> How to include such kind of tests is in the heart of the discussion
> since this patch has showed up. I think we are now close to 5
> different opinions with 4 different hackers on the matter, the Riggs'
> theorem applies nicely.
>
>> I think we should rather add *one* test that does dump/restore over the
>> normal regression test database. Something similar to the pg_upgrade
>> tests. And then work at adding more content to the regression test
>> database - potentially sourcing from src/test/modules.
>
> If you are worrying about run time, pg_upgrade already exactly does
> that. What would be the point to double the amount of time to do the
> same thing in two different places?

So, to summarize the state of this patch whose status is now "Waiting
on Author", we have the following possibilities:
1) Keep the test as-is, as an independent test of src/test/modules.
2) Integrate it in the test suite of src/test/regress and let
pg_upgrade make the work with dump/restore.
3) Create a new facility as src/test/modules/extensions that does a
run of the regression test suite with a set of extensions, and then a
dump/restore. For now the only extension added in the installation is
tables_fk. Future ones would be added here as well, though it is not
clear to me what they are and if we'd have some (there may be... Or
not).
4) Include it as a test of pg_dump in src/bin/pg_dump/t, with the
extension located in for example t/extensions and extend prove_check
to install as well any extensions in t/extensions.

I would still go for 1), the infrastructures included by the other
proposals may become useful depending on the types of tests that are
added in the future, but it is still unclear what those tests are, and
they may even need something else than what listed here (see that as
an example http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net)
to run properly.
Regards,
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Andres Freund
Дата:
On 2015-09-07 22:55:50 +0900, Michael Paquier wrote:
> So, to summarize the state of this patch whose status is now "Waiting
> on Author", we have the following possibilities:
> 1) Keep the test as-is, as an independent test of src/test/modules.

I find that a bad option. A test that takes this long and has that much
setup for such a marginal amount of coverage just doesn't seem worth it.

> 2) Integrate it in the test suite of src/test/regress and let
> pg_upgrade make the work with dump/restore.

2b) ... and create a single src/test/modules/pg_dumprestore test that
initdbs, runs installcheck, dumps, loads and compares a repated dump.

I think 2b) is by far the best choice.

> I would still go for 1), the infrastructures included by the other
> proposals may become useful depending on the types of tests that are
> added in the future, but it is still unclear what those tests are, and
> they may even need something else than what listed here (see that as
> an example http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net)
> to run properly.

By that argument we're going to add more and more isolated tests and
never merge anything.

I fail to see a single benefit of this approach.

Greetings,

Andres Freund



Re: Improving test coverage of extensions with pg_dump

От
Bruce Momjian
Дата:
On Wed, Sep  2, 2015 at 02:30:33PM -0300, Alvaro Herrera wrote:
> > I think we should rather add *one* test that does dump/restore over the
> > normal regression test database. Something similar to the pg_upgrade
> > tests. And then work at adding more content to the regression test
> > database - potentially sourcing from src/test/modules.
> 
> That's another option, but we've had this idea for many years now and it
> hasn't materialized.  As I recall, Andrew Dunstan has a module that
> tests cross-version pg_upgrade and one thing he does is dump both and
> compare; the problem is that there are differences, so he keeps a count
> of how many lines he expect to differ between any two releases.  Or
> something like that.  While it's a good enough start, I don't think it's
> robust enough to be in core.  How would we do it?

I have shell scripts that test pg_dump restore/upgrade of every
supported PG version.  I also have expected pg_dump output files for
every major version.  This is explained in src/bin/pg_upgrade/TESTING.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Wed, Sep 9, 2015 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-07 22:55:50 +0900, Michael Paquier wrote:
>> So, to summarize the state of this patch whose status is now "Waiting
>> on Author", we have the following possibilities:
>> 1) Keep the test as-is, as an independent test of src/test/modules.
>
> I find that a bad option. A test that takes this long and has that much
> setup for such a marginal amount of coverage just doesn't seem worth it.
>
>> 2) Integrate it in the test suite of src/test/regress and let
>> pg_upgrade make the work with dump/restore.
>
> 2b) ... and create a single src/test/modules/pg_dumprestore test that
> initdbs, runs installcheck, dumps, loads and compares a repated dump.
>
> I think 2b) is by far the best choice.

And I guess that the extensions tested should be located directly in
this path, or we would need again to tweak the MSVC scripts...
Attached is an attempt to solve things by converting the previous
patch as proposed by Andres. A source and a target databases are used
on a cluster, and their respective dumps are compared with
File::Compare (available down to perl 5.8.8) at the end.
Regards,
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Andres Freund
Дата:
On 2015-09-09 10:48:24 +0900, Michael Paquier wrote:
> On Wed, Sep 9, 2015 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-09-07 22:55:50 +0900, Michael Paquier wrote:
> >> So, to summarize the state of this patch whose status is now "Waiting
> >> on Author", we have the following possibilities:
> >> 1) Keep the test as-is, as an independent test of src/test/modules.
> >
> > I find that a bad option. A test that takes this long and has that much
> > setup for such a marginal amount of coverage just doesn't seem worth it.
> >
> >> 2) Integrate it in the test suite of src/test/regress and let
> >> pg_upgrade make the work with dump/restore.
> >
> > 2b) ... and create a single src/test/modules/pg_dumprestore test that
> > initdbs, runs installcheck, dumps, loads and compares a repated dump.
> >
> > I think 2b) is by far the best choice.
> 
> And I guess that the extensions tested should be located directly in
> this path, or we would need again to tweak the MSVC scripts...
> Attached is an attempt to solve things by converting the previous
> patch as proposed by Andres. A source and a target databases are used
> on a cluster, and their respective dumps are compared with
> File::Compare (available down to perl 5.8.8) at the end.

Unless I miss something this isn't 2b) as it does *not* actually run
the actual installcheck. Therefore the actual pg_dump/restore coverage
is neglegible.

Greetings,

Andres Freund



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Wed, Sep 16, 2015 at 3:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-09 10:48:24 +0900, Michael Paquier wrote:
>> On Wed, Sep 9, 2015 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2015-09-07 22:55:50 +0900, Michael Paquier wrote:
>> >> So, to summarize the state of this patch whose status is now "Waiting
>> >> on Author", we have the following possibilities:
>> >> 1) Keep the test as-is, as an independent test of src/test/modules.
>> >
>> > I find that a bad option. A test that takes this long and has that much
>> > setup for such a marginal amount of coverage just doesn't seem worth it.
>> >
>> >> 2) Integrate it in the test suite of src/test/regress and let
>> >> pg_upgrade make the work with dump/restore.
>> >
>> > 2b) ... and create a single src/test/modules/pg_dumprestore test that
>> > initdbs, runs installcheck, dumps, loads and compares a repated dump.
>> >
>> > I think 2b) is by far the best choice.
>>
>> And I guess that the extensions tested should be located directly in
>> this path, or we would need again to tweak the MSVC scripts...
>> Attached is an attempt to solve things by converting the previous
>> patch as proposed by Andres. A source and a target databases are used
>> on a cluster, and their respective dumps are compared with
>> File::Compare (available down to perl 5.8.8) at the end.
>
> Unless I miss something this isn't 2b) as it does *not* actually run
> the actual installcheck. Therefore the actual pg_dump/restore coverage
> is neglegible.

Hm. OK. I didn't get your message correctly, sorry for that. Would you
be fine then to have a pg_regress command using parallel_schedule + an
extra schedule launching tests related to the extensions in
src/test/modules/pg_dumprestore then? The choice of parallel_schedule
is based on what Windows does, aka this schedule is used in the
equivalent of make check in vcregress.pl. The TAP script then simply
initializes the cluster, runs pg_regress, and does the dump/restore
job. There is no real need to worry about setval as dump is not taken
from a standby..
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Wed, Sep 16, 2015 at 8:00 PM, Michael Paquier wrote:
> Hm. OK. I didn't get your message correctly, sorry for that. Would you
> be fine then to have a pg_regress command using parallel_schedule + an
> extra schedule launching tests related to the extensions in
> src/test/modules/pg_dumprestore then? The choice of parallel_schedule
> is based on what Windows does, aka this schedule is used in the
> equivalent of make check in vcregress.pl. The TAP script then simply
> initializes the cluster, runs pg_regress, and does the dump/restore
> job. There is no real need to worry about setval as dump is not taken
> from a standby..

So, here we go.

I have found something quite interesting when playing with the patch
attached: dump does not guarantee the column ordering across databases
for some inherited tables, see that example from the main regression
test suite the following diff between a dump taken from a source
database and a target database where the source dump has been restored
in first:
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 3, 'mumble', NULL);
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 4, NULL, NULL);
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, 'bumble', NULL);
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, NULL, NULL);
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 3, NULL, 'mumble');
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 4, NULL, NULL);
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, 'bumble');
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, NULL);

Problem is similar with --column-inserts, --inserts and COPY. We could
use --exclude-table like in the patch attached when taking the dump
from source database but that's grotty, or we could improve pg_dump
itself, though it may not be worth it for just this purpose.
Thoughts?
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Andres Freund
Дата:
On 2015-09-16 22:18:55 -0700, Michael Paquier wrote:
> So, here we go.

Cool.

> I have found something quite interesting when playing with the patch
> attached: dump does not guarantee the column ordering across databases
> for some inherited tables, see that example from the main regression
> test suite the following diff between a dump taken from a source
> database and a target database where the source dump has been restored
> in first:
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 3, 'mumble', NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 4, NULL, NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, 'bumble', NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, NULL, NULL);
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 3, NULL, 'mumble');
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 4, NULL, NULL);
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, 'bumble');
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, NULL);
> 
> Problem is similar with --column-inserts, --inserts and COPY. We could
> use --exclude-table like in the patch attached when taking the dump
> from source database but that's grotty, or we could improve pg_dump
> itself, though it may not be worth it for just this purpose.
> Thoughts?

Hm. To me that sounds like a moderately serious bug worth fixing. I mean
if you have a COPY command that worked before a dump/restore but that's
wrong afterwards, it's pretty ugly, no?

Greetings,

Andres Freund



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Sep 17, 2015 at 7:24 AM, Andres Freund wrote:
> On 2015-09-16 22:18:55 -0700, Michael Paquier wrote:
>> Problem is similar with --column-inserts, --inserts and COPY. We could
>> use --exclude-table like in the patch attached when taking the dump
>> from source database but that's grotty, or we could improve pg_dump
>> itself, though it may not be worth it for just this purpose.
>> Thoughts?
>
> Hm. To me that sounds like a moderately serious bug worth fixing. I mean
> if you have a COPY command that worked before a dump/restore but that's
> wrong afterwards, it's pretty ugly, no?

COPY or INSERT include the column list in dumps, so that's not really
an issue here, what is potentially a problem (looking at that freshly)
is --inserts with data-only dumps though. So yes we had better fix it
and perhaps consider a backpatch...
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Thu, Sep 17, 2015 at 9:47 AM, Michael Paquier wrote:
> COPY or INSERT include the column list in dumps, so that's not really
> an issue here, what is potentially a problem (looking at that freshly)
> is --inserts with data-only dumps though. So yes we had better fix it
> and perhaps consider a backpatch...

When adding a column to a parent table, attnum gets confused on the
child table... Take this example:
=# create table aa (a int, b int);
CREATE TABLE
=# create table bb (c int) inherits (aa);
CREATE TABLE
=# alter table aa add column d text;
ALTER TABLE
=# select relname, attname, attnum from pg_attribute join pg_class on
(attrelid = oid) where attrelid in( 'bb'::regclass, 'aa'::regclass)
and attnum > 0;relname | attname | attnum
---------+---------+--------aa      | d       |      3aa      | b       |      2aa      | a       |      1bb      | d
   |      4bb      | c       |      3bb      | b       |      2bb      | a       |      1
 
(7 rows)

When this is dumped and restored on another database the ordering gets
different, c and d are switched for child relation bb here:relname | attname | attnum
---------+---------+--------aa      | d       |      3aa      | b       |      2aa      | a       |      1bb      | c
   |      4bb      | d       |      3bb      | b       |      2bb      | a       |      1
 
(7 rows)

pg_dump relies on attnum to define the column ordering, so one
possibility would be to do things more consistently at backend level.
Thoughts?
-- 
Michael



Re: Improving test coverage of extensions with pg_dump

От
Euler Taveira
Дата:
On 17-09-2015 14:21, Michael Paquier wrote:
> pg_dump relies on attnum to define the column ordering, so one
> possibility would be to do things more consistently at backend level.
> Thoughts?
>
According to your example, problem is the columns from the parent table 
"aa" are added _before_ declaring the inherited table "bb". Then, an 
attnum from column "d" (part of parent table "aa") is assigned to a 
lower number than in the original table "bb".

Someone can say that we could assign an attnum for column "d" 
considering all of the inheritance tree. However, attnum is used as an 
index to arrays (we could bloat some of those) and some logic rely on it 
to count the number of columns. It would become tablecmds.c into an 
spaghetti.

IMHO a possible way to solve it is adding support for logical column 
ordering. An ALTER TABLE command (emitted if a parameter was informed) 
during dump could handle it. BTW, last thread [1] about logical column 
ordering seems to have died a few months ago. Alvaro?


[1] 
http://www.postgresql.org/message-id/20141209174146.GP1768@alvh.no-ip.org


--    Euler Taveira                   Timbira - http://www.timbira.com.br/   PostgreSQL: Consultoria, Desenvolvimento,
Suporte24x7 e Treinamento
 



Re: Improving test coverage of extensions with pg_dump

От
Alvaro Herrera
Дата:
Euler Taveira wrote:
> On 17-09-2015 14:21, Michael Paquier wrote:
> >pg_dump relies on attnum to define the column ordering, so one
> >possibility would be to do things more consistently at backend level.

We discussed this in some other thread, not long ago.  I looked briefly
in the archives but couldn't find it.  I think the conclusion was
something along the lines of "hmm, tough".

> Someone can say that we could assign an attnum for column "d" considering
> all of the inheritance tree. However, attnum is used as an index to arrays
> (we could bloat some of those) and some logic rely on it to count the number
> of columns. It would become tablecmds.c into an spaghetti.

We don't need any more spaghetti there, thanks!

> IMHO a possible way to solve it is adding support for logical column
> ordering. An ALTER TABLE command (emitted if a parameter was informed)
> during dump could handle it. BTW, last thread [1] about logical column
> ordering seems to have died a few months ago. Alvaro?

Tomas Vondra also worked a bit on this patch, and we eventually gave up
on it due to lack of time.  We might be able to get back on it someday,
but do not hold your breath.  If you want the current bug fixed, do not
wait for logical column numbering.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:


On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Euler Taveira wrote:
> On 17-09-2015 14:21, Michael Paquier wrote:
> >pg_dump relies on attnum to define the column ordering, so one
> >possibility would be to do things more consistently at backend level.

We discussed this in some other thread, not long ago.  I looked briefly
in the archives but couldn't find it.  I think the conclusion was
something along the lines of "hmm, tough".

That's hours, even days of fun ahead for sure.
 
> Someone can say that we could assign an attnum for column "d" considering
> all of the inheritance tree. However, attnum is used as an index to arrays
> (we could bloat some of those) and some logic rely on it to count the number
> of columns. It would become tablecmds.c into an spaghetti.

That was my first impression, but that's just a band-aid. Switching the logic in stable branches to assign a correct attnum for a child table, while switching on the way the attnum referring to the parent entries is a scary idea. I would suspect that this would break many other things for fixing a narrow case, and it is still possible for the user to get away by using COPY or INSERT with the list of columns listed when taking a dump.

We don't need any more spaghetti there, thanks!

Agreed.
 
> IMHO a possible way to solve it is adding support for logical column
> ordering. An ALTER TABLE command (emitted if a parameter was informed)
> during dump could handle it. BTW, last thread [1] about logical column
> ordering seems to have died a few months ago. Alvaro?

Tomas Vondra also worked a bit on this patch, and we eventually gave up
on it due to lack of time.  We might be able to get back on it someday,
but do not hold your breath.  If you want the current bug fixed, do not
wait for logical column numbering.

Honestly I have the feeling that the discussion of this thread gets unproductive, let's not forget that the patch presented on this thread is just aiming at adding one test case to ensure that extensions using dumpable relations with FKs get correctly dumped, which is to ensure that we do not reintroduce a bug that existed in the extension facility since its introduction in 9.1. That being said, the patch is just fine for this purpose, but that's just my opinion.
--
Michael

Re: Improving test coverage of extensions with pg_dump

От
Andres Freund
Дата:
On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> > We discussed this in some other thread, not long ago.  I looked briefly
> > in the archives but couldn't find it.  I think the conclusion was
> > something along the lines of "hmm, tough".

> That's hours, even days of fun ahead for sure.

To me that's a somewhat serious bug, and something that we probably need
to address at some point.

> Honestly I have the feeling that the discussion of this thread gets
> unproductive, let's not forget that the patch presented on this thread is
> just aiming at adding one test case to ensure that extensions using
> dumpable relations with FKs get correctly dumped, which is to ensure that
> we do not reintroduce a bug that existed in the extension facility since
> its introduction in 9.1. That being said, the patch is just fine for this
> purpose, but that's just my opinion.

It's an unsustainable test model. Adding own test runners, directories,
initdb etc. for a simple regression test of a couple lines won't hurt
for maybe the first five but after that it starts to get unmaintainable.

Greetings,

Andres Freund



Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Sat, Sep 26, 2015 at 10:22 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
>> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
>> wrote:
>> > We discussed this in some other thread, not long ago.  I looked briefly
>> > in the archives but couldn't find it.  I think the conclusion was
>> > something along the lines of "hmm, tough".
>
>> That's hours, even days of fun ahead for sure.
>
> To me that's a somewhat serious bug, and something that we probably need
> to address at some point.

Well, addressing it at the root of the problem is... Let's say...
Really tough. I have put my head on this stuff for a couple of hours
and tried to draw up a couple of ways to fix this. First, even if
pg_dump relies heavily on the assumption that attributes need to be
ordered by attnum, I thought that it would have been possible to use
attinhcount to order  the columns in the same way when taking the dump
from a source db and a target (where dump of source has been restored
to). This would work if there is no more than 1 level of child
relations, but with grand-child relations the order gets messed up
again.

Locating a fix on the backend side would make things for pg_dump
easier, an idea would be to simply reorder the attribute numbers when
a column is added to a parent table. For example with something like
that:
CREATE TABLE test_parent (c1 integer, c2 integer);
CREATE TABLE test_child_1 (c3 integer) inherits (test_parent);
CREATE TABLE test_child_2 (c4 integer) inherits (test_child_1);
ALTER TABLE test_parent ADD COLUMN c5 integer;
We get the following attribute ordering:
=# SELECT c.relname, a.attname, a.attnum
FROM pg_attribute a
  JOIN pg_class c ON (a.attrelid = c.oid)
WHERE a.attrelid IN ('test_parent'::regclass,
                     'test_child_1'::regclass,
                     'test_child_2'::regclass)
  AND a.attnum > 0 ORDER BY c.relname, a.attnum;
   relname    | attname | attnum
--------------+---------+--------
 test_child_1 | c1      |      1
 test_child_1 | c2      |      2
 test_child_1 | c3      |      3
 test_child_1 | c5      |      4
 test_child_2 | c1      |      1
 test_child_2 | c2      |      2
 test_child_2 | c3      |      3
 test_child_2 | c4      |      4
 test_child_2 | c5      |      5
 test_parent  | c1      |      1
 test_parent  | c2      |      2
 test_parent  | c5      |      3
(12 rows)

Now, what we could do on a child relation when a new attribute in its
parent relation is to increment the attnum of the existing columns
with attinhcount = 0 by 1, and insert the new column at the end of the
existing ones where attinhcount > 0, to give something like that:
   relname    | attname | attnum
--------------+---------+--------
 test_child_1 | c1      |      1
 test_child_1 | c2      |      2
 test_child_1 | c5      |      3
 test_child_1 | c3      |      4
 test_child_2 | c1      |      1
 test_child_2 | c2      |      2
 test_child_2 | c3      |      3
 test_child_2 | c5      |      4
 test_child_2 | c4      |      5
 test_parent  | c1      |      1
 test_parent  | c2      |      2
 test_parent  | c5      |      3
(12 rows)
Looking at tablecmds.c, this could be intuitively done as a part of
ATExecAddColumn by scanning the attributes of the child relation from
the end. But it is of course not that simple, a lot of code paths rely
on the attnum of the current attributes. One is CookedConstraint, but
that's a can of worms for back branches.

Another bandaid solution, that would be less invasive for a backpatch
is to reproduce the sequence of DDL commands within the dump itself:
we would need to dump attinhcount in getTableAttrs and use it to guess
what are the columns on the child relations that have been added on a
parent relation after the children have been created. This would not
solve the case of data-only dumps containing INSERT queries that have
no column names on databases restored from past schema dumps though.

Perhaps you did not look at the last patch I sent on this thread, but
I changed it so as a schedule is used with a call to pg_regress.
That's a more scalable approach as you were concerned about as we can
plug in more easily new modules and new regression tests without
modifying the perl script used to kick the tests, though it does not
run the main regression test suite and it actually cannot run it,
except with a modified schedule or with a filter of the child-parent
tables. Patch is attached for consideration, which looks like a good
base for future support, feel free to disagree :)
Thanks,
--
Michael

Вложения

Re: Improving test coverage of extensions with pg_dump

От
Michael Paquier
Дата:
On Tue, Sep 29, 2015 at 9:39 PM, Michael Paquier wrote:
> Perhaps you did not look at the last patch I sent on this thread, but
> I changed it so as a schedule is used with a call to pg_regress.
> That's a more scalable approach as you were concerned about as we can
> plug in more easily new modules and new regression tests without
> modifying the perl script used to kick the tests, though it does not
> run the main regression test suite and it actually cannot run it,
> except with a modified schedule or with a filter of the child-parent
> tables. Patch is attached for consideration, which looks like a good
> base for future support, feel free to disagree :)

It is a bit sad to say, but this patch aiming at adding a test case
for an uncovered code path of pg_dump has not reached an agreement.
The most simple way to reduce the overhead of this test would be to
include something in the main regression test suite and let pg_dump
call incorporated in pg_upgrade test do the work. This has been
suggested upthread already as one way to do things.
Doing diff comparision of pg_dump/restore on the regression database
data set would have been nice, but the column ordering of inherited
tables when a column is added to a child is a can of worms, so that's
not something this patch should try to address.
So marking it as returned with feedback or moving it to next CF?
-- 
Michael