Discussion: Improving test coverage of extensions with pg_dump
Hi all, While investigating the issue that was committed as ebd092b to fix FK dependencies in pg_dump for tables in extensions, I developed a small test case aimed at integration in src/test/modules to ensure that this does not break again in the future. Note that, as the regression tests of this test module use TAP tests to run a set of pg_dump commands and check the sanity of FK dependency tracking with extensions, this needed a one-line tweak of the prove_check target so that the contents of the current directory are installed into the temporary install folder before running the tests. I imagine that it would be nice to integrate this test case to improve test coverage of pg_dump, extending TAP support for extensions at the same time, so attached are patches for this purpose. Those patches are really simple, but perhaps there are better or simpler ways than what is attached, so feel free to comment if you have any ideas. Regards, -- Michael
Attachments
On Tue, Mar 3, 2015 at 2:40 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > Those patches are really simple, but then perhaps there are better or > simpler ways than what is attached, so feel free to comment if you > have any ideas. Attached are new patches somewhat based on the comments provided by Peter Eisentraut (http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net).
- 0001 makes prove_check install the contents of the path located at $(CURDIR)/t/extra if present; this is the path that test developers could use to store extensions, plugins or modules that would then be usable by a given set of TAP tests, as they are installed into the temporary installation before launching the tests.
- 0002 is the test for pg_dump checking extensions containing FK tables, this time integrated as a TAP test in src/bin/pg_dump, with the extension in src/bin/pg_dump/t/extra.
IMO, this approach is scalable enough for the long term. And I think that the location of those extra extensions is fine in t/ as a hardcoded subfolder (any fresh idea being of course welcome), as this makes the stuff in extra/ dedicated to only one set of TAP tests, and it is even possible to set up multiple extensions/modules. Regards, -- Michael
Attachments
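To make the layout that 0001 expects concrete, here is a sketch of the directory structure (paths follow the patch description; the file names are taken from the tables_fk test in 0002, with file contents omitted): any extension dropped under t/extra/ gets installed into the temporary installation before the TAP tests run.

```shell
# Sketch of the layout consumed by the patched prove_check:
# anything under t/extra/ is installed into the temporary
# installation before the TAP tests start.
mkdir -p src/bin/pg_dump/t/extra/tables_fk
touch src/bin/pg_dump/t/extra/tables_fk/Makefile \
      src/bin/pg_dump/t/extra/tables_fk/tables_fk.control \
      src/bin/pg_dump/t/extra/tables_fk/tables_fk--1.0.sql
ls src/bin/pg_dump/t/extra
```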
On 03/07/2015 02:34 PM, Michael Paquier wrote: > On Tue, Mar 3, 2015 at 2:40 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Those patches are really simple, but then perhaps there are better or >> simpler ways than what is attached, so feel free to comment if you >> have any ideas. > > Attached are new patches somewhat based on the comments provided by > Peter Eisentraut > (http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net). > - 0001 makes prove_check install the contents of the path located at > $(CURDIR)/t/extra if present, which would be the path that test > developers could use to store extensions, plugins or modules that > would then be usable by a given set of TAP tests as they are installed > in the temporary installation before launching the tests. > - 0002 is the test for pg_dump checking extensions containing FK > tables, this time integrated as a TAP test in src/bin/pg_dump, with > the extension in src/bin/pg_dump/t/extra. > IMO, this approach is scalable enough thinking long-term. And I think > that the location of those extra extensions is fine in t/ as an > hardcoded subfolder (any fresh idea being of course welcome) as this > makes the stuff in extra/ dedicated to only on set of TAP tests, and > it is even possible to set multiple extensions/modules. Hmm. I think it'd be better to put the tables_fk extension into src/test/modules, and the test case under src/test/tables_fk/t/. I'm inclined to think of this as a test case for an extension that contains a table, which includes testing that pg_dump/restore works, rather than as a test of pg_dump itself. - Heikki
On Wed, Jul 8, 2015 at 1:32 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Hmm. I think it'd be better to put the tables_fk extension into > src/test/modules, and the test case under src/test/tables_fk/t/. I'm > inclined to think of this as a test case for an extension that contains a > table, which includes testing that pg_dump/restore works, rather than as a > test of pg_dump itself. The first version of this patch actually did that: http://www.postgresql.org/message-id/CAB7nPqQrxhy3+wvUmA69KJXiRPpV5qWJi-3cLn3ZJgByqe_BQQ@mail.gmail.com The reason it was changed to the current approach is that there were concerns about bloating src/test/modules with many similar tests in the future, like imagine pg_dump_test_1, pg_dump_test_2. Attached is an updated version that can be used in src/test/modules as well. The Makefile also needed an extra EXTRA_INSTALL pointing to src/test/modules/tables_fk. Nothing amazing. I have also reworded a couple of things while looking at it again. -- Michael
Attachments
I have reviewed this patch; it compiles and runs, and the new test case passes. The code is also clean, and the test seems like a useful regression test. What I do not like, though, is how the path src/test/tables_fk/t/ tells us nothing about what features of PostgreSQL are tested here. For this reason I personally prefer the earlier versions, where I think that was clear. Another thought: would it be worthwhile to also add an assertion to check if the data really was restored properly, or would that just be redundant code? -- Andreas
On Thu, Jul 30, 2015 at 5:54 AM, Andreas Karlsson <andreas@proxel.se> wrote: > What I do not like though is how the path src/test/tables_fk/t/ tells us > nothing about what features are of PostgreSQL are tested here. For this I > personally prefer the earlier versions where I think that was clear. +Simple module used to check data dump ordering of pg_dump with tables +linked with foreign keys. The README mentions that this is to test pg_dump, perhaps referring to the TAP tests makes sense as well? > Another thought: would it be worthwhile to also add an assertion to check if > the data really was restored properly or would that just be redundant code? That seems worth doing, hence added something for it. Thanks for the suggestion. At the same time I have added an entry in .gitignore for the generated tmp_check/. Regards, -- Michael
Attachments
On 07/30/2015 04:48 AM, Michael Paquier wrote: > On Thu, Jul 30, 2015 at 5:54 AM, Andreas Karlsson <andreas@proxel.se> wrote: >> What I do not like though is how the path src/test/tables_fk/t/ tells us >> nothing about what features are of PostgreSQL are tested here. For this I >> personally prefer the earlier versions where I think that was clear. > > +Simple module used to check data dump ordering of pg_dump with tables > +linked with foreign keys. > The README mentions that this is to test pg_dump, perhaps referring to > the TAP tests makes sense as well? The comment is good, but what I personally do not like is that the path to the test suite is non-obvious and not self-explanatory; I would not expect src/test/tables_fk/t/ to test pg_dump and extensions. I find it too confusing to regard the test suite as testing an extension called "tables_fk", since in my mind that is just a necessary tool built to test something else. This is obviously a subjective thing, and I see that, for example, Heikki likes it the way it is. I am fine with setting this as ready for committer and letting a committer weigh in on the naming. >> Another thought: would it be worthwhile to also add an assertion to check if >> the data really was restored properly or would that just be redundant code? > > That seems worth doing, hence added something for it. Thanks for the suggestion. > > At the same time I have added an entry in .gitignore for the generated > tmp_check/. Nice changes. -- Andreas
On Fri, Jul 31, 2015 at 6:41 AM, Andreas Karlsson <andreas@proxel.se> wrote: > The comment is good, but what I personally do not like is that the path to > the test suite is non-obvious and not self explanatory I would not expect > src/test/tables_fk/t/ to test pg_dump and extensions. I find it to confusing > to regard the test suite as testing an extension called "tables_fk" since in > my mind that is just a necessary tool built to test something else. > > This is obviously a subjective thing, and I see that for example Heikki > likes it the way it is. I am fine with setting this as ready for committer > and and let a committer weigh in on the naming. Well, honestly, I would just have it in src/test/modules and call it a day, because it is now the only extension that has such interactions with perl tests. And if modules/ gets bloated later on, we could extend prove_check to install modules from extra paths; it just does not seem worth doing now for one module, and there is no guarantee that we'll have that many around. Of course there is no guarantee either that modules/ is not going to get bloated. Note as well that I will be fine with any decision taken by the committer who picks this up; this test case is useful by itself in any case. -- Michael
On Thu, Jul 30, 2015 at 5:35 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > Note as well that I will be fine with any decision taken by the > committer who picks up this, this test case is useful by itself in any > case. And I just recalled that I actually ran no tests for this thing on Windows. As this uses the TAP facility, I think that it makes most sense to run it with tapcheck instead of modulescheck in vcregress.pl because of its dependency on IPC::Run. The compilation with MSVC is fixed as well. -- Michael
Attachments
On 2015-08-02 19:15:58 -0700, Michael Paquier wrote:
> +psql 'postgres', 'CREATE EXTENSION tables_fk';
> +
> +# Insert some data before running the dump, this is needed to check
> +# consistent data dump of tables with foreign key dependencies
> +psql 'postgres', 'INSERT INTO cc_tab_fkey VALUES (1)';
> +psql 'postgres', 'INSERT INTO bb_tab_fkey VALUES (1)';
> +psql 'postgres', 'INSERT INTO aa_tab_fkey VALUES (1)';
> +
> +# Create a table depending on a FK defined in the extension
> +psql 'postgres', 'CREATE TABLE dd_tab_fkey (id int REFERENCES bb_tab_fkey(id))';
> +psql 'postgres', 'INSERT INTO dd_tab_fkey VALUES (1)';
> +
> +# Take a dump then re-deploy it to a new database
> +command_ok(['pg_dump', '-d', 'postgres', '-f', "$tempdir/dump.sql"],
> +    'taken dump with installed extension');
> +command_ok(['createdb', 'foobar1'], 'created new database to redeploy dump');
> +command_ok(['psql', '--set=ON_ERROR_STOP=on', '-d', 'foobar1', '-f',
> +    "$tempdir/dump.sql"], 'dump restored');
> +
> +# And check its content
> +my $result = psql 'foobar1',
> +    q{SELECT id FROM aa_tab_fkey UNION ALL
> +SELECT id FROM bb_tab_fkey UNION ALL
> +SELECT id FROM cc_tab_fkey UNION ALL
> +SELECT id FROM dd_tab_fkey};
> +is($result, qq(1
> +1
> +1
> +1),
> +   'consistency of data restored');
> diff --git a/src/test/modules/tables_fk/tables_fk--1.0.sql b/src/test/modules/tables_fk/tables_fk--1.0.sql
> new file mode 100644
> index 0000000..e424610
> --- /dev/null
> +++ b/src/test/modules/tables_fk/tables_fk--1.0.sql
> @@ -0,0 +1,20 @@
> +/* src/test/modules/tables_fk/tables_fk--1.0.sql */
> +
> +-- complain if script is sourced in psql, rather than via CREATE EXTENSION
> +\echo Use "CREATE EXTENSION tables_fk" to load this file. \quit
> +
> +CREATE TABLE IF NOT EXISTS cc_tab_fkey (
> +    id int PRIMARY KEY
> +);
> +
> +CREATE TABLE IF NOT EXISTS bb_tab_fkey (
> +    id int PRIMARY KEY REFERENCES cc_tab_fkey(id)
> +);
> +
> +CREATE TABLE IF NOT EXISTS aa_tab_fkey (
> +    id int REFERENCES bb_tab_fkey(id)
> +);
> +
> +SELECT pg_catalog.pg_extension_config_dump('aa_tab_fkey', '');
> +SELECT pg_catalog.pg_extension_config_dump('bb_tab_fkey', '');
> +SELECT pg_catalog.pg_extension_config_dump('cc_tab_fkey', '');
> diff --git a/src/test/modules/tables_fk/tables_fk.control b/src/test/modules/tables_fk/tables_fk.control

Isn't a full test with a separate initdb, create extension etc. a really heavy-handed way to test this? I mean that's a test where the setup takes up to 10s, whereas the actual runtime is in the millisecond range? Adding tests in this manner doesn't seem scalable to me. I think we should rather add *one* test that does dump/restore over the normal regression test database. Something similar to the pg_upgrade tests. And then work at adding more content to the regression test database - potentially sourcing from src/test/modules. Greetings, Andres Freund
Andres Freund wrote: > Isn't a full test with a separate initdb, create extension etc. a really > heavyhanded way to test this? I mean that's a test where the setup takes > up to 10s, whereas the actual runtime is in the millisecond range? I spent some time looking over this patch yesterday, and that was one concern I had too. I was thinking that if we turn this into a single module that can contain several extensions, or several pg_dump-related tests, and then test multiple features without requiring an initdb for each, it would be more palatable. > Adding tests in this manner doesn't seem scalable to me. I was thinking of having this renamed src/test/modules/extensions/, and then the extension contained here would be renamed ext001_fk_tables or something like that; later we could add ext002_something for testing some other angle of extensions, not necessarily pg_dump related. > I think we should rather add *one* test that does dump/restore over the > normal regression test database. Something similar to the pg_upgrade > tests. And then work at adding more content to the regression test > database - potentially sourcing from src/test/modules. That's another option, but we've had this idea for many years now and it hasn't materialized. As I recall, Andrew Dunstan has a module that tests cross-version pg_upgrade, and one thing he does is dump both and compare; the problem is that there are differences, so he keeps a count of how many lines he expects to differ between any two releases. Or something like that. While it's a good enough start, I don't think it's robust enough to be in core. How would we do it? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-09-02 14:30:33 -0300, Alvaro Herrera wrote: > I was thinking in having this be renamed src/test/modules/extensions/ > and then the extension contained here would be renamed ext001_fk_tables > or something like that; later we could ext002_something for testing some > other angle of extensions, not necessarily pg_dump related. The largest dataset we have for this is the current regression test database, it seems a waste not to include it... > That's another option, but we've had this idea for many years now and it > hasn't materialized. But that's just minimally more complex than what's currently done in that test and in the pg_upgrade test? > As I recall, Andrew Dunstan has a module that > tests cross-version pg_upgrade and one thing he does is dump both and > compare; the problem is that there are differences, so he keeps a count > of how many lines he expect to differ between any two releases. I'm not suggesting to do anything cross-release - that'd indeed be another ballpark. Just that instead of a pg_dump test that tests some individual things we have one that tests the whole regression test output and then does a diff? Greetings, Andres Freund
Andres Freund wrote: > > As I recall, Andrew Dunstan has a module that > > tests cross-version pg_upgrade and one thing he does is dump both and > > compare; the problem is that there are differences, so he keeps a count > > of how many lines he expect to differ between any two releases. > > I'm not suggesting to do anything cross-release - that'd indeed be > another ballpark. Ah, you're right. > Just that instead of a pg_dump test that tests some individual things we > have one that tests the whole regression test output and then does a > diff? Sorry if I'm slow -- A diff against what? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 3, 2015 at 2:30 AM, Alvaro Herrera wrote: > Andres Freund wrote: >> Isn't a full test with a separate initdb, create extension etc. a really >> heavyhanded way to test this? I mean that's a test where the setup takes >> up to 10s, whereas the actual runtime is in the millisecond range? > > I spent some time looking over this patch yesterday, and that was one > concern I had too. I was thinking that if we turn this into a single > module that can contain several extensions, or several pg_dump-related > tests, and then test multiple features without requiring an initdb for > each, it would be more palatable. Yeah, sure. That's a statement you could generalize to all the TAP tests, for example the pg_rewind tests. >> Adding tests in this manner doesn't seem scalable to me. > > I was thinking in having this be renamed src/test/modules/extensions/ > and then the extension contained here would be renamed ext001_fk_tables > or something like that; later we could ext002_something for testing some > other angle of extensions, not necessarily pg_dump related. So, if I am following correctly, what you are suggesting is: 1) create src/test/modules/extensions/, and include tables_fk in it for now; 2) offer the possibility to run make check in this path, doing complicated tests on a single instance, pg_dump on an instance being one of them. Note that for now the TAP test machinery does not allow sharing a single cluster instance across multiple .pl scripts, and it seems to me that's what you mean here (in my opinion that's out of scope for this patch, and I don't think we should do it either; it would make error handling a nightmare). We could, however, have a single TAP script doing all the tests on a single instance. So which one are you suggesting? The patch proposed here actually does the second, initializing an instance and performing pg_dump on it.
If we change it your way, we should just rename the test script to something like 001_test_extensions.pl and mention in it that new tests for extensions should be included there. Still, I would imagine that each extension is actually going to need slightly different test patterns to cover what it is intended to cover. See for example the set of tests added in the CREATE EXTENSION CASCADE patch: those are not adapted to run with the TAP machinery. I may of course be wrong, and we may have other extensions using that in the future, but I see none around for now. If you are worried about run time, we could as well have make check add an EXTRA_INSTALL pointing to src/test/modules/extensions/, with an additional test, say extensions.sql, that creates data based on it. Then we let pg_upgrade do the dump/restore checks. This way the run time of make check-world is minimally impacted, and the code paths we want tested are exercised by the buildfarm. That's ugly, but it won't impact performance. >> I think we should rather add *one* test that does dump/restore over the >> normal regression test database. Something similar to the pg_upgrade >> tests. And then work at adding more content to the regression test >> database - potentially sourcing from src/test/modules. > > That's another option, but we've had this idea for many years now and it > hasn't materialized. As I recall, Andrew Dunstan has a module that > tests cross-version pg_upgrade and one thing he does is dump both and > compare; the problem is that there are differences, so he keeps a count > of how many lines he expect to differ between any two releases. Or > something like that. While it's a good enough start, I don't think it's > robust enough to be in core. How would we do it?
That does not seem plausible to me for integration in core: you would need to keep switching between branches in a given git tree if the test were run within a git repository, and it would simply not be doable from a raw tarball, which stores only one given version. -- Michael
On Thu, Sep 3, 2015 at 12:38 AM, Andres Freund wrote: > Isn't a full test with a separate initdb, create extension etc. a really > heavyhanded way to test this? I mean that's a test where the setup takes > up to 10s, whereas the actual runtime is in the millisecond range? > > Adding tests in this manner doesn't seem scalable to me. How to include this kind of test has been at the heart of the discussion since this patch showed up. I think we are now close to 5 different opinions from 4 different hackers on the matter; Riggs' theorem applies nicely. > I think we should rather add *one* test that does dump/restore over the > normal regression test database. Something similar to the pg_upgrade > tests. And then work at adding more content to the regression test > database - potentially sourcing from src/test/modules. If you are worried about run time, pg_upgrade already does exactly that. What would be the point of doubling the amount of time to do the same thing in two different places? -- Michael
On Thu, Sep 3, 2015 at 10:35 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Sep 3, 2015 at 12:38 AM, Andres Freund wrote: >> Isn't a full test with a separate initdb, create extension etc. a really >> heavyhanded way to test this? I mean that's a test where the setup takes >> up to 10s, whereas the actual runtime is in the millisecond range? >> >> Adding tests in this manner doesn't seem scalable to me. > > How to include such kind of tests is in the heart of the discussion > since this patch has showed up. I think we are now close to 5 > different opinions with 4 different hackers on the matter, the Riggs' > theorem applies nicely. > >> I think we should rather add *one* test that does dump/restore over the >> normal regression test database. Something similar to the pg_upgrade >> tests. And then work at adding more content to the regression test >> database - potentially sourcing from src/test/modules. > > If you are worrying about run time, pg_upgrade already exactly does > that. What would be the point to double the amount of time to do the > same thing in two different places? So, to summarize the state of this patch, whose status is now "Waiting on Author", we have the following possibilities:
1) Keep the test as-is, as an independent test in src/test/modules.
2) Integrate it into the test suite of src/test/regress and let pg_upgrade do the work with dump/restore.
3) Create a new facility as src/test/modules/extensions that runs the regression test suite with a set of extensions installed, and then does a dump/restore. For now the only extension added in the installation would be tables_fk. Future ones would be added here as well, though it is not clear to me what they would be and whether we'd have any (there may be... or not).
4) Include it as a test of pg_dump in src/bin/pg_dump/t, with the extension located in, for example, t/extensions, and extend prove_check to install any extensions in t/extensions as well.
I would still go for 1). The infrastructure included in the other proposals may become useful depending on the types of tests added in the future, but it is still unclear what those tests are, and they may even need something other than what is listed here (see this as an example: http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net) to run properly. Regards, -- Michael
On 2015-09-07 22:55:50 +0900, Michael Paquier wrote: > So, to summarize the state of this patch whose status is now "Waiting > on Author", we have the following possibilities: > 1) Keep the test as-is, as an independent test of src/test/modules. I find that a bad option. A test that takes this long and has that much setup for such a marginal amount of coverage just doesn't seem worth it. > 2) Integrate it in the test suite of src/test/regress and let > pg_upgrade make the work with dump/restore. 2b) ... and create a single src/test/modules/pg_dumprestore test that initdbs, runs installcheck, dumps, loads and compares a repeated dump. I think 2b) is by far the best choice. > I would still go for 1), the infrastructures included by the other > proposals may become useful depending on the types of tests that are > added in the future, but it is still unclear what those tests are, and > they may even need something else than what listed here (see that as > an example http://www.postgresql.org/message-id/54F62C3F.8070702@gmx.net) > to run properly. By that argument we're going to add more and more isolated tests and never merge anything. I fail to see a single benefit of this approach. Greetings, Andres Freund
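For illustration, the 2b) flow sketched here would amount to roughly the following (database names and paths are made up, and the script assumes a built source tree with PostgreSQL binaries in PATH, so this is a sketch rather than a drop-in test):

```shell
# Rough sketch of the proposed pg_dumprestore test (assumed names/paths)
initdb -D ./tmp_data
pg_ctl -D ./tmp_data -w start

# Populate the "regression" database with the full regression test suite
make -C src/test/regress installcheck

# Dump, restore into a fresh database, dump again
pg_dump -d regression -f dump1.sql
createdb regression_copy
psql --set=ON_ERROR_STOP=on -d regression_copy -f dump1.sql
pg_dump -d regression_copy -f dump2.sql

# If pg_dump round-trips cleanly, the two dumps should be identical
diff dump1.sql dump2.sql

pg_ctl -D ./tmp_data -w stop
```

As the later messages in this thread show, the final diff is in fact not empty today for some inherited tables, which is exactly the kind of issue such a test would surface.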
On Wed, Sep 2, 2015 at 02:30:33PM -0300, Alvaro Herrera wrote: > > I think we should rather add *one* test that does dump/restore over the > > normal regression test database. Something similar to the pg_upgrade > > tests. And then work at adding more content to the regression test > > database - potentially sourcing from src/test/modules. > > That's another option, but we've had this idea for many years now and it > hasn't materialized. As I recall, Andrew Dunstan has a module that > tests cross-version pg_upgrade and one thing he does is dump both and > compare; the problem is that there are differences, so he keeps a count > of how many lines he expect to differ between any two releases. Or > something like that. While it's a good enough start, I don't think it's > robust enough to be in core. How would we do it? I have shell scripts that test pg_dump restore/upgrade of every supported PG version. I also have expected pg_dump output files for every major version. This is explained in src/bin/pg_upgrade/TESTING. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Wed, Sep 9, 2015 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote: > On 2015-09-07 22:55:50 +0900, Michael Paquier wrote: >> So, to summarize the state of this patch whose status is now "Waiting >> on Author", we have the following possibilities: >> 1) Keep the test as-is, as an independent test of src/test/modules. > > I find that a bad option. A test that takes this long and has that much > setup for such a marginal amount of coverage just doesn't seem worth it. > >> 2) Integrate it in the test suite of src/test/regress and let >> pg_upgrade make the work with dump/restore. > > 2b) ... and create a single src/test/modules/pg_dumprestore test that > initdbs, runs installcheck, dumps, loads and compares a repeated dump. > > I think 2b) is by far the best choice. And I guess that the extensions tested should be located directly in this path, or we would need to tweak the MSVC scripts again... Attached is an attempt to solve things by converting the previous patch as proposed by Andres. A source and a target database are used on a cluster, and their respective dumps are compared with File::Compare (available down to perl 5.8.8) at the end. Regards, -- Michael
Attachments
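The final comparison step described above (File::Compare over the two dump files) is a plain byte-wise file comparison; the shell equivalent, shown here with stand-in files in place of real pg_dump output, is just cmp:

```shell
# Stand-in dump files; in the real test these would be produced by
# pg_dump runs against the source and target databases.
printf 'CREATE TABLE aa_tab_fkey (id int);\n' > dump_source.sql
printf 'CREATE TABLE aa_tab_fkey (id int);\n' > dump_target.sql

# cmp -s does a byte-wise comparison, the same check File::Compare makes
if cmp -s dump_source.sql dump_target.sql; then
    echo "dumps match"
else
    echo "dumps differ"
fi
```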
On 2015-09-09 10:48:24 +0900, Michael Paquier wrote: > On Wed, Sep 9, 2015 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote: > > On 2015-09-07 22:55:50 +0900, Michael Paquier wrote: > >> So, to summarize the state of this patch whose status is now "Waiting > >> on Author", we have the following possibilities: > >> 1) Keep the test as-is, as an independent test of src/test/modules. > > > > I find that a bad option. A test that takes this long and has that much > > setup for such a marginal amount of coverage just doesn't seem worth it. > > > >> 2) Integrate it in the test suite of src/test/regress and let > >> pg_upgrade make the work with dump/restore. > > > > 2b) ... and create a single src/test/modules/pg_dumprestore test that > > initdbs, runs installcheck, dumps, loads and compares a repeated dump. > > > > I think 2b) is by far the best choice. > > And I guess that the extensions tested should be located directly in > this path, or we would need again to tweak the MSVC scripts... > Attached is an attempt to solve things by converting the previous > patch as proposed by Andres. A source and a target databases are used > on a cluster, and their respective dumps are compared with > File::Compare (available down to perl 5.8.8) at the end. Unless I miss something this isn't 2b), as it does *not* actually run installcheck. Therefore the actual pg_dump/restore coverage is negligible. Greetings, Andres Freund
On Wed, Sep 16, 2015 at 3:42 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-09-09 10:48:24 +0900, Michael Paquier wrote: >> On Wed, Sep 9, 2015 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote: >> > On 2015-09-07 22:55:50 +0900, Michael Paquier wrote: >> >> So, to summarize the state of this patch whose status is now "Waiting >> >> on Author", we have the following possibilities: >> >> 1) Keep the test as-is, as an independent test of src/test/modules. >> > >> > I find that a bad option. A test that takes this long and has that much >> > setup for such a marginal amount of coverage just doesn't seem worth it. >> > >> >> 2) Integrate it in the test suite of src/test/regress and let >> >> pg_upgrade make the work with dump/restore. >> > >> > 2b) ... and create a single src/test/modules/pg_dumprestore test that >> > initdbs, runs installcheck, dumps, loads and compares a repeated dump. >> > >> > I think 2b) is by far the best choice. >> >> And I guess that the extensions tested should be located directly in >> this path, or we would need again to tweak the MSVC scripts... >> Attached is an attempt to solve things by converting the previous >> patch as proposed by Andres. A source and a target databases are used >> on a cluster, and their respective dumps are compared with >> File::Compare (available down to perl 5.8.8) at the end. > > Unless I miss something this isn't 2b) as it does *not* actually run > the actual installcheck. Therefore the actual pg_dump/restore coverage > is negligible. Hm. OK. I didn't get your message correctly, sorry for that. Would you be fine then with a pg_regress command using parallel_schedule + an extra schedule launching tests related to the extensions in src/test/modules/pg_dumprestore? The choice of parallel_schedule is based on what Windows does, aka this schedule is used in the equivalent of make check in vcregress.pl.
The TAP script then simply initializes the cluster, runs pg_regress, and does the dump/restore job. There is no real need to worry about setval as the dump is not taken from a standby... -- Michael
On Wed, Sep 16, 2015 at 8:00 PM, Michael Paquier wrote: > Hm. OK. I didn't get your message correctly, sorry for that. Would you > be fine then to have a pg_regress command using parallel_schedule + an > extra schedule launching tests related to the extensions in > src/test/modules/pg_dumprestore then? The choice of parallel_schedule > is based on what Windows does, aka this schedule is used in the > equivalent of make check in vcregress.pl. The TAP script then simply > initializes the cluster, runs pg_regress, and does the dump/restore > job. There is no real need to worry about setval as dump is not taken > from a standby.. So, here we go. I have found something quite interesting when playing with the patch attached: dump does not guarantee the column ordering across databases for some inherited tables. See this example from the main regression test suite, a diff between a dump taken from a source database and a dump from a target database into which the source dump was first restored:
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 3, 'mumble', NULL);
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 4, NULL, NULL);
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, 'bumble', NULL);
-INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, NULL, NULL);
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 3, NULL, 'mumble');
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 4, NULL, NULL);
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, 'bumble');
+INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, NULL);
The problem is similar with --column-inserts, --inserts and COPY. We could use --exclude-table, as in the attached patch, when taking the dump from the source database, but that's grotty; or we could improve pg_dump itself, though it may not be worth it for just this purpose. Thoughts? -- Michael
Attachments
On 2015-09-16 22:18:55 -0700, Michael Paquier wrote:
> So, here we go.

Cool.

> I have found something quite interesting when playing with the patch
> attached: dump does not guarantee the column ordering across databases
> for some inherited tables, see that example from the main regression
> test suite the following diff between a dump taken from a source
> database and a target database where the source dump has been restored
> in first:
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 3, 'mumble', NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 4, NULL, NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, 'bumble', NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, NULL, NULL);
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 3, NULL, 'mumble');
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 4, NULL, NULL);
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, 'bumble');
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, NULL);
>
> Problem is similar with --column-inserts, --inserts and COPY. We could
> use --exclude-table like in the patch attached when taking the dump
> from source database but that's grotty, or we could improve pg_dump
> itself, though it may not be worth it for just this purpose.
> Thoughts?

Hm. To me that sounds like a moderately serious bug worth fixing. I mean if you have a COPY command that worked before a dump/restore but that's wrong afterwards, it's pretty ugly, no?

Greetings,

Andres Freund
On Thu, Sep 17, 2015 at 7:24 AM, Andres Freund wrote:
> On 2015-09-16 22:18:55 -0700, Michael Paquier wrote:
>> Problem is similar with --column-inserts, --inserts and COPY. We could
>> use --exclude-table like in the patch attached when taking the dump
>> from source database but that's grotty, or we could improve pg_dump
>> itself, though it may not be worth it for just this purpose.
>> Thoughts?
>
> Hm. To me that sounds like a moderately serious bug worth fixing. I mean
> if you have a COPY command that worked before a dump/restore but that's
> wrong afterwards, it's pretty ugly, no?

COPY and INSERT include the column list in dumps, so that's not really an issue here; what is potentially a problem (looking at it freshly) is --inserts with data-only dumps, though. So yes, we had better fix it, and perhaps consider a backpatch...
-- 
Michael
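The distinction can be sketched in miniature. The following toy Python model (illustrative only, no PostgreSQL involved) shows why an INSERT without a column list silently permutes data once the target's column order differs, while an explicit column list maps values by name:

```python
source_order = ["a", "b", "c", "d"]     # column order of bb in the source
restored_order = ["a", "b", "d", "c"]   # order of bb after dump/restore
row = {"a": 1, "b": 2, "c": "mumble", "d": None}

# pg_dump emits the values in source-database column order.
values = [row[c] for c in source_order]

# An INSERT without a column list maps them positionally on the target:
positional = dict(zip(restored_order, values))
print(positional["c"], positional["d"])  # None mumble  -- swapped!

# With an explicit column list (COPY and --column-inserts include one),
# the mapping is by name and survives the reordering:
named = dict(zip(source_order, values))
print(named["c"], named["d"])  # mumble None
```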
On Thu, Sep 17, 2015 at 9:47 AM, Michael Paquier wrote:
> COPY or INSERT include the column list in dumps, so that's not really
> an issue here, what is potentially a problem (looking at that freshly)
> is --inserts with data-only dumps though. So yes we had better fix it
> and perhaps consider a backpatch...

When adding a column to a parent table, attnum gets confused on the child table... Take this example:

=# create table aa (a int, b int);
CREATE TABLE
=# create table bb (c int) inherits (aa);
CREATE TABLE
=# alter table aa add column d text;
ALTER TABLE
=# select relname, attname, attnum from pg_attribute
     join pg_class on (attrelid = oid)
   where attrelid in ('bb'::regclass, 'aa'::regclass) and attnum > 0;
 relname | attname | attnum
---------+---------+--------
 aa      | d       |      3
 aa      | b       |      2
 aa      | a       |      1
 bb      | d       |      4
 bb      | c       |      3
 bb      | b       |      2
 bb      | a       |      1
(7 rows)

When this is dumped and restored into another database, the ordering gets different; c and d are switched for the child relation bb here:

 relname | attname | attnum
---------+---------+--------
 aa      | d       |      3
 aa      | b       |      2
 aa      | a       |      1
 bb      | c       |      4
 bb      | d       |      3
 bb      | b       |      2
 bb      | a       |      1
(7 rows)

pg_dump relies on attnum to define the column ordering, so one possibility would be to do things more consistently at the backend level. Thoughts?
-- 
Michael
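The mechanics behind this divergence can be sketched with a toy Python model (not backend code; the functions below are hypothetical simplifications): CREATE TABLE ... INHERITS puts the parent's columns first, while ALTER TABLE ... ADD COLUMN appends to parent and children alike, so a dump that recreates the parent with all three columns up front changes the child's ordering:

```python
def create_table(catalog, name, columns, inherits=()):
    """Simulate CREATE TABLE: inherited columns come first (in parent
    order), then the table's own columns."""
    cols = []
    for parent in inherits:
        for col in catalog[parent]:
            if col not in cols:
                cols.append(col)
    cols.extend(columns)
    catalog[name] = cols

def add_column(catalog, name, column, children=()):
    """Simulate ALTER TABLE ... ADD COLUMN: the new column is appended
    at the end of the parent and of every child."""
    catalog[name].append(column)
    for child in children:
        catalog[child].append(column)

# Original database: the column is added to aa *after* bb exists.
orig = {}
create_table(orig, "aa", ["a", "b"])
create_table(orig, "bb", ["c"], inherits=["aa"])
add_column(orig, "aa", "d", children=["bb"])

# Restored database: the dump recreates aa with all three columns,
# so bb inherits d *before* adding its own local column c.
restored = {}
create_table(restored, "aa", ["a", "b", "d"])
create_table(restored, "bb", ["c"], inherits=["aa"])

print(orig["bb"])      # ['a', 'b', 'c', 'd']
print(restored["bb"])  # ['a', 'b', 'd', 'c']
```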
On 17-09-2015 14:21, Michael Paquier wrote:
> pg_dump relies on attnum to define the column ordering, so one
> possibility would be to do things more consistently at backend level.
> Thoughts?
>
According to your example, the problem is that the columns from the parent table "aa" are added _before_ the inherited table "bb" is declared. As a result, column "d" (part of parent table "aa") gets assigned a lower attnum than in the original table "bb". One could say that we should assign an attnum for column "d" considering all of the inheritance tree. However, attnum is used as an index into arrays (we could bloat some of those) and some logic relies on it to count the number of columns. It would turn tablecmds.c into spaghetti. IMHO a possible way to solve it is adding support for logical column ordering. An ALTER TABLE command (emitted if a parameter was given) during dump could handle it. BTW, the last thread [1] about logical column ordering seems to have died a few months ago. Alvaro?

[1] http://www.postgresql.org/message-id/20141209174146.GP1768@alvh.no-ip.org
-- 
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Euler Taveira wrote: > On 17-09-2015 14:21, Michael Paquier wrote: > >pg_dump relies on attnum to define the column ordering, so one > >possibility would be to do things more consistently at backend level. We discussed this in some other thread, not long ago. I looked briefly in the archives but couldn't find it. I think the conclusion was something along the lines of "hmm, tough". > Someone can say that we could assign an attnum for column "d" considering > all of the inheritance tree. However, attnum is used as an index to arrays > (we could bloat some of those) and some logic rely on it to count the number > of columns. It would become tablecmds.c into an spaghetti. We don't need any more spaghetti there, thanks! > IMHO a possible way to solve it is adding support for logical column > ordering. An ALTER TABLE command (emitted if a parameter was informed) > during dump could handle it. BTW, last thread [1] about logical column > ordering seems to have died a few months ago. Alvaro? Tomas Vondra also worked a bit on this patch, and we eventually gave up on it due to lack of time. We might be able to get back on it someday, but do not hold your breath. If you want the current bug fixed, do not wait for logical column numbering. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Euler Taveira wrote:
> On 17-09-2015 14:21, Michael Paquier wrote:
> >pg_dump relies on attnum to define the column ordering, so one
> >possibility would be to do things more consistently at backend level.
We discussed this in some other thread, not long ago. I looked briefly
in the archives but couldn't find it. I think the conclusion was
something along the lines of "hmm, tough".
That's hours, even days of fun ahead for sure.
> Someone can say that we could assign an attnum for column "d" considering
> all of the inheritance tree. However, attnum is used as an index to arrays
> (we could bloat some of those) and some logic rely on it to count the number
> of columns. It would become tablecmds.c into an spaghetti.
That was my first impression as well, but that's just a band-aid. Switching the logic in stable branches so that a child table gets a corrected attnum, while changing the way attnums refer to the parent's entries, is a scary idea. I would suspect that this would break many other things while fixing a narrow case, and the user can still get by when taking a dump by using COPY or INSERT with an explicit column list.
We don't need any more spaghetti there, thanks!
Agreed.
> IMHO a possible way to solve it is adding support for logical column
> ordering. An ALTER TABLE command (emitted if a parameter was informed)
> during dump could handle it. BTW, last thread [1] about logical column
> ordering seems to have died a few months ago. Alvaro?
Tomas Vondra also worked a bit on this patch, and we eventually gave up
on it due to lack of time. We might be able to get back on it someday,
but do not hold your breath. If you want the current bug fixed, do not
wait for logical column numbering.
Honestly, I have the feeling that the discussion on this thread is getting unproductive. Let's not forget that the patch presented here just aims at adding one test case to ensure that extensions using dumpable relations with FKs get dumped correctly, so that we do not reintroduce a bug that had existed in the extension facility since its introduction in 9.1. That being said, the patch is just fine for this purpose, but that's just my opinion.
-- 
Michael
On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> > We discussed this in some other thread, not long ago. I looked briefly
> > in the archives but couldn't find it. I think the conclusion was
> > something along the lines of "hmm, tough".

> That's hours, even days of fun ahead for sure.

To me that's a somewhat serious bug, and something that we probably need to address at some point.

> Honestly I have the feeling that the discussion of this thread gets
> unproductive, let's not forget that the patch presented on this thread is
> just aiming at adding one test case to ensure that extensions using
> dumpable relations with FKs get correctly dumped, which is to ensure that
> we do not reintroduce a bug that existed in the extension facility since
> its introduction in 9.1. That being said, the patch is just fine for this
> purpose, but that's just my opinion.

It's an unsustainable test model. Adding our own test runners, directories, initdb etc. for a simple regression test of a couple of lines won't hurt for maybe the first five, but after that it starts to get unmaintainable.

Greetings,

Andres Freund
On Sat, Sep 26, 2015 at 10:22 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
>> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
>> wrote:
>> > We discussed this in some other thread, not long ago. I looked briefly
>> > in the archives but couldn't find it. I think the conclusion was
>> > something along the lines of "hmm, tough".
>
>> That's hours, even days of fun ahead for sure.
>
> To me that's a somewhat serious bug, and something that we probably need
> to address at some point.

Well, addressing it at the root of the problem is... Let's say... Really tough. I have put my head into this for a couple of hours and tried to draw up a couple of ways to fix it. First, even though pg_dump relies heavily on the assumption that attributes are ordered by attnum, I thought that it would be possible to use attinhcount to order the columns the same way when taking a dump from a source database and from a target database (where the dump of the source has been restored). This works if there is no more than one level of child relations, but with grandchild relations the order gets messed up again. Locating a fix on the backend side would make things easier for pg_dump; an idea would be to simply reorder the attribute numbers when a column is added to a parent table.
For example, with something like this:

CREATE TABLE test_parent (c1 integer, c2 integer);
CREATE TABLE test_child_1 (c3 integer) INHERITS (test_parent);
CREATE TABLE test_child_2 (c4 integer) INHERITS (test_child_1);
ALTER TABLE test_parent ADD COLUMN c5 integer;

We get the following attribute ordering:

=# SELECT c.relname, a.attname, a.attnum
   FROM pg_attribute a JOIN pg_class c ON (a.attrelid = c.oid)
   WHERE a.attrelid IN ('test_parent'::regclass, 'test_child_1'::regclass,
                        'test_child_2'::regclass)
     AND a.attnum > 0 ORDER BY c.relname, a.attnum;
   relname    | attname | attnum
--------------+---------+--------
 test_child_1 | c1      |      1
 test_child_1 | c2      |      2
 test_child_1 | c3      |      3
 test_child_1 | c5      |      4
 test_child_2 | c1      |      1
 test_child_2 | c2      |      2
 test_child_2 | c3      |      3
 test_child_2 | c4      |      4
 test_child_2 | c5      |      5
 test_parent  | c1      |      1
 test_parent  | c2      |      2
 test_parent  | c5      |      3
(12 rows)

Now, what we could do on a child relation when a new attribute is added to its parent relation is to increment by one the attnum of the existing columns with attinhcount = 0, and insert the new column at the end of the existing ones where attinhcount > 0, giving something like this:

   relname    | attname | attnum
--------------+---------+--------
 test_child_1 | c1      |      1
 test_child_1 | c2      |      2
 test_child_1 | c5      |      3
 test_child_1 | c3      |      4
 test_child_2 | c1      |      1
 test_child_2 | c2      |      2
 test_child_2 | c3      |      3
 test_child_2 | c5      |      4
 test_child_2 | c4      |      5
 test_parent  | c1      |      1
 test_parent  | c2      |      2
 test_parent  | c5      |      3
(12 rows)

Looking at tablecmds.c, this could intuitively be done as part of ATExecAddColumn by scanning the attributes of the child relation from the end. But it is of course not that simple: a lot of code paths rely on the attnum of the current attributes. One of them is CookedConstraint, but that's a can of worms for back branches.
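The insertion rule described above can be modeled in a few lines of toy Python (hypothetical, mirroring the proposed behavior rather than any existing tablecmds.c code), reproducing the two child orderings shown:

```python
def add_parent_column(columns, inhcount, new_col):
    """Insert new_col after the child's inherited columns
    (attinhcount > 0) but before its purely local ones
    (attinhcount == 0), instead of appending it at the very end."""
    pos = next((i for i, c in enumerate(columns) if inhcount[c] == 0),
               len(columns))
    columns.insert(pos, new_col)
    inhcount[new_col] = 1

# test_child_1 before the ALTER: c3 is its only local column.
cols = ["c1", "c2", "c3"]
inh = {"c1": 1, "c2": 1, "c3": 0}
add_parent_column(cols, inh, "c5")
print(cols)  # ['c1', 'c2', 'c5', 'c3']

# test_child_2: c3 is inherited from test_child_1, c4 is local.
cols2 = ["c1", "c2", "c3", "c4"]
inh2 = {"c1": 1, "c2": 1, "c3": 1, "c4": 0}
add_parent_column(cols2, inh2, "c5")
print(cols2)  # ['c1', 'c2', 'c3', 'c5', 'c4']
```

Note that this sketch assumes inherited columns form a contiguous prefix of the child's column list, which holds for the example above.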
Another band-aid solution, less invasive for a backpatch, is to reproduce the sequence of DDL commands within the dump itself: we would need to dump attinhcount in getTableAttrs and use it to guess which columns on the child relations were added on a parent relation after the children were created. This would not solve the case of data-only dumps containing INSERT queries without column names run against databases restored from past schema dumps, though. Perhaps you did not look at the last patch I sent on this thread, but I changed it so that a schedule is used with a call to pg_regress. That's a more scalable approach, addressing your concern, as we can more easily plug in new modules and new regression tests without modifying the perl script used to kick off the tests, though it does not run the main regression test suite; it actually cannot run it, except with a modified schedule or with a filter for the child-parent tables. Patch attached for consideration, which looks like a good base for future support; feel free to disagree :) Thanks,
-- 
Michael
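To make the attinhcount idea concrete, here is a hypothetical sketch (not the patch's code, and the function name is invented): an inherited column that appears after one of the child's local columns in attnum order must have been added to the parent after the child already existed, so a dump could create the parent without it, create the children, and replay it afterwards as ALTER TABLE ... ADD COLUMN:

```python
def late_added_parent_columns(child_cols, inhcount):
    """Return the inherited columns (attinhcount > 0) that come after a
    local column (attinhcount == 0) in the child's attnum order; these
    must have been added to the parent after the child was created."""
    seen_local = False
    late = []
    for col in child_cols:
        if inhcount[col] == 0:
            seen_local = True
        elif seen_local:
            late.append(col)
    return late

# bb from the earlier example: created as (c) INHERITS (aa), then aa
# gained column d, so bb's columns are a, b, c, d with only c local.
cols = ["a", "b", "c", "d"]
inh = {"a": 1, "b": 1, "c": 0, "d": 1}
print(late_added_parent_columns(cols, inh))  # ['d']
```

A dump following this rule could then emit CREATE TABLE aa (a, b), CREATE TABLE bb (c) INHERITS (aa), and ALTER TABLE aa ADD COLUMN d afterwards, reproducing the source database's orderings.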
Attachments
On Tue, Sep 29, 2015 at 9:39 PM, Michael Paquier wrote:
> Perhaps you did not look at the last patch I sent on this thread, but
> I changed it so as a schedule is used with a call to pg_regress.
> That's a more scalable approach as you were concerned about as we can
> plug in more easily new modules and new regression tests without
> modifying the perl script used to kick the tests, though it does not
> run the main regression test suite and it actually cannot run it,
> except with a modified schedule or with a filter of the child-parent
> tables. Patch is attached for consideration, which looks like a good
> base for future support, feel free to disagree :)

It is a bit sad to say, but this patch, aiming at adding a test case for an uncovered code path of pg_dump, has not reached an agreement. The simplest way to reduce the overhead of this test would be to include something in the main regression test suite and let the pg_dump call incorporated in the pg_upgrade test do the work. This has already been suggested upthread as one way to do things. Doing a diff comparison of pg_dump/restore on the regression database data set would have been nice, but the column ordering of inherited tables when a column is added to a parent is a can of worms, so that's not something this patch should try to address. So, mark it as returned with feedback, or move it to the next CF?
-- 
Michael