Обсуждение: BUG #9616: Materialized view with indexes unable to load from pg_dump

Поиск
Список
Период
Сортировка

BUG #9616: Materialized view with indexes unable to load from pg_dump

От
jesse.denardo@myfarms.com
Дата:
The following bug has been logged on the website:

Bug reference:      9616
Logged by:          Jesse Denardo
Email address:      jesse.denardo@myfarms.com
PostgreSQL version: 9.3.3
Operating system:   Arch Linux
Description:

Linux: Linux hostname 3.13.5-1-ARCH #1 SMP PREEMPT Sun Feb 23 00:25:24 CET
2014 x86_64 GNU/Linux
PostgreSQL: PostgreSQL 9.3.3 on x86_64-unknown-linux-gnu, compiled by gcc
(GCC) 4.8.2 20140206 (prerelease), 64-bit

I have created several materialized views on one database. To copy data to a
new database, I dump the schema(s) of choice using pg_dump and pipe the
result to psql. The exact command looks like:

$ pg_dump -b -n <schema> -O -x -h localhost -U <user> <db> | psql -h <other
host> -d <other db> -U <other user>

The materialized views get copied and restored just fine with one exception:
The one mat view that has indexes. Here is the result of \d on this mat
view:

# \d example_mat_view;
Materialized view "example_mat_view"
    Column    |       Type        | Modifiers
--------------+-------------------+-----------
 f_id         | integer           |
 f_name       | character varying | collate C
 s_z          | integer[]         |
 c_id         | integer           |
Indexes:
    "example_mat_view_c_id" btree (c_id)
    "example_mat_view_f_id" btree (f_id)

Analyzing the output of pg_dump shows the following:

--
-- Name: example_mat_view; Type: TABLE; Schema: <schema>; Owner: -;
Tablespace:~
--

CREATE TABLE example_mat_view (
    f_id integer,
    f_name character varying COLLATE pg_catalog."C",
    s_z integer[],
    c_id integer
);

--
-- Name: example_mat_view_c_id; Type: INDEX; Schema: <schema>; Owner: -;
Tablespace:~
--

CREATE INDEX example_mat_view_c_id ON example_mat_view USING btree (c_id);


--
-- Name: example_mat_view_f_id; Type: INDEX; Schema: <schema>; Owner: -;
Tablespace:~
--

CREATE INDEX example_mat_view_f_id ON example_mat_view USING btree (f_id);

--
-- Name: _RETURN; Type: RULE; Schema: <schema>; Owner: -
--

CREATE RULE "_RETURN" AS
    ON SELECT TO example_mat_view DO INSTEAD  <sql query>
   ;

--
-- Name: example_mat_view; Type: MATERIALIZED VIEW DATA; Schema: <schema>;
Owner: -
--

REFRESH MATERIALIZED VIEW example_mat_view;



Attempting to restore this dump via psql gives the following errors:

ERROR:  could not convert table "example_mat_view" to a view because it has
indexes
ERROR:  "example_mat_view" is not a materialized view


My independent testing showed that pg_dump usually dumps materialized views
using CREATE MATERIALIZED VIEW, not tables that are later converted into mat
views. I confirmed that the mat view was created normally with CREATE
MATERIALIZED VIEW example_mat_view AS <sql query>;

Re: BUG #9616: Materialized view with indexes unable to load from pg_dump

От
Tom Lane
Дата:
jesse.denardo@myfarms.com writes:
> I have created several materialized views on one database. To copy data to a
> new database, I dump the schema(s) of choice using pg_dump and pipe the
> result to psql. The exact command looks like:

> $ pg_dump -b -n <schema> -O -x -h localhost -U <user> <db> | psql -h <other
> host> -d <other db> -U <other user>

> The materialized views get copied and restored just fine with one exception:
> The one mat view that has indexes. Here is the result of \d on this mat
> view:

I can't reproduce this here; it seems clear that there's some triggering
condition other than the mere presence of indexes.  Could you submit a
*self contained* example (preferably, a SQL script that makes a database
that pg_dump misbehaves on)?

The output looks a little bit like pg_dump might be trying to break some
circular dependency involving the matview.  But it's not obvious how
you'd have gotten into such a situation, and we certainly haven't been
shown all the moving parts.

            regards, tom lane

Re: BUG #9616: Materialized view with indexes unable to load from pg_dump

От
Jesse Denardo
Дата:
I've managed to consistently reproduce the error using a condensed script:

File test_mat.sql:

CREATE SCHEMA test_mat;
CREATE TABLE test_mat.a (
    id integer,
    name character varying
);
ALTER TABLE ONLY test_mat.a ADD CONSTRAINT a_pkey PRIMARY KEY (id);
CREATE MATERIALIZED VIEW test_mat.mat AS (
    SELECT id, name FROM test_mat.a GROUP BY id
);

Execute the following commands to create, dump, drop, and reload the schema:

$ psql -d db -f test_mat.sql
$ pg_dump -n test_mat db > test_mat_dump.sql (notice how the dump creates
"mat" as CREATE TABLE rather than CREATE MATERIALIZED VIEW)
$ psql -d db -c "DROP SCHEMA test_mat CASCADE"
$ psql -d db -f test_mat_dump.sql

This last command produces:

psql:test_mat_dump.sql:82: ERROR:  "mat" is not a table or materialized view

"mat" ends up being a view, rather than a materialized view.

If I add the following line to the end of test_mat.sql and repeat the
entire procedure, I get the error I described earlier:

CREATE INDEX test_idx ON test_mat.mat (id);

Produces:

psql:test_mat_dump.sql:82: ERROR:  could not convert table "mat" to a view
because it has indexes
psql:test_mat_dump.sql:89: ERROR:  "mat" is not a materialized view


--
Jesse Denardo


On Tue, Mar 18, 2014 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> jesse.denardo@myfarms.com writes:
> > I have created several materialized views on one database. To copy data
> to a
> > new database, I dump the schema(s) of choice using pg_dump and pipe the
> > result to psql. The exact command looks like:
>
> > $ pg_dump -b -n <schema> -O -x -h localhost -U <user> <db> | psql -h
> <other
> > host> -d <other db> -U <other user>
>
> > The materialized views get copied and restored just fine with one
> exception:
> > The one mat view that has indexes. Here is the result of \d on this mat
> > view:
>
> I can't reproduce this here; it seems clear that there's some triggering
> condition other than the mere presence of indexes.  Could you submit a
> *self contained* example (preferably, a SQL script that makes a database
> that pg_dump misbehaves on)?
>
> The output looks a little bit like pg_dump might be trying to break some
> circular dependency involving the matview.  But it's not obvious how
> you'd have gotten into such a situation, and we certainly haven't been
> shown all the moving parts.
>
>                         regards, tom lane
>
>

Re: BUG #9616: Materialized view with indexes unable to load from pg_dump

От
Tom Lane
Дата:
Jesse Denardo <jesse.denardo@myfarms.com> writes:
> [ pg_dump fails to dump this sanely: ]

> CREATE SCHEMA test_mat;
> CREATE TABLE test_mat.a (
>     id integer,
>     name character varying
> );
> ALTER TABLE ONLY test_mat.a ADD CONSTRAINT a_pkey PRIMARY KEY (id);
> CREATE MATERIALIZED VIEW test_mat.mat AS (
>     SELECT id, name FROM test_mat.a GROUP BY id
> );

OK, the reason why this is odd is that the view definition requires that
a.id be a primary key.  Because primary keys are supposed to be emitted
in the post-data section of the dump, while the CREATE MAT VIEW command
will be emitted pre-data, we have a dump-object sorting failure, which
manifests as a dependency loop involving the matview, its rule, a's
primary key index, and the section boundary pseudo-objects.

The code in pg_dump_sort.c assumes that any relation with an ON SELECT
rule must be a regular view, and so it tries to break the loop by
splitting the view into a CREATE TABLE command followed by a CREATE RULE
command.  Of course, that results in SQL that recreates a plain view not a
matview; not to mention your original complaint that it fails when there
are indexes on the matview.

I've developed the attached draft patch that fixes this by allowing the
CREATE MATERIALIZED VIEW command to be postponed into the post-data
section.  I think this is pretty darn ugly, since people won't expect such
a categorization, but I'm not sure we have any choice so far as existing
branches are concerned.

If we had a CREATE OR REPLACE MATERIALIZED VIEW type of command, we could
imagine fixing this by emitting a dummy create command in the pre-data
section, along the lines of

     CREATE MATERIALIZED VIEW mv AS SELECT
         NULL::coltype1 AS colname1,
         NULL::coltype2 AS colname2,
        ...
     WITH NO DATA;

and then overwriting that with CREATE OR REPLACE MATERIALIZED VIEW at the
point where it's safe to give the real view definition.  (If we were to
write code for that, I'd be pretty inclined to change the dumping of
separated views to look similar, instead of the hack they are now.)

Of course, you could argue that that wouldn't be materially less confusing
than the other way.  But I suspect we are going to end up doing something
of the sort anyhow, because I have little faith that the technique in this
patch will fix every case of circularities involving matviews.

Comments?  Anybody want to try to fix this in another way?

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4001f3f..2653ef0 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** getTables(Archive *fout, int *numTables)
*** 4796,4801 ****
--- 4796,4803 ----
              selectDumpableTable(&tblinfo[i]);
          tblinfo[i].interesting = tblinfo[i].dobj.dump;

+         tblinfo[i].postponed_def = false; /* might get set during sort */
+
          /*
           * Read-lock target tables to make sure they aren't DROPPED or altered
           * in schema before we get around to dumping them.
*************** dumpTableSchema(Archive *fout, TableInfo
*** 13611,13617 ****
              (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
                   tbinfo->rolname,
                 (strcmp(reltypename, "TABLE") == 0) ? tbinfo->hasoids : false,
!                  reltypename, SECTION_PRE_DATA,
                   q->data, delq->data, NULL,
                   NULL, 0,
                   NULL, NULL);
--- 13613,13620 ----
              (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
                   tbinfo->rolname,
                 (strcmp(reltypename, "TABLE") == 0) ? tbinfo->hasoids : false,
!                  reltypename,
!                  tbinfo->postponed_def ? SECTION_POST_DATA : SECTION_PRE_DATA,
                   q->data, delq->data, NULL,
                   NULL, 0,
                   NULL, NULL);
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 398eca1..e01015e 100644
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
*************** typedef struct _tableInfo
*** 241,247 ****
      char       *reltablespace;    /* relation tablespace */
      char       *reloptions;        /* options specified by WITH (...) */
      char       *checkoption;    /* WITH CHECK OPTION */
!     char       *toast_reloptions;        /* ditto, for the TOAST table */
      bool        hasindex;        /* does it have any indexes? */
      bool        hasrules;        /* does it have any rules? */
      bool        hastriggers;    /* does it have any triggers? */
--- 241,247 ----
      char       *reltablespace;    /* relation tablespace */
      char       *reloptions;        /* options specified by WITH (...) */
      char       *checkoption;    /* WITH CHECK OPTION */
!     char       *toast_reloptions;        /* WITH options for the TOAST table */
      bool        hasindex;        /* does it have any indexes? */
      bool        hasrules;        /* does it have any rules? */
      bool        hastriggers;    /* does it have any triggers? */
*************** typedef struct _tableInfo
*** 254,262 ****
      /* these two are set only if table is a sequence owned by a column: */
      Oid            owning_tab;        /* OID of table owning sequence */
      int            owning_col;        /* attr # of column owning sequence */
!     int            relpages;

      bool        interesting;    /* true if need to collect more data */

      /*
       * These fields are computed only if we decide the table is interesting
--- 254,263 ----
      /* these two are set only if table is a sequence owned by a column: */
      Oid            owning_tab;        /* OID of table owning sequence */
      int            owning_col;        /* attr # of column owning sequence */
!     int            relpages;        /* table's size in pages (from pg_class) */

      bool        interesting;    /* true if need to collect more data */
+     bool        postponed_def;    /* matview must be postponed into post-data */

      /*
       * These fields are computed only if we decide the table is interesting
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index bda3535..3590932 100644
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
*************** repairTypeFuncLoop(DumpableObject *typeo
*** 798,803 ****
--- 798,804 ----
   * will be an implicit dependency in the other direction, we need to break
   * the loop.  If there are no other objects in the loop then we can remove
   * the implicit dependency and leave the ON SELECT rule non-separate.
+  * This applies to matviews, as well.
   */
  static void
  repairViewRuleLoop(DumpableObject *viewobj,
*************** repairViewRuleLoop(DumpableObject *viewo
*** 814,820 ****
   * Because findLoop() finds shorter cycles before longer ones, it's likely
   * that we will have previously fired repairViewRuleLoop() and removed the
   * rule's dependency on the view.  Put it back to ensure the rule won't be
!  * emitted before the view...
   */
  static void
  repairViewRuleMultiLoop(DumpableObject *viewobj,
--- 815,823 ----
   * Because findLoop() finds shorter cycles before longer ones, it's likely
   * that we will have previously fired repairViewRuleLoop() and removed the
   * rule's dependency on the view.  Put it back to ensure the rule won't be
!  * emitted before the view.
!  *
!  * Note: this approach does *not* work for matviews, at the moment.
   */
  static void
  repairViewRuleMultiLoop(DumpableObject *viewobj,
*************** repairViewRuleMultiLoop(DumpableObject *
*** 842,847 ****
--- 845,874 ----
  }

  /*
+  * If a matview is involved in a multi-object loop, we can't currently fix
+  * that by splitting off the rule.    As a stopgap, we try to fix it by
+  * dropping the constraint that the matview be dumped in the pre-data section.
+  * This is sufficient to handle cases where a matview depends on some unique
+  * index, as can happen if it has a GROUP BY for example.
+  *
+  * Note that the "next object" is not necessarily the matview itself;
+  * it could be the matview's rowtype, for example.  We may come through here
+  * several times while removing all the pre-data linkages.
+  */
+ static void
+ repairMatViewBoundaryMultiLoop(DumpableObject *matviewobj,
+                                DumpableObject *boundaryobj,
+                                DumpableObject *nextobj)
+ {
+     TableInfo  *matviewinfo = (TableInfo *) matviewobj;
+
+     /* remove boundary's dependency on object after it in loop */
+     removeObjectDependency(boundaryobj, nextobj->dumpId);
+     /* mark matview as postponed into post-data section */
+     matviewinfo->postponed_def = true;
+ }
+
+ /*
   * Because we make tables depend on their CHECK constraints, while there
   * will be an automatic dependency in the other direction, we need to break
   * the loop.  If there are no other objects in the loop then we can remove
*************** repairDependencyLoop(DumpableObject **lo
*** 956,965 ****
          return;
      }

!     /* View and its ON SELECT rule */
      if (nLoop == 2 &&
          loop[0]->objType == DO_TABLE &&
          loop[1]->objType == DO_RULE &&
          ((RuleInfo *) loop[1])->ev_type == '1' &&
          ((RuleInfo *) loop[1])->is_instead &&
          ((RuleInfo *) loop[1])->ruletable == (TableInfo *) loop[0])
--- 983,994 ----
          return;
      }

!     /* View (including matview) and its ON SELECT rule */
      if (nLoop == 2 &&
          loop[0]->objType == DO_TABLE &&
          loop[1]->objType == DO_RULE &&
+         (((TableInfo *) loop[0])->relkind == 'v' ||        /* RELKIND_VIEW */
+          ((TableInfo *) loop[0])->relkind == 'm') &&    /* RELKIND_MATVIEW */
          ((RuleInfo *) loop[1])->ev_type == '1' &&
          ((RuleInfo *) loop[1])->is_instead &&
          ((RuleInfo *) loop[1])->ruletable == (TableInfo *) loop[0])
*************** repairDependencyLoop(DumpableObject **lo
*** 970,975 ****
--- 999,1006 ----
      if (nLoop == 2 &&
          loop[1]->objType == DO_TABLE &&
          loop[0]->objType == DO_RULE &&
+         (((TableInfo *) loop[1])->relkind == 'v' ||        /* RELKIND_VIEW */
+          ((TableInfo *) loop[1])->relkind == 'm') &&    /* RELKIND_MATVIEW */
          ((RuleInfo *) loop[0])->ev_type == '1' &&
          ((RuleInfo *) loop[0])->is_instead &&
          ((RuleInfo *) loop[0])->ruletable == (TableInfo *) loop[1])
*************** repairDependencyLoop(DumpableObject **lo
*** 978,989 ****
          return;
      }

!     /* Indirect loop involving view and ON SELECT rule */
      if (nLoop > 2)
      {
          for (i = 0; i < nLoop; i++)
          {
!             if (loop[i]->objType == DO_TABLE)
              {
                  for (j = 0; j < nLoop; j++)
                  {
--- 1009,1021 ----
          return;
      }

!     /* Indirect loop involving view (but not matview) and ON SELECT rule */
      if (nLoop > 2)
      {
          for (i = 0; i < nLoop; i++)
          {
!             if (loop[i]->objType == DO_TABLE &&
!                 ((TableInfo *) loop[i])->relkind == 'v')        /* RELKIND_VIEW */
              {
                  for (j = 0; j < nLoop; j++)
                  {
*************** repairDependencyLoop(DumpableObject **lo
*** 1000,1005 ****
--- 1032,1061 ----
          }
      }

+     /* Indirect loop involving matview and data boundary */
+     if (nLoop > 2)
+     {
+         for (i = 0; i < nLoop; i++)
+         {
+             if (loop[i]->objType == DO_TABLE &&
+                 ((TableInfo *) loop[i])->relkind == 'm')        /* RELKIND_MATVIEW */
+             {
+                 for (j = 0; j < nLoop; j++)
+                 {
+                     if (loop[j]->objType == DO_PRE_DATA_BOUNDARY)
+                     {
+                         DumpableObject *nextobj;
+
+                         nextobj = (j < nLoop - 1) ? loop[j + 1] : loop[0];
+                         repairMatViewBoundaryMultiLoop(loop[i], loop[j],
+                                                        nextobj);
+                         return;
+                     }
+                 }
+             }
+         }
+     }
+
      /* Table and CHECK constraint */
      if (nLoop == 2 &&
          loop[0]->objType == DO_TABLE &&