Обсуждение: pg_relation_size locking
Until recently, pg_relation_size used SearchSysCache to locate the relation to examine, and calculated the file location from that information. Starting with dbsize.c V1.5 (committed after Beta2), relation_open(.., AccessShareLock) is used. This is very unfortunate because it will not allow to observe a table growing while it is populated, e.g. with a lengthy COPY; pg_relation_size will be blocked. After reverting to 1.4, everything was fine again. Can we have this reverted/fixed? Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Until recently, pg_relation_size used SearchSysCache to locate the > relation to examine, and calculated the file location from that > information. Starting with dbsize.c V1.5 (committed after Beta2), > relation_open(.., AccessShareLock) is used. This is very unfortunate > because it will not allow to observe a table growing while it is > populated, e.g. with a lengthy COPY; pg_relation_size will be blocked. Nonsense. > After reverting to 1.4, everything was fine again. > Can we have this reverted/fixed? Can we have the actual problem explained? regards, tom lane
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>Until recently, pg_relation_size used SearchSysCache to locate the >>relation to examine, and calculated the file location from that >>information. Starting with dbsize.c V1.5 (committed after Beta2), >>relation_open(.., AccessShareLock) is used. This is very unfortunate >>because it will not allow to observe a table growing while it is >>populated, e.g. with a lengthy COPY; pg_relation_size will be blocked. > > > Nonsense. Ahem. I'm running Slony against a big replication set. While slon runs COPY foo(colnamelist) FROM STDIN, I can't execute pg_relation_size(foo_oid). pg_locks will show that the AccessShareLock on foo is not granted. Problem is gone with reverted dbsize.c Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> Nonsense. > Ahem. > I'm running Slony against a big replication set. While slon runs COPY > foo(colnamelist) FROM STDIN, I can't execute pg_relation_size(foo_oid). > pg_locks will show that the AccessShareLock on foo is not granted. That's only possible if Slony is taking AccessExclusive lock; if so, your gripe is properly directed to the Slony folks, not to pg_relation_size which is acting as a good database citizen should. Certainly a plain COPY command does not take AccessExclusive. regards, tom lane
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>Tom Lane wrote: >> >>>Nonsense. > > >>Ahem. > > >>I'm running Slony against a big replication set. While slon runs COPY >>foo(colnamelist) FROM STDIN, I can't execute pg_relation_size(foo_oid). >>pg_locks will show that the AccessShareLock on foo is not granted. > > > That's only possible if Slony is taking AccessExclusive lock; if so, > your gripe is properly directed to the Slony folks, not to > pg_relation_size which is acting as a good database citizen should. More precisely, it executes TRUNCATE;COPY at the same time; there might be additional locks to prevent using the table. Still, I see no reason why pg_relation_size shouldn't continue to use SearchSysCache as id did for years now. There's no sense in using locking mechanisms on table foo while reading file system data; pg_class is sufficient to locate the table's files. Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> That's only possible if Slony is taking AccessExclusive lock; if so, >> your gripe is properly directed to the Slony folks, not to >> pg_relation_size which is acting as a good database citizen should. > More precisely, it executes TRUNCATE;COPY at the same time; there might > be additional locks to prevent using the table. Still, I see no reason > why pg_relation_size shouldn't continue to use SearchSysCache as id did > for years now. There's no sense in using locking mechanisms on table foo > while reading file system data; pg_class is sufficient to locate the > table's files. The fact that the contrib version did things incorrectly for years is no justification for not fixing it at the time it's taken into the core. You have to have a lock to ensure that the table even exists, let alone that you are looking at the right set of disk files. In the above example, the contrib code would have not done the right thing at all --- if I'm not mistaken, it would have kept handing back the size of the original, pre-TRUNCATE file, since the new pg_class row with the new relfilenode isn't committed yet. So it wouldn't have done what you wish anyway. regards, tom lane
Tom Lane wrote: > In the above example, the contrib code would have not done the right > thing at all --- if I'm not mistaken, it would have kept handing back > the size of the original, pre-TRUNCATE file, since the new pg_class > row with the new relfilenode isn't committed yet. So it wouldn't have > done what you wish anyway. It wouldn't have worked anyway because it used the Oid to search the file, not the relfilenode. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > > The problem with the original coding was that it used the table Oid to > look up the file name, which is wrong. (Test it with a table that has > been clustered or an index that has been reindexed.) Um, can't test at the moment. The oldcode used pg_class->relfilnode, which delivers "Name of the on-disk file of this relation" according to the docs. What's wrong with that? regards, Andreas
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>Tom Lane wrote: >> >>>That's only possible if Slony is taking AccessExclusive lock; if so, >>>your gripe is properly directed to the Slony folks, not to >>>pg_relation_size which is acting as a good database citizen should. > > >>More precisely, it executes TRUNCATE;COPY at the same time; there might >>be additional locks to prevent using the table. Still, I see no reason >>why pg_relation_size shouldn't continue to use SearchSysCache as id did >>for years now. There's no sense in using locking mechanisms on table foo >>while reading file system data; pg_class is sufficient to locate the >>table's files. > > > The fact that the contrib version did things incorrectly for years is > no justification for not fixing it at the time it's taken into the core. > You have to have a lock to ensure that the table even exists, let alone > that you are looking at the right set of disk files. This would require a lock on pg_class, not table foo, no? > In the above example, the contrib code would have not done the right > thing at all --- if I'm not mistaken, it would have kept handing back > the size of the original, pre-TRUNCATE file, since the new pg_class > row with the new relfilenode isn't committed yet. Hm, I see the issue. Interesting enough, I *do* see the size growing. OTOH, when running BEGIN;TRUNCATE against a test table and retrieving pg_relation_size returns the previous relfilenode and size as expected. Regards, Andreas
Andreas Pflug wrote: > Until recently, pg_relation_size used SearchSysCache to locate the > relation to examine, and calculated the file location from that > information. Starting with dbsize.c V1.5 (committed after Beta2), > relation_open(.., AccessShareLock) is used. This is very unfortunate > because it will not allow to observe a table growing while it is > populated, e.g. with a lengthy COPY; pg_relation_size will be blocked. > After reverting to 1.4, everything was fine again. The diff: http://projects.commandprompt.com/projects/public/pgsql/changeset/23120 The problem with the original coding was that it used the table Oid to look up the file name, which is wrong. (Test it with a table that has been clustered or an index that has been reindexed.) We could use a SysCache on filenode, if there was one. Unfortunately I don't think we have it. > Can we have this reverted/fixed? If you can see a way without reintroducing the old bugs, let me know. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> You have to have a lock to ensure that the table even exists, let alone >> that you are looking at the right set of disk files. > This would require a lock on pg_class, not table foo, no? No, the convention is that you take a lock on the relation you're interested in. The fact that some of the information you care about is in pg_class is incidental. There is actually stuff going on behind the scenes to make sure that you get up-to-date info when you do LockRelation; looking at the pg_class row does *not* by itself guarantee that. That is, when you SearchSysCache you might get a row that was good at the start of your transaction but no longer is; relation_open with a lock guarantees that you get a relation descriptor that is currently correct. > Hm, I see the issue. Interesting enough, I *do* see the size growing. > OTOH, when running BEGIN;TRUNCATE against a test table and retrieving > pg_relation_size returns the previous relfilenode and size as expected. That's a bit curious. If they just did TRUNCATE then COPY, the commit of the TRUNCATE should have released the lock. If the TRUNCATE wasn't committed yet, then how are you able to pick up the correct relfilenode to look at? regards, tom lane
[Resend: apparently there's a problem with my mail server] Andreas Pflug wrote: > Until recently, pg_relation_size used SearchSysCache to locate the > relation to examine, and calculated the file location from that > information. Starting with dbsize.c V1.5 (committed after Beta2), > relation_open(.., AccessShareLock) is used. This is very unfortunate > because it will not allow to observe a table growing while it is > populated, e.g. with a lengthy COPY; pg_relation_size will be blocked. > After reverting to 1.4, everything was fine again. The diff: http://projects.commandprompt.com/projects/public/pgsql/changeset/23120 The problem with the original coding was that it used the table Oid to look up the file name, which is wrong. (Test it with a table that has been clustered or an index that has been reindexed.) We could use a SysCache on filenode, if there was one. Unfortunately I don't think we have it. > Can we have this reverted/fixed? If you can see a way without reintroducing the old bugs, let me know. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>Tom Lane wrote: >> >>>You have to have a lock to ensure that the table even exists, let alone >>>that you are looking at the right set of disk files. > > >>This would require a lock on pg_class, not table foo, no? > > > No, the convention is that you take a lock on the relation you're > interested in. So pgAdmin violates the convention, because it doesn't hold a lock an a table when reengineering its attributes.... Since pg_relation_size is a pure metadata function, I don't think the convention hits here (. > >>Hm, I see the issue. Interesting enough, I *do* see the size growing. >>OTOH, when running BEGIN;TRUNCATE against a test table and retrieving >>pg_relation_size returns the previous relfilenode and size as expected. > > > That's a bit curious. If they just did TRUNCATE then COPY, the commit > of the TRUNCATE should have released the lock. If the TRUNCATE wasn't > committed yet, then how are you able to pick up the correct relfilenode > to look at? The truncate is buried in a function, I suspect that actually no truncate happened on an empty table. Regards, Andreas
Andreas Pflug wrote: > Alvaro Herrera wrote: > > > >The problem with the original coding was that it used the table Oid to > >look up the file name, which is wrong. (Test it with a table that has > >been clustered or an index that has been reindexed.) > > Um, can't test at the moment. The oldcode used pg_class->relfilnode, > which delivers "Name of the on-disk file of this relation" according to > the docs. What's wrong with that? Hum, nothing that I can see, but I changed that code precisely because somebody complained that it didn't work after truncating. Do you mean "oldcode" as "the contrib code", or "the initially integrated in backend code"? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > Andreas Pflug wrote: > >>Alvaro Herrera wrote: >> >>>The problem with the original coding was that it used the table Oid to >>>look up the file name, which is wrong. (Test it with a table that has >>>been clustered or an index that has been reindexed.) >> >>Um, can't test at the moment. The oldcode used pg_class->relfilnode, >>which delivers "Name of the on-disk file of this relation" according to >>the docs. What's wrong with that? > > > Hum, nothing that I can see, but I changed that code precisely because > somebody complained that it didn't work after truncating. Do you mean > "oldcode" as "the contrib code", or "the initially integrated in backend > code"? Both, esp. backend/utils/adt/dbsize.c V1.4. and contrib/dbsize/dbsize.c from 8.0.5. You might have been irritated by the naming: relnodeOid = pg_class->relfilenode; (..) PG_RETURN_INT64(calculate_relation_size(tblspcOid, relnodeOid)); Regards, Andreas