Обсуждение: Postgresql 9.1 pg_last_xact_replay_timestamp limitations
Hi everyone, I am very pleased to see the addition of the pg_last_xact_replay_timestamp function in Postgresql 9.1 since this, in combinationwith hot standby and WAL log streaming, will seriously boost the performance of our postgresql database cluster.pg_last_xact_replay_timestamp is important to us because the client application keeps a cache and makes queries withthis structure: select stuff from table_name where not_modified_since > $last_not_modified_since_value_we_gave_to_the_client; This way the client application only gets recent changes. $last_not_modified_since_value_we_gave_to_the_client is simply"now()" on master databases. In case of queries made on read-only (hot standby) databases, pg_last_xact_replay_timestamp()will be used. However, pg_last_xact_replay_timestamp() returns null when the server is restarteduntil a new transaction is streamed to the hot standby server. It might take a long time before this happens. Becauseof this, we can't rely this function completely. Since forcing an update on the master database is not a clean solution, another possibility would be to create a custom functionthat takes the value of pg_last_xact_replay_timestamp() and save it on disk. If the value is null (the server wasrestarted), we then read and return of last value stored on disk instead. Is there any better way? Also, is there anyplans to make pg_last_xact_replay_timestamp() reliable even after a restart? Thank you, Gabi Julien
On Wed, Dec 8, 2010 at 1:31 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: > pg_last_xact_replay_timestamp() returns null when the server is restarted until a new transaction is streamed to the hotstandby server. It might take a long time before this happens. Because of this, we can't rely this function completely. I couldn't reproduce this. Could you provide a self-contained test case? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tuesday 07 December 2010 21:58:56 you wrote: > On Wed, Dec 8, 2010 at 1:31 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: > > pg_last_xact_replay_timestamp() returns null when the server is restarted until a new transaction is streamed to thehot standby server. It might take a long time before this happens. Because of this, we can't rely this function completely. > > I couldn't reproduce this. Could you provide a self-contained test case? I have merge the pg_last_xact_replay_timestamp path (7ba6e4f0e08bd7bdf4d12974ac1e37fb0459c97c) to 9.0 to get that resultso maybe my tests are invalid although the patch looks simple enough. I'll try to reproduce using 9.1alpha. What Ihave is 1 master and 1 slave. I do not use WAL log shipping, only streaming. Here's my recovery.conf on the slave: ======== standby_mode = 'on' primary_conninfo = 'host=master_host_name port=5432' trigger_file = '/opt/postgresql/data/finish.replication' ======== The master postgresql.conf is fairly normal except for this: ======= wal_level = hot_standby ======= Same for the slave except for this: ======= hot_standby = on ======= Now if I do: ============ master# /etc/init.d/postgresql start slave# /etc/init.d/postgresql start slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" pg_last_xact_replay_timestamp | not_modified_since -------------------------------+------------------------------- | 2010-12-08 16:06:09.920219+00 master# psql -hlocalhost my_db -c "create table trigger_transaction_shipping(a numeric); drop table trigger_transaction_shipping;" DROP TABLE slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" pg_last_xact_replay_timestamp | not_modified_since -------------------------------+------------------------------- 2010-12-08 16:07:34.408892+00 | 2010-12-08 16:08:07.969588+00 ============ Only after the first update from the master do I get my pg_last_xact_replay_timestamp timestamp. Regards, Gabi Julien
I just tried with postgresql 9.1alpha from http://www.enterprisedb.com/products/pgdevdownload.do (linux x86-32): postgres=# select pg_last_xact_replay_timestamp(), now() as not_modified_since; ERROR: function pg_last_xact_replay_timestamp() does not exist LINE 1: select pg_last_xact_replay_timestamp(), now() as not_modifie... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. A bug in the package? I guess they must have forgot to run genbki.sh and the function is missing from ./share/postgresql/postgres.bki.If I add the line manually and create the data folder after, it is still not working. A few precisions concerning my postgresql.conf I mentionned earlier: master as: ===== wal_level = hot_standby max_wal_senders = 30 wal_keep_segments = 100 ===== slave as: ===== hot_standby = on ===== On Wednesday 08 December 2010 11:37:51 Gabi Julien wrote: > On Tuesday 07 December 2010 21:58:56 you wrote: > > On Wed, Dec 8, 2010 at 1:31 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: > > > pg_last_xact_replay_timestamp() returns null when the server is restarted until a new transaction is streamed to thehot standby server. It might take a long time before this happens. Because of this, we can't rely this function completely. > > > > I couldn't reproduce this. Could you provide a self-contained test case? > > I have merge the pg_last_xact_replay_timestamp path (7ba6e4f0e08bd7bdf4d12974ac1e37fb0459c97c) to 9.0 to get that resultso maybe my tests are invalid although the patch looks simple enough. I'll try to reproduce using 9.1alpha. What Ihave is 1 master and 1 slave. I do not use WAL log shipping, only streaming. Here's my recovery.conf on the slave: > > ======== > standby_mode = 'on' > primary_conninfo = 'host=master_host_name port=5432' > trigger_file = '/opt/postgresql/data/finish.replication' > ======== > > The master postgresql.conf is fairly normal except for this: > > ======= > wal_level = hot_standby > ======= > > Same for the slave except for this: > > ======= > hot_standby = on > ======= > > Now if I do: > > ============ > master# /etc/init.d/postgresql start > > slave# /etc/init.d/postgresql start > slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" > pg_last_xact_replay_timestamp | not_modified_since > -------------------------------+------------------------------- > | 2010-12-08 16:06:09.920219+00 > > master# psql -hlocalhost my_db -c "create table trigger_transaction_shipping(a numeric); drop table trigger_transaction_shipping;" > DROP TABLE > > slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" > pg_last_xact_replay_timestamp | not_modified_since > -------------------------------+------------------------------- > 2010-12-08 16:07:34.408892+00 | 2010-12-08 16:08:07.969588+00 > ============ > > Only after the first update from the master do I get my pg_last_xact_replay_timestamp timestamp. > > Regards, > Gabi Julien > >
Gabi Julien <gabi.julien@broadsign.com> writes: > I just tried with postgresql 9.1alpha from http://www.enterprisedb.com/products/pgdevdownload.do (linux x86-32): > postgres=# select pg_last_xact_replay_timestamp(), now() as not_modified_since; > ERROR: function pg_last_xact_replay_timestamp() does not exist > LINE 1: select pg_last_xact_replay_timestamp(), now() as not_modifie... > ^ > HINT: No function matches the given name and argument types. You might need to add explicit type casts. > A bug in the package? No, only lack of a time machine. That function was added on 2010-11-09 according to the git logs. alpha2 froze at the end of October. regards, tom lane
On Thu, Dec 9, 2010 at 1:37 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: > slave# /etc/init.d/postgresql start > slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" > pg_last_xact_replay_timestamp | not_modified_since > -------------------------------+------------------------------- > | 2010-12-08 16:06:09.920219+00 pg_last_xact_replay_timestamp returns the timestamp of last *replayed* transaction. So it returns NULL until at least one transaction has been replayed. In your case, I guess that you started the master and standby from the same initial database cluster or clean-shutdowned one. In this case, since the standby has no transaction to replay right after the startup, you got NULL until you executed the write query on the master. We should return the timestamp of last valid checkpoint rather than NULL in that case? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wednesday 08 December 2010 21:58:46 you wrote: > On Thu, Dec 9, 2010 at 1:37 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: > > slave# /etc/init.d/postgresql start > > slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" > > pg_last_xact_replay_timestamp | not_modified_since > > -------------------------------+------------------------------- > > | 2010-12-08 16:06:09.920219+00 > We should return the timestamp of last valid checkpoint rather than NULL in that > case? Well, I think this behavior would be more appreciated by postgresql users in general. The case where the slave can be restartedafter a clean shutdown is rare but we need to consider it nonetheless. In my case I implemented a custom functionthat reads the last returned timestamp from a new file on disk. This is not a perfect solution since the value returnedmight be older then the actual state of the replication but it's good enough for my needs. Regards, Gabi Julien
On Fri, Dec 10, 2010 at 1:24 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: > On Wednesday 08 December 2010 21:58:46 you wrote: >> On Thu, Dec 9, 2010 at 1:37 AM, Gabi Julien <gabi.julien@broadsign.com> wrote: >> > slave# /etc/init.d/postgresql start >> > slave# psql -hlocalhost my_db -c "select pg_last_xact_replay_timestamp(), now() as not_modified_since;" >> > pg_last_xact_replay_timestamp | not_modified_since >> > -------------------------------+------------------------------- >> > | 2010-12-08 16:06:09.920219+00 > >> We should return the timestamp of last valid checkpoint rather than NULL in that >> case? > > Well, I think this behavior would be more appreciated by postgresql users in general. The case where the slave can be restartedafter a clean shutdown is rare but we need to consider it nonetheless. In my case I implemented a custom functionthat reads the last returned timestamp from a new file on disk. This is not a perfect solution since the value returnedmight be older then the actual state of the replication but it's good enough for my needs. The second question is; What should be returned when the server has been started normally without recovery? NULL? The timestamp of last valid checkpoint? The third question is; What should be returned while replaying WAL records which exist between REDO starting point and checkpoint? In this case, it seems bad to return the timestamp of the checkpoint whenever there is no replay transaction, since the result timestamp would go back once at least one transaction has been replayed before reaching the checkpoint record. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center