Обсуждение: Mysterious DB reset
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.
This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:
DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';
Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.
Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.
So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------
Вложения
Israel Brewster wrote: > So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly causethis behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especiallywhile there is an active connection? Thanks for any ideas, however wild or off the wall :-) Is this running off a NFS mount or something? I'm wondering about the filesystem getting unmounted and an empty copy of the database being in the mountpoint. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)
That is odd. Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.
So if all the data went missing, and the sequence reset, the only thing I can think of is:
Someone ran:
TRUNCATE data RESTART IDENTITY;
or someone restored the table structure from a backup that deleted the original table.
Do you log DDL?
Was the table partitioned?
You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.
Thom
On 03/05/2014 10:22 AM, Israel Brewster wrote: > I have a Postgresql 9.2.4 database containing real-time tracking data > for our aircraft for the past week (theoretically). It is populated by > two different processes: one that runs every few minutes, retrieving > data from a number of sources and storing it in the DB, and one that has > an "always on" connection to the DB streaming data into the database in > realtime (often several records per second). To keep the database size > manageable I have a cron job that runs every evening to delete all > records that are more than a week old, after archiving a subset of them > in permanent storage. > > This morning my boss e-mailed me, complaining that only a couple of > aircraft were showing up in the list (SELECT distinct(tail) FROM data > being the command that populates the list). Upon looking at the data I > saw that it only went back to 4am this morning, rather than the week I > was expecting. My first thought was "Oh, I must have a typo in my > cleanup routine, such that it is deleting all records rather than only > those a week old, and it's just that no one has noticed until now". So I > looked at that, but changing the delete to a select appeared to produce > the proper results, in that no records were selected: Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today, no records exist before 7 days ago also or am I missing something? > > DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days'; > > Then I noticed something even more odd. My database has an id column, > which is defined as a SERIAL. As we all know, a serial is a > monotonically increasing number that is not affected by deletes. > However, the oldest record in my database, from 4am this morning, had an > id of 1. Even though I KNOW there was data in the system yesterday. Even > if my DELETE command was wrong and deleted ALL records, that shouldn't > have reset the SERIAL column to 1! I also know that I have not been in > the database mucking around with the sequence value - to be completely > honest, I don't even know the exact command to reset it - I'd have to > google it if I wanted to. A sequence is just a special table. So what does SELECT * from the sequence show? > > Also odd is that my cleanup script runs at 1am. I have records of there > being new data in the database up to 3:51am, but the oldest record > currently in the DB is from 4:45am (as specified by the default of now() > on the column). So I know records were added after my delete command > ran, but before this reset occurred. I am not sure what you are calling the 'reset'? Did something happen between 3:51 AM and 4:45 AM? Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM? > > So my question is, aside from someone going in and mucking about in the > wee hours of the morning, what could possibly cause this behavior? What > sort of event could cause all data to be deleted from the table, and the > sequence to be reset? Especially while there is an active connection? > Thanks for any ideas, however wild or off the wall :-) What is in the Postgres/system logs for the time period(s) you mention? > > ----------------------------------------------- > Israel Brewster > Computer Support Technician II > Era Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7250 x7293 > ----------------------------------------------- > > > > -- Adrian Klaver adrian.klaver@aklaver.com
For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appearsto be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 03/05/2014 10:22 AM, Israel Brewster wrote: >> My first thought was "Oh, I must have a typo in my >> cleanup routine, such that it is deleting all records rather than only >> those a week old, and it's just that no one has noticed until now". So I >> looked at that, but changing the delete to a select appeared to produce >> the proper results, in that no records were selected: > > Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today,no records exist before 7 days ago also or am I missing something? If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to confirmthe delete is correct, and therefore not the problem. > A sequence is just a special table. > > So what does SELECT * from the sequence show? tracking=> SELECT * FROM data_id_seq; sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled| is_called ---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+----------- data_id_seq | 1184 | 1 | 1 | 9223372036854775807 | 1 | 1 | 16 | f | t > >> >> Also odd is that my cleanup script runs at 1am. I have records of there >> being new data in the database up to 3:51am, but the oldest record >> currently in the DB is from 4:45am (as specified by the default of now() >> on the column). So I know records were added after my delete command >> ran, but before this reset occurred. > > I am not sure what you are calling the 'reset'? > Did something happen between 3:51 AM and 4:45 AM? Yes: All my data was deleted and the sequence reset to 1. > Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM? As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist inthe local database any more. > >> >> So my question is, aside from someone going in and mucking about in the >> wee hours of the morning, what could possibly cause this behavior? What >> sort of event could cause all data to be deleted from the table, and the >> sequence to be reset? Especially while there is an active connection? >> Thanks for any ideas, however wild or off the wall :-) > > What is in the Postgres/system logs for the time period(s) you mention? The postgres log has a lot of errors in it, some of which MAY explain the issue. For example: cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied LOG: archive command failed with exit code 1 DETAIL: The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 WARNING: transaction log file "000000010000000400000094" could not be archived: too many failures LOG: received smart shutdown request LOG: autovacuum launcher shutting down LOG: shutting down LOG: database system is shut down However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries, orfrom back before I got the mount for the logs working. At this time, the mount point IS working correctly, and from whatI can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5, 9:21, whenI was messing with it yesterday), however, so there are no entries from this morning when it happened again. I don'tsee anything of interest in the syslog or messages log. > >> >> ----------------------------------------------- >> Israel Brewster >> Computer Support Technician II >> Era Alaska >> 5245 Airport Industrial Rd >> Fairbanks, AK 99709 >> (907) 450-7250 x7293 >> ----------------------------------------------- >> >> >> >> > > > -- > Adrian Klaver > adrian.klaver@aklaver.com
On Mar 5, 2014, at 10:01 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Israel Brewster wrote: > >> So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possiblycause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to bereset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-) > > Is this running off a NFS mount or something? I'm wondering about the > filesystem getting unmounted and an empty copy of the database being in > the mount point. Good thought - I hadn't thought of a file system level reset. Unfortunately the data folder is on the root partition, whichis a standard SATA hard drive, and the date on the data folder shows February 27th, so apparently the folder itselfat least hasn't been re-created. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general ----------------------------------------------- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 -----------------------------------------------
Вложения
On Mar 5, 2014, at 10:01 AM, Thom Brown <thom@linux.com> wrote:
On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)That is odd. Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.
Not famillar with a logged vs unlogged table (still learning all the features of PostgreSQL), but as you said the sequence resetting is rather odd.
So if all the data went missing, and the sequence reset, the only thing I can think of is:Someone ran:TRUNCATE data RESTART IDENTITY;
Considering that I'm the only one in the company that knows SQL at all beyond a simple single-table select (I keep having to explain joins and how they aren't evil to the other programmer here), not likely :-)
or someone restored the table structure from a backup that deleted the original table.
Now that's a thought...Maybe my backup routine is working backwards (pushing from the backup archive to the primary machine rather than from the primary to the backup). I did switch primary and secondary machines a while ago, but I thought I had checked that. What would be the effect of overwriting the data files while the database is active?
Do you log DDL?
Not sure what that is, so I'll assume no :-)
Was the table partitioned?
Nope.
You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.
I'll look at updating. Thanks.
--
Thom
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------
Вложения
On 3/6/14, Israel Brewster <israel@eraalaska.net> wrote: > LOG: received smart shutdown request > LOG: autovacuum launcher shutting down > LOG: shutting down > LOG: database system is shut down > > However, there are no timestamps on any of the entries (can I fix that?) Yes, change the log_line_prefix in the postgresql.conf file and reload it. I use: log_line_prefix = '%m %u ' You might also want to use this, at least temporarily: log_statement = all -- Mike Nolan
On 03/06/2014 09:33 AM, Israel Brewster wrote: > For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appearsto be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 > > On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> On 03/05/2014 10:22 AM, Israel Brewster wrote: >>> My first thought was "Oh, I must have a typo in my >>> cleanup routine, such that it is deleting all records rather than only >>> those a week old, and it's just that no one has noticed until now". So I >>> looked at that, but changing the delete to a select appeared to produce >>> the proper results, in that no records were selected: >> >> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today,no records exist before 7 days ago also or am I missing something? > > If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to confirmthe delete is correct, and therefore not the problem. > >> A sequence is just a special table. >> >> So what does SELECT * from the sequence show? > > tracking=> SELECT * FROM data_id_seq; > sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled| is_called > ---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+----------- > data_id_seq | 1184 | 1 | 1 | 9223372036854775807 | 1 | 1 | 16 | f | t > > >> >>> >>> Also odd is that my cleanup script runs at 1am. I have records of there >>> being new data in the database up to 3:51am, but the oldest record >>> currently in the DB is from 4:45am (as specified by the default of now() >>> on the column). So I know records were added after my delete command >>> ran, but before this reset occurred. >> >> I am not sure what you are calling the 'reset'? >> Did something happen between 3:51 AM and 4:45 AM? > > Yes: All my data was deleted and the sequence reset to 1. > >> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM? > > As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist inthe local database any more. Well something is happening. See my notes on logging below to help track down the cause. > >> >>> >>> So my question is, aside from someone going in and mucking about in the >>> wee hours of the morning, what could possibly cause this behavior? What >>> sort of event could cause all data to be deleted from the table, and the >>> sequence to be reset? Especially while there is an active connection? >>> Thanks for any ideas, however wild or off the wall :-) >> >> What is in the Postgres/system logs for the time period(s) you mention? > > The postgres log has a lot of errors in it, some of which MAY explain the issue. For example: > > cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied > LOG: archive command failed with exit code 1 > DETAIL: The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 > WARNING: transaction log file "000000010000000400000094" could not be archived: too many failures > LOG: received smart shutdown request > LOG: autovacuum launcher shutting down > LOG: shutting down > LOG: database system is shut down Might be good to explain your archive setup. > > However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries,or from back before I got the mount for the logs working. At this time, the mount point IS working correctly, andfrom what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5,9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened again.I don't see anything of interest in the syslog or messages log. Yes you can, timestamps and a lot more. For all the details go here: http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html At the least I would: log_statement = 'mod' log_connections = on log_disconnections = on log_line_prefix = '%u-%m-%x' > >> >>> >>> ----------------------------------------------- >>> Israel Brewster >>> Computer Support Technician II >>> Era Alaska >>> 5245 Airport Industrial Rd >>> Fairbanks, AK 99709 >>> (907) 450-7250 x7293 >>> ----------------------------------------------- >>> >>> >>> >>> >> >> >> -- >> Adrian Klaver >> adrian.klaver@aklaver.com > > >
On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 03/06/2014 09:33 AM, Israel Brewster wrote: >> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 >> >> On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> >>> On 03/05/2014 10:22 AM, Israel Brewster wrote: >>>> My first thought was "Oh, I must have a typo in my >>>> cleanup routine, such that it is deleting all records rather than only >>>> those a week old, and it's just that no one has noticed until now". So I >>>> looked at that, but changing the delete to a select appeared to produce >>>> the proper results, in that no records were selected: >>> >>> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today,no records exist before 7 days ago also or am I missing something? >> >> If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to confirmthe delete is correct, and therefore not the problem. >> >>> A sequence is just a special table. >>> >>> So what does SELECT * from the sequence show? >> >> tracking=> SELECT * FROM data_id_seq; >> sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled| is_called >> ---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+----------- >> data_id_seq | 1184 | 1 | 1 | 9223372036854775807 | 1 | 1 | 16 | f | t >> >> >>> >>>> >>>> Also odd is that my cleanup script runs at 1am. I have records of there >>>> being new data in the database up to 3:51am, but the oldest record >>>> currently in the DB is from 4:45am (as specified by the default of now() >>>> on the column). So I know records were added after my delete command >>>> ran, but before this reset occurred. >>> >>> I am not sure what you are calling the 'reset'? >>> Did something happen between 3:51 AM and 4:45 AM? >> >> Yes: All my data was deleted and the sequence reset to 1. >> >>> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM? >> >> As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist inthe local database any more. > > Well something is happening. See my notes on logging below to help track down the cause. Yep. > >> >>> >>>> >>>> So my question is, aside from someone going in and mucking about in the >>>> wee hours of the morning, what could possibly cause this behavior? What >>>> sort of event could cause all data to be deleted from the table, and the >>>> sequence to be reset? Especially while there is an active connection? >>>> Thanks for any ideas, however wild or off the wall :-) >>> >>> What is in the Postgres/system logs for the time period(s) you mention? >> >> The postgres log has a lot of errors in it, some of which MAY explain the issue. For example: >> >> cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied >> LOG: archive command failed with exit code 1 >> DETAIL: The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 >> WARNING: transaction log file "000000010000000400000094" could not be archived: too many failures >> LOG: received smart shutdown request >> LOG: autovacuum launcher shutting down >> LOG: shutting down >> LOG: database system is shut down > > Might be good to explain your archive setup. Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is also ahot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE idin (...) ) Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts (suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) > >> >> However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries,or from back before I got the mount for the logs working. At this time, the mount point IS working correctly, andfrom what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5,9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened again.I don't see anything of interest in the syslog or messages log. > > Yes you can, timestamps and a lot more. For all the details go here: > > http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html > > At the least I would: > > log_statement = 'mod' > > log_connections = on > log_disconnections = on > > log_line_prefix = '%u-%m-%x' I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information. Thanksfor the link and information! > >> >>> >>>> >>>> ----------------------------------------------- >>>> Israel Brewster >>>> Computer Support Technician II >>>> Era Alaska >>>> 5245 Airport Industrial Rd >>>> Fairbanks, AK 99709 >>>> (907) 450-7250 x7293 >>>> ----------------------------------------------- >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> Adrian Klaver >>> adrian.klaver@aklaver.com ----------------------------------------------- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 -----------------------------------------------
Вложения
On 03/06/2014 10:43 AM, Israel Brewster wrote: > On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> >> Well something is happening. See my notes on logging below to help track down the cause. > > Yep. > >> >> Might be good to explain your archive setup. > > Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is alsoa hot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. > > Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE idin (...) ) > > Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts (suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. > > None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data Are all the scripts running from one machine? If so, have you checked that the times are set correctly on the various machines? > > Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end result first and then work backwards from there. > >> > > I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information. Thanksfor the link and information! > >
On Mar 6, 2014, at 10:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 03/06/2014 10:43 AM, Israel Brewster wrote: >> On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> > >>> >>> Well something is happening. See my notes on logging below to help track down the cause. >> >> Yep. >> > >>> >>> Might be good to explain your archive setup. >> >> Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is alsoa hot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. >> >> Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE idin (...) ) >> >> Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts (suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. >> >> None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data > > Are all the scripts running from one machine? > If so, have you checked that the times are set correctly on the various machines? Three different machines (due to OS requirements), but yeah the times all appear to be correct. > > >> >> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) > > 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end resultfirst and then work backwards from there. Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in the"is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D > >> >>> > >> >> I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information.Thanks for the link and information! >> >> > ----------------------------------------------- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 -----------------------------------------------
Вложения
On 03/06/2014 11:09 AM, Israel Brewster wrote: > On Mar 6, 2014, at 10:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> >> Are all the scripts running from one machine? >> If so, have you checked that the times are set correctly on the various machines? > > Three different machines (due to OS requirements), but yeah the times all appear to be correct. So it appears you will have to wait to see what the logging reports. Should have mentioned that you need to make sure you do something like pg_ctl reload on the Postgres server to get the postgressql.conf changes to take effect. Another thought. Might not be bad idea to grep your scripts for ALTER, SEQUENCE or any other relevant keywords. Just in case something slipped in you where not aware of. > >> >> >>> >>> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) >> >> 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end resultfirst and then work backwards from there. > > Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in the"is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D I understood the basics of what you are doing. The details probably can wait pending the log information. Hey, all of us are beginners/learning in something. > > > ----------------------------------------------- > Israel Brewster > Computer Support Technician II > Era Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7250 x7293 > ----------------------------------------------- >
Israel Brewster-2 wrote > So my question is, aside from someone going in and mucking about in the > wee hours of the morning, what could possibly cause this behavior? What > sort of event could cause all data to be deleted from the table, and the > sequence to be reset? Especially while there is an active connection? > Thanks for any ideas, however wild or off the wall :-) Nothing so far makes this likely but any chance there is some kind of Virtual Machine setup in place where all the changes from a given day are being lost because the VM is resetting back to "factory defaults"? Also, you say you perform daily pg_dumps. Have you tried loading these up and see what their contents are? Ultimately the log files are going to be needed to do any meaningful forensic work though. Any chance you may been debugging the wrong box/database? That is always something worth verifying and usually not the difficult. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Mysterious-DB-reset-tp5794868p5795031.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 03/06/2014 09:33 AM, Israel Brewster wrote: > For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appearsto be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 > > > Also odd is that my cleanup script runs at 1am. I have records of there > being new data in the database up to 3:51am, but the oldest record > currently in the DB is from 4:45am (as specified by the default of now() > on the column). So I know records were added after my delete command > ran, but before this reset occurred. > A shot in the dark... Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that connect. I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after 4am. Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to the log file being rotated. Cheers, Steve
So the logs have given me a partial answer, specifically WHO and WHAT. I still have to track down WHY: [unknown]-2014-03-07 04:40:02.407 AKST-0LOG: connection received: host=[local] postgres-2014-03-07 04:40:02.409 AKST-0LOG: connection authorized: user=postgres database=tracking postgres-2014-03-07 04:40:02.521 AKST-0LOG: statement: DROP INDEX public.timerecp_idx; postgres-2014-03-07 04:40:02.548 AKST-0LOG: statement: DROP INDEX public.tail_idx; postgres-2014-03-07 04:40:02.553 AKST-0LOG: statement: DROP INDEX public.syncd_idx; postgres-2014-03-07 04:40:02.562 AKST-0LOG: statement: DROP INDEX public.pointtime_idx; postgres-2014-03-07 04:40:02.571 AKST-0LOG: statement: ALTER TABLE ONLY public.tails DROP CONSTRAINT tails_pkey; postgres-2014-03-07 04:40:02.577 AKST-0LOG: statement: ALTER TABLE ONLY public.data DROP CONSTRAINT data_pkey; postgres-2014-03-07 04:40:02.588 AKST-0LOG: statement: ALTER TABLE public.tails ALTER COLUMN id DROP DEFAULT; postgres-2014-03-07 04:40:02.593 AKST-0LOG: statement: ALTER TABLE public.data ALTER COLUMN id DROP DEFAULT; postgres-2014-03-07 04:40:02.601 AKST-0LOG: statement: DROP SEQUENCE public.tails_id_seq; postgres-2014-03-07 04:40:02.611 AKST-0LOG: statement: DROP TABLE public.tails; postgres-2014-03-07 04:40:02.619 AKST-0LOG: statement: DROP SEQUENCE public.data_id_seq; postgres-2014-03-07 04:40:02.627 AKST-0LOG: statement: DROP TABLE public.data; postgres-2014-03-07 04:40:02.643 AKST-0LOG: statement: DROP EXTENSION plpgsql; postgres-2014-03-07 04:40:02.654 AKST-0LOG: statement: DROP SCHEMA public; postgres-2014-03-07 04:40:02.663 AKST-0LOG: statement: CREATE SCHEMA public; postgres-2014-03-07 04:40:02.671 AKST-0LOG: statement: ALTER SCHEMA public OWNER TO postgres; postgres-2014-03-07 04:40:02.671 AKST-0LOG: statement: COMMENT ON SCHEMA public IS 'standard public schema'; postgres-2014-03-07 04:40:02.679 AKST-0LOG: statement: CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog; postgres-2014-03-07 04:40:02.689 AKST-0LOG: statement: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language'; postgres-2014-03-07 04:40:02.697 AKST-0LOG: statement: CREATE TABLE data ( id bigint NOT NULL, tail character varying(16) NOT NULL, timerecp timestamp without time zone DEFAULT now() NOT NULL, altitude integer, pointtime timestamp without time zone, lat numeric(7,5) NOT NULL, lng numeric(8,5) NOT NULL, speed integer, heading integer, source character varying(64), syncd boolean DEFAULT false ); postgres-2014-03-07 04:40:02.707 AKST-0LOG: statement: ALTER TABLE public.data OWNER TO tracking; postgres-2014-03-07 04:40:02.715 AKST-0LOG: statement: CREATE SEQUENCE data_id_seq START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1; postgres-2014-03-07 04:40:02.723 AKST-0LOG: statement: ALTER TABLE public.data_id_seq OWNER TO tracking; postgres-2014-03-07 04:40:02.732 AKST-0LOG: statement: ALTER SEQUENCE data_id_seq OWNED BY data.id; postgres-2014-03-07 04:40:02.740 AKST-0LOG: statement: CREATE TABLE tails ( id integer NOT NULL, tailnum character varying(8), trackingdevicenumber character varying(256) ); postgres-2014-03-07 04:40:02.749 AKST-0LOG: statement: ALTER TABLE public.tails OWNER TO tracking; postgres-2014-03-07 04:40:02.757 AKST-0LOG: statement: CREATE SEQUENCE tails_id_seq START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1; postgres-2014-03-07 04:40:02.766 AKST-0LOG: statement: ALTER TABLE public.tails_id_seq OWNER TO tracking; postgres-2014-03-07 04:40:02.774 AKST-0LOG: statement: ALTER SEQUENCE tails_id_seq OWNED BY tails.id; postgres-2014-03-07 04:40:02.782 AKST-0LOG: statement: ALTER TABLE ONLY data ALTER COLUMN id SET DEFAULT nextval('data_id_seq'::regclass); postgres-2014-03-07 04:40:02.791 AKST-0LOG: statement: ALTER TABLE ONLY tails ALTER COLUMN id SET DEFAULT nextval('tails_id_seq'::regclass); postgres-2014-03-07 04:40:02.799 AKST-0LOG: statement: ALTER TABLE ONLY data ADD CONSTRAINT data_pkey PRIMARY KEY (id); postgres-2014-03-07 04:40:03.066 AKST-0LOG: statement: ALTER TABLE ONLY tails ADD CONSTRAINT tails_pkey PRIMARY KEY (id); postgres-2014-03-07 04:40:03.116 AKST-0LOG: statement: CREATE INDEX pointtime_idx ON data USING btree (pointtime); postgres-2014-03-07 04:40:03.158 AKST-0LOG: statement: CREATE INDEX syncd_idx ON data USING btree (syncd); postgres-2014-03-07 04:40:03.208 AKST-0LOG: statement: CREATE INDEX tail_idx ON data USING btree (tail); postgres-2014-03-07 04:40:03.258 AKST-0LOG: statement: CREATE INDEX timerecp_idx ON data USING btree (timerecp); postgres-2014-03-07 04:40:03.309 AKST-0LOG: statement: REVOKE ALL ON SCHEMA public FROM PUBLIC; postgres-2014-03-07 04:40:03.317 AKST-0LOG: statement: REVOKE ALL ON SCHEMA public FROM postgres; postgres-2014-03-07 04:40:03.325 AKST-0LOG: statement: GRANT ALL ON SCHEMA public TO postgres; postgres-2014-03-07 04:40:03.333 AKST-0LOG: statement: GRANT ALL ON SCHEMA public TO PUBLIC; postgres-2014-03-07 04:40:03.342 AKST-0LOG: disconnection: session time: 0:00:00.935 user=postgres database=tracking host=[local] So that definitely answers what is happening: Every morning at 4:40, the tables get dropped an re-created. Wow. I KNOW Inever wrote a script to do that! I'm sort of thinking I somehow have a pg_dump with a -c flag (or perhaps the file fromsuch) that is being loaded in, but I still don't know how or why. Closer though. Thanks for the suggestions! ----------------------------------------------- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 ----------------------------------------------- On Mar 6, 2014, at 10:34 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 03/06/2014 11:09 AM, Israel Brewster wrote: >> On Mar 6, 2014, at 10:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> > >>> >>> Are all the scripts running from one machine? >>> If so, have you checked that the times are set correctly on the various machines? >> >> Three different machines (due to OS requirements), but yeah the times all appear to be correct. > > So it appears you will have to wait to see what the logging reports. Should have mentioned that you need to make sure youdo something like pg_ctl reload on the Postgres server to get the postgressql.conf changes to take effect. > > > Another thought. Might not be bad idea to grep your scripts for ALTER, SEQUENCE or any other relevant keywords. Just incase something slipped in you where not aware of. > >> >>> >>> >>>> >>>> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself ratherthan having formal training :-) I'm DEFINITELY open to suggestions :-) >>> >>> 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the endresult first and then work backwards from there. >> >> Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in the"is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D > > I understood the basics of what you are doing. The details probably can wait pending the log information. Hey, all of usare beginners/learning in something. > >> > >> >> ----------------------------------------------- >> Israel Brewster >> Computer Support Technician II >> Era Alaska >> 5245 Airport Industrial Rd >> Fairbanks, AK 99709 >> (907) 450-7250 x7293 >> ----------------------------------------------- >> >
Вложения
On Mar 6, 2014, at 1:25 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > On 03/06/2014 09:33 AM, Israel Brewster wrote: >> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 >> >> >> Also odd is that my cleanup script runs at 1am. I have records of there >> being new data in the database up to 3:51am, but the oldest record >> currently in the DB is from 4:45am (as specified by the default of now() >> on the column). So I know records were added after my delete command >> ran, but before this reset occurred. >> > A shot in the dark... > > Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that connect.I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after 4am. Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that ran the following line at 4:40am: /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres tracking It must have been left over from before I got streaming replication up and working, when this box was still the backup serverand not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped machines. Apparentlywhen you do a pg_dump with the -c flag from a server to itself, it does the clean before reading the data, resultingin a new empty database. Thanks again for all the suggestions! > > Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to thelog file being rotated. > > Cheers, > Steve > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
On 03/07/2014 11:08 AM, Israel Brewster wrote: > On Mar 6, 2014, at 1:25 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > >> On 03/06/2014 09:33 AM, Israel Brewster wrote: >>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 >>> >>> >>> Also odd is that my cleanup script runs at 1am. I have records of there >>> being new data in the database up to 3:51am, but the oldest record >>> currently in the DB is from 4:45am (as specified by the default of now() >>> on the column). So I know records were added after my delete command >>> ran, but before this reset occurred. >>> >> A shot in the dark... >> >> Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that connect.I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after 4am. > > Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that ran the following line at 4:40am: > > /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres tracking > > It must have been left over from before I got streaming replication up and working, when this box was still the backupserver and not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped machines.Apparently when you do a pg_dump with the -c flag from a server to itself, it does the clean before reading thedata, resulting in a new empty database. Thanks again for all the suggestions! No, -c just tells pg_dump to output clean commands and really only has meaning when you do a plain text dump as above. pg_dump does not clean the data from the running server on its own. What did the cleaning was immediately piping to output of the dump to psql. So basically you created a plain text dump file and fed it back to the server and the dump file included commands to clean out objects. If you had not used -c you would have gotten a bunch of duplicate <something> errors. > >> >> Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to thelog file being rotated. >> >> Cheers, >> Steve >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > > > -- Adrian Klaver adrian.klaver@aklaver.com
----------------------------------------------- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 ----------------------------------------------- On Mar 7, 2014, at 12:07 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 03/07/2014 11:08 AM, Israel Brewster wrote: >> On Mar 6, 2014, at 1:25 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: >> >>> On 03/06/2014 09:33 AM, Israel Brewster wrote: >>>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 >>>> >>>> >>>> Also odd is that my cleanup script runs at 1am. I have records of there >>>> being new data in the database up to 3:51am, but the oldest record >>>> currently in the DB is from 4:45am (as specified by the default of now() >>>> on the column). So I know records were added after my delete command >>>> ran, but before this reset occurred. >>>> >>> A shot in the dark... >>> >>> Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that connect.I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after 4am. >> >> Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that ran the following line at 4:40am: >> >> /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres tracking >> >> It must have been left over from before I got streaming replication up and working, when this box was still the backupserver and not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped machines.Apparently when you do a pg_dump with the -c flag from a server to itself, it does the clean before reading thedata, resulting in a new empty database. Thanks again for all the suggestions! > > No, -c just tells pg_dump to output clean commands and really only has meaning when you do a plain text dump as above.pg_dump does not clean the data from the running server on its own. What did the cleaning was immediately piping tooutput of the dump to psql. So basically you created a plain text dump file and fed it back to the server and the dumpfile included commands to clean out objects. If you had not used -c you would have gotten a bunch of duplicate <something>errors. Right, but I ended up with no data. So what I was getting at was that because I had the -c, which output clean commands,those clean commands were executed before pg_dump dumped the data. You are absolutely correct in saying that itwas only executed at all because it was piped to psql, I wasn't claiming otherwise :-) However, the sequence of eventsclearly is: 1) pg_dump outputs "clean" commands. The pipe to psql runs these, deleting the tables/data/etc. 2) pg_dumpoutputs the rebuild commands, which are piped to psql, which rebuilds the tables/etc 3) pg_dump tries to dump thedata, but there is none (other than the default starting data in the sequences) due to steps 1 and 2 already having beenexecuted (by the pipe to psql) on the same database that pg_dump is running on. Sorry for not being clear :-) > > >> >>> >>> Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to thelog file being rotated. >>> >>> Cheers, >>> Steve >>> >>> >>> -- >>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-general >> >> >> > > > -- > Adrian Klaver > adrian.klaver@aklaver.com > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Вложения
On 03/07/2014 03:10 PM, Israel Brewster wrote: >>> /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres tracking >>> >>> It must have been left over from before I got streaming replication up and working, when this box was still the backupserver and not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped machines.Apparently when you do a pg_dump with the -c flag from a server to itself, it does the clean before reading thedata, resulting in a new empty database. Thanks again for all the suggestions! >> >> No, -c just tells pg_dump to output clean commands and really only has meaning when you do a plain text dump as above.pg_dump does not clean the data from the running server on its own. What did the cleaning was immediately piping tooutput of the dump to psql. So basically you created a plain text dump file and fed it back to the server and the dumpfile included commands to clean out objects. If you had not used -c you would have gotten a bunch of duplicate <something>errors. > > Right, but I ended up with no data. So what I was getting at was that because I had the -c, which output clean commands,those clean commands were executed before pg_dump dumped the data. You are absolutely correct in saying that itwas only executed at all because it was piped to psql, I wasn't claiming otherwise :-) However, the sequence of eventsclearly is: 1) pg_dump outputs "clean" commands. The pipe to psql runs these, deleting the tables/data/etc. 2) pg_dumpoutputs the rebuild commands, which are piped to psql, which rebuilds the tables/etc 3) pg_dump tries to dump thedata, but there is none (other than the default starting data in the sequences) due to steps 1 and 2 already having beenexecuted (by the pipe to psql) on the same database that pg_dump is running on. > > Sorry for not being clear :-) Yea, a snake eating its tail. > >> >> > -- Adrian Klaver adrian.klaver@aklaver.com
One point - a serial datatype will not increment to infinity, as it is effectively a 4 byte integer with a sequence imposed, it can only store values upto MAXINT (2147483647) . Above this it may well wrap around where MAXINT + 1 = 1
You can delay the problem (significantly) by using bigserial (8 byte integer) instead of serial - this has MAXINT=9223372036854775807
http://www.postgresql.org/docs/9.2/static/datatype-numeric.html
Otherwise you might run a cron job or trigger to reset the serial values & the sequence when you think it timely.
I can't see how this would cause the missing records though.
Cheers
--
Thom
-- You can delay the problem (significantly) by using bigserial (8 byte integer) instead of serial - this has MAXINT=9223372036854775807
http://www.postgresql.org/docs/9.2/static/datatype-numeric.html
Otherwise you might run a cron job or trigger to reset the serial values & the sequence when you think it timely.
I can't see how this would cause the missing records though.
Cheers
Brent Wood
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Thom Brown [thom@linux.com]
Sent: Thursday, March 06, 2014 8:01 AM
To: Israel Brewster
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Mysterious DB reset
Sent: Thursday, March 06, 2014 8:01 AM
To: Israel Brewster
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Mysterious DB reset
On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)
That is odd. Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.
So if all the data went missing, and the sequence reset, the only thing I can think of is:
Someone ran:
TRUNCATE data RESTART IDENTITY;
or someone restored the table structure from a backup that deleted the original table.
Do you log DDL?
Was the table partitioned?
You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.
Thom
Please consider the environment before printing this email.
NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.