Here is the situation:
I have a standby PostgreSQL server that is fed a WAL file every 2 minutes.
Whenever it is fed a WAL file, it logs the following:
---
LOG: restored log file "000000010000000000000070" from archive
pg_restore::copyWALFile: Moving /opt/data/mirror/000000010000000000000071 to pg_xlog/RECOVERYXLOG
LOG: restored log file "000000010000000000000071" from archive
pg_restore::copyWALFile: Moving /opt/data/mirror/000000010000000000000072 to pg_xlog/RECOVERYXLOG
LOG: restored log file "000000010000000000000072" from archive
...
...
pg_restore::copyWALFile: Moving /opt/data/mirror/000000010000000000000082 to pg_xlog/RECOVERYXLOG
LOG: restored log file "000000010000000000000082" from archive
---
I assume the above is a happy postgres in recovery mode. The
"copyWALFile" lines are my own messages in the server log.
After a while, the primary gives up: it goes down, and I can no longer
pull any WAL files from it. So I tell the standby that I have no WAL
file to give.
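Assuming the copyWALFile program is the standby's restore_command (its target, pg_xlog/RECOVERYXLOG, is the path PostgreSQL substitutes for %p), "telling the standby" means exiting with a nonzero status. Below is a minimal Python sketch of such a program; the script name restore_wal.py and MIRROR_DIR are hypothetical, while %f (file name) and %p (target path) are the standard restore_command placeholders. Unlike the move in the log, it copies the segment, so the file is still in the mirror if the standby asks for it a second time.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical archive location, taken from the log messages above.
MIRROR_DIR = Path("/opt/data/mirror")


def restore_wal(wal_name: str, target: str, mirror_dir: Path = MIRROR_DIR) -> int:
    """Deliver one WAL segment to the standby; return a restore_command exit code.

    0 means the file was delivered; nonzero tells PostgreSQL it is not available.
    """
    source = mirror_dir / wal_name
    if not source.is_file():
        return 1  # "file not found": nothing to give the standby
    # Copy rather than move, so the same segment can be served again later.
    shutil.copy(source, target)
    return 0


# Assumed invocation: restore_command = 'python restore_wal.py %f %p'
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(restore_wal(sys.argv[1], sys.argv[2]))
```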
---
LOG: could not open file "pg_xlog/000000010000000000000083" (log file 0, segment 131): No such file or directory
LOG: redo done at 0/8200D280
Main: Triggering recovery
PANIC: could not open file "pg_xlog/000000010000000000000082" (log file 0, segment 130): No such file or directory
---
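The file names and the "log file / segment" pairs in these messages describe the same thing: a WAL file name is 24 hex digits, 8 each for timeline, log file and segment. The small sketch below, assuming the default 16 MB segment size, decodes both forms. It confirms that "...0083" is log file 0, segment 131, and that "redo done at 0/8200D280" falls inside segment 130, i.e. "...0082", the file the PANIC asks for.

```python
# Assumption: default 16 MB WAL segments.
SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_name(name: str) -> tuple[int, int, int]:
    """Split a WAL file name into (timeline, log file, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_of_location(location: str) -> tuple[int, int]:
    """Map a WAL location such as '0/8200D280' to (log file, segment)."""
    log_hex, offset_hex = location.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // SEGMENT_SIZE


print(parse_wal_name("000000010000000000000083"))  # (1, 0, 131)
print(segment_of_location("0/8200D280"))           # (0, 130)
```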
The issue is that I do not have the "001...0083" file, so I return
"file not found". Then, when postgres asks me for "001...0082", I do
not have that either: in the intervening minutes I moved it out of
/opt/data/mirror into the /opt/data/tape directory for long-term tape
storage. So how do I make my standby postgres happy?
Having run into that situation, the standby also spits out the following:
---
LOG: could not open file "pg_xlog/000000010000000000000082" (log file 0, segment 130): No such file or directory
LOG: invalid primary checkpoint record
LOG: could not open file "pg_xlog/000000010000000000000080" (log file 0, segment 128): No such file or directory
LOG: invalid secondary checkpoint record
---
What is happening is that postgres is looking back in time for the
"0001...0082" and "0001...0080" files.
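In this particular log, the oldest file requested (000000010000000000000080, for the secondary checkpoint record) is 3 segments behind the one the standby was waiting for (000000010000000000000083), which is about 6 minutes at one file every 2 minutes. That figure comes from this one log only and is not a general bound. A sketch of the arithmetic, assuming both names share the same timeline and log file:

```python
WAL_INTERVAL_MINUTES = 2  # from the post: one WAL file every 2 minutes


def lookback(waiting_for: str, oldest_requested: str,
             interval_minutes: int = WAL_INTERVAL_MINUTES) -> tuple[int, int]:
    """Return (segments, minutes) between two WAL file names.

    Sketch only: both names must share timeline and log file (first 16 hex digits).
    """
    if waiting_for[:16] != oldest_requested[:16]:
        raise ValueError("names must share timeline and log file")
    segments = int(waiting_for[16:], 16) - int(oldest_requested[16:], 16)
    return segments, segments * interval_minutes


print(lookback("000000010000000000000083", "000000010000000000000080"))  # (3, 6)
```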
My question is: how far back does it look? That determines when I can
safely move a WAL file out to tape. If I know how far back I have to
keep WAL files, I can devise an effective strategy for removing older
ones. For example, if I generate a WAL file every 2 minutes, do I keep
10 files or 120 files?
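At one file every 2 minutes, 10 files cover 20 minutes and 120 files cover 4 hours. Instead of a fixed count, one option is a cutoff: keep every file from the oldest one the standby may still need, and move only older files to tape. The sketch below assumes PostgreSQL 8.3 or later, where restore_command can be passed %r, the name of the file containing the last valid restart point (the earliest file that must be kept). The function name is hypothetical. It compares only the log and segment digits, ignoring the timeline, and skips .history and .backup files.

```python
import re
from typing import Iterable

# Only plain segment files: 24 hex digits, no .history / .backup suffix.
WAL_SEGMENT = re.compile(r"[0-9A-F]{24}")


def files_safe_to_archive(names: Iterable[str], keep_from: str) -> list[str]:
    """Return segment names that sort strictly before keep_from.

    keep_from is the oldest file the standby may still need (for example the
    value of %r). Only the log and segment digits are compared; the first
    8 digits (timeline) are ignored.
    """
    return sorted(
        name for name in names
        if WAL_SEGMENT.fullmatch(name) and name[8:] < keep_from[8:]
    )


mirror = [
    "000000010000000000000070",
    "00000001000000000000007F",
    "000000010000000000000080",
    "000000010000000000000082",
    "000000010000000000000070.00000020.backup",
]
print(files_safe_to_archive(mirror, "000000010000000000000080"))
# ['000000010000000000000070', '00000001000000000000007F']
```

In practice the names would come from listing /opt/data/mirror, and only the returned files would be handed to the tape job.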
Any insight on this will help.
Regards
Dhaval