Обсуждение: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY
If XLOG_DBASE_CREATE_FILE_COPY occurs between an incremental backup and its reference backup, every relation whose DB OID and tablespace OID match the corresponding values in that record should be backed up in full. Currently that's not happening, because the WAL summarizer doesn't see the XLOG_DBASE_CREATE_FILE_COPY as referencing any particular relfilenode and so basically ignores it. The same happens for XLOG_DBASE_CREATE_WAL_LOG, but that case is OK because that only covers creating the directory itself, not anything underneath it, and there will be separate WAL records telling us the relfilenodes created below the new directory and the pages modified therein. AFAICS, fixing this requires some way of noting in the WAL summary file that an entire directory got blown away. I chose to do that by setting the limit block to 0 for a fake relation with the given DB OID and TS OID and relfilenumber 0, which seems natural. Patch with test case attached. The test case in brief is: initdb -c summarize_wal=on # start the server in $PGDATA psql -c 'create database lakh oid = 100000 strategy = file_copy' postgres psql -c 'create table t1 (a int)' lakh pg_basebackup -cfast -Dt1 dropdb lakh psql -c 'create database lakh oid = 100000 strategy = file_copy' postgres pg_basebackup -cfast -Dt2 --incremental t1/backup_manifest pg_combinebackup t1 t2 -o result # stop the server, restart from the result directory psql -c 'select * from t1' lakh Without this patch, you get something like: ERROR: could not open file "base/100000/16388": No such file or directory ...because the catalog entries from before the database is dropped and recreated manage to end up in pg_combinebackup's output directory, which they should not. With the patch, you correctly get an error about t1 not existing. I thought about whether there were any other WAL records that have similar problems to XLOG_DBASE_CREATE_FILE_COPY and didn't come up with anything. If anyone knows of any similar cases, please let me know. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
On Fri, Feb 23, 2024 at 08:47:52PM +0530, Robert Haas wrote: > If XLOG_DBASE_CREATE_FILE_COPY occurs between an incremental backup > and its reference backup, every relation whose DB OID and tablespace > OID match the corresponding values in that record should be backed up > in full. Currently that's not happening, because the WAL summarizer > doesn't see the XLOG_DBASE_CREATE_FILE_COPY as referencing any > particular relfilenode and so basically ignores it. The same happens > for XLOG_DBASE_CREATE_WAL_LOG, but that case is OK because that only > covers creating the directory itself, not anything underneath it, and > there will be separate WAL records telling us the relfilenodes created > below the new directory and the pages modified therein. XLOG_DBASE_CREATE_WAL_LOG creates PG_VERSION in addition to creating the directory. I see your patch covers it. > I thought about whether there were any other WAL records that have > similar problems to XLOG_DBASE_CREATE_FILE_COPY and didn't come up > with anything. If anyone knows of any similar cases, please let me > know. Regarding records the summarizer potentially can't ignore that don't deal in relfilenodes, these come to mind: XLOG_DBASE_DROP - covered in this thread's patch XLOG_RELMAP_UPDATE XLOG_TBLSPC_CREATE XLOG_TBLSPC_DROP XLOG_XACT_PREPARE Also, any record that writes XIDs needs to update nextXid; likewise for other ID spaces. See the comment at "XLOG stuff" in heap_lock_tuple(). Perhaps you don't summarize past a checkpoint, making that irrelevant. If walsummarizer.c handles any of the above, my brief look missed it. I also didn't find the string "clog" or "slru" anywhere in dc21234 "Add support for incremental backup", 174c480 "Add a new WAL summarizer process.", or thread https://postgr.es/m/flat/CA%2BTgmoYOYZfMCyOXFyC-P%2B-mdrZqm5pP2N7S-r0z3_402h9rsA%40mail.gmail.com "trying again to get incremental backup". I wouldn't be surprised if you treat clog, pg_filenode.map, and/or 2PC state files as unconditionally non-incremental, in which case some of the above doesn't need explicit summarization code. I stopped looking for that logic, though. > --- a/src/backend/postmaster/walsummarizer.c > +++ b/src/backend/postmaster/walsummarizer.c > + * Technically, this special handling is only needed in the case of > + * XLOG_DBASE_CREATE_FILE_COPY, because that can create a whole bunch > + * of relation files in a directory without logging anything > + * specific to each one. If we didn't mark the whole DB OID/TS OID > + * combination in some way, then a tablespace that was dropped after s/tablespace/database/ I suspect. > + * the reference backup and recreated using the FILE_COPY method prior > + * to the incremental backup would look just like one that was never > + * touched at all, which would be catastrophic.
On Sat, Feb 24, 2024 at 10:05 AM Noah Misch <noah@leadboat.com> wrote: > Regarding records the summarizer potentially can't ignore that don't deal in > relfilenodes, these come to mind: > > XLOG_DBASE_DROP - covered in this thread's patch > XLOG_RELMAP_UPDATE > XLOG_TBLSPC_CREATE > XLOG_TBLSPC_DROP > XLOG_XACT_PREPARE At present, only relation data files are ever sent incrementally; I don't think any of these touch those. > Also, any record that writes XIDs needs to update nextXid; likewise for other > ID spaces. See the comment at "XLOG stuff" in heap_lock_tuple(). Perhaps you > don't summarize past a checkpoint, making that irrelevant. I'm not quite following this. It's true that we summarize from one redo pointer to the next; but also, our summary is only trying to ascertain which relation data blocks have been modified. Therefore, I don't understand the relevance of nextXid here. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, Feb 24, 2024 at 04:16:24PM +0530, Robert Haas wrote: > On Sat, Feb 24, 2024 at 10:05 AM Noah Misch <noah@leadboat.com> wrote: > > On Fri, Feb 23, 2024 at 08:47:52PM +0530, Robert Haas wrote: > > > I thought about whether there were any other WAL records that have > > > similar problems to XLOG_DBASE_CREATE_FILE_COPY and didn't come up > > > with anything. If anyone knows of any similar cases, please let me > > > know. > > > > Regarding records the summarizer potentially can't ignore that don't deal in > > relfilenodes, these come to mind: > > > > XLOG_DBASE_DROP - covered in this thread's patch > > XLOG_RELMAP_UPDATE > > XLOG_TBLSPC_CREATE > > XLOG_TBLSPC_DROP > > XLOG_XACT_PREPARE > > At present, only relation data files are ever sent incrementally; I > don't think any of these touch those. Agreed, those don't touch relation data files. I think you've got all relation data file mutations. XLOG_DBASE_CREATE_FILE_COPY and XLOG_DBASE_DROP are the only record types that touch a relation data file without mentioning it in XLogRecordBlockHeader, XACT_XINFO_HAS_RELFILELOCATORS, or an RM_SMGR_ID rlocator field. > > Also, any record that writes XIDs needs to update nextXid; likewise for other > > ID spaces. See the comment at "XLOG stuff" in heap_lock_tuple(). Perhaps you > > don't summarize past a checkpoint, making that irrelevant. > > I'm not quite following this. It's true that we summarize from one > redo pointer to the next; but also, our summary is only trying to > ascertain which relation data blocks have been modified. Therefore, I > don't understand the relevance of nextXid here. No relevance, given incremental backup is incremental with respect to relation data blocks only.
On Sat, Feb 24, 2024 at 12:10 PM Noah Misch <noah@leadboat.com> wrote: > Agreed, those don't touch relation data files. I think you've got all > relation data file mutations. XLOG_DBASE_CREATE_FILE_COPY and XLOG_DBASE_DROP > are the only record types that touch a relation data file without mentioning > it in XLogRecordBlockHeader, XACT_XINFO_HAS_RELFILELOCATORS, or an RM_SMGR_ID > rlocator field. Thanks for the review. I have committed this. -- Robert Haas EDB: http://www.enterprisedb.com