Re: backup manifests
From | David Steele |
---|---|
Subject | Re: backup manifests |
Date | |
Msg-id | b2b696b4-b11f-e954-c86a-8252a62a2e40@pgmasters.net |
In reply to | Re: backup manifests (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: backup manifests (Robert Haas <robertmhaas@gmail.com>); Re: backup manifests (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Hi Robert,

On 9/19/19 9:51 AM, Robert Haas wrote:
> On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
>> Also consider adding the timestamp.
>
> Sounds reasonable, even if only for the benefit of humans who might
> look at the file. We can decide later whether to use it for anything
> else (and third-party tools could make different decisions from core).
> I assume we're talking about file mtime here, not file ctime or file
> atime or the time the manifest was generated, but let me know if I'm
> wrong.

In my experience only mtime is useful.

>> Based on my original calculations (which sadly I don't have anymore),
>> the combination of SHA1, size, and file name is *extremely* unlikely to
>> generate a collision. As in, unlikely to happen before the end of the
>> universe kind of unlikely. Though, I guess it depends on your
>> expectations for the lifetime of the universe.
>
> What I'd say is: if
> the probability of getting a collision is demonstrably many orders of
> magnitude less than the probability of the disk writing the block
> incorrectly, then I think we're probably reasonably OK. Somebody might
> differ, which is perhaps a mild point in favor of LSN-based
> approaches, but as a practical matter, if a bad block is a billion
> times more likely to be the result of a disk error than a checksum
> mismatch, then it's a negligible risk.

Agreed.

>> We include the version/sysid of the cluster to avoid mixups. It's a
>> great extra check on top of references to be sure everything is kosher.
>
> I don't think it's a good idea to duplicate the information that's
> already in the backup_label. Storing two copies of the same
> information is just an invitation to having to worry about what
> happens if they don't agree.

OK, but now we have backup_label, tablespace_map,
XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a
backup.manifest file. I feel like we may be drowning in backup info files.

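The manifest fields discussed so far (file name, size, mtime, and a SHA-1 digest) can be sketched in a few lines of Python. This is a minimal illustration, not PostgreSQL's or pgBackRest's implementation: `ManifestEntry`, `make_entry`, and `undetected_change_bound` are hypothetical names. The bound assumes accidental corruption behaves like a random input to an ideal 160-bit hash, so a single damaged file keeps its recorded digest with probability about 2^-160, and a union bound over n files gives n/2^160. That argument covers accidental damage only; SHA-1 is no longer collision-resistant against deliberate attacks.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    """One entry of a hypothetical backup manifest (field names are illustrative)."""
    path: str    # path relative to the backup root
    size: int    # file size in bytes
    mtime: int   # st_mtime truncated to whole seconds since the epoch
    sha1: str    # hex-encoded SHA-1 of the file contents


def make_entry(root: str, rel_path: str, chunk_size: int = 1 << 16) -> ManifestEntry:
    """Stat and hash one file, reading in chunks so memory use stays bounded."""
    full_path = os.path.join(root, rel_path)
    st = os.stat(full_path)
    digest = hashlib.sha1()
    with open(full_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return ManifestEntry(rel_path, st.st_size, int(st.st_mtime), digest.hexdigest())


def undetected_change_bound(n_files: int, digest_bits: int = 160) -> float:
    """Union bound on the chance that random corruption in any of n files
    leaves that file's digest unchanged, assuming an ideal hash."""
    return n_files / 2.0 ** digest_bits
```

For a cluster with a million files, `undetected_change_bound(1_000_000)` is roughly 6.8 × 10^-43, which is the kind of "many orders of magnitude below a disk error" comparison Robert describes. Checking size and name in addition to the digest can only catch more problems, never fewer.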
>> I'd
>> recommend JSON for the format since it is so ubiquitous and easily
>> handles escaping which can be gotchas in a home-grown format. We
>> currently have a format that is a combination of Windows INI and JSON
>> (for human-readability in theory) and we have become painfully aware of
>> escaping issues. Really, why would you drop files with '=' in their
>> name in PGDATA? And yet it happens.
>
> I am not crazy about JSON because it requires that I get a json parser
> into src/common, which I could do, but given the possibly-imminent end
> of the universe, I'm not sure it's the greatest use of time. You're
> right that if we pick an ad-hoc format, we've got to worry about
> escaping, which isn't lovely.

My experience is that JSON is simple to implement and has already dealt
with escaping and data structure considerations. A home-grown solution
will be at least as complex but have the disadvantage of being
non-standard.

>>> One thing I'm not quite sure about is where to store the backup
>>> manifest. If you take a base backup in tar format, you get base.tar,
>>> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
>>> Does the backup manifest go into base.tar? Get written into a separate
>>> file outside of any tar archive? Something else? And what about a
>>> plain-format backup? I suppose then we should just write the manifest
>>> into the top level of the main data directory, but perhaps someone has
>>> another idea.
>>
>> We do:
>>
>> [backup_label]/
>>     backup.manifest
>>     pg_data/
>>     pg_tblspc/
>>
>> In general, having the manifest easily accessible is ideal.
>
> That's a fine choice for a tool, but I'm talking about something
> that is part of the actual backup format supported by PostgreSQL, not
> what a tool might wrap around it. The choice is whether, for a
> tar-format backup, the manifest goes inside a tar file or as a
> separate file.
> To put that another way, a patch adding backup
> manifests does not get to redesign where pg_basebackup puts anything
> else; it only gets to decide where to put the manifest.

Fair enough. The point is to make the manifest easily accessible. I'd
keep it in the data directory for file-based backups and as a separate
file for tar-based backups. The advantage here is that we can pick a
file name that becomes reserved which a tool can't do.

Regards,
--
-David
david@pgmasters.net
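The escaping problem raised in this thread (JSON versus a home-grown INI-style format) is easy to reproduce. The Python sketch below compares a naive `path=size` line format that has no escaping rules against the same data serialized with the standard `json` module. Both formats, and the `Files`/`Path`/`Size` keys, are hypothetical illustrations; they are not the pgBackRest format or any PostgreSQL manifest format.

```python
import json


def write_naive(entries):
    """One 'path=size' pair per line, with no escaping at all."""
    return "\n".join(f"{path}={size}" for path, size in entries)


def read_naive(text):
    """Split each line on the first '='.

    A '=' inside a file name shifts the split point, and a newline inside
    a file name splits one entry across two lines; both raise ValueError.
    """
    result = []
    for line in text.splitlines():
        path, size = line.split("=", 1)
        result.append((path, int(size)))
    return result


def naive_round_trips(entries):
    """True only if the naive format gives back exactly what was written."""
    try:
        return read_naive(write_naive(entries)) == entries
    except ValueError:
        return False


def write_json(entries):
    """json.dumps escapes quotes, backslashes, and control characters."""
    files = [{"Path": path, "Size": size} for path, size in entries]
    return json.dumps({"Files": files}, indent=2)


def read_json(text):
    """Parse the JSON form back into (path, size) tuples."""
    return [(f["Path"], f["Size"]) for f in json.loads(text)["Files"]]
```

Switching the naive reader to split on the last `=` would fix file names containing `=` but not file names containing a newline; every delimiter an ad-hoc format uses needs its own escaping rule, which is the maintenance cost David describes. One caveat applies to JSON as well: JSON strings are Unicode text, so file names that are not valid UTF-8 would still need an extra encoding rule.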