Re: WIP/PoC for parallel backup

From	Robert Haas
Subject	Re: WIP/PoC for parallel backup
Date	7 October 2019 16:05:34
Msg-id	CA+TgmoaYkBLGkYDjHaN41+QFg0e9j-T+zWW8w2Z3Yv7hx+mAqg@mail.gmail.com
In reply to	Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>)
Responses	Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>), Re: WIP/PoC for parallel backup (Ibrar Ahmed <ibrar.ahmad@gmail.com>)
List	pgsql-hackers

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. However, the backup manifest patch calculates and includes the checksum of each backup file
> while the file is being transferred to the frontend. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them among multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means the backup manifest would need to support being created without checksums during START_BACKUP().
> I also need the directory information, for two reasons:
>
> - In plain format, the base path has to exist before we can write the file. We can extract the base path from the file
> name, but doing that for every file does not seem like a good idea.
> - Base backup does not include the contents of some directories, but those directories, although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.  I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.
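
A minimal Python sketch of how a client might consume such a result set, assuming the four columns arrive as (name, type, size, mtime) tuples; FileListRow and prepare_plain_backup are hypothetical names, not part of any existing PostgreSQL API. Because directories come back as their own rows, a plain-format client can create every directory, including ones the server sends empty, before any worker writes a file:

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FileListRow:
    """One row of the proposed file-list result set (layout assumed, not final)."""
    name: str   # path relative to the data directory
    ftype: str  # 'd' = directory, 'f' = regular file
    size: int   # size in bytes
    mtime: int  # modification time; pg_basebackup would ignore this column


def prepare_plain_backup(rows: Iterable[Tuple[str, str, int, int]],
                         target: str) -> List[FileListRow]:
    """Create every listed directory under target and return the files still to fetch."""
    entries = [FileListRow(*row) for row in rows]
    for e in entries:
        if e.ftype not in ("d", "f"):
            raise ValueError(f"unexpected file type {e.ftype!r} for {e.name}")
    # Create all directories before returning the file list, so empty ones still
    # exist and every file's parent path is in place before any worker writes.
    for e in entries:
        if e.ftype == "d":
            # makedirs also creates missing parents, so row order does not matter.
            os.makedirs(os.path.join(target, e.name), exist_ok=True)
    # Largest files first, so a pool of workers tends to finish at about the same time.
    files = [e for e in entries if e.ftype == "f"]
    files.sort(key=lambda e: e.size, reverse=True)
    return files


# Usage with a tiny made-up listing: pg_notify stands in for a directory that
# is sent empty; 16384 and PG_VERSION are ordinary files.
rows = [
    ("base/1/16384", "f", 1048576, 0),
    ("base", "d", 0, 0),
    ("base/1", "d", 0, 0),
    ("pg_notify", "d", 0, 0),
    ("PG_VERSION", "f", 3, 0),
]
with tempfile.TemporaryDirectory() as target:
    todo = prepare_plain_backup(rows, target)
    created = sorted(os.listdir(target))
print(created)                 # ['base', 'pg_notify']
print([f.name for f in todo])  # ['base/1/16384', 'PG_VERSION']
```

The mtime column is carried through untouched here, matching the idea that pg_basebackup ignores it while other tools may use it.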

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other.  I think
that's probably a good idea, but not sure.

I still think that the files should be requested one at a time, not a
huge long list in a single command.
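
A minimal sketch of the one-file-per-request approach, assuming each worker holds its own connection and that some per-file command exists to call; fetch_all and fake_fetch are hypothetical names, and fake_fetch only simulates the server. Each worker takes the next name from a shared queue and issues one request for it:

```python
import queue
import threading
from typing import Callable, Dict, List


def fetch_all(files: List[str], nworkers: int,
              fetch_one: Callable[[int, str], bytes]) -> Dict[str, bytes]:
    """Have nworkers threads pull file names from a shared queue, one request per file."""
    work: "queue.Queue[str]" = queue.Queue()
    for name in files:
        work.put(name)
    results: Dict[str, bytes] = {}
    lock = threading.Lock()

    def worker(wid: int) -> None:
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # nothing left to request
            # One round trip per file; error handling is omitted in this sketch.
            data = fetch_one(wid, name)
            with lock:
                results[name] = data

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_fetch(wid: int, name: str) -> bytes:
    # Hypothetical stand-in for sending a per-file request on worker wid's connection.
    return f"contents of {name}".encode()


got = fetch_all(["base/1/16384", "base/1/1259", "PG_VERSION"], 2, fake_fetch)
print(sorted(got))  # ['PG_VERSION', 'base/1/1259', 'base/1/16384']
```

One plausible advantage of this shape over handing each worker a long list up front: a worker that hits a large file simply takes fewer names from the queue, so the load evens out without the client having to predict transfer times.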

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
