Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

From	Craig Ringer
Subject	Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Date	23 March 2017, 18:53:14
Msg-id	CAMsr+YEBnaccSo1gZampcWrDbwCL5a1h+4FPyMO=ReKQY-KLrA@mail.gmail.com
In reply to	[HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling (Alexey Kondratov <kondratov.aleksey@gmail.com>)
Responses	Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling (Stas Kelvich <stas.kelvich@gmail.com>)
List	pgsql-hackers

On 23 March 2017 at 19:33, Alexey Kondratov <kondratov.aleksey@gmail.com> wrote:

> (1) Add errors handling to COPY as a minimum program

Huge +1 if you can do it in an efficient way.

I think the main barrier to doing so is that the naïve approach
creates a subtransaction for every row, which is pretty dire in
performance terms and burns transaction IDs very rapidly.
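
A minimal sketch of that naive pattern, assuming a client-side loader rather than COPY itself: every input row gets its own savepoint (the SQL-level form of a subtransaction), so the number of subtransactions grows with the number of rows, good or bad. SQLite from the Python standard library stands in for PostgreSQL only so that the sketch runs anywhere; the table `t`, its columns and the function name are invented for illustration, and SQLite does not consume transaction IDs the way PostgreSQL subtransactions do.

```python
import sqlite3


def load_with_row_savepoints(conn, rows):
    """Insert rows one by one, wrapping each in its own savepoint.

    Returns (loaded, rejected, savepoints_opened).
    """
    loaded, rejected, savepoints = 0, [], 0
    conn.execute("BEGIN")
    for lineno, row in enumerate(rows, start=1):
        conn.execute("SAVEPOINT copy_row")  # one subtransaction per row
        savepoints += 1
        try:
            conn.execute("INSERT INTO t (id, qty) VALUES (?, ?)", row)
        except sqlite3.Error as exc:
            # Undo whatever the failed row did, then carry on with the next one.
            conn.execute("ROLLBACK TO SAVEPOINT copy_row")
            rejected.append((lineno, str(exc)))
        else:
            loaded += 1
        conn.execute("RELEASE SAVEPOINT copy_row")
    conn.execute("COMMIT")
    return loaded, rejected, savepoints


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN / SAVEPOINT / COMMIT statements above are the only ones issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, qty INTEGER CHECK (qty > 0))")
loaded, rejected, savepoints = load_with_row_savepoints(
    conn, [(1, 5), (2, -1), (3, 7)]
)
print(loaded, [lineno for lineno, _ in rejected], savepoints)
```

Only one of the three rows is bad, yet three savepoints are opened: the overhead is paid per row, not per error.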

Most of our datatype I/O functions, etc, have no facility for being
invoked in a mode where they fail nicely and clean up after
themselves. We rely on unwinding the subtransaction's memory context
for error handling, for releasing any LWLocks that were taken, etc.
There's no try_timestamptz_in function or anything, just
timestamptz_in, and it ERRORs if it doesn't like its input. You
cannot safely PG_TRY / PG_CATCH such an exception and continue
processing to, say, write another row.
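
To make the interface difference concrete, here is a toy contrast written in Python, not PostgreSQL C. `try_int4_in_like` mimics a hypothetical "fail nicely" input function that reports bad input through its return value, and `int4_in_like` mimics the existing style, where bad input raises and the caller has to unwind. Both names are invented for this sketch and neither is a PostgreSQL function. Python exceptions are safe to catch, which is exactly what does not hold for an ERROR in the backend, so the sketch shows only the shape of the two interfaces.

```python
from typing import Optional, Tuple


def try_int4_in_like(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Hypothetical soft-failure style: return (value, None) or (None, message)."""
    s = text.strip()
    digits = s[1:] if s[:1] in "+-" else s
    if not (digits.isascii() and digits.isdigit()):
        return None, f"invalid input syntax for integer: {text!r}"
    value = int(s)
    if not -2**31 <= value < 2**31:
        return None, f"value out of range for a 32-bit integer: {text!r}"
    return value, None


def int4_in_like(text: str) -> int:
    """Raising style: the only way to signal bad input is to throw."""
    value, err = try_int4_in_like(text)
    if err is not None:
        raise ValueError(err)
    return value


for sample in ["42", "4x2", "99999999999"]:
    print(sample, try_int4_in_like(sample))
```

A raising function is easy to build on top of a soft-failing one, as above. Going the other way is the hard part: if only the raising form exists, every caller that wants to continue needs some safe way to recover from the error.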

Currently we also don't have a way to differentiate between

* "this row is structurally invalid" (wrong number of columns, etc)
* "this row is structurally valid but has fields we could not parse
into their data types"
* "this row looks structurally valid and has data types we could
parse, but does not satisfy a constraint on the destination table"
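
The three cases above can be sketched as an explicit classification. This is only an illustration of the distinction, in Python, with an invented `RowFailure` enum, an invented `classify_row` helper, and a hypothetical two-column table; nothing here corresponds to existing COPY code.

```python
from enum import Enum


class RowFailure(Enum):
    MALFORMED_ROW = "wrong number of columns"
    BAD_FIELD = "field could not be parsed into its data type"
    CONSTRAINT = "row does not satisfy a table constraint"


def classify_row(fields, parsers, check):
    """Return (values, None) on success or (None, RowFailure) on failure."""
    # 1. Structural check: does the row have the expected number of columns?
    if len(fields) != len(parsers):
        return None, RowFailure.MALFORMED_ROW
    # 2. Per-field parsing into the column data types.
    values = []
    for field, parse in zip(fields, parsers):
        try:
            values.append(parse(field))
        except ValueError:
            return None, RowFailure.BAD_FIELD
    # 3. Constraint check on the fully parsed row.
    if not check(values):
        return None, RowFailure.CONSTRAINT
    return values, None


# Hypothetical table: (id integer, qty integer CHECK (qty > 0)).
parsers = [int, int]


def check(values):
    return values[1] > 0


for line in ["1,5", "2", "3,abc", "4,-1"]:
    print(line, classify_row(line.split(","), parsers, check))
```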

Nor do we have a way to write to any kind of failure-log table in the
database, since a simple approach relies on aborting subtransactions
to clean up failed inserts so it can't write anything for failed rows.
Not without starting a 2nd subxact to record the failure, anyway.
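
A rough sketch of what recording failures with a second subtransaction could look like, again with SQLite standing in for PostgreSQL purely so the code is runnable. The `copy_errors` table, its columns and the function name are assumptions made for this example, not an existing or proposed interface.

```python
import sqlite3


def load_and_log_failures(conn, rows):
    """Load rows into t; record each failed row in copy_errors."""
    conn.execute("BEGIN")
    for lineno, row in enumerate(rows, start=1):
        conn.execute("SAVEPOINT copy_row")
        error = None
        try:
            conn.execute("INSERT INTO t (id, qty) VALUES (?, ?)", row)
        except sqlite3.Error as exc:
            # Rolling back discards everything done since SAVEPOINT copy_row,
            # so a log entry written inside it would be lost as well.
            conn.execute("ROLLBACK TO SAVEPOINT copy_row")
            error = str(exc)
        conn.execute("RELEASE SAVEPOINT copy_row")
        if error is not None:
            # Second savepoint, opened only after the first one is finished.
            conn.execute("SAVEPOINT log_row")
            conn.execute(
                "INSERT INTO copy_errors (lineno, raw, message) VALUES (?, ?, ?)",
                (lineno, repr(row), error),
            )
            conn.execute("RELEASE SAVEPOINT log_row")
    conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, qty INTEGER CHECK (qty > 0))")
conn.execute("CREATE TABLE copy_errors (lineno INTEGER, raw TEXT, message TEXT)")
load_and_log_failures(conn, [(1, 5), (2, -1), (3, 7)])
print(conn.execute("SELECT lineno, raw FROM copy_errors").fetchall())
```

Each rejected row now costs two subtransactions instead of one, which adds to the per-row overhead described earlier.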

So, having said why it's hard, I don't really have much for you in
terms of suggestions for ways forward. User-defined data types,
user-defined constraints and triggers, etc mean anything involving
significant interface changes will be a hard sell, especially in
something pretty performance-sensitive like COPY.

I guess it'd be worth setting out your goals first. Do you want to
handle all the kinds of problems above? Malformed rows, rows with
malformed field values, and rows that fail to satisfy a constraint? Or
just some subset?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
