Re: COPY enhancements

From: Robert Haas
Subject: Re: COPY enhancements
Msg-id: 603c8f070910081143h2232509agd137f4023a4c6315@mail.gmail.com
In reply to: Re: COPY enhancements (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Thu, Oct 8, 2009 at 1:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Oct 8, 2009 at 12:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Another approach that was discussed earlier was to divvy the rows into
>>> batches.  Say every thousand rows you sub-commit and start a new
>>> subtransaction.  Up to that point you save aside the good rows somewhere
>>> (maybe a tuplestore).  If you get a failure partway through a batch,
>>> you start a new subtransaction and re-insert the batch's rows up to the
>>> bad row.  This could be pretty awful in the worst case, but most of the
>>> time it'd probably perform well.  You could imagine dynamically adapting
>>> the batch size depending on how often errors occur ...
>
>> Yeah, I think that's promising.  There is of course the possibility
>> that a row which previously succeeded could fail the next time around,
>> but most of the time that shouldn't happen, and it should be possible
>> to code it so that it still behaves somewhat sanely if it does.
>
> Actually, my thought was that failure to reinsert a previously good
> tuple should cause us to abort the COPY altogether.  This is a
> cheap-and-easy way of avoiding sorceror's apprentice syndrome.
> Suppose the failures are coming from something like out of disk space,
> transaction timeout, whatever ... a COPY that keeps on grinding no
> matter what is *not* ideal.

I think you handle that by putting a cap on the total number of errors
you're willing to accept (and in any event you'll always skip the row
that failed, so forward progress can't cease altogether).  For out of
disk space or transaction timeout, sure, but you might also have
things like a serialization error that occurs on the reinsert that
didn't occur on the original.  You don't want that to kill the whole
bulk load, I would think.
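
Something like this, control-flow-wise (hand-wavy Python sketch; insert_row
and the subtransaction hooks are made-up stand-ins, not real PostgreSQL
APIs): every failure just counts against the cap, including a previously
good row that fails on replay, and each retry drops one row, so forward
progress can't stall.

class CopyAborted(Exception):
    """Raised once the total error count exceeds the cap."""

def try_batch(batch, insert_row, begin_subxact, subcommit, rollback_subxact):
    """Insert `batch` inside one subtransaction.  Returns None if the
    whole batch committed, else the index of the first failing row
    (the subtransaction is rolled back, so none of its rows are kept)."""
    begin_subxact()
    for j, row in enumerate(batch):
        try:
            insert_row(row)
        except Exception:
            rollback_subxact()
            return j
    subcommit()
    return None

def copy_rows(rows, insert_row, begin_subxact, subcommit, rollback_subxact,
              batch_size=1000, max_errors=100):
    errors = 0
    pending = list(rows)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        while batch:
            bad = try_batch(batch, insert_row, begin_subxact, subcommit,
                            rollback_subxact)
            if bad is None:
                break
            # Count every failure against the cap -- including a
            # previously-good row that fails on replay (e.g. a
            # serialization error), so retries can't run away.
            errors += 1
            if errors > max_errors:
                raise CopyAborted("too many bad rows, giving up")
            # Drop the failed row and replay the remainder in a fresh
            # subtransaction; each retry removes one row, so forward
            # progress never ceases.
            batch = batch[:bad] + batch[bad + 1:]
    return errors

You could also shrink batch_size on the fly when errors come frequently,
per the adaptive idea upthread, without changing the rest of the loop.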

...Robert

