Re: Improve COPY performance for large data sets

From Dimitri Fontaine
Subject Re: Improve COPY performance for large data sets
Msg-id 56D9574D-9EB3-410B-9FBA-B1C7329B9E81@hi-media.com
In reply to Re: Improve COPY performance for large data sets  (Bill Moran <wmoran@collaborativefusion.com>)
List pgsql-performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10 Sept. 08 at 19:16, Bill Moran wrote:
> There's a program called pgloader which supposedly is faster than
> copy.
> I've not used it so I can't say definitively how much faster it is.

In fact pgloader uses COPY under the hood, and does so over a
network connection (which could be a Unix domain socket), whereas COPY
on the server reads the file content directly from the local file. So
no, pgloader is not inherently faster than COPY.
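
To make the difference concrete, here is a minimal Python sketch of the two paths. It assumes a psycopg2-style cursor that provides execute() and copy_expert(sql, file); the function names, table name and path handling are hypothetical and only for illustration, and this is not pgloader's actual code.

```python
import io


def copy_server_side(cursor, table, server_path):
    """Ask the server to read a file from its own filesystem.

    The path is resolved on the database server, not on the client.
    """
    # Naive string formatting: acceptable for a sketch, not for untrusted input.
    cursor.execute(f"COPY {table} FROM '{server_path}'")


def copy_over_connection(cursor, table, lines):
    """Stream rows through the client connection with COPY ... FROM STDIN.

    A client-side loader has to take this path, so every byte travels
    over the connection (TCP or Unix domain socket).
    """
    buf = io.StringIO("".join(lines))
    # Assumption: the cursor offers psycopg2's copy_expert(sql, file).
    cursor.copy_expert(f"COPY {table} FROM STDIN", buf)
```

Both end up running COPY on the server; only the way the data reaches it differs.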

That said, pgloader can split the workload between as many
threads as you want, and so could saturate I/O when the disk
subsystem performs well enough that a single CPU cannot
overload it. Two parallel loading modes are supported: pgloader will
either have N parts of the file processed by N threads, or have one
thread read and parse the file and then fill queues for N threads that
send COPY commands to the server.
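
The two modes can be sketched roughly as follows. This is an illustration under stated assumptions, not pgloader's implementation: send is a hypothetical thread-safe callback standing in for "issue COPY ... FROM STDIN with this chunk on the worker's own connection", and all function names and the batch size are made up for the example.

```python
import os
import queue
import threading


def split_ranges(path, n):
    """Mode 1 helper: cut the file into up to n byte ranges ending on line boundaries."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n):
            f.seek(max(size * i // n, cuts[-1]))
            f.readline()  # move forward to the end of the current line
            cuts.append(min(f.tell(), size))
    cuts.append(size)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if a < b]


def load_range(path, start, end, send):
    """Read one byte range and hand it to the (hypothetical) send callback."""
    with open(path, "rb") as f:
        f.seek(start)
        send(f.read(end - start))


def load_split(path, n, send):
    """Mode 1: N parts of the file, each processed by its own thread."""
    threads = [
        threading.Thread(target=load_range, args=(path, a, b, send))
        for a, b in split_ranges(path, n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def load_queued(path, n, send, batch_lines=1000):
    """Mode 2: one reader (the calling thread) fills queues for N sender threads."""
    queues = [queue.Queue(maxsize=4) for _ in range(n)]

    def worker(q):
        while True:
            batch = q.get()
            if batch is None:  # sentinel: no more data for this worker
                break
            send(batch)

    workers = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for w in workers:
        w.start()
    with open(path, "rb") as f:
        batch, i = [], 0
        for line in f:
            batch.append(line)
            if len(batch) >= batch_lines:
                queues[i % n].put(b"".join(batch))  # round-robin over workers
                batch, i = [], i + 1
        if batch:
            queues[i % n].put(b"".join(batch))
    for q in queues:
        q.put(None)
    for w in workers:
        w.join()
```

In the first mode every thread does its own reading, so the disk sees N concurrent readers; in the second a single reader does all the reading and the N workers only send.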

Now, it could be that using pgloader with a parallel setup performs
better than plain COPY on the server. This remains to be tested; the
use case at hand is said to involve data files of hundreds of GB or
a few TB. I don't have any facilities to test-drive such a setup...

Note that those pgloader parallel options were requested by
PostgreSQL hackers as a testbed for some ideas about a
parallel pg_restore; maybe re-explaining what has been implemented
will reopen this can of worms :)

Regards,
- --
dim

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkjINB0ACgkQlBXRlnbh1bmhkgCgu4TduBB0bnscuEsy0CCftpSp
O5IAoMsrPoXAB+SJEr9s5pMCYBgH/CNi
=1c5H
-----END PGP SIGNATURE-----
