Re: Breaking up a PostgreSQL COPY command into chunks?

From: Victor Hooi
Subject: Re: Breaking up a PostgreSQL COPY command into chunks?
Date:
Msg-id: CAMnnoULRY8gdDboH=Z_cdsxVyRC5C7DuCMdb0nFjm8qzaEHOMw@mail.gmail.com
In response to: Re: Breaking up a PostgreSQL COPY command into chunks?  (Francisco Olarte <folarte@peoplecall.com>)
Responses: Re: Breaking up a PostgreSQL COPY command into chunks?  (Francisco Olarte <folarte@peoplecall.com>)
List: pgsql-general
Hi,

@Francisco - Yeah, the file is around 600 Mb currently, uncompressed.

You're right, our internet connection is going to be the limiting factor.

Essentially, the PostgreSQL server is in a datacentre, and the server we're dumping to is in the office.

Running a script on the PostgreSQL server in the datacentre is going to be tricky (not so much technically, just from a procedures/security point of view).

Dumping to a spare table sounds like an interesting idea - so we'd just create the table, COPY the results into it, use LIMIT/OFFSET to paginate through that, and then drop the table afterwards?
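
Just so I'm sure I follow, here's roughly how I picture that working (untested sketch only, assuming psycopg2; big_table, snapshot_rows, the id column and the connection details are all placeholders):

import psycopg2

PAGE_SIZE = 100000

conn = psycopg2.connect("dbname=mydb host=db.example.com")

# Materialise the snapshot once, in a single short transaction.
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE snapshot_rows AS SELECT * FROM big_table")
    cur.execute("SELECT count(*) FROM snapshot_rows")
    total = cur.fetchone()[0]

# Page through the spare table with COPY, one small transaction per page.
with open("dump.csv", "wb") as out:
    for offset in range(0, total, PAGE_SIZE):
        with conn, conn.cursor() as cur:
            cur.copy_expert(
                "COPY (SELECT * FROM snapshot_rows ORDER BY id "
                "LIMIT %d OFFSET %d) TO STDOUT WITH CSV" % (PAGE_SIZE, offset),
                out,
            )

# Drop the spare table once the dump is complete.
with conn, conn.cursor() as cur:
    cur.execute("DROP TABLE snapshot_rows")
conn.close()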

Currently, I'm doing a quick hack where we download an ordered list of the ids (auto-incrementing integer) into Python, chunk it up into groups of ids, then use a WHERE IN clause to download each chunk via COPY.
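
In other words, something along these lines (again an untested sketch of the hack, assuming psycopg2; big_table, the integer id column and the connection details are placeholders):

import psycopg2

CHUNK_SIZE = 50000

conn = psycopg2.connect("dbname=mydb host=db.example.com")

# Grab the full ordered list of ids up front.
with conn, conn.cursor() as cur:
    cur.execute("SELECT id FROM big_table ORDER BY id")
    ids = [row[0] for row in cur.fetchall()]

# COPY each chunk of ids out separately, appending to one CSV file.
with open("dump.csv", "wb") as out:
    for start in range(0, len(ids), CHUNK_SIZE):
        chunk = ids[start:start + CHUNK_SIZE]
        with conn, conn.cursor() as cur:
            cur.copy_expert(
                "COPY (SELECT * FROM big_table WHERE id IN (%s) ORDER BY id) "
                "TO STDOUT WITH CSV" % ",".join(str(i) for i in chunk),
                out,
            )
conn.close()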

Would dumping to a spare table and paginating be a better approach? Reasons? (Not challenging it, I just want to understand everything.)

Cheers,
Victor


On Fri, Nov 8, 2013 at 6:36 PM, Francisco Olarte <folarte@peoplecall.com> wrote:
On Fri, Nov 8, 2013 at 5:09 AM, Victor Hooi <victorhooi@yahoo.com> wrote:
> They think that it might be limited by the network, and how fast the
> PostgreSQL server can push the data across the internet. (The Postgres
> server and the box running the query are connected over the internet).

You previously said you had 600Mb. Over the internet. Is it a very
fat pipe? Because otherwise the limiting factor is probably not the
speed at which postgres can push the results, but the throughput of
your link.

If, as you stated, you need a single transaction to get a 600Mb
snapshot, I would recommend dumping it to disk, compressing on the fly
(you should easily get a four- or five-fold reduction on a CSV file
using any decent compressor), and then sending the file. If you do not
have disk space for the dump but can run programs near the server, you
can still compress the stream on the fly. If you have got neither of
those but do have space for a spare table, use a SELECT INTO, paginate
through that output, and drop the table afterwards. Or just look at
the configs and set longer query timeouts; if your app NEEDS two-hour
queries, they can be enabled. But in any case, doing a long
transaction over the internet does not seem like a good idea to me.
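
As a rough illustration of the first option only (an untested sketch,
using psycopg2 and the standard gzip module; table and connection names
are placeholders, and it assumes you can run it on or near the server):

import gzip
import psycopg2

conn = psycopg2.connect("dbname=mydb")

# One COPY in one transaction, written straight through gzip, so only
# the compressed file (typically 4-5x smaller for CSV) has to travel
# over the slow link afterwards.
with conn, conn.cursor() as cur, gzip.open("dump.csv.gz", "wb") as out:
    cur.copy_expert("COPY big_table TO STDOUT WITH CSV", out)

conn.close()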

Francisco Olarte
