Thread: Multiple COPY on the same table


Multiple COPY on the same table

From: Ravi Krishna
Date:
Can I split a large file into multiple files and then run COPY using each file? The table does not contain any
serial or sequence column that would need serialization. Let us say I split a large file into 4 files. Will the
performance boost be close to 4x?

PS: Please ignore my previous post, which was sent without a subject by mistake.

Re: Multiple COPY on the same table

From: Nicolas Paris
Date:
> Can I split a large file into multiple files and then run copy using
> each file.  

AFAIK, the COPY command locks the table[1], though there is no mention of this
in the documentation[2].

>  Will the performance boost by close to 4x??

You might be interested in the PgBulkInsert tool[3], which allows parallel
COPY with some success according to its benchmarks. However, that tool does
not handle multiline CSV. Because of that limitation I have been using
the standard COPY command with binary format, with some success.


[1] https://grokbase.com/t/postgresql/pgsql-general/01597pv3qs/copy-locking
[2] https://www.postgresql.org/docs/current/static/sql-copy.html
[3] https://github.com/bytefish/PgBulkInsert
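The split-and-load approach under discussion can be sketched as below. This is a minimal sketch, not anything from the thread: the DSN, the table name, and the use of psycopg2's `copy_expert` with one thread per chunk are all assumptions.

```python
# Sketch: split a large CSV into N chunks by whole lines, then load each
# chunk with its own COPY on its own connection. The splitting part is
# plain stdlib; load_chunk assumes psycopg2 and a reachable server.
import os
import threading

def split_csv(path, n, out_dir):
    """Split `path` into up to `n` roughly equal files by whole lines.
    Returns the list of chunk paths."""
    with open(path) as f:
        lines = f.readlines()
    size = -(-len(lines) // n)  # ceiling division
    chunks = []
    for i in range(n):
        part = lines[i * size:(i + 1) * size]
        if not part:
            break
        out = os.path.join(out_dir, f"chunk_{i}.csv")
        with open(out, "w") as g:
            g.writelines(part)
        chunks.append(out)
    return chunks

def load_chunk(dsn, table, chunk_path):
    # Assumption: psycopg2 is installed and `dsn` points at a real server.
    import psycopg2
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur, open(chunk_path) as f:
            cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv)", f)
    finally:
        conn.close()

def parallel_load(dsn, table, chunks):
    # One connection per chunk; COPY is I/O-bound, so threads suffice.
    threads = [threading.Thread(target=load_chunk, args=(dsn, table, c))
               for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

As discussed later in the thread, each COPY takes only RowExclusiveLock on the target table, so the chunk loads do not block one another.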


Re: [External] Multiple COPY on the same table

From: Vijaykumar Jain
Date:
Hey Ravi,

What is the goal you are trying to achieve here?
To make pg_dump/restore faster?
To make replication faster?
To make backups faster?

Also, no matter how small you split the files, if the network is your bottleneck then I am not sure you can attain n times the benefit by simply sending the files in parallel, though you may see some benefit.
But then for parallel processing you also need to ensure your server has the relevant resources, or else it will just be a lot of context switching, I guess?
pg_dump has an option to dump in parallel.
pg_basebackup is single-threaded, I have read, but pgBackRest allows better parallel processing in backups.
There is also logical replication, where you can selectively replicate your tables to avoid bandwidth issues.
I might have said a lot and nothing may be relevant, but you need to let us know the goal you want to achieve :)
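For the pg_dump route mentioned above, a parallel dump requires the directory output format (`-Fd`) together with `-j`. A sketch of the invocations (the database name "mydb" and path "dump_dir" are placeholders):

```python
# Sketch: build parallel pg_dump / pg_restore command lines.
# -Fd (directory format) is required for pg_dump's -j option.
import subprocess

dump_cmd = ["pg_dump", "-Fd", "-j", "4", "-f", "dump_dir", "mydb"]
restore_cmd = ["pg_restore", "-j", "4", "-d", "mydb", "dump_dir"]

# Running these needs a live server, so they are only printed here:
for cmd in (dump_cmd, restore_cmd):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)
```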

Regards,
Vijay
From: Ravi Krishna <sravikrishna@aol.com>
Sent: Monday, August 20, 2018 8:24:35 PM
To: pgsql-general@lists.postgresql.org
Subject: [External] Multiple COPY on the same table
 
Can I split a large file into multiple files and then run COPY using each file? The table does not contain any
serial or sequence column that would need serialization. Let us say I split a large file into 4 files. Will the
performance boost be close to 4x?

PS: Please ignore my previous post, which was sent without a subject by mistake.

Re: [External] Multiple COPY on the same table

From: Ron
Date:
Maybe he just has a large file that needs to be loaded into a table...

On 08/20/2018 11:47 AM, Vijaykumar Jain wrote:
What is the goal you are trying to achieve here?
To make pg_dump/restore faster?
To make replication faster?
To make backups faster?
[...]

--
Angular momentum makes the world go 'round.

Re: Multiple COPY on the same table

From: Adrian Klaver
Date:
On 08/20/2018 08:56 AM, Nicolas Paris wrote:
>> Can I split a large file into multiple files and then run copy using
>> each file.
> 
> AFAIK, the COPY command locks the table[1], though there is no mention of this
> in the documentation[2].

[1] is from Postgres 7.1 (17 years ago). I suspect the conditions have
changed at least a little :).

> [1] https://grokbase.com/t/postgresql/pgsql-general/01597pv3qs/copy-locking
> [2] https://www.postgresql.org/docs/current/static/sql-copy.html


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Multiple COPY on the same table

From: Christopher Browne
Date:
On Mon, 20 Aug 2018 at 16:23, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 08/20/2018 08:56 AM, Nicolas Paris wrote:
> >> Can I split a large file into multiple files and then run copy using
> >> each file.
> >
> > AFAIK, the COPY command locks the table[1], though there is no mention of this
> > in the documentation[2].
>
> [1] is from Postgres 7.1 (17 years ago). I suspect the conditions have
> changed at least a little :).

oxrsdb-generated@localhost->  create temp table foo ();
CREATE TABLE
oxrsdb-generated@localhost->  begin;
BEGIN
oxrsdb-generated@localhost-> * \copy foo from '/dev/null';
COPY 0
oxrsdb-generated@localhost-> * select oid, relname from pg_class where relname = 'foo';
  oid   | relname
--------+---------
 350686 | foo
(1 row)

oxrsdb-generated@localhost-> * select * from pg_locks ;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |       mode       | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------+----------
 relation   |   345742 |    11713 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     3455 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     2663 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     2662 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     2685 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     2684 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     2615 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |     1259 |      |       |            |               |         |       |          | 2/200573           | 16754 | AccessShareLock  | t       | t
 relation   |   345742 |   350686 |      |       |            |               |         |       |          | 2/200573           | 16754 | RowExclusiveLock | t       | t
 virtualxid |          |          |      |       | 2/200573   |               |         |       |          | 2/200573           | 16754 | ExclusiveLock    | t       | t
(10 rows)

Table 'foo' has a RowExclusiveLock taken out as a consequence of
running COPY against it.

But that does not prevent other connections from concurrently writing
to the table.

Not all locks block other locks...
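The observation above matches the lock conflict table in the PostgreSQL "Explicit Locking" documentation: ROW EXCLUSIVE conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE, but not with itself. A tiny sketch encoding just that row of the matrix:

```python
# One row of PostgreSQL's table-lock conflict matrix: the modes that
# conflict with ROW EXCLUSIVE (the lock COPY takes on the target table).
ROW_EXCLUSIVE_CONFLICTS = {
    "ShareLock",
    "ShareRowExclusiveLock",
    "ExclusiveLock",
    "AccessExclusiveLock",
}

def blocks_copy(other_mode):
    """Would a held lock of `other_mode` block COPY's RowExclusiveLock?"""
    return other_mode in ROW_EXCLUSIVE_CONFLICTS

# Two concurrent COPYs each hold RowExclusiveLock, which is not in the
# conflict set -- so they do not block each other.
print(blocks_copy("RowExclusiveLock"))     # False
print(blocks_copy("AccessExclusiveLock"))  # True
```

So a concurrent VACUUM FULL or an explicit LOCK TABLE would block the load, but another COPY, plain INSERTs, and readers all proceed.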
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"