Обсуждение: Multiple COPY on the same table
Can I split a large file into multiple files and then run copy using each file. The table does not contain any
serial or sequence column which may need serialization. Let us say I split a large file to 4 files. Will the
performance boost by close to 4x??
ps: Pls ignore my previous post which was without a subject (due to mistake)
> Can I split a large file into multiple files and then run copy using > each file. AFAIK, copy command locks the table[1] while there is no mention of this in the documentation[2]. > Will the performance boost by close to 4x?? You might be interested in the pbBulkInsert tool[3] that allows parallel copy with some succes accordingly to benchmarks. However, that tool does not handle multiline csv. Because of that limitation I have been using the standard copy command with binary format with some succes. [1] https://grokbase.com/t/postgresql/pgsql-general/01597pv3qs/copy-locking [2] https://www.postgresql.org/docs/current/static/sql-copy.html [3] https://github.com/bytefish/PgBulkInsert
Hey Ravi,
What is the goal you are trying to achieve here.
To make pgdump/restore faster?
To make replication faster?
To make backup faster ?
Also no matter how small you split the files into, if network is your bottleneck then I am not sure you can attain n times the benefit my simply sending the files in parallel but yeah maybe some benefit.
But then for parallel processing you also need to ensure your server is having relevant resources or else it will just be a lot of context switching I guess ?
Pg dump has an option to dump in parallel
pgbasebackup is single threaded I read but pgbackrest can allow better parallel processing in backups.
There is also logical replication where you can selectively replicate your tables to avoid bandwidth issues.
I might have said a lot and nothing may be relevant, but you need to let us know the goal you want to achieve :)
Regards,
Vijay
What is the goal you are trying to achieve here.
To make pgdump/restore faster?
To make replication faster?
To make backup faster ?
Also no matter how small you split the files into, if network is your bottleneck then I am not sure you can attain n times the benefit my simply sending the files in parallel but yeah maybe some benefit.
But then for parallel processing you also need to ensure your server is having relevant resources or else it will just be a lot of context switching I guess ?
Pg dump has an option to dump in parallel
pgbasebackup is single threaded I read but pgbackrest can allow better parallel processing in backups.
There is also logical replication where you can selectively replicate your tables to avoid bandwidth issues.
I might have said a lot and nothing may be relevant, but you need to let us know the goal you want to achieve :)
Regards,
Vijay
From: Ravi Krishna <sravikrishna@aol.com>
Sent: Monday, August 20, 2018 8:24:35 PM
To: pgsql-general@lists.postgresql.org
Subject: [External] Multiple COPY on the same table
Sent: Monday, August 20, 2018 8:24:35 PM
To: pgsql-general@lists.postgresql.org
Subject: [External] Multiple COPY on the same table
Can I split a large file into multiple files and then run copy using each file. The table does not contain any
serial or sequence column which may need serialization. Let us say I split a large file to 4 files. Will the
performance boost by close to 4x??
ps: Pls ignore my previous post which was without a subject (due to mistake)
Maybe he just has a large file that needs to be loaded into a table...
On 08/20/2018 11:47 AM, Vijaykumar Jain wrote:
Hey Ravi,
What is the goal you are trying to achieve here.
To make pgdump/restore faster?
To make replication faster?
To make backup faster ?
Also no matter how small you split the files into, if network is your bottleneck then I am not sure you can attain n times the benefit my simply sending the files in parallel but yeah maybe some benefit.
But then for parallel processing you also need to ensure your server is having relevant resources or else it will just be a lot of context switching I guess ?
Pg dump has an option to dump in parallel
pgbasebackup is single threaded I read but pgbackrest can allow better parallel processing in backups.
There is also logical replication where you can selectively replicate your tables to avoid bandwidth issues.
I might have said a lot and nothing may be relevant, but you need to let us know the goal you want to achieve :)
Regards,
VijayFrom: Ravi Krishna <sravikrishna@aol.com>
Sent: Monday, August 20, 2018 8:24:35 PM
To: pgsql-general@lists.postgresql.org
Subject: [External] Multiple COPY on the same tableCan I split a large file into multiple files and then run copy using each file. The table does not contain anyserial or sequence column which may need serialization. Let us say I split a large file to 4 files. Will theperformance boost by close to 4x??ps: Pls ignore my previous post which was without a subject (due to mistake)
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On 08/20/2018 08:56 AM, Nicolas Paris wrote: >> Can I split a large file into multiple files and then run copy using >> each file. > > AFAIK, copy command locks the table[1] while there is no mention of this > in the documentation[2]. [1] Is from Postgres 7.1(17 years ago). I suspect the conditions have changed at least a little:). > >> Will the performance boost by close to 4x?? > > You might be interested in the pbBulkInsert tool[3] that allows parallel > copy with some succes accordingly to benchmarks. However, that tool does > not handle multiline csv. Because of that limitation I have been using > the standard copy command with binary format with some succes. > > > [1] https://grokbase.com/t/postgresql/pgsql-general/01597pv3qs/copy-locking > [2] https://www.postgresql.org/docs/current/static/sql-copy.html > [3] https://github.com/bytefish/PgBulkInsert > > -- Adrian Klaver adrian.klaver@aklaver.com
On Mon, 20 Aug 2018 at 16:23, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > On 08/20/2018 08:56 AM, Nicolas Paris wrote: > >> Can I split a large file into multiple files and then run copy using > >> each file. > > > > AFAIK, copy command locks the table[1] while there is no mention of this > > in the documentation[2]. > > [1] Is from Postgres 7.1(17 years ago). I suspect the conditions have > changed at least a little:). oxrsdb-generated@localhost-> create temp table foo (); CREATE TABLE oxrsdb-generated@localhost-> begin; BEGIN oxrsdb-generated@localhost-> * \copy foo from '/dev/null'; COPY 0 oxrsdb-generated@localhost-> * select oid, relname from pg_class where relname = 'foo'; oid | relname --------+--------- 350686 | foo (1 row) oxrsdb-generated@localhost-> * select * from pg_locks ; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------+---------- relation | 345742 | 11713 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 3455 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 2663 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 2662 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 2685 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 2684 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 2615 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 1259 | | | | | | | | 2/200573 | 16754 | AccessShareLock | t | t relation | 345742 | 350686 | | | | | | | | 2/200573 | 16754 | RowExclusiveLock | t | t virtualxid | | | | | 2/200573 | | | | | 2/200573 | 16754 | ExclusiveLock | t | t (10 rows) Table 'foo' has a RowExclusiveLock lock taken out as a consequence of running COPY against it. But that does not prevent other connections from concurrently writing to the table. Not all locks block other locks... -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"