Обсуждение: pg_dump performance issues

Поиск
Список
Период
Сортировка

pg_dump performance issues

От
Dominique Devienne
Дата:
In the past, I've read [this post][1] from Marc Millas that reports `pg_dump ... | psql` at throughput around 500MB/s (5Gb/s) on a 10Gb/s network.

Today, I've tried pg_dump on a single schema of 25 tables, 88K rows, 70MB total (as reported by pg total relation sizes). 1 table of 225 rows contains blobs totaling 48MB of the 70MB by itself. The largest table has 40K rows.

And it takes around 10s (+/- 1s), generating a ~ 100MB dump file (or directory).
I tried all 4 formats, and even with -jN with N in 2,3,4 for -Fd. Little to no difference, both in duration and output sizes.
I tried against a v12 and v14 Linux servers (real hardware, server-class machines, in the $10K range each, 2 years old and 7 years old, running RedHat 7)
I'm on a 1Gb/s LAN.
The client side is a Windows 11 PRO desktop running 16.1 release builds of pg_dump (and my own client).

These results mean a throughput of only around 10MB/s, when considering the output dump, and less when considering the relation sizes (70MB)

Then I tried my own "dump", using COPY BINARY. And got a 58MB dump file in around 5-6s, with my libpq wrapper reporting metrics of the COPY itself (i.e. just the libpq COPY APIs used, independent of how I write the results to disk) in the 10-12MB/s throughput range. So basically not quite twice faster than pg_dump, with a resulting dump not quite half as small.

The reason I'm writing this is because I'm surprised how slow that is.
There's a chasm between 500MB/s and 10MB/s.
What could explain such slow performance?
Of pg_dump? And my own slightly faster dump?

Also, how come `-Fd -j4` is not any faster than the serial version,
especially when there's a table with few rows and large blobs that is 68% of the total?

What could be wrong to explain such poor throughput?
Especially given that even pg_dump itself is slow, thus it doesn't appear to be my code.

I'm concerned with these results, and would appreciate some expert advice.

Thanks, --DD

PS: The PostgreSQL installations are mostly "stock", with little to no tuning...

Re: pg_dump performance issues

От
Francisco Olarte
Дата:
Dominique:

On Thu, 22 Feb 2024 at 10:50, Dominique Devienne <ddevienne@gmail.com> wrote:
> In the past, I've read [this post][1] from Marc Millas that reports `pg_dump ... | psql` at throughput around 500MB/s
(5Gb/s)on a 10Gb/s network.
 

That is half the theoretical bandwidth,more or less.

> I tried against a v12 and v14 Linux servers (real hardware, server-class machines, in the $10K range each, 2 years
oldand 7 years old, running RedHat 7)
 
> I'm on a 1Gb/s LAN.
The you would max out at about 50 if getting the same behaviour as
your reference.

> The client side is a Windows 11 PRO desktop running 16.1 release builds of pg_dump (and my own client).

Before doing any more thing you should test the network throughput
between your client and servers, on a naked connection. There are
plenty of programs for these, even a ftp/scp/network file copy would
be a useful reference. You have not provided any and.....

> .... in the 10-12MB/s throughput range.....

This has the faint smell of a saturated 100Mbps link in the middle (12*8=96Mbps)

Make some tests of the network and measure ( and post them ) before,
so people can guesstimate something.

Francisco Olarte.



Re: pg_dump performance issues

От
Dominique Devienne
Дата:
On Thu, Feb 22, 2024 at 11:41 AM Francisco Olarte <folarte@peoplecall.com> wrote:
Dominique:
> .... in the 10-12MB/s throughput range.....

This has the faint smell of a saturated 100Mbps link in the middle (12*8=96Mbps)

Very good call Francisco! Thanks a lot.

Below is connecting to the v14 PostgreSQL host, from my Windows desktop.
So indeed, I'm not on 1Gb/s as I expected... I'll report back when this is fixed.

Thanks again, --DD

C:\Apps\iperf3>.\iperf3.exe -c XX.YY.ZZ.13
Connecting to host XX.YY.ZZ.13, port 5201
[  4] local XX.YY.ZZ.49 port 62617 connected to XX.YY.ZZ.13 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.01   sec  11.5 MBytes  95.9 Mbits/sec
[  4]   1.01-2.00   sec  11.4 MBytes  95.7 Mbits/sec
[  4]   2.00-3.01   sec  11.4 MBytes  94.9 Mbits/sec
[  4]   3.01-4.00   sec  11.2 MBytes  94.9 Mbits/sec
[  4]   4.00-5.01   sec  11.4 MBytes  94.9 Mbits/sec
[  4]   5.01-6.00   sec  11.2 MBytes  94.9 Mbits/sec
[  4]   6.00-7.01   sec  11.2 MBytes  94.0 Mbits/sec
[  4]   7.01-8.00   sec  11.4 MBytes  95.8 Mbits/sec
[  4]   8.00-9.00   sec  11.2 MBytes  94.7 Mbits/sec
[  4]   9.00-10.01  sec  11.4 MBytes  94.9 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.01  sec   113 MBytes  95.0 Mbits/sec                  sender
[  4]   0.00-10.01  sec   113 MBytes  94.8 Mbits/sec                  receiver

iperf Done. 

Re: pg_dump performance issues

От
Dominique Devienne
Дата:
On Thu, Feb 22, 2024 at 12:20 PM Dominique Devienne <ddevienne@gmail.com> wrote:
On Thu, Feb 22, 2024 at 11:41 AM Francisco Olarte <folarte@peoplecall.com> wrote:
Dominique:
> .... in the 10-12MB/s throughput range.....

This has the faint smell of a saturated 100Mbps link in the middle (12*8=96Mbps)

So indeed, I'm not on 1Gb/s as I expected... I'll report back when this is fixed.

So IT fixing the networking on my Windows machine is slower than I expected...

Thus I switched to a Linux client, which I verified using iperf3 really has 1Gbps.
And there I can read at 100MB/s, which is almost saturating the network link.
The Linux dumps are 45MB this time, instead of 100MB on Windows, as if the former is compressed, and the latter isn't.
Perhaps the (implicit?) compression of the dump is why it's 8x slower than mine? (I do fewer catalog queries I suspect)
And I see no speedup from parallelizing, unfortunately, which I find strange, as mentioned earlier. Any idea why?
Note that this is against the v12 server, and the client is still 16.1.

Applying a two-threaded producer-consumer scheme to my dump, to parallelize reads and writes, I might approach 100MB/s overall.
Before getting into parallelizing reads, which complicate matters, and my dump container's API is serial anyway (it's SQLite...)

So looks like I'm networking bound on 1Gbps, while Marc perhaps reached PostgreSQL's (or the disks'?) limits before saturating his 10Gbps networking.

Thanks, --DD

[ddevienne]$ my_dump ...
COPY:         25x (   88,073 rows,     56,768,104 bytes) in   0.537s (100.7 MB/s)
DUMP'd 88,023 rows (56,761,583 bytes) from 25 tables in 1.062s (51.0 MB/s)

[ddevienne]$ ll ....db
-rw-r--r-- 1 ddevienne users 58,351,616 Feb 22 15:10 ....db

[ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fc -n '"..."' -f ...-Fc.dump "host=acme dbname=copyperf"
7.561u 0.216s 0:08.46 91.8%     0+0k 0+91056io 0pf+0w
[ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -n '"..."' -f ...-Fd.dump "host=acme dbname=copyperf"
7.351u 0.219s 0:08.33 90.7%     0+0k 0+90808io 0pf+0w
[ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j2 -n '"..."' -f ...-Fdj2.dump "host=acme dbname=copyperf"
7.667u 0.228s 0:08.56 92.0%     0+0k 0+90808io 0pf+0w
[ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j3 -n '"..."' -f ...-Fdj3.dump "host=acme dbname=copyperf"
7.964u 0.247s 0:08.71 94.1%     0+0k 0+90808io 0pf+0w
[ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j4 -n '"..."' -f ...-Fdj4.dump "host=acme dbname=copyperf"
7.491u 0.243s 0:08.21 94.1%     0+0k 0+90808io 0pf+0w
[ddevienne]$ du -sm ...-F*
45      ...-Fc.dump
45      ...-Fd.dump
45      ...-Fdj2.dump
45      ...-Fdj3.dump
45      ...-Fdj4.dump
 

Re: pg_dump performance issues

От
Adrian Klaver
Дата:
On 2/22/24 06:40, Dominique Devienne wrote:
> On Thu, Feb 22, 2024 at 12:20 PM Dominique Devienne <ddevienne@gmail.com 
> <mailto:ddevienne@gmail.com>> wrote:
> 
>     On Thu, Feb 22, 2024 at 11:41 AM Francisco Olarte
>     <folarte@peoplecall.com <mailto:folarte@peoplecall.com>> wrote:
> 
>         Dominique:
>          > .... in the 10-12MB/s throughput range.....
> 
>         This has the faint smell of a saturated 100Mbps link in the
>         middle (12*8=96Mbps)
> 
> 
>     So indeed, I'm not on 1Gb/s as I expected... I'll report back when
>     this is fixed.
> 
> 
> So IT fixing the networking on my Windows machine is slower than I 
> expected...
> 
> Thus I switched to a Linux client, which I verified using iperf3 really 
> has 1Gbps.
> And there I can read at 100MB/s, which is almost saturating the network 
> link.
> The Linux dumps are 45MB this time, instead of 100MB on Windows, as if 
> the former is compressed, and the latter isn't.

Did you use the same pg_dump command on the Windows client?

Where was the Windows client Postgres software installed from?

Where was the Linux client Postgres software installed from?

> Perhaps the (implicit?) compression of the dump is why it's 8x slower 
> than mine? (I do fewer catalog queries I suspect)

 From

https://www.postgresql.org/docs/current/app-pgdump.html

d
directory

  <...>
  This format is compressed by default using gzip and also supports 
parallel dumps.

> And I see no speedup from parallelizing, unfortunately, which I find 
> strange, as mentioned earlier. Any idea why?

 From same link as above

-j njobs
--jobs=njobs

Run the dump in parallel by dumping njobs tables simultaneously. This 
option may reduce the time needed to perform the dump but it also 
increases the load on the database server. You can only use this option 
with the directory output format because this is the only output format 
where multiple processes can write their data at the same time.

<...>


I'm guessing on this small a database the overhead of -j is greater then 
the benefit derived.


> Note that this is against the v12 server, and the client is still 16.1.
> 
> Applying a two-threaded producer-consumer scheme to my dump, to 
> parallelize reads and writes, I might approach 100MB/s overall.
> Before getting into parallelizing reads, which complicate matters, and 
> my dump container's API is serial anyway (it's SQLite...)
> 
> So looks like I'm networking bound on 1Gbps, while Marc perhaps reached 
> PostgreSQL's (or the disks'?) limits before saturating his 10Gbps 
> networking.
> 
> Thanks, --DD
> 
> [ddevienne]$ my_dump ...
> COPY:         25x (   88,073 rows,     56,768,104 bytes) in   0.537s 
> (100.7 MB/s)
> DUMP'd 88,023 rows (56,761,583 bytes) from 25 tables in 1.062s (51.0 MB/s)
> 
> [ddevienne]$ ll ....db
> -rw-r--r-- 1 ddevienne users 58,351,616 Feb 22 15:10 ....db
> 
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fc -n 
> '"..."' -f ...-Fc.dump "host=acme dbname=copyperf"
> 7.561u 0.216s 0:08.46 91.8%     0+0k 0+91056io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -n 
> '"..."' -f ...-Fd.dump "host=acme dbname=copyperf"
> 7.351u 0.219s 0:08.33 90.7%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j2 
> -n '"..."' -f ...-Fdj2.dump "host=acme dbname=copyperf"
> 7.667u 0.228s 0:08.56 92.0%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j3 
> -n '"..."' -f ...-Fdj3.dump "host=acme dbname=copyperf"
> 7.964u 0.247s 0:08.71 94.1%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ time $PDGM_POSTGRESQL_DIR/bin/pg_dump --role="..." -Fd -j4 
> -n '"..."' -f ...-Fdj4.dump "host=acme dbname=copyperf"
> 7.491u 0.243s 0:08.21 94.1%     0+0k 0+90808io 0pf+0w
> [ddevienne]$ du -sm ...-F*
> 45      ...-Fc.dump
> 45      ...-Fd.dump
> 45      ...-Fdj2.dump
> 45      ...-Fdj3.dump
> 45      ...-Fdj4.dump

-- 
Adrian Klaver
adrian.klaver@aklaver.com