Re: pg_upgrade failing for 200+ million Large Objects

From: Robins Tharakan
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date:
Msg-id: CAEP4nAxbrDM5b4g-+pPskuLmZWphZiuGGV7qpB7aAZ08ZMZSjw@mail.gmail.com
In reply to: Re: pg_upgrade failing for 200+ million Large Objects (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Thu, 28 Dec 2023 at 01:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robins Tharakan <tharakan@gmail.com> writes:
>> Applying all 4 patches, I also see good performance improvement.
>> With more Large Objects, although pg_dump improved significantly,
>> pg_restore is now comfortably an order of magnitude faster.
>
> Yeah.  The key thing here is that pg_dump can only parallelize
> the data transfer, while (with 0004) pg_restore can parallelize
> large object creation and owner-setting as well as data transfer.
> I don't see any simple way to improve that on the dump side,
> but I'm not sure we need to.  Zillions of empty objects is not
> really the use case to worry about.  I suspect that a more realistic
> case with moderate amounts of data in the blobs would make pg_dump
> look better.


Thanks for elaborating, and yes, the pg_dump times do reflect that
expectation.

The first test involved a fixed number (32k) of
Large Objects (LOs) with varying sizes. I chose that number
intentionally, since the test ran on a 32vCPU instance
and the patch employs 1k batches - so 32k LOs make up
32 batches, matching the vCPU count.
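
In case it helps reproduce the setup, here is a minimal sketch of
the generation loop, along the lines of the command in the test
details below (the $DB variable, file path, and size list are
illustrative, not the exact harness):

    # Create 32k LOs at each tested size, 0xff-filled, via lo_from_bytea().
    # This loops all sizes into $DB; the actual runs presumably loaded
    # one size per fresh database before timing dump/restore.
    DB=testdb
    for SIZE in 1 10 100 1000 10000 100000 1000000; do
        SQL="SELECT lo_from_bytea(0, '\x$(printf 'ff%.0s' $(seq 1 $SIZE))') FROM generate_series(1,32768);"
        echo "$SQL" > /tmp/mk_los.sql
        psql -d "$DB" -f /tmp/mk_los.sql
    done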


We again see that pg_restore is an order of magnitude faster.

 LO Size (bytes)  restore-HEAD (s)  restore-patched (s)  improvement (Nx)
               1        24.182              1.4               17x
              10        24.741              1.5               17x
             100        24.574              1.6               15x
           1,000        25.314              1.7               15x
          10,000        25.644              1.7               15x
         100,000        50.046              4.3               12x
       1,000,000       281.549             30.0                9x


pg_dump also sees improvements. Really small LOs see a decent
~20% improvement, which grows considerably as LOs get bigger
(beyond ~10-100kb).


 LO Size (bytes)  dump-HEAD (s)  dump-patched (s)  improvement (%)
               1      12.9            10.7             18%
              10      12.9            10.4             19%
             100      12.8            10.3             20%
           1,000      13.0            10.3             21%
          10,000      14.2            10.3             27%
         100,000      32.8            11.5             65%
       1,000,000     211.8            23.6             89%

To test pg_restore scaling, 1 million LOs (100kb each)
were created, and pg_restore times were measured at increasing
concurrency (on a 192vCPU instance). We see a major speedup
up to -j64, and the best time was at -j96, after which
performance decreases slowly - see attached image.

 Concurrency (-j)  pg_restore-patched (s)
              384           75.87
              352           75.63
              320           72.11
              288           70.05
              256           70.98
              224           66.98
              192           63.04
              160           61.37
              128           58.82
               96           58.55
               64           60.46
               32           77.29
               16          115.51
                8          203.48
                4          366.33



Test details:
- Command used to generate the SQL - creates 1k LOs of 1kb each
  (see the consolidated sketch below):
  - echo "SELECT lo_from_bytea(0, '\x`printf 'ff%.0s' {1..1000}`') FROM generate_series(1,1000);" > /tmp/tempdel
- Verify the LO size: SELECT pg_column_size(lo_get(oid)) FROM pg_largeobject_metadata LIMIT 1;
- Only GUC changed: max_connections=1000 (for the last test)
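
Putting those together, a self-contained sketch of the 1k-LO setup,
assuming a scratch database (the database name is a placeholder):

    #!/bin/bash
    # Create 1k LOs of 1kb (1000 bytes of 0xff) each,
    # then check the stored size of one of them.
    DB=testdb
    echo "SELECT lo_from_bytea(0, '\x`printf 'ff%.0s' {1..1000}`') FROM generate_series(1,1000);" > /tmp/tempdel
    psql -d "$DB" -f /tmp/tempdel
    psql -d "$DB" -c "SELECT pg_column_size(lo_get(oid)) FROM pg_largeobject_metadata LIMIT 1;"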

-
Robins Tharakan
Amazon Web Services
 
