Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From | Bharath Rupireddy |
---|---|
Subject | Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? |
Date | |
Msg-id | CALj2ACVu6Tqda-8m17yxH2dTLz1XC1fSGvFEXvekaKbOhn7Epw@mail.gmail.com |
In response to | Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Sun, Aug 7, 2022 at 3:19 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> > A second thing is that pg_pwritev_with_retry_and_write_zeros() is
> > designed to work on WAL segments initialization and it uses
> > XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing
> > in its name that tells us so.  This makes me question whether
> > file_utils.c is a good location for this second thing.  Could a new
> > file be a better location?  We have a xlogutils.c in the backend, and
> > a name similar to that in src/common/ would be one possibility.
>
> Yeah, I think it should probably be disconnected from XLOG_BLCKSZ, or
> maybe it's OK to use BLCKSZ with a comment to say that it's a bit
> arbitrary, or maybe it's better to define a new zero buffer of some
> arbitrary size just in this code if that is too strange.  We could
> experiment with different size buffers to see how it performs, bearing
> in mind that every time we double it you halve the number of system
> calls, but also bearing in mind that at some point it's too much for
> the stack.  I can tell you that the way that code works today was not
> really written with performance in mind (unlike, say, the code
> reverted from 9.4 that tried to do this with posix_fallocate()), it
> was just finding an excuse to call pwritev(), to exercise new fallback
> code being committed for use by later AIO stuff (more patches coming
> soon).  The retry support was added because it seemed plausible that
> some system out there would start to do short writes as we cranked up
> the sizes for some implementation reason other than ENOSPC, so we
> should make a reusable retry routine.

Yes, doubling the zero buffer size to, say, 2 * XLOG_BLCKSZ or
2 * BLCKSZ halves the number of system calls (right now,
pg_pwritev_with_retry() gets called 64 times per 16MB WAL file; it
writes in batches of 32 blocks per call).
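To make the arithmetic above concrete, here is a minimal sketch of the call-count calculation. It is not PostgreSQL code: the function name zero_fill_call_count() and its parameters are hypothetical, and it assumes an 8192-byte block, 32 buffers per vectored write, and a 16MB segment, which are the figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of vectored write calls needed to zero-fill
 * total_size bytes when each call carries up to iov_per_call buffers of
 * buf_size bytes each.
 */
static size_t
zero_fill_call_count(size_t total_size, size_t buf_size, size_t iov_per_call)
{
    size_t      bytes_per_call;

    assert(buf_size > 0 && iov_per_call > 0);
    bytes_per_call = buf_size * iov_per_call;

    /* Round up so that a trailing partial batch still counts as one call. */
    return (total_size + bytes_per_call - 1) / bytes_per_call;
}
```

With a 16MB file, an 8192-byte buffer and 32 buffers per call this gives 64 calls; doubling the buffer to 16384 bytes gives 32.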
Is there a ready-to-use tool or script or specific settings for pgbench
(pgbench command line options or GUC settings) that I can play with to
measure the performance?

> I think this should also handle the remainder after processing whole
> blocks, just for completeness.  If I call the code as presented with size
> 8193, I think this code will only write 8192 bytes.

Hm, I will fix it.

> I think if this ever needs to work on O_DIRECT files there would be an
> alignment constraint on the buffer and size, but I don't think we have
> to worry about that for now.

We can add a comment about the above limitation, if required.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
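For illustration of the remainder case raised above (size 8193 writing only 8192 bytes), here is a rough sketch of a zero-fill loop that handles a trailing partial block and short writes. This is not the patch under discussion: write_zeros(), zero_fill_new_file(), ZBUF_SIZE and ZBUF_IOVCNT are hypothetical names, the buffer size is arbitrary, treating a zero-byte write as ENOSPC is an assumption, and O_DIRECT alignment is not handled. It assumes a platform that provides pwritev(), such as Linux or the BSDs.

```c
#define _DEFAULT_SOURCE         /* exposes pwritev() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define ZBUF_SIZE   8192        /* hypothetical zero-buffer size */
#define ZBUF_IOVCNT 32          /* hypothetical buffers per call */

/* Static storage, so the buffer is zero-initialized and off the stack. */
static const char zbuf[ZBUF_SIZE];

/*
 * Write "size" zero bytes to fd starting at "offset".
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
write_zeros(int fd, size_t size, off_t offset)
{
    struct iovec iov[ZBUF_IOVCNT];
    size_t      remaining = size;

    assert(offset >= 0);

    while (remaining > 0)
    {
        size_t      batch = 0;
        int         iovcnt = 0;
        ssize_t     written;

        /* Every entry points at the same zero buffer; the last may be short. */
        while (iovcnt < ZBUF_IOVCNT && batch < remaining)
        {
            size_t      len = remaining - batch;

            if (len > ZBUF_SIZE)
                len = ZBUF_SIZE;
            iov[iovcnt].iov_base = (void *) zbuf;
            iov[iovcnt].iov_len = len;
            batch += len;
            iovcnt++;
        }

        written = pwritev(fd, iov, iovcnt, offset);
        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;
        }
        if (written == 0)
        {
            errno = ENOSPC;     /* assumption: no progress means no space */
            return -1;
        }

        /* A short write just shrinks what is left; the loop rebuilds iov. */
        remaining -= (size_t) written;
        offset += written;
    }
    return 0;
}

/* Create (or truncate) "path" and fill it with "size" zero bytes. */
static int
zero_fill_new_file(const char *path, size_t size)
{
    int         fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int         rc;

    if (fd < 0)
        return -1;
    rc = write_zeros(fd, size, 0);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Because every iovec entry points at the same zero buffer, a short write needs no iovec adjustment: the loop rebuilds the array from the remaining byte count, and the final entry is shortened to cover a partial block.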