Re: effective_io_concurrency and NVMe devices
From | Tomas Vondra |
---|---|
Subject | Re: effective_io_concurrency and NVMe devices |
Date | |
Msg-id | f0c38d0f-191a-8183-0a44-c2e9d1e16d67@enterprisedb.com |
In reply to | RE: effective_io_concurrency and NVMe devices (Jakub Wartak <Jakub.Wartak@tomtom.com>) |
Responses |
RE: effective_io_concurrency and NVMe devices
(Jakub Wartak <Jakub.Wartak@tomtom.com>)
|
List | pgsql-hackers |
On 6/7/22 15:29, Jakub Wartak wrote:
> Hi Tomas,
>
>>> I have a machine here with 1 x PCIe 3.0 NVMe SSD and also 1 x PCIe 4.0
>>> NVMe SSD. I ran a few tests to see how different values of
>>> effective_io_concurrency would affect performance. I tried to come up
>>> with a query that did little enough CPU processing to ensure that I/O
>>> was the clear bottleneck.
>>>
>>> The test was with a 128GB table on a machine with 64GB of RAM. I
>>> padded the tuples out so there were 4 per page so that the aggregation
>>> didn't have much work to do.
>>>
>>> The query I ran was: explain (analyze, buffers, timing off) select
>>> count(p) from r where a = 1;
>
>> The other idea I had while looking at batching a while back, is that we should
>> batch the prefetches. The current logic interleaves prefetches with other work -
>> prefetch one page, process one page, ... But once reading a page gets
>> sufficiently fast, this means the queues never get deep enough for
>> optimizations. So maybe we should think about batching the prefetches, in some
>> way. Unfortunately posix_fadvise does not allow batching of requests, but we
>> can at least stop interleaving the requests.
>
> .. for now it doesn't, but IORING_OP_FADVISE is on the long-term horizon.
>

Interesting! Will take time to get into real systems, though.

>> The attached patch is a trivial version that waits until we're at least
>> 32 pages behind the target, and then prefetches all of them. Maybe give it a try?
>> (This pretty much disables prefetching for e_i_c below 32, but for an
>> experimental patch that's enough.)
>
> I've tried it at e_i_c=10 initially on David's setup.sql, and most
> defaults s_b=128MB, dbsize=8kb but with forced disabled parallel query
> (for easier inspection with strace just to be sure // so please don't
> compare times).
>
> run:
> a) master (e_i_c=10) 181760ms, 185680ms, 185384ms @ ~340MB/s and 44k IOPS (~122k IOPS practical max here for libaio)
> b) patched (e_i_c=10) 237774ms, 236326ms, .. as you stated it disabled prefetching, fadvise() not occurring
> c) patched (e_i_c=128) 90430ms, 88354ms, 85446ms, 78475ms, 74983ms, 81432ms (mean=83186ms +/- 5947ms) @ ~570MB/s and 75k IOPS (it even peaked for a second on ~122k)
> d) master (e_i_c=128) 116865ms, 101178ms, 89529ms, 95024ms, 89942ms, 99939ms (mean=98746ms +/- 10118ms) @ ~510MB/s and 65k IOPS (rare peaks to 90..100k IOPS)
>
> ~16% benefit sounds good (help me understand: L1i cache?). Maybe it is
> worth throwing that patch onto more advanced / complete performance
> test farm too? (although it's only for bitmap heap scans)
>
> run a: looked interleaved as you said:
> fadvise64(160, 1064157184, 8192, POSIX_FADV_WILLNEED) = 0
> pread64(160, "@\0\0\0\200\303/_\0\0\4\0(\0\200\0\0 \4 \0\0\0\0 \230\300\17@\220\300\17"..., 8192, 1064009728) = 8192
> fadvise64(160, 1064173568, 8192, POSIX_FADV_WILLNEED) = 0
> pread64(160, "@\0\0\0\0\0040_\0\0\4\0(\0\200\0\0 \4 \0\0\0\0 \230\300\17@\220\300\17"..., 8192, 1064026112) = 8192
> [..]
>
> BTW: interesting note, for run b, the avgrq-sz from extended iostat
> keeps flipping between 16 (*512 = 8kB) and ~256 (*512 = ~128kB!), as if
> the kernel was doing some prefetching heuristics of its own, on and off in
> cycles, while when e_i_c/fadvise() is in action it seems to be
> always 8kB requests. So with fadvise() disabled one IMHO might have
> problems deterministically benchmarking short queries, as kernel voodoo
> might be happening (?)
>

Yes, the kernel certainly does its own read-ahead, which works pretty well
for sequential patterns. What does

    blockdev --getra /dev/...

say?

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
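
For readers following the thread, the difference between the interleaved prefetching on master and the batched variant described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the attached patch: the function names `schedule` and `prefetch_runs` are made up for this illustration, the prefetch distance stands in for e_i_c at its full value (no ramp-up), and no real I/O is issued. It only records the order in which prefetch and read requests would be sent.

```python
def schedule(n_pages, distance, batch=1):
    """Return the order of ("prefetch", page) / ("read", page) requests.

    distance: how many pages ahead of the current read we want prefetched
              (stands in for effective_io_concurrency; an assumption).
    batch:    minimum number of pages we must be behind the prefetch target
              before issuing any prefetches. batch=1 models the interleaved
              behaviour; batch=32 models the experimental batching idea.
    """
    ops = []
    next_prefetch = 0  # first page not yet prefetched
    for page in range(n_pages):
        # Never prefetch the page being read or anything before it.
        next_prefetch = max(next_prefetch, page + 1)
        target = min(page + 1 + distance, n_pages)
        # Issue prefetches only once we are at least `batch` pages behind.
        if target - next_prefetch >= batch:
            ops.extend(("prefetch", p) for p in range(next_prefetch, target))
            next_prefetch = target
        ops.append(("read", page))
    return ops


def prefetch_runs(ops):
    """Lengths of consecutive prefetch requests not separated by a read."""
    runs, current = [], 0
    for kind, _page in ops:
        if kind == "prefetch":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    for label, dist, batch in [("interleaved", 128, 1),
                               ("batched", 128, 32),
                               ("batched, small distance", 10, 32)]:
        runs = prefetch_runs(schedule(1000, dist, batch))
        print(label, "first runs:", runs[:4], "total prefetches:", sum(runs))
```

In this model, after the initial burst the interleaved schedule sends one prefetch per read, while the batched schedule sends them 32 at a time. With a distance of 10 and a batch of 32 no prefetch is ever issued, which corresponds to the "pretty much disables prefetching for e_i_c below 32" remark. The sketch also skips the last few pages when fewer than `batch` remain, a simplification a real implementation would presumably handle.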