Discussion: Early hint bit setting
I was thinking about what is the earliest time where we could set hint bits. This would be just after the commit has been made visible. When the transaction completes and commit confirmation is sent to the client, the backend will usually go to sleep on the network socket, waiting for further commands. Because most clients wait for the commit confirmation before proceeding, this means that we have at least one network RTT before this backend is expected to respond again.

The idea is to keep a small backend-local ring buffer of pages that have been modified. When a transaction has just committed, we do a non-blocking read on the socket. When nothing is available, we take the opportunity to go and set hint bits in the recently modified buffers.

Hurting latency for single-threaded workloads using lots of transactions is bad. It follows that it would be a bad idea to do anything that could take a long time while waiting for the next command. Because early hinting is a performance optimisation, we can safely skip it if it becomes bothersome. Anything that causes IO can take too long, so we only set the hint bits when the page is still in shared buffers, to avoid reading in the page. Furthermore, we only hint the tuples that the recently completed transaction modified, to avoid IO from CLOG (we could hint other tuples if their xid happens to be in the SLRU, but it probably won't be very useful).

Hint bits are set sooner or later. Setting them earlier is a throughput win for any workload because we avoid generating extra load. We avoid doing any IO and we might save some, so for IO this is a pure win. The hinting CPU work needs to be done sooner or later, so that's a tie, except for extremely bursty write-heavy loads with lots of transactions. Memory loads could in principle hurt other backends. Refilling the whole last-level cache of modern processors takes a few hundred microseconds at peak speed. If the WAL is on fast storage (BBWC, SSD), there's a pretty good chance that the page being hinted is still in the CPU cache, avoiding the memory bandwidth overhead.

Abstraction-wise, I think we need to set up a mechanism to run very short maintenance jobs from backends waiting for new commands. SocketBackend could check if there's anything to do and, if so, call pq_getbyte_if_available before proceeding to do it.

Setting hint bits early would help workloads with small synchronously writing transactions. Async commits could also benefit from proactive hint bit setting, but this would require some global cooperation and isn't as clear of a win. One idea would be to copy the local ring buffer entries to a global one, tagged with the LSN, when the transaction has been made visible. When someone flushes xlog, they also check if it enables some background hinting and set the corresponding flag for any backend with spare cycles to pick up.

Comments?

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On Wed, May 30, 2012 at 4:42 PM, Ants Aasma <ants@cybertec.at> wrote:
> I was thinking about what is the earliest time where we could set hint
> bits. This would be just after the commit has been made visible. When
> the transaction completes and commit confirmation is sent to the
> client the backend will usually go to sleep waiting on the network
> socket waiting for further commands. Because most clients wait for the
> commit confirmation before proceeding this means that we have at least
> one network RTT before this backend is expected to respond again.
>
> The idea is to keep a small backend local ring buffer of pages that
> have been modified. When a transaction has just committed, we do a
> non-blocking read on the socket. When nothing is available we take the
> opportunity to go and set hint bits in the recently modified buffers.
>
> Hurting latency for single-threaded workloads using lots of
> transactions is bad. It follows that it would be a bad idea to do
> anything that could take a long time while waiting for the next
> command. Because early hinting is a performance optimisation we can
> safely skip it if it becomes bothersome. Anything that causes IO can
> take too long. So we only set the hint bits when the page is still in
> shared buffers to avoid reading in the page. Furthermore, we only hint
> the tuples that the recently completed transaction modified to avoid
> IO from CLOG (we could hint other tuples if their xid happens to be in
> the SLRU, but it probably won't be very useful).
>
> Hint bits are set sooner or later. Setting them earlier is a
> throughput win for any workload because we avoid generating extra
> load. We avoid doing any IO and we might save some so for IO this is a
> pure win. The hinting CPU work needs to be done sooner or later, so
> that's a tie, except for extremely bursty write heavy loads with lots
> of transactions. Memory loads could in principle hurt other backends.
> Refilling the whole last level cache of modern processors takes a few
> hundred microseconds at peak speed. If the WAL is on fast storage
> (BBWC, SSD) there's a pretty good chance that the page being hinted is
> still in the cpu cache, avoiding the memory bandwidth overhead.
>
> Abstraction wise, I think we need to set up a mechanism to run very
> short maintenance jobs from backends waiting for new commands.
> SocketBackend could check if there's anything to do, and call
> pq_getbyte_if_available if there is anything to do before proceeding
> to do it.
>
> Setting hint bits early would help workloads with small synchronously
> writing transactions. Async commits could also benefit from proactive
> hint bit setting, but this would require some global cooperation and
> isn't as clear of a win. One idea would be to copy the local ring
> buffer entries to a global one tagged with the LSN when the
> transaction has been made visible. When someone flushes xlog, they
> also check if it enables some background hinting and set the
> corresponding flag for any backend with spare cycles to pick up.
>
> Comments?

I think this is a really neat idea, and could solve a lot of problems. Since you don't have to do any clog checks (you know when you commit) -- I think it's a win all around -- so much so that it might be worth seeing the worst-case latency hit if you always force one page out before doing the socket check.

Hm, could you shave CPU cycles by just storing the specific offsets of the hint bit bytes you want to set, or is that too hacky?

merlin
On Thu, May 31, 2012 at 1:01 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I think this is a really neat idea, and could solve a lot of problems.
> Since you don't have to do any clog checks (you know when you commit)
> -- i think it's a win all around -- so much so that it might be worth
> seeing the worst case latency hit if you force one page out always
> before doing the socket check. Hm, could you shave cpu cycles by just
> storing the specific offsets of the hint bit bytes you want to set, or
> is that too hacky?

Maybe even do both. By default store tuple offsets, but when the last item was from the same page, convert it to a page hinting request. I have a specific near-realtime data warehouse workload in mind where bulk load is being constantly performed by smallish transactions. By having page granularity in the buffer, almost all pages could be hinted before hitting the disk.

The latency vs. throughput tradeoff could possibly be tunable per backend.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
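
The "do both" scheme in the message above (store a tuple offset by default, and upgrade to a whole-page request when the previous entry is for the same page) might look roughly like the sketch below. All names here (`HintRequest`, `HintQueue`, `hint_queue_add`, `WHOLE_PAGE`) and the queue size are hypothetical. Using 0 as the whole-page marker assumes, as in PostgreSQL, that valid item offsets start at 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HINT_QUEUE_SIZE 16      /* hypothetical capacity */
#define WHOLE_PAGE      0       /* hypothetical marker: hint every tuple */

/* One pending hint request: a single tuple slot or a whole page. */
typedef struct HintRequest {
    uint32_t page;
    uint16_t offset;            /* tuple slot (>= 1), or WHOLE_PAGE */
} HintRequest;

typedef struct HintQueue {
    HintRequest items[HINT_QUEUE_SIZE];
    int         count;
} HintQueue;

/* Record that the tuple at (page, offset) was modified.
 * Returns false when the queue is full and the request was dropped,
 * which is acceptable because hinting is only an optimisation. */
bool hint_queue_add(HintQueue *q, uint32_t page, uint16_t offset)
{
    assert(offset != WHOLE_PAGE);

    if (q->count > 0) {
        HintRequest *last = &q->items[q->count - 1];

        if (last->page == page) {
            /* A second distinct tuple on the same page: stop tracking
             * individual offsets and hint the whole page instead. */
            if (last->offset != offset)
                last->offset = WHOLE_PAGE;
            return true;
        }
    }

    if (q->count == HINT_QUEUE_SIZE)
        return false;

    q->items[q->count].page = page;
    q->items[q->count].offset = offset;
    q->count++;
    return true;
}
```

With this layout a bulk load that fills pages sequentially uses one queue entry per page, whereas a transaction that touches one tuple on each of many pages keeps exact offsets and avoids scanning whole pages.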
On 5/30/12 4:42 PM, Ants Aasma wrote:
> I was thinking about what is the earliest time where we could set hint
> bits. This would be just after the commit has been made visible.

Except that's only true when there are no other transactions running. That's been one of the big sticking points about trying to proactively set hint bits; in a real system you're not going to gain very much unless you wait a while before setting them.

An interesting option might be to keep the first XID that dirtied a page and loop through all pages in the background, looking for pages where first_dirty_xid is < the oldest running XID. Those pages would have hint bits that could be set. While scanning the page you would want to set first_dirty_xid to the oldest XID that could not be hinted.

This is a modification of the idea to set hint bits when a page is on its way out of the buffer; the advantage here is that it would also handle pages that are too hot to leave the buffer.

--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
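
The first_dirty_xid scan described in the message above can be sketched on a toy page model. This is an illustration only: `ToyPage`, `hint_page_if_ready`, `NO_XID` and the fixed tuple count are hypothetical, XIDs are compared as plain integers (ignoring wraparound), and "hinting" is reduced to setting a flag, whereas a real implementation would still have to look up whether each transaction committed or aborted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TUPLES_PER_PAGE 4       /* hypothetical fixed size for the toy page */
#define NO_XID          0       /* hypothetical: empty slot / nothing pending */

typedef struct ToyPage {
    uint32_t first_dirty_xid;           /* oldest XID with unhinted tuples */
    uint32_t xmin[TUPLES_PER_PAGE];     /* inserting XID, or NO_XID */
    bool     hinted[TUPLES_PER_PAGE];
} ToyPage;

/* Hint every tuple whose XID is older than the oldest running XID, then
 * reset first_dirty_xid to the oldest XID that could not be hinted yet.
 * Returns the number of tuples hinted by this call. */
int hint_page_if_ready(ToyPage *page, uint32_t oldest_running_xid)
{
    int      hinted_now = 0;
    uint32_t oldest_unhinted = NO_XID;

    assert(oldest_running_xid != NO_XID);

    /* Cheap page-level test: skip pages with nothing hintable. */
    if (page->first_dirty_xid == NO_XID ||
        page->first_dirty_xid >= oldest_running_xid)
        return 0;

    for (int i = 0; i < TUPLES_PER_PAGE; i++) {
        uint32_t xid = page->xmin[i];

        if (xid == NO_XID || page->hinted[i])
            continue;

        if (xid < oldest_running_xid) {
            page->hinted[i] = true;
            hinted_now++;
        } else if (oldest_unhinted == NO_XID || xid < oldest_unhinted) {
            oldest_unhinted = xid;
        }
    }

    page->first_dirty_xid = oldest_unhinted;
    return hinted_now;
}
```

Because first_dirty_xid is advanced on every scan, a page that has been fully hinted, or whose remaining tuples are all too recent, is rejected by the page-level test without looking at its tuples again.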
On Wed, Jun 6, 2012 at 5:41 PM, Jim Nasby <jim@nasby.net> wrote:
> On 5/30/12 4:42 PM, Ants Aasma wrote:
>>
>> I was thinking about what is the earliest time where we could set hint
>> bits. This would be just after the commit has been made visible.
>
>
> Except that's only true when there are no other transactions running. That's
> been one of the big sticking points about trying to proactively set hint
> bits; in a real system you're not going to gain very much unless you wait a
> while before setting them.

Are you sure? The relevant code to set hint bits during a tuple scan looks like this:

    else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
    {
        if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
            return false;   /* inserted after scan started */

        if (tuple->t_infomask & HEAP_XMAX_INVALID)  /* xid invalid */
            return true;

        if (tuple->t_infomask & HEAP_IS_LOCKED)     /* not deleter */
            return true;

        Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));

        if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
        {
            /* deleting subtransaction must have aborted */
            SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
                        InvalidTransactionId);
            return true;
        }

        if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
            return true;    /* deleted after scan started */
        else
            return false;   /* deleted before scan started */
    }
    else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
        return false;
    else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
        SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                    HeapTupleHeaderGetXmin(tuple));
    else
    {
        /* it must have aborted or crashed */
        SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
                    InvalidTransactionId);
        return false;
    }

The backend that commits the transaction knows that the transaction is committed and that it's not in progress (at least from its own point of view). Why do you have to wait for other in-progress transactions to finish? Setting the xmin committed bit doesn't keep you from checking the xmax-based rules.

merlin
On Wed, Jun 6, 2012 at 6:41 PM, Jim Nasby <jim@nasby.net> wrote:
> Except that's only true when there are no other transactions running. That's
> been one of the big sticking points about trying to proactively set hint
> bits; in a real system you're not going to gain very much unless you wait a
> while before setting them.

No, the committed hint bit just means that the transaction is committed. You don't have to wait for it to be all-visible.

I think my biggest concern about this is that it inevitably relies on some assumption about how much latency there will be before the client sends the next request. That strikes me as impossible to tune. On system A, connected to the Internet via an overloaded 56k modem link, you can get away with doing a huge amount of fiddling around while waiting for the next request. But on system B, which uses 10GE or Infiniband or local sockets, the acceptable latency will be much less. Even given identical hardware, scheduler behavior may matter quite a lot - rumor has it that FreeBSD's scheduling latency may be significantly less than on Linux, although I have not verified it and rumor may lie. But the point is that whether or not this works out to a win on any given system seems like it will depend on an awful lot of stuff that we can't know or control.

I would be more inclined to look at trying to make this happen in a background process, although that's not without its own challenges.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company