Thread: RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)
> > BUT, do we know for sure that sleep(0) is not optimized in
> > the library to just return?
>
> We can only do our best here. I think guessing whether other backends
> are _about_ to commit is pretty shaky, and sleeping every time is a
> waste. This seems the cleanest.

A long time ago you, Bruce, gave me a gift: a book about transaction
processing (thanks again -:)). Sleeping before fsync in commit is
described there as a standard technique. And the reason is clear.
Man, the cost of fsync is very high! { write (64 bytes) + fsync() }
takes ~ 1/50 sec. Yes, an additional 1/200 sec or so results in worse
performance when there is only one backend running, but it greatly
increases overall performance for 100 simultaneous backends. I.e. this
delay is a trade-off to gain better scalability.

I agree that it must be configurable, smaller or probably 0 by
default, use the approximate # of simultaneously running backends for
guessing (postmaster could maintain this number in shmem and
backends could just read it without any locking - the exact number is
not required), and be well described as a tuning parameter in the
documentation.

Anyway, I object to sleep(0).

Vadim
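To make the arithmetic in the message above concrete, here is a minimal sketch of the trade-off. It is an idealized model, not a measurement: it assumes every fsync costs the same 1/50 sec, that a committer first waits a fixed delay, and that a whole batch of commits arriving within that delay shares a single fsync. The function name `commits_per_sec` and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Idealized group-commit throughput (assumption, not measured data):
 * one commit cycle costs delay_sec + fsync_sec, and `batch` commits
 * are flushed by that single fsync.
 */
double commits_per_sec(double fsync_sec, double delay_sec, int batch)
{
    assert(fsync_sec > 0.0 && delay_sec >= 0.0 && batch >= 1);
    return (double) batch / (fsync_sec + delay_sec);
}
```

Under this model a lone backend drops from about 50 to about 40 commits per second when the 1/200 sec delay is added, while 100 backends that all share one fsync would reach about 4000 commits per second instead of queuing behind 100 separate fsyncs. Real batches will be smaller than the ideal case.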
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> A long ago you, Bruce, made me gift - book about transaction processing
> (thanks again -:)). This sleeping before fsync in commit is described
> there as standard technique. And the reason is cleanest.
> Men, cost of fsync is very high! { write (64 bytes) + fsync() }
> takes ~ 1/50 sec. Yes, additional 1/200 sec or so results in worse
> performance when there is only one backend running but greatly
> increase overall performance for 100 simultaneous backends. Ie this
> delay is trade off to gain better scalability.
> I agreed that it must be configurable, smaller or probably 0 by
> default, use approximate # of simultaneously running backends for
> guessing (postmaster could maintain this number in shmem and
> backends could just read it without any locking - exact number is
> not required), good described as tuning parameter in documentation.
> Anyway I object sleep(0).

Good points. Another idea that Bruce and I kicked around on the phone
was to make the pre-fsync delay be self-adjusting; that is, it'd
automatically move up and down based on system load.

For example, you could keep track of the time since the last xact
commit, and guess that the time to the next one will be similar. If
that's greater than your intended sleep delay, forget the sleep and
just fsync. But the shorter the time since the last commit, the longer
you should be willing to delay. This'd need some experimentation to get
right, but it seems a lot better than asking the dbadmin to pick a
value.

Another thing that should happen is that once someone fsyncs, all the
other backends waiting should be awoken immediately, instead of waiting
for their delays to time out. Not sure how doable this is --- there's
no wait-for-semaphore-with-timeout in SysV IPC, is there?

Perhaps we can distinguish the first waiter (the guy who will
ultimately do the fsync, he's just hoping for some passengers) from the
rest, who see that someone's already waiting for fsync and just wait
for him to do it. Those other guys don't do a timed wait; they sleep on
a semaphore that the first waiter will release once he's done the
fsync.

			regards, tom lane
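The self-adjusting delay described in the message above could be sketched roughly as follows. This is only one possible reading: the message says the exact behaviour would need experimentation, so the linear ramp used here is an assumption, and the name `pre_fsync_delay_us` and its microsecond units are hypothetical.

```c
#include <assert.h>

/*
 * Guess how long to sleep before fsync, in microseconds.
 * Assumption: the gap to the next commit will resemble the gap since
 * the last one. If that gap is at least the intended maximum delay,
 * do not sleep at all; otherwise sleep longer the more recent the
 * last commit was (linear ramp, chosen only for illustration).
 */
long pre_fsync_delay_us(long since_last_commit_us, long max_delay_us)
{
    assert(since_last_commit_us >= 0 && max_delay_us >= 0);
    if (since_last_commit_us >= max_delay_us)
        return 0;               /* system looks idle: just fsync */
    return max_delay_us - since_last_commit_us;
}
```

With a 5000 microsecond ceiling, a commit seen 1000 microseconds ago gives a 4000 microsecond wait, while one seen 10000 microseconds ago gives no wait, so a lone backend pays nothing.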
> > sleep(3) should conform to POSIX specification, if anyone has the
> > reference they can check it to see what the effect of sleep(0)
> > should be.
>
> Yes, but Posix also specifies sched_yield() which rather explicitly
> allows a process to yield its timeslice. No idea how well that is
> supported.

OK, I have a new idea.

There are two parts to transaction commit. The first is writing all
dirty buffers or log changes to the kernel, and the second is fsync of
the log file.

I suggest having a per-backend shared memory byte that has the
following values:

	START_LOG_WRITE
	WAIT_ON_FSYNC
	NOT_IN_COMMIT
	backend_number_doing_fsync

I suggest that when each backend starts a commit, it sets its byte to
START_LOG_WRITE. When it gets ready to fsync, it checks all backends.
If all are NOT_IN_COMMIT, it does fsync and continues.

If one or more are in START_LOG_WRITE, it waits until no one is in
START_LOG_WRITE. It then checks all WAIT_ON_FSYNC, and if it is the
lowest backend in WAIT_ON_FSYNC, marks all others with its backend
number, and does fsync. It then clears all backends with its number to
NOT_IN_COMMIT. Other backends will see they are not the lowest
WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
so they can then continue, knowing their data was synced.

This allows a single backend not to sleep, and allows multiple backends
to bunch up only when they are all about to commit.

The reason backend numbers are written is so other backends entering
the commit code will not interfere with the backends performing fsync.

Comments?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
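The per-backend byte protocol proposed above can be sketched as a small decision function. This is an illustration under several assumptions, not PostgreSQL code: the numeric encoding of the byte, the names `next_commit_action`, `claim_waiters`, and `release_waiters`, and the rule that a backend sets its own byte to WAIT_ON_FSYNC once it is ready to fsync are all hypothetical. The sketch also ignores locking and memory ordering, which a real shared-memory version would have to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding: values >= FIRST_BACKEND_ID mean "backend
 * (value - FIRST_BACKEND_ID) is doing the fsync on my behalf". */
enum { NOT_IN_COMMIT = 0, START_LOG_WRITE = 1, WAIT_ON_FSYNC = 2,
       FIRST_BACKEND_ID = 3 };

typedef enum { ACT_FSYNC_ALONE, ACT_WAIT_FOR_WRITERS,
               ACT_LEAD_GROUP_FSYNC, ACT_WAIT_FOR_LEADER } CommitAction;

/* Decide what backend `me` should do once it is ready to fsync. */
CommitAction next_commit_action(const unsigned char *state, size_t n, size_t me)
{
    int others_idle = 1, lower_waiter = 0;
    size_t i;

    assert(me < n);
    if (state[me] >= FIRST_BACKEND_ID)
        return ACT_WAIT_FOR_LEADER;          /* already claimed by a leader */
    assert(state[me] == WAIT_ON_FSYNC);
    for (i = 0; i < n; i++) {
        if (i == me)
            continue;
        if (state[i] == START_LOG_WRITE)
            return ACT_WAIT_FOR_WRITERS;     /* someone is about to join */
        if (state[i] != NOT_IN_COMMIT)
            others_idle = 0;
        if (state[i] == WAIT_ON_FSYNC && i < me)
            lower_waiter = 1;
    }
    if (others_idle)
        return ACT_FSYNC_ALONE;              /* nobody else is committing */
    return lower_waiter ? ACT_WAIT_FOR_LEADER : ACT_LEAD_GROUP_FSYNC;
}

/* Leader marks every other waiter with its own backend number. */
void claim_waiters(unsigned char *state, size_t n, size_t me)
{
    size_t i;

    assert(me + FIRST_BACKEND_ID <= 255);    /* must fit in one byte */
    for (i = 0; i < n; i++)
        if (i != me && state[i] == WAIT_ON_FSYNC)
            state[i] = (unsigned char) (FIRST_BACKEND_ID + me);
}

/* After fsync, leader releases itself and everyone it claimed. */
void release_waiters(unsigned char *state, size_t n, size_t me)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (i == me || state[i] == FIRST_BACKEND_ID + me)
            state[i] = NOT_IN_COMMIT;
}
```

A leader would call `claim_waiters`, perform the fsync, then call `release_waiters`. Followers poll or sleep until their byte returns to NOT_IN_COMMIT. Backends that enter the commit code after the claim still hold START_LOG_WRITE or WAIT_ON_FSYNC, so they are untouched by the release, which matches the stated reason for writing backend numbers.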
Added to TODO:

	* Delay fsync() when other backends are about to commit too

-- 
  Bruce Momjian
Hi there,

I would like to ask whether PostgreSQL is supported on WinME. If anyone
knows how to set it up, I would be grateful for advice. I need to run
PostgreSQL on my WinME box.

-- 
Manny C. Cabido
====================================
e-mail: manny@tinago.msuiit.edu.ph
        manny@sun.msuiit.edu.ph
====================================