Обсуждение: Execution-time-sensitive timestamp regression tests
I've been puzzled by a couple of recent buildfarm failures: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_bat&dt=2008-05-22%2006:00:01 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2008-05-19%2020:00:03 Both fail the timestamp test like this: *** ./expected/timestamp.out Mon May 19 21:01:31 2008 --- ./results/timestamp.out Mon May 19 21:34:23 2008 *************** *** 36,42 **** SELECT count(*) AS None FROM TIMESTAMP_TBL WHERE d1 = timestamp without time zone 'now'; none ------ ! 0 (1 row) DELETE FROM TIMESTAMP_TBL; --- 36,42 ---- SELECT count(*) AS None FROM TIMESTAMP_TBL WHERE d1 = timestamp without time zone 'now'; none ------ ! 1 (1 row) DELETE FROM TIMESTAMP_TBL; which seemed a bit baffling until I looked into the test source: CREATE TABLE TIMESTAMP_TBL ( d1 timestamp(2) without time zone); INSERT INTO TIMESTAMP_TBL VALUES ('now'); -- six more inserts and three selects SELECT count(*) AS None FROM TIMESTAMP_TBL WHERE d1 = timestamp without time zone 'now';none ------ 0 (1 row) IOW, this test is assuming that the value of now() must change in the time it takes to execute a few SQL statements. The buildfarm reports indicate that it didn't change. Both of the observed failures are on Windows machines, where I'm told that the resolution of gettimeofday() is an abysmal 55msec, so it seems that the only surprise here is that we haven't seen the failure often before. It might be worth trying to improve the resolution of now() on Windows, though I didn't much care for the ideas Magnus had about how to do that when I asked him earlier today. In any case I think this regression test is broken-by-design (and so is the very similar code in the timestamptz test). Aside from the risk we see here, there are previously-recognized race conditions in the statements I omitted: if local midnight passes during the sequence of statements, you get a failure. What I'm considering doing is putting a BEGIN/COMMIT around the whole sequence, which will guarantee that now() does *not* advance, thus eliminating the midnight gotcha. This would mean that the expected output of the comparison to 'now' changes from 0 rows to 1 row. Objections, better ideas? regards, tom lane
Tom Lane wrote: > Both of the observed failures are on Windows machines, where I'm told > that the resolution of gettimeofday() is an abysmal 55msec, so it > seems that the only surprise here is that we haven't seen the failure > often before. Actually, reading up some more it seems the 55msec was for win98 systems. Modern systems usually have around 10ms. And I agree that it's strange we haven't seen it before. Could it be a side-effect from something that happened lately, or should we consider it purely random? > It might be worth trying to improve the resolution of now() on > Windows, though I didn't much care for the ideas Magnus had about how > to do that when I asked him earlier today. For the record, what we were talking about was snapshotting the time at backend start and then use QueryPerformanceCounter() to see what happened and do some calculation. The other option appears to be to use a multimedia timer, timeGetSystemTime(). But the comment for that one still says "The default precision of the timeGetTime function can be five milliseconds or more, depending on the machine. You can use the timeBeginPeriod and timeEndPeriod functions to increase the precision of timeGetTime. If you do so, the minimum difference between successive values returned by timeGetTime can be as large as the minimum period value set using timeBeginPeriod and timeEndPeriod. Use the QueryPerformanceCounter and QueryPerformanceFrequency functions to measure short time intervals at a high resolution, " So the bottom line is - if you want to have high performance timing, you really should use QueryPerformance...(). //Magnus
Magnus Hagander <magnus@hagander.net> writes: > And I agree that it's strange we haven't seen it before. Could it be a > side-effect from something that happened lately, or should we consider > it purely random? Well, it's possible we have seen it before and ignored it because it wasn't reproducible. It was only the coincidence of two buildfarm members showing the failure at the same time that drew my attention. I think Andrew has the ability to grep the buildfarm history to see if we've seen this before. Another possibility is that we did something that improved performance to get us below the threshold where this could happen; though I don't recall anything very likely. regards, tom lane
Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > >> And I agree that it's strange we haven't seen it before. Could it be a >> side-effect from something that happened lately, or should we consider >> it purely random? >> > > Well, it's possible we have seen it before and ignored it because it > wasn't reproducible. It was only the coincidence of two buildfarm > members showing the failure at the same time that drew my attention. > > I think Andrew has the ability to grep the buildfarm history to see > if we've seen this before. > > Another possibility is that we did something that improved performance > to get us below the threshold where this could happen; though I don't > recall anything very likely. > > rega > I can't find any other similar failures recently in the buildfarm history. One data point: the _bat buildfarm members are running on Windows under VMWare, so their clocks might not be very true. cheers andrew
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > What I'm considering doing is putting a BEGIN/COMMIT around the whole > sequence, which will guarantee that now() does *not* advance, thus > eliminating the midnight gotcha. This would mean that the expected > output of the comparison to 'now' changes from 0 rows to 1 row. > > Objections, better ideas? Alternatively we could do a pg_sleep(.1) to sleep for 100ms. It sounds like the ideal would be something like: insert 'now' pg_sleep(.1) begin insert 'now' select * from table -- expect 1 (not 0 or 2) ... the other tests you mention which get bitten by midnight end -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Magnus Hagander wrote: >For the record, what we were talking about was snapshotting the time at >backend start and then use QueryPerformanceCounter() to see what >happened and do some calculation. Although this might not be such a big issue for the regression tests: Be aware that the reliability of QueryPerformanceCounter() depends on the hardware: <http://support.microsoft.com/kb/274323/en-us> I used it in Munnin, but there were too many customer machines affected by this bug so that I had to abandom it. This was a few years ago, but I'm not sure whether the function can be trusted today (see for example this performance problem <http://support.microsoft.com/kb/895980/en-us>). Rainer
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> Objections, better ideas? > Alternatively we could do a pg_sleep(.1) to sleep for 100ms. It sounds like > the ideal would be something like: > insert 'now' > pg_sleep(.1) > begin > insert 'now' > select * from table -- expect 1 (not 0 or 2) > ... the other tests you mention which get bitten by midnight > end Roger, will do. There actually still is a small risk in the sequence: if the BEGIN block starts *exactly* at midnight, to within the resolution of gettimeofday(), then 'now' and 'today' will yield the same value so the expected row counts will not be matched. This seems like an acceptably small probability to me. regards, tom lane