Discussion: pg_next_dst_boundary optimization
While profiling the loading of a table with timestamp columns with COPY, I spotted that a lot of time is spent in pg_next_dst_boundary. It scans a sorted array of time segments to find the one that the given timestamp falls within. That's slow when the array is big; the GB timezone, for example, has over 200 segments.

Attached is a patch to use binary search instead of linear search. On my laptop, it reduces the time required to load a million rows into this test table:

CREATE TABLE timestamptable (
    id int,
    ts1 timestamp,
    ts2 timestamp,
    ts3 timestamp,
    ts4 timestamp
);

from ~16.4s to ~11.9s.

It would be nice to slip this into 8.3...

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: src/timezone/localtime.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/timezone/localtime.c,v
retrieving revision 1.17
diff -c -r1.17 localtime.c
*** src/timezone/localtime.c	4 Aug 2007 19:29:25 -0000	1.17
--- src/timezone/localtime.c	18 Sep 2007 13:41:03 -0000
***************
*** 1008,1013 ****
--- 1008,1014 ----
  	int		i;
  	int		j;
  	const pg_time_t t = *timep;
+ 	int		high, low;

  	sp = &tz->state;
  	if (sp->timecnt == 0)
***************
*** 1056,1064 ****
  		return 1;
  	}
  	/* Else search to find the containing segment */
! 	for (i = 1; i < sp->timecnt; ++i)
! 		if (t <= sp->ats[i])
! 			break;
  	j = sp->types[i - 1];
  	ttisp = &sp->ttis[j];
  	*before_gmtoff = ttisp->tt_gmtoff;
--- 1057,1074 ----
  		return 1;
  	}
  	/* Else search to find the containing segment */
! 	low = 1;
! 	high = sp->timecnt - 1;
! 	while (high > low)
! 	{
! 		int		mid = low + ((high - low) / 2);
!
! 		if (t > sp->ats[mid])
! 			low = mid + 1;
! 		else
! 			high = mid;
! 	}
! 	i = high;
! 	j = sp->types[i - 1];
  	ttisp = &sp->ttis[j];
  	*before_gmtoff = ttisp->tt_gmtoff;
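For anyone reading the patch out of context, here is a standalone sketch of the lower-bound binary search it introduces. The function name and the use of plain long for transition times are illustrative only (the PostgreSQL code works on sp->ats[], an array of pg_time_t, inline in pg_next_dst_boundary); the loop logic mirrors the patch: find the smallest index i in [1, timecnt-1] such that t <= ats[i], so that segment i-1 / i brackets the timestamp.

```c
#include <assert.h>

/*
 * Illustrative lower-bound search over a sorted array of transition
 * times ats[0..n-1], n >= 2.  Returns the smallest index i in
 * [1, n-1] with t <= ats[i], clamping to n-1 when t is past the last
 * transition -- the same invariant as the while loop in the patch.
 */
int
find_containing_segment(const long *ats, int n, long t)
{
	int		low = 1;
	int		high = n - 1;

	while (high > low)
	{
		/* overflow-safe midpoint, as in the patch */
		int		mid = low + ((high - low) / 2);

		if (t > ats[mid])
			low = mid + 1;		/* answer lies strictly above mid */
		else
			high = mid;			/* mid satisfies t <= ats[mid]; keep it */
	}
	return high;
}
```

Note the asymmetric update (low = mid + 1 vs. high = mid): because the loop keeps any mid that already satisfies the condition, it converges on the leftmost qualifying index rather than an arbitrary one, which is what the subsequent sp->types[i - 1] lookup relies on.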
Heikki Linnakangas wrote:
> While profiling the loading of a table with timestamp columns with
> COPY, I spotted that a lot of time is spent in pg_next_dst_boundary.
> It scans a sorted array of time segments to find the one that the
> given timestamp falls within. That's slow when the array is big; the
> GB timezone, for example, has over 200 segments.
>
> Attached is a patch to use binary search instead of linear search.

Digging deeper into this, that function is essentially the same as the first part of localsub [1], which should also be changed to use binary search. In fact, that's exactly what localsub does in the most recent version of the tz library.

Has anyone looked at what other changes there have been to the code in the tz library? Which version of tz is our code based on?

[1] Tom's post introducing pg_next_dst_boundary:
http://archives.postgresql.org/pgsql-hackers/2004-10/msg01066.php

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Has anyone looked at what other changes there have been to the code
> in the tz library?

Yeah, we need to re-sync that code --- one big item we are missing is support for 64-bit timezone data (hence, tz info beyond 2038). I don't really want to touch it for 8.3 though. There's been enough code drift on our side (mostly from well-meaning activities like pg_indent and ANSI-fying function headers) that a merge is going to be a bit painful, and risk introducing some bugs. We should hold that for a fresh devel cycle rather than try to cram it in at the last minute.

> Which version of tz is our code based on?

Whatever was current when we imported the code (4/2004 looks like).

			regards, tom lane
On Wed, 2007-09-19 at 12:25 +0100, Heikki Linnakangas wrote:
> It would be nice to slip this into 8.3...

+1 from me: the tzcode merge is pretty obviously 8.4 material, but it would be nice to get this perf tweak in for 8.3.

-Neil