Обсуждение: lseek
Hi all, what might possibly be wrong with a database/statement what so ever, if a strace to a postgres process shows this over several ours without terminating? lseek(3, 0, SEEK_END) = 0 lseek(5, 0, SEEK_END) = 8192 lseek(6, 0, SEEK_END) = 0 lseek(9, 0, SEEK_END) = 0 lseek(10, 0, SEEK_END) = 0 lseek(11, 0, SEEK_END) = 0 lseek(13, 0, SEEK_END) = 0 lseek(14, 0, SEEK_END) = 0 lseek(15, 0, SEEK_END) = 0 lseek(16, 0, SEEK_END) = 0 lseek(17, 0, SEEK_END) = 0 lseek(18, 0, SEEK_END) = 0 lseek(19, 0, SEEK_END) = 0 lseek(20, 0, SEEK_END) = 0 lseek(21, 0, SEEK_END) = 11214848 lseek(22, 0, SEEK_END) = 0 lseek(23, 0, SEEK_END) = 0 lseek(24, 0, SEEK_END) = 0 lseek(25, 0, SEEK_END) = 0 lseek(26, 0, SEEK_END) = 0 lseek(27, 0, SEEK_END) = 0 lseek(28, 0, SEEK_END) = 0 lseek(29, 0, SEEK_END) = 0 lseek(30, 0, SEEK_END) = 0 lseek(31, 0, SEEK_END) = 0 lseek(32, 0, SEEK_END) = 0 lseek(3, 0, SEEK_END) = 0 lseek(5, 0, SEEK_END) = 8192 lseek(6, 0, SEEK_END) = 0 lseek(9, 0, SEEK_END) = 0 lseek(10, 0, SEEK_END) = 0 lseek(11, 0, SEEK_END) = 0 lseek(13, 0, SEEK_END) = 0 lseek(14, 0, SEEK_END) = 0 lseek(15, 0, SEEK_END) = 0 lseek(16, 0, SEEK_END) = 0 lseek(17, 0, SEEK_END) = 0 lseek(18, 0, SEEK_END) = 0 lseek(19, 0, SEEK_END) = 0 lseek(20, 0, SEEK_END) = 0 lseek(21, 0, SEEK_END) = 11214848 More info: the statement is an INSERT with some huge subselects, running every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all occasions, the job terminates in about 1-2 minutes - however, sometimes it just hangs. If terminated with kill <pid> and restarted, it always terminates. Clueless, Peter
Peter Hinse <loco@d0pefish.de> writes: > More info: the statement is an INSERT with some huge subselects, running > every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all > occasions, the job terminates in about 1-2 minutes - however, sometimes > it just hangs. If terminated with kill <pid> and restarted, it always > terminates. Define "huge" --- you mean a lot of relations in the query? If you have enough to trigger GEQO optimization, it could be that it's sometimes picking a bad plan. You might try raising the geqo threshold to more relations than that; or if this results in unacceptably long planning time, increase geqo_effort instead. regards, tom lane
Tom Lane schrieb: > Peter Hinse <loco@d0pefish.de> writes: >> More info: the statement is an INSERT with some huge subselects, running >> every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all >> occasions, the job terminates in about 1-2 minutes - however, sometimes >> it just hangs. If terminated with kill <pid> and restarted, it always >> terminates. > > Define "huge" --- you mean a lot of relations in the query? If you have > enough to trigger GEQO optimization, it could be that it's sometimes > picking a bad plan. You might try raising the geqo threshold to more > relations than that; or if this results in unacceptably long planning > time, increase geqo_effort instead. Hi Tom, thanks for your hints, I will check this with our R&D people. And yes: with "huge" I meant the number of relations. Regards, Peter
Tom Lane schrieb: > Peter Hinse <loco@d0pefish.de> writes: >> More info: the statement is an INSERT with some huge subselects, running >> every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all >> occasions, the job terminates in about 1-2 minutes - however, sometimes >> it just hangs. If terminated with kill <pid> and restarted, it always >> terminates. > > Define "huge" --- you mean a lot of relations in the query? If you have > enough to trigger GEQO optimization, it could be that it's sometimes > picking a bad plan. You might try raising the geqo threshold to more > relations than that; or if this results in unacceptably long planning > time, increase geqo_effort instead. After downgrading to 8.3.5, the massive problems we had with our system are gone. With 8.3.6, we had these hanging jobs in 100% of all runs (this is a DWH, the job calculates the ranking for users). We tried several settings for GEQO, that made it even worse - some jobs, that never had any problems, started to act weird (did not terminate) with the same lseek in strace. We will investigate this issue with some more test machines. Regards, Peter