Обсуждение: lseek

Поиск
Список
Период
Сортировка

lseek

От
Peter Hinse
Дата:
Hi all,

what might possibly be wrong with a database/statement what so ever, if
a strace to a postgres process shows this over several ours without
terminating?

lseek(3, 0, SEEK_END)                   = 0
lseek(5, 0, SEEK_END)                   = 8192
lseek(6, 0, SEEK_END)                   = 0
lseek(9, 0, SEEK_END)                   = 0
lseek(10, 0, SEEK_END)                  = 0
lseek(11, 0, SEEK_END)                  = 0
lseek(13, 0, SEEK_END)                  = 0
lseek(14, 0, SEEK_END)                  = 0
lseek(15, 0, SEEK_END)                  = 0
lseek(16, 0, SEEK_END)                  = 0
lseek(17, 0, SEEK_END)                  = 0
lseek(18, 0, SEEK_END)                  = 0
lseek(19, 0, SEEK_END)                  = 0
lseek(20, 0, SEEK_END)                  = 0
lseek(21, 0, SEEK_END)                  = 11214848
lseek(22, 0, SEEK_END)                  = 0
lseek(23, 0, SEEK_END)                  = 0
lseek(24, 0, SEEK_END)                  = 0
lseek(25, 0, SEEK_END)                  = 0
lseek(26, 0, SEEK_END)                  = 0
lseek(27, 0, SEEK_END)                  = 0
lseek(28, 0, SEEK_END)                  = 0
lseek(29, 0, SEEK_END)                  = 0
lseek(30, 0, SEEK_END)                  = 0
lseek(31, 0, SEEK_END)                  = 0
lseek(32, 0, SEEK_END)                  = 0
lseek(3, 0, SEEK_END)                   = 0
lseek(5, 0, SEEK_END)                   = 8192
lseek(6, 0, SEEK_END)                   = 0
lseek(9, 0, SEEK_END)                   = 0
lseek(10, 0, SEEK_END)                  = 0
lseek(11, 0, SEEK_END)                  = 0
lseek(13, 0, SEEK_END)                  = 0
lseek(14, 0, SEEK_END)                  = 0
lseek(15, 0, SEEK_END)                  = 0
lseek(16, 0, SEEK_END)                  = 0
lseek(17, 0, SEEK_END)                  = 0
lseek(18, 0, SEEK_END)                  = 0
lseek(19, 0, SEEK_END)                  = 0
lseek(20, 0, SEEK_END)                  = 0
lseek(21, 0, SEEK_END)                  = 11214848

More info: the statement is an INSERT with some huge subselects, running
every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all
occasions, the job terminates in about 1-2 minutes - however, sometimes
it just hangs. If terminated with kill <pid> and restarted, it always
terminates.

Clueless,

    Peter

Re: lseek

От
Tom Lane
Дата:
Peter Hinse <loco@d0pefish.de> writes:
> More info: the statement is an INSERT with some huge subselects, running
> every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all
> occasions, the job terminates in about 1-2 minutes - however, sometimes
> it just hangs. If terminated with kill <pid> and restarted, it always
> terminates.

Define "huge" --- you mean a lot of relations in the query?  If you have
enough to trigger GEQO optimization, it could be that it's sometimes
picking a bad plan.  You might try raising the geqo threshold to more
relations than that; or if this results in unacceptably long planning
time, increase geqo_effort instead.

            regards, tom lane

Re: lseek

От
Peter Hinse
Дата:
Tom Lane schrieb:
> Peter Hinse <loco@d0pefish.de> writes:
>> More info: the statement is an INSERT with some huge subselects, running
>> every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all
>> occasions, the job terminates in about 1-2 minutes - however, sometimes
>> it just hangs. If terminated with kill <pid> and restarted, it always
>> terminates.
>
> Define "huge" --- you mean a lot of relations in the query?  If you have
> enough to trigger GEQO optimization, it could be that it's sometimes
> picking a bad plan.  You might try raising the geqo threshold to more
> relations than that; or if this results in unacceptably long planning
> time, increase geqo_effort instead.

Hi Tom,

thanks for your hints, I will check this with our R&D people. And yes:
with "huge" I meant the number of relations.

Regards,

    Peter

Re: lseek

От
Peter Hinse
Дата:
Tom Lane schrieb:
> Peter Hinse <loco@d0pefish.de> writes:
>> More info: the statement is an INSERT with some huge subselects, running
>> every night on a PGSQL 8.3.6 on CentOS 4.7 x86_64. In 97% of all
>> occasions, the job terminates in about 1-2 minutes - however, sometimes
>> it just hangs. If terminated with kill <pid> and restarted, it always
>> terminates.
>
> Define "huge" --- you mean a lot of relations in the query?  If you have
> enough to trigger GEQO optimization, it could be that it's sometimes
> picking a bad plan.  You might try raising the geqo threshold to more
> relations than that; or if this results in unacceptably long planning
> time, increase geqo_effort instead.

After downgrading to 8.3.5, the massive problems we had with our system
are gone. With 8.3.6, we had these hanging jobs in 100% of all runs
(this is a DWH, the job calculates the ranking for users). We tried
several settings for GEQO, that made it even worse - some jobs, that
never had any problems, started to act weird (did not terminate) with
the same lseek in strace.

We will investigate this issue with some more test machines.

Regards,

    Peter