Thread: Re: [ADMIN] ERROR: could not read block


Re: [ADMIN] ERROR: could not read block

From
"Magnus Hagander"
Date:
[copying this one over to hackers]

> Our DBAs reviewed the Microsoft documentation you referenced,
> modified the registry, and rebooted the OS.  We've been
> beating up on the database without seeing the error so far.
> We'll keep at it for a while.

Very interesting. As this seems to be a resource error, a couple of
questions. Sorry if you've already answered some of them, couldn't find
it in the archives.

1) Is this a dedicated pg server, or does it have something else on it?

2) We have to ask this - do you run any antivirus on it that might not
be releasing resources the right way? Anything else that might stick in
a kernel driver?

3) Are you hitting the database with many connections, or is this a
single/few connection scenario? Are the other connections typically
active when this shows up?


Seems like we could just retry when we get this failure. The question is
whether we need to sleep for a short while before we do. Also, we can't just
retry forever, there has to be some kind of end to it...
(If you read the SQL Server KB article, it can be read as saying that
retrying is the correct thing, because the bug in SQL Server was that it
didn't retry.)

//Magnus
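
As a rough illustration of the bounded retry proposed above, here is a minimal C sketch. It is not PostgreSQL code: `retry_bounded`, the callback types, and the demo helpers `flaky_read` and `fake_sleep` are hypothetical names. The sleep is passed in as a callback (on Windows it could wrap `Sleep()`) so the loop itself stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types for illustration; not PostgreSQL APIs. */
typedef bool (*io_attempt_fn)(void *arg);      /* returns true on success */
typedef void (*sleep_ms_fn)(unsigned int ms);  /* sleeps for ms milliseconds */

/*
 * Run the operation up to max_attempts times, sleeping delay_ms between
 * failed attempts. Returns true as soon as one attempt succeeds and false
 * once all attempts are used up, so the loop always terminates.
 */
static bool
retry_bounded(io_attempt_fn attempt, void *arg,
              unsigned int max_attempts, unsigned int delay_ms,
              sleep_ms_fn sleep_ms)
{
    assert(attempt != NULL);

    for (unsigned int i = 0; i < max_attempts; i++)
    {
        if (attempt(arg))
            return true;

        /* No point in sleeping after the final failed attempt. */
        if (i + 1 < max_attempts && sleep_ms != NULL)
            sleep_ms(delay_ms);
    }
    return false;
}

/* Demo helper: fails until the counter behind arg reaches zero. */
static bool
flaky_read(void *arg)
{
    int *remaining_failures = arg;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        return false;
    }
    return true;
}

/* Demo helper: records the requested sleep time instead of sleeping. */
static unsigned int total_slept_ms = 0;

static void
fake_sleep(unsigned int ms)
{
    total_slept_ms += ms;
}
```

Calling `retry_bounded(flaky_read, &failures, 5, 100, fake_sleep)` with `failures` set to 2 succeeds on the third attempt. The attempt count and the delay are exactly the two knobs left open in the message above; the values here are placeholders, not recommendations.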

Re: [ADMIN] ERROR: could not read block

From
"Qingqing Zhou"
Date:
"Magnus Hagander" <mha@sollentuna.net> wrote:
>
> Seems like we could just retry when we get this failure. The question is
> whether we need to sleep for a short while before we do. Also, we can't just
> retry forever, there has to be some kind of end to it...
> (If you read the SQL Server KB article, it can be read as saying that
> retrying is the correct thing, because the bug in SQL Server was that it
> didn't retry.)
>

Agree on the retry solution. Yes, the two important factors are the interval
between retries and the number of retries. I suspect that on a dedicated
server, several retries can handle it. But for a server that might be running
a backup at the same time, who knows how long we need. Either way, I don't
think an endless loop is needed: at most 3 minutes (since s_lock() does
this :-)).
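
One way to picture the "interval plus upper bound" idea is a retry loop with a total sleep budget instead of a fixed attempt count. The C sketch below is only an illustration under assumptions: `retry_with_budget`, the doubling backoff, and the demo helpers are hypothetical and are not taken from s_lock() or any other PostgreSQL code. Only time spent sleeping is counted against the budget, not time spent inside the attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types for illustration; not PostgreSQL APIs. */
typedef bool (*op_fn)(void *arg);          /* returns true on success */
typedef void (*delay_fn)(unsigned int ms); /* sleeps for ms milliseconds */

/*
 * Retry until the operation succeeds or the next sleep would push the
 * summed sleep time past budget_ms. The delay starts at first_delay_ms
 * and doubles after each failure, capped at max_delay_ms (an assumed
 * policy). A zero delay ends the loop rather than spinning.
 */
static bool
retry_with_budget(op_fn op, void *arg,
                  unsigned int first_delay_ms, unsigned int max_delay_ms,
                  unsigned int budget_ms, delay_fn delay)
{
    unsigned int waited_ms = 0;
    unsigned int next_ms = first_delay_ms;

    assert(op != NULL && delay != NULL);

    for (;;)
    {
        if (op(arg))
            return true;

        /* waited_ms never exceeds budget_ms, so this cannot underflow. */
        if (next_ms == 0 || next_ms > budget_ms - waited_ms)
            return false;

        delay(next_ms);
        waited_ms += next_ms;

        /* Double the delay without overflowing, then cap it. */
        next_ms = (next_ms > max_delay_ms / 2) ? max_delay_ms : next_ms * 2;
    }
}

/* Demo helper: fails until the counter behind arg reaches zero. */
static bool
fails_then_succeeds(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0)
    {
        (*failures_left)--;
        return false;
    }
    return true;
}

/* Demo helper: records the requested delay instead of sleeping. */
static unsigned int slept_total_ms = 0;

static void
record_delay(unsigned int ms)
{
    slept_total_ms += ms;
}
```

With a budget of 180000 ms (the 3 minutes mentioned above), a mostly idle dedicated server would normally succeed after a few short sleeps, while a server busy with a backup gets progressively longer waits before the loop gives up and the error is reported.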

Also, this is only a partial solution to the "invalid parameter" win32 IO
problem. There are some other cases, like the ACCESS_VIOLATION error, that
need more evidence to pin down.

Regards,
Qingqing

P.S. Got to be out of town for several days ...