Обсуждение: return value from pq_putmessage() is widely ignored
pq_putmessage() is a macro which calls a function that is normally socket_putmessage(), which returns either 0 on success or EOF in the case of failure. Most callers ignore the return value, sometimes with an explicit cast to void, and other times without such a cast. As far as I can see, the only case where we actually do anything significant with the return value is in basebackup.c, where two of the calls to pq_putmessage() do this: if (pq_putmessage('d', buf, cnt)) ereport(ERROR, (errmsg("base backup could not send data, aborting backup"))); The other six calls in that file do not have a similar protection, and there does not seem to be any explanation of why those two places are special as compared either with other places in the same file or with other parts of PostgreSQL, so I'm a little bit confused as to what's going on here. Why do we return 0 or EOF only to ignore it, instead of throwing ERROR like we do in most places? One problem is that we might get into error recursion trouble: we don't want to try to send an ErrorResponse, fail, and then respond by generating another ErrorResponse, which will again fail, leading to blowing out the error stack. It's also worth considering that the error might be occurring halfway through sending some other protocol message, in which case we can't start a new protocol message without finishing the previous one. But the point is that in a case like this, we've lost the client connection anyway. It seems like what we ought to be doing is what ProcessInterrupts does in this situation: /* don't send to client, we already know the connection to be dead. */ whereToSendOutput = DestNone; ereport(FATAL, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("connection to client lost"))); Why don't we just make internal_flush() do that directly in the event of a failure, instead of setting flags that make it happen later and then returning an error indicator? Can it be unsafe to throw an error here? Do we have places that are calling pq_putmessage() inside critical sections or something? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > pq_putmessage() is a macro which calls a function that is normally > socket_putmessage(), which returns either 0 on success or EOF in the > case of failure. Most callers ignore the return value, sometimes with > an explicit cast to void, and other times without such a cast. As far > as I can see, the only case where we actually do anything significant > with the return value is in basebackup.c, where two of the calls to > pq_putmessage() do this: > if (pq_putmessage('d', buf, cnt)) > ereport(ERROR, > (errmsg("base > backup could not send data, aborting backup"))); A preliminary survey says that basebackup.c is wrong here, and it should be ignoring the return value just like the rest of the world. pqformat.c is of the opinion that pqcomm.c is taking care of it: (void) pq_putmessage(buf->cursor, buf->data, buf->len); /* no need to complain about any failure, since pqcomm.c already did */ and in fact that appears to be the case. As far as I can see, the only place that's doing anything appropriate with the result is socket_putmessage_noblock: res = pq_putmessage(msgtype, s, len); Assert(res == 0); /* should not fail when the message fits in * buffer */ Perhaps the value of that Assert is not worth the amount of confusion generated by having a result value, and we should just drop it and change pq_putmessage to return void. > One problem is that we might get into error recursion trouble: we > don't want to try to send an ErrorResponse, fail, and then respond by > generating another ErrorResponse, which will again fail, leading to > blowing out the error stack. Yup. This is why it's dealt with internally to pqcomm.c. regards, tom lane