Discussion: BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.


From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18096
Logged by:          Masatoshi Fukunaga
Email address:      mah0x211@gmail.com
PostgreSQL version: 14.4
Operating system:   macOS 13.5.1, Ubuntu 22.04
Description:

When processing asynchronous commands, I call the `PQconsumeInput` and
`PQisBusy` functions to check if data has arrived, as shown below, but this
does not work correctly in edge trigger mode for epoll and kqueue.

In the edge trigger mode of epoll and kqueue, calls to the
`PQconsumeInput()` and `PQisBusy()` funct

I believe the following code follows the usage described in the manual:

> 34.4. Asynchronous Command Processing
> https://www.postgresql.org/docs/current/libpq-async.html


```C
#include <libpq-fe.h>

// In edge-triggered mode, this code does not work correctly.
/**
 * Check whether a result is ready to be read.
 * @return 1: readable, 0: not readable, -1: error
 */
int is_readable(PGconn *conn) {
    if (!PQconsumeInput(conn)) {
        // caller should call PQerrorMessage to get the error message
        return -1;
    } else if (!PQisBusy(conn)) {
        // caller can call PQgetResult to get the result
        return 1;
    }
    // caller should wait for the socket to become readable
    return 0;
}
```

The `PQconsumeInput()` function reads input data by calling the
`pqReadData()` function internally, which in turn uses the
`pqsecure_read()` function to read from the socket.

However, the `pqReadData()` function will not call the `pqsecure_read()`
function until the `errno` is set to `EAGAIN` or `EWOULDBLOCK`, so if you
poll after the `PQisBusy()` call returns `1`, the readable event will not
fire and the caller will wait forever.

By the way, I am aware that even if a read by `pqsecure_read()` does not
end with `EAGAIN` or `EWOULDBLOCK`, the event will still be raised as long
as all the data in the socket has been read.

The problem seems to be that `PQisBusy()` is returning `1`, but the
preceding call to `PQconsumeInput()` has not read all the data in the
socket.

So, if I check the errno and branch the process as follows, it works fine.


```C
#include <errno.h>
#include <libpq-fe.h>

/**
 * Check whether a result is ready to be read.
 * @return 1: readable, 0: not readable, -1: error
 */
int is_readable(PGconn *conn) {
    int should_retry = 0;

RETRY:
    errno = 0;
    if (!PQconsumeInput(conn)) {
        // caller should call PQerrorMessage to get the error message
        return -1;
    }
    // if the last read did not hit EAGAIN/EWOULDBLOCK, data may remain
    should_retry = errno != EAGAIN && errno != EWOULDBLOCK;

    if (!PQisBusy(conn)) {
        // caller can call PQgetResult to get the result
        return 1;
    } else if (should_retry) {
        // retry is necessary because the data has not been read completely
        goto RETRY;
    }
    // caller should wait for the socket to become readable
    return 0;
}
```
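
The retry logic above is an instance of the usual edge-triggered rule: keep reading until the kernel reports `EAGAIN`/`EWOULDBLOCK`, because no new event fires for data that is already queued. A minimal libpq-free sketch of that rule on a plain non-blocking descriptor might look like the following. The helper names `make_nonblocking` and `drain_fd` are hypothetical and exist only for this illustration; they are not part of libpq.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode.
 * Returns 0 on success, -1 on error. */
int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read from a non-blocking descriptor in pieces of
 * at most `chunk` bytes until the kernel has nothing more to hand over.
 * Returns the total number of bytes read, or -1 on a real error.
 * Sets *eof to 1 if the writer closed its end, otherwise to 0. */
ssize_t drain_fd(int fd, size_t chunk, int *eof) {
    char buf[4096];
    ssize_t total = 0;

    assert(chunk > 0 && chunk <= sizeof(buf));
    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n > 0) {
            /* a successful read does not prove the descriptor is empty */
            total += n;
            continue;
        }
        if (n == 0) {           /* end of stream */
            *eof = 1;
            return total;
        }
        if (errno == EINTR) continue;   /* interrupted, try again */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;       /* drained: now it is safe to wait */
        }
        return -1;              /* real error */
    }
}
```

With a pipe or socket holding 3000 bytes and `chunk` set to 1024, `drain_fd` performs three successful reads and returns 3000 only after a fourth read reports `EAGAIN`; a single read of 1024 bytes followed by an edge-triggered wait would leave the remaining bytes stranded.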


PG Bug reporting form <noreply@postgresql.org> writes:
> When processing asynchronous commands, I call the `PQconsumeInput` and
> `PQisBusy` functions to check if data has arrived, as shown below, but this
> does not work correctly in edge trigger mode for epoll and kqueue.

You have not really provided any evidence of a bug.  The contract
for PQconsumeInput is that it will consume *some* input if any
is available, not that it will consume *all* available input.
(I don't think there is much reason to try to change that.  In the
first place, there might not be enough buffer space, and in the
second place, even if it did consume all input, more might arrive
immediately after it looks.)

Without a self-contained test case, it's hard to be sure what is going
wrong for you; but my guess is that this is a bug in the way you are
checking for more available input rather than something libpq did
wrong.

> However, the `pqReadData()` function will not call the `pqsecure_read()`
> function until the `errno` is set to `EAGAIN` or `EWOULDBLOCK`,

Uh ... what?

            regards, tom lane



Thanks for the reply.
I'm not good at English, so I am using machine translation to correct it.
Some sentences may be difficult to understand.

The following test code performs asynchronous operations using libpq along
with either epoll or kqueue. I wasn't sure whether it is appropriate to
include a large amount of code in an email, so I've uploaded the test code
to a gist.


I can easily reproduce the issue on my macOS system, which uses kqueue, but
it takes many runs to reproduce on my Linux system, which uses epoll.

In edge-triggered mode, `consume_input()` may fail depending on the amount
of data received. This happens when `PQconsumeInput()` does not read all the
data received on the socket (the received data is larger than the available
buffer space) and the subsequent call to `PQisBusy()` returns `1`. The
program then waits for a socket read event that never arrives, and fails
with a timeout.

In the case of level-triggered mode, there's no problem as events will be 
continuously generated while data remains in the socket.
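
The level-triggered behaviour described above can be seen without libpq. The sketch below uses `poll()`, which is level-triggered, with a zero timeout; the helper name `readable_now` is hypothetical and only serves this illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel, without blocking, whether `fd`
 * currently has data to read.
 * Returns 1 if readable, 0 if not, -1 on error. */
int readable_now(int fd) {
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, 0);      /* timeout 0: report current state only */
    if (rc < 0) return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

If 8 bytes are written to a pipe and only 4 are read, `readable_now` still returns 1 for the read end, because a level-triggered check reports the current state rather than a state change. An edge-triggered epoll or kqueue registration would not report the leftover 4 bytes again until more data arrived, which is the situation that leads to the timeout.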

The main issue here is how to decide whether to wait for data to arrive in
the main loop or to call `PQconsumeInput()` again. At present, this decision
requires checking `errno` on the application side.

Is there any other way to resolve this issue?

-------------------------------

Masatoshi Fukunaga




(In reply to the message from Tom Lane <tgl@sss.pgh.pa.us> of Fri, Sep 8, 2023, 23:33, shown above.)