Discussion: Performance penalty when requesting text values in binary format
I'm the creator of the PostgreSQL driver pgx (https://github.com/jackc/pgx) for the Go language. I have found significant performance advantages to using the extended protocol and binary format values -- in particular for types such as timestamptz.
However, I was recently very surprised to find that it is significantly slower to select a text type value in the binary format. For an example case of selecting 1,000 rows each with 5 text columns of 16 bytes each the application time from sending the query to having received the entire response is approximately 16% slower. Here is a link to the test benchmark: https://github.com/jackc/pg_text_binary_bench
Given that the text and binary formats for the text type are identical I would not have expected any performance differences.
My C is rusty and my knowledge of the PG server internals is minimal but the performance difference appears to be that function textsend creates an extra copy where textout simply returns a pointer to the existing data. This seems to be superfluous.
I can work around this by specifying the format per result column instead of specifying binary for all but this performance bug / anomaly seemed worth reporting.
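A minimal sketch of that workaround, assuming the client already knows the type OID of each result column. The helper name resultFormatCodes is hypothetical and is not part of pgx. The OIDs are the ones in PostgreSQL's pg_type catalog, and the format codes (0 for text, 1 for binary) are the per-column result format codes sent in the extended protocol's Bind message.

```go
package main

import "fmt"

// Type OIDs as listed in PostgreSQL's pg_type catalog.
const (
	int4OID        uint32 = 23
	textOID        uint32 = 25
	varcharOID     uint32 = 1043
	timestamptzOID uint32 = 1184
)

// Result-column format codes used in the Bind message.
const (
	textFormat   int16 = 0
	binaryFormat int16 = 1
)

// resultFormatCodes is a hypothetical helper (not a pgx API). It returns one
// format code per result column: text format for text-like types, whose text
// and binary representations are identical, and binary format for the rest.
func resultFormatCodes(columnOIDs []uint32) []int16 {
	codes := make([]int16, len(columnOIDs))
	for i, oid := range columnOIDs {
		switch oid {
		case textOID, varcharOID:
			codes[i] = textFormat
		default:
			codes[i] = binaryFormat
		}
	}
	return codes
}

func main() {
	// Example result set: int4, text, timestamptz, varchar.
	cols := []uint32{int4OID, textOID, timestamptzOID, varcharOID}
	fmt.Println(resultFormatCodes(cols))
}
```

The resulting slice would be sent as the per-column result format codes instead of a single "all binary" code, so text columns avoid the binary path while types such as timestamptz keep it.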
Jack
On Sat, 2020-05-16 at 20:12 -0500, Jack Christensen wrote:
> I'm the creator of the PostgreSQL driver pgx (https://github.com/jackc/pgx) for the Go language.
> I have found significant performance advantages to using the extended protocol and binary format
> values -- in particular for types such as timestamptz.
>
> However, I was recently very surprised to find that it is significantly slower to select a text
> type value in the binary format. For an example case of selecting 1,000 rows each with 5 text
> columns of 16 bytes each the application time from sending the query to having received the
> entire response is approximately 16% slower. Here is a link to the test benchmark:
> https://github.com/jackc/pg_text_binary_bench
>
> Given that the text and binary formats for the text type are identical I would not have
> expected any performance differences.
>
> My C is rusty and my knowledge of the PG server internals is minimal but the performance
> difference appears to be that function textsend creates an extra copy where textout
> simply returns a pointer to the existing data. This seems to be superfluous.
>
> I can work around this by specifying the format per result column instead of specifying
> binary for all but this performance bug / anomaly seemed worth reporting.

Did you profile your benchmark?
It would be interesting to know where the time is spent.

Yours,
Laurenz Albe
On Mon, May 18, 2020 at 7:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Did you profile your benchmark?
> It would be interesting to know where the time is spent.
Unfortunately, I have not. Fortunately, it appears that Tom Lane recognized this as a part of another issue and has prepared a patch.
Thanks,
Jack