Обсуждение: psql \copy from sends a lot of packets
Hi, I just noticed that if you load a file using psql: \copy <table> from <local file> it sends every line as a separate FE/BE protocol CopyData packet. That's pretty wasteful if the lines are narrow. The overhead of each CopyData packet is 5 bytes. To demonstrate, I generated a simple test file with the string "foobar" repeated 10 million times: $ perl -le 'for (1..10000000) { print "foobar" }' > /tmp/testdata and loaded that into a temp table with psql: create temporary table copytest (t text) on commit delete rows; \copy copytest from '/tmp/testdata'; I repeated and timed the \copy a few times; it takes about about 3 seconds on my laptop: postgres=# \copy copytest from '/tmp/testdata'; COPY 10000000 Time: 3039.625 ms (00:03.040) Wireshark says that that involved about 120 MB of network traffic. The size of the file on disk is only 70 MB. The attached patch modifies psql so that it buffers up 8 kB of data into each CopyData message, instead of sending one per line. That makes the operation faster: postgres=# \copy copytest from '/tmp/testdata'; COPY 10000000 Time: 2490.268 ms (00:02.490) And wireshark confirms that there's now only a bit over 70 MB of network traffic. I'll add this to the next commitfest. There's similar inefficiency in the server side in COPY TO, but I'll leave that for another patch. - Heikki
Вложения
Heikki Linnakangas <hlinnaka@iki.fi> writes: > I just noticed that if you load a file using psql: > it sends every line as a separate FE/BE protocol CopyData packet. > ... > I'll add this to the next commitfest. There's similar inefficiency in > the server side in COPY TO, but I'll leave that for another patch. The FE/BE protocol documentation is pretty explicit about this: Copy-in mode (data transfer to the server) is initiated when the backend executes a COPY FROM STDIN SQL statement. The backend sends a CopyInResponse message to the frontend. The frontend should then send zero or more CopyData messages, forming a stream of input data. (The message boundaries are not required to have anything to do with row boundaries, although that is often a reasonable choice.) ... Copy-out mode (data transfer from the server) is initiated when the backend executes a COPY TO STDOUT SQL statement. The backend sends a CopyOutResponse message to the frontend, followed by zero or more CopyData messages (always one per row), followed by CopyDone. So while changing psql isn't so much a problem, changing the server is a wire protocol break. Maybe we should do it anyway, but I'm not sure. regards, tom lane
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed The patch was marked as the one that needs review and doesn't currently have a reviewer, so I decided to take a look. The patch was tested on MacOS against master `e0271d5f`. It works fine and doesn't seem to contradict the current documentation. The future COPY TO patch may require some changes in the docs, as Tom pointed out. I also wonder if it may affect any 3rd party applications and if we care about this, but I suggest we discuss this when and if a corresponding patch will be proposed. The new status of this patch is: Ready for Committer
On 13/07/2021 14:52, Aleksander Alekseev wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation: tested, passed > > The patch was marked as the one that needs review and doesn't currently have > a reviewer, so I decided to take a look. The patch was tested on MacOS against > master `e0271d5f`. It works fine and doesn't seem to contradict the current > documentation. Thanks for the review! I read through it myself one more time and spotted one bug: in interactive mode, the prompt was printed twice in the beginning of the operation. Fixed that, and pushed. - Heikki