Обсуждение: Sort performance
I'm not sure if this is good news or bad news. Either some kudos are due to the gang that worked on the external sort performance or something's very wrong with the qsort implementation in glibc because I'm seeing Postgres's external sort perform better than qsort. This is despite Postgres external sorts having to execute filesystem calls pushing buffers back and forth between user-space and kernel-space, which seems hard to believe. I feel like something's got to be pretty far wrong with the qsort call here for this to be possible. At first I chalked this up to qsort having O(n^2) behaviour occasionally but a) This is glibc where qsort is actually mergesort which should behave pretty similarly to Postgres's mergesort and b) the input data is randomized pretty well so it really ought be a problem even were it qsort. Mem Runs Time ---- ---- ---- 1MB 18 8.25s 10MB 3 5.6s 100MB qsort 6.1s The input is a table with one column, a text field. It contains /usr/share/dict/words ordered by random() and then repeated a bunch of times. (Sorry about the imprecision, I set this table up a while ago and don't remember exactly what I did). a The machine has plenty of RAM and isn't swapping or running any other services. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Mem Runs Time > ---- ---- ---- > 1MB 18 8.25s > 10MB 3 5.6s > 100MB qsort 6.1s I'm confused what this means exactly? Are you saying that in the first two cases, there were 18 and 3 sorted runs generated in the initial pass, and in the third case we just did the sort in memory using qsort? How many items are being sorted, exactly? Since it's text, it probably also makes a big difference what LC_COLLATE setting you are using. Non-C sort locale could mean that the strcoll() calls swamp all else. How long does it take sort(1) to do the same task? regards, tom lane