Обсуждение: order by question
Hello, I am confused by how postgres 8,4..13 is sorting my data. \d test Table "public.test" Column | Type | Modifiers --------+------+----------- data | text | select * from test order by data; data ---------- - -- 1 11 11F 1F a b C F -F Feneric Generic (14 rows) The first row is a single space, the next row a single -, the next two -- . What I don't understand is why the '-F', the ' Feneric' and the ' Generic' sort where they do. I would expect the output to be like this: data ---------- Feneric Generic - -- -F 1 11 11F 1F a b C F (14 rows) client_encoding ----------------- SQL_ASCII lc_collate ------------- en_US.UTF-8 foxboxconfig=# show lc_ctype; lc_ctype ------------- en_US.UTF-8 Thanks for any clarification. -- Stephen Clark
Steve Clark <sclark@netwolves.com> wrote: > I am confused by how postgres 8,4..13 is sorting my data. > select * from test order by data; > data > ---------- > > - > -- > 1 > 11 > 11F > 1F > a > b > C > F > -F > Feneric > Generic > (14 rows) > > The first row is a single space, the next row a single -, the next two -- . > What I don't understand is why the '-F', the ' Feneric' and > the ' Generic' sort where they do. > > I would expect the output to be like this: > > data > ---------- > > Feneric > Generic > - > -- > -F > 1 > 11 > 11F > 1F > a > b > C > F > (14 rows) > lc_collate > ------------- > en_US.UTF-8 PostgreSQL uses the OS collations. What you are getting matches my Ubuntu 14.04 machine: kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG en_US.UTF-8 kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX > > Feneric > Generic > - > -- > -F > 1 > 11 > 11F > 1F > a > b > C > F > XXX - -- 1 11 11F 1F a b C F -F Feneric Generic -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 08/07/2014 11:36 AM, Kevin Grittner wrote: > Steve Clark <sclark@netwolves.com> wrote: > >> I am confused by how postgres 8,4..13 is sorting my data. >> select * from test order by data; >> data >> ---------- >> >> - >> -- >> 1 >> 11 >> 11F >> 1F >> a >> b >> C >> F >> -F >> Feneric >> Generic >> (14 rows) >> >> The first row is a single space, the next row a single -, the next two -- . >> What I don't understand is why the '-F', the ' Feneric' and >> the ' Generic' sort where they do. >> >> I would expect the output to be like this: >> >> data >> ---------- >> >> Feneric >> Generic >> - >> -- >> -F >> 1 >> 11 >> 11F >> 1F >> a >> b >> C >> F >> (14 rows) >> lc_collate >> ------------- >> en_US.UTF-8 > PostgreSQL uses the OS collations. What you are getting matches my > Ubuntu 14.04 machine: > > kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG > en_US.UTF-8 > kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX >> Feneric >> Generic >> - >> -- >> -F >> 1 >> 11 >> 11F >> 1F >> a >> b >> C >> F >> XXX > - > -- > 1 > 11 > 11F > 1F > a > b > C > F > -F > Feneric > Generic > > -- > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Hi Kevin, Thanks for the response. I get the same thing as postgres if I sort from the command line too. But I don't understand why. I would expect '-F' to sort immediately after '-' and '--' not after 'F' as well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort immediately after ' ' (space). It is like the space character and the - in -Letter is ignored. -- Stephen Clark
On 08/07/2014 08:53 AM, Steve Clark wrote: > On 08/07/2014 11:36 AM, Kevin Grittner wrote: >> Steve Clark <sclark@netwolves.com> wrote: >> >> >> -- >> Kevin Grittner >> EDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> > Hi Kevin, > > Thanks for the response. I get the same thing as postgres if I sort from > the command line too. But I don't understand why. Because that is how en_US.UTF-8 sorts. You are probably looking for the 'C' sort order. For more explanations see: http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html http://superuser.com/questions/227925/in-utf-8-collation-why-11-is-less-then-1 > > I would expect '-F' to sort immediately after '-' and '--' not after 'F' as > well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort > immediately after ' ' (space). > > It is like the space character and the - in -Letter is ignored. See the second link above for why that is. > > -- Adrian Klaver adrian.klaver@aklaver.com
Steve Clark <sclark@netwolves.com> wrote: > It is like the space character and the - in -Letter is ignored. Yes, that is how the en_US collation is defined. I think the goal is to make it something like "phone book" ordering. If you still have a "white pages" book around, look at how a business name with a hyphen (or a hyphenated last name) is sorted there. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 08/07/2014 12:10 PM, Adrian Klaver wrote: > On 08/07/2014 08:53 AM, Steve Clark wrote: >> On 08/07/2014 11:36 AM, Kevin Grittner wrote: >>> Steve Clark <sclark@netwolves.com> wrote: >>> >>> -- >>> Kevin Grittner >>> EDB: http://www.enterprisedb.com >>> The Enterprise PostgreSQL Company >>> >> Hi Kevin, >> >> Thanks for the response. I get the same thing as postgres if I sort from >> the command line too. But I don't understand why. > Because that is how en_US.UTF-8 sorts. You are probably looking for the > 'C' sort order. For more explanations see: > > http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html > > http://superuser.com/questions/227925/in-utf-8-collation-why-11-is-less-then-1 > >> I would expect '-F' to sort immediately after '-' and '--' not after 'F' as >> well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort >> immediately after ' ' (space). >> >> It is like the space character and the - in -Letter is ignored. > See the second link above for why that is. > >> > Thanks Adrian and Kevin, for the explanations and links. They were very elucidating. At least now I understand what is happening. Thanks again. -- Stephen Clark
On 08/07/2014 11:36 AM, Kevin Grittner wrote: > Steve Clark <sclark@netwolves.com> wrote: > >> I am confused by how postgres 8,4..13 is sorting my data. >> select * from test order by data; >> data >> ---------- >> >> - >> -- >> 1 >> 11 >> 11F >> 1F >> a >> b >> C >> F >> -F >> Feneric >> Generic >> (14 rows) >> >> The first row is a single space, the next row a single -, the next two -- . >> What I don't understand is why the '-F', the ' Feneric' and >> the ' Generic' sort where they do. >> >> I would expect the output to be like this: >> >> data >> ---------- >> >> Feneric >> Generic >> - >> -- >> -F >> 1 >> 11 >> 11F >> 1F >> a >> b >> C >> F >> (14 rows) >> lc_collate >> ------------- >> en_US.UTF-8 > PostgreSQL uses the OS collations. What you are getting matches my > Ubuntu 14.04 machine: > > kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG > en_US.UTF-8 > kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX >> Feneric >> Generic >> - >> -- >> -F >> 1 >> 11 >> 11F >> 1F >> a >> b >> C >> F >> XXX > - > -- > 1 > 11 > 11F > 1F > a > b > C > F > -F > Feneric > Generic > > -- > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Hi Kevin, Thanks for the response. I get the same thing as postgres if I sort from the command line too. But I don't understand why. I would expect '-F' to sort immediately after '-' and '--' not after 'F' as well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort immediately after ' ' (space). It is like the space character and the - in -Letter is ignored. -- Stephen Clark