Обсуждение: Weird behaviour on a join with multiple keys
Hi, I'm getting unexpected results on a query which involves joining two tables on two common variables (firstname and lastname). This is the basic query: SELECT table1.lastname, table1.firstname FROM table1 INNER JOIN table2 ON (table2.name = table1.name AND table2.vorname = table1.vorname) This is returning many rows fewer than I expect and is ignoring a lot where table1.firstname = table2.firstname AND table1.lastname = table2.lastname. In fact when I extend the query by a WHERE clause such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are not returned by the original query. I'm not very au fait with the inner workings of PostgreSQL but EXPLAIN does not seem, to me at least, to provide an explanation for the missing results. "Merge Join (cost=1987.97..2121.24 rows=34 width=22)" " Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND ("outer"."?column4?" = "inner"."?column4?"))" " -> Sort (cost=364.97..375.99 rows=4409 width=22)" " Sort Key: (table1.lastname)::text, (table1.firstname)::text" " -> Seq Scan on table1 (cost=0.00..98.09 rows=4409 width=22)" " -> Sort (cost=1623.00..1667.00 rows=17599 width=21)" " Sort Key: (table2.lastname)::text, (table2.firstname)::text" " -> Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)" Am I missing something big and obvious here? Charlie -- Charlie Clark Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-938-5360 GSM: +49-178-782-6226
What happens if you do an outer join instead of an inner join? Charlie Clark wrote: > Hi, > > I'm getting unexpected results on a query which involves joining two > tables on two common variables (firstname and lastname). > > This is the basic query: > > SELECT table1.lastname, table1.firstname > FROM table1 > INNER JOIN table2 ON > (table2.name = table1.name > AND > table2.vorname = table1.vorname) > > This is returning many rows fewer than I expect and is ignoring a lot > where table1.firstname = table2.firstname AND table1.lastname = > table2.lastname. In fact when I extend the query by a WHERE clause > such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are > not returned by the original query. > > I'm not very au fait with the inner workings of PostgreSQL but EXPLAIN > does not seem, to me at least, to provide an explanation for the > missing results. > > "Merge Join (cost=1987.97..2121.24 rows=34 width=22)" > " Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND > ("outer"."?column4?" = "inner"."?column4?"))" > " -> Sort (cost=364.97..375.99 rows=4409 width=22)" > " Sort Key: (table1.lastname)::text, (table1.firstname)::text" > " -> Seq Scan on table1 (cost=0.00..98.09 rows=4409 width=22)" > " -> Sort (cost=1623.00..1667.00 rows=17599 width=21)" > " Sort Key: (table2.lastname)::text, (table2.firstname)::text" > " -> Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)" > > Am I missing something big and obvious here? > > Charlie > -- > Charlie Clark > Helmholtzstr. 20 > Düsseldorf > D- 40215 > Tel: +49-211-938-5360 > GSM: +49-178-782-6226 > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/08/07 16:09, Charlie Clark wrote: > Hi, > > I'm getting unexpected results on a query which involves joining two > tables on two common variables (firstname and lastname). > > This is the basic query: > > SELECT table1.lastname, table1.firstname > FROM table1 > INNER JOIN table2 ON > (table2.name = table1.name > AND > table2.vorname = table1.vorname) > > This is returning many rows fewer than I expect and is ignoring a lot > where table1.firstname = table2.firstname AND table1.lastname = > table2.lastname. Huh? Why should you? You're not joining on firstname and lastname. What happens if you do it like this: SELECT T1.LASTNAME, T2.FIRSTNAME FROM TABLE1 T1, TABLE2 T2 WHERE T1.NAME = T2.NAME AND T1.VORNAME = T2.VORNAME; -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFF8N1zS9HxQb37XmcRAp3wAKCRJ1kuoqbc8YPOZwx+53+JRqvD/ACfVvFy zK8u0+RYuMiBxEnURVc74Jc= =9oxj -----END PGP SIGNATURE-----
Charlie Clark <charlie@begeistert.org> writes: > I'm getting unexpected results on a query which involves joining two > tables on two common variables (firstname and lastname). That looks like it should work. Given that you describe the columns as "names" I'm supposing they are of textual datatypes. Maybe you have a messed-up encoding or locale situation that is causing the sorts to not work properly? What PG version is this exactly, on what platform, and what do "show lc_collate" and "show server_encoding" say? regards, tom lane
Am 09.03.2007 um 05:30 schrieb Tom Lane: > Charlie Clark <charlie@begeistert.org> writes: >> I'm getting unexpected results on a query which involves joining two >> tables on two common variables (firstname and lastname). > > That looks like it should work. Given that you describe the > columns as > "names" I'm supposing they are of textual datatypes. Maybe you have a > messed-up encoding or locale situation that is causing the sorts to > not > work properly? What PG version is this exactly, on what platform, and > what do "show lc_collate" and "show server_encoding" say? I'm running PostgreSQL 8.1.4 on Mac OS X psytec=# show lc_collate; lc_collate ------------- de_DE.UTF-8 (1 row) psytec=# show server_encoding; server_encoding ----------------- LATIN1 (1 row) I thought that it might be something to do with the encoding - one of the tables has just been imported and I had some "fun" doing that but it "looks" okay now. Is there a way of checking? Charlie -- Charlie Clark Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-938-5360 GSM: +49-178-782-6226
Charlie Clark <charlie@begeistert.org> writes: > psytec=# show lc_collate; > lc_collate > ------------- > de_DE.UTF-8 > (1 row) > psytec=# show server_encoding; > server_encoding > ----------------- > LATIN1 > (1 row) There's your problem right there. The string comparison routines are built on strcoll(), which is going to expect UTF8-encoded data because of the LC_COLLATE setting. If there are any high-bit-set LATIN1 characters in the database, they will most likely look like invalid encoding to strcoll(), and on most platforms that causes it to behave very oddly. You need to keep lc_collate (and lc_ctype) in sync with server_encoding. regards, tom lane
Am 09.03.2007 um 16:15 schrieb Tom Lane: >> psytec=# show lc_collate; >> lc_collate >> ------------- >> de_DE.UTF-8 >> (1 row) > >> psytec=# show server_encoding; >> server_encoding >> ----------------- >> LATIN1 >> (1 row) > > There's your problem right there. The string comparison routines are > built on strcoll(), which is going to expect UTF8-encoded data because > of the LC_COLLATE setting. If there are any high-bit-set LATIN1 > characters in the database, they will most likely look like invalid > encoding to strcoll(), and on most platforms that causes it to behave > very oddly. You need to keep lc_collate (and lc_ctype) in sync with > server_encoding. That does indeed seem to have been the problem even though the examples I was looking at were all using plain ASCII characters. Glad to know it wasn't a bug and to have learned something new. Charlie -- Charlie Clark Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-938-5360 GSM: +49-178-782-6226
Charlie Clark <charlie@begeistert.org> writes: > Am 09.03.2007 um 16:15 schrieb Tom Lane: >> There's your problem right there. The string comparison routines are >> built on strcoll(), which is going to expect UTF8-encoded data because >> of the LC_COLLATE setting. If there are any high-bit-set LATIN1 >> characters in the database, they will most likely look like invalid >> encoding to strcoll(), and on most platforms that causes it to behave >> very oddly. You need to keep lc_collate (and lc_ctype) in sync with >> server_encoding. > That does indeed seem to have been the problem even though the > examples I was looking at were all using plain ASCII characters. Glad > to know it wasn't a bug and to have learned something new. Well, it *is* a bug: we really shouldn't let you select incompatible locale and encoding settings. This gotcha has been known for a long time, but it's not clear that there's a bulletproof, portable way to determine which encoding a particular locale setting implies ... regards, tom lane