Обсуждение: I feel the need for speed. What am I doing wrong?

Поиск
Список
Период
Сортировка

I feel the need for speed. What am I doing wrong?

От
"Dann Corbit"
Дата:
I have a query using two postgres tables.
One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called
"CNX_DS2_53_SIS_STU_OPT_FEE_TB".
I am getting 3 times slower performance than Microsoft Access when
performing a left outer join.

Here are the tables in question:

connxdatasync=# \d "CNX_DS_53_SIS_STU_OPT_FEE_TB"  Table "CNX_DS_53_SIS_STU_OPT_FEE_TB"  Attribute    |     Type      |
Modifier
----------------+---------------+----------RT_REC_KEY     | character(14) |cnxarraycolumn | integer       |CRC
 | bigint        | not null 
connxdatasync=# \d "CNX_DS2_53_SIS_STU_OPT_FEE_TB"  Table "CNX_DS2_53_SIS_STU_OPT_FEE_TB"  Attribute    |     Type
|Modifier 
----------------+---------------+----------RT_REC_KEY     | character(14) |cnxarraycolumn | integer       |CRC
 | bigint        | not null 

Here is the query:
select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
"CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
"CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY"
and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;
Creating the following index had no effect on performance!
create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY",
"cnxarraycolumn", "CRC");
Both tables had 6139062 rows of data.
In this query ... all rows of data match perfectly, so no results are
returned.

Is there a way to reformulate this query so that it will use the index?


Re: I feel the need for speed. What am I doing wrong?

От
"Nigel J. Andrews"
Дата:
Added -general list so that the next followup can remove -hackers and everyone
there will have had notice.


On Mon, 6 Jan 2003, Dann Corbit wrote:
>
> I have a query using two postgres tables.
> One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called
> "CNX_DS2_53_SIS_STU_OPT_FEE_TB".
>
> I am getting 3 times slower performance than Microsoft Access when
> performing a left outer join.
>
> ...
>
> Here is the query:
> select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
> "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
> "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY"
> and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;
>
>
> Creating the following index had no effect on performance!
> create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY",
> "cnxarraycolumn", "CRC");
>
> Both tables had 6139062 rows of data.
>
> In this query ... all rows of data match perfectly, so no results are
> returned.

I suspect you get no results because it's unlikely b.oid will be null. Are you
sure the query is how it should be since you seem to be expecting no rows to be
returned and yet your reason for that doesn't match the query as shown. Without
the oid test I'd bet you get a result set of 6139062 rows.

> Is there a way to reformulate this query so that it will use the index?

Given the above comment I'd say no since the entirety of both tables will be
tested to make the result set.

Alternatively, if the query is right try something along the lines of:

SELECT a.blah, a.foo,
  FROM a, b
  WHERE a.blah = b.blah AND a.foo = b.foo AND b.oid IS NULL

if that doesn't use a query try pushing the null test into a subselect like:

SELECT a.blah, a.foo,
  FROM a, (SELECT * FROM b WHERE oid IS NULL) b
  WHERE a.blah = b.blah AND a.foo = b.foo


After that let's hope I haven't embarrassed myself.


--
Nigel J. Andrews


Re: I feel the need for speed. What am I doing wrong?

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
> Sent: Monday, January 06, 2003 4:58 PM
> To: Dann Corbit
> Cc: pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
> Subject: Re: [HACKERS] I feel the need for speed. What am I
> doing wrong?
>
> Added -general list so that the next followup can remove
> -hackers and everyone there will have had notice.
>
>
> On Mon, 6 Jan 2003, Dann Corbit wrote:
> >
> > I have a query using two postgres tables.
> > One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called
> > "CNX_DS2_53_SIS_STU_OPT_FEE_TB".
> >
> > I am getting 3 times slower performance than Microsoft Access when
> > performing a left outer join.
> >
> > ...
> >
> > Here is the query:
> > select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
> > "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
> > "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" =
> b."RT_REC_KEY"
> > and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;
> >
> >
> > Creating the following index had no effect on performance! create
> > unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY",
> > "cnxarraycolumn", "CRC");
> >
> > Both tables had 6139062 rows of data.
> >
> > In this query ... all rows of data match perfectly, so no
> results are
> > returned.
>
> I suspect you get no results because it's unlikely b.oid will
> be null. Are you sure the query is how it should be since you
> seem to be expecting no rows to be returned and yet your
> reason for that doesn't match the query as shown. Without the
> oid test I'd bet you get a result set of 6139062 rows.

There are supposed to be no results for this particular query.
The data is unique with only RT_REC_KEY and cnxarraycolumn (I tried
using that as an index also).

The basic gist of it is like this:

I want to know where the keys match (e.g.: RT_REC_KEY + cnxarraycolumn)
but the CRC has changed (which will tell me what data has changed).
This is for a data synchronization system that uses PostgreSQL as a join
engine.  I store primary key data together with a 64 bit CRC in data
tables stored in PostgreSQL.  In this particular case, there were no
changes but there will be changes at other times.
> > Is there a way to reformulate this query so that it will use the
> > index?
>
> Given the above comment I'd say no since the entirety of both
> tables will be tested to make the result set.
>
> Alternatively, if the query is right try something along the lines of:
>
> SELECT a.blah, a.foo,
>   FROM a, b
>   WHERE a.blah = b.blah AND a.foo = b.foo AND b.oid IS NULL

OID is never null.   I don't think that this query is equivalent.  This
Oid is just the standard PostgreSQL Oid, and not some arbitrary field.
> if that doesn't use a query try pushing the null test into a
> subselect like:
>
> SELECT a.blah, a.foo,
>   FROM a, (SELECT * FROM b WHERE oid IS NULL) b
>   WHERE a.blah = b.blah AND a.foo = b.foo
>

OID is never null.   I don't think that this query is equivalent.
> After that let's hope I haven't embarrassed myself.

Probably, I did not explain myself clearly enough.


Re: I feel the need for speed. What am I doing wrong?

От
Tom Lane
Дата:
"Dann Corbit" <DCorbit@connx.com> writes:
> Creating the following index had no effect on performance!
> create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY",
> "cnxarraycolumn", "CRC");

What does EXPLAIN ANALYZE have to say about the query?  If you set
enable_seqscan = 0, does the explain result change?
        regards, tom lane


Re: I feel the need for speed. What am I doing wrong?

От
Tom Lane
Дата:
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
>> select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from
>> "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join
>> "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY"
>> and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ;

> I suspect you get no results because it's unlikely b.oid will be null.

Try "it's impossible for b.oid to be null --- unless a dummy b row is
being provided by the LEFT JOIN".  I interpret the purpose of the query
to be to look for "a" rows that have no matching "b" row.

Using OID for this is kind of cute, I guess, since it is guaranteed
not-null in a real row; he doesn't have to think about whether any of
his regular columns are not-null.

            regards, tom lane