Обсуждение: SQL performance issue with PostgreSQL compared to MySQL

Поиск
Список
Период
Сортировка

SQL performance issue with PostgreSQL compared to MySQL

От
Jeff Self
Дата:
This popped up on a local mailing list and I was alarmed at what results
the user was getting in regards to MySQL and PostgreSQL. It seems that
the query he was running on both databases was much faster in MySQL than
PostgreSQL. Here's what he wrote:

**************************************************************************
I'll post this message to the list, in case anyone wants to follow - or,
if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
on a DB of approx. 1.8 million rows (same data in Pgsql and in mysql)
takes 1.83min. in mysql, and 7.36min. on pgsql. The mysql db is raw...no
indexes or anything 'tuning' done.  The pgsql db is indexed on the
'authuser' field, and I've run 'analyze logfile'. The machine is a PIII
600 w/728Mb RAM - and it's definitely CPU bound (both scream up to 100%
and stay ;). As far as effecient queries, I'm not too sure how much more
efficient that query can be made  - I'm a netadmin, not a DBA :)
**************************************************************************

My guess is he hasn't optimized PostgreSQL at all on his system. I will
try and find out more from him as to what version of PostgreSQL he's
running and try and get a copy of his postgresql.conf file. But can
anyone think of how the SQL statement could be written to be more
efficient? 

-- 
Jeff Self
Information Technology Analyst
Department of Personnel
City of Newport News
2400 Washington Avenue
Newport News, VA 23607
757-926-6930



Re: SQL performance issue with PostgreSQL compared to MySQL

От
Tom Lane
Дата:
Jeff Self <jself@nngov.com> writes:
> if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
> SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
> [is slow]

What datatype are the columns used in the query?  And what PG version
are we talking about?
        regards, tom lane


Re: SQL performance issue with PostgreSQL compared to

От
"Josh Berkus"
Дата:
Jeff,
**************************************************************************
> I'll post this message to the list, in case anyone wants to follow -
> or,
> if it's too OT, let me know, and I'll stop. The query "SELECT
> authuser,
> SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY
> traffic"
> on a DB of approx. 1.8 million rows (same data in Pgsql and in mysql)
> takes 1.83min. in mysql, and 7.36min. on pgsql. The mysql db is
> raw...no
> indexes or anything 'tuning' done.

First off, this is nonsense.  One of its benefits for web developers is
that MySQL automatically indexes everything.
> The pgsql db is indexed on the
> 'authuser' field, and I've run 'analyze logfile'. 

He also needs to index the bytes field and the traffic field.  And run
VACUUM, not just ANALYZE, if this is a high-activitly table, which I
suspect.

The machine is a
> PIII
> 600 w/728Mb RAM - and it's definitely CPU bound (both scream up to
> 100%
> and stay ;). As far as effecient queries, I'm not too sure how much
> more
> efficient that query can be made  - I'm a netadmin, not a DBA :)

We can tell.  To be blunt, MySQL is the database for non-DBAs.  He
should probably stick to using it rather than Postgres, which requires
some knowledge of performance tuning and query structure.

> My guess is he hasn't optimized PostgreSQL at all on his system. I
> will
> try and find out more from him as to what version of PostgreSQL he's
> running and try and get a copy of his postgresql.conf file. But can
> anyone think of how the SQL statement could be written to be more
> efficient? 

He also needs to up his sort_mem to the max his system and load will
allow.  He's sorting 1.8 million rows.

-Josh Berkus


Re: SQL performance issue with PostgreSQL compared to MySQL

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Jeff Self <jself@nngov.com> writes:
> > if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
> > SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
> > [is slow]
> 
> What datatype are the columns used in the query?  And what PG version
> are we talking about?

My guess is that is the lack of hashing for GROUP BY aggregating that is
killing us.  TODO has:
* Add hash for evaluating GROUP BY aggregates

and I specifically remember MySQL has such a hash. The big question is
how many unique values are there in the group.  If it is large, the
missing hashing could be the problem.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: SQL performance issue with PostgreSQL compared to

От
Achilleus Mantzios
Дата:
On Wed, 19 Jun 2002, Josh Berkus wrote:

>  > The pgsql db is indexed on the
> > 'authuser' field, and I've run 'analyze logfile'. 
> 
> He also needs to index the bytes field and the traffic field.  And run
^^^^^^^

Index on an aggregate function???

I though we could only index plain normal column iscachable functions.

Am i missing something?

> 
> -Josh Berkus
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

-- 
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr



Re: SQL performance issue with PostgreSQL compared to

От
Josh Berkus
Дата:
Achilleus,

> >  > The pgsql db is indexed on the
> > > 'authuser' field, and I've run 'analyze logfile'.
> >
> > He also needs to index the bytes field and the traffic field.  And run
>                                                  ^^^^^^^
>
> Index on an aggregate function???
>
> I though we could only index plain normal column iscachable functions.

No, you're correct.   I meant just to index on the bytes field.

--
-Josh Berkus