Обсуждение: Query optimizing - paradox behave

Поиск
Список
Период
Сортировка

Query optimizing - paradox behave

От
"David M. Richter"
Дата:
(Here again; my email adress was killed)

Hallo !

I want to tune a database. There a many redundant datas in the database
, because of all the relations were consider as n:m relations. But the
most of them are 1:n Relations. So my approach was to cut the
redundancies to get more performance. But .. happens!

The query with the 3 tables is faster than the query with 2 tables.

That is paradox to the Explain output.
And: the real database functions like dbPG95GetIndex and all functions
defined by me are slower.
The whole program is slower than before.
I disabled all the index.(since with index the behavior is the same) The
database pacs ist only restructured. They have the same data. With
database pacs and compare a vacuum was made.
I looked at the user time , since system time is faked because my
testprogram hands over the control to the postmaster and the postmaster
is doing his own work. So I made a lot of tests to get a average
usertime. So escapes will be catched and eliminated.

Here are the tabledescriptions for the original database "compare":

tables i.e.
There is a n:m relationship between patient and study realized with
relpatient_study000 relationtable.

                    Table "patient"
      Attribute       |          Type          | Modifier
----------------------+------------------------+----------
 chilioid             | character varying(80)  |
 name                 | text                   |
 id                   | character varying(256) |
 birthdate            | date                   |
 birthtime            | time                   |
 sex                  | character(1)           |
 medicalrecordlocator | character varying(128) |

                       Table "study"
       Attribute        |          Type          | Modifier
------------------------+------------------------+----------
 chilioid               | character varying(80)  |
 instanceuid            | character varying(64)  |
 id                     | character varying(64)  |
 studydate              | date                   |
 studytime              | time                   |
 modality               | character varying(2)   |
 manufacturer           | character varying(128) |
 referingphysician      | text                   |
 description            | character varying(128) |
 manufacturersmodelname | character varying(128) |
 importtime             | double precision       |
 chilisenderid          | character varying(80)  |
 accessionnumber        | character varying(64)  |
 institutionname        | character varying(128) |
 workflowstate          | character varying(8)   |
 flags                  | character varying(8)   |
 performingphysician    | character varying(128) |
 reportingphysician     | character varying(128) |


         Table "relpatient_study000"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 chilioid  | character varying(80) |
 parentoid | character varying(80) |
 childoid  | character varying(80) |

 parentoid is here the oid of the patient and childoid is here the oid
of the study.

Thats the query with the original database "compare":

time psql -d compare -c "SELECT patient.*,study.* FROM
patient,study,relpatient_study000 r0 WHERE
(patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
patient.name using <" > 3tableni
1.671u 0.130s 0:11.14 16.4%     0+0k 0+0io 208pf+0w

psql -d compare -c "EXPLAIN SELECT patient.*,study.* FROM
patient,study,relpatient_study000 r0 WHERE
(patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
patient.name using <" > 3tableni
NOTICE:  QUERY PLAN:

Sort  (cost=1135170635.79..1135170635.79 rows=748802386 width=296)
  ->  Merge Join  (cost=1025510.64..1057837.48 rows=748802386 width=296)
        ->  Sort  (cost=1017989.22..1017989.22 rows=2556861 width=96)
              ->  Merge Join  (cost=4287.84..4763.21 rows=2556861
width=96)
                    ->  Sort  (cost=990.43..990.43 rows=8725 width=72)
                          ->  Seq Scan on patient  (cost=0.00..212.25
rows=8725 width=72)
                    ->  Sort  (cost=3297.40..3297.40 rows=29305
width=24)
                          ->  Seq Scan on relpatient_study000 r0
(cost=0.00..774.05 rows=29305 width=24)
        ->  Sort  (cost=7521.42..7521.42 rows=29286 width=200)
              ->  Seq Scan on study  (cost=0.00..1116.86 rows=29286
width=200)

-----------------------------



Thats the query with the new restructured database "pacs":

time psql -d pacs -c "SELECT patient.*,study.* FROM patient,study WHERE
(patient.chiliOID=study.patientOID ) order by patient.name using <" >
2tableni
1.730u 0.120s 0:09.44 19.5%     0+0k 0+0io 208pf+0w


psql -d pacs -c "EXPLAIN SELECT patient.*,study.* FROM patient,study
WHERE (patient.chiliOID=study.patientOID ) order by patient.name using
<" > 2tableni       NOTICE:  QUERY PLAN:

Sort  (cost=2194791.19..2194791.19 rows=2555204 width=284)
  ->  Merge Join  (cost=8978.44..9453.57 rows=2555204 width=284)
        ->  Sort  (cost=990.43..990.43 rows=8725 width=72)
              ->  Seq Scan on patient  (cost=0.00..212.25 rows=8725
width=72)
        ->  Sort  (cost=7988.00..7988.00 rows=29286 width=212)
              ->  Seq Scan on study  (cost=0.00..1236.86 rows=29286
width=212)


Restructured tables i.e.
PAtient-study relationship is 1:n realized with column patientoid in
table study.

                    Table "patient"
      Attribute       |          Type          | Modifier
----------------------+------------------------+----------
 chilioid             | character varying(80)  |
 name                 | text                   |
 id                   | character varying(256) |
 birthdate            | date                   |
 birthtime            | time                   |
 sex                  | character(1)           |
 medicalrecordlocator | character varying(128) |



Table "study"
       Attribute        |          Type          | Modifier
------------------------+------------------------+----------
 chilioid               | character varying(80)  |
 instanceuid            | character varying(64)  |
 id                     | character varying(64)  |
 studydate              | date                   |
 studytime              | time                   |
 modality               | character varying(2)   |
 manufacturer           | character varying(128) |
 referingphysician      | text                   |
 description            | character varying(128) |
 manufacturersmodelname | character varying(128) |
 importtime             | double precision       |
 chilisenderid          | character varying(80)  |
 accessionnumber        | character varying(64)  |
 institutionname        | character varying(128) |
 workflowstate          | character varying(8)   |
 flags                  | character varying(8)   |
 performingphysician    | character varying(128) |
 reportingphysician     | character varying(128) |
 patientoid             | character varying(80)  |


The times of the processes are escape-eliminated by statistical methods.

I determined that the "compare" database is 8% times faster than the new
restructured "pacs" database.
How can I understand this? Whats my mistake?

Anybody  who can make some sugestions on the above will
receive my enthusiastic gratitude

David M. Richter
Вложения

Re: Query optimizing - paradox behave

От
Tom Lane
Дата:
"David M. Richter" <David.M.Richter@freenet.de> writes:
> The query with the 3 tables is faster than the query with 2 tables. 

How you figure that?

> time psql -d compare -c "SELECT patient.*,study.* FROM
> patient,study,relpatient_study000 r0 WHERE
> (patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
> patient.name using <" > 3tableni
> 1.671u 0.130s 0:11.14 16.4%     0+0k 0+0io 208pf+0w

> time psql -d pacs -c "SELECT patient.*,study.* FROM patient,study WHERE
> (patient.chiliOID=study.patientOID ) order by patient.name using <" >
> 2tableni
> 1.730u 0.120s 0:09.44 19.5%     0+0k 0+0io 208pf+0w

9.44 vs 11.14 seconds looks like a clear advantage for the second query
to me...
        regards, tom lane


Re: Query optimizing - paradox behave

От
"David M. Richter"
Дата:
Hallo!

Thanks a lot to You Tom. I stared only at the user and the system time.
I didn't found any C-function, with wich I could measure the whole time,
so I used getrusage(). I did not recognize, that the whole time is
reduced, because the User time is increased.
Could that anybody explain to me? Why is the usertime increased and the
whole time is decreased?
#
Anyway ..
Thanks all a lot for Your effort.
I will now tune my radiology-database further...

Thankful Greetings

David

"David M. Richter" <David.M.Richter@freenet.de> writes:
> The query with the 3 tables is faster than the query with 2 tables.

How you figure that?

> time psql -d compare -c "SELECT patient.*,study.* FROM
> patient,study,relpatient_study000 r0 WHERE
> (patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
> patient.name using <" > 3tableni
> 1.671u 0.130s 0:11.14 16.4%     0+0k 0+0io 208pf+0w

> time psql -d pacs -c "SELECT patient.*,study.* FROM patient,study WHERE
> (patient.chiliOID=study.patientOID ) order by patient.name using <" >
> 2tableni
> 1.730u 0.120s 0:09.44 19.5%     0+0k 0+0io 208pf+0w

9.44 vs 11.14 seconds looks like a clear advantage for the second query
to me...
Вложения