Discussion: Adding TEXT columns tanks performance?
Hi all,

Saturday I changed a table to add a varchar(24) and a TEXT column. It's used for some reporting purposes (small-potatoes stuff, really) and the TEXT column remains mostly empty. However, this week performance has gotten terrible. Queries joining against the aforementioned table have gone from 40s to 1500s. The schema change is the only explanation I have for the roughly 40x slower queries. The queries have definitely gotten disk-bound (I can see the connection process sitting in the D state for several minutes).

This is PostgreSQL 8.1.4 on Red Hat Linux. Would adding a TEXT column impact things that much? The change to the schema was

alter table foo add column title text;

tia,
arturo
On 02/08/07 20:50, Arturo Perez wrote:
> Saturday I changed a table to add a varchar(24) and a TEXT column.
> [snip]
> This is pg8.1.4 on Linux RedHat. Would adding a TEXT column impact
> things that much? The change to the schema was
>
> alter table foo add column title text;

Could the issue be record fragmentation? Does PG even suffer from record fragmentation?
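For what it's worth, the closest thing Postgres has to "record fragmentation" is dead-row bloat from update/delete churn, and that is straightforward to check. A sketch, using the table name from the original post:

```sql
-- VACUUM VERBOSE reports how many dead row versions the table holds;
-- heavy update/delete churn without vacuuming shows up here as bloat.
VACUUM VERBOSE foo;

-- Comparing reltuples to relpages also hints at bloat: a page count
-- far out of proportion to the live row count suggests dead space.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'foo';
```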
On 2/8/07, Arturo Perez <aperez@hayesinc.com> wrote:
> Saturday I changed a table to add a varchar(24) and a TEXT column.
> [snip]
> This is pg8.1.4 on Linux RedHat. Would adding a TEXT column impact
> things that much? The change to the schema was
>
> alter table foo add column title text;

explain analyze, please.

There's no reason for adding a TEXT column to do that, especially since you didn't give the column a DEFAULT, which would have rewritten the entire table.

merlin
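For reference, a minimal sketch of the distinction Merlin is drawing (table and column names are illustrative):

```sql
-- In 8.1, adding a column with no default is a catalog-only change:
-- existing rows are untouched on disk and simply read back as NULL.
ALTER TABLE foo ADD COLUMN title text;

-- Adding a column WITH a default, by contrast, rewrites every existing
-- row to store the value, which touches the whole table on disk.
ALTER TABLE foo ADD COLUMN title text DEFAULT '';
```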
On Feb 9, 2007, at 11:43 AM, Merlin Moncure wrote:
> explain analyze, please.
>
> no reason for adding text column to do that. especially since you
> didn't default the column which would effectively update the entire
> table.

Here's the explain analyze. Note that this query was never very fast, but it has gotten nearly two orders of magnitude slower.

This is 8.1.4 on Linux with nothing special in terms of disks (I think it's a mirrored system drive); it's a hosted-environment kind of thing. shared_buffers is 160MB (20000 pages) and effective_cache_size is 1GB. The user_tracking table has about 2M rows and the extended_user table has about 6K.

I'm not very good at reading these, but it looks like sort memory might be too low? work_mem is 1024, the default. I did "set session work_mem to 100000;" in psql but it still takes quite a while.
iht=> explain analyze SELECT session_id, action, count(ACTION) as hits
iht-> FROM extended_user LEFT JOIN user_tracking USING (user_id)
iht-> WHERE subscription_id = 1147
iht-> GROUP BY session_id, action
iht-> HAVING count(ACTION) > 0;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=172717.49..173695.46 rows=32599 width=60) (actual time=411713.041..411761.857 rows=7309 loops=1)
   Filter: (count("action") > 0)
   ->  Sort  (cost=172717.49..172798.99 rows=32599 width=60) (actual time=411712.907..411732.032 rows=16012 loops=1)
         Sort Key: user_tracking.session_id, user_tracking."action"
         ->  Merge Left Join  (cost=0.00..169571.78 rows=32599 width=60) (actual time=147593.828..411070.706 rows=16012 loops=1)
               Merge Cond: ("outer".user_id = "inner".user_id)
               ->  Index Scan using extended_user_pkey on extended_user  (cost=0.00..236.92 rows=117 width=4) (actual time=1.627..154.499 rows=119 loops=1)
                     Filter: (subscription_id = 1147)
               ->  Index Scan using user_tracking_user_id_idx on user_tracking  (cost=0.00..164008.04 rows=2000218 width=64) (actual time=0.010..408731.064 rows=2000620 loops=1)
 Total runtime: 411781.174 ms
(10 rows)
Arturo Perez <aperez@hayesinc.com> writes:
> Saturday I changed a table to add a varchar(24) and a TEXT column.

You didn't actually say which of these tables you changed?

> I'm not very good at reading these but it looks like sort memory might
> be too low?

The runtime seems to be entirely in the index scan on user_tracking. I'm surprised it doesn't do something to avoid a full-table indexscan --- in this case, hashing with extended_user as the inner relation would seem like the obvious thing. Is user_id a hashable datatype?

It's possible that adding the columns would have affected the plan by making it look like a sort or hash would take too much memory, but if that were it then your by-hand increase in work_mem should have fixed it. 'Tis odd. I don't suppose you know what plan was used before?

			regards, tom lane
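One session-local way to test the hypothesis that a hash join would win here is to steer the planner away from the merge join and compare plans. A diagnostic sketch (not a production setting), reusing the query from earlier in the thread:

```sql
-- Session-only planner toggle: see whether a hash join is considered
-- and what the planner thinks it would cost relative to the merge join.
SET enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT session_id, action, count(action) AS hits
FROM extended_user LEFT JOIN user_tracking USING (user_id)
WHERE subscription_id = 1147
GROUP BY session_id, action
HAVING count(action) > 0;

-- Restore the default afterwards.
SET enable_mergejoin = on;
```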
On Feb 10, 2007, at 12:34 PM, Tom Lane wrote:
> You didn't actually say which of these tables you changed?

Sorry, I changed extended_user.

> The runtime seems to be entirely in the index scan on user_tracking.
> I'm surprised it doesn't do something to avoid a full-table indexscan
> --- in this case, hashing with extended_user as the inner relation
> would seem like the obvious thing. Is user_id a hashable datatype?

user_id is an integer. Here are the table definitions, since this seems like a problem that won't go away anytime soon.

              Table "public.extended_user"
      Column       |            Type             | Modifiers
-------------------+-----------------------------+-----------
 create_date       | timestamp without time zone | not null
 email             | character varying(99)       |
 first_name        | character varying(99)       | not null
 last_name         | character varying(99)       | not null
 license_agreement | boolean                     | not null
 license_date      | timestamp without time zone |
 password          | character varying(32)       | not null
 subscription_id   | integer                     | not null
 user_id           | integer                     | not null
 user_name         | character varying(99)       | not null
 active            | boolean                     | not null
 phone             | character varying(24)       |
 title             | text                        |
Indexes:
    "extended_user_pkey" PRIMARY KEY, btree (user_id) CLUSTER
    "user_name_uq" UNIQUE, btree (user_name)
    "extended_user_subscription_id_idx" btree (subscription_id)
Foreign-key constraints:
    "extended_user_subscription_id_fkey" FOREIGN KEY (subscription_id) REFERENCES subscription(subscription_id) DEFERRABLE INITIALLY DEFERRED

\d user_tracking
              Table "public.user_tracking"
      Column      |            Type             | Modifiers
------------------+-----------------------------+--------------------------------------------------------------------------
 action           | character varying(255)      | not null
 entry_date       | timestamp without time zone | not null
 note             | text                        |
 report_id        | integer                     |
 session_id       | character varying(255)      | not null
 user_id          | integer                     |
 user_tracking_id | integer                     | not null default nextval('user_tracking_user_tracking_id_seq'::regclass)
Indexes:
    "user_tracking_pkey" PRIMARY KEY, btree (user_tracking_id)
    "user_tracking_monthly_idx" btree (date_part('year'::text, entry_date), date_part('month'::text, entry_date))
    "user_tracking_quarterly_idx" btree (date_part('year'::text, entry_date), date_part('quarter'::text, entry_date))
    "user_tracking_report_id_idx" btree (report_id)
    "user_tracking_user_id_idx" btree (user_id)
Foreign-key constraints:
    "user_tracking_report_id_fkey" FOREIGN KEY (report_id) REFERENCES article(article_id) DEFERRABLE INITIALLY DEFERRED
    "user_tracking_user_id_fkey" FOREIGN KEY (user_id) REFERENCES extended_user(user_id) DEFERRABLE INITIALLY DEFERRED

> It's possible that adding the columns would have affected the plan by
> making it look like a sort or hash would take too much memory, but if
> that were it then your hand increase in work_mem should have fixed it.
> Tis odd. I don't suppose you know what plan was used before?

No, sorry.

Further information: on disk the user_tracking table is over 500MB. I can't increase shared_buffers (currently 20000) because of SHMMAX limits (for now -- scheduled outage and all that).

Any suggestions on how to improve the situation?

tia,
-arturo
On Feb 10, 2007, at 12:34 PM, Tom Lane wrote:
> It's possible that adding the columns would have affected the plan by
> making it look like a sort or hash would take too much memory, but if
> that were it then your hand increase in work_mem should have fixed it.
> Tis odd. I don't suppose you know what plan was used before?

I did this and now the thing is nicely faster:

iht=> alter table user_tracking alter column user_id set statistics 500;
ALTER TABLE
iht=> analyze user_tracking;
ANALYZE
iht=> explain analyze SELECT session_id, action, count(ACTION) as hits
iht-> FROM extended_user LEFT JOIN user_tracking USING (user_id)
iht-> WHERE subscription_id = 1147
iht-> GROUP BY session_id, action
iht-> HAVING count(ACTION) > 0;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=125961.69..127082.82 rows=37371 width=60) (actual time=679.115..725.317 rows=7312 loops=1)
   Filter: (count("action") > 0)
   ->  Sort  (cost=125961.69..126055.12 rows=37371 width=60) (actual time=679.067..697.588 rows=16017 loops=1)
         Sort Key: user_tracking.session_id, user_tracking."action"
         ->  Nested Loop Left Join  (cost=5.64..122319.43 rows=37371 width=60) (actual time=0.160..118.177 rows=16017 loops=1)
               ->  Index Scan using extended_user_subscription_id_idx on extended_user  (cost=0.00..161.08 rows=134 width=4) (actual time=0.066..1.289 rows=119 loops=1)
                     Index Cond: (subscription_id = 1147)
               ->  Bitmap Heap Scan on user_tracking  (cost=5.64..905.77 rows=469 width=64) (actual time=0.162..0.730 rows=135 loops=119)
                     Recheck Cond: ("outer".user_id = user_tracking.user_id)
                     ->  Bitmap Index Scan on user_tracking_user_id_idx  (cost=0.00..5.64 rows=469 width=0) (actual time=0.139..0.139 rows=135 loops=119)
                           Index Cond: ("outer".user_id = user_tracking.user_id)
 Total runtime: 732.520 ms
(12 rows)

thanks all,
arturo
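For anyone hitting the same problem: after raising a per-column statistics target and re-analyzing, the new target and the resulting planner estimates can be inspected in the catalogs. A sketch, assuming the tables discussed above:

```sql
-- Confirm the new per-column statistics target took effect.
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'user_tracking'::regclass
  AND attname = 'user_id';

-- After ANALYZE, pg_stats shows what the planner now believes about
-- the column's distribution (distinct values, common values, etc.).
SELECT n_distinct, most_common_vals IS NOT NULL AS has_mcv
FROM pg_stats
WHERE tablename = 'user_tracking'
  AND attname = 'user_id';
```

The underlying issue is that a small ANALYZE sample of a skewed 2M-row column can badly misestimate per-user row counts; a larger target samples more rows and gives the planner a more accurate picture, here tipping it from the full-table merge join to the nested-loop bitmap plan.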