Обсуждение: Correlating Asterisk CDRs

Поиск
Список
Период
Сортировка

Correlating Asterisk CDRs

От
"Raj Mathur (राज माथुर)"
Дата:
Hi,

I'm trying to correlate Call Data Records (CDRs) from two Asterisk
servers, one of which uses the other for telephony.  The data is in
the tables cdr and cdr2.  With some indexes, the query and explain
result are:

explain analyse select cdr.calldate, cdr2.calldate,
(cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst, cdr2.src,
cdr2.dst, cdr2.dstchannel, cdr2.lastapp, cdr2.duration,
cdr2.disposition from cdr, cdr2 where cdr2.calldate >= cdr.calldate
and cdr.clid=cdr2.clid and cdr.dst=substring(cdr2.dst from 4) order by
cdr.calldate, cdr2.calldate, cdr.clid limit 100;
                                                            QUERY PLAN
               
 

---------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=46782.15..46782.40 rows=100 width=109) (actual time=4077.866..4078.054 
 
rows=100 loops=1)  ->  Sort  (cost=46782.15..46785.33 rows=1272 width=109) (actual time=4077.863..4077.926 
rows=100 loops=1)        Sort Key: cdr.calldate, cdr2.calldate, cdr.clid        Sort Method:  top-N heapsort  Memory:
42kB       ->  Merge Join  (cost=2.95..46733.54 rows=1272 width=109) (actual 
 
time=0.070..3799.546 rows=168307 loops=1)              Merge Cond: (((cdr.clid)::text = (cdr2.clid)::text) AND
((cdr.dst)::text= 
 
"substring"((cdr2.dst)::text, 4)))              Join Filter: (cdr2.calldate >= cdr.calldate)              ->  Index
Scanusing ick1 on cdr  (cost=0.00..34667.86 rows=208798 
 
width=43) (actual time=0.022..434.246 rows=208798 loops=1)              ->  Index Scan using i2k1 on cdr2
(cost=0.00..9960.89rows=65449 width=88) 
 
(actual time=0.011..391.599 rows=240981 loops=1)Total runtime: 4078.184 ms
(10 rows)

Is there any way to make this query faster?  I already have an index
i2k1 on substring(cdr2.dst from 4), which is being used.

Application
-----------

I'm looking for all caller records in cdr2 that have the same callerid
(clid) and destination (dst) and were started on cdr2 after they were
started on cdr.  cdr2.dst is the same as cdr.dst but with a
3-character prefix.

Regards,

-- Raj
-- 
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F


Re: Correlating Asterisk CDRs

От
Julien Cigar
Дата:
Try to raise work_mem

On 12/07/2011 15:34, Raj Mathur (राज माथुर) wrote:
> Hi,
>
> I'm trying to correlate Call Data Records (CDRs) from two Asterisk
> servers, one of which uses the other for telephony.  The data is in
> the tables cdr and cdr2.  With some indexes, the query and explain
> result are:
>
> explain analyse select cdr.calldate, cdr2.calldate,
> (cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst, cdr2.src,
> cdr2.dst, cdr2.dstchannel, cdr2.lastapp, cdr2.duration,
> cdr2.disposition from cdr, cdr2 where cdr2.calldate>= cdr.calldate
> and cdr.clid=cdr2.clid and cdr.dst=substring(cdr2.dst from 4) order by
> cdr.calldate, cdr2.calldate, cdr.clid limit 100;
>
>                                                               QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual time=4077.866..4078.054
> rows=100 loops=1)
>     ->   Sort  (cost=46782.15..46785.33 rows=1272 width=109) (actual time=4077.863..4077.926
> rows=100 loops=1)
>           Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
>           Sort Method:  top-N heapsort  Memory: 42kB
>           ->   Merge Join  (cost=2.95..46733.54 rows=1272 width=109) (actual
> time=0.070..3799.546 rows=168307 loops=1)
>                 Merge Cond: (((cdr.clid)::text = (cdr2.clid)::text) AND ((cdr.dst)::text =
> "substring"((cdr2.dst)::text, 4)))
>                 Join Filter: (cdr2.calldate>= cdr.calldate)
>                 ->   Index Scan using ick1 on cdr  (cost=0.00..34667.86 rows=208798
> width=43) (actual time=0.022..434.246 rows=208798 loops=1)
>                 ->   Index Scan using i2k1 on cdr2  (cost=0.00..9960.89 rows=65449 width=88)
> (actual time=0.011..391.599 rows=240981 loops=1)
>   Total runtime: 4078.184 ms
> (10 rows)
>
> Is there any way to make this query faster?  I already have an index
> i2k1 on substring(cdr2.dst from 4), which is being used.
>
> Application
> -----------
>
> I'm looking for all caller records in cdr2 that have the same callerid
> (clid) and destination (dst) and were started on cdr2 after they were
> started on cdr.  cdr2.dst is the same as cdr.dst but with a
> 3-character prefix.
>
> Regards,
>
> -- Raj


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Вложения

Re: Correlating Asterisk CDRs

От
"Raj Mathur (राज माथुर)"
Дата:
On Wednesday 07 Dec 2011, Julien Cigar wrote:
> Try to raise work_mem

Odd, I tried adding work_mem=50MB / 256MB / 1024MB into postgres.conf
and the times actually went up to over 12 seconds.  Leaving it commented
results in the 4-second time originally posted.

Regards,

-- Raj

> On 12/07/2011 15:34, Raj Mathur (राज माथुर) wrote:
> > I'm trying to correlate Call Data Records (CDRs) from two Asterisk
> > servers, one of which uses the other for telephony.  The data is in
> > the tables cdr and cdr2.  With some indexes, the query and explain
> > result are:
> >
> > explain analyse select cdr.calldate, cdr2.calldate,
> > (cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst, cdr2.src,
> > cdr2.dst, cdr2.dstchannel, cdr2.lastapp, cdr2.duration,
> > cdr2.disposition from cdr, cdr2 where cdr2.calldate>= cdr.calldate
> > and cdr.clid=cdr2.clid and cdr.dst=substring(cdr2.dst from 4) order
> > by cdr.calldate, cdr2.calldate, cdr.clid limit 100;
> >
> >                                                               QUERY
PLAN
> >
> > -------------------------------------------------------------------
> > -------------------------------------------------------------------
> > -------
> >
> >   Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual
> >   time=4077.866..4078.054
> >
> > rows=100 loops=1)
> >
> >     ->   Sort  (cost=46782.15..46785.33 rows=1272 width=109)
> >     (actual time=4077.863..4077.926
> >
> > rows=100 loops=1)
> >
> >           Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
> >           Sort Method:  top-N heapsort  Memory: 42kB
> >           ->   Merge Join  (cost=2.95..46733.54 rows=1272
> >           width=109) (actual
> >
> > time=0.070..3799.546 rows=168307 loops=1)
> >
> >                 Merge Cond: (((cdr.clid)::text = (cdr2.clid)::text)
> >                 AND ((cdr.dst)::text =
> >
> > "substring"((cdr2.dst)::text, 4)))
> >
> >                 Join Filter: (cdr2.calldate>= cdr.calldate)
> >                 ->   Index Scan using ick1 on cdr
> >                 (cost=0.00..34667.86 rows=208798
> >
> > width=43) (actual time=0.022..434.246 rows=208798 loops=1)
> >
> >                 ->   Index Scan using i2k1 on cdr2
> >                 (cost=0.00..9960.89 rows=65449 width=88)
> >
> > (actual time=0.011..391.599 rows=240981 loops=1)
> >
> >   Total runtime: 4078.184 ms
> >
> > (10 rows)
> >
> > Is there any way to make this query faster?  I already have an
> > index i2k1 on substring(cdr2.dst from 4), which is being used.
> >
> > Application
> > -----------
> >
> > I'm looking for all caller records in cdr2 that have the same
> > callerid (clid) and destination (dst) and were started on cdr2
> > after they were started on cdr.  cdr2.dst is the same as cdr.dst
> > but with a 3-character prefix.

--
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F


Re: Correlating Asterisk CDRs

От
Julien Cigar
Дата:
On 12/07/2011 16:14, Raj Mathur (राज माथुर) wrote:
> On Wednesday 07 Dec 2011, Julien Cigar wrote:
>> Try to raise work_mem
>
> Odd, I tried adding work_mem=50MB / 256MB / 1024MB into postgres.conf
> and the times actually went up to over 12 seconds.  Leaving it commented
> results in the 4-second time originally posted.

sorry I replied too fast.. There is no external disk merge so increasing
work_mem is useless. Problem is the "merge join". How big is the table?
Are the statistics up to date?

>
> Regards,
>
> -- Raj
>
>> On 12/07/2011 15:34, Raj Mathur (राज माथुर) wrote:
>>> I'm trying to correlate Call Data Records (CDRs) from two Asterisk
>>> servers, one of which uses the other for telephony.  The data is in
>>> the tables cdr and cdr2.  With some indexes, the query and explain
>>> result are:
>>>
>>> explain analyse select cdr.calldate, cdr2.calldate,
>>> (cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst, cdr2.src,
>>> cdr2.dst, cdr2.dstchannel, cdr2.lastapp, cdr2.duration,
>>> cdr2.disposition from cdr, cdr2 where cdr2.calldate>= cdr.calldate
>>> and cdr.clid=cdr2.clid and cdr.dst=substring(cdr2.dst from 4) order
>>> by cdr.calldate, cdr2.calldate, cdr.clid limit 100;
>>>
>>>                                                                QUERY
> PLAN
>>>
>>> -------------------------------------------------------------------
>>> -------------------------------------------------------------------
>>> -------
>>>
>>>    Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual
>>>    time=4077.866..4078.054
>>>
>>> rows=100 loops=1)
>>>
>>>      ->    Sort  (cost=46782.15..46785.33 rows=1272 width=109)
>>>      (actual time=4077.863..4077.926
>>>
>>> rows=100 loops=1)
>>>
>>>            Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
>>>            Sort Method:  top-N heapsort  Memory: 42kB
>>>            ->    Merge Join  (cost=2.95..46733.54 rows=1272
>>>            width=109) (actual
>>>
>>> time=0.070..3799.546 rows=168307 loops=1)
>>>
>>>                  Merge Cond: (((cdr.clid)::text = (cdr2.clid)::text)
>>>                  AND ((cdr.dst)::text =
>>>
>>> "substring"((cdr2.dst)::text, 4)))
>>>
>>>                  Join Filter: (cdr2.calldate>= cdr.calldate)
>>>                  ->    Index Scan using ick1 on cdr
>>>                  (cost=0.00..34667.86 rows=208798
>>>
>>> width=43) (actual time=0.022..434.246 rows=208798 loops=1)
>>>
>>>                  ->    Index Scan using i2k1 on cdr2
>>>                  (cost=0.00..9960.89 rows=65449 width=88)
>>>
>>> (actual time=0.011..391.599 rows=240981 loops=1)
>>>
>>>    Total runtime: 4078.184 ms
>>>
>>> (10 rows)
>>>
>>> Is there any way to make this query faster?  I already have an
>>> index i2k1 on substring(cdr2.dst from 4), which is being used.
>>>
>>> Application
>>> -----------
>>>
>>> I'm looking for all caller records in cdr2 that have the same
>>> callerid (clid) and destination (dst) and were started on cdr2
>>> after they were started on cdr.  cdr2.dst is the same as cdr.dst
>>> but with a 3-character prefix.
>


--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Вложения

[SOLVED] Re: Correlating Asterisk CDRs

От
"Raj Mathur (राज माथुर)"
Дата:
On Wednesday 07 Dec 2011, Raj Mathur (राज माथुर) wrote:
> On Wednesday 07 Dec 2011, Julien Cigar wrote:
> > Try to raise work_mem
>
> Odd, I tried adding work_mem=50MB / 256MB / 1024MB into postgres.conf
> and the times actually went up to over 12 seconds.  Leaving it
> commented results in the 4-second time originally posted.

Seems that the time is constant at around 13 seconds once you go above
200 records or so, which is acceptable.  Thanks for your help.

Regards,

-- Raj

> > On 12/07/2011 15:34, Raj Mathur (राज माथुर) wrote:
> > > I'm trying to correlate Call Data Records (CDRs) from two
> > > Asterisk servers, one of which uses the other for telephony.
> > > The data is in the tables cdr and cdr2.  With some indexes, the
> > > query and explain result are:
> > >
> > > explain analyse select cdr.calldate, cdr2.calldate,
> > > (cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst,
> > > cdr2.src, cdr2.dst, cdr2.dstchannel, cdr2.lastapp,
> > > cdr2.duration,
> > > cdr2.disposition from cdr, cdr2 where cdr2.calldate>=
> > > cdr.calldate and cdr.clid=cdr2.clid and
> > > cdr.dst=substring(cdr2.dst from 4) order by cdr.calldate,
> > > cdr2.calldate, cdr.clid limit 100;
> > >
> > >
QUERY
>
> PLAN
>
> > > -----------------------------------------------------------------
> > > --
> > > ----------------------------------------------------------------
> > > --- -------
> > >
> > >   Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual
> > >   time=4077.866..4078.054
> > >
> > > rows=100 loops=1)
> > >
> > >     ->   Sort  (cost=46782.15..46785.33 rows=1272 width=109)
> > >     (actual time=4077.863..4077.926
> > >
> > > rows=100 loops=1)
> > >
> > >           Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
> > >           Sort Method:  top-N heapsort  Memory: 42kB
> > >           ->   Merge Join  (cost=2.95..46733.54 rows=1272
> > >           width=109) (actual
> > >
> > > time=0.070..3799.546 rows=168307 loops=1)
> > >
> > >                 Merge Cond: (((cdr.clid)::text =
> > >                 (cdr2.clid)::text) AND ((cdr.dst)::text =
> > >
> > > "substring"((cdr2.dst)::text, 4)))
> > >
> > >                 Join Filter: (cdr2.calldate>= cdr.calldate)
> > >                 ->   Index Scan using ick1 on cdr
> > >                 (cost=0.00..34667.86 rows=208798
> > >
> > > width=43) (actual time=0.022..434.246 rows=208798 loops=1)
> > >
> > >                 ->   Index Scan using i2k1 on cdr2
> > >                 (cost=0.00..9960.89 rows=65449 width=88)
> > >
> > > (actual time=0.011..391.599 rows=240981 loops=1)
> > >
> > >   Total runtime: 4078.184 ms
> > >
> > > (10 rows)
> > >
> > > Is there any way to make this query faster?  I already have an
> > > index i2k1 on substring(cdr2.dst from 4), which is being used.
> > >
> > > Application
> > > -----------
> > >
> > > I'm looking for all caller records in cdr2 that have the same
> > > callerid (clid) and destination (dst) and were started on cdr2
> > > after they were started on cdr.  cdr2.dst is the same as cdr.dst
> > > but with a 3-character prefix.


--
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F


Re: Correlating Asterisk CDRs

От
Brent Dombrowski
Дата:
On Dec 7, 2011, at 6:34 AM, Raj Mathur (राज माथुर) wrote:

> Hi,
>
> I'm trying to correlate Call Data Records (CDRs) from two Asterisk
> servers, one of which uses the other for telephony.  The data is in
> the tables cdr and cdr2.  With some indexes, the query and explain
> result are:
>
> explain analyse select cdr.calldate, cdr2.calldate,
> (cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst, cdr2.src,
> cdr2.dst, cdr2.dstchannel, cdr2.lastapp, cdr2.duration,
> cdr2.disposition from cdr, cdr2 where cdr2.calldate >= cdr.calldate
> and cdr.clid=cdr2.clid and cdr.dst=substring(cdr2.dst from 4) order by
> cdr.calldate, cdr2.calldate, cdr.clid limit 100;
>
>                                                             QUERY PLAN
                  
>
---------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual time=4077.866..4078.054
> rows=100 loops=1)
>   ->  Sort  (cost=46782.15..46785.33 rows=1272 width=109) (actual time=4077.863..4077.926
> rows=100 loops=1)
>         Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
>         Sort Method:  top-N heapsort  Memory: 42kB
>         ->  Merge Join  (cost=2.95..46733.54 rows=1272 width=109) (actual
> time=0.070..3799.546 rows=168307 loops=1)
>               Merge Cond: (((cdr.clid)::text = (cdr2.clid)::text) AND ((cdr.dst)::text =
> "substring"((cdr2.dst)::text, 4)))
>               Join Filter: (cdr2.calldate >= cdr.calldate)
>               ->  Index Scan using ick1 on cdr  (cost=0.00..34667.86 rows=208798
> width=43) (actual time=0.022..434.246 rows=208798 loops=1)
>               ->  Index Scan using i2k1 on cdr2  (cost=0.00..9960.89 rows=65449 width=88)
> (actual time=0.011..391.599 rows=240981 loops=1)
> Total runtime: 4078.184 ms
> (10 rows)
>
> Is there any way to make this query faster?  I already have an index
> i2k1 on substring(cdr2.dst from 4), which is being used.
>
> Application
> -----------
>
> I'm looking for all caller records in cdr2 that have the same callerid
> (clid) and destination (dst) and were started on cdr2 after they were
> started on cdr.  cdr2.dst is the same as cdr.dst but with a
> 3-character prefix.
>
> Regards,
>
> -- Raj
> --
> Raj Mathur                          || raju@kandalaya.org   || GPG:
> http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
> It is the mind that moves           || http://schizoid.in   || D17F
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql


You have a non-equi join in there (cdr2.calldate >= cdr.calldate). I would try to get rid of that. It's increasing the
numberof rows in the result set and will only get worse as your data set grows. 

Brent.



Re: Correlating Asterisk CDRs

От
"Raj Mathur (राज माथुर)"
Дата:
On Wednesday 07 Dec 2011, Brent Dombrowski wrote:
> On Dec 7, 2011, at 6:34 AM, Raj Mathur (राज माथुर) wrote:
> > I'm trying to correlate Call Data Records (CDRs) from two Asterisk
> > servers, one of which uses the other for telephony.  The data is in
> > the tables cdr and cdr2.  With some indexes, the query and explain
> > result are:
> >
> > explain analyse select cdr.calldate, cdr2.calldate,
> > (cdr2.calldate-cdr.calldate) as rtdur, cdr.clid, cdr.dst, cdr2.src,
> > cdr2.dst, cdr2.dstchannel, cdr2.lastapp, cdr2.duration,
> > cdr2.disposition from cdr, cdr2 where cdr2.calldate >= cdr.calldate
> > and cdr.clid=cdr2.clid and cdr.dst=substring(cdr2.dst from 4) order
> > by cdr.calldate, cdr2.calldate, cdr.clid limit 100;
> >
> >                                                             QUERY
PLAN
> >
> > -------------------------------------------------------------------
> > -------------------------------------------------------------------
> > ------- Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual
> > time=4077.866..4078.054 rows=100 loops=1)
> >
> >   ->  Sort  (cost=46782.15..46785.33 rows=1272 width=109) (actual
> >   time=4077.863..4077.926
> >
> > rows=100 loops=1)
> >
> >         Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
> >         Sort Method:  top-N heapsort  Memory: 42kB
> >         ->  Merge Join  (cost=2.95..46733.54 rows=1272 width=109)
> >         (actual
> >
> > time=0.070..3799.546 rows=168307 loops=1)
> >
> >               Merge Cond: (((cdr.clid)::text = (cdr2.clid)::text)
> >               AND ((cdr.dst)::text =
> >
> > "substring"((cdr2.dst)::text, 4)))
> >
> >               Join Filter: (cdr2.calldate >= cdr.calldate)
> >               ->  Index Scan using ick1 on cdr
> >               (cost=0.00..34667.86 rows=208798
> >
> > width=43) (actual time=0.022..434.246 rows=208798 loops=1)
> >
> >               ->  Index Scan using i2k1 on cdr2
> >               (cost=0.00..9960.89 rows=65449 width=88)
> >
> > (actual time=0.011..391.599 rows=240981 loops=1)
> > Total runtime: 4078.184 ms
> > (10 rows)
> >
> > Is there any way to make this query faster?  I already have an
> > index i2k1 on substring(cdr2.dst from 4), which is being used.
>
> You have a non-equi join in there (cdr2.calldate >= cdr.calldate). I
> would try to get rid of that. It's increasing the number of rows in
> the result set and will only get worse as your data set grows.

Seen.  At the moment (and in the foreseeable future) the query is going
to be run with a condition of the form "cdr.calldate >=
date_trunc('day', now())", so the set of matches from cdr2 will remain
relatively constant.

However, you're right, the scope of cdr2 calldates also ought to be time
limited.  Have added an "and cdr2.calldate < cdr.calldate + interval '1
day'" to the query.  Thanks.

Regards,

-- Raj
--
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F


Re: Correlating Asterisk CDRs

От
Scott Marlowe
Дата:
2011/12/7 Raj Mathur (राज माथुर) <raju@linux-delhi.org>:
>                                                             QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=46782.15..46782.40 rows=100 width=109) (actual time=4077.866..4078.054
> rows=100 loops=1)
>   ->  Sort  (cost=46782.15..46785.33 rows=1272 width=109) (actual time=4077.863..4077.926
> rows=100 loops=1)
>         Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
>         Sort Method:  top-N heapsort  Memory: 42kB
>         ->  Merge Join  (cost=2.95..46733.54 rows=1272 width=109) (actual
> time=0.070..3799.546 rows=168307 loops=1)

Two things to look at here.  First is that the estimation of rows
expected and returned vary by a factor over over 100, which means the
query planner may be making suboptimal choices in terms of the plan it
is running.  If increasing stats target on the target columns in the
query helps, then that's worth trying.  Raise it and re-analyze and
see if you get a closer estimate.  To test if the merge join is the
best choice or not, you can use the set enable_xxx for it (in this
case set enable_mergejoin=off) and then run the query again through
explain analyze and see if the performance gets any better.


Re: Correlating Asterisk CDRs

От
"Raj Mathur (राज माथुर)"
Дата:
On Thursday 08 Dec 2011, Scott Marlowe wrote:
> 2011/12/7 Raj Mathur (राज माथुर) <raju@linux-delhi.org>:
> >                                                             QUERY
> > PLAN
> > ------------------------------------------------------------------
> > -------------------------------------------------------------------
> > -------- Limit  (cost=46782.15..46782.40 rows=100 width=109)
> > (actual time=4077.866..4078.054 rows=100 loops=1)
> >   ->  Sort  (cost=46782.15..46785.33 rows=1272 width=109) (actual
> > time=4077.863..4077.926 rows=100 loops=1)
> >         Sort Key: cdr.calldate, cdr2.calldate, cdr.clid
> >         Sort Method:  top-N heapsort  Memory: 42kB
> >         ->  Merge Join  (cost=2.95..46733.54 rows=1272 width=109)
> > (actual time=0.070..3799.546 rows=168307 loops=1)
>
> Two things to look at here.  First is that the estimation of rows
> expected and returned vary by a factor over over 100, which means the
> query planner may be making suboptimal choices in terms of the plan
> it is running.  If increasing stats target on the target columns in
> the query helps, then that's worth trying.  Raise it and re-analyze
> and see if you get a closer estimate.  To test if the merge join is
> the best choice or not, you can use the set enable_xxx for it (in
> this case set enable_mergejoin=off) and then run the query again
> through explain analyze and see if the performance gets any better.

Fixed the first -- all it needed was a vacuum analyse, and the performance
improved by 50%.  Enabling/disabling mergejoin doesn't seem to make any
difference to the timing.  However, after the vacuum analyse the planner
is now using:
Limit  (cost=37499.24..37502.08 rows=1138 width=109) (actual time=6355.308..6709.661
rows=168307 loops=1)  ->  Sort  (cost=37499.24..37502.08 rows=1138 width=109) (actual time=6355.304..6491.595
rows=168307 loops=1)        Sort Key: cdr.calldate, cdr2.calldate, cdr.clid        Sort Method:  quicksort  Memory:
45211kB       ->  Merge Join  (cost=34720.94..37441.47 rows=1138 width=109) (actual  
time=3438.318..5853.947 rows=168307 loops=1)              Merge Cond: (((cdr.dst)::text =
("substring"((cdr2.dst)::text,4))) AND  
((cdr.clid)::text = (cdr2.clid)::text))              Join Filter: (cdr2.calldate >= cdr.calldate)              ->  Sort
(cost=26987.11..27509.10 rows=208798 width=43) (actual  
time=2631.166..2833.926 rows=208748 loops=1)                    Sort Key: cdr.dst, cdr.clid                    Sort
Method: quicksort  Memory: 19696kB                    ->  Seq Scan on cdr  (cost=0.00..8537.98 rows=208798 width=43)  
(actual time=0.009..211.330 rows=208798 loops=1)              ->  Sort  (cost=7684.78..7848.41 rows=65449 width=89)
(actual 
time=807.031..991.649 rows=240981 loops=1)                    Sort Key: ("substring"((cdr2.dst)::text, 4)), cdr2.clid
                Sort Method:  quicksort  Memory: 9889kB                    ->  Seq Scan on cdr2  (cost=0.00..2449.49
rows=65449width=89)  
(actual time=0.021..125.630 rows=65449 loops=1)Total runtime: 6823.029 ms

Can you see any place here where adding indexes may help?

Regards,

-- Raj
--
Raj Mathur                          || raju@kandalaya.org   || GPG:
http://otheronepercent.blogspot.com || http://kandalaya.org || CC68
It is the mind that moves           || http://schizoid.in   || D17F