Обсуждение: window function help

Поиск
Список
Период
Сортировка

window function help

От
"Schnabel, Robert D."
Дата:
Hi,

I'm trying to figure out how to count the number of rows within a fixed range of the current row value.  My table looks
likethis: 

SELECT chr_pos
FROM mutations_crosstab_9615_99
WHERE bta = 38
LIMIT 10

chr_pos
138
140
163
174
187
187
188
208
210
213

chr_pos is integer and represents the base pair position along a chromosome.

It looks to me like a window function would be appropriate but I cannot figure out the correct syntax.  What I want to
dois count the number of rows within +/- 20 of chr_pos (the current row).  Given the above example, for chr_pos = 138 I
wantthe count of rows between 118 and 158.  For chr_pos 187 I want the count of rows between 167 and 207 etc for all
rows. The result I'm looking for should look like the following: 

chr_pos,num_variants
138,2
140,2
163,2
174,4
187,4
187,4
188,4
208,6
210,3
213,1

Is there a way to do this with a window function? Any help would be appreciated.

Thanks
Bob




Re: window function help

От
Andy Colson
Дата:
On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
> Hi,
>
> I'm trying to figure out how to count the number of rows within a fixed range of the current row value.  My table
lookslike this: 
>
> SELECT chr_pos
> FROM mutations_crosstab_9615_99
> WHERE bta = 38
> LIMIT 10
>
> chr_pos
> 138
> 140
> 163
> 174
> 187
> 187
> 188
> 208
> 210
> 213
>
> chr_pos is integer and represents the base pair position along a chromosome.
>
> It looks to me like a window function would be appropriate but I cannot figure out the correct syntax.  What I want
todo is count the number of rows within +/- 20 of chr_pos (the current row).  Given the above example, for chr_pos =
138I want the count of rows between 118 and 158.  For chr_pos 187 I want the count of rows between 167 and 207 etc for
allrows.  The result I'm looking for should look like the following: 
>
> chr_pos,num_variants
> 138,2
> 140,2
> 163,2
> 174,4
> 187,4
> 187,4
> 188,4
> 208,6
> 210,3
> 213,1
>
> Is there a way to do this with a window function? Any help would be appreciated.
>
> Thanks
> Bob
>
>
>
>

Don't think a window function is needed, how about this:

select chr_pos, (
   select count(*)
   from mutant b
   where b.chr_pos between a.chr_pos-20 and a.chr_pos+20
   )
from mutant a;

Here's what I get.  I dont remember if "between" is inclusive on both
sides or not, but you can change it to suit your needs.

This is the answer I got, which is different than yours, but I think its
right.


  chr_pos | count
---------+-------
      138 |     2
      140 |     2
      163 |     2
      174 |     4
      187 |     3
      188 |     4
      208 |     5
      210 |     4
      212 |     4
      213 |     4
(10 rows)


-Andy


Re: window function help

От
David Johnston
Дата:
Andy Colson wrote
> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
>> I'm trying to figure out how to count the number of rows within a fixed
>> range of the current row value.  My table looks like this:
>>
>> SELECT chr_pos
>> FROM mutations_crosstab_9615_99
>> WHERE bta = 38
>> LIMIT 10
>>
>> chr_pos
>> 138
>> 140
>> 163
>> 174
>> 187
>> 187
>> 188
>> 208
>> 210
>> 213
>>
>
> This is the answer I got, which is different than yours, but I think its
> right.
>
>
>   chr_pos | count
> ---------+-------
>       138 |     2
>       140 |     2
>       163 |     2
>       174 |     4
>       187 |     3
>       188 |     4
>       208 |     5
>       210 |     4
>       212 |     4
>       213 |     4
> (10 rows)

Same concept as mine - but I'm not sure where the "212" came from and you
did not duplicate the "187" that was present in the original.

The OP wanted to show the duplicate row - which yours does and mine does not
- but depending on how many duplicates there are having to run the same
effective query multiple times knowing you will always get the same result
seems inefficient.  Better to query over a distinct set of values and then,
if needed, join that back onto the original dataset.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/window-function-help-tp5798526p5798542.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: window function help

От
"Schnabel, Robert D."
Дата:
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of David Johnston
Sent: Thursday, April 03, 2014 11:09 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] window function help

Andy Colson wrote
> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
>> I'm trying to figure out how to count the number of rows within a
>> fixed range of the current row value.  My table looks like this:
>>
>> SELECT chr_pos
>> FROM mutations_crosstab_9615_99
>> WHERE bta = 38
>> LIMIT 10
>>
>> chr_pos
>> 138
>> 140
>> 163
>> 174
>> 187
>> 187
>> 188
>> 208
>> 210
>> 213
>>
>
> This is the answer I got, which is different than yours, but I think
> its right.
>
>
>   chr_pos | count
> ---------+-------
>       138 |     2
>       140 |     2
>       163 |     2
>       174 |     4
>       187 |     3
>       188 |     4
>       208 |     5
>       210 |     4
>       212 |     4
>       213 |     4
> (10 rows)

Same concept as mine - but I'm not sure where the "212" came from and you did not duplicate the "187" that was present
inthe original. 

The OP wanted to show the duplicate row - which yours does and mine does not
- but depending on how many duplicates there are having to run the same effective query multiple times knowing you will
alwaysget the same result seems inefficient.  Better to query over a distinct set of values and then, if needed, join
thatback onto the original dataset. 

David J.

Thanks. I had considered this strategy initially but didn't actually try it because I figured it would be too slow and
Iknew from previous experience with window functions that they are much faster than queries of this nature.  My largest
chromosomehas about 6M position and this ran in 69 seconds which is acceptable since I'll only be doing this
infrequently. I should be able to handle it form here.  Thanks again for pointing me in the right direction. 

Bob




Re: window function help

От
Andy Colson
Дата:
On 4/3/2014 11:09 AM, David Johnston wrote:
> Andy Colson wrote
>> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
>>> I'm trying to figure out how to count the number of rows within a fixed
>>> range of the current row value.  My table looks like this:
>>>
>>> SELECT chr_pos
>>> FROM mutations_crosstab_9615_99
>>> WHERE bta = 38
>>> LIMIT 10
>>>
>>> chr_pos
>>> 138
>>> 140
>>> 163
>>> 174
>>> 187
>>> 187
>>> 188
>>> 208
>>> 210
>>> 213
>>>
>>
>> This is the answer I got, which is different than yours, but I think its
>> right.
>>
>>
>>    chr_pos | count
>> ---------+-------
>>        138 |     2
>>        140 |     2
>>        163 |     2
>>        174 |     4
>>        187 |     3
>>        188 |     4
>>        208 |     5
>>        210 |     4
>>        212 |     4
>>        213 |     4
>> (10 rows)
>
> Same concept as mine - but I'm not sure where the "212" came from and you
> did not duplicate the "187" that was present in the original.
>
> The OP wanted to show the duplicate row - which yours does and mine does not
> - but depending on how many duplicates there are having to run the same
> effective query multiple times knowing you will always get the same result
> seems inefficient.  Better to query over a distinct set of values and then,
> if needed, join that back onto the original dataset.
>
> David J.
>



 > Same concept as mine - but I'm not sure where the "212" came from and you
 > did not duplicate the "187" that was present in the original.

Ah, data entry error.  I didn't even notice.  Oops.

 > The OP wanted to show the duplicate row - which yours does and mine
does not

Did you post a sql statement?  I didn't seem to get it.

 > - but depending on how many duplicates there are having to run the same

Agreed.  If there are a lot of dups, we could probably speed this up.

-Andy



Re: window function help

От
David Johnston
Дата:
Andy Colson wrote
> On 4/3/2014 11:09 AM, David Johnston wrote:
>> Andy Colson wrote
>>> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
>>>> I'm trying to figure out how to count the number of rows within a fixed
>>>> range of the current row value.  My table looks like this:
>>>>
>>>> SELECT chr_pos
>>>> FROM mutations_crosstab_9615_99
>>>> WHERE bta = 38
>>>> LIMIT 10
>>>>
>>>> chr_pos
>>>> 138
>>>> 140
>>>> 163
>>>> 174
>>>> 187
>>>> 187
>>>> 188
>>>> 208
>>>> 210
>>>> 213
>>>>
>>>
>>> This is the answer I got, which is different than yours, but I think its
>>> right.
>>>
>>>
>>>    chr_pos | count
>>> ---------+-------
>>>        138 |     2
>>>        140 |     2
>>>        163 |     2
>>>        174 |     4
>>>        187 |     3
>>>        188 |     4
>>>        208 |     5
>>>        210 |     4
>>>        212 |     4
>>>        213 |     4
>>> (10 rows)
>>
>> Same concept as mine - but I'm not sure where the "212" came from and you
>> did not duplicate the "187" that was present in the original.
>>
>> The OP wanted to show the duplicate row - which yours does and mine does
>> not
>> - but depending on how many duplicates there are having to run the same
>> effective query multiple times knowing you will always get the same
>> result
>> seems inefficient.  Better to query over a distinct set of values and
>> then,
>> if needed, join that back onto the original dataset.
>>
>> David J.
>>
>
>
>
>  > Same concept as mine - but I'm not sure where the "212" came from and
> you
>  > did not duplicate the "187" that was present in the original.
>
> Ah, data entry error.  I didn't even notice.  Oops.
>
>  > The OP wanted to show the duplicate row - which yours does and mine
> does not
>
> Did you post a sql statement?  I didn't seem to get it.
>
>  > - but depending on how many duplicates there are having to run the same
>
> Agreed.  If there are a lot of dups, we could probably speed this up.
>
> -Andy

My original seems to be held up for some reason...

Let me try again:

WITH val (value) AS (
VALUES (138),(140),(163),(174),(187),(187),(188),(208),(210),(213)
)
SELECT value, (SELECT count(*) FROM val AS valcheck WHERE valcheck.value
BETWEEN src.value - 20 AND src.value + 20)
FROM (
SELECT DISTINCT value FROM val
) src
ORDER BY 1;




--
View this message in context: http://postgresql.1045698.n5.nabble.com/window-function-help-tp5798526p5798565.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.