Обсуждение: Selecting top N percent of records.

Поиск
Список
Период
Сортировка

Selecting top N percent of records.

От
Tim Uckun
Дата:
Is there a way to select the top 10% of the values from a column?

For example the top 10% best selling items where number of sales is a column.

Thanks.

Re: Selecting top N percent of records.

От
Darren Duncan
Дата:
Tim Uckun wrote:
> Is there a way to select the top 10% of the values from a column?
>
> For example the top 10% best selling items where number of sales is a column.

The top 10% would be a variable number of records.  Is that what you want, or
would you rather, say, just see the top N items?

The latter is easy, just something like this:

   SELECT * FROM mytbl ORDER BY num_sales DESC LIMIT $1

... where $1 is the number of records you want.

To actually get 10% of the records, you can replace the $1 with a subquery,
something like this probably:

   SELECT * FROM mytbl ORDER BY num_sales DESC
     LIMIT (SELECT (count(*) / 10) AS selnum FROM mytbl)

There are probably other ways to do it also.

Maybe you want all items whose sales are in the top 90 percentile or something,
or maybe you want what generated the most profit, etc.

-- Darren Duncan

Re: Selecting top N percent of records.

От
Peter Geoghegan
Дата:
On 18 October 2010 00:33, Tim Uckun <timuckun@gmail.com> wrote:
> Is there a way to select the top 10% of the values from a column?
>
> For example the top 10% best selling items where number of sales is a column.
>

That is a bit problematic because it necessitates knowing the number
of rows total, and slow counting is an idiosyncrasy of postgres.

http://wiki.postgresql.org/wiki/Slow_Counting

To get the top 10%:

SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table)

--
Regards,
Peter Geoghegan

Re: Selecting top N percent of records.

От
Tim Uckun
Дата:
On Mon, Oct 18, 2010 at 12:53 PM, Darren Duncan <darren@darrenduncan.net> wrote:
> Tim Uckun wrote:
>>
>> Is there a way to select the top 10% of the values from a column?
>>
>> For example the top 10% best selling items where number of sales is a
>> column.
>
> The top 10% would be a variable number of records.  Is that what you want,
> or would you rather, say, just see the top N items?

Yes this is what I want. I don't care how many records I pull up (well
I might want to put a LIMIT on them). I want the top 10%

Re: Selecting top N percent of records.

От
Tim Uckun
Дата:
> That is a bit problematic because it necessitates knowing the number
> of rows total, and slow counting is an idiosyncrasy of postgres.
>
> http://wiki.postgresql.org/wiki/Slow_Counting
>
> To get the top 10%:
>
> SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table)


I think I wasn't making myself clear. I don't want the top 10% of the
rows. I want the rows with the top 10% of the values in a column.

In my case there is a very non linear set of values. The lowest value
is 1 and the vast majority of records have a 1 in the column. The
highest value might be in the tens of thousands.  I want to pull up
the records that have the top ten percent values.

Re: Selecting top N percent of records.

От
Craig Ringer
Дата:
On 10/18/2010 08:06 AM, Tim Uckun wrote:
>> That is a bit problematic because it necessitates knowing the number
>> of rows total, and slow counting is an idiosyncrasy of postgres.
>>
>> http://wiki.postgresql.org/wiki/Slow_Counting
>>
>> To get the top 10%:
>>
>> SELECT * FROM table LIMIT(SELECT (COUNT(*) * 0.1)::integer FROM table)
>
>
> I think I wasn't making myself clear. I don't want the top 10% of the
> rows. I want the rows with the top 10% of the values in a column.

OK, so you want a median-style "sort them in descending order and count
down until you've selected the first 10% of rows" approach? In other
words, values in the 90th percentile of the distribution?

Try this. Given table "x" with single integer column "y", obtain rows of
x in the 90th percentile of y:

select ranked.y FROM (select percent_rank() over (order by y desc) as
pc, y from x) AS ranked WHERE pc <= 0.1;

or:

select ranked.y from (select ntile(10) over (order by y desc) as pc, y
from x) AS ranked WHERE pc = 1;

See:

http://www.postgresql.org/docs/current/static/functions-window.html


Both of these seem to produce odd results with small input row counts.
Test carefully before trusting these expressions, as I'm quite new to
the use of window functions.

--
Craig Ringer

Re: Selecting top N percent of records.

От
Tim Uckun
Дата:
>
> OK, so you want a median-style "sort them in descending order and count down
> until you've selected the first 10% of rows" approach? In other words,
> values in the 90th percentile of the distribution?
>
> Try this. Given table "x" with single integer column "y", obtain rows of x
> in the 90th percentile of y:
>
> select ranked.y FROM (select percent_rank() over (order by y desc) as pc, y
> from x) AS ranked WHERE pc <= 0.1;
>
> or:
>
> select ranked.y from (select ntile(10) over (order by y desc) as pc, y from
> x) AS ranked WHERE pc = 1;
>


Thanks I will look into the window functions. I haven't used them
before so thanks for the tip.

Re: Selecting top N percent of records.

От
"Brent Wood"
Дата:
Something like this should work - (but is untested), and does require the extra subquery, so there may be a more efficient way?

However, off the top of my head:

select a,b,c
from table
where <where clause>
order by c desc
limit (select count(*)/10 from table where <same where clause>);


where c is the no of sales column




Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand

>>> Tim Uckun <timuckun@gmail.com> 10/18/10 3:40 PM >>>
Is there a way to select the top 10% of the values from a column?

For example the top 10% best selling items where number of sales is a column.

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.