Обсуждение: Aggregate functions on groups
Tables hold data on fish counts by stream name, species, and (unreported) collection dates. I'm trying to write a query that returns the total number of each species in each stream. The latest attempt is (lines wrapped by alpine; submitted as one line): \copy (select f.stream_tribs, f.count_value, sum(f.count_value), i.common_name, i.sci_name from fish_counts as f, itis as i where f.stream_tribs like '%Nehalem River%' group by f.stream_tribs, i.common_name, i.sci_name, f.count_value order by f.stream_tribs, i.common_name, i.sci_name, f.count_value) to '/home/rshepard/projects/oregon/mohler_sand/data/fish/fishes.dat'; The returned set starts this way: Anderson Creek trib to Nehalem River 0 0 Black crappie Pomoxis nigromaculatus Anderson Creek trib to Nehalem River 3 3 Black crappie Pomoxis nigromaculatus Anderson Creek trib to Nehalem River 0 0 Bluegill Lepomis macrochirus Anderson Creek trib to Nehalem River 3 3 Bluegill Lepomis macrochirus Anderson Creek trib to Nehalem River 0 0 Brook trout Salvelinus fontinalis Anderson Creek trib to Nehalem River 3 3 Brook trout Salvelinus fontinalis Anderson Creek trib to Nehalem River 0 0 Brown bullhead Ameiurus nebulosus Anderson Creek trib to Nehalem River 3 3 Brown bullhead Ameiurus nebulosus What I want returned would look like this: Anderson Creek trib to Nehalem River Black crappie Pomoxis nigromaculatus 3 Anderson Creek trib to Nehalem River Bluegill Lepomis macrochirus 3 Anderson Creek trib to Nehalem River Brook trout Salvelinus fontinalis 3 Anderson Creek trib to Nehalem River Brown bullhead Ameiurus nebulosus 3 I've read the manual yet must have not seen the section explaining how to apply aggregate functions to groups. Thanks in advance, Rich
On Fri, 30 Aug 2019, John W Higgins wrote: > You are grouping by count_value which means that you are asking the system > to return a row for each different count_value. John, I didn't realize this. > So if you remove the f.count_value from the select statement (not the > sum(f.count_value)) - and you remove f.count_value from the group_by and > order_by statements - you should get what you want Aha. I thought I had to select f.count_value in order to obtain sum(f.count_value); it's been a long time since I needed to do something like this. > Something like > > \copy (select f.stream_tribs, sum(f.count_value), > i.common_name, i.sci_name from fish_counts as f, itis as i where > f.stream_tribs like '%Nehalem River%' group by f.stream_tribs, > i.common_name, i.sci_name order by f.stream_tribs, > i.common_name, i.sci_name) to > '/home/rshepard/projects/oregon/mohler_sand/data/fish/fishes.dat'; Thanks very much! Rich
Your tributaries and fish master tables make sense. If I read your code right, you're grouping by too many columns. I flattened the data into a survey table for this simple example:
select tributary,
common_name,
scientific_name,
sum(count_value) as fish_seen,
count(count_value) as observations_made
from survey
group by 1,2,3 -- The GROUP BY clause can use positions on the select list, if you feel like typing less.
tributary common_name scientific_name fish_seen observations_made
Anderson Creek trib to Nehalem River Black crappie Pomoxis nigromaculatus 3 2
Anderson Creek trib to Nehalem River Brook trout Salvelinus fontinalis 3 2
Anderson Creek trib to Nehalem River Bluegill Lepomis macrochirus 3 2
Anderson Creek trib to Nehalem River Brown bullhead Ameiurus nebulosus 3 2
But this is not why I'm answering. I'm responding as I wanted to make sure that you're aware of the pg-similarity extension:
This tool implements a *lot* of similarity measures for fuzzy cmparisons. Some are sting-oriented algorithms (Jaro-Winkler, Soundex, Levenshtein, etc.), and others derive from and/or apply to field population comparisons, like the Jaccard and Dice Coefficients. There's a lot of great stuff in the package.
On Sat, Aug 31, 2019 at 3:14 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
Tables hold data on fish counts by stream name, species, and (unreported)
collection dates. I'm trying to write a query that returns the total number
of each species in each stream.
The latest attempt is (lines wrapped by alpine; submitted as one line):
\copy (select f.stream_tribs, f.count_value, sum(f.count_value),
i.common_name, i.sci_name from fish_counts as f, itis as i where
f.stream_tribs like '%Nehalem River%' group by f.stream_tribs,
i.common_name, i.sci_name, f.count_value order by f.stream_tribs,
i.common_name, i.sci_name, f.count_value) to
'/home/rshepard/projects/oregon/mohler_sand/data/fish/fishes.dat';
The returned set starts this way:
Anderson Creek trib to Nehalem River 0 0 Black crappie Pomoxis nigromaculatus
Anderson Creek trib to Nehalem River 3 3 Black crappie Pomoxis nigromaculatus
Anderson Creek trib to Nehalem River 0 0 Bluegill Lepomis macrochirus
Anderson Creek trib to Nehalem River 3 3 Bluegill Lepomis macrochirus
Anderson Creek trib to Nehalem River 0 0 Brook trout Salvelinus fontinalis
Anderson Creek trib to Nehalem River 3 3 Brook trout Salvelinus fontinalis
Anderson Creek trib to Nehalem River 0 0 Brown bullhead Ameiurus nebulosus
Anderson Creek trib to Nehalem River 3 3 Brown bullhead Ameiurus nebulosus
What I want returned would look like this:
Anderson Creek trib to Nehalem River Black crappie Pomoxis nigromaculatus 3
Anderson Creek trib to Nehalem River Bluegill Lepomis macrochirus 3
Anderson Creek trib to Nehalem River Brook trout Salvelinus fontinalis 3
Anderson Creek trib to Nehalem River Brown bullhead Ameiurus nebulosus 3
I've read the manual yet must have not seen the section explaining how to
apply aggregate functions to groups.
Thanks in advance,
Rich
On Sat, 31 Aug 2019, Morris de Oryx wrote: > Your tributaries and fish master tables make sense. If I read your code > right, you're grouping by too many columns. I flattened the data into a > survey table for this simple example: Morris, I'm still learning about postgres groups. My approach is to group on the column of interest, then add more when psql tells me to do so. > select tributary, > common_name, > scientific_name, > sum(count_value) as fish_seen, > count(count_value) as observations_made > > from survey > > group by 1,2,3 -- The GROUP BY clause can use positions on the select > list, if you feel like typing less. I will look more at this approach; at first glance it appears to address one query but not all those needed. > But this is not why I'm answering. I'm responding as I wanted to make sure > that you're aware of the pg-similarity extension: > https://salsa.debian.org/postgresql/pg-similarity Thanks for the URL. I'll definintely read about similarity. Regards, Rich