Обсуждение: Find duplicates in a column then print Information to a file
Tools: perl 5.26 and postgresql 9.6
Goal: Display duplicate aria titles on screen and to a local file
Database name: arias
Table name: aria_precis
csv delimiter: the # symbol
arias=# \d aria_precis
Table "public.aria_precis"
Column | Type | Modifiers
-------------+------+---------
id | text | not null
aria | text |
artist | text |
a_artist | text |
album_title | text |
Indexes:
"aria_precis_pkey" PRIMARY KEY, btree (id)
Problems:
1. Can't connect aria title to id
2. Can't write discovered information to file
I know about this link but I probably didn't understand what I read there.
https://www.postgresql.org/
What I know displays the 46 duplicate aria titles of 413 entries but not the id or artist:
SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY aria HAVING COUNT(aria)>1);
When I perform the following I get (0 rows):
SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY id, aria HAVING COUNT(aria)>1);
aria
------
(0 rows)
After I get the above information how do I write that information to a file? The only thing I know writes the entire database to a file:
\COPY aria_precis TO '/home/sherman/aria_precis_2.
Thank you;
Sherman
From: Sherman Willden <operasopranos@gmail.com>
Date: Sunday, 31 December 2017 at 18:19
To: <pgsql-general@postgresql.org>
Subject: Find duplicates in a column then print Information to a file
Development Platform: Ubuntu 17.10 mainly command line work
Tools: perl 5.26 and postgresql 9.6
Goal: Display duplicate aria titles on screen and to a local file
Database name: arias
Table name: aria_precis
csv delimiter: the # symbol
arias=# \d aria_precis
Table "public.aria_precis"
Column | Type | Modifiers
-------------+------+-----------
id | text | not null
aria | text |
artist | text |
a_artist | text |
album_title | text |
Indexes:
"aria_precis_pkey" PRIMARY KEY, btree (id)
Problems:
1. Can't connect aria title to id
2. Can't write discovered information to file
I know about this link but I probably didn't understand what I read there.
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-WITH
What I know displays the 46 duplicate aria titles of 413 entries but not the id or artist:
SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY aria HAVING COUNT(aria)>1);
When I perform the following I get (0 rows):
SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY id, aria HAVING COUNT(aria)>1);
aria
------
(0 rows)
After I get the above information how do I write that information to a file? The only thing I know writes the entire database to a file:
\COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,' CSV HEADER;
Thank you;
Sherman
========================================
On the cmdline just do:
psql arias <pwd> -c ‘SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis GROUP BY aria HAVING COUNT(aria)>1)‘ > outfile.txt
Martin.
Mmmmm, I notice a faint homework smell here ;-> , but the question is nicely asked so: On Sun, Dec 31, 2017 at 7:19 PM, Sherman Willden <operasopranos@gmail.com> wrote: ... > SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis > GROUP BY aria HAVING COUNT(aria)>1); The outer select is fully redundant. Just use the inner one. In fact, not redundnat, harmful, as it will spit many copies of aria for the duplicated ones. If you add id (and/or artist) to the outer query then it is useful again. > When I perform the following I get (0 rows): > SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis > GROUP BY id, aria HAVING COUNT(aria)>1); As expected. See the inner query, you are grouping by ID which is the primary key, by PK definition all counts are going to be one, so no results, nothing goes to the outer query. Use the first query, but adding id and artist to the OUTER level. > After I get the above information how do I write that information to a file? > The only thing I know writes the entire database to a file: > \COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,' > CSV HEADER; If copy format is ok to you, knowing \copy look at the docs, looking at https://www.postgresql.org/docs/9.6/static/app-psql.html you will read: >>> \copy { table [ ( column_list ) ] | ( query ) } { from | to } { 'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [ [ with ] ( option [, ...] ) ] Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required. <<< Notice the (query) option? Your copy is using the table+optional column list format, (aria_precis), just change it to the "( query )" format ( NOTICE THE MANDATORY PARENS, this is how copy knows what to do. "t" => table, "t(c1)"=>table+columns, "(xx)" => query ( nothing before the opening parens ). ) Happy new year. Francisco Olarte.
Mmmmm, I notice a faint homework smell here ;-> , but the question is
nicely asked so:
On Sun, Dec 31, 2017 at 7:19 PM, Sherman Willden
<operasopranos@gmail.com> wrote:
...
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY aria HAVING COUNT(aria)>1);
The outer select is fully redundant. Just use the inner one. In fact,
not redundnat, harmful, as it will spit many copies of aria for the
duplicated ones. If you add id (and/or artist) to the outer query then
it is useful again.
> When I perform the following I get (0 rows):
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY id, aria HAVING COUNT(aria)>1);
As expected. See the inner query, you are grouping by ID which is the
primary key, by PK definition all counts are going to be one, so no
results, nothing goes to the outer query.
Use the first query, but adding id and artist to the OUTER level.
> After I get the above information how do I write that information to a file?
> The only thing I know writes the entire database to a file:
> \COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,'
> CSV HEADER;
If copy format is ok to you, knowing \copy look at the docs, looking
at https://www.postgresql.org/docs/9.6/static/app-psql.html you will
read:
>>>
\copy { table [ ( column_list ) ] | ( query ) } { from | to } {
'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [
[ with ] ( option [, ...] ) ]
Performs a frontend (client) copy. This is an operation that runs an
SQL COPY command, but instead of the server reading or writing the
specified file, psql reads or writes the file and routes the data
between the server and the local file system. This means that file
accessibility and privileges are those of the local user, not the
server, and no SQL superuser privileges are required.
<<<
Notice the (query) option? Your copy is using the table+optional
column list format, (aria_precis), just change it to the "( query )"
format ( NOTICE THE MANDATORY PARENS, this is how copy knows what to
do. "t" => table, "t(c1)"=>table+columns, "(xx)" => query ( nothing
before the opening parens ). )
Happy new year.
Francisco Olarte.
Thank you for the replies. I will start working on them now. Not a student but since I now have the time I may look into it. I am 71 retired working at Home Depot. I have a collection of CDs by various artists and I have the time to create and maintain my own database concerning these subjects. I retired from USAF, SUN Microsystems, and HP where I worked with the UNIX systems. As I said my main question right now is which duplicate songs do I have and who performs them. I really appreciate the support I get here.ShermanOn Sun, Dec 31, 2017 at 11:53 AM, Francisco Olarte <folarte@peoplecall.com> wrote:Mmmmm, I notice a faint homework smell here ;-> , but the question is
nicely asked so:
On Sun, Dec 31, 2017 at 7:19 PM, Sherman Willden
<operasopranos@gmail.com> wrote:
...
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY aria HAVING COUNT(aria)>1);
The outer select is fully redundant. Just use the inner one. In fact,
not redundnat, harmful, as it will spit many copies of aria for the
duplicated ones. If you add id (and/or artist) to the outer query then
it is useful again.
> When I perform the following I get (0 rows):
> SELECT aria FROM aria_precis WHERE aria IN (SELECT aria FROM aria_precis
> GROUP BY id, aria HAVING COUNT(aria)>1);
As expected. See the inner query, you are grouping by ID which is the
primary key, by PK definition all counts are going to be one, so no
results, nothing goes to the outer query.
Use the first query, but adding id and artist to the OUTER level.
> After I get the above information how do I write that information to a file?
> The only thing I know writes the entire database to a file:
> \COPY aria_precis TO '/home/sherman/aria_precis_2.csv' WITH DELIMITER '#,'
> CSV HEADER;
If copy format is ok to you, knowing \copy look at the docs, looking
at https://www.postgresql.org/docs/9.6/static/app-psql.html you will
read:
>>>
\copy { table [ ( column_list ) ] | ( query ) } { from | to } {
'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [
[ with ] ( option [, ...] ) ]
Performs a frontend (client) copy. This is an operation that runs an
SQL COPY command, but instead of the server reading or writing the
specified file, psql reads or writes the file and routes the data
between the server and the local file system. This means that file
accessibility and privileges are those of the local user, not the
server, and no SQL superuser privileges are required.
<<<
Notice the (query) option? Your copy is using the table+optional
column list format, (aria_precis), just change it to the "( query )"
format ( NOTICE THE MANDATORY PARENS, this is how copy knows what to
do. "t" => table, "t(c1)"=>table+columns, "(xx)" => query ( nothing
before the opening parens ). )
Happy new year.
Francisco Olarte.
Rob Sargent <robjsargent@gmail.com> writes: > Keep in mind there is a quick write-to-file in psql with ‘\o <filename>’. And don’t forget to turn it off with ‘\o’ See also "\g filename", for one-shot output. regards, tom lane