Обсуждение: Easiest way to compare the results of two queries row by row and column by column
Can someone suggest the easiest way to compare the results from two queries to make sure they are identical? I am rewriting a large number of views and I want to make sure that nothing is changes in the results. Something like select compare_results('select * from v_old', 'select * from v_new'); I would want this to check that the row count and each row matched column by column. I am hoping someone has already written something for this...
Re: Easiest way to compare the results of two queries row by row and column by column
От
David Johnston
Дата:
Jason Long-2 wrote > Can someone suggest the easiest way to compare the results from two > queries to make sure they are identical? First thing that comes to mind: WITH before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , after_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , before_array AS (SELECT array_agg(before_qry) AS before_agg_array FROM before_qry) , after_array AS (SELECT array_agg(before_qry) AS after_agg_array FROM before_qry) SELECT *, before_agg_array = after_agg_array FROM before_array CROSS JOIN after_array Basically turn the resultsets into arrays (of composites) and then see if the arrays are the same. This has issues with respect to column names and comparable datatypes (i.e., if one column is bigint and the other is integer they still compare equally). One thought would to only allow a view name (and possibly, separately, the ORDER BY clause). Catalog lookups can be used to check for identical view output types. No idea of something like this exists and is readily available. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Thu, Jun 20, 2013 at 3:18 PM, Jason Long <mailing.lists@octgsoftware.com> wrote:
Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?
I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.
Something like
select compare_results('select * from v_old', 'select * from v_new');
I'd run:
select * from v_old
except
select * from v_new ;
And then
select * from v_new
except
select * from v_old ;
Both should return no rows.
However, if the queries can contain duplicate rows this will not detect differences in the number of times a row is replicated, i.e. if one query has a row 2 times and the other has it 3 times. If you need to detect such cases, I'd probably \copy out each query to a file, then use system tools to sort and diff the files.
Cheers,
Jeff
Thank you. I will give it a try. I have never used WITH before.
Thank you for the tips.
On Thu, 2013-06-20 at 16:05 -0700, David Johnston wrote:
Thank you for the tips.
On Thu, 2013-06-20 at 16:05 -0700, David Johnston wrote:
Jason Long-2 wrote > Can someone suggest the easiest way to compare the results from two > queries to make sure they are identical? First thing that comes to mind: WITH before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , after_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) ) , before_array AS (SELECT array_agg(before_qry) AS before_agg_array FROM before_qry) , after_array AS (SELECT array_agg(before_qry) AS after_agg_array FROM before_qry) SELECT *, before_agg_array = after_agg_array FROM before_array CROSS JOIN after_array Basically turn the resultsets into arrays (of composites) and then see if the arrays are the same. This has issues with respect to column names and comparable datatypes (i.e., if one column is bigint and the other is integer they still compare equally). One thought would to only allow a view name (and possibly, separately, the ORDER BY clause). Catalog lookups can be used to check for identical view output types. No idea of something like this exists and is readily available. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 06/21/2013 01:07 AM, Jeff Janes wrote:
On Thu, Jun 20, 2013 at 3:18 PM, Jason Long <mailing.lists@octgsoftware.com> wrote:Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?
I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.
Something like
select compare_results('select * from v_old', 'select * from v_new');I'd run:select * from v_oldexceptselect * from v_new ;And thenselect * from v_newexceptselect * from v_old ;Both should return no rows.
This is my solution as well.
However, if the queries can contain duplicate rows this will not detect differences in the number of times a row is replicated, i.e. if one query has a row 2 times and the other has it 3 times. If you need to detect such cases, I'd probably \copy out each query to a file, then use system tools to sort and diff the files.
No need, just use EXCEPT ALL.
Basically, the following query should return nothing:
(TABLE v_old EXCEPT ALL TABLE v_new)
UNION ALL
(TABLE v_new EXCEPT ALL TABLE v_old);
--
Vik
Vik, note that your new suggestion for a query might be more readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF DETECTING DUPLICATE ROWS. -- View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Re: Re: Easiest way to compare the results of two queries row by row and column by column
От
Branden Visser
Дата:
On Wed, Aug 17, 2016 at 4:58 AM, gilad905 <gilad905@gmail.com> wrote: > Vik, note that your new suggestion for a query might be more > readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF > DETECTING DUPLICATE ROWS. > I've done this in the past and in my use-case it was easy enough to export the rows to CSV with a sort, and then use a diff tool to ensure they're identical. My data had 100's of thousands of rows, but obviously with massive tables this may not be appropriate. If you want more details on this I can find the exact psql client query form I used to get a CSV file of the results. Hope that helps. Branden > > > -- > View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
On 17/08/16 10:58, gilad905 wrote: > Vik, note that your new suggestion for a query might be more > readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF > DETECTING DUPLICATE ROWS. Yes, it does. And please don't shout at me. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Didn't mean to give the impression I'm 'shouting' :) just wanted to make sure other users will notice my reply. But you're right! it does. It even disregards the order of the rows. For some reason in a previous test I looked to me like it didn't. -- View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5917059.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
select * from (select id, my_age, ur_age as urage from datatable where agegp != 'S') m where m.myage <> m.urage -- Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html