Re: Different results from identical matviews

Поиск
Список
Период
Сортировка
От Anders Steinlein
Тема Re: Different results from identical matviews
Дата
Msg-id CAC35HN=QPcSAw+Y47tOCZ51PysSgYzoO5bT7AK3q8VPvB8ZCcQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Different results from identical matviews  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Different results from identical matviews  (Anders Steinlein <anders@e5r.no>)
Re: Different results from identical matviews  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
On Thu, Jul 2, 2020 at 2:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anders Steinlein <anders@e5r.no> writes:
> We have a materialized view from which a customer reported some
> confusing/invalid results, leading us to inspect the query and not finding
> anything wrong. Running the query defining the matview manually, or
> creating a new (identical) materialized view returns the correct result.
> Obviously, we've done REFRESH MATERIALIZED VIEW just before doing the
> comparison, and all runs are in the same schema.

I suspect the query underlying the matviews is less deterministic than
you think it is.  I did not study that query in any detail, but just
from a quick eyeball: the array_agg() calls with no attempt to enforce a
particular aggregation order are concerning, and so is grouping by
a citext column (where you'll get some case-folding of a common value,
but who knows which).

Thanks for the tip, but I'm having a hard time thinking that's the case, seeing as I'm unable to trigger the wrong result no matter how hard I try with a new definition/manual query. I've introduced random ordering to the first CTE-clause (where the initial citext values comes from, and casing thus could differ in some order) which doesn't change the result.

When the citext type is used throughout the query, shouldn't the grouping result be deterministic? The citext values are first "rolled up" with array_agg() and later unnested and finally grouped. Shouldn't the end result be the same, regardless of what particular case-folded version of the value it chooses to group on?

I've simplified the query for this particular customer case that, again, always returns the correct result no matter how often I try:

mm_prod=> SELECT sid, count(*) FROM (
WITH tagged_contacts AS (
        SELECT lid, email, cl.skip_preexisting_campaigns AS skip_subscribed,
                ct.skip_preexisting_campaigns AS skip_tags, ladded, tagname, created
            FROM contacts_lists cl
                LEFT JOIN contacts_tags ct USING (email)
            WHERE lstatus = 'a'
            ORDER BY random()
    ),
    tagged_segments AS (
        SELECT s.lid, cid, sid, sp.type, sp.mid, matchdelay, tagname, event,
                count(*) OVER (PARTITION BY sid) AS requirements,
                activated_at
            FROM segments s
                LEFT JOIN campaigns USING (cid)
                INNER JOIN segments_predicates sp USING (sid)
            WHERE
                s.archived_at IS NULL
                AND (cid IS NULL OR activated_at IS NOT NULL)
    ),
    segments_contacts AS (
        SELECT lid, sid, requirements,
            CASE
                WHEN type = 'has_tag' THEN (
                    SELECT array_agg(DISTINCT email::citext)
                        FROM tagged_contacts
                        WHERE
                            lid = s.lid
                            AND tagname = s.tagname
                            AND (matchdelay IS NULL OR created + matchdelay < now())
                            AND (
                                cid IS NULL
                                OR (
                                    created >= activated_at
                                    AND NOT COALESCE(skip_tags, false)
                                )
                            )
                )
            END AS emails
        FROM tagged_segments s
    ),
    unnested AS (
        SELECT lid, sid, requirements, unnest(emails) AS email
            FROM segments_contacts
    )
    SELECT lid, sid, email
        FROM unnested
        GROUP BY lid, sid, email, requirements
        HAVING count(email) = requirements
) x
WHERE sid = 42259
GROUP BY sid;

  sid  | count
-------+-------
 42259 |    98
(1 row)


This is stale data for this customer, so no data changes are occurring to change the results. I can REFRESH MATERIALIZED VIEW as many times I was on the original segments_with_contacts matview, and I never see different results. If it were not deterministic, shouldn't I expect to see different results one in at least 100 times tried?

Thanks again for any insight to try and figure this out. Again, I could just re-create the matview we use in production and it would likely work (since I'm unable to get wrong results with a newly created case), but I would rather try to find out the root cause here first.

Best,
-- a.

В списке pgsql-general по дате отправления:

Предыдущее
От: Brajendra Pratap Singh
Дата:
Сообщение: restore_command for postgresql streaming replication
Следующее
От: Anders Steinlein
Дата:
Сообщение: Re: Different results from identical matviews