Обсуждение: ERROR: invalid value "????" for "YYYY"

Поиск
Список
Период
Сортировка

ERROR: invalid value "????" for "YYYY"

От
Brian Wong
Дата:
I'm posting this question to pgsql-general.  Hopefully someone can share some insights with me.

I have a bunch of tables in the database and in a separate schema.  The tables' names are in this format:

???_???_???_YYYYMMDD

where the last 8 characters is a date.

When I query either the information_schema.tables or pg_tables extracting the last 8 characters out and converting it to a date, it works:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????';

But as soon as I reference it in the where clause, it gives a weird error:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????'
and to_date(right(table_name, 8), 'YYYYMMDD') is not null;

ERROR:  invalid value "tati" for "YYYY"
DETAIL:  Value must be an integer.

It seems like some strange values were passed into the to_date function, but I'm seeing that the rightmost 8 characters of all the table names are dates.  So only date strings are passed to the to_date function.  Absolutely nothing containing the string "tati" is passed to the to_date function.  What is going on?  Is that a bug?

Brian

Re: ERROR: invalid value "????" for "YYYY"

От
Brian Wong
Дата:
I think that clearly it is getting "pg_statistic" from the pg_catalog schema and feeding it to the query.  So "tatistic" gets extracted, being the last 8 characters.  And then "tati" is fed to the YYYY part of the to_date function.  But I already specify the table_schema to be the one that I want.  So those internal pg_* views shouldn't even show up in the query.

Brian


On Tue, Oct 8, 2013 at 1:50 PM, Brian Wong <bwong@imageworks.com> wrote:
I'm posting this question to pgsql-general.  Hopefully someone can share some insights with me.

I have a bunch of tables in the database and in a separate schema.  The tables' names are in this format:

???_???_???_YYYYMMDD

where the last 8 characters is a date.

When I query either the information_schema.tables or pg_tables extracting the last 8 characters out and converting it to a date, it works:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????';

But as soon as I reference it in the where clause, it gives a weird error:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????'
and to_date(right(table_name, 8), 'YYYYMMDD') is not null;

ERROR:  invalid value "tati" for "YYYY"
DETAIL:  Value must be an integer.

It seems like some strange values were passed into the to_date function, but I'm seeing that the rightmost 8 characters of all the table names are dates.  So only date strings are passed to the to_date function.  Absolutely nothing containing the string "tati" is passed to the to_date function.  What is going on?  Is that a bug?

Brian

Re: ERROR: invalid value "????" for "YYYY"

От
Rowan Collins
Дата:
On 08/10/2013 21:50, Brian Wong wrote:
select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????'
and to_date(right(table_name, 8), 'YYYYMMDD') is not null;

ERROR:  invalid value "tati" for "YYYY"
DETAIL:  Value must be an integer.

It seems like some strange values were passed into the to_date function, but I'm seeing that the rightmost 8 characters of all the table names are dates.  So only date strings are passed to the to_date function.  Absolutely nothing containing the string "tati" is passed to the to_date function.  What is going on?  Is that a bug?

Unless I'm much mistaken, there is no guarantee that the conditions in a WHERE clause will be checked in any particular order, because SQL does not specify a procedural recipe, only a logical one. If for whatever reason the query planner decides to check the condition involving to_date first, it will have to evaluate it for all rows in the table, leading to this error.  It works fine in the SELECT clause because that happens logically after all filtering has taken place.

I'm not sure if there are easier ways, but one way to force the order would be to restrict the set of tables in a sub-query or CTE first, and then check whatever you need about the date:

With tables_with_dates As (
    select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
    from information_schema.tables
    where table_schema = '????'
    and table_catalog = '????'
)
Select table_name, blah
Where blah > '2013-01-01'::date

--
Rowan Collins
[IMSoP]

Re: ERROR: invalid value "????" for "YYYY"

От
Rowan Collins
Дата:
On 08/10/2013 23:03, Rowan Collins wrote:
> With tables_with_dates As (
>     select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
>     from information_schema.tables
>     where table_schema = '????'
>     and table_catalog = '????'
> )
> Select table_name, blah
> Where blah > '2013-01-01'::date

Oops, there should be a "From tables_with_dates" in there, obviously :|
--
Rowan Collins
[IMSoP]


Re: ERROR: invalid value "????" for "YYYY"

От
Steve Crawford
Дата:
On 10/08/2013 01:50 PM, Brian Wong wrote:
I'm posting this question to pgsql-general.  Hopefully someone can share some insights with me.

I have a bunch of tables in the database and in a separate schema.  The tables' names are in this format:

???_???_???_YYYYMMDD

where the last 8 characters is a date.

When I query either the information_schema.tables or pg_tables extracting the last 8 characters out and converting it to a date, it works:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????';

But as soon as I reference it in the where clause, it gives a weird error:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????'
and to_date(right(table_name, 8), 'YYYYMMDD') is not null;

ERROR:  invalid value "tati" for "YYYY"
DETAIL:  Value must be an integer.

It seems like some strange values were passed into the to_date function, but I'm seeing that the rightmost 8 characters of all the table names are dates.  So only date strings are passed to the to_date function.  Absolutely nothing containing the string "tati" is passed to the to_date function.  What is going on?  Is that a bug?

As Rowan said, you can't necessarily rely on the order of testing in the where clause. In part, this is because one of the first things the planner does is to take the whole shebang and rewrite it into a consolidated statement that it can then refine and optimize. The tests that take place in the view can ultimately happen before, after, or intermixed with the tests in your where-clause as long as they are logically equivalent.

In your example, you are querying information_schema.tables which is not a table, it is a view that references, among other things, a subset of the pg_catalog.pg_class table.

When the planner gets through with its first steps you won't be calling information_schema.tables, you will be calling pg_catalog.pg_class and doing some where-clause tests that logically combine your where-clause with those in the view.

Why "tati"? When I query pg_class directly, the first row has a "relname" of "pg_statistic"  - it's not in the schema/catalog you seek but the executor hasn't checked for that, yet. The right eight characters of that relname are are "tatistic" thus the characters in the "YYYY" position are "tati" so based on the plan and testing order this just happens to be the first thing upon which the execution chokes.

Cheers,
Steve

Re: ERROR: invalid value "????" for "YYYY"

От
Brian Wong
Дата:
But from a user's perspective, why would it ever make sense that by adding an additional where clause, it actually brings in more data into the picture?  If I have query returning 100 rows.  Adding an additional where clause should only cut down the number of rows, not increase it.  And the extra data that's showing up is being added to the resultset cuz without the additional where clause, the result set did not contain any of those rows like pg_statistics/etc.

Brian


On Tue, Oct 8, 2013 at 4:10 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 10/08/2013 01:50 PM, Brian Wong wrote:
I'm posting this question to pgsql-general.  Hopefully someone can share some insights with me.

I have a bunch of tables in the database and in a separate schema.  The tables' names are in this format:

???_???_???_YYYYMMDD

where the last 8 characters is a date.

When I query either the information_schema.tables or pg_tables extracting the last 8 characters out and converting it to a date, it works:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????';

But as soon as I reference it in the where clause, it gives a weird error:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '????'
and table_catalog = '????'
and to_date(right(table_name, 8), 'YYYYMMDD') is not null;

ERROR:  invalid value "tati" for "YYYY"
DETAIL:  Value must be an integer.

It seems like some strange values were passed into the to_date function, but I'm seeing that the rightmost 8 characters of all the table names are dates.  So only date strings are passed to the to_date function.  Absolutely nothing containing the string "tati" is passed to the to_date function.  What is going on?  Is that a bug?

As Rowan said, you can't necessarily rely on the order of testing in the where clause. In part, this is because one of the first things the planner does is to take the whole shebang and rewrite it into a consolidated statement that it can then refine and optimize. The tests that take place in the view can ultimately happen before, after, or intermixed with the tests in your where-clause as long as they are logically equivalent.

In your example, you are querying information_schema.tables which is not a table, it is a view that references, among other things, a subset of the pg_catalog.pg_class table.

When the planner gets through with its first steps you won't be calling information_schema.tables, you will be calling pg_catalog.pg_class and doing some where-clause tests that logically combine your where-clause with those in the view.

Why "tati"? When I query pg_class directly, the first row has a "relname" of "pg_statistic"  - it's not in the schema/catalog you seek but the executor hasn't checked for that, yet. The right eight characters of that relname are are "tatistic" thus the characters in the "YYYY" position are "tati" so based on the plan and testing order this just happens to be the first thing upon which the execution chokes.

Cheers,
Steve


Re: ERROR: invalid value "????" for "YYYY"

От
David Johnston
Дата:
Brian Wong-2 wrote
> But from a user's perspective, why would it ever make sense that by adding
> an additional where clause, it actually brings in more data into the
> picture?  If I have query returning 100 rows.  Adding an additional where
> clause should only cut down the number of rows, not increase it.  And the
> extra data that's showing up is being added to the resultset cuz without
> the additional where clause, the result set did not contain any of those
> rows like pg_statistics/etc.

No it does not.  Your general case is flawed in that adding an OR condition
will indeed cause more rows to be added.  In this case you are adding an AND
clause but since it is at the same level as the existing conditions all
possible records must evaluate against this new clause even if one of the
other clauses returns false.  There is no short-circuiting.  This may be
confusing but that is part of the education process.  If that expression
(the one in the where clause) did not cause an error there would be at most
the same number of records output as the original query.  And the same
number of rows are being processed at the WHERE clause in both cases.  Since
one of the conditions only makes sense on a sub-set of valid rows there must
be two levels of WHERE clauses in the query - or use CASE regexp THEN test
ELSE false END so the substring test only is performed against valid table
names.

David J.









--
View this message in context:
http://postgresql.1045698.n5.nabble.com/ERROR-invalid-value-for-YYYY-tp5773787p5773944.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: ERROR: invalid value "????" for "YYYY"

От
Rowan Collins
Дата:
On 10/10/2013 01:57, Brian Wong wrote:
> And the extra data that's showing up is being added to the resultset
> cuz without the additional where clause, the result set did not
> contain any of those rows like pg_statistics/etc.

To add to what Brain said on this already, the key thing is that
Postgres doesn't know (or care) which where clause is "additional". The
only "original" result set is the one with no where clause at all, which
contains rows for 'pg_statistics' etc.

--
Rowan Collins
[IMSoP]



Re: ERROR: invalid value "????" for "YYYY"

От
Steve Crawford
Дата:
On 10/09/2013 05:57 PM, Brian Wong wrote:
> But from a user's perspective, why would it ever make sense that by
> adding an additional where clause, it actually brings in more data
> into the picture?  If I have query returning 100 rows.  Adding an
> additional where clause should only cut down the number of rows, not
> increase it.
...

It may be a bit surprising at first when one does not consider they are
looking at a view, not at a table, but for the planner to do its job of
returning the data you want efficiently it must be allowed to optimize
your request by rewriting it into a logically equivalent form. For
example, given something like ...where eventtime >
abstime(12334554321)... the planner will detect that abstime(1234554321)
will always return the same result and will calculate that value once,
not redo it once for every row.

The idea that you are just reducing the number of rows doesn't mean that
doing things in that order is efficient. Imagine that you had a view
that returned the records of all males from a table of all people in the
US that includes an indexed birthday field as MMDD. This view would
return a bit over 155-million rows - roughly 50% of the approximately
310-million US population. You then query that view to find men who have
a birthday today. It is obviously suboptimal to first retrieve
155-million records from disk then evaluate those for the appropriate
birthday when you can, instead, start by grabbing 1/365th or less than
1-million rows which can be efficiently returned using the index then
evaluate those records for male/female.

It's sort of like sending someone to the store with a shopping list -
you have no expectation that they will fill the cart in the order you
put things on the list, just that they will come back with the items on
the list.

Cheers,
Steve