On Thu, May 29, 2014 at 9:08 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
> Hi,
>
> On 2014-05-29 08:56:10 -0700, Maciek Sakrejda wrote:
> > On Tue, May 27, 2014 at 11:06 AM, Heikki Linnakangas <
> > hlinnakangas@vmware.com> wrote:
> >
> > > I would be interested in seeing the structure of the index, if there is
> > > anything else corrupt in there.
> >
> >
> > It's an index on (integer, timestamp without time zone). Unfortunately,
> > it's a customer DB, so getting more direct access may be problematic. Is
> > there metadata we can gather about it that could be useful?
> >
> > > Also, what WAL actions led to the error? Try something like:
> > >
> > > pg_xlogdump -r btree -p $PGDATA -s 339/65000000 | grep 1665279
> > >
> > > and search that for any records related to the failed split, e.g.
> > > grepping further for the block numbers in the error message.
>
> I wonder why the failure didn't show the record that triggered the
> error? This is on a primary?

No, I ran pg_xlogdump on the failed replica--I thought that's what Heikki
was suggesting (and it seemed to me like the source of the problem would be
there).

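
For anyone following along, here is a rough sketch of the two-stage filtering Heikki describes: first narrow the btree records to the relfilenode, then to the block numbers from the error message. The helper name `filter_wal` and the block numbers in the usage line are placeholders of mine, not values from the actual error, and the whole-word matching assumes the numbers show up as separate tokens in the pg_xlogdump output.

```shell
# Hypothetical helper: reads pg_xlogdump output on stdin and keeps only the
# records that mention the given relfilenode AND at least one block number.
# Usage: filter_wal RELFILENODE BLOCK [BLOCK...]
filter_wal() {
    relfilenode="$1"; shift
    # Build an alternation such as "(1234|5678)" from the remaining arguments.
    blocks=$(printf '%s|' "$@")
    blocks="(${blocks%|})"
    # -w matches whole numbers only, so block 1234 does not match 12345.
    grep -w -- "$relfilenode" | grep -E -w -- "$blocks"
}

# Example (1234 and 5678 are placeholder block numbers):
# pg_xlogdump -r btree -p "$PGDATA" -s 339/65000000 | filter_wal 1665279 1234 5678
```

Matching on whole words keeps unrelated records with longer numbers out of the result, at the cost of missing records if the tool prints a block number glued to other digits.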
> My hope^Wguess is that this is a symptom of
> 1a917ae8610d44985fd2027da0cfe60ccece9104 (not released) or even
> 9a57858f1103b89a5674f0d50c5fe1f756411df6 (9.3.4). Once the hot chain is
> corrupted, such errors could occur.
>
> When were those standbys made? Did the issue occur on the primary as
> well?
>
The original ancestor was a 9.3.2. No problems on the primary.

> PS: wal-e's interspersed output is rather annoying...
>
I thought it might be relevant. I'll exclude it in the future.