Using amcheck to check PostgreSQL 10 indexes for corruption (integrity checking)

Motivation

There has been some discussions over at meta discourse lately about PostgresSQL 10 and 12 indexes being corrupt. We have yet to see this in practice using Discourse; however, it is certainly a good idea check indexes and verify they are not corrupt, especially before doing a backup and restore.

The amcheck module provides functions that allow you to verify the logical consistency of the structure of PostgreSQL indexes

Installing amcheck

In this Discourse configuration, we have both web-app and data containers, so we will install amcheck in the data container.

# ./launcher enter data
# apt install postgresql-10-amcheck

Verify the Indexes

We need to create the amcheck_next extension (one time only) and then run the query:

# su postgres
$ psql discourse
psql (10.12 (Debian 10.12-2.pgdg100+1))
Type "help" for help.

discourse=# CREATE EXTENSION amcheck_next;
CREATE EXTENSION
discourse=# 

Now that the initial setup is done, we can run this query:

discourse=# SELECT bt_index_check(c.oid), c.relname, c.relpages
discourse-# FROM pg_index i
discourse-# JOIN pg_opclass op ON i.indclass[0] = op.oid
discourse-# JOIN pg_am am ON op.opcmethod = am.oid
discourse-# JOIN pg_class c ON i.indexrelid = c.oid
discourse-# JOIN pg_namespace n ON c.relnamespace = n.oid
discourse-# WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
discourse-# -- Don't check temp tables, which may be from another session:
discourse-# AND c.relpersistence != 't'
discourse-# -- Function may throw an error when this is omitted:
discourse-# AND i.indisready AND i.indisvalid
discourse-# ORDER BY c.relpages DESC LIMIT 10;
 bt_index_check |              relname              | relpages 
----------------+-----------------------------------+----------
                | pg_depend_depender_index          |      150
                | pg_attribute_relid_attnam_index   |      147
                | pg_depend_reference_index         |      123
                | pg_attribute_relid_attnum_index   |      101
                | pg_class_tblspc_relfilenode_index |       42
                | pg_class_relname_nsp_index        |       38
                | pg_proc_proname_args_nsp_index    |       38
                | pg_class_oid_index                |       27
                | pg_description_o_c_o_index        |       24
                | pg_type_typname_nsp_index         |       20

This Discourse PostgreSQL 10 query shows a session that performs verification of catalog indexes. Verification of the presence of heap tuples as index tuples is requested for unique indexes only.

As we can see, no error is raised, all indexes tested appear to be logically consistent.

The Query (Copy & Paste)

  SELECT bt_index_check(index => c.oid, heapallindexed => i.indisunique),
          c.relname,
          c.relpages
  FROM pg_index i
  JOIN pg_opclass op ON i.indclass[0] = op.oid
  JOIN pg_am am ON op.opcmethod = am.oid
  JOIN pg_class c ON i.indexrelid = c.oid
  JOIN pg_namespace n ON c.relnamespace = n.oid
  WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
  -- Don't check temp tables, which may be from another session:
  AND c.relpersistence != 't'
  -- Function may throw an error when this is omitted:
  AND c.relkind = 'i' AND i.indisready AND i.indisvalid
  ORDER BY c.relpages DESC LIMIT 10;

Summary

No error concerning corruption raised by amcheck should ever be a false positive. In practice, amcheck is more likely to find software bugs than problems with hardware. amcheck raises errors in the event of conditions that, by definition, should never happen, and so careful analysis of amcheck errors is often required.

Repairing Postgres Index Corruption

There is no general method of repairing problems that amcheck detects.

An explanation for the root cause of an invariant violation should be sought. pageinspect may play a useful role in diagnosing corruption that amcheck detects.

A REINDEX may not be effective in repairing corruption, but you can try and run the above query again.

Final Thoughts

I have performed this procedure on two Discourse installations, both running PostgreSQL 10 with a "two container" configuration and a single container configuration. The process went well in both instances. I recommend anyone who in concerned about corrupt PostgreSQL indexes consider this procedure.

https://access.crunchydata.com/documentation/amcheck-next/1.5/

See also:

4 Likes

Is amcheck applicable AFTER an upgrade 10 → 12 was already done, and how should it be done on a single container (Discourse default) install?

root@dev-disco:/var/discourse# ./launcher enter app
root@dev-disco-app:/var/www/discourse# su postgres
postgres@dev-disco-app:/var/www/discourse$ psql discourse
psql (12.2 (Debian 12.2-2.pgdg100+1))
Type "help" for help.

discourse=# CREATE EXTENSION amcheck_next;
ERROR:  could not open extension control file "/usr/share/postgresql/12/extension/amcheck_next.control": No such file or directory

Did you install amcheck?

1 Like

Yes, in the app container:

postgresql-10-amcheck is already the newest version (1.5-1.pgdg100+1).

Hi @omarfilip

So far I have only tested amcheck on PostgeSQL 10.

I assume that amcheck for PostgreSQL 10 will not work on a PostgreSQL 12 install.

1 Like

FWIW,

This PostgreSQL 12 document shows amcheck working for PostgreSQL 12:

So, I assume for sites running PostgreSQL 12, then need to find the amcheck install for that version.

As far as Discourse goes, we have not yet upgraded to PostgreSQL 12 because we are currently content with the performance (and lack of errors and migration problems from 10 to 12 ) with PostgreSQL 10.

1 Like

One would think this was an easy task, but I have not been able to find it.

I could not find it for PG 12 either. Odd that amcheck is mentioned in the PG 12 official documentation but there seems to be no amcheck (installable code) for PG 12.

Anyway, for production Discourse, we have remained with PG 10 for now and into the foreseeable future.

1 Like

FYI @omarfilip,

The github page for amcheck says:

Supported versions: PostgreSQL 9.4 - PostgreSQL 10

1 Like

This is also part of the reason we are sticking with PostgresSQL 10.

We think the benefits of PostgreSQL 12 do not out weight the downsides.

In fact, I see very little benefit (tangible) of PG 12 and only problem and after problem.

For PostgreSQL 12, you can reindex like this (for example):

Reindex Concurrently

docker exec -it data bash -c "echo 'REINDEX DATABASE CONCURRENTLY discourse;' | su postgres -c 'psql discourse'"

Reindex Non-concurrently

docker exec -it data bash -c "echo 'REINDEX DATABASE discourse;' | su postgres -c 'psql discourse'"

Reference: