Bad news yesterday after a power failure on a server, impossible to restart my Helium ETL service: the database has been corrupted and the service is failing with the message “invalid page in block 54206178 of relation base/16384/186119818”
I had to spend time on the web and chatGPT wasn’t helping me (I’m kidding, this bot is stupid) to find a correct way to solve this, luckily, this has been resolved and here is the way I did it.
Find the corrupted element
The main problem is to find the corrupted element to repair it. The log gives the file location (for real, there are many files corresponding to the given path and you can find the precise file by considering the block). But the problem is to identify what this file is corresponding to. In my case, that file was a pkey index but until you know it, you can’t repair it.
For finding the corresponding element, you can use the following SQL
SELECT pg_class.relfilenode, pg_namespace.nspname as schema_name, pg_class.relname, pg_class.relkind
JOIN pg_namespace on pg_class.relnamespace = pg_namespace.oid
ORDER BY pg_class.relfilenode ASC;
This is giving you results like the following, where the relfilenode fied is corresponding to the file number in the error message and the schema_name / relname to the corrupted element.
Repair the identified element
In my case the corrupted element was an index, this is a good situation as an index can be rebuilt without data loss. The following command solved the situation
REINDEX INDEX <index_name>;
In the case of a table, it seems that the Vacuum command can be used. Also you can use the zero_damaged_pages option : this one will avoid to have an error when a damaged paged is found, so the process will not crash, but the error stays and some data may be corrupted.
SET zero_damaged_pages on;
The VACUUM mechanism will recreate the table and as a consequence clear the damages. It seems logical to use it in conjunction with the previous set. I assume the damaged data will be lost.
VACUUM FULL <table_name>
This is blocking the table, create a new table, so you need to have 2x space free for doing this processing.
I did not experiment the table reparation, so I can’t give you any real feedback about this. I did the index rebuild with success and save my DB.