Initial implementation. Provide catalog health check and surgery for four existing issues. #1

Merged
merged 60 commits into from
Jul 8, 2019
1983559
Drop unnecessary empty test.
deiferni Apr 17, 2019
e2b6bea
Fix tests for plone 5.
deiferni Apr 17, 2019
c92f16d
Add catalog checkup for rid/uid and data.
deiferni Apr 16, 2019
f9d1ccd
Add method to report symptom to catalog checkup.
deiferni Apr 16, 2019
08fba6d
Add catalog index length health checks.
deiferni Apr 16, 2019
a9eae33
Split checkup result into its own class.
deiferni Apr 16, 2019
85894cf
Improve testing for checkup, add more testcases.
deiferni Apr 16, 2019
635e963
Allow writing to a log.
deiferni Apr 18, 2019
d4f0f05
Add command to run checkup via command line.
deiferni Apr 17, 2019
b00244c
Allow attaching multiple paths per rid.
deiferni Apr 17, 2019
a9dae03
Add doctor and surgery for extra rid.
deiferni Apr 18, 2019
a380f49
Improve naming in various places.
deiferni May 16, 2019
8cd7185
Force process indexing queue before healthcheck.
deiferni Jun 13, 2019
36fc03f
Use set instead of dict to track symptoms.
deiferni May 16, 2019
14c4310
Add surgery for orphaned rid.
deiferni May 16, 2019
04e96a5
Provide consistent path sort order.
deiferni May 16, 2019
6cef3fa
Provide more detailed surgery result.
deiferni May 16, 2019
60ccfb3
Add commit and post-surgery healthcheck.
deiferni May 20, 2019
130e2a9
Add UID index length to healthcheck.
deiferni May 20, 2019
271b2f5
Add command tests, slightly rework arguments.
deiferni May 20, 2019
879b7f3
Improve surgery index removal.
deiferni May 23, 2019
5ca56ee
Rename index_data to catalog_data.
deiferni May 23, 2019
a80d0be
Consistently call index uuid_index.
deiferni May 23, 2019
7edcd7d
Provide get_physical_path and get_rid methods.
deiferni May 23, 2019
9be3479
Add healthcheck for uuid index data.
deiferni May 23, 2019
5e45e2d
Add helper to make unhealty extra rid after move.
deiferni May 24, 2019
b648547
Report rids in catalog but absent from uuid index.
deiferni May 23, 2019
e581946
Add compat module for compatibility imports.
deiferni May 24, 2019
c4c7544
Add debug helper to convert BTrees to python.
deiferni May 27, 2019
00c9df9
Add debug helper to pprint a btrees collection.
deiferni May 29, 2019
4a5d787
Add helper to make orphaned rid.
deiferni May 29, 2019
1c470ec
Add surgery to remove rid from UnIndex.
deiferni May 29, 2019
189ce41
Add surgery for unindex_object call.
deiferni Jun 3, 2019
1850ae6
Add utility function to find keys for a given rid.
deiferni Jun 3, 2019
ae77525
Add surgery to remove rid from DateRangeIndex.
deiferni Jun 3, 2019
9efc04a
Add surgery to remove rid from BooleanIndex.
deiferni Jun 4, 2019
7d91876
Add surgery to remove rid from UUIDIndex.
deiferni Jun 4, 2019
5eeed1d
Pass a linked `Length` to decrease during removal.
deiferni Jun 13, 2019
72557a3
Add surgery to remove rid from ExtendedPathIndex.
deiferni Jun 13, 2019
612f06d
Add helper to make missing UUID index entry.
deiferni Jun 13, 2019
a482db9
Add test for extra rid removal surgery.
deiferni Jun 13, 2019
77dda25
Add test for orphaned rid removal surgery.
deiferni Jun 13, 2019
ff66ab7
Add surgery to reindex missing uuid.
deiferni Jun 13, 2019
f25ff7e
Add helper do drop object from catalog indexes.
deiferni Jun 18, 2019
c58b2ec
Add surgery to reindex obj or drop it.
deiferni Jun 18, 2019
eb45986
Add quick subcommand description to readme.
deiferni Apr 18, 2019
18008c9
Add plone 5.1 classifier.
deiferni Jun 19, 2019
db45c54
Add surgery to drop extra rid not in indexes.
deiferni Jun 21, 2019
a897511
Process indexing queue after fixes.
deiferni Jun 21, 2019
1f8bb48
Also handle moved objects that are still present.
deiferni Jun 21, 2019
cf7e754
Remove orphaned rid not present in indexes.
deiferni Jun 21, 2019
a802308
Fix for acquired objects moved into their parents.
deiferni Jul 2, 2019
381f8af
Skip post surgery healthcheck for unfixables.
deiferni Jul 8, 2019
99e5af3
Always doom transaction for healthcheck.
deiferni Jul 8, 2019
f95437e
Improve unhealthy rid factory method name.
deiferni Jul 8, 2019
894a421
Indicate that symptom tuples shall be sorted.
deiferni Jul 8, 2019
e5ec455
Rename IndexSurgery to SurgeryStep.
deiferni Jul 8, 2019
83c668d
Use helper methods to get rid instead of redefine.
deiferni Jul 8, 2019
ff140bd
Drop unused imports.
deiferni Jul 8, 2019
9f2195b
Test surgery lookup by doctor is correct.
deiferni Jul 8, 2019
52 changes: 48 additions & 4 deletions README.rst
@@ -4,13 +4,50 @@
Introduction
============

The package `ftw.catalogdoctor` provides checkup and surgery to remove inconsistencies in portal_catalog.
The package ``ftw.catalogdoctor`` provides healthcheck to find
inconsistencies in ``portal_catalog`` and surgery to remove some of them. It
can be run via a ``zopectl.command``.


Compatibility
-------------
Healthcheck
===========

Plone 4.3.x
Lists inconsistencies detected in ``portal_catalog``. Finds inconsistencies by
inspecting the catalog's internal data structures. It currently uses ``paths``
(the rid-path mapping), ``uids`` (the path-rid mapping), the ``UID`` index and
catalog metadata to determine if the catalog is healthy or if there are
problems. Healthcheck is a read-only operation and won't modify the catalog.

It can be run as follows:

.. code:: sh

    $ bin/instance doctor healthcheck
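
The cross-check described above can be illustrated with a minimal sketch that uses plain dicts in place of the catalog's BTrees. The function name `find_inconsistencies` and the symptom labels are hypothetical and are not taken from `ftw.catalogdoctor`; the real healthcheck also inspects the ``UID`` index and the catalog metadata, which this sketch leaves out.

```python
def find_inconsistencies(paths, uids):
    """Cross-check a rid->path mapping against a path->rid mapping.

    Hypothetical illustration only: plain dicts stand in for the
    catalog's internal BTrees.
    """
    symptoms = set()
    # Every rid in ``paths`` should map back to itself through ``uids``.
    for rid, path in paths.items():
        if uids.get(path) != rid:
            symptoms.add((rid, path, 'paths entry not mirrored in uids'))
    # Every path in ``uids`` should map back to itself through ``paths``.
    for path, rid in uids.items():
        if paths.get(rid) != path:
            symptoms.add((rid, path, 'uids entry not mirrored in paths'))
    return sorted(symptoms)


healthy = find_inconsistencies({97: '/plone/a'}, {'/plone/a': 97})
broken = find_inconsistencies(
    {97: '/plone/a', 98: '/plone/a'},  # two rids claim the same path
    {'/plone/a': 97},
)
```

Here `healthy` is empty, while `broken` reports rid 98 because nothing in the path-rid mapping points back to it. The function only reads both mappings, in the same spirit as the read-only healthcheck.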


Surgery
=======

Attempts to fix issues found by ``healthcheck``. Will do a healthcheck before
surgery, then attempt surgery and finally do a post-surgery healthcheck.
Surgery is a write operation but changes are only committed to the database if
the post-surgery healthcheck yields no more health problems.
Currently the set of available surgeries is limited to problems we have
observed in production.


It can be run as follows:

.. code:: sh

    $ bin/instance doctor surgery


There is also a ``--dry-run`` parameter that prevents committing changes.

.. code:: sh

    $ bin/instance doctor --dry-run surgery
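
The check, fix, re-check and commit sequence described in this section can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `run_surgery` and its three callables (`check`, `fix`, `commit`) are hypothetical stand-ins for the healthcheck, the surgery and the transaction commit.

```python
def run_surgery(check, fix, commit, dry_run=False):
    """Sketch of the check / fix / re-check / commit flow.

    ``check`` returns a list of problems, ``fix`` tries to repair one
    problem and ``commit`` persists the changes. All three callables
    are hypothetical stand-ins.
    """
    problems = check()
    if not problems:
        return 'healthy'
    for problem in problems:
        fix(problem)
    if check():
        return 'aborted'  # problems remain, nothing is committed
    if dry_run:
        return 'dry-run'  # would have succeeded, still nothing committed
    commit()
    return 'committed'


state = {'problems': ['extra rid 98'], 'committed': False}
outcome = run_surgery(
    check=lambda: list(state['problems']),
    fix=lambda problem: state['problems'].remove(problem),
    commit=lambda: state.update(committed=True),
)
```

The point of the design is that a commit is the last step and is reached only when the second check comes back clean and no dry run was requested.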


Installation
@@ -26,6 +63,13 @@ Installation
ftw.catalogdoctor


Compatibility
-------------

Plone 4.3.x
Plone 5.1.x


Development
===========

154 changes: 154 additions & 0 deletions ftw/catalogdoctor/command.py
@@ -0,0 +1,154 @@
from __future__ import print_function
from ftw.catalogdoctor.compat import processQueue
from ftw.catalogdoctor.healthcheck import CatalogHealthCheck
from ftw.catalogdoctor.surgery import CatalogDoctor
from Products.CMFCore.utils import getToolByName
from Products.CMFPlone.interfaces import IPloneSiteRoot
from Testing.makerequest import makerequest
from zope.component.hooks import setSite
import argparse
import sys
import transaction


def discover_plone_site(app):
    for item_id, item in app.items():
        if IPloneSiteRoot.providedBy(item):
            return item_id
    return None


def load_site(app, path):
    if not path:
        print('ERROR: No Plone site found. Use --site or create a Plone site '
              'in the Zope app root.',
              file=sys.stderr)
        sys.exit(1)

    app = makerequest(app)
    site = app.unrestrictedTraverse(path)
    app.REQUEST.PARENTS = [site, app]
    setSite(site)

    return site


class ConsoleOutput(object):

    def info(self, msg):
        print(msg)

    def warning(self, msg):
        print(msg)

    def error(self, msg):
        print(msg, file=sys.stderr)


def healthcheck_command(portal_catalog, args, formatter):
    transaction.doom()  # extra paranoia, prevent erroneous commit

    return _run_healthcheck(portal_catalog, formatter)


def _run_healthcheck(portal_catalog, formatter):
    result = CatalogHealthCheck(catalog=portal_catalog).run()
    result.write_result(formatter)
    return result


def surgery_command(portal_catalog, args, formatter):
    if args.dryrun:
        formatter.info('Performing dryrun!')
        formatter.info('')
        transaction.doom()

    result = _run_healthcheck(portal_catalog, formatter)
    if result.is_healthy():
        transaction.doom()  # extra paranoia, prevent erroneous commit
        formatter.info('Catalog is healthy, no surgery is needed.')
        return

    there_is_nothing_we_can_do = []
    formatter.info('Performing surgery:')
    for unhealthy_rid in result.get_unhealthy_rids():
        doctor = CatalogDoctor(result.catalog, unhealthy_rid)
        if doctor.can_perform_surgery():
            surgery = doctor.perform_surgery()
            surgery.write_result(formatter)
            formatter.info('')
        else:
            there_is_nothing_we_can_do.append(unhealthy_rid)

    if there_is_nothing_we_can_do:
        formatter.info('The following unhealthy rids could not be fixed:')
        for unhealthy_rid in there_is_nothing_we_can_do:
            unhealthy_rid.write_result(formatter)
            formatter.info('')
In the case of there_is_nothing_we_can_do, we probably don't need a post-surgery healthcheck, as we couldn't fix all issues in the first place... That means we could return here, no?


        formatter.info('Not all health problems could be fixed, aborting.')
        return

    processQueue()

    formatter.info('Performing post-surgery healthcheck:')
    post_result = _run_healthcheck(portal_catalog, formatter)
    if not post_result.is_healthy():
        transaction.doom()  # extra paranoia, prevent erroneous commit
        formatter.info('Not all health problems could be fixed, aborting.')
        return

    if args.dryrun:
        formatter.info('Surgery would have been successful, but was aborted '
                       'due to dryrun!')
    else:
        transaction.commit()
        formatter.info('Surgery was successful, known health problems could '
                       'be fixed!')


def _setup_parser(app):
    parser = argparse.ArgumentParser(
        description='Provide health check and fixes for portal_catalog.',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # swallows instance command
    parser.add_argument('-c', help=argparse.SUPPRESS)

    parser.add_argument(
        '-s', '--site', dest='site',
        default=discover_plone_site(app),
        help='Path to the Plone site from which portal_catalog is used.')
    parser.add_argument(
        '-n', '--dry-run', dest='dryrun',
        default=False, action="store_true",
        help='Dryrun, do not commit changes. Only relevant for surgery.')

    commands = parser.add_subparsers(dest='command')
    healthcheck = commands.add_parser(
        'healthcheck',
        help='Run a health check for portal_catalog.')
    healthcheck.set_defaults(func=healthcheck_command)

    surgery = commands.add_parser(
        'surgery',
        help='Run a healthcheck and perform surgery for unhealthy rids in '
             'portal_catalog.')
    surgery.set_defaults(func=surgery_command)
    return parser


def _parse(parser, args):
    return parser.parse_args(args)


def _run(parsed_args, app, formatter):
    site = load_site(app, parsed_args.site)
    portal_catalog = getToolByName(site, 'portal_catalog')

    return parsed_args.func(portal_catalog, parsed_args, formatter=formatter)


def doctor_cmd(app, args, formatter=None):
    parser = _setup_parser(app)
    parsed_args = _parse(parser, args)
    _run(parsed_args, app, formatter or ConsoleOutput())
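
Because `--dry-run` is registered on the top-level parser in `_setup_parser` rather than on the `surgery` subparser, it is given before the subcommand, as in the README example. The following standalone sketch reproduces only that parser shape so the behaviour can be tried without a Zope instance; `build_parser` is a hypothetical name, and the `-c` and `--site` options are left out for brevity.

```python
import argparse


def build_parser():
    # Mirrors the shape of ``_setup_parser``: one global flag, two subcommands.
    parser = argparse.ArgumentParser(prog='doctor')
    parser.add_argument('-n', '--dry-run', dest='dryrun',
                        default=False, action='store_true')
    commands = parser.add_subparsers(dest='command')
    commands.add_parser('healthcheck')
    commands.add_parser('surgery')
    return parser


parser = build_parser()
# The flag belongs to the top-level parser, so it goes before the subcommand.
args = parser.parse_args(['--dry-run', 'surgery'])
plain = parser.parse_args(['healthcheck'])
```

After parsing, `args.dryrun` is set and `args.command` names the chosen subcommand, which is the information `surgery_command` relies on.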
22 changes: 22 additions & 0 deletions ftw/catalogdoctor/compat.py
@@ -0,0 +1,22 @@
import pkg_resources


IS_PLONE_5 = pkg_resources.get_distribution('Products.CMFPlone').version >= '5'


if IS_PLONE_5:
    from Products.CMFCore.indexing import processQueue
else:
    # optional collective.indexing support
    try:
        from collective.indexing.queue import processQueue
    except ImportError:
        def processQueue():
            pass

# optional Products.DateRecurringIndex support
try:
    from Products.DateRecurringIndex.index import DateRecurringIndex
except ImportError:
    class DateRecurringIndex(object):
        pass
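
`IS_PLONE_5` above compares version strings, which works for the 4.3.x and 5.1.x releases this package targets but would misorder a two-digit major version. The sketch below shows the pitfall and one possible hardening. It is not part of this PR, and `major_version` and `is_plone_5_or_later` are hypothetical helpers that assume the version string starts with a numeric major component.

```python
def major_version(version):
    """Return the leading numeric component of a version string."""
    return int(version.split('.')[0])


def is_plone_5_or_later(version):
    # Hypothetical helper: compares numerically instead of as strings.
    return major_version(version) >= 5


# String comparison is lexicographic, so '1' sorts before '5'.
string_compare = '10.0' >= '5'
numeric_compare = is_plone_5_or_later('10.0')
```

For the supported versions both approaches agree; they diverge only once the major version reaches two digits.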
58 changes: 58 additions & 0 deletions ftw/catalogdoctor/debug.py
@@ -0,0 +1,58 @@
from pprint import pprint


def btrees_to_python_collections(maybe_btrees):
    """Convert collections from btrees to python collections for debugging.

    WARNING: naive implementation:
    - converts sets to lists
    - should not be used on large data structures
    - should only be used to debug

    Only use it to display things on the command line. Better not
    programmatically work with the result otherwise. Stick to BTrees if you
    can.

    This method is intended to help displaying catalog data structures on the
    command line for debugging. It can be used in combination with pprint to
    quickly analyze the state of the catalog's internal data structures.

    """
    if isinstance(maybe_btrees, (int, basestring)):
        return maybe_btrees
    elif hasattr(maybe_btrees, 'items'):
        return dict((key, btrees_to_python_collections(val))
                    for key, val in maybe_btrees.items())
    else:
        return list(maybe_btrees)


def pprint_btrees(btrees):
    """pretty print a collection from btrees.

    Sample output looks like:

    >>> index = plone.portal_catalog._catalog.indexes['path']
    >>> pprint_btrees(index._index)
    {None: {1: [97], 2: [98, 99]},
     'child': {2: [98]},
     'otherchild': {2: [99]},
     'parent': {1: [97, 98, 99]},
     'plone': {0: [97, 98, 99]}}

    >>> pprint_btrees(index._unindex)
    {97: '/plone/parent',
     98: '/plone/parent/child',
     99: '/plone/parent/otherchild'}

    >>> pprint_btrees(index._index_items)
    {'/plone/parent': 97,
     '/plone/parent/child': 98,
     '/plone/parent/otherchild': 99}

    >>> pprint_btrees(index._index_parents)
    {'/plone': [97],
     '/plone/parent': [98, 99]}

    """
    pprint(btrees_to_python_collections(btrees))
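
The recursive conversion in `btrees_to_python_collections` can be tried without a catalog. The sketch below is a Python 3 adaptation (the name `to_python_collections` is hypothetical, and `str` replaces the Python 2 `basestring`), and it feeds in plain dicts and tuples as stand-ins for BTrees mappings and sets.

```python
from pprint import pformat


def to_python_collections(value):
    # Python 3 adaptation of the helper above (``str`` replaces ``basestring``).
    if isinstance(value, (int, str)):
        return value
    if hasattr(value, 'items'):
        # Mapping-like: recurse into the values, keep the keys as they are.
        return {key: to_python_collections(val) for key, val in value.items()}
    # Anything else iterable (set-like, tuple) becomes a plain list.
    return list(value)


# Plain dicts and tuples stand in for BTrees mappings and sets here.
nested = {'parent': {1: (97, 98, 99)}, 'child': {2: (98,)}}
converted = to_python_collections(nested)
text = pformat(converted)
```

As in the original helper, elements of a set-like value are copied into a list as-is and are not converted further.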
2 changes: 2 additions & 0 deletions ftw/catalogdoctor/exceptions.py
@@ -0,0 +1,2 @@
class CantPerformSurgery(Exception):
    """Raised when a procedure cannot be performed."""