Add a new marker to check for memory leaks (#52)
Users have indicated that it will be very useful if the plugin exposes a
way to detect memory leaks in tests. This is possible, but is a bit
tricky as the interpreter can allocate memory for internal caches, as
well as user functions.

To make this more reliable, the new marker will take two parameters:

* The limit of leaked memory per allocation location. If the
  memory leaked by any allocation location in the test is higher than
  this value, the test will fail.

* An optional callable function that can be used to filter out
  locations. This will allow users to remove false positives.

Signed-off-by: Pablo Galindo <[email protected]>
Signed-off-by: Matt Wozniski <[email protected]>
Co-authored-by: Matt Wozniski <[email protected]>
pablogsal and godlygeek committed Aug 23, 2023
1 parent b25d4b8 commit 0e33179
Showing 9 changed files with 497 additions and 59 deletions.
18 changes: 18 additions & 0 deletions docs/api.rst
@@ -0,0 +1,18 @@
.. module:: pytest_memray

pytest-memray API
=================

Types
-----

.. autoclass:: LeaksFilterFunction()
    :members: __call__
    :show-inheritance:

.. autoclass:: Stack()
    :members:

.. autoclass:: StackFrame()
    :members:

11 changes: 11 additions & 0 deletions docs/conf.py
@@ -9,8 +9,10 @@
from sphinxcontrib.programoutput import Command

extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.extlinks",
"sphinx.ext.githubpages",
"sphinx.ext.intersphinx",
"sphinxarg.ext",
"sphinx_inline_tabs",
"sphinxcontrib.programoutput",
@@ -36,6 +38,15 @@
"https://github.com/bloomberg/pytest-memray/issues/.*": "https://github.com/bloomberg/pytest-memray/pull/.*"
}

# Try to resolve Sphinx references as Python objects by default. This means we
# don't need :func: or :class: etc, which keep docstrings more human readable.
default_role = "py:obj"

# Automatically link to Python standard library types.
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
}


def _get_output(self):
code, out = prev(self)
1 change: 1 addition & 0 deletions docs/index.rst
@@ -13,4 +13,5 @@ reports like:

usage
configuration
api
news
122 changes: 102 additions & 20 deletions docs/usage.rst
@@ -31,33 +31,115 @@ reported after tests run ends:
Markers
~~~~~~~

This plugin provides markers that can be used to enforce additional checks and
validations on tests when this plugin is enabled.
This plugin provides `markers <https://docs.pytest.org/en/latest/example/markers.html>`__
that can be used to enforce additional checks and validations on tests.

.. important:: These markers do nothing when the plugin is not enabled.

.. py:function:: pytest.mark.limit_memory(memory_limit: str)
``limit_memory``
----------------
Fail the execution of the test if the test allocates more memory than allowed.

When this marker is applied to a test, it will cause the test to fail if the execution
of the test allocates more memory than allowed. It takes a single argument with a
string indicating the maximum memory that the test can allocate.
When this marker is applied to a test, it will cause the test to fail if the
execution of the test allocates more memory than allowed. It takes a single argument
with a string indicating the maximum memory that the test can allocate.

The format for the string is ``<NUMBER> ([KMGTP]B|B)``. The marker will raise
``ValueError`` if the string format cannot be parsed correctly.
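
As an illustration of the string format described above, here is a minimal sketch of a parser for it. The helper name ``parse_memory_limit``, the regular expression, and the use of binary multiples (1 KB = 1024 B) are assumptions made for this example, not the plugin's confirmed implementation.

```python
import re

# Assumption: binary multiples (1 KB = 1024 B). The plugin's own table may differ.
_UNITS = {
    "B": 1,
    "KB": 1024,
    "MB": 1024**2,
    "GB": 1024**3,
    "TB": 1024**4,
    "PB": 1024**5,
}

# A number, optional whitespace, then one of B, KB, MB, GB, TB, PB.
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGTP]?B)\s*$")


def parse_memory_limit(text: str) -> float:
    """Hypothetical helper: convert a string such as "24 MB" to a byte count."""
    match = _PATTERN.match(text)
    if match is None:
        # Mirrors the documented behaviour: unparseable strings raise ValueError.
        raise ValueError(f"Invalid memory size format: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]
```

Under these assumptions, ``parse_memory_limit("24 MB")`` yields the byte count for 24 MB, while a string such as ``"lots"`` raises ``ValueError``.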

.. warning::

As the Python interpreter has its own
`object allocator <https://docs.python.org/3/c-api/memory.html>`__ is possible
that memory is not immediately released to the system when objects are deleted, so
tests using this marker may need to give some room to account for this.
As the Python interpreter has its own
`object allocator <https://docs.python.org/3/c-api/memory.html>`__ it's possible
that memory is not immediately released to the system when objects are deleted,
so tests using this marker may need to give some room to account for this.

Example of usage:

.. code-block:: python

    @pytest.mark.limit_memory("24 MB")
    def test_foobar():
        pass  # do some stuff that allocates memory

.. py:function:: pytest.mark.limit_leaks(location_limit: str, filter_fn: LeaksFilterFunction | None = None)
Fail the execution of the test if any call stack in the test leaks more memory than
allowed.

.. important::
To detect leaks, Memray needs to intercept calls to the Python allocators and
report native call frames. This adds significant overhead and will slow your
test down.

When this marker is applied to a test, the plugin will analyze the memory
allocations that are made while the test body runs and not freed by the time the
test body function returns. It groups them by the call stack leading to the
allocation, and sums the amount leaked by each **distinct call stack**. If the total
amount leaked from any particular call stack is greater than the configured limit,
the test will fail.
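
The grouping rule described above can be sketched as follows. This is only an illustration: the function name ``stacks_over_limit`` and the representation of a call stack as a tuple of function names are hypothetical, and the plugin's internals may look quite different.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in: a call stack is a tuple of function names, and each
# leaked allocation is a (call stack, size in bytes) pair.
CallStack = Tuple[str, ...]


def stacks_over_limit(
    leaks: Iterable[Tuple[CallStack, int]], limit: int
) -> Dict[CallStack, int]:
    """Sum leaked bytes per distinct call stack and keep those above the limit."""
    totals: Dict[CallStack, int] = defaultdict(int)
    for stack, size in leaks:
        totals[stack] += size
    return {stack: total for stack, total in totals.items() if total > limit}


leaks = [
    (("test_foobar", "make_buffer"), 600_000),
    (("test_foobar", "make_buffer"), 600_000),
    (("test_foobar", "log_message"), 2_000),
]
# Neither allocation from make_buffer exceeds 1 MB on its own, but their sum does.
offenders = stacks_over_limit(leaks, limit=1024**2)
```

The point of the sketch is that the limit applies to the total per call stack, not to individual allocations, so many small leaks from the same place can still fail the test.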

.. important::
It's recommended to run your API or code in a loop when utilizing this plugin.
This practice helps in distinguishing genuine leaks from the "noise" generated
by internal caches and other incidental allocations.

The format for the string is ``<NUMBER> ([KMGTP]B|B)``. The marker will raise
``ValueError`` if the string format cannot be parsed correctly.

The marker also takes an optional keyword-only argument ``filter_fn``. This argument
represents a filtering function that will be called once for each distinct call
stack that leaked more memory than allowed. If it returns *True*, leaks from that
location will be included in the final report. If it returns *False*, leaks
associated with the stack it was called with will be ignored. If all leaks are
ignored, the test will not fail. This can be used to discard any known false
positives.
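
A filtering function of this kind might look like the sketch below. To keep the example runnable on its own, it defines stand-in ``Stack`` and ``StackFrame`` dataclasses; their field names (``frames``, ``function``, ``filename``, ``lineno``) are assumptions about the types exported by ``pytest_memray`` and should be checked against the API reference. The function ``ignore_regex_cache`` and its path check are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


# Stand-ins for pytest_memray.StackFrame and pytest_memray.Stack so this sketch
# runs without the plugin installed. The field names are assumptions.
@dataclass(frozen=True)
class StackFrame:
    function: str
    filename: str
    lineno: int


@dataclass(frozen=True)
class Stack:
    frames: Tuple[StackFrame, ...]


def _is_re_module(filename: str) -> bool:
    # Matches both layouts of the standard library: re.py and re/__init__.py.
    # Only POSIX-style path separators are handled in this sketch.
    return filename.endswith("/re.py") or "/re/" in filename


def ignore_regex_cache(stack: Stack) -> bool:
    """Return False (ignore the leak) if any frame belongs to the re module."""
    return not any(_is_re_module(frame.filename) for frame in stack.frames)


# With the plugin installed, the function would be passed to the marker:
#
#     @pytest.mark.limit_leaks("1 MB", filter_fn=ignore_regex_cache)
#     def test_foobar():
#         ...
```

Returning ``True`` keeps a leak in the report and returning ``False`` drops it, so a filter like this should be as narrow as possible to avoid hiding genuine leaks.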

.. tip::

You can pass the ``--memray-bin-path`` argument to ``pytest`` to specify
a directory where Memray will store the binary files with the results. You
can then use the ``memray`` CLI to further investigate the allocations and the
leaks using any Memray reporters you'd like. Check `the memray docs
<https://bloomberg.github.io/memray/getting_started.html>`_ for more
information.

Example of usage:

.. code-block:: python

    @pytest.mark.limit_leaks("1 MB")
    def test_foobar():
        # Run the function we're testing in a loop to ensure
        # we can differentiate leaks from memory held by
        # caches inside the Python interpreter.
        for _ in range(100):
            do_some_stuff()

.. warning::
It is **very** challenging to write tests that do not "leak" memory in some way,
due to circumstances beyond your control.

There are many caches inside the Python interpreter itself. Just a few examples:

- The `re` module caches compiled regexes.
- The `logging` module caches whether a given log level is active for
a particular logger the first time you try to log something at that level.
- A limited number of objects of certain heavily used types are cached for reuse
so that `object.__new__` does not always need to allocate memory.
- The mapping from bytecode index to line number for each Python function is
cached when it is first needed.

There are many more such caches. Also, within pytest, any message that you log or
print is captured, so that it can be included in the output if the test fails.

Memray sees these all as "leaks", because something was allocated while the test
ran and it was not freed by the time the test body finished. We don't know that
it's due to an implementation detail of the interpreter or pytest that the memory
wasn't freed. Moreover, because these caches are implementation details, the
amount of memory allocated, the call stack of the allocation, and even the
allocator that was used can all change from one version to another.

Because of this, you will almost certainly need to allow some small amount of
leaked memory per call stack, or use the ``filter_fn`` argument to filter out
false-positive leak reports based on the call stack they're associated with.
6 changes: 6 additions & 0 deletions src/pytest_memray/__init__.py
@@ -1,7 +1,13 @@
from __future__ import annotations

from ._version import __version__ as __version__
from .marks import LeaksFilterFunction
from .marks import Stack
from .marks import StackFrame

__all__ = [
"__version__",
"LeaksFilterFunction",
"Stack",
"StackFrame",
]