Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Generalized Pareto distribution #294

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

xieyj17
Copy link

@xieyj17 xieyj17 commented Jan 1, 2024

Generalized Pareto distribution is a commonly used distribution for modelling the tail of another distribution. It has wide applications in risk management, finance, ans quality assuarance. See wiki page.

I added a new distribution to the pymc-experimental branch.

@xieyj17 xieyj17 marked this pull request as ready for review January 1, 2024 19:46
@xieyj17
Copy link
Author

xieyj17 commented Jan 1, 2024

When I am rebasing, I accidently added the recent PR by @zaxtax .

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening up the PR. I left some comments and questions.

For the rebase, what did you try to do exactly?

@@ -221,6 +221,163 @@ def moment(rv, size, mu, sigma, xi):
return mode


# Generalized Pareto Distribution
class GenParetoRV(RandomVariable):
name: str = "Generalized Pareto Distribution"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: str = "Generalized Pareto Distribution"
name: str = "Generalized Pareto"

@@ -221,6 +221,163 @@ def moment(rv, size, mu, sigma, xi):
return mode


# Generalized Pareto Distribution
class GenParetoRV(RandomVariable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should subclass ScipyRandomVariable because Scipy RVs (sometimes) do something dumb with size=(1,)

Suggested change
class GenParetoRV(RandomVariable):
class GenParetoRV(ScipyRandomVariable):

Comment on lines 232 to 233
def __call__(self, mu=0.0, sigma=1.0, xi=1.0, size=None, **kwargs) -> TensorVariable:
return super().__call__(mu, sigma, xi, size=size, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not strictly necessary because most users will never call the RV directly. We usually provide default values through the PyMC distribution class

Suggested change
def __call__(self, mu=0.0, sigma=1.0, xi=1.0, size=None, **kwargs) -> TensorVariable:
return super().__call__(mu, sigma, xi, size=size, **kwargs)

return stats.genpareto.rvs(c=xi, loc=mu, scale=sigma, random_state=rng, size=size)


gp = GenParetoRV()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gp = GenParetoRV()
gen_pareto = GenParetoRV()

ndim_supp: int = 0
ndims_params: List[int] = [0, 0, 0]
dtype: str = "floatX"
_print_name: Tuple[str, str] = ("Generalized Pareto Distribution", "\\operatorname{GP}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_print_name: Tuple[str, str] = ("Generalized Pareto Distribution", "\\operatorname{GP}")
_print_name: Tuple[str, str] = ("Generalized Pareto", "\\operatorname{GenPareto}")


def test_logp(self):
def ref_logp(value, mu, sigma, xi):
if xi == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scipy genpareto logpdf fails for xi = 0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do noticed a tiny bug in scipy's function when calculating pdf for general pareto distribution with xi==0. Will double check and sumbit a PR to fix that as well.

Comment on lines +161 to +162
decimal=select_by_precision(float64=6, float32=2),
skip_paramdomain_outside_edge_test=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you skipping the outside edge test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I am bounding the xi to be >= 0, I'd like to skip the outside edge test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of that test is to make sure the bounding is defined correctly, so you shouldn't skip

Comment on lines 304 to 305
xi : float
Shape parameter (xi >= 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth a note saying this is more restrictive than other definitions of the GenPareto (in wikipedia there seems to be special cases for xi < 0?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xi < 0 are less seen for modelling extreme values. I will add a note here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xi < 0 are less seen for modelling extreme values. I will add a note here.


def moment(rv, size, mu, sigma, xi):
r"""
Mean is defined when :math:`\xi < 1`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to provide a real "moment", just anything that always has finite logp. So in this case moment = mu may be good enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that we only need to return mu instead of the true mean? Or shall I just leave it as it?

Copy link
Member

@ricardoV94 ricardoV94 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that you can just return mu and take away the check_parameter part. Just make sure you broadcast mu with the other parameters in case size is None

@ricardoV94 ricardoV94 added the enhancements New feature or request label Jan 2, 2024
@ricardoV94 ricardoV94 changed the title Adding Generalized Pareto distribution Implement Generalized Pareto distribution Jan 2, 2024
@ricardoV94
Copy link
Member

You should add an entry in the docs API: https://github.com/pymc-devs/pymc-experimental/blob/main/docs/api_reference.rst

@xieyj17
Copy link
Author

xieyj17 commented Jan 4, 2024

@ricardoV94 Thanks for you comments! I have addressed your suggested changes. Thank you!

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments, thanks for the work so far!

Comment on lines +316 to +329
"""
Calculate log-probability of Generalized Pareto distribution
at specified value.
Parameters
----------
value: numeric
Value(s) for which log-probability is calculated. If the log probabilities for multiple
values are desired the values must be provided in a numpy array or Pytensor tensor
Returns
-------
TensorVariable
"""
Copy link
Member

@ricardoV94 ricardoV94 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docstrings are incomplete, and the logp function is not really user facing, so it's better to not include anything at all

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logp for Generalized Pareto distribution should be
我认为广义帕累托分布的 logp 应该是
def logp(value, mu, sigma, xi):
"""
Calculate log-probability of Generalized Pareto distribution
计算广义帕累托分布的对数概率
at specified value. 在指定值。

Parameters
----------
value: numeric
    Value(s) for which log-probability is calculated. If the log probabilities for multiple
    values are desired the values must be provided in a numpy array or Pytensor tensor

Returns
-------
TensorVariable
"""

scaled = (value - mu) / sigma

logp_expression = pt.switch(
    pt.isclose(xi, 0),
    -1 * scaled,
    -1 * pt.log(sigma) - ((xi + 1) / xi) * pt.log1p(xi * scaled),
)
logp = pt.switch(pt.gt(1 + xi * scaled, 0), logp_expression, -np.inf)
return check_parameters(logp, sigma > 0, pt.and_(xi > -1, xi < 1), msg="sigma > 0 or -1 < xi < 1")

Comment on lines +342 to +356
"""
Compute the log of the cumulative distribution function for Generalized Pareto
distribution at the specified value.
Parameters
----------
value: numeric or np.ndarray or `TensorVariable`
Value(s) for which log CDF is calculated. If the log CDF for
multiple values are desired the values must be provided in a numpy
array or `TensorVariable`.
Returns
-------
TensorVariable
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -138,6 +138,64 @@ class TestGenExtreme(BaseTestDistributionRandom):
]


class TestGenParetoClass:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing the test for moment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logp for Generalized Pareto distribution should be
我认为广义帕累托分布的 logp 应该是
def logp(value, mu, sigma, xi):
"""
Calculate log-probability of Generalized Pareto distribution
计算广义帕累托分布的对数概率
at specified value. 在指定值。

Parameters
----------
value: numeric
    Value(s) for which log-probability is calculated. If the log probabilities for multiple
    values are desired the values must be provided in a numpy array or Pytensor tensor

Returns
-------
TensorVariable
"""

scaled = (value - mu) / sigma

logp_expression = pt.switch(
    pt.isclose(xi, 0),
    -1 * scaled,
    -1 * pt.log(sigma) - ((xi + 1) / xi) * pt.log1p(xi * scaled),
)
logp = pt.switch(pt.gt(1 + xi * scaled, 0), logp_expression, -np.inf)
return check_parameters(logp, sigma > 0, pt.and_(xi > -1, xi < 1), msg="sigma > 0 or -1 < xi < 1")

@rxd199682
Copy link

rxd199682 commented Feb 14, 2024

I think the logp for Generalized Pareto distribution should be
def logp(value, mu, sigma, xi):
"""
Calculate log-probability of Generalized Pareto distribution
at specified value.

    Parameters
    ----------
    value: numeric
        Value(s) for which log-probability is calculated. If the log probabilities for multiple
        values are desired the values must be provided in a numpy array or Pytensor tensor

    Returns
    -------
    TensorVariable
    """

    scaled = (value - mu) / sigma

    logp_expression = pt.switch(
        pt.isclose(xi, 0),
        -1 * scaled,
        -1 * pt.log(sigma) - ((xi + 1) / xi) * pt.log1p(xi * scaled),
    )
    logp = pt.switch(pt.gt(1 + xi * scaled, 0), logp_expression, -np.inf)
    return check_parameters(logp, sigma > 0, pt.and_(xi > -1, xi < 1), msg="sigma > 0 or -1 < xi < 1")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancements New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants