[ENH] Tweedie Distribution #428

ShreeshaM07 · 2024-07-17T18:16:54Z

Reference Issues/PRs

related to #423

What does this implement/fix? Explain your changes.

Implements a Tweedie distribution in skpro by working the compound poisson gamma case when power parameter 1<p<2. Calls the other pre-existing distributions in other interger cases

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

If this idea would be suitable or is there something that might hinder the further extension as skpro does not handle case when 2<power<3.

Did you add any tests for the change?

no

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
See here for full badge reference
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
dependency isolation, see the estimator dependencies guide.

ShreeshaM07 · 2024-07-17T18:20:00Z

@fkiraly I have tried to implement pdf of the compound poisson-gamma case when power is in (1,2). But I am still unsure what can be done to handle the case when power is in (2,3) or when it is p<0.

Need some thought on whether this would be a good idea to distinguish them based on the power parameters in the way I have done or is there any better way.

fkiraly

hm, this is incomplete - only covers p between 1 and 2, no?

May I suggest calling this CompoundPoissonGamma instead, and later using it as one of the Tweedie ED components in a delegator, as suggested here?
#423

ShreeshaM07 · 2024-07-18T09:25:11Z

hm, this is incomplete - only covers p between 1 and 2, no?

Yes I will be splitting it into Compound poisson gamma and then calling it in the tweedie distribution. However I still face the problem of integrating the pdf to get the cdf. Is there any method in scipy apart from using quad to integrate the pdf manually?

Also facing the same issue for finding the ppf some ideas would be helpful. I've asked gpt and it suggests using a scipy.optimize.brentq to find the ppf I'm not very sure if that will work correct.

fkiraly · 2024-07-19T21:04:51Z

Brute force integration and such are typically last resort defaults.

I would suggest to check the approximation formulae along the lines of those linked in this comment: #429 (comment)

ShreeshaM07 · 2024-07-20T10:43:59Z

I was going through the section 3 of the Withers Nadaraja paper and it seems to me that the cdf ppf

is still not a simple term it although it looks straight forward.
It requires the summation of the infinite series of the Hermite polynomial

where $\phi(x)=(e^-\frac{(x)^2}{2})$ . and $\Phi(x)$ is the wright_bessel function.

I am not sure how we can do this efficiently. Also where is the power parameter here I do not seem to spot it it should ideally be present right as it will alter the value of the cdf.
Some insight here would be helpful.

fkiraly · 2024-07-20T18:32:33Z

Also where is the power parameter here I do not seem to spot it it should ideally be present right as it will alter the value of the cdf.

it's implicit in the constants $c_{rj}$ (the formula below the screenshot box, in the paper)

ShreeshaM07 · 2024-07-22T13:58:55Z

it's implicit in the constants crj (the formula below the screenshot box, in the paper)

I see then how do I generate these crj from the power parameter alone.

And also any thoughts on how this infinite series summation can be made efficient. Is there any way this expression will simplify to something without a summation?

fkiraly · 2024-07-22T14:25:33Z

I see then how do I generate these crj from the power parameter alone.

I think you need to map them, following through the correspondence between Tweedie and CPG parameters. I would recommend you start implementing CPG in pure CPG parameters, and the mapping is done in the delegator.

fkiraly · 2024-07-22T14:26:18Z

And also any thoughts on how this infinite series summation can be made efficient. Is there any way this expression will simplify to something without a summation?

I don't think so - the usual procedure is to prescribe an epsilon and abort once the change becomes less than that epsilon - this is sound as long as you know the magnitudes are decreasing, and you have some theory that links magnitude of individual terms to magnitude of the total remainder.

ShreeshaM07 · 2024-07-24T05:18:34Z

I don't think so - the usual procedure is to prescribe an epsilon and abort once the change becomes less than that epsilon - this is sound as long as you know the magnitudes are decreasing, and you have some theory that links magnitude of individual terms to magnitude of the total remainder.

Can we not just keep the pdf = poisson.pdf(x) * gamma.pdf(x) where gamma will have the $\alpha=i*\alpha$ where $\alpha$ would have been the shape otherwise and i is at all the integers. so now we get the complete poisson gamma distribution. For the cdf we must sum these up for all values keeping a tolerance does that work?

For now i have implemented the CPG distributions pdf manually as given in the paper by Nadaraja and Withers. I'll try to implement the cdf next.

ShreeshaM07 · 2024-07-24T06:57:48Z

I am not able to get the gr(x) polynomial could you please help out as to what that is.

ShreeshaM07 · 2024-07-25T10:14:34Z

@fkiraly I have implemented the cdf as per the Witheres Nadaraja paper. But I think there is some small issue as the function is not converging to a sum within [0,1] when the iterations(r) are large. Some help in finding out the fault would be helpful.

fkiraly · 2024-07-25T12:05:25Z

Sure!

What would help me if you write a snippet of code, either assuming the branch or just numpy imports, that computes the sum.

fkiraly · 2024-07-25T12:06:04Z

I am not able to get the gr(x) polynomial could you please help out as to what that is.

I have no context here, what is the gr polynomial, what does it mean to "get" it, and why do you need to get it?

ShreeshaM07 · 2024-07-26T09:26:13Z

What would help me if you write a snippet of code, either assuming the branch or just numpy imports, that computes the sum.

With reference to this theorem

cdf

equation(3.8) is the cdf which requires sum(equation(3.10)) for all positive integer r.
Now the equation(3.10) requires the sum of for values of j from 1 to r.
cdf is being computed by

    def _cdf(self, x):
        """Cumulative distribution function.

        Parameters
        ----------
        x : 2D np.ndarray, same shape as ``self``
            values to evaluate the cdf at

        Returns
        -------
        2D np.ndarray, same shape as ``self``
            cdf values at the given points
        """
        rho = self.alpha
        max_r = 10  # Adjust as needed
        phi_x = np.exp(-(x**2) / 2) / np.sqrt(2 * np.pi)
        series_sum = np.zeros_like(x, dtype=float)
        for r in range(1, max_r + 1):
            hr_x = self._compute_hr(x, r, rho)
            series_sum += hr_x
            # print(f"r={r}, hr_x={hr_x}, series_sum={series_sum}")
            # if np.any(np.abs(hr_x) < 1e-6):  # Convergence check
            #     break
        cdf = phi_x * (1 + series_sum)
        return cdf

Using $\phi(x)=\frac{e^-\frac{x^2}{2}}{sqrt(2*pi)}$.
He(r+2j,x) is being computed by

from scipy.special import eval_hermitenorm
eval_hermitenorm(r + 2 * j, x)

in the method

    def _compute_hr(self, x, r, rho):
        from scipy.special import eval_hermitenorm, factorial

        hr = np.zeros_like(x, dtype=float)
        for j in range(1, r + 1):
            H = eval_hermitenorm(r + 2 * j, x)
            crj = self._compute_crj(r, j, rho)
            hr += H * crj / factorial(j)
        return hr

This now requires the crj term for each value of r which is being computed by.

    def _compute_crj(self, r, j, rho):
        from itertools import combinations_with_replacement

        from scipy.special import factorial

        c_rj = 0
        partitions = [
            p for p in combinations_with_replacement(range(1, r + 1), j) if sum(p) == r
        ]
        for partition in partitions:
            term = 1
            for s_i in partition:
                term *= factorial(rho + 1 + s_i) / (
                    factorial(rho - 1) * factorial(s_i + 2)
                )
            c_rj += term
        c_rj *= (rho**2 + rho) ** (-r / 2 - j)
        return c_rj

So the idea here is that the partitions array contains the possible arrays which in turn contain values of si (that is required in calculation of crj as per the equation in the paper) where sum(si) = r for different values of r.
From this I am then finding the nCr formula and multiplying all the possible arrays for a specific r value and then summing them up to make crj.
I am then multiplying the resultant crj by the constant term $(\rho^2 + \rho)^(-\frac{r}{2}-j)$ to get the final required crj.
There seems to be some fault in the way crj is being calculated as the sum is not converging to a value between [0,1] when r=say 10.

ppf

the ppf requires the computing of the equation(3.7) where $\Phi(x)$ is the standard normal cumulative distribution and $\lambda$ is the self.lambda but I am not sure what the definition of gr(x) here is. So I now need the definition of this polynomial g(r,x).

fkiraly · 2024-07-26T12:57:08Z

Just following through the references, my interpretation: the $g_r$ are the g-polynomials in the Cornish-Fisher expansion, see here: https://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion

To avoid confusion, there are g- and h-polynomials, and also f-polynomials, and Withers claims there is a mistake in some original explicit formulae for the low order g-polynomials in the papers.

The trail of references is:

paper cited references "Withers - Asymptotic expansion for Distributions and Quantiles with Power Series Cumulants, JRSS, 1984". Section 3 contains the g-polynomials.
This in turn references an explicit formula of g-polynomials in terms of f-polynomials in "order inference for asymptoticaliy normal random variables. Sankhya B, 1982", Theorem A2 contains the expression.

This seems to become the definition of a rabbit hole...

I would suggest trying to implement fast numpy based functions for the various polynomials or expressions, in cases they cannot be obtained from scipy or numpy directly. I would also test these individually against known values and cover them well with tests.

For expressions such as the standard normal pdf, I would use scipy or numpy where available, instead of reimplementing them, to avoid little bugs like wrong signs, missing factors, etc.

ShreeshaM07 · 2024-07-30T07:47:08Z

Just following through the references, my interpretation: the gr are the g-polynomials in the Cornish-Fisher expansion, see here: https://en.wikipedia.org/wiki/Cornish%E2%80%93Fisher_expansion

I am not quite sure I see an implementation of $g_r$ polynomial here. It only shows the $h_r$ polynomial .

I am not able to find any pre-existing method in scipy to interface the computation of $c_{rj}$. From the various implementations I have done so far it does not produce results in the interval [0,1] for the cdf. It would be of much help if you could take a look at the implementation of the _compute_crj method. As from what I infer from the Withers and Natharaja paper the implementation of the crj seems to be correct however it does not produce the right results.

I've also tried using $l_r$ polynomial for the calculation of the same $c_{rj}$ but that was producing the exact same results for the respective values of x and not in range [0,1]. So I am not really sure what exactly is going wrong.

ShreeshaM07 · 2024-07-30T08:40:20Z

This is the code that was used to test

from skpro.distributions.cmp_pois_gam import CmpPoissonGamma
import numpy as np
lambda_param = 2.0
alpha_param = 1.5
beta_param = 1.0

cmp_poisson_gamma_dist = CmpPoissonGamma(lambda_=lambda_param, alpha=alpha_param, beta=beta_param)

x_values = np.array([[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]])
p_values = np.array([[0.1, 0.25, 0.5], [0.75, 0.9, 0.95]])

pdf_values = cmp_poisson_gamma_dist._pdf(x_values)
pmf_values = cmp_poisson_gamma_dist._pmf(np.floor(x_values).astype(int))
cdf_values = cmp_poisson_gamma_dist._cdf(x_values)
# ppf_values = cmp_poisson_gamma_dist._ppf(p_values)

print(f"PDF values:\n{pdf_values}")
print(f"PMF values:\n{pmf_values}")
print(f"CDF values:\n{cdf_values}")
# print(f"PPF values:\n{ppf_values}")

Output

PDF values:
[[0.17620241 0.19445143 0.19319874]
 [0.18385598 0.17024862 0.154407  ]]
PMF values:
[[0.13533528 0.27067057 0.27067057]
 [0.27067057 0.27067057 0.18044704]]
CDF values:
[[ 2483.23279161 -1265.84513228   273.00637241]
 [  233.08588426  -333.03847813   238.31041401]]

Regarding the implementation specifics is present in the comment.

fkiraly · 2024-07-30T16:10:22Z

ok, I see, this is typical for "implementation from paper" - it is not unusual that it looks like things have been correctly implemented, but they then do not behave as expected. My primary suggestion for continuing with this is to proceed with "bottom up validation", i.e., first test correctness of the lower down functions like crj and gr.

What I would do, in sequence:

replace common functions computed explicitly by scipy functions, e.g., normal pdf, cdf used in the formulae
compute some values of crj manually. Add these as tests for crj
compute some values of gr manually. Add these as tests for gr
then, compute some pdf and pmf values that are easy to get. Add as tests for tweedie.

Debug wherever manually computed values do not agree with machine computed ones.

If we did this, and pdf, pmf values fail to match the constraints still, then there is supsicion that there is an error in the math, in the paper. But until we can conclude this, we need to do the tedious manual verification above.

Of course we can decide to work on sth else if this feels like too much of a rabbit hole.

ShreeshaM07 · 2024-08-06T18:39:48Z

This comment is the output for the final output of cdf.

Now for the individual values itself for example
the hermite polynomial for respective values of x as given in the comment linked above is

[[ -3240.81005859    936.           5141.87841797]
 [ -3342.         -13760.63232422  12312.        ]]

when r=5 and j=3 so the above array contains $He_{11}(x)$ and
$c_rj$ for the same is 0.018245457192850898
and $j!$ = 6.0

and the value $\frac{He_{r+2j}(x)*c_{rj}}{j!}$ for this iteration is

[[ -9.8550102    2.84629132  15.63598709]
 [-10.16271966 -41.844838    37.43967816]]

Now doing this for all values of j from 1 to r where r=5 gives $h_5(x)=$

[[ -8.43080033   7.69420971   6.21852576]
 [-23.28549087   8.26764324  73.77196399]]

Now these are smaller for small values of r but when r is larger the hermite polynomial also becomes bigger
for example when r=8 and j=8
we get hermite $He_{24}(x)$ =

[[-2.64545228e+11  9.25901342e+10  2.46363203e+11]
 [-7.97293258e+11  1.45418958e+12 -1.42824459e+12]]

and $c_{rj}$=6.779681328689356e-05
and 8!=40320.0

the value of this iteration given by $\frac{He_{r+2j}(x)*c_{rj}}{j!}$
is

[[ -444.82448913   155.68740181   414.25198522]
 [-1340.62356486  2445.17409824 -2401.54841047]]

and doing this for all values of j from 1 to r=8 gives
$h_8$=sum( $\frac{He_{r+2j}(x)*c_{rj}}{j!}$) we get

[[ -369.49482312    70.70751258   412.63323489]
 [-1030.47215469  1589.06248313 -1351.95793231]]
```.

fkiraly · 2024-08-06T20:41:04Z

could you kindly do this bottom-up, with some values calculated manually, and saying explicitly which object in the papers this corresponds to?

E.g., sth like

manual computation for `he_5`

$He_5 = ...$

using the code

In the implementation, we call he(5), this gives ...

ShreeshaM07 · 2024-08-07T19:34:30Z

So the whole stack output for say $h_5$ =

1. r =  5  and j = 1
2. He( 7 , [[0.5 1.  1.5]
 [2.  2.5 3. ]] ) = 
 [[ -40.0234375  -20.          54.4921875]
 [  86.         -62.3046875 -396.       ]]
3. crj for r= 5 j= 1  =  0.030768701028203366
4. j! 1.0
5. h_ 5 upto from j=1 to j= 1  =  [[ -1.23146918  -0.61537402   1.67665383]
 [  2.64610829  -1.9170343  -12.18440561]]
-------------------------------
1. r =  5  and j = 2
2. He( 9 , [[0.5 1.  1.5]
 [2.  2.5 3. ]] ) = 
 [[ 326.53320312   28.         -541.21289062]
 [-190.         1431.10351562 1620.        ]]
3. crj for r= 5 j= 2  =  0.034147998534292366
4. j! 2.0
5. h_ 5 upto from j=1 to j= 2  =  [[ 4.34375849 -0.13730204 -7.56401467]
 [-0.59795157 22.51762607 15.47547321]]
-------------------------------
1. r =  5  and j = 3
2. He( 11 , [[0.5 1.  1.5]
 [2.  2.5 3. ]] ) = 
 [[ -3240.81005859    936.           5141.87841797]
 [ -3342.         -13760.63232422  12312.        ]]
3. crj for r= 5 j= 3  =  0.018245457192850898
4. j! 6.0
5. h_ 5 upto from j=1 to j= 3  =  [[ -5.51125171   2.70898928   8.07197242]
 [-10.76067123 -19.32721193  52.91515137]]
-------------------------------
1. r =  5  and j = 4
2. He( 13 , [[0.5 1.  1.5]
 [2.  2.5 3. ]] ) = 
 [[  37809.77648926  -23672.          -47160.5592041 ]
 [  84398.           73069.20471191 -350568.        ]]
3. crj for r= 5 j= 4  =  0.004783453196627576
4. j! 24.0
5. h_ 5 upto from j=1 to j= 4  =  [[  2.02463563  -2.00909006  -1.32762456]
 [  6.06074056  -4.76374856 -16.95674948]]
-------------------------------
1. r =  5  and j = 5
2. He( 15 , [[0.5 1.  1.5]
 [2.  2.5 3. ]] ) = 
 [[ -505845.15194702   469456.           365090.80709839]
 [-1419802.           630472.64480591  4389552.        ]]
3. crj for r= 5 j= 5  =  0.0024803090649180024
4. j! 120.0
5. h_ 5 upto from j=1 to j= 5  =  [[ -8.43080033   7.69420971   6.21852576]
 [-23.28549087   8.26764324  73.77196399]]
-------------------------------
6. h_ 5  =  [[ -8.43080033   7.69420971   6.21852576]
 [-23.28549087   8.26764324  73.77196399]]

This corresponds to

The eqn 2) gives the Hermite polynomial $He_{r+2j}(x)$.
In the code it corresponds to the call

H = eval_hermitenorm(r + 2 * j, x)

The eqn 3) computes the $c_{rj}$ and is being computed using the call

    def _compute_crj(self, r, j, rho):
        from itertools import combinations_with_replacement

        from scipy.special import comb

        partitions = np.array(
            [
                p
                for p in combinations_with_replacement(range(1, r + 1), j)
                if sum(p) == r
            ]
        )
        if partitions.size == 0:
            return 0

        term = np.prod([comb(rho + 1 + s_i, s_i + 2) for s_i in partitions.T], axis=0)
        c_rj = np.sum(term)
        c_rj *= (rho**2 + rho) ** (-r / 2 - j)
        return c_rj

Which corresponds to

Here $\rho$ corresponds to $\alpha$ in our code.

fkiraly · 2024-08-08T10:24:09Z

can we schedule a walkthrough for the Monday tech session, or whichever next is convenient? I think this needs discussion with questions and explanations.

ShreeshaM07 · 2024-08-08T15:27:02Z

can we schedule a walkthrough for the Monday tech session, or whichever next is convenient? I think this needs discussion with questions and explanations.

Yeah I think Monday tech session should work as per my current schedule.

ShreeshaM07 · 2024-08-12T17:57:34Z

So I have implemented the permutation of the partitions that we spotted. The cdf values did come down by a lot to the order 10^0. But it still has some values outside the [0,1] range. Not sure how to tackle that.

The current cdf output

CDF values:
[[ 0.21771439  0.13619774  0.40067358]
 [-1.52041722  3.87381296 -4.79373376]]

The change handles it this way

[[1 2]
 [2 1]]
1. r =  3  and j = 2

where the array is the partition array when r=3 and j=2. So sum is 3 and number of terms that add upto 3 is 2.

The code doing that is

        partitions = np.array(
            [
                p
                for p in combinations_with_replacement(range(1, r + 1), j)
                if sum(p) == r
            ]
        )

        partitions = np.vstack([np.array(list(set(permutations(sub_list)))) for sub_list in partitions])

fkiraly · 2024-08-12T19:43:51Z

hm, this seems correct, if a bit inefficient (and likely subject to combinatorial explosion for larger r) - a better solution would be a direct computation for the multiplicity factor, though I suppose we can leave optimization until after the entire code is correct.

The next oddity I am spotting are the negative values.

I am suspecting pre-factors in capital and lower case phi - can you point me to the exact python code that you have for these two?

ShreeshaM07 · 2024-08-13T04:13:15Z

I am suspecting pre-factors in capital and lower case phi - can you point me to the exact python code that you have for these two?

Which pre-factors in capital are we talking about. The lower case $\phi(x)$ is the normal pdf and is being calculated by

phi_x = norm.pdf(x)

fkiraly · 2024-08-13T08:37:42Z

right here in the cdf

there is a capital phi and a lower case phi

ShreeshaM07 · 2024-08-13T08:50:13Z

right here in the cdf there is a capital phi and a lower case phi

The capital phi represents the normal cdf but I have not used it anywhere in my code since I am implementing the equation (3.8) which is the 4th equation here in the paper for the cdf which is an approximation of equation (3.5).

fkiraly · 2024-08-13T11:35:18Z

Slightly confused - 3.8 is an equation for the pdf, not cdf. How are you using it in implementing the cdf?

fkiraly · 2024-08-13T11:44:38Z

Waitasec I am now noticing this:

equation(3.8) is the cdf which requires sum(equation(3.10)) for all positive integer r.

Equation 3.8 is the equation for pdf, not cdf! The "dot" above the $P_\lambda$ is physicist notation for derivative. Odd choice in a statistics paper, and strictly speaking wrong since in the narrow sense it refers to derivative with respect to time.

Equation 3.5 is the equation for cdf.

ShreeshaM07 · 2024-08-13T18:00:54Z

Ahhh I see thats a blunder. But all the functions created are still useful nevertheless. I shall make the change and commit it.

ShreeshaM07 · 2024-08-13T18:19:46Z

So now these are the outputs

PDF values:
[[0.17620241 0.19445143 0.19319874]
 [0.18385598 0.17024862 0.154407  ]]
PMF values:
[[0.13533528 0.27067057 0.27067057]
 [0.27067057 0.27067057 0.18044704]]
CDF values:
[[0.82052395 0.85803452 0.89180471]
 [0.90106802 0.95702399 1.0135732 ]]

for the following code

from skpro.distributions.cmp_pois_gam import CmpPoissonGamma
import numpy as np
lambda_param = 2.0
alpha_param = 1.5
beta_param = 1.0

cmp_poisson_gamma_dist = CmpPoissonGamma(lambda_=lambda_param, alpha=alpha_param, beta=beta_param)

x_values = np.array([[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]])
p_values = np.array([[0.1, 0.25, 0.5], [0.75, 0.9, 0.95]])

pdf_values = cmp_poisson_gamma_dist._pdf(x_values)
pmf_values = cmp_poisson_gamma_dist._pmf(np.floor(x_values).astype(int))
cdf_values = cmp_poisson_gamma_dist._cdf(x_values)

print(f"PDF values:\n{pdf_values}")
print(f"PMF values:\n{pmf_values}")
print(f"CDF values:\n{cdf_values}")

ShreeshaM07 · 2024-08-13T18:22:56Z

The reason why the value is going a little out of bound for x=3,x=4 etc maybe due to the minor division errors that keep on aggregating while calculating . It is outputting cdf=1.0 for large x like 20,100,200 etc.

fkiraly · 2024-08-14T11:30:38Z

Nice! My gut feeling says that we have caught all logical inaccuracies, and the remainder is - as you say - probably indeed numerical issues.

Specifically, I think the large coefficients are contributive here, the naive calculation will lead to severe error propagation.

I would suggest to add tests for the individual functions in play.

The typical next steps would be to use numerical "tricks of the trade" to bring down the error, and at least ensure properties such as monotonicity for the cdf.

ShreeshaM07 · 2024-08-15T05:00:56Z

I tried plotting the cdf to see if it is monotonically increasing atleast not decreasing by a large amount due to the division errors. It works well on most inputs but there seemed to be some anomaly in the 1st case any reason why it is coming that way?

fkiraly · 2024-08-15T05:58:54Z

it seems to have problems in case 3 as well.

The oscillations in the 1st case do not look like numerical artefacts, but still might be - we should investigate.

Could you print the coefficients maybe, to see how large they get? And if you do any divisions, how large numerators/denominators are?

ShreeshaM07 · 2024-08-15T10:02:15Z

From some playing around with the values I have found that the oscillations are directly dependent only on the $\lambda$ variable. Not sure if it is the expected behaviour.

Upon modifying alpha and beta after attaining a monotonic curve the monotonicity remains same.

ShreeshaM07 · 2024-08-15T10:26:59Z

The 2 other contributors to the coefficients are $c_{rj}$ and $j!$ which are always positive so it does not cause any oscillation.
The hermite polynomials does oscillate

So it probably makes sense for our cdf to oscillate as well. But this oscillation is getting neutralised when the $\lambda$ is big as is expected from the eqn.

fkiraly · 2024-08-15T11:59:10Z

does this equation converge to something monotonic if you take enough terms?

ShreeshaM07 · 2024-08-20T18:46:11Z

@fkiraly I have followed through the references as mentioned in this <a href="https://github.com/sktime/skpro/pull/428#issuecomment-2252710366>comment and I dont think I see anything about the $g_r$ polynomial which is required for calculating the ppf of CPG.

I could only find some formula for first 4 $g_r$ polynomials in one of Withers's papers.

Could you please have a look at it again and give a more explicit form for the $g_r$ polynomial.

fkiraly · 2024-08-20T21:46:43Z

do you have a link to Fisher/Cornish 1960, or is that a physical library-only reference?

ShreeshaM07 · 2024-08-21T05:23:00Z

do you have a link to Fisher/Cornish 1960, or is that a physical library-only reference?

The screenshot of the first 4 $g_r$ polynomials I have attached above can be found here it is the 17th reference that was present in the "Withers and Nadaraja On the Compound Poisson Gamma Distribution".

Regarding the Cornish Fisher 1960 expansions it can be found here. I think it can be accessed only through an institute account.

fkiraly · 2024-08-22T14:22:22Z

that sounds like the wrong paper? Similar name, but the author is someone named Ulyanov from Moscov State U.

ShreeshaM07 · 2024-08-24T06:23:19Z

that sounds like the wrong paper? Similar name, but the author is someone named Ulyanov from Moscov State U.

Yea I wasn't able to find any original one as such.

fkiraly · 2024-08-29T08:02:34Z

Hm, feels like we are a little stuck then. Should we park this? Would be a bit of a shame.

[ENH] Tweedie Distribution

6198c5e

ShreeshaM07 marked this pull request as draft July 17, 2024 18:16

fkiraly requested changes Jul 17, 2024

View reviewed changes

compound poisson gamma

a0eb95d

manual implementation of pdf of cpg

12d9826

fkiraly assigned ShreeshaM07 Jul 24, 2024

fkiraly added enhancement module:probability&simulation probability distributions and simulators implementing algorithms Implementing algorithms, estimators, objects native to skpro labels Jul 24, 2024

implement cdf but not giving correct answers

edfa74f

ShreeshaM07 requested a review from fkiraly July 25, 2024 10:12

used numpy/scipy functions to speed up cdf

daf40d0

Permutation instead of just combination

9066d6d

eqn 3.8 instead of 3.10

893dcdf

ShreeshaM07 mentioned this pull request Aug 15, 2024

[ENH] Tweedie distribution, incl mathematics and design #429

Open

[ENH] Tweedie Distribution #428

Are you sure you want to change the base?

[ENH] Tweedie Distribution #428

Conversation

ShreeshaM07 commented Jul 17, 2024

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

PR checklist

For all contributions

For new estimators

ShreeshaM07 commented Jul 17, 2024

fkiraly left a comment

Choose a reason for hiding this comment

ShreeshaM07 commented Jul 18, 2024

fkiraly commented Jul 19, 2024

ShreeshaM07 commented Jul 20, 2024

fkiraly commented Jul 20, 2024 • edited Loading

ShreeshaM07 commented Jul 22, 2024

fkiraly commented Jul 22, 2024

fkiraly commented Jul 22, 2024 • edited Loading

ShreeshaM07 commented Jul 24, 2024 • edited Loading

ShreeshaM07 commented Jul 24, 2024

ShreeshaM07 commented Jul 25, 2024

fkiraly commented Jul 25, 2024 • edited Loading

fkiraly commented Jul 25, 2024

ShreeshaM07 commented Jul 26, 2024

cdf

ppf

fkiraly commented Jul 26, 2024

ShreeshaM07 commented Jul 30, 2024 • edited Loading

ShreeshaM07 commented Jul 30, 2024

fkiraly commented Jul 30, 2024 • edited Loading

ShreeshaM07 commented Aug 6, 2024

fkiraly commented Aug 6, 2024

manual computation for he_5

using the code

ShreeshaM07 commented Aug 7, 2024 • edited Loading

fkiraly commented Aug 8, 2024

ShreeshaM07 commented Aug 8, 2024

ShreeshaM07 commented Aug 12, 2024 • edited Loading

fkiraly commented Aug 12, 2024

ShreeshaM07 commented Aug 13, 2024

fkiraly commented Aug 13, 2024

ShreeshaM07 commented Aug 13, 2024

fkiraly commented Aug 13, 2024

fkiraly commented Aug 13, 2024 • edited Loading

ShreeshaM07 commented Aug 13, 2024

ShreeshaM07 commented Aug 13, 2024

ShreeshaM07 commented Aug 13, 2024

fkiraly commented Aug 14, 2024

ShreeshaM07 commented Aug 15, 2024

fkiraly commented Aug 15, 2024

ShreeshaM07 commented Aug 15, 2024

ShreeshaM07 commented Aug 15, 2024 • edited Loading

fkiraly commented Aug 15, 2024

ShreeshaM07 commented Aug 20, 2024

fkiraly commented Aug 20, 2024

ShreeshaM07 commented Aug 21, 2024

fkiraly commented Aug 22, 2024 • edited Loading

ShreeshaM07 commented Aug 24, 2024

fkiraly commented Aug 29, 2024

fkiraly commented Jul 20, 2024 •

edited

Loading

fkiraly commented Jul 22, 2024 •

edited

Loading

ShreeshaM07 commented Jul 24, 2024 •

edited

Loading

fkiraly commented Jul 25, 2024 •

edited

Loading

ShreeshaM07 commented Jul 30, 2024 •

edited

Loading

fkiraly commented Jul 30, 2024 •

edited

Loading

manual computation for `he_5`

ShreeshaM07 commented Aug 7, 2024 •

edited

Loading

ShreeshaM07 commented Aug 12, 2024 •

edited

Loading

fkiraly commented Aug 13, 2024 •

edited

Loading

ShreeshaM07 commented Aug 15, 2024 •

edited

Loading

fkiraly commented Aug 22, 2024 •

edited

Loading