
Bug fix - to handle "u: list of array-like, shape (n_samples, n_co… #605


Open
wants to merge 1 commit into
base: master
from

Conversation


@YaadR YaadR commented Feb 23, 2025

Bug fix - to handle "u: list of array-like, shape (n_samples, n_control_features)" input - I'll elaborate more in the PR comment section

As documented, model.fit() accepts a 'list of array-like' input. When X_train is passed this way, all the associated data must be passed as sequences (lists) as well, e.g. [t_train], [x_dot] and [u_train]. Two problems arise with u_train: the code reshapes it incorrectly, and another section calls X_train.shape, a numpy.array() attribute that a Python 'list' does not have. Both fixes allow the code to run correctly and smoothly, and do not affect other library features.

In the code, x_train_1 and x_train_2 have different lengths, to demonstrate the use of a Python 'list' specifically rather than a numpy.array(), which requires every trajectory to have the same length.
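
To illustrate why a list is needed here, the following minimal sketch uses plain NumPy only (it is not PySINDy code, and the variable names are made up for illustration). It shows that two trajectories of different lengths cannot be stacked into one regular array, and that a list has no `.shape` attribute:

```python
import numpy as np

# Two hypothetical trajectories with different numbers of samples,
# each laid out as (n_samples, n_features).
traj_a = np.zeros((101, 2))
traj_b = np.zeros((100, 2))

# Stacking ragged trajectories into one regular array fails.
try:
    np.stack([traj_a, traj_b])
    stacked_ok = True
except ValueError:
    stacked_ok = False

# A list holds both trajectories, but it has no .shape attribute,
# so any code that reads .shape must look at each element instead.
trajectories = [traj_a, traj_b]
list_has_shape = hasattr(trajectories, "shape")
lengths = [traj.shape[0] for traj in trajectories]
```

Code that has to support both input forms would therefore need to inspect the individual trajectories rather than the container.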

The code that reproduces the problem

#!/usr/bin/env python3
import numpy as np # numpy==1.26.4
import pysindy as ps # pysindy==1.7.5

def main():
    # 1. Create sample data to mimic your shape conditions.
    #    Two trajectories: (2, 101) and (2, 100).
    t_train_1 = np.linspace(0, 1, 101)
    t_train_2 = np.linspace(0, 1, 100)

    x_train_1 = np.vstack([
        np.sin(2*np.pi*t_train_1),
        np.cos(2*np.pi*t_train_1)
    ])  # shape (2, 101)

    x_train_2 = np.vstack([
        np.sin(2*np.pi*t_train_2),
        np.cos(2*np.pi*t_train_2)
    ])  # shape (2, 100)

    # Create x_dot data for each trajectory, reusing one differentiator
    sfd = ps.SmoothedFiniteDifference(smoother_kws={'window_length': 25})
    x_dot_1 = np.zeros_like(x_train_1)
    for i in range(x_train_1.shape[0]):
        x_dot_1[i, :] = sfd._differentiate(x_train_1[i, :], t_train_1[1] - t_train_1[0])

    x_dot_2 = np.zeros_like(x_train_2)
    for i in range(x_train_2.shape[0]):
        x_dot_2[i, :] = sfd._differentiate(x_train_2[i, :], t_train_2[1] - t_train_2[0])

    x_dot = [x_dot_1.T, x_dot_2.T]


    # Optional: Control input (u_train) for each trajectory
    #           shape matches time dimension
    u_train_1 = np.zeros_like(t_train_1)
    u_train_2 = np.zeros_like(t_train_2)

    # Combine into lists to represent multiple trajectories
    x_train = [x_train_1, x_train_2]
    u_train = [u_train_1, u_train_2]

    # Simple time step (dt) taken from the first trajectory
    # (the second trajectory's step is 1/99 rather than 1/100)
    dt = t_train_1[1] - t_train_1[0]

    # Example feature library (you can choose any)
    feature_library = ps.PolynomialLibrary(degree=2)

    # For demonstration, define a single optimizer:
    from pysindy.optimizers import STLSQ
    selected_optimizers = {
        "STLSQ_example": {
            "class": STLSQ,
            "params": {
                "alpha": 0.1,
                "threshold": 0.1,
                "fit_intercept": True
            }
        }
    }

    # Check if x_train is a list => multiple trajectories
    xu_list = isinstance(x_train, list)

    def run_selected_optimizers(selected_opts):
        if not selected_opts:
            print("Please select at least one optimizer.")
            return

        models_scores = {}
        models_errors = {}

        # Example function to compute "prediction error" (stub)
        # pred_state, state_data shapes must match in time dimension.
        def compute_prediction_error(pred_state, state_data):
            # Just a demo for RMS error
            state_data = state_data[:, :pred_state.shape[1]]
            return [
                np.sqrt(np.mean((pred - true) ** 2))
                for (pred, true) in zip(pred_state, state_data)
            ]

        # 3. Loop over each optimizer
        for name, opt_data in selected_opts.items():
            optimizer_class = opt_data["class"]
            optimizer_params = opt_data["params"]

            # 4. Initialize and fit the SINDy model
            model = ps.SINDy(
                optimizer=optimizer_class(**optimizer_params),
                feature_library=feature_library
            )
            model.fit(
                x=x_train,
                t=dt,
                x_dot=x_dot,               # Pre-computed derivatives
                u=u_train,                 # Control inputs
                multiple_trajectories=xu_list,
            )

            # 5. Print model to console
            print(f"\n===== Trained Model: {name} =====")
            model.print()

    # 6. Finally, run the optimizers
    run_selected_optimizers(selected_optimizers)


if __name__ == "__main__":
    main()

The problem:
Screenshot from 2025-02-23 17-28-29

@giopapanas

Thank you, @YaadR, for your fix here. I raised issue #611; do you think it relates to your bug fix? In brief, when I run a toy experiment and call model.fit() with a 1D X, the fit runs fine. However, as I explain in the discussion linked above, the model gives me an error when I load a multi-dimensional X.

By the way, do you know if I need to supply the [x_dot] and [u_train] data myself? I thought PySINDy handles [x_dot] and [u_train] by default if you specify the differentiation method and the library to use. Thank you in advance.

@YaadR
Author

YaadR commented Mar 25, 2025

Hi @giopapanas, to my understanding #611 does not stem from the same bug.

@Jacob-Stevens-Haas
Member

Jacob-Stevens-Haas commented Apr 4, 2025

Hey, thanks for your PR @YaadR - sorry for the delay. I've verified that this still exists on the master branch.

Let's talk about your code. I've shrunk it down to an alternative MWE (minimal working example):

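import numpy as np
import pysindy as ps

# Two trajectories of different lengths, each with one state feature
x1 = np.arange(10).reshape((-1, 1))
x2 = np.arange(11).reshape((-1, 1))
x = [x1, x2]
# Flat (1D) control inputs matching each trajectory's length
u1 = np.arange(10)
u2 = np.arange(11)
u = [u1, u2]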


model = ps.SINDy()
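# No error: single trajectory, flat u
model.fit(x=x[0], t=1.0, u=u[0])
# No error: multiple trajectories of equal length, flat u
model.fit(x=[x1, x1], t=1.0, u=[u1, u1])
# Error: multiple trajectories of different lengths, flat u
model.fit(x=x, t=1.0, u=u)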

This is the form we prefer to receive examples in, as the process of reducing the example is likely to show you the problem. Here it is obvious: u arrays are allowed to be flat when passed as a single trajectory, or when every trajectory is the same length, but not when using multiple trajectories of different lengths.
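
One possible user-side workaround, sketched under the assumption that each control input has a single feature, is to reshape every flat `u` array into the documented `(n_samples, n_control_features)` form before calling `fit`. The helper name `as_column` is hypothetical and not part of PySINDy:

```python
import numpy as np


def as_column(u):
    """Return u as a 2D array of shape (n_samples, n_control_features).

    Hypothetical helper: a flat array is treated as one control feature.
    Arrays that are already 2D are returned unchanged.
    """
    u = np.asarray(u)
    if u.ndim == 1:
        return u.reshape(-1, 1)
    return u


# Flat control inputs of different lengths, as in the MWE above.
u = [np.arange(10), np.arange(11)]
u_2d = [as_column(ui) for ui in u]
shapes = [ui.shape for ui in u_2d]
```

The reshaped list could then be passed as `u=u_2d` in the third `fit` call. Whether that call succeeds on a given PySINDy version has not been verified here.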

But you've noticed the docstring: u: list of array-like, shape (n_samples, n_control_features). So I see the problem differently: by accepting the first two calls even though they don't obey the API, the library promotes the expectation that the third form will work too.

If you're still interested in the PR, and for that I'd be grateful, you're welcome to find a way to fix what I described as the bug. I'd recommend starting by writing a test.
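
As a rough illustration of what such a test might check, here is a sketch of a strict validator with pytest-style test functions. `check_control_inputs` is a hypothetical function, not part of PySINDy, and the sketch assumes the desired behavior is to reject any flat `u` array whether or not the trajectories have equal lengths:

```python
import numpy as np


def check_control_inputs(x, u):
    """Hypothetical check: every u must be 2D and match its x in n_samples."""
    for i, (xi, ui) in enumerate(zip(x, u)):
        xi, ui = np.asarray(xi), np.asarray(ui)
        if ui.ndim != 2:
            raise ValueError(
                f"u[{i}] must have shape (n_samples, n_control_features), "
                f"got {ui.shape}"
            )
        if ui.shape[0] != xi.shape[0]:
            raise ValueError(f"u[{i}] and x[{i}] differ in n_samples")


def test_flat_u_rejected_consistently():
    x1 = np.arange(10).reshape((-1, 1))
    x2 = np.arange(11).reshape((-1, 1))
    cases = [
        ([x1, x1], [np.arange(10), np.arange(10)]),  # equal lengths
        ([x1, x2], [np.arange(10), np.arange(11)]),  # different lengths
    ]
    for x, u in cases:
        try:
            check_control_inputs(x, u)
        except ValueError:
            continue
        raise AssertionError("flat u should be rejected")


def test_2d_u_accepted():
    x1 = np.arange(10).reshape((-1, 1))
    x2 = np.arange(11).reshape((-1, 1))
    u = [np.arange(10).reshape((-1, 1)), np.arange(11).reshape((-1, 1))]
    check_control_inputs([x1, x2], u)  # should not raise
```

A real test in the PySINDy suite would presumably call `model.fit` directly; the standalone validator is used here only to keep the sketch self-contained.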

(BTW: Code formatting on GitHub allows you to specify the language in order to get syntax highlighting, by typing "```python". I've added that to your comment.)


3 participants