Discussed in #944
Originally posted by AaronRoseDA May 28, 2025
Hi Miles,
I'm really struggling with a particular UTF-8 encoding issue. I have no problems with any other dataset, so I think it might be the data frame, but I can't find anything that stands out in the data:
C:\Users\arose\.conda\envs\test-env\Lib\site-packages\pysr\sr.py:2774: UserWarning: Note: it looks like you are running in Jupyter. The progress bar will be turned off.
warnings.warn(
[ Info: Started!
[ Info: Final population:
[ Info: Results saved to:
Elapsed time: 64.96 seconds
Error in callback _flush_stdio (for post_execute), with arguments args (),kwargs {}:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
File ~\.julia\packages\PythonCall\WMWY0\src\JlWrap\any.jl:262, in __call__(self, *args, **kwargs)
260 return ValueBase.__dir__(self) + self._jl_callmethod($(pyjl_methodnum(pyjlany_dir)))
261 def __call__(self, *args, **kwargs):
--> 262 return self._jl_callmethod($(pyjl_methodnum(pyjlany_call)), args, kwargs)
263 def __bool__(self):
264 return True
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 4095: unexpected end of data
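One possible reading of the message above, offered as a sketch rather than a diagnosis: byte 0xe2 is the lead byte of a three-byte UTF-8 sequence (code points U+2000 to U+2FFF), and position 4095 is the last byte of a 4096-byte block. If text is decoded one fixed-size chunk at a time, a character that straddles the chunk boundary produces this same "unexpected end of data" message. The 4096-byte chunk size and the U+2500 character below are assumptions chosen for illustration; nothing in the traceback confirms how PythonCall or PySR actually buffer their output.

```python
import codecs

# Assumption: output is read in 4096-byte chunks (illustrative, not confirmed).
CHUNK = 4096

# Build a stream where a 3-byte character starts at byte offset 4095.
# U+2500 (a box-drawing character) encodes as E2 94 80 in UTF-8.
text = "a" * 4095 + "\u2500" + "tail"
data = text.encode("utf-8")

# Decoding the first chunk on its own cuts the character after its lead byte.
try:
    data[:CHUNK].decode("utf-8")
    naive_error = None
except UnicodeDecodeError as err:
    naive_error = err

# An incremental decoder keeps the dangling bytes until the next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
pieces.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over
recovered = "".join(pieces)

print(naive_error)
print(recovered == text)
```

If something like this is what happens here, the contents of the data frame would matter only indirectly, by shifting where the chunk boundary falls in the printed output.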
I've narrowed it down to a few possibilities, but I can't put my finger on the exact cause. It could be something wrong with my data frame, though nothing there stands out.
I've tried paring the model down as much as possible. Here's my setup:
import datetime
import time

from pysr import PySRRegressor

# Define input (X) and target (y)
X = data[feature_names].values
# y = data["entropy"].values
y = data["entropy_kde_XY"].values
# y = data["entropy_theoretical_XY"].values

model = PySRRegressor(
    model_selection="best",
    run_id=f'Bi-Skew-Normal - {datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S.%f")[:-4]}',
    niterations=10000000,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=[
        "log", "sqrt", "square", "exp", "cube", "cbrt",
        # "sin", "cos", "tan",
    ],
    # Forbid nesting each unary operator inside itself
    nested_constraints={
        "log": {"log": 0},
        "sqrt": {"sqrt": 0},
        "square": {"square": 0},
        "exp": {"exp": 0},
        "cube": {"cube": 0},
        "cbrt": {"cbrt": 0},
    },
    populations=48,
    ncycles_per_iteration=5000,
    # batching=True,
    # weight_optimize=.005,
    # turbo=True,
    maxsize=26,
    warm_start=True,
    temp_equation_file=False,
    output_directory=r"C:/Users/arose/OneDrive/Desktop/output/BivariateSkewNormal",
    verbosity=1,
    timeout_in_seconds=60,
    complexity_of_constants=1,
    # logger_spec=logger_spec
    # early_stop_condition="f(loss, complexity) = (loss <= 5.526e-15) && (complexity < 6)",print_precision=24
)
# model.weight_optimize = .005
# model.maxsize = 26
# model.ncycles_per_iteration = 5000
# model.timeout_in_seconds = 120
# model.print_precision = 16
# model.model_selection = 'accuracy'
# model.unary_operators = ["log", "square"]
# model.populations = 96

# Time the fit
st = time.time()
model.fit(X, y, variable_names=feature_names)
runtime = round(time.time() - st, 2)
print(f"Elapsed time: {runtime} seconds")
Any expertise you can provide would be immensely helpful. I'm going to take a deeper dive into my data to see if anything stands out.
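As a starting point for that check, here is a minimal sketch that scans the strings passed to the model (for example `feature_names`, the `run_id`, and the output directory) for non-ASCII characters. The helper name `non_ascii_report` and the example names are hypothetical, and whether such characters are involved in this error at all is not established.

```python
def non_ascii_report(strings):
    """Map each string that contains non-ASCII characters to a list of
    (index, character, UTF-8 bytes as hex) tuples."""
    report = {}
    for s in strings:
        hits = [
            (i, ch, ch.encode("utf-8").hex())
            for i, ch in enumerate(s)
            if ord(ch) > 127
        ]
        if hits:
            report[s] = hits
    return report


# Hypothetical names: the second contains U+2013 (en dash), not an ASCII hyphen.
example_names = ["mu_x", "sigma\u2013y", "alpha"]
report = non_ascii_report(example_names)
print(report)
```

Any entry whose hex bytes start with `e2` would be a character whose encoding begins with the byte named in the error message.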