pzmm.MLFlowModel.read_mlflow_model_file() failed with JSONDecodeError: Extra data #179

Open
pulungw opened this issue Sep 21, 2023 · 2 comments
Labels
bug Something isn't working

pulungw commented Sep 21, 2023

Describe the issue
Reading an MLflow model with pzmm.MLFlowModel.read_mlflow_model_file() results in a JSONDecodeError. I'm using a simple example from here:
https://medium.com/@rehabreda/registering-mlflow-models-to-sas-model-manager-using-sasctl-a-comprehensive-guide-a47dbf183338

To Reproduce
The rest of the training code can be found at the link above. The code that reads the MLflow model file is shown below:

## define randomforest model 
model = RandomForestClassifier(n_estimators=300).fit(x_train, y_train)

##Model signature defines schema of model input and output
signature = infer_signature(x_train, model.predict(x_train))

## log model score to mlflow
score = model.score(x_test, y_test)
print("Score: %s" % score)
mlflow.log_metric("score", score)

### log model 
mlflow.sklearn.log_model(model, "model", signature=signature)
print("Model saved in run %s" % mlflow.active_run().info.run_uuid)

mlPath = Path(f'./mlruns/1/{mlflow.active_run().info.run_uuid}/artifacts/model')

## get info about model variables, inputs and outputs
varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(mlPath)

Expected behavior
pzmm.MLFlowModel.read_mlflow_model_file() returns the dictionaries successfully.

Stack Trace
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Cell In[4], line 4
      1 mlPath = Path(f'./mlruns/1/{mlflow.active_run().info.run_uuid}/artifacts/model')
      3 ## get info aboud model variables ,input and output
----> 4 varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(mlPath)

File ~\AppData\Local\miniconda3\envs\ml\Lib\site-packages\sasctl\pzmm\mlflow_model.py:56, in MLFlowModel.read_mlflow_model_file(cls, m_path)
     53     outputs = m_lines[ind_out[0] : -1]
     55     inputs_dict = json.loads("".join([s.strip() for s in inputs])[9:-1])
---> 56     outputs_dict = json.loads("".join([s.strip() for s in outputs])[10:-1])
     57 else:
     58     raise ValueError(
     59         "Improper or unset signature values for model. No input or output "
     60         "dicts could be generated. "
     61     )

File ~\AppData\Local\miniconda3\envs\ml\Lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File ~\AppData\Local\miniconda3\envs\ml\Lib\json\decoder.py:340, in JSONDecoder.decode(self, s, _w)
    338 end = _w(s, end).end()
    339 if end != len(s):
--> 340     raise JSONDecodeError("Extra data", s, end)
    341 return obj

JSONDecodeError: Extra data: line 1 column 73 (char 72)

Version
1.10.0

pulungw added the bug label Sep 21, 2023

pulungw commented Sep 21, 2023

By the way, I'm using MLflow 2.7.1 on a Windows 11 machine.

pulungw commented Sep 22, 2023

I think I found the root cause.

The MLmodel file has an extra params line at the end of the signature, as shown below. Since the code treats everything from outputs up to the last line of the file as the outputs value, the params line causes the JSONDecodeError: Extra data error. If I remove params from the MLmodel file, I can read it just fine.

  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'
  params: null

This seems to be a new part of the MLmodel format as of MLflow 2.6.0, which added "Inference params support". It would affect every MLmodel file created since the MLflow 2.6.0 release.
mlflow/mlflow#9068

I believe this is the problematic line of code in sasctl. It assumes that no other signature field follows outputs and slices from the outputs line up to the last line of the file.

outputs = m_lines[ind_out[0] : -1]
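
To illustrate, here is a minimal standalone sketch that mimics the slicing shown in the stack trace, using only the standard library. The outputs and params lines are the ones quoted above; the inputs line and the utc_time_created value are made-up placeholders, and the exact layout of a real MLmodel file may differ.

```python
import json

# Hypothetical tail of an MLmodel file written by MLflow >= 2.6.0.
# Only the outputs and params lines come from the issue; the inputs line
# and the utc_time_created value are illustrative assumptions.
MLMODEL_TAIL = """\
signature:
  inputs: '[{"type": "double", "name": "x"}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'
  params: null
utc_time_created: '2023-09-21 00:00:00.000000'
"""
m_lines = MLMODEL_TAIL.splitlines(keepends=True)

# Same slicing as in the stack trace: everything from the outputs line
# up to (but not including) the last line of the file.
ind_out = [i for i, s in enumerate(m_lines) if "outputs:" in s]
outputs = m_lines[ind_out[0] : -1]  # now also contains the params line
joined = "".join(s.strip() for s in outputs)[10:-1]

try:
    json.loads(joined)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(joined)  # the JSON list followed by leftover text from the params line
print(error)
```

With this input, the JSON list is followed by the closing quote and the start of the params line, which is what json.loads reports as extra data.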

Perhaps a better solution is to parse the MLmodel file natively as YAML, since that is apparently its format? That way you keep forward compatibility if MLflow decides to add another field.
https://mlflow.org/docs/latest/models.html#id28
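
A rough sketch of what YAML-based parsing could look like, assuming PyYAML is available (it is a third-party package). The function name read_signature and the sample file contents are hypothetical and not part of sasctl; only the outputs and params lines come from my MLmodel file.

```python
import json

import yaml  # PyYAML, a third-party package (assumed to be installed)

# Hypothetical MLmodel contents; only the outputs and params lines are
# taken from the issue, everything else is an illustrative assumption.
MLMODEL_TEXT = """\
artifact_path: model
signature:
  inputs: '[{"type": "double", "name": "x"}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'
  params: null
utc_time_created: '2023-09-21 00:00:00.000000'
"""


def read_signature(mlmodel_text):
    """Return (inputs, outputs) parsed from the signature of an MLmodel document.

    Hypothetical helper, not the sasctl implementation.
    """
    doc = yaml.safe_load(mlmodel_text)
    signature = doc.get("signature") or {}
    if not signature.get("inputs") or not signature.get("outputs"):
        raise ValueError("Improper or unset signature values for model.")
    # Each signature field is a JSON string stored as a YAML scalar, so
    # extra sibling keys such as params do not affect the parsing.
    return json.loads(signature["inputs"]), json.loads(signature["outputs"])


inputs, outputs = read_signature(MLMODEL_TEXT)
print(inputs)
print(outputs)
```

Because the signature fields are looked up by key, any field added after outputs is simply ignored instead of being appended to the JSON string.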

I'll stick with MLflow 2.5.0 for now; it seems to be working fine.
