results missing, ValueError: Input #20

Open
maciejkos opened this issue Apr 3, 2019 · 4 comments

@maciejkos

This may be hard to diagnose since I can't share the data. I am running automl_gs in Google Colab with the following call:

automl_grid_search(csv_path='/content/CLT_all_tasks_trial_level.csv', target_field='correctResp', model_name='tpu', tpu_address = tpu_address)

Solving a binary_classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
Subject: categorical
Finished: categorical
TrainingDay: categorical
Condition: categorical
CondPrev: categorical
TaskNumber: categorical
TaskId: categorical
TrialNumber: numeric
PresentationStimulus: numeric
StimTime: numeric
RespToTime: numeric
RT: numeric
SubjResp: categorical
OutcomeInt: categorical
TaskOutcomeInt: categorical
StimDim1: categorical
StimDim2: categorical
StimDim3: categorical
StimDim4: categorical
IntendedRule: categorical
Background: categorical
StimDimWord1: categorical
StimDimWord2: categorical
StimDimWord3: categorical
StimDimWord4: categorical
ExpResp: categorical
DistinctDays: categorical
out: categorical
StimType: categorical
0% 0/100 [00:00<?, ?trial/s]
0% 0/20 [00:00<?, ?epoch/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-ca69e1157d4e> in <module>()
      2                    target_field='correctResp',
      3                    model_name='tpu',
----> 4                    tpu_address = tpu_address)

/usr/local/lib/python3.6/dist-packages/automl_gs/automl_gs.py in automl_grid_search(csv_path, target_field, target_metric, framework, model_name, context, num_trials, split, num_epochs, col_types, gpu, tpu_address)
     92                     header=(best_result is None))
     93 
---> 94         train_results = results.tail(1).to_dict('records')[0]
     95 
     96         # If the target metric improves, save the new hps/files,

IndexError: list index out of range

Here is the log (every entry is at WARNING level, timestamped Apr 3, 2019, 5:31:44 PM):

Traceback (most recent call last):
  File "model.py", line 69, in <module>
    model_train(df, encoders, args, model)
  File "/content/tpu_train/pipeline.py", line 1095, in model_train
    batch_size=64 * 8)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1532, in fit
    steps_per_epoch, validation_steps, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1633, in _pipeline_fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py", line 1734, in _pipeline_fit_loop
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py", line 251, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/content/tpu_train/pipeline.py", line 1126, in on_epoch_end
    logloss = log_loss(y_true, y_pred)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py", line 1763, in log_loss
    y_pred = check_array(y_pred, ensure_2d=False)
  File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 573, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

AFAIK, the largest number in the dataset is 12007245.
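
For what it's worth, 12007245 is far below the float32 limit, so the message more likely points at a NaN or an infinity than at a value that is too large. The traceback shows the check failing on y_pred inside log_loss, so the non-finite values may be in the model's predictions rather than in the CSV itself. A rough sketch using only the standard library; `FLOAT32_MAX`, `fits_in_float32`, `non_finite_positions` and the sample `y_pred` are made up for illustration and are not part of automl_gs or scikit-learn:

```python
import math
import struct

# Largest finite IEEE 754 single-precision value (about 3.4e38).
FLOAT32_MAX = (2 - 2 ** -23) * 2.0 ** 127


def fits_in_float32(value):
    """Return True if value is finite and within float32 range."""
    return math.isfinite(value) and abs(value) <= FLOAT32_MAX


def non_finite_positions(values):
    """Return the indexes of entries that are NaN or +/-infinity."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


largest = 12007245.0
print(fits_in_float32(largest))

# 12007245 is below 2**24, so float32 can store it exactly.
round_trip = struct.unpack("f", struct.pack("f", largest))[0]
print(round_trip == largest)

# Hypothetical predictions with a NaN and an infinity mixed in.
y_pred = [0.2, float("nan"), 0.9, float("inf")]
print(non_finite_positions(y_pred))
```

Running the same kind of check on the real predictions (or on each numeric column) should show where the non-finite values come from.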

Thanks for the help!

@tayiorbeii

I've actually seen a similar problem, and I am currently preparing an example to share (watch as it works this time!)

@tayiorbeii

Here's a gist with train and test files.

I ran the following command to target the result column:

automl_gs train.csv result --model_name broken_example --framework xgboost --num_trials 100

Then, with test.csv in the broken_example_xgboost folder, I ran this command to check:

python3 model.py -d test.csv -m predict

Output is as follows:

error_example/broken_example_xgboost_20190404_233918 » python3 model.py -d test_pdb.csv -m predict
Traceback (most recent call last):
  File "model.py", line 55, in <module>
    predictions = model_predict(df, model, encoders)
  File "error_example/broken_example_xgboost_20190404_233918/pipeline.py", line 341, in model_predict
    data_enc = process_data(df, encoders, process_target=False)
  File "error_example/broken_example_xgboost_20190404_233918/pipeline.py", line 275, in process_data
    col4_enc = encoders['col4_encoder'].transform(col4_enc)
  File "/usr/local/lib/python3.7/site-packages/sklearn/preprocessing/label.py", line 467, in transform
    sparse_output=self.sparse_output)
  File "/usr/local/lib/python3.7/site-packages/sklearn/preprocessing/label.py", line 581, in label_binarize
    y = check_array(y, accept_sparse='csr', ensure_2d=False, dtype=None)
  File "/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 573, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I've poked around at it a bit, but I don't really know what I'm doing.

I hope this helps though, and I appreciate your work on this project!

@tayiorbeii

tayiorbeii commented Apr 5, 2019

Adding this line before each encoder transform call in pipeline.py helps, but it is annoying to do manually:

whatever_enc = whatever_enc[~np.isnan(whatever_enc)]

...but different errors are good to see!
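
One caveat with that line: np.isnan raises a TypeError on object-dtype arrays (for example a column of strings with a few NaN entries), so it only works for numeric columns. A minimal pure-Python sketch of a filter that copes with both cases; `is_missing` and `drop_missing` are hypothetical helpers, not functions from the generated pipeline.py:

```python
import math


def is_missing(value):
    """True for None and for float NaN; strings are never treated as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(values):
    """Return a new list without the missing entries."""
    return [v for v in values if not is_missing(v)]


mixed = [float("nan"), "yes", "no", None, "yes"]
print(drop_missing(mixed))    # ['yes', 'no', 'yes']

numeric = [1.0, float("nan"), 3.5]
print(drop_missing(numeric))  # [1.0, 3.5]
```

Dropping entries from a single column changes its length relative to the other columns, so removing the whole row (or filling the cell) before encoding is probably the safer route.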

EDIT: Had some luck after changing the training target to "true"/"false" strings instead of 0/1... not sure if this is a fluke.

EDIT AGAIN: This error can happen if you have a blank entry somewhere in your data. Open your CSV file in your favorite editor and search for two consecutive commas (,,), then either delete that row, fill in the cell, or handle it some other way.
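
Searching for ,, misses a blank cell at the very start or end of a line, so a small script may be more reliable. A sketch using only the csv module; `find_blank_cells` and the sample data are made up for illustration:

```python
import csv
import io


def find_blank_cells(csv_file):
    """Return (line_number, column_name) for every empty cell.

    line_number counts the header as line 1, which matches what an editor
    shows as long as no quoted cell contains a line break.
    """
    reader = csv.DictReader(csv_file)
    blanks = []
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header, not a blank
            # value is None when the row is shorter than the header.
            if value is None or value.strip() == "":
                blanks.append((line_number, column))
    return blanks


sample = io.StringIO(
    "col1,col4,result\n"
    "1,a,0\n"
    "2,,1\n"
    ",b,0\n"
)
print(find_blank_cells(sample))  # [(3, 'col4'), (4, 'col1')]
```

For a real file, pass `open("train.csv", newline="")` instead of the in-memory sample.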

@mahesh1amour
Copy link

I'm also running into a similar error:

mahesh@mahesh-HP-EliteBook-840-G1:~$ pip3 install automl_gs
Collecting automl_gs
Downloading https://files.pythonhosted.org/packages/c4/51/27833a08fe4f83711b09836ddd9128e275a6900c47e0e5782112ed611484/automl_gs-0.2.1.tar.gz
Collecting pandas (from automl_gs)
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/19/74/e50234bc82c553fecdbd566d8650801e3fe2d6d8c8d940638e3d8a7c5522/pandas-0.24.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting scikit-learn (from automl_gs)
Using cached https://files.pythonhosted.org/packages/5e/82/c0de5839d613b82bddd088599ac0bbfbbbcbd8ca470680658352d2c435bd/scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl
Collecting autopep8 (from automl_gs)
Collecting tqdm (from automl_gs)
Using cached https://files.pythonhosted.org/packages/6c/4b/c38b5144cf167c4f52288517436ccafefe9dc01b8d1c190e18a6b154cd4a/tqdm-4.31.1-py2.py3-none-any.whl
Collecting jinja2>=2.8 (from automl_gs)
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/1d/e7/fd8b501e7a6dfe492a433deb7b9d833d39ca74916fa8bc63dd1a4947a671/Jinja2-2.10.1-py2.py3-none-any.whl (124kB)
100% |████████████████████████████████| 133kB 436kB/s
Collecting pyyaml (from automl_gs)
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz (274kB)
100% |████████████████████████████████| 276kB 4.6MB/s
Collecting numpy>=1.12.0 (from pandas->automl_gs)
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/35/d5/4f8410ac303e690144f0a0603c4b8fd3b986feb2749c435f7cdbb288f17e/numpy-1.16.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting python-dateutil>=2.5.0 (from pandas->automl_gs)
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl
Collecting pytz>=2011k (from pandas->automl_gs)
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/61/28/1d3920e4d1d50b19bc5d24398a7cd85cc7b9a75a490570d5a30c57622d34/pytz-2018.9-py2.py3-none-any.whl
Collecting scipy>=0.13.3 (from scikit-learn->automl_gs)
Using cached https://files.pythonhosted.org/packages/7f/5f/c48860704092933bf1c4c1574a8de1ffd16bf4fde8bab190d747598844b2/scipy-1.2.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting pycodestyle>=2.4.0 (from autopep8->automl_gs)
Using cached https://files.pythonhosted.org/packages/0e/0c/04a353e104d2f324f8ee5f4b32012618c1c86dd79e52a433b64fceed511b/pycodestyle-2.5.0-py2.py3-none-any.whl
Collecting MarkupSafe>=0.23 (from jinja2>=2.8->automl_gs)
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/b2/5f/23e0023be6bb885d00ffbefad2942bc51a620328ee910f64abe5a8d18dd1/MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting six>=1.5 (from python-dateutil>=2.5.0->pandas->automl_gs)
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Building wheels for collected packages: automl-gs, pyyaml
Running setup.py bdist_wheel for automl-gs ... done
Stored in directory: /home/mahesh/.cache/pip/wheels/2d/5e/f0/deffbd0fcc4afd6b065366dded7b9c8cd1b7c02f6b39c24552
Running setup.py bdist_wheel for pyyaml ... done
Stored in directory: /home/mahesh/.cache/pip/wheels/ad/56/bc/1522f864feb2a358ea6f1a92b4798d69ac783a28e80567a18b
Successfully built automl-gs pyyaml
Installing collected packages: numpy, six, python-dateutil, pytz, pandas, scipy, scikit-learn, pycodestyle, autopep8, tqdm, MarkupSafe, jinja2, pyyaml, automl-gs
Successfully installed MarkupSafe-1.1.1 automl-gs-0.2.1 autopep8-1.4.3 jinja2-2.10.1 numpy-1.16.2 pandas-0.24.2 pycodestyle-2.5.0 python-dateutil-2.8.0 pytz-2018.9 pyyaml-5.1 scikit-learn-0.20.3 scipy-1.2.1 six-1.12.0 tqdm-4.31.1
mahesh@mahesh-HP-EliteBook-840-G1:~$ from automl_gs import automl_grid_search
from: can't read /var/mail/automl_gs
mahesh@mahesh-HP-EliteBook-840-G1:~$ python3

automl_grid_search('/home/mahesh/Documents/Projects/DataX/Venv_programs/Data/Housing.csv','price')
Solving a regression problem, minimizing mse using tensorflow.

Modeling with field specifications:
area: numeric
bedrooms: numeric
bathrooms: categorical
stories: categorical
mainroad: categorical
guestroom: categorical
basement: categorical
hotwaterheating: categorical
airconditioning: categorical
parking: categorical
prefarea: categorical
furnishingstatus: categorical
0%| | 0/100 [00:00<?, ?trial/s]
0%| | 0/20 [00:00<?, ?epoch/s]
Traceback (most recent call last):
  File "model.py", line 49, in <module>
    build_encoders(df)
  File "/home/mahesh/automl_train/pipeline.py", line 209, in build_encoders
    mainroad_encoder.fit(mainroad_tf)
  File "/home/mahesh/.local/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 413, in fit
    self.classes_ = unique_labels(y)
  File "/home/mahesh/.local/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 96, in unique_labels
    raise ValueError("Unknown label type: %s" % repr(ys))
ValueError: Unknown label type: (array([nan, nan, 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', nan,
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'no',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'no',
'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no',
'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'no', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes',
'yes', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes',
'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes', 'no', 'yes',
'no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes', 'yes',
'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no', 'no', 'yes',
'no', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes',
'no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
'yes', 'yes', 'no', 'no', 'no', 'no', 'no', 'yes', 'no', 'yes',
'yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes',
'yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes',
'no', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes',
'no', 'yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes',
'yes', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'yes',
'no', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'no', 'yes'],
dtype=object),)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mahesh/.local/lib/python3.6/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "/home/mahesh/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/mahesh/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/mahesh/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/mahesh/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/mahesh/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'
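
The tracebacks in this thread appear to share a pattern: the generated training script crashes first, and automl_gs then fails while reading results.csv, either because the file has no rows (the IndexError in the first post) or because it was never written (the FileNotFoundError here). A minimal sketch of a defensive read that surfaces this case explicitly, using only the standard library; `load_last_result` is a hypothetical helper, not part of automl_gs:

```python
import csv
import os
import tempfile


def load_last_result(results_path):
    """Return the last row of a results CSV as a dict, or None.

    None means the file is missing or has no data rows, which would
    suggest the training script failed before logging any epoch.
    """
    if not os.path.exists(results_path):
        return None
    with open(results_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else None


# Demo in a throwaway directory: missing file, header only, then real rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")
    missing = load_last_result(path)

    with open(path, "w", newline="") as f:
        f.write("epoch,accuracy\n")
    header_only = load_last_result(path)

    with open(path, "a", newline="") as f:
        f.write("1,0.71\n2,0.74\n")
    last = load_last_result(path)

print(missing, header_only, last)
```

When the helper returns None, the useful error is the one printed by the training script itself (here, the NaN values in the mainroad column), not the later failure to read the results.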
