bin edges must be unique #23
Tiago Tresoldi (Apr 11, 2019): I am running into the same issue. The edges problem can be solved by instructing pandas to drop duplicates (add the argument duplicates="drop" to the pd.cut call in templates/processors/numeric), but of course that probably means the problem is in the data itself. Not sure what the developers could do to automate this case; maybe call sklearn's Imputer or (in my case) just fill the NAs?
Thank you! I will give that a try. I did call fillna() on the dataframe before passing the CSV to the tool; I guess that wasn't enough.
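A minimal sketch of the workaround described above, using made-up data. It assumes (not confirmed in this thread) that the bin edges are quantiles of the training column. Under that assumption, a value such as -1 that fills more than a tenth of the rows makes the first two decile edges identical, and pd.cut raises the same ValueError unless duplicates="drop" is passed. The column values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column where -1 stands in for missing data.
# -1 fills 4 of 18 rows (over 10%), so it spans more than one decile.
values = np.array([-1, -1, -1, -1, 5, 18, 40, 63, 90,
                   118, 150, 192, 220, 247, 305, 329, 371, 408])

# Decile edges computed from the data; the 0% and 10% quantiles are both -1.
edges = np.quantile(values, np.linspace(0, 1, 11))

# Without a 'duplicates' argument, pd.cut rejects the repeated edge.
try:
    pd.cut(values, edges, labels=False, include_lowest=True)
except ValueError as err:
    print("pd.cut failed:", err)

# duplicates="drop" merges the repeated edge, leaving 9 bins instead of 10.
codes = pd.cut(values, edges, labels=False, include_lowest=True,
               duplicates="drop")
print(codes)
```

Dropping the repeated edge silently reduces the number of bins, so, as the comment says, it treats the symptom; the repeated value in the data is the underlying cause.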
Well, that fixed it, but now I get this error: ValueError: Error when checking input: expected input_loan_type to have shape (1,) but got array with shape (2,). When I check this attribute, train_data.loan_type.unique() returns array([3, 1, 2, 4], dtype=int64). Should I open a separate ticket for this? And thank you for getting me a little bit further!
I'm having the same issue. Did you solve it?
I did not. I used the xgboost algorithm instead. That ran to completion, but I didn't get the output I expected: I thought I would get 1s and 0s but got probabilities instead, which wasn't acceptable for what I had to submit for my course project. Good luck!
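Turning probabilities into 1s and 0s is usually a thresholding step. A minimal sketch, assuming the scores are positive-class probabilities and that 0.5 is an acceptable cut-off (the right threshold depends on the task). If xgboost's scikit-learn wrapper XGBClassifier is used, its predict method already returns class labels, while predict_proba returns probabilities. The helper name to_labels is hypothetical.

```python
import numpy as np

def to_labels(probabilities, threshold=0.5):
    """Convert positive-class probabilities into hard 0/1 labels.

    Scores at or above `threshold` become 1; everything else becomes 0.
    """
    probs = np.asarray(probabilities, dtype=float)
    return (probs >= threshold).astype(int)

# Hypothetical predicted probabilities for four rows.
probs = [0.12, 0.50, 0.73, 0.49]
print(to_labels(probs))
```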
@avinregmi Sounds similar to my problem here: #25.
Possible causes and solutions:
1. Cause: the column being binned contains many repeated values (for example, a placeholder such as -1 standing in for missing data). When the edges are quantiles, several of them can land on that same value, which duplicates an edge. Solution: clean or impute the placeholder, or pass duplicates="drop" to pd.cut or pd.qcut so repeated edges are merged, at the cost of fewer bins.
2. Cause: edges computed with floating-point arithmetic can be so close together that their differences are just rounding noise, and they may even come out exactly equal. Solution: round the edges to a sensible precision and remove duplicates before binning.
3. Cause: a mistake in the logic that calculates the edges manually can produce duplicate edges. Solution: check that the edges are strictly increasing before calling pd.cut.
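The second and third solutions can be combined into one check before binning. This is a hypothetical helper, not part of automl-gs or pandas. It refuses edges that go backwards (a sign of a bug in how they were calculated), rounds away floating-point noise, and removes exact duplicates. The 8-decimal default is an arbitrary assumption and should be chosen to match the scale of the data.

```python
import numpy as np

def clean_bin_edges(edges, decimals=8):
    """Return strictly increasing bin edges suitable for pd.cut.

    Hypothetical helper: rounds edges to `decimals` places so values that
    differ only by floating-point noise become equal, then drops duplicates.
    """
    rounded = np.round(np.asarray(edges, dtype=float), decimals)
    # Edges that go backwards point to a bug in how they were computed;
    # sorting them here would hide that, so fail loudly instead.
    if np.any(np.diff(rounded) < 0):
        raise ValueError("bin edges are not sorted; check how they were computed")
    unique = np.unique(rounded)  # sorted, with exact duplicates removed
    if unique.size < 2:
        raise ValueError("at least two distinct edges are needed to form a bin")
    return unique

# 0.1 + 0.2 evaluates to 0.30000000000000004, which differs from 0.3
# only by rounding noise; after rounding, the two collapse into one edge.
print(clean_bin_edges([0.0, 0.1 + 0.2, 0.3, 1.0, 1.0]))
```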
Hello, I am trying to use this package to provide predictions for my Data Science Capstone project. When I run it against my training data, I get the following exception:
Traceback (most recent call last):
File "model.py", line 63, in <module>
model_train(df, encoders, args, model)
File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 903, in model_train
X, y = process_data(df, encoders)
File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 758, in process_data
df['msa_md'].values, encoders['msa_md_bins'], labels=False, include_lowest=True)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 234, in cut
duplicates=duplicates)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 332, in _bins_to_cuts
"the 'duplicates' kwarg".format(bins=bins))
ValueError: Bin edges must be unique: array([ -1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Traceback (most recent call last):
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\Scripts\automl_gs.exe\__main__.py", line 9, in <module>
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 175, in cmd
tpu_address=args.tpu_address)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 87, in automl_grid_search
"metadata", "results.csv"))
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'automl_train\metadata\results.csv' does not exist
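The edges in the first traceback begin with -1 twice. This suggests (an assumption, since the data is not shown) that -1 is a placeholder occupying more than a tenth of the msa_md column, so both the 0% and 10% quantiles equal -1. Filling NaNs with a constant can create exactly this kind of repeated value. Below is a hypothetical diagnostic, with made-up data, that lists which values produce repeated quantile edges.

```python
import numpy as np
import pandas as pd

def duplicated_quantile_edges(series, n_bins=10):
    """Return the values that appear more than once among the quantile edges.

    Hypothetical diagnostic: computes n_bins + 1 evenly spaced quantiles
    (deciles by default) and reports the edges that repeat.
    """
    edges = series.quantile(np.linspace(0, 1, n_bins + 1)).to_numpy()
    values, counts = np.unique(edges, return_counts=True)
    return values[counts > 1]

# Hypothetical column: a -1 placeholder fills 5 of 20 rows (25%).
msa_md = pd.Series([-1] * 5 + list(range(1, 16)))
print(duplicated_quantile_edges(msa_md))
print((msa_md == -1).mean())  # share of rows holding the placeholder
```

If the result is non-empty, the options from the thread apply: treat the placeholder as missing and impute or flag it, or pass duplicates="drop" and accept fewer bins.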