When I run StratifiedCrossValidator instead of CrossValidator with my pipeline, I get the following error. I suspect it relates to my newer versions of PySpark and/or NumPy, since spark_stratifier installs pyspark-2.3.2 and numpy==1.15.1 as part of its installation.
Are there any plans to upgrade the package?
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<ipython-input-4-e237f44298bb> in <module>
    237
    238
--> 239 cvModel = crossval.fit(train)
    240 predictions = cvModel.transform(test)

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/ml/base.py in fit(self, dataset, params)
    127                 return self.copy(params)._fit(dataset)
    128             else:
--> 129                 return self._fit(dataset)
    130         else:
    131             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/spark_stratifier/stratifier.py in _fit(self, dataset)
     45     metrics = [0.0] * numModels
     46
---> 47     stratified_data = self.stratify_data(dataset)
     48
     49     for i in range(nFolds):

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/spark_stratifier/stratifier.py in stratify_data(self, dataset)
     26     split_ratio = 1.0 / nFolds
     27
---> 28     passes = dataset[dataset['label'] == 1]
     29     fails = dataset[dataset['label'] == 0]
     30

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/dataframe.py in __getitem__(self, item)
   1378         """
   1379         if isinstance(item, basestring):
-> 1380             jc = self._jdf.apply(item)
   1381             return Column(jc)
   1382         elif isinstance(item, Column):

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306
   1307         for temp_arg in temp_args:

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
    135                 # Hide where the exception came from that shows a non-Pythonic
    136                 # JVM exception message.
--> 137                 raise_from(converted)
    138             else:
    139                 raise

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/utils.py in raise_from(e)

AnalysisException: Cannot resolve column name "label" among (type, amount, oldbalanceOrg, newbalanceOrig, isFraud, sample_weight_per_class);
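
The final line of the traceback suggests the failure may have less to do with versions than with column naming: `stratify_data` indexes `dataset['label']`, while the DataFrame only has `type, amount, oldbalanceOrg, newbalanceOrig, isFraud, sample_weight_per_class`. Below is a minimal pure-Python sketch of the same kind of per-class split, written only to illustrate that behaviour. The function `stratified_folds` and its `label_col` parameter are hypothetical names and are not part of spark_stratifier, and the rows are made-up sample data.

```python
from collections import defaultdict


def stratified_folds(rows, n_folds, label_col="label"):
    """Split rows into n_folds so each fold gets a similar share of every class.

    Hypothetical helper for illustration only; not spark_stratifier's code.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")

    # Group rows by class, failing loudly when the label column is missing,
    # much like the AnalysisException in the traceback.
    by_class = defaultdict(list)
    for row in rows:
        if label_col not in row:
            raise KeyError(
                f'Cannot resolve column name "{label_col}" among {sorted(row)}'
            )
        by_class[row[label_col]].append(row)

    # Deal each class out round-robin so every fold keeps the class ratio.
    folds = [[] for _ in range(n_folds)]
    for class_rows in by_class.values():
        for i, row in enumerate(class_rows):
            folds[i % n_folds].append(row)
    return folds


# Made-up rows whose label column is named "isFraud", as in the issue.
rows = [{"amount": float(i), "isFraud": int(i % 4 == 0)} for i in range(12)]

# The default column name "label" does not exist in these rows.
try:
    stratified_folds(rows, 3)
except KeyError as exc:
    error_message = exc.args[0]

# Naming the real label column lets the split go through.
folds = stratified_folds(rows, 3, label_col="isFraud")
fraud_per_fold = [sum(r["isFraud"] for r in fold) for fold in folds]
```

If the same cause applies here, renaming the column before fitting, for example with `train.withColumnRenamed("isFraud", "label")`, may be enough as a workaround, assuming the estimator and evaluator in the pipeline are also configured to read a column named `label`.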
browshanravan changed the title from "Compatibility with PySpark 3.0.0" to "Compatibility with PySpark 3.0.0 and NumPy" on Aug 31, 2020