StandardScaler in dataprocessing #16

BinhMinhs10 · 2019-10-30T08:58:44Z

I'm using StandardScaler to standardize input data that is a DataFrame, but it returns a numpy.array. Converting the array back with pd.DataFrame() works, but the column names are lost.

One easy way is to use pandas directly. For z-score standardization (subtract the mean, divide by the standard deviation):
normalized_df = (df - df.mean()) / df.std()
For min-max normalization:
normalized_df = (df - df.min()) / (df.max() - df.min())
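A small sketch of the two pandas formulas above on a toy DataFrame; the column names and values are made up for illustration. One detail to keep in mind: `df.std()` uses the sample standard deviation (`ddof=1`) by default, while `StandardScaler` divides by the population standard deviation (`ddof=0`), so the sketch passes `ddof=0` to match the scaler.

```python
import numpy as np
import pandas as pd

# Toy data; "height" and "weight" are hypothetical column names.
df = pd.DataFrame(
    {"height": [150.0, 160.0, 170.0, 180.0], "weight": [50.0, 60.0, 65.0, 80.0]}
)

# Z-score standardization. ddof=0 matches what StandardScaler computes;
# the pandas default (ddof=1) gives slightly smaller absolute values.
standardized_df = (df - df.mean()) / df.std(ddof=0)

# Min-max normalization: every column is rescaled to the range [0, 1].
min_max_df = (df - df.min()) / (df.max() - df.min())

# Both results are still DataFrames with the original column names.
print(standardized_df.columns.tolist())
print(min_max_df.columns.tolist())
```

Both expressions are column-wise and keep the index and column labels, because pandas aligns `df.mean()`, `df.std()`, `df.min()` and `df.max()` with the columns of `df`.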

But this normalization takes a lot of time and RAM. Is there a better way to standardize the data so that the output is still a DataFrame?

bangoc123 · 2019-11-01T01:46:33Z

Hi, you can keep the DataFrame's index and column names. After scaling, convert the NumPy array back into a DataFrame if needed:

X_train_data_frame = pd.DataFrame(X_train_is_numpy_array, index=X_train.index, columns=X_train.columns)
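A minimal end-to-end sketch of this round trip, assuming scikit-learn and pandas are installed; the toy data and its column names are hypothetical. The last step uses `set_output(transform="pandas")`, which is only available in scikit-learn 1.2 and later, so it is guarded with a `hasattr` check.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy training data; the labels are made up for illustration.
X_train = pd.DataFrame(
    {"height": [150.0, 160.0, 170.0, 180.0], "weight": [50.0, 60.0, 65.0, 80.0]},
    index=["a", "b", "c", "d"],
)

scaler = StandardScaler()
# fit_transform returns a plain NumPy array; the labels are dropped here.
X_train_is_numpy_array = scaler.fit_transform(X_train)

# Re-attach the original index and column names.
X_train_data_frame = pd.DataFrame(
    X_train_is_numpy_array, index=X_train.index, columns=X_train.columns
)

# scikit-learn >= 1.2 only: ask the transformer to return a DataFrame itself.
if hasattr(scaler, "set_output"):
    pandas_scaler = StandardScaler().set_output(transform="pandas")
    X_train_pandas = pandas_scaler.fit_transform(X_train)
    print(type(X_train_pandas).__name__)

print(X_train_data_frame.columns.tolist())
```

Wrapping the array this way attaches labels to the scaled values, so the original row and column names are available again for later steps.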


BinhMinhs10 closed this as completed Oct 30, 2019

BinhMinhs10 reopened this Oct 30, 2019
