Commit 2bb6e27

Updated xgboost user guide (#2872)
* Updated xgboost user guide
* Pinned the xgboost release to a specific version

Co-authored-by: Chester Chen <[email protected]>
1 parent 21fc8b9 commit 2bb6e27

1 file changed (+23, -22 lines)

docs/user_guide/federated_xgboost/secure_xgboost_user_guide.rst

@@ -13,12 +13,13 @@ NVFlare supports federated training with XGBoost. It provides the following adva
 
 It supports federated training in the following 4 modes:
 
-1. **horizontal(h)**: Row split without encryption
-2. **vertical(v)**: Column split without encryption
-3. **horizontal_secure(hs)**: Row split with HE (Requires at least 3 clients. With 2 clients, the other client's histogram can be deduced.)
-4. **vertical_secure(vs)**: Column split with HE
+1. Row split without encryption
+2. Column split without encryption
+3. Row split with HE (Requires at least 3 clients. With 2 clients, the other client's histogram can be deduced.)
+4. Column split with HE
 
-When running with NVFlare, all the GRPC connections in XGBoost are local and the messages are forwarded to other clients through NVFlare's CellNet communication. The local GRPC ports are selected automatically by NVFlare.
+When running with NVFlare, all the GRPC connections in XGBoost are local and the messages are forwarded to other clients through NVFlare's CellNet communication.
+The local GRPC ports are selected automatically by NVFlare.
 
 The encryption is handled in XGBoost by encryption plugins, which are external components that can be installed at runtime. The plugins are bundled with NVFlare.
 
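The 3-client minimum for the row-split HE mode follows from simple subtraction: when only 2 clients participate, either party can recover the other's histogram from the revealed aggregate. A toy illustration (the histogram values here are made up):

```python
# Toy gradient histograms (one value per feature bin) for two clients.
client_a = [5.0, 2.0, 7.0]
client_b = [1.0, 4.0, 3.0]

# Secure aggregation reveals only the element-wise sum to the participants.
aggregate = [a + b for a, b in zip(client_a, client_b)]

# With just two parties, client A subtracts its own contribution and
# recovers client B's histogram exactly -- hence the 3-client minimum.
recovered_b = [total - a for total, a in zip(aggregate, client_a)]
assert recovered_b == client_b
```

With three or more clients, the residual after subtraction is the sum of all the other parties' histograms, so no single client's values can be isolated.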
@@ -43,7 +44,7 @@ or if XGBoost 2.2 is not released yet, use
 
 .. code-block:: bash
 
-  pip install xgboost --extra-index-url https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=federated-secure/
+  pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
 
 ``TenSEAL`` package is needed for horizontal secure training,
 
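To check whether the installed wheel actually carries federated support, one can probe for the `xgboost.federated` module. This helper is purely illustrative and not part of NVFlare or XGBoost; it assumes federated-enabled builds expose `xgboost.federated`:

```python
def federated_support_status():
    """Return a human-readable status for federated XGBoost support.

    Illustrative helper (an assumption, not an official API): it probes
    for the ``xgboost.federated`` module that federated builds ship.
    """
    try:
        import xgboost  # noqa: F401
    except ImportError:
        return "xgboost is not installed"
    try:
        import xgboost.federated  # noqa: F401
    except ImportError:
        return "xgboost is installed without federated support"
    return "federated support is available"

print(federated_support_status())
```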
@@ -120,11 +121,11 @@ This is a snippet of the ``secure_project.yml`` file with the HEBuilder:
 
 Data Preparation
 ================
-Data must be properly formatted for federated XGBoost training based on split mode (horizontal or vertical).
+Data must be properly formatted for federated XGBoost training based on split mode (row or column).
 
-For horizontal training, the datasets on all clients must share the same columns.
+For horizontal (row-split) training, the datasets on all clients must share the same columns.
 
-For vertical training, the datasets on all clients contain different columns, but must share overlapping rows. For more details on vertical split preprocessing, refer to the :github_nvflare_link:`Vertical XGBoost Example <examples/advanced/vertical_xgboost>`.
+For vertical (column-split) training, the datasets on all clients contain different columns, but must share overlapping rows. For more details on vertical split preprocessing, refer to the :github_nvflare_link:`Vertical XGBoost Example <examples/advanced/vertical_xgboost>`.
 
 XGBoost Plugin Configuration
 ============================
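The two layouts can be sketched on a toy table (the column names and the `uid` join key below are made up for illustration, not a required schema):

```python
# A toy table: 4 rows with columns uid, f1, f2, label.
rows = [
    {"uid": 1, "f1": 0.5, "f2": 1.2, "label": 0},
    {"uid": 2, "f1": 0.1, "f2": 0.7, "label": 1},
    {"uid": 3, "f1": 0.9, "f2": 0.3, "label": 0},
    {"uid": 4, "f1": 0.4, "f2": 0.8, "label": 1},
]

# Horizontal (row) split: each client holds different rows, same columns.
site1_h = rows[:2]
site2_h = rows[2:]

# Vertical (column) split: each client holds different columns for the
# same overlapping rows; by convention the label stays with one party.
site1_v = [{"uid": r["uid"], "f1": r["f1"], "label": r["label"]} for r in rows]
site2_v = [{"uid": r["uid"], "f2": r["f2"]} for r in rows]
```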
@@ -206,8 +207,9 @@ On the server side, following controller must be configured in workflows,
 Even though the XGBoost training is performed on clients, the parameters are configured on the server so all clients share the same configuration.
 XGBoost parameters are defined here, https://xgboost.readthedocs.io/en/stable/python/python_intro.html#setting-parameters
 
-- **num_rounds**: Number of training rounds
-- **training_mode**: Training mode, must be one of the following: horizontal, vertical, horizontal_secure, vertical_secure.
+- **num_rounds**: Number of training rounds.
+- **data_split_mode**: Same as the XGBoost data_split_mode parameter: 0 for row split, 1 for column split.
+- **secure_training**: If true, XGBoost will train in secure mode using the plugin.
 - **xgb_params**: The training parameters defined in this dict are passed to XGBoost as **params**, the booster parameters.
 - **xgb_options**: This dict contains other optional parameters passed to XGBoost. Currently, only **early_stopping_rounds** is supported.
 - **client_ranks**: A dict that maps client name to rank.
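As a sketch of how these controller arguments relate, the helper below mirrors the documented meaning of each one. It is hypothetical (NVFlare performs its own validation; the function name and the example argument values are invented for illustration):

```python
def check_controller_args(num_rounds, data_split_mode, secure_training, xgb_params):
    """Validate XGBFedController-style arguments before submitting a job.

    Hypothetical helper: only restates the documented meaning of each
    argument; it is not part of NVFlare.
    """
    assert isinstance(num_rounds, int) and num_rounds > 0, "num_rounds must be positive"
    assert data_split_mode in (0, 1), "0 = row split, 1 = column split"
    assert isinstance(secure_training, bool), "secure_training is a boolean flag"
    assert isinstance(xgb_params, dict), "forwarded to XGBoost as `params`"
    return True

# Example values for a vertical secure job (illustrative only).
check_controller_args(
    num_rounds=10,
    data_split_mode=1,      # column split
    secure_training=True,   # encrypt histograms via the plugin
    xgb_params={"objective": "binary:logistic", "max_depth": 3},
)
```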
@@ -226,15 +228,14 @@ Only one parameter is required for executor,
 Data Loader
 -----------
 
-On the client side, a data loader must be configured in the components. The SecureDataLoader can be used if the data is pre-processed. For example,
+On the client side, a data loader must be configured in the components. The CSVDataLoader can be used if the data is pre-processed. For example,
 
 .. code-block:: json
 
   {
     "id": "dataloader",
-    "path": "nvflare.app_opt.xgboost.histogram_based_v2.secure_data_loader.SecureDataLoader",
+    "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
     "args": {
-      "rank": 0,
       "folder": "/opt/dataset/vertical_xgb_data"
     }
   }
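When the data lives in another format, a custom loader component can be written instead of CSVDataLoader. The sketch below is a stand-alone illustration of the idea only: the class name, the `train.csv`/`valid.csv` file layout, and the label-in-first-column convention are all assumptions, and a real NVFlare loader would subclass the framework's loader base class and return `xgb.DMatrix` objects.

```python
import csv
import os

class FolderCSVLoader:
    """Hypothetical stand-in for a custom XGBoost data loader component.

    Assumes the configured folder contains train.csv and valid.csv,
    with the label in the first column (assumptions for illustration;
    verify against the actual CSVDataLoader before relying on them).
    """

    def __init__(self, folder):
        self.folder = folder

    def _read(self, name):
        # Parse one CSV file into (features, labels) lists of floats.
        path = os.path.join(self.folder, name)
        with open(path, newline="") as f:
            rows = [[float(v) for v in row] for row in csv.reader(f)]
        labels = [r[0] for r in rows]
        features = [r[1:] for r in rows]
        return features, labels

    def load_data(self):
        # A real loader would wrap these into xgb.DMatrix objects here.
        train = self._read("train.csv")
        valid = self._read("valid.csv")
        return train, valid
```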
@@ -249,7 +250,7 @@ Job Example
 Vertical Training
 -----------------
 
-Here are the configuration files for a vertical secure training job. If encryption is not needed, just change the training_mode to vertical.
+Here are the configuration files for a vertical secure training job. If encryption is not needed, simply set secure_training to false.
 
 config_fed_server.json
 
@@ -264,7 +265,8 @@ config_fed_server.json
       "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
       "args": {
         "num_rounds": "{num_rounds}",
-        "training_mode": "vertical_secure",
+        "data_split_mode": 1,
+        "secure_training": true,
         "xgb_options": {
           "early_stopping_rounds": 2
         },
@@ -310,9 +312,8 @@ config_fed_client.json
   "components": [
     {
       "id": "dataloader",
-      "path": "nvflare.app_opt.xgboost.histogram_based_v2.secure_data_loader.SecureDataLoader",
+      "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
       "args": {
-        "rank": 0,
         "folder": "/opt/dataset/vertical_xgb_data"
       }
     }
@@ -323,7 +324,7 @@ config_fed_client.json
 Horizontal Training
 -------------------
 
-The configuration for horizontal training is the same as vertical except training_mode and the data loader must point to horizontal split data.
+The configuration for horizontal training is the same as vertical, except that data_split_mode is 0 and the data loader must point to horizontally split data.
 
 config_fed_server.json
 
@@ -338,7 +339,8 @@ config_fed_server.json
       "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
       "args": {
         "num_rounds": "{num_rounds}",
-        "training_mode": "horizontal_secure",
+        "data_split_mode": 0,
+        "secure_training": true,
         "xgb_options": {
           "early_stopping_rounds": 2
         },
@@ -388,9 +390,8 @@ config_fed_client.json
   "components": [
     {
       "id": "dataloader",
-      "path": "nvflare.app_opt.xgboost.histogram_based_v2.secure_data_loader.SecureDataLoader",
+      "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
       "args": {
-        "rank": 0,
         "folder": "/data/xgboost_secure/dataset/horizontal_xgb_data"
       }
     }
