diff --git a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png index 07f8a9d..157e00d 100644 Binary files a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png and b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png differ diff --git a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png index 9b180c1..18a7932 100644 Binary files a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png and b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png index dd69ee5..648427c 100644 Binary files a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png and b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/data.html b/docs/data.html index e7eba0e..bdbb5c6 100644 --- a/docs/data.html +++ b/docs/data.html @@ -146,48 +146,10 @@

2.1 Catch weight and nutrional co

The FishBase database provides length-to-length and length-to-weight relationships for over 5,000 fish species. Typically, there are multiple records for the parameters a and b for each species. Since the length measurements in Peskas’ first version pertained to FL, we initially standardized all length measurements to TL using the FishBase length-to-length conversion tables. Subsequently, we applied the TL-to-weight conversion tables to estimate the weights.

The FishBase length-to-weight conversion tables offer species-level taxonomic resolution. To derive a singular length-to-weight relationship for each fish group, we calculated the median values of parameters a and b for all species within a particular fish group. To ensure relevance to the region of interest, we refined the species list using FAO country codes (https://www.fao.org/countryprofiles/iso3list/en/) pertinent to Timor-Leste and Indonesia (country codes 626 and 360, respectively). For instance, to ascertain the weight of a catch categorized under the fish group labeled ECN (representing the Echeneidae family), we first identified the species within ECN documented in Timor-Leste and Indonesia. After this, we computed the average values of the parameters a and b for the identified species, which in this case were Echeneis naucrates and Remora remora (as illustrated in the figure below).
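The group-level conversion described above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the `(a, b)` records are made-up placeholders rather than real FishBase values, the function names are hypothetical, and it assumes the median is taken over all records pooled across the species of a group. The weight follows the standard length-to-weight form W = a × L^b, with W in grams and L the total length in centimeters.

```python
from statistics import median

# Hypothetical (a, b) records for the species of one fish group.
# Placeholder numbers for illustration only, NOT real FishBase values.
records = {
    "species_1": [(0.010, 3.00), (0.012, 3.10)],
    "species_2": [(0.008, 2.90)],
}


def group_parameters(records):
    """Return the median a and median b over all records in the group."""
    a_values = [a for recs in records.values() for a, _ in recs]
    b_values = [b for recs in records.values() for _, b in recs]
    return median(a_values), median(b_values)


def weight_grams(length_cm, a, b):
    """Length-to-weight relationship W = a * L^b (W in g, L = TL in cm)."""
    return a * length_cm ** b


a, b = group_parameters(records)
catch_weight = weight_grams(30.0, a, b)  # weight of one 30 cm individual
```

Pooling all records before taking the median is only one reading of the text; taking a per-species median first and then the median across species would be an equally plausible variant.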

Measured nutrient values for fish are scarce and typically limited to a few species and countries. To overcome this data limitation, MacNeil et al. developed a Bayesian hierarchical model that leverages both phylogenetic and trait-based information to predict concentrations of seven essential nutrients (calcium, iron, omega-3 fatty acids, protein, selenium, vitamin A, and zinc) for both marine and inland fish species globally (see Hicks et al. 2019). For each catch, the nutritional yield was calculated by combining the validated weight estimates for each fish group with the modelled nutrient concentrations. Specifically, we used the highest posterior predictive density values for each of the seven nutrients, which can be found in the repository (https://github.com/mamacneil/NutrientFishbase). For non-fish groups (octopuses, squids, cockles, shrimps, crabs, and lobsters), nutritional yield information was not available in the NutrientFishbase repository models. We retrieved the necessary data for these groups from the Global food composition database and estimated their nutritional content with the same methodological approach used for the fish groups. To represent the nutrient concentration associated with each fish group, we used the median value as a summarizing metric.
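As a rough sketch of the yield calculation, the nutritional yield of a catch is its weight multiplied by the nutrient concentration of its fish group. The concentrations below are placeholder numbers, not values from the NutrientFishbase repository, and the per-100 g unit and the function name are assumptions made for the example.

```python
# Hypothetical median nutrient concentrations for one fish group,
# expressed per 100 g (assumed unit). Placeholder values only.
concentration_per_100g = {
    "protein_g": 20.0,
    "calcium_mg": 50.0,
}


def nutrient_yield(weight_g, concentration_per_100g):
    """Scale each per-100 g concentration by the catch weight in grams."""
    return {
        nutrient: weight_g / 100.0 * value
        for nutrient, value in concentration_per_100g.items()
    }


# Nutritional yield of a 1.5 kg catch of this fish group.
catch_yield = nutrient_yield(1500.0, concentration_per_100g)
```

Each entry in the result keeps the unit of its input concentration, so protein stays in grams and calcium in milligrams.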

-
## 
-ℹ Downloading rfish-table__20231111005820_fe395e3__.rds
-
-✔ Saved rfish-table__20231111005820_fe395e3__.rds to rfish-table__20231111005820_fe395e3__.rds  ( 159.2 K…
-## Rows: 515 Columns: 13── Column specification ──────────────────────────────────────────────────────────────────────────────────
-## Delimiter: ","
-## chr (4): integragency_code, food_name, habitat, food_state
-## dbl (9): food_id, ISSCAAP, protein(g), calcium(mg), iron(mg), zinc(mg), selenium(mcg), vitaminA(mcg), ...
-## ℹ Use `spec()` to retrieve the full column specification for this data.
-## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-
## Warning: There were 17 warnings in `dplyr::mutate()`.
-## The first warning was:
-## ℹ In argument: `ic = se * qt((1 - 0.05)/2 + 0.5, n - 1)`.
-## ℹ In group 5: `interagency_code = "BWH"`.
-## Caused by warning in `qt()`:
-## ! NaNs produced
-## ℹ Run `dplyr::last_dplyr_warnings()` to see the 16 remaining warnings.
-
## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-## `geom_line()`: Each group consists of only one observation.
-## ℹ Do you need to adjust the group aesthetic?
-
## Warning: Removed 12 rows containing missing values (`geom_segment()`).
-
## Warning: Removed 12 rows containing missing values (`geom_segment()`).
-## Removed 12 rows containing missing values (`geom_segment()`).
-## Removed 12 rows containing missing values (`geom_segment()`).
-## Removed 12 rows containing missing values (`geom_segment()`).
-## Removed 12 rows containing missing values (`geom_segment()`).
-## Removed 12 rows containing missing values (`geom_segment()`).
-
-Distribution of nutrients' concentration for each fish group. Dots represent the median, fig.height=4, fig.width=10, message=FALSE, warning=FALSE, bars represent the 95% confidence interval. +
+Distribution of nutrients' concentration for each fish group. Dots represent the median, bars represent the 95% confidence interval.

-Figure 2.1: Distribution of nutrients’ concentration for each fish group. Dots represent the median, fig.height=4, fig.width=10, message=FALSE, warning=FALSE, bars represent the 95% confidence interval. +Figure 2.1: Distribution of nutrients’ concentration for each fish group. Dots represent the median, bars represent the 95% confidence interval.

diff --git a/docs/highlight.html b/docs/highlight.html index 430b1c4..f0bbaa8 100644 --- a/docs/highlight.html +++ b/docs/highlight.html @@ -181,8 +181,8 @@

3.1 Timor-Est SSF nutritional sce -
- +
+ diff --git a/docs/profiles.html b/docs/profiles.html index 5fffc5a..bba6a59 100644 --- a/docs/profiles.html +++ b/docs/profiles.html @@ -145,21 +145,21 @@

5.2 Results

5.2.1 Clusters

The scatter plot from the k-means clustering (Figure 5.1) showed the distribution of nutrient profiles across different clusters. The first two principal components explained 39% and 26% of the variance, respectively, and the clusters separate clearly in this space, which suggests that the fishing trips could be effectively categorized based on their nutrient content. The bar chart (Figure 5.2) shows, for each cluster, the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1 kg of catch. The segmentation of bars by nutrient (calcium, iron, omega-3, protein, vitamin A, zinc) reveals variation in nutritional fulfillment across clusters. This suggests that different fishing strategies, represented by different clusters, result in catches with varying nutritional values.

-Cluster analysis of nutrient profiles using k-means clustering. The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, fig.width=8, message=FALSE, warning=FALSE, providing a visual guide to the cluster density and separation. +Cluster analysis of nutrient profiles using k-means clustering. The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, providing a visual guide to the cluster density and separation.

-Figure 5.1: Cluster analysis of nutrient profiles using k-means clustering. The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, fig.width=8, message=FALSE, warning=FALSE, providing a visual guide to the cluster density and separation. +Figure 5.1: Cluster analysis of nutrient profiles using k-means clustering. The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, providing a visual guide to the cluster density and separation.

-Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), fig.width=6, message=FALSE, warning=FALSE, iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters. +Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters.

-Figure 5.2: Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), fig.width=6, message=FALSE, warning=FALSE, iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters. +Figure 5.2: Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters.

5.2.2 XGBoost model

-

The model’s predictive capacity was quantitatively assessed via receiver operating characteristic (ROC) analysis across five distinct clusters. The ROC (see ML model interpretation) curves illustrate a differential capacity of the model to classify each cluster based on the nutritional profiles derived from various fishing strategies. Cluster 2 and 5 demonstrated superior model performance, indicated by a curve proximate to the top-left, suggesting high sensitivity and specificity. Clusters 1 and 4 showed marginally lower but comparable discrimination ability. Cluster 3 indicated a slight decrease in sensitivity and exhibited the model’s lowest performance, with a curve markedly farther from the ideal top-left position. Collectively, an aggregate AUC of 0.86 signifies a strong overall ability of the model to differentiate between the clusters, albeit with varying degrees of precision. These findings underscore the model’s effectiveness in predicting nutritional outcomes based on fishing strategies, with implications for tailoring nutrient-sensitive fisheries management interventions.

+

The model’s predictive capacity was quantitatively assessed via receiver operating characteristic (ROC) analysis across five distinct clusters. The ROC curves (see ML model interpretation) illustrate a differential capacity of the model to classify each cluster based on the nutritional profiles derived from various fishing strategies. Cluster 2 and 5 demonstrated superior model performance, indicated by a curve proximate to the top-left, suggesting high sensitivity and specificity. Clusters 1 and 4 showed marginally lower but comparable discrimination ability. Cluster 3 indicated a slight decrease in sensitivity and exhibited the model’s lowest performance, with a curve markedly farther from the ideal top-left position. Collectively, an aggregate AUC of 0.86 signifies a strong overall ability of the model to differentiate between the clusters, albeit with varying degrees of precision. These findings underscore the model’s effectiveness in predicting nutritional outcomes based on fishing strategies, with implications for tailoring nutrient-sensitive fisheries management interventions.

Receiver Operating Characteristic (ROC) Curves with Data Points for Cluster-Based Classification. The curves delineate the sensitivity versus 1-specificity for the five clusters derived from the XGBoost classification model. Each cluster is represented by a distinct color with data points marked, which illustrates the true positive rate against the false positive rate for each respective cluster. The closeness of each curve to the top-left corner indicates the model’s classification efficacy per cluster, with Cluster 1 and 2 showing the highest performance. The overall model demonstrates substantial predictive accuracy with a composite AUC value of 0.86.

@@ -189,6 +189,7 @@

5.3 Checks and limitations
  • The distribution of both habitat types and gear types in our data is uneven. Observations in deep water and reef environments are more common than in other habitats, and gill nets are used more frequently than other types of fishing gear. We need to evaluate whether this imbalance could lead to biases or other issues in our model.

  • Are we considering all potentially good predictors?

  • +
  • These colors are sometimes confusing; consider changing the color palette.

  • diff --git a/docs/search_index.json b/docs/search_index.json index d35b43e..fceb5dd 100644 --- a/docs/search_index.json +++ b/docs/search_index.json @@ -1 +1 @@ -[["index.html", "Modelling scenarios for nutrient-sensitive fisheries management 1 Content", " Modelling scenarios for nutrient-sensitive fisheries management Lore 2023-11-12 1 Content This book contains analyses and reports in ‘Modelling scenarios for nutrient-sensitive fisheries management’ "],["data.html", "2 Data 2.1 Catch weight and nutrional content 2.2 Checks and limitations", " 2 Data The research presented in this book relies on two primary sources of data: Recorded Catch (RC): This dataset comprises detailed records of fishing trips that were documented by data collectors in the coastal municipalities of East Timor starting from January 2018. Estimated Catch (EC): This dataset provides a broader view of catch data on a regional level. It is created by combining RC with additional information, including the frequency of fishing trips made by each fishing boat and the total number of boats surveyed (censused) in each municipality. This combination extrapolates the recorded catch data to a larger scale. 2.1 Catch weight and nutrional content The total estimated catch weight is determined by the number of individuals and the length range of each catch. Specifically, during the initial phase of the Peskas project (July 2017 - April 2019), the standard length measurement used was the fork length (FL), which later changed to the total length (TL) in the subsequent and current version of the project. 
We utilized the API service offered by the FishBase database to incorporate length-to-length and length-to-weight conversion tables, using information from survey landings to calculate the weight in grams based on the following formula: W = a × L^b Here, W represents the weight in grams, L is the total length (TL) in centimeters, and a and b are the conversion parameters obtained from FishBase for each fish species. The FishBase database provides length-to-length and length-to-weight relationships for over 5,000 fish species. Typically, there are multiple records for the parameters a and b for each species. Since the length measurements in Peskas’ first version pertained to FL, we initially standardized all length measurements to TL using the FishBase length-to-length conversion tables. Subsequently, we applied the TL-to-weight conversion tables to estimate the weights. The FishBase length-to-weight conversion tables offer species-level taxonomic resolution. To derive a singular length-to-weight relationship for each fish group, we calculated the median values of parameters a and b for all species within a particular fish group. To ensure relevance to the region of interest, we refined the species list using FAO country codes (https://www.fao.org/countryprofiles/iso3list/en/) pertinent to Timor-Leste and Indonesia (country codes 626 and 360, respectively). For instance, to ascertain the weight of a catch categorized under the fish group labeled ECN (representing the Echeneidae family), we first identified the species within ECN documented in Timor-Leste and Indonesia. After this, we computed the average values of the parameters a and b for the identified species, which in this case were Echeneis naucrates and Remora remora (as illustrated in the figure below). To address the scarcity of measured nutrient values for fish, which are typically limited to a few species and countries. To overcome this data limitation, MacNeil et al. 
developed a Bayesian hierarchical model that leverages both phylogenetic information and trait-based information to predict concentrations of seven essential nutrients: calcium, iron, omega-3 fatty acids, protein, selenium, vitamin A, and zinc for both marine and inland fish species globally (see Hicks et al. 2019). For each catch, the nutritional yield was calculated by combining the validated weight estimates for each fish group with the modelled nutrient concentrations. Specifically, we used the highest posterior predictive density values for each of the seven nutrients, which can be found in the repository (https://github.com/mamacneil/NutrientFishbase). For non-fish groups—including octopuses, squids, cockles, shrimps, crabs, and lobsters—nutritional yield information was not available in the NutrientFishbase repository models. We retrieved the necessary data for these groups from the Global food composition database, using the same methodological approach as for the fish groups to estimate their nutritional content. To represent the nutrient concentration associated with each fish group, we used the median value as a summarizing metric. ## ℹ Downloading rfish-table__20231111005820_fe395e3__.rds ✔ Saved rfish-table__20231111005820_fe395e3__.rds to rfish-table__20231111005820_fe395e3__.rds ( 159.2 K… ## Rows: 515 Columns: 13── Column specification ────────────────────────────────────────────────────────────────────────────────── ## Delimiter: "," ## chr (4): integragency_code, food_name, habitat, food_state ## dbl (9): food_id, ISSCAAP, protein(g), calcium(mg), iron(mg), zinc(mg), selenium(mcg), vitaminA(mcg), ... ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. ## Warning: There were 17 warnings in `dplyr::mutate()`. ## The first warning was: ## ℹ In argument: `ic = se * qt((1 - 0.05)/2 + 0.5, n - 1)`. ## ℹ In group 5: `interagency_code = "BWH"`. 
## Caused by warning in `qt()`: ## ! NaNs produced ## ℹ Run `dplyr::last_dplyr_warnings()` to see the 16 remaining warnings. ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## `geom_line()`: Each group consists of only one observation. ## ℹ Do you need to adjust the group aesthetic? ## Warning: Removed 12 rows containing missing values (`geom_segment()`). ## Warning: Removed 12 rows containing missing values (`geom_segment()`). ## Removed 12 rows containing missing values (`geom_segment()`). ## Removed 12 rows containing missing values (`geom_segment()`). ## Removed 12 rows containing missing values (`geom_segment()`). ## Removed 12 rows containing missing values (`geom_segment()`). ## Removed 12 rows containing missing values (`geom_segment()`). Figure 2.1: Distribution of nutrients’ concentration for each fish group. Dots represent the median, fig.height=4, fig.width=10, message=FALSE, warning=FALSE, bars represent the 95% confidence interval. 2.2 Checks and limitations Check groups with higher dispersion… Dow we need to narrow species grouping? "],["highlight.html", "3 Highlight statistics 3.1 Timor-Est SSF nutritional scenario", " 3 Highlight statistics 3.1 Timor-Est SSF nutritional scenario The table uses the EC dataset and summarizes the main statistics on nutrient supply for each region. 
Below is a description of each table’ column: MUNICIPALITY (POPULATION): Municipality and number of people > 5 years old in 2022. COASTLINE EXTENSION: Municipality coastline extension in Km. NUTRIENT: Nutrient of reference ANNUAL SUPPLY: Aggregated annual value in kg. These values represent municipal-level estimates based on the number of fishing boats recorded in the 2021 Timor-Leste boat census, average number of fishing trips per boat and average landing weight values for each fish group. ANNUAL SUPPLY PER KM: It describes the annual supply of each nutrient standardized on the coastline length, that is: \\(\\frac{Annual\\ supply\\ (kg)}{Coastline\\ extension\\ (km)}\\) N. PEOPLE SUPPLIED DAILY: It describes the number of people meeting the nutrient’ RNI for each municipality. RNI values used are the following: Selenium Zinc Protein Total -3 PUFA Calcium Iron Vitamin-A 0.000026 0.0049 46 2.939 1 0.0294 0.0005 The 20% of RNIs values was take as reference in consideration of the fact that an ‘adequate diet’ is expected to comprise 5 food group. RNIs were then converted from grams to kg (dividing by 1000) and the requirements was calculated as: \\(\\frac{Anuual\\ supply\\ (kg)}{(RNI\\times 0.20) \\ / 1000} /365\\) POPULATION MEETING RNI REQUIREMENTS: Percentage of the population meeting the RNI requirements in each municipality: \\(\\frac{Number\\ of\\ people\\ supplied\\ daily}{Municipality\\ population} \\times 100\\) "],["distribution.html", "4 Nutrients distribution 4.1 Fish groups 4.2 Habitat and gear type", " 4 Nutrients distribution This section presents the analyses that illustrates the distribution of nutrients within various components of small-scale fisheries in East Timor. 4.1 Fish groups Figure 4.1: Fish groups’ nutrional contribution to RNI. 
4.2 Habitat and gear type Figure 4.2: Sankey diagram showing the relative distribution of key nutrients across various marine habitats and the corresponding extraction by different fishing gear types used in Timor-Est small-scale fisheries. "],["profiles.html", "5 Timor SSF nutrient profiles 5.1 Methods 5.2 Results 5.3 Checks and limitations 5.4 Next steps", " 5 Timor SSF nutrient profiles 5.1 Methods In this section, we identified recurrent nutritional profiles based on RC data. We aimed to determine the most appropriate number of distinct groups, or “clusters,” present in our dataset. To achieve this, we used the total within sum of square (WSS) to identify the point at which grouping additional data points together does not significantly improve the clarity of the clustering. Once we established the optimal number of clusters, we applied the K-means clustering method. This is a widely-used technique that organizes data into clusters based on similarity. In our case, we grouped fishing trips together if they showed similar levels of nutrient concentrations. By doing this, we were able to observe patterns and categorize the trips according to their nutritional profiles. To investigate the predictability of nutritional profiles based on fishing strategies, we employed a machine learning model using the XGBoost algorithm. This algorithm is also known for its ability to prevent overfitting, a critical factor for ensuring the reliability of our predictive model. Additionally, XGBoost’s feature importance tool allowed us to identify the most influential predictors on the nutritional profiles. Our methodology began with the preparation of the dataset, which was crucial for the effective application of the machine learning model. We transformed the dataset to combine habitat and gear type information and selected relevant predictor variables, including quarter, habitat, gear type, and vessel type. 
The model parameters (number of trees, tree depth, and learning rate) were dynamically tuned during the training phase to optimize model performance. The dataset was split into two parts: 80% for training the model and 20% for testing. This allocation ensured a comprehensive dataset for training the model while allowing for effective validation. To further enhance the model’s accuracy and generalizability, we applied cross-validation using the training set. In the final stage, we fitted the XGBoost model to the training data and evaluated its performance using metrics such as accuracy, ROC AUC, sensitivity, and specificity. These metrics provided insights into the model’s discriminatory ability between different nutritional profile outcomes. The calculation of ROC curves and AUC values offered additional evaluation of the model’s effectiveness. 5.2 Results 5.2.1 Clusters The scatter plot from the k-means clustering (Figure 5.1) showed the distribution of nutrient profiles across different clusters. The first two principal components explained a significant portion of the variance, indicating distinct groupings in nutrient profiles among the fishing trips. The clear separation of clusters in this plot suggests that the fishing trips could be effectively categorized based on their nutrient content. The bar chart (Figure 5.2) displaying nutrient adequacy across clusters indicated the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for various nutrients. The segmentation of bars into different nutrients (calcium, iron, omega-3, protein, vitamin A, zinc) across clusters showed variation in nutritional fulfillment. This suggests that different fishing strategies, represented by different clusters, result in catches with varying nutritional values. Figure 5.1: Cluster analysis of nutrient profiles using k-means clustering. 
The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, fig.width=8, message=FALSE, warning=FALSE, providing a visual guide to the cluster density and separation. Figure 5.2: Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), fig.width=6, message=FALSE, warning=FALSE, iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters. 5.2.2 XGBoost model The model’s predictive capacity was quantitatively assessed via receiver operating characteristic (ROC) analysis across five distinct clusters. The ROC (see ML model interpretation) curves illustrate a differential capacity of the model to classify each cluster based on the nutritional profiles derived from various fishing strategies. Cluster 2 and 5 demonstrated superior model performance, indicated by a curve proximate to the top-left, suggesting high sensitivity and specificity. Clusters 1 and 4 showed marginally lower but comparable discrimination ability. Cluster 3 indicated a slight decrease in sensitivity and exhibited the model’s lowest performance, with a curve markedly farther from the ideal top-left position. 
Collectively, an aggregate AUC of 0.86 signifies a strong overall ability of the model to differentiate between the clusters, albeit with varying degrees of precision. These findings underscore the model’s effectiveness in predicting nutritional outcomes based on fishing strategies, with implications for tailoring nutrient-sensitive fisheries management interventions. Figure 5.3: Receiver Operating Characteristic (ROC) Curves with Data Points for Cluster-Based Classification. The curves delineate the sensitivity versus 1-specificity for the five clusters derived from the XGBoost classification model. Each cluster is represented by a distinct color with data points marked, which illustrates the true positive rate against the false positive rate for each respective cluster. The closeness of each curve to the top-left corner indicates the model’s classification efficacy per cluster, with Cluster 1 and 2 showing the highest performance. The overall model demonstrates substantial predictive accuracy with a composite AUC value of 0.86. metric estimator estimate roc_auc hand_till 0.86 5.3 Checks and limitations The distribution of both habitat types and gear types in our data is uneven. Observations in deep water and reef environments are more common compared to other habitats, and similarly, the use of gill nets is more frequent than other types of fishing gear. We need to evaluate whether this imbalance could lead to biases or issues in our model. Are we considering all the possible potential good predictors? 5.4 Next steps Explore the model: Quantify the importance of each predictor on the model outcome Assess the direction of the effect of each predictor, that is analyze which features have the most impact on driving predictions towards each cluster. SHAP Values are a good way to address that. 
"],["references.html", "References", " References "],["notes.html", "6 Notes 6.1 ML model interpretation 6.2 ML model explanation", " 6 Notes 6.1 ML model interpretation ROC Curve: The curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings. The true positive rate is on the y-axis, and the false positive rate is on the x-axis. Performance: A perfect classifier would have a point in the upper left corner of the graph, where the true positive rate is 1 (or 100%) and the false positive rate is 0. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test. Diagonal Line: The dotted diagonal line represents a no-skill classifier (e.g., random guessing). A good classifier stays as far away from this line as possible (toward the upper left corner). Area Under the Curve (AUC): The area under each ROC curve (AUC) is a measure of the test’s accuracy. An AUC of 0.5 suggests no discrimination (no better than random chance), while an AUC of 1.0 suggests perfect discrimination. 6.2 ML model explanation SHAP values: help in understanding how each predictor in the dataset contributed to each particular prediction. A high positive SHAP value for a feature increases the probability of a certain prediction, while a high negative SHAP value decreases it. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. 
"]] +[["index.html", "Modelling scenarios for nutrient-sensitive fisheries management 1 Content", " Modelling scenarios for nutrient-sensitive fisheries management Lore 2023-11-12 1 Content This book contains analyses and reports in ‘Modelling scenarios for nutrient-sensitive fisheries management’ "],["data.html", "2 Data 2.1 Catch weight and nutrional content 2.2 Checks and limitations", " 2 Data The research presented in this book relies on two primary sources of data: Recorded Catch (RC): This dataset comprises detailed records of fishing trips that were documented by data collectors in the coastal municipalities of East Timor starting from January 2018. Estimated Catch (EC): This dataset provides a broader view of catch data on a regional level. It is created by combining RC with additional information, including the frequency of fishing trips made by each fishing boat and the total number of boats surveyed (censused) in each municipality. This combination extrapolates the recorded catch data to a larger scale. 2.1 Catch weight and nutrional content The total estimated catch weight is determined by the number of individuals and the length range of each catch. Specifically, during the initial phase of the Peskas project (July 2017 - April 2019), the standard length measurement used was the fork length (FL), which later changed to the total length (TL) in the subsequent and current version of the project. We utilized the API service offered by the FishBase database to incorporate length-to-length and length-to-weight conversion tables, using information from survey landings to calculate the weight in grams based on the following formula: W = a × L^b Here, W represents the weight in grams, L is the total length (TL) in centimeters, and a and b are the conversion parameters obtained from FishBase for each fish species. The FishBase database provides length-to-length and length-to-weight relationships for over 5,000 fish species. 
Typically, there are multiple records for the parameters a and b for each species. Since the length measurements in Peskas’ first version pertained to FL, we initially standardized all length measurements to TL using the FishBase length-to-length conversion tables. Subsequently, we applied the TL-to-weight conversion tables to estimate the weights. The FishBase length-to-weight conversion tables offer species-level taxonomic resolution. To derive a single length-to-weight relationship for each fish group, we calculated the median values of parameters a and b for all species within a particular fish group. To ensure relevance to the region of interest, we refined the species list using FAO country codes (https://www.fao.org/countryprofiles/iso3list/en/) pertinent to Timor-Leste and Indonesia (country codes 626 and 360, respectively). For instance, to ascertain the weight of a catch categorized under the fish group labeled ECN (representing the Echeneidae family), we first identified the species within ECN documented in Timor-Leste and Indonesia. After this, we computed the average values of the parameters a and b for the identified species, which in this case were Echeneis naucrates and Remora remora (as illustrated in the figure below). Measured nutrient values for fish are scarce and typically limited to a few species and countries. To overcome this data limitation, MacNeil et al. developed a Bayesian hierarchical model that leverages both phylogenetic information and trait-based information to predict concentrations of seven essential nutrients (calcium, iron, omega-3 fatty acids, protein, selenium, vitamin A, and zinc) for both marine and inland fish species globally (see Hicks et al. 2019). For each catch, the nutritional yield was calculated by combining the validated weight estimates for each fish group with the modelled nutrient concentrations. 
Specifically, we used the highest posterior predictive density values for each of the seven nutrients, which can be found in the repository (https://github.com/mamacneil/NutrientFishbase). For non-fish groups (octopuses, squids, cockles, shrimps, crabs, and lobsters), nutritional yield information was not available in the NutrientFishbase repository models. We retrieved the necessary data for these groups from the Global food composition database, using the same methodological approach as for the fish groups to estimate their nutritional content. To represent the nutrient concentration associated with each fish group, we used the median value as a summarizing metric. Figure 2.1: Distribution of nutrients’ concentration for each fish group. Dots represent the median, bars represent the 95% confidence interval. 2.2 Checks and limitations Check groups with higher dispersion… Do we need to narrow species grouping? "],["highlight.html", "3 Highlight statistics 3.1 Timor-Leste SSF nutritional scenario", " 3 Highlight statistics 3.1 Timor-Leste SSF nutritional scenario The table uses the EC dataset and summarizes the main statistics on nutrient supply for each region. Below is a description of each table column: MUNICIPALITY (POPULATION): Municipality and number of people older than 5 years in 2022. COASTLINE EXTENSION: Municipality coastline extension in km. NUTRIENT: Nutrient of reference. ANNUAL SUPPLY: Aggregated annual value in kg. These values represent municipal-level estimates based on the number of fishing boats recorded in the 2021 Timor-Leste boat census, average number of fishing trips per boat and average landing weight values for each fish group. ANNUAL SUPPLY PER KM: The annual supply of each nutrient standardized by coastline length, that is: \\(\\frac{Annual\\ supply\\ (kg)}{Coastline\\ extension\\ (km)}\\) N. PEOPLE SUPPLIED DAILY: The number of people meeting the nutrient’s RNI for each municipality. 
RNI values used (in grams) are the following: Selenium: 0.000026; Zinc: 0.0049; Protein: 46; Total omega-3 PUFA: 2.939; Calcium: 1; Iron: 0.0294; Vitamin A: 0.0005. 20% of each RNI value was taken as the reference, given that an ‘adequate diet’ is expected to comprise 5 food groups. RNIs were then converted from grams to kg (dividing by 1000) and the requirement was calculated as: \\(\\frac{Annual\\ supply\\ (kg)}{(RNI\\times 0.20) \\ / 1000} /365\\) POPULATION MEETING RNI REQUIREMENTS: Percentage of the population meeting the RNI requirements in each municipality: \\(\\frac{Number\\ of\\ people\\ supplied\\ daily}{Municipality\\ population} \\times 100\\) "],["distribution.html", "4 Nutrients distribution 4.1 Fish groups 4.2 Habitat and gear type", " 4 Nutrients distribution This section presents the analyses that illustrate the distribution of nutrients within various components of small-scale fisheries in East Timor. 4.1 Fish groups Figure 4.1: Fish groups’ nutritional contribution to RNI. 4.2 Habitat and gear type Figure 4.2: Sankey diagram showing the relative distribution of key nutrients across various marine habitats and the corresponding extraction by different fishing gear types used in Timor-Leste small-scale fisheries. "],["profiles.html", "5 Timor SSF nutrient profiles 5.1 Methods 5.2 Results 5.3 Checks and limitations 5.4 Next steps", " 5 Timor SSF nutrient profiles 5.1 Methods In this section, we identified recurrent nutritional profiles based on RC data. We aimed to determine the most appropriate number of distinct groups, or “clusters,” present in our dataset. To achieve this, we used the total within-cluster sum of squares (WSS) to identify the point at which grouping additional data points together does not significantly improve the clarity of the clustering. Once we established the optimal number of clusters, we applied the K-means clustering method. This is a widely used technique that organizes data into clusters based on similarity. 
In our case, we grouped fishing trips together if they showed similar levels of nutrient concentrations. By doing this, we were able to observe patterns and categorize the trips according to their nutritional profiles. To investigate the predictability of nutritional profiles based on fishing strategies, we employed a machine learning model using the XGBoost algorithm. This algorithm is also known for its ability to prevent overfitting, a critical factor for ensuring the reliability of our predictive model. Additionally, XGBoost’s feature importance tool allowed us to identify the most influential predictors on the nutritional profiles. Our methodology began with the preparation of the dataset, which was crucial for the effective application of the machine learning model. We transformed the dataset to combine habitat and gear type information and selected relevant predictor variables, including quarter, habitat, gear type, and vessel type. The model parameters (number of trees, tree depth, and learning rate) were dynamically tuned during the training phase to optimize model performance. The dataset was split into two parts: 80% for training the model and 20% for testing. This allocation ensured a comprehensive dataset for training the model while allowing for effective validation. To further enhance the model’s accuracy and generalizability, we applied cross-validation using the training set. In the final stage, we fitted the XGBoost model to the training data and evaluated its performance using metrics such as accuracy, ROC AUC, sensitivity, and specificity. These metrics provided insights into the model’s discriminatory ability between different nutritional profile outcomes. The calculation of ROC curves and AUC values offered additional evaluation of the model’s effectiveness. 5.2 Results 5.2.1 Clusters The scatter plot from the k-means clustering (Figure 5.1) showed the distribution of nutrient profiles across different clusters. 
The first two principal components explained a significant portion of the variance, indicating distinct groupings in nutrient profiles among the fishing trips. The clear separation of clusters in this plot suggests that the fishing trips could be effectively categorized based on their nutrient content. The bar chart (Figure 5.2) displaying nutrient adequacy across clusters indicated the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for various nutrients. The segmentation of bars into different nutrients (calcium, iron, omega-3, protein, vitamin A, zinc) across clusters showed variation in nutritional fulfillment. This suggests that different fishing strategies, represented by different clusters, result in catches with varying nutritional values. Figure 5.1: Cluster analysis of nutrient profiles using k-means clustering. The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, providing a visual guide to the cluster density and separation. Figure 5.2: Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters. 
5.2.2 XGBoost model The model’s predictive capacity was quantitatively assessed via receiver operating characteristic (ROC) analysis across five distinct clusters. The ROC curves (see ML model interpretation) illustrate a differential capacity of the model to classify each cluster based on the nutritional profiles derived from various fishing strategies. Cluster 2 and 5 demonstrated superior model performance, indicated by a curve proximate to the top-left, suggesting high sensitivity and specificity. Clusters 1 and 4 showed marginally lower but comparable discrimination ability. Cluster 3 indicated a slight decrease in sensitivity and exhibited the model’s lowest performance, with a curve markedly farther from the ideal top-left position. Collectively, an aggregate AUC of 0.86 signifies a strong overall ability of the model to differentiate between the clusters, albeit with varying degrees of precision. These findings underscore the model’s effectiveness in predicting nutritional outcomes based on fishing strategies, with implications for tailoring nutrient-sensitive fisheries management interventions. Figure 5.3: Receiver Operating Characteristic (ROC) Curves with Data Points for Cluster-Based Classification. The curves delineate the sensitivity versus 1-specificity for the five clusters derived from the XGBoost classification model. Each cluster is represented by a distinct color with data points marked, which illustrates the true positive rate against the false positive rate for each respective cluster. The closeness of each curve to the top-left corner indicates the model’s classification efficacy per cluster, with Cluster 1 and 2 showing the highest performance. The overall model demonstrates substantial predictive accuracy with a composite AUC value of 0.86. metric estimator estimate roc_auc hand_till 0.86 5.3 Checks and limitations The distribution of both habitat types and gear types in our data is uneven. 
Observations in deep water and reef environments are more common compared to other habitats, and similarly, the use of gill nets is more frequent than other types of fishing gear. We need to evaluate whether this imbalance could lead to biases or issues in our model. Are we considering all the possible potential good predictors? These colors are sometimes confusing; consider changing the color palette. 5.4 Next steps Explore the model: Quantify the importance of each predictor on the model outcome Assess the direction of the effect of each predictor, that is, analyze which features have the most impact on driving predictions towards each cluster. SHAP values are a good way to address that. "],["references.html", "References", " References "],["notes.html", "6 Notes 6.1 ML model interpretation 6.2 ML model explanation", " 6 Notes 6.1 ML model interpretation ROC Curve: The curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings. The true positive rate is on the y-axis, and the false positive rate is on the x-axis. Performance: A perfect classifier would have a point in the upper left corner of the graph, where the true positive rate is 1 (or 100%) and the false positive rate is 0. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test. Diagonal Line: The dotted diagonal line represents a no-skill classifier (e.g., random guessing). A good classifier stays as far away from this line as possible (toward the upper left corner). Area Under the Curve (AUC): The area under each ROC curve (AUC) is a measure of the test’s accuracy. An AUC of 0.5 suggests no discrimination (no better than random chance), while an AUC of 1.0 suggests perfect discrimination. 6.2 ML model explanation SHAP values: help in understanding how each predictor in the dataset contributed to each particular prediction. 
A high positive SHAP value for a feature increases the probability of a certain prediction, while a high negative SHAP value decreases it. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/docs_book/01-data.Rmd b/docs_book/01-data.Rmd index 1c0a7b4..f080742 100644 --- a/docs_book/01-data.Rmd +++ b/docs_book/01-data.Rmd @@ -20,7 +20,7 @@ The FishBase length-to-weight conversion tables offer species-level taxonomic re To address the scarcity of measured nutrient values for fish, which are typically limited to a few species and countries. To overcome this data limitation, MacNeil et al. developed a Bayesian hierarchical model that leverages both phylogenetic information and trait-based information to predict concentrations of seven essential nutrients: calcium, iron, omega-3 fatty acids, protein, selenium, vitamin A, and zinc for both marine and inland fish species globally (see Hicks et al. 2019). For each catch, the nutritional yield was calculated by combining the validated weight estimates for each fish group with the modelled nutrient concentrations. Specifically, we used the highest posterior predictive density values for each of the seven nutrients, which can be found in the repository (). For non-fish groups---including octopuses, squids, cockles, shrimps, crabs, and lobsters---nutritional yield information was not available in the NutrientFishbase repository models. We retrieved the necessary data for these groups from the [Global food composition database](https://www.fao.org/documents/card/en/c/I8542EN/), using the same methodological approach as for the fish groups to estimate their nutritional content. To represent the nutrient concentration associated with each fish group, we used the median value as a summarizing metric. 
-```{r nutdispersion, echo=FALSE, fig.align='center', fig.asp=.75, fig.cap="Distribution of nutrients' concentration for each fish group. Dots represent the median, fig.height=4, fig.width=10, message=FALSE, warning=FALSE, bars represent the 95% confidence interval.", out.width='80%'} +```{r nutdispersion, echo=FALSE, fig.cap="Distribution of nutrients' concentration for each fish group. Dots represent the median, bars represent the 95% confidence interval.", fig.height=4, fig.width=10, message=FALSE, warning=FALSE} setwd("../..") pars <- read_config() diff --git a/docs_book/04-profiles.Rmd b/docs_book/04-profiles.Rmd index dd742eb..dc73ccd 100644 --- a/docs_book/04-profiles.Rmd +++ b/docs_book/04-profiles.Rmd @@ -14,7 +14,7 @@ Our methodology began with the preparation of the dataset, which was crucial for The scatter plot from the k-means clustering (Figure 5.1) showed the distribution of nutrient profiles across different clusters. The first two principal components explained a significant portion of the variance, indicating distinct groupings in nutrient profiles among the fishing trips. The clear separation of clusters in this plot suggests that the fishing trips could be effectively categorized based on their nutrient content. The bar chart (Figure 5.2) displaying nutrient adequacy across clusters indicated the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for various nutrients. The segmentation of bars into different nutrients (calcium, iron, omega-3, protein, vitamin A, zinc) across clusters showed variation in nutritional fulfillment. This suggests that different fishing strategies, represented by different clusters, result in catches with varying nutritional values. -```{r echo=FALSE, fig.cap="Cluster analysis of nutrient profiles using k-means clustering. 
The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, fig.width=8, message=FALSE, warning=FALSE, providing a visual guide to the cluster density and separation.", fig.height=5} +```{r echo=FALSE, fig.cap="Cluster analysis of nutrient profiles using k-means clustering. The scatter plot visualizes the distribution of data points in a two-dimensional space defined by the first two principal components which explain 39% and 26% of the variance. The convex hulls represent the boundaries of each cluster, providing a visual guide to the cluster density and separation.", fig.height=5, fig.width=8, message=FALSE, warning=FALSE} library(ggplot2) df <- @@ -56,7 +56,7 @@ factoextra::fviz_cluster(k2, theme(legend.position = "bottom") ``` -```{r echo=FALSE, fig.cap="Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), fig.width=6, message=FALSE, warning=FALSE, iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters.", fig.height=5} +```{r echo=FALSE, fig.cap="Distribution of nutrient adequacy across k-means clusters. The bar chart represents the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient within different clusters. 
Each bar is segmented into six categories corresponding to the nutrients analyzed: calcium (dark purple), iron (blue), omega-3 (green), protein (teal), vitamin A (dark teal), and zinc (yellow). Clusters are labeled on the y-axis, indicating distinct groupings based on nutrient profile similarities derived from the cluster analysis. The x-axis quantifies the number of individuals who meet the RNI, highlighting the variation in nutritional fulfillment across clusters.", fig.height=5, fig.width=6, message=FALSE, warning=FALSE} clusterdf <- dplyr::tibble( clusters = as.character(k2$cluster), @@ -87,7 +87,7 @@ clusterdf %>% ### XGBoost model -The model's predictive capacity was quantitatively assessed via receiver operating characteristic (ROC) analysis across five distinct clusters. The ROC (see [ML model interpretation][notes]) curves illustrate a differential capacity of the model to classify each cluster based on the nutritional profiles derived from various fishing strategies. Cluster 2 and 5 demonstrated superior model performance, indicated by a curve proximate to the top-left, suggesting high sensitivity and specificity. Clusters 1 and 4 showed marginally lower but comparable discrimination ability. Cluster 3 indicated a slight decrease in sensitivity and exhibited the model's lowest performance, with a curve markedly farther from the ideal top-left position. Collectively, an aggregate AUC of 0.86 signifies a strong overall ability of the model to differentiate between the clusters, albeit with varying degrees of precision. These findings underscore the model's effectiveness in predicting nutritional outcomes based on fishing strategies, with implications for tailoring nutrient-sensitive fisheries management interventions. +The model's predictive capacity was quantitatively assessed via receiver operating characteristic (ROC) analysis across five distinct clusters. 
The ROC curves (see [ML model interpretation][notes]) illustrate a differential capacity of the model to classify each cluster based on the nutritional profiles derived from various fishing strategies. Cluster 2 and 5 demonstrated superior model performance, indicated by a curve proximate to the top-left, suggesting high sensitivity and specificity. Clusters 1 and 4 showed marginally lower but comparable discrimination ability. Cluster 3 indicated a slight decrease in sensitivity and exhibited the model's lowest performance, with a curve markedly farther from the ideal top-left position. Collectively, an aggregate AUC of 0.86 signifies a strong overall ability of the model to differentiate between the clusters, albeit with varying degrees of precision. These findings underscore the model's effectiveness in predicting nutritional outcomes based on fishing strategies, with implications for tailoring nutrient-sensitive fisheries management interventions. ```{r model-settings, echo=FALSE, fig.cap="Receiver Operating Characteristic (ROC) Curves with Data Points for Cluster-Based Classification. The curves delineate the sensitivity versus 1-specificity for the five clusters derived from the XGBoost classification model. Each cluster is represented by a distinct color with data points marked, which illustrates the true positive rate against the false positive rate for each respective cluster. The closeness of each curve to the top-left corner indicates the model’s classification efficacy per cluster, with Cluster 1 and 2 showing the highest performance. The overall model demonstrates substantial predictive accuracy with a composite AUC value of 0.86.", fig.height=5, fig.width=6, message=FALSE, warning=FALSE} df_field <- @@ -262,6 +262,8 @@ p1 <- sv_dependence(sha, "habitat") - Are we considering all the possible potential good predictors? 
+- These colors are sometimes confusing; consider changing the color palette + ## Next steps Explore the model: diff --git a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png index 07f8a9d..157e00d 100644 Binary files a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png and b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/nutdispersion-1.png differ diff --git a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png index 9b180c1..18a7932 100644 Binary files a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png and b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png index dd69ee5..648427c 100644 Binary files a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png and b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-5-1.png differ