diff --git a/.DS_Store b/.DS_Store index dacedf3..1e50435 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/data/model_outputs.rda b/data/model_outputs.rda index d98a7a0..673bc58 100644 Binary files a/data/model_outputs.rda and b/data/model_outputs.rda differ diff --git a/data/palettes.rda b/data/palettes.rda index ca5e5f1..504b0f0 100644 Binary files a/data/palettes.rda and b/data/palettes.rda differ diff --git a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png index 917ace7..445b750 100644 Binary files a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png and b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png differ diff --git a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png index 02b85f0..c3696e8 100644 Binary files a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png and b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png differ diff --git a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png index 03bc91b..a534338 100644 Binary files a/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png and b/docs/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/docs/highlight.html b/docs/highlight.html index 5783e0a..cf61a69 100644 --- a/docs/highlight.html +++ b/docs/highlight.html @@ -188,8 +188,8 @@

3.1 Timor-Est SSF nutritional sce -
- +
+ diff --git a/docs/index.html b/docs/index.html index 4dea337..903364f 100644 --- a/docs/index.html +++ b/docs/index.html @@ -143,7 +143,7 @@

1 Content

diff --git a/docs/profiles.html b/docs/profiles.html index 3f3588c..a00ab63 100644 --- a/docs/profiles.html +++ b/docs/profiles.html @@ -173,8 +173,7 @@

5.2.1 Clusters Distribution of nutrient adequacy across k-means clusters. The bar chart delineates the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch within identified k-means clusters. Each bar is categorized into six segments corresponding to the evaluated nutrients. The clusters are enumerated on the y-axis, each representing a group with a distinct nutritional profile as determined by the cluster analysis. The x-axis and the white labels in the bars quantify the count of individuals within each cluster that meet the RNI for the respective nutrients, underlining the variability in nutritional adequacy across clusters. Panels (A) through (D) compare these distributions across different fishing practices and locations, namely Atauro and the Mainland, using all gear types or exclusively gill nets.

@@ -190,8 +189,8 @@

5.2.1 Clusters

- +
+

Table 5.1: Results of PERMANOVA analysis assessing the homogeneity of nutritional profiles within fishing trip clusters. The analysis was conducted across four datasets: Atauro with all gears (atauro_AG), Atauro with gill nets (atauro_GN), Mainland with all gears (mainland_AG), and Mainland with gill nets (mainland_GN). In the sum of squares column (SUMOFSQS), the ‘clusters’ row measures the variation between nutritional profiles (the variation explained by the clustering), while the ‘Residual’ row measures the variation within nutritional profiles. Degrees of freedom (DF), R-squared values (R2), and associated statistics indicate the strength and significance of the clustering. The R2 value quantifies the proportion of variance explained by the clusters.

@@ -206,8 +205,8 @@

5.2.2 XGBoost model performance

-
- +
+

Table 5.2: Performance Metrics for XGBoost Model Across Fishing Data Subsets. This table provides a comprehensive overview of the predictive performance of an XGBoost classification model for four distinct subsets of fishing data: Atauro with all gears (ATAURO AG), Atauro with gill nets (ATAURO GN), Mainland with all gears (MAINLAND AG), and Mainland with gill nets (MAINLAND GN). Key performance indicators include ROC-AUC (area under the receiver operating characteristic curve), accuracy, Kappa (kap), sensitivity (sens), specificity (spec), positive predictive value (ppv), negative predictive value (npv), Matthews correlation coefficient (mcc), Youden’s J index (j_index), balanced accuracy (bal_accuracy), detection prevalence, precision, recall, and F measure (f_meas). The metrics collectively reflect the model’s ability to discriminate between nutritional profiles, its overall accuracy, and the balance between sensitivity and specificity for each subset.


diff --git a/docs/search_index.json b/docs/search_index.json index 22d65b7..3de7a39 100644 --- a/docs/search_index.json +++ b/docs/search_index.json @@ -1 +1 @@ -[["index.html", "Modelling scenarios for nutrient-sensitive fisheries management 1 Content", " Modelling scenarios for nutrient-sensitive fisheries management Lorenzo Longobardi Last update: 2024-05-23 1 Content This book contains analyses and reports of the paper ‘Modelling scenarios for nutrient-sensitive fisheries management’. All data and code used to generate the analyses are organised in https://github.com/WorldFishCenter/timor.nutrients. "],["data.html", "2 Data 2.1 Catch weight and nutritional content 2.2 Checks and limitations", " 2 Data The research presented in this book relies on two primary sources of data: Recorded Catch (RC): This dataset comprises detailed records of fishing trips that were documented by data collectors in the coastal municipalities of East Timor starting from January 2018. Estimated Catch (EC): This dataset provides a broader view of catch data on a regional level. It is created by combining RC with additional information, including the frequency of fishing trips made by each fishing boat and the total number of boats surveyed (censused) in each municipality. This combination extrapolates the recorded catch data to a larger scale. 2.1 Catch weight and nutritional content The total estimated catch weight is determined by the number of individuals and the length range of each catch. Specifically, during the initial phase of the Peskas project (July 2017 - April 2019), the standard length measurement used was the fork length (FL), which was later replaced by the total length (TL) in the subsequent, current version of the project. 
We used the API service of the FishBase database to retrieve length-to-length and length-to-weight conversion tables, and combined them with length data from surveyed landings to calculate weight in grams using the formula W = a × L^b, where W is the weight in grams, L is the total length (TL) in centimeters, and a and b are the conversion parameters obtained from FishBase for each fish species. The FishBase database provides length-to-length and length-to-weight relationships for over 5,000 fish species, and there are typically multiple records of the parameters a and b for each species. Because length measurements in the first version of Peskas referred to FL, we first standardized all length measurements to TL using the FishBase length-to-length conversion tables, and then applied the TL-to-weight conversion tables to estimate weights. The FishBase length-to-weight conversion tables offer species-level taxonomic resolution. To derive a single length-to-weight relationship for each fish group, we calculated the median values of parameters a and b across all species within that group. To ensure relevance to the region of interest, we restricted the species list using the FAO country codes (https://www.fao.org/countryprofiles/iso3list/en/) for Timor-Leste and Indonesia (country codes 626 and 360, respectively). For instance, to estimate the weight of a catch assigned to the fish group ECN (the Echeneidae family), we first identified the ECN species documented in Timor-Leste and Indonesia, in this case Echeneis naucrates and Remora remora (as illustrated in the figure below), and then computed the average values of parameters a and b for these species. Measured nutrient values for fish are typically limited to a few species and countries. To overcome this data limitation, MacNeil et al.
developed a Bayesian hierarchical model that uses both phylogenetic and trait-based information to predict concentrations of seven essential nutrients (calcium, iron, omega-3 fatty acids, protein, selenium, vitamin A, and zinc) for marine and inland fish species globally (see Hicks et al. 2019). For each catch, the nutritional yield was calculated by combining the validated weight estimates for each fish group with the modelled nutrient concentrations. Specifically, we used the highest posterior predictive density values for each of the seven nutrients, which are available in the repository (https://github.com/mamacneil/NutrientFishbase). For non-fish groups, including octopuses, squids, cockles, shrimps, crabs, and lobsters, nutritional yield information was not available in the NutrientFishbase models. We retrieved the necessary data for these groups from the Global food composition database, using the same methodological approach as for the fish groups to estimate their nutritional content. To represent the nutrient concentration associated with each fish group, we used the median value as a summarizing metric. Figure 2.1: Distribution of nutrients’ concentration for each fish group. Dots represent the median, bars represent the 95% confidence interval. 2.2 Checks and limitations Check groups with higher dispersion… Do we need to narrow species grouping? "],["highlight.html", "3 Highlight statistics 3.1 Timor-Est SSF nutritional scenario", " 3 Highlight statistics 3.1 Timor-Est SSF nutritional scenario The table uses the EC dataset and summarizes the main statistics on nutrient supply for each region relative to WRA, the number of women of reproductive age (15-49 years old). Below is a description of each table column: MUNICIPALITY (POPULATION): Municipality and WRA number in 2022. NUTRIENT: Nutrient of reference. ANNUAL SUPPLY: Aggregated annual value in kg.
These values represent municipal-level estimates based on the number of fishing boats recorded in the 2021 Timor-Leste boat census, the average number of fishing trips per boat, and average landing weight values for each fish group. N. PEOPLE SUPPLIED DAILY: The number of people whose RNI for the nutrient could be met daily in each municipality. The RNI values used (grams per day) are: selenium 0.000026, zinc 0.0049, protein 46, total omega-3 PUFA 2.939, calcium 1, iron 0.0294, vitamin A 0.0005. A 20% share of each RNI was taken as the reference, since an ‘adequate diet’ is expected to comprise five food groups. RNIs were then converted from grams to kg (dividing by 1000), and the number of people supplied daily was calculated as: \\(\\frac{Annual\\ supply\\ (kg)}{(RNI\\times 0.20) \\ / 1000} /365\\) POPULATION MEETING RNI REQUIREMENTS: Percentage of the WRA population meeting the RNI requirements in each municipality: \\(\\frac{Number\\ of\\ people\\ supplied\\ daily}{Municipality\\ population} \\times 100\\) "],["distribution.html", "4 Nutrients distribution 4.1 Fish groups 4.2 Habitat and gear type 4.3 Nutritional contribution and economic profiling", " 4 Nutrients distribution This section presents the analyses that illustrate the distribution of nutrients within various components of small-scale fisheries in East Timor. 4.1 Fish groups Figure 4.1: The bar chart illustrates the contribution of a variety of marine food sources to the Recommended Nutrient Intake (RNI) for six fundamental nutrients, based on a 100g portion. Each bar is a color-segmented stacked visual, with distinct hues corresponding to individual nutrients, and white numbers within indicating the specific percentage contribution of each nutrient. The chart incorporates the mean annual catch in metric tons for each marine species from 2018 to 2023, presented at the end of each bar, providing a view of both the nutritional value and the harvest volume of these essential food sources.
The transparency of these values is adjusted to reflect each species’ relative contribution to the mean annual catch. 4.2 Habitat and gear type Figure 4.2: Sankey diagram showing the relative distribution of key nutrients across various marine habitats and the corresponding extraction by different fishing gear types used in Timor-Est small-scale fisheries. 4.3 Nutritional contribution and economic profiling Figure 4.3: Nutritional and economic profiling of key fish groups within the Timor-Leste fishery. Panel A, Distribution of nutritional content among different functional fish groups: Small pelagics, Large pelagics, Small demersals, Large demersals, Sharks and rays, and Other groups, which includes shrimps, molluscs, cephalopods and crustaceans. The plot shows the ranked contribution of each functional fish group to the supply of calcium, omega-3, iron, protein, vitamin A, and zinc during the period 2018-2023. Panel B, Comparative analysis of nutritional score versus economic accessibility for key fish groups. This scatter plot displays the relationship between the cumulative nutritional score and the market price for various fish groups within the Timor-Leste fishery. The x-axis quantifies the cumulative contribution to the Recommended Nutrient Intake (RNI) for six essential nutrients (zinc, protein, omega-3, calcium, iron, vitamin A) from a 100g portion of each fish group. The y-axis represents the average market price per kilogram for each group. Dot size and the accompanying numerical labels reflect the relative catch percentage of each group, serving as an index of accessibility and availability. Panel C, The bar chart illustrates the contribution of each habitat to the Recommended Nutrient Intake (RNI) for six fundamental nutrients, based on a 100g portion. Each bar is a color-segmented stacked visual, with distinct hues corresponding to individual nutrients, and white numbers within indicating the specific percentage contribution of each nutrient.
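As a worked illustration of the N. PEOPLE SUPPLIED DAILY and POPULATION MEETING RNI REQUIREMENTS formulas in the Highlight statistics section, the minimal Python sketch below applies the listed RNI values (grams per day) with the 20% share. It is not taken from the project's R code; the function names and the annual supply and population figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Daily RNI values in grams, as listed in the Highlight statistics section.
RNI_GRAMS_PER_DAY = {
    "selenium": 0.000026,
    "zinc": 0.0049,
    "protein": 46,
    "omega3": 2.939,  # total omega-3 PUFA
    "calcium": 1,
    "iron": 0.0294,
    "vitamin_a": 0.0005,
}

# Fish is taken to cover 20% of the RNI (one of five food groups).
DIET_SHARE = 0.20


def people_supplied_daily(annual_supply_kg: float, nutrient: str) -> float:
    """People whose 20%-of-RNI target is met every day by the annual supply."""
    daily_target_kg = RNI_GRAMS_PER_DAY[nutrient] * DIET_SHARE / 1000  # g -> kg
    return annual_supply_kg / daily_target_kg / 365


def percent_population_met(people_supplied: float, population: float) -> float:
    """Share of the WRA population covered, in percent."""
    return people_supplied / population * 100


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    supplied = people_supplied_daily(7300, "calcium")
    print(round(supplied))                                   # -> 100000
    print(round(percent_population_met(supplied, 200_000)))  # -> 50
```

For calcium the daily target is 1 × 0.20 / 1000 = 0.0002 kg per person, so a hypothetical 7,300 kg annual supply corresponds to 7300 / 0.0002 / 365 = 100,000 people, or 50% of a hypothetical WRA population of 200,000.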
"],["profiles.html", "5 Timor SSF nutritional profiles 5.1 Methods 5.2 Results 5.3 Preliminary considerations", " 5 Timor SSF nutritional profiles 5.1 Methods In this section, we identified recurrent nutritional profiles based on RC data and then predicted and explained these profiles on the basis of fishing strategy and environmental factors. 5.1.1 Data analysis design and subset division As a first step, we addressed the inherent imbalance in the RC data, a critical aspect for ensuring accurate and unbiased analysis. Notably, more than 40% of the data come from Atauro, and gill net is the most frequently reported gear type across all municipalities. To mitigate the skew caused by this overrepresentation, we divided the dataset into four distinct subsets: Atauro GN: Focused on data from Atauro using gill nets. Atauro AG: Included data from Atauro using fishing methods other than gill nets. Mainland GN: Comprised gill net data from all municipalities excluding Atauro. Mainland AG: Encompassed data from all municipalities excluding Atauro using non-gill net fishing methods. This subdivision of the dataset was intended to reduce biases and enhance analytical precision. Furthermore, by isolating gill net data, we were able to specifically examine the impact of mesh size on the prediction of nutritional profiles in gill net catches, providing a more focused analysis of this gear type’s influence on nutritional outcomes. 5.1.2 Clustering and Classification After data partition, we identified recurrent nutritional profiles for each dataset. We assessed the total within-cluster sum of squares (WSS) of six nutrient concentrations (all except selenium) to identify the optimal number of clusters (distinctive nutritional profiles).
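The WSS criterion used to choose the number of clusters can be illustrated with a small, self-contained sketch. The Python below runs a plain Lloyd-style k-means with a deterministic farthest-point initialisation (an assumption made for reproducibility; R's `kmeans()` uses random starting centres and the Hartigan-Wong algorithm by default) and reports the WSS for each candidate k. The function names are illustrative, and each input row would hold one trip's nutrient concentrations.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> Point:
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(data: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids: List[Point] = [tuple(data[0])]
    while len(centroids) < k:
        # Next centre: the point farthest from its nearest existing centre.
        far = max(data, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = _mean(members)
    return labels, centroids


def wss(data: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """Total within-cluster sum of squares."""
    return sum(_sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))


def elbow(data: List[Point], k_values: Iterable[int]) -> Dict[int, float]:
    """WSS for each candidate number of clusters."""
    curve = {}
    for k in k_values:
        labels, centroids = kmeans(data, k)
        curve[k] = wss(data, labels, centroids)
    return curve
```

Plotting `elbow(rows, range(1, 9))` against k gives the usual elbow curve. Because the nutrients are measured on very different scales, such data are often standardised before clustering; the text does not state whether that was done here, so treat it as an open choice.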
Once the optimal number of clusters had been established for each dataset, we applied K-means clustering to group the fishing trips by similarity in nutrient concentrations, which allowed us to discern patterns and categorize trips according to their nutritional profile. The K-means algorithm assigns each data point to the cluster with the nearest centroid (the mean of the points in that cluster) and then recomputes each centroid from its assigned points. This iterative process continues until the assignment of points to clusters no longer changes, at which point the algorithm has converged to a (possibly local) minimum of the within-cluster sum of squares. The result is a set of clusters that represent distinct nutritional profiles, each characterized by a specific combination of nutrient concentrations. After clustering, we conducted a Permutational Multivariate Analysis of Variance (PERMANOVA) to validate the clustering in each of the four datasets: Atauro AG, Atauro GN, Mainland AG, and Mainland GN. PERMANOVA is a non-parametric test that evaluates whether groups differ significantly in multivariate space. Unlike traditional ANOVA, PERMANOVA assesses significance by permutation rather than relying on assumptions of normality, and is therefore suitable for ecological data, which often do not follow normal distributions. For each subset, the test was run on a distance matrix representing pairwise dissimilarities in nutrient concentrations across all fishing trips. This approach allowed us to test the hypothesis that fishing trips within the same cluster are more similar in nutritional profile to each other than to trips in different clusters. Finally, we fitted an XGBoost model to each data subset to predict the nutritional profiles from fishing strategy, habitat and season. We chose XGBoost because its regularization helps limit overfitting and because it can highlight key predictors.
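The PERMANOVA test described above can be sketched for a single grouping factor as follows. This Python version computes the pseudo-F statistic from a precomputed distance matrix using the standard PERMANOVA sum-of-squares decomposition, the R2 value reported in Table 5.1 (between-cluster over total sum of squares), and a permutation p-value. In R the same test is commonly run with `vegan::adonis2`. This sketch is not the project's code, and the function names, permutation count and seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def _ss(dist: Sequence[Sequence[float]], idx: List[int]) -> float:
    """Sum of squared pairwise distances within idx, divided by its size."""
    n = len(idx)
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += dist[idx[a]][idx[b]] ** 2
    return total / n


def pseudo_f(dist: Sequence[Sequence[float]], groups: Sequence[str]) -> Tuple[float, float]:
    """Return (pseudo-F, R2) for one grouping factor."""
    n = len(groups)
    levels = sorted(set(groups))
    ss_total = _ss(dist, list(range(n)))
    ss_within = sum(
        _ss(dist, [i for i, g in enumerate(groups) if g == lev]) for lev in levels
    )
    ss_between = ss_total - ss_within  # variation explained by the grouping
    a = len(levels)
    f = (ss_between / (a - 1)) / (ss_within / (n - a))
    return f, ss_between / ss_total


def permanova(dist, groups, n_perm: int = 999, seed: int = 1) -> Tuple[float, float, float]:
    """Observed pseudo-F, R2 and permutation p-value."""
    f_obs, r2 = pseudo_f(dist, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between trips and clusters
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    # Count the observed labelling as one of the permutations.
    return f_obs, r2, (hits + 1) / (n_perm + 1)
```

With this p-value definition the smallest attainable value is 1 / (n_perm + 1), so a p-value strictly below 0.001 requires more than 999 permutations.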
We used mesh size, habitat, quarter of the year, and vessel type as predictors for the gill net subsets. For the other gear types, the models used the habitat x gear interaction, habitat, gear type, quarter of the year, and vessel type as predictors. Model tuning adjusted several hyperparameters, including the number of trees, tree depth, loss reduction, sample size, and early stopping. Each of the four data subsets was split into training (80%) and testing (20%) sets, with 10-fold cross-validation applied to the training set to obtain more reliable estimates of accuracy and generalizability. Model performance was assessed using accuracy, ROC AUC, sensitivity, and specificity, which together describe how well the models distinguish between nutritional profiles; the ROC curves and AUC values provided an additional evaluation of model effectiveness. We used SHapley Additive exPlanations (SHAP) values to quantify the influence of individual predictors on the nutritional profiles predicted by the XGBoost models. SHAP values, rooted in cooperative game theory, decompose a model’s prediction into contributions from each feature, revealing both the importance of each feature and the direction of its effect on the prediction. For the gill net subsets (Atauro GN and Mainland GN), we focused on the effect of mesh size. For the other subsets (Atauro AG and Mainland AG), which included different fishing methods, we focused on how habitat and gear type interacted to influence the predicted nutritional profiles. 5.2 Results 5.2.1 Clusters The WSS analysis indicated that either 4 or 5 clusters were optimal for each subset of our data.
We used 5 clusters for all subsets to keep the analyses uniform and to better represent the varied patterns in nutritional profiles. The bar chart of nutrient adequacy across nutritional profiles (Figure 5.1) shows the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient. The profiles are the result of k-means clustering and reflect distinct groupings based on the type and quantity of nutrients present in the catch. These clusters elucidate the variation in nutrient content obtained through different fishing gear types and locations. For Atauro, considering all gear types (Panel A), nutrient adequacy varies across clusters. Notably, clusters 5 and 4 stand out for their high vitamin A content. Conversely, calcium is more consistently distributed across all clusters, reflecting a degree of nutritional stability in this element. Protein content appears more uniformly spread, albeit with a slight elevation in cluster 1. Cluster 3 is remarkable for its zinc content, which is markedly higher than in other clusters, while iron content is highest in cluster 1, clearly distinguishing it from the others. When the focus narrows to gill net gear in Atauro (Panel B), there is a distinct distribution pattern, with calcium notably more abundant in cluster 3. Additionally, clusters 2 and 4 are characterized by a higher concentration of vitamin A, suggesting that gill net gear may selectively capture species with higher amounts of these nutrients. The mainland dataset using all gear types (Panel C) also reveals a distinct distribution of nutrients. Clusters 1 and 2 have higher levels of calcium, with cluster 2 showing a particularly high value that surpasses other clusters, while cluster 1 is particularly rich in omega-3.
Vitamin A shows a pronounced peak in cluster 2, indicating a distinct subset of catch composition in terms of this nutrient. Finally, focusing on the mainland using only gill net gear (Panel D), the data suggest a more even distribution of omega-3 across the clusters, with clusters 4 and 2 showing marginally higher values. Calcium has a higher occurrence in clusters 2 and 4, while zinc is considerably more prevalent in cluster 5. Iron, although present in all clusters, is most concentrated in cluster 4. Figure 5.1: Distribution of nutrient adequacy across k-means clusters. The bar chart delineates the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch within identified k-means clusters. Each bar is categorized into six segments corresponding to the evaluated nutrients. The clusters are enumerated on the y-axis, each representing a group with a distinct nutritional profile as determined by the cluster analysis. The x-axis and the white labels in the bars quantify the count of individuals within each cluster that meet the RNI for the respective nutrients, underlining the variability in nutritional adequacy across clusters. Panels (A) through (D) compare these distributions across different fishing practices and locations, namely Atauro and the Mainland, using all gear types or exclusively gill nets. The scatter plot from the k-means clustering (Figure 5.2) showed the distribution of nutritional profiles across the clusters in each data subset. The first two principal components explained a substantial portion of the variance, indicating distinct groupings in nutritional profiles among the fishing trips. Figure 5.2: Nutritional profile clustering of fishing trips by region and gear type.
Each plot presents a k-means clustering analysis of fishing trip observations, grouped by their nutritional contributions to the Recommended Nutrient Intake (RNI) for six nutrients. The four panels, labeled (A) through (D), display data subsets for Atauro and the Mainland, using all gear types and gill nets specifically. The scatter plots within each panel are charted in a two-dimensional space defined by the first two principal components, with the axes denoting the percentage of explained variance. Points are color-coded by the nutritional profile cluster derived from the k-means algorithm. Convex hulls around each cluster show its extent, helping to visualize the density, separation, and delineation of nutritional profile groupings across fishing methods and geographic areas. The PERMANOVA analyses (Table 5.1) revealed statistically significant differences between clusters, suggesting robust groupings based on the nutritional profiles. The pseudo-F statistics were very high in all cases, indicating strong differentiation between clusters. Specifically, the R² values were 0.87, 0.88, 0.84, and 0.81 for Atauro AG, Atauro GN, Mainland AG, and Mainland GN respectively, indicating that between 81% and 88% of the variance in nutrient concentrations was explained by the clusters. The high R² values underscore the distinctness of the clusters, reinforcing the validity of the K-means clustering. These findings were consistent across all the datasets, with p-values below 0.001, providing clear evidence to reject the null hypothesis of no difference between clusters. Hence, the PERMANOVA results support the effectiveness of the K-means algorithm in capturing meaningful patterns in nutritional profiles. Table 5.1: Results of PERMANOVA analysis assessing the homogeneity of nutritional profiles within fishing trip clusters.
The analysis was conducted across four datasets: Atauro with all gears (atauro_AG), Atauro with gill nets (atauro_GN), Mainland with all gears (mainland_AG), and Mainland with gill nets (mainland_GN). In the sum of squares column (SUMOFSQS), the ‘clusters’ row measures the variation between nutritional profiles (the variation explained by the clustering), while the ‘Residual’ row measures the variation within nutritional profiles. Degrees of freedom (DF), R-squared values (R2), and associated statistics indicate the strength and significance of the clustering. The R2 value quantifies the proportion of variance explained by the clusters. 5.2.2 XGBoost model performance The predictive performance of the XGBoost models was assessed both quantitatively and visually, as detailed in Table 5.2 and Figure 5.3, respectively. The Receiver Operating Characteristic (ROC) curves (see ML model interpretation) presented in Figure 5.3 offer a graphical evaluation of the model’s sensitivity and specificity across four subsets of fishing data, categorized by region and gear type. These curves plot the true positive rate against the false positive rate for each nutritional profile group identified within the data. The ROC curves reveal variability in the model’s ability to distinguish between nutritional profile groups. The areas under the curves (AUC) provide a numerical measure of the model’s discriminative power, with a value of 1 representing perfect prediction and 0.5 indicating no discriminative power. While none of the profile groups reaches a perfect score, several show high AUC values, indicating a robust ability to classify observations accurately. Comparing these visual findings with the statistics in Table 5.2, the subsets from Atauro (both with all gears and with gill nets) yield higher AUC, accuracy, and kappa statistics, suggesting a more consistent and accurate classification of nutritional profiles.
These subsets also show higher sensitivity and specificity, indicating a balanced ability to identify true positives and true negatives. Conversely, the Mainland subsets exhibit lower performance metrics, indicating a more challenging classification scenario. This is reflected in the ROC curves, where the lines for the Mainland subsets lie farther from the top-left corner, indicating a lower true positive rate relative to the false positive rate compared to the Atauro subsets. The positive predictive value (PPV) and negative predictive value (NPV), which provide insight into the model’s precision and reliability, also align with the ROC curve analysis, showing higher values for the Atauro subsets. This indicates that when the model predicts a particular nutritional profile for these subsets, it is more likely to be correct. The Matthews correlation coefficient (MCC) values, a balanced measure of classification quality, corroborate the ROC analysis by indicating that the Atauro subsets maintain a higher quality of prediction across classes. Taken together, Table 5.2 and Figure 5.3 show that the XGBoost model performs differently across the subsets of fishing data. The model shows strong predictive performance in the Atauro subsets, with high AUC, accuracy, and kappa metrics indicating a reliable classification of nutritional profiles. The ROC curve analysis further supports this, with curves for the Atauro subsets nearer to the desired top-left corner, denoting higher sensitivity and specificity. In contrast, the Mainland subsets, despite achieving moderate success, leave room for improvement, as seen in their greater distance from the optimal point on the ROC curves and their lower performance metrics. This suggests that while the model is effective in identifying nutritional profiles in certain contexts, its performance is not uniformly high across all subsets.
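The per-cluster ROC AUC values discussed above can be computed without plotting, using the fact that AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The Python sketch below applies this one-vs-rest, treating each cluster's predicted probability as the score, as in the per-cluster curves of Figure 5.3. It is an illustrative assumption rather than the project's code, and the function names are hypothetical. A single multiclass ROC-AUC, such as the one in Table 5.2, may be computed with a different averaging scheme, so values from this sketch need not reproduce it.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs: List[Dict[str, float]], truth: Sequence[str]) -> Dict[str, float]:
    """Per-cluster AUC: each cluster's predicted probability vs. all other clusters."""
    classes = sorted(set(truth))
    return {
        c: binary_auc([row[c] for row in probs], [t == c for t in truth])
        for c in classes
    }
```

The pairwise loop is quadratic in the number of trips, which is fine for a sketch; a rank-based (Mann-Whitney) formulation gives the same value more efficiently on large test sets.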
Figure 5.3: Receiver Operating Characteristic (ROC) curves for evaluating the performance of a cluster-based XGBoost classification model across four distinct fishing datasets: Atauro with all gears (a), Atauro with gill nets (b), Mainland with all gears (c), and Mainland with gill nets (d). Each curve represents one of the five clusters obtained from k-means clustering, with different colors marking each cluster. Data points on the curves indicate the trade-off between sensitivity (true positive rate) and 1-specificity (false positive rate) for each cluster. The proximity of the curves to the top-left corner reflects how well the model classifies the nutritional profiles into the correct clusters. Table 5.2: Performance Metrics for XGBoost Model Across Fishing Data Subsets. This table provides a comprehensive overview of the predictive performance of an XGBoost classification model for four distinct subsets of fishing data: Atauro with all gears (ATAURO AG), Atauro with gill nets (ATAURO GN), Mainland with all gears (MAINLAND AG), and Mainland with gill nets (MAINLAND GN). Key performance indicators include ROC-AUC (area under the receiver operating characteristic curve), accuracy, Kappa (kap), sensitivity (sens), specificity (spec), positive predictive value (ppv), negative predictive value (npv), Matthews correlation coefficient (mcc), Youden’s J index (j_index), balanced accuracy (bal_accuracy), detection prevalence, precision, recall, and F measure (f_meas). The metrics collectively reflect the model’s ability to discriminate between nutritional profiles, its overall accuracy, and the balance between sensitivity and specificity for each subset. 5.2.2.1 Models explanation The analysis of SHAP values (see ML model explanation) from the gill net models (Figure 5.4, A-B) shows how mesh size and habitat jointly influence the predicted nutritional profiles in Atauro (panel A) and the Mainland (panel B).
In Atauro, mesh sizes smaller than 40 mm are strongly associated with a higher likelihood of predicting nutritional profile NP-2 across diverse habitats, including reefs, deep habitats, and, to a lesser extent, Fish Aggregating Devices (FADs). These smaller mesh sizes also show a reduced association with nutritional profile NP-3, particularly in mangrove environments, and with NP-1 in beach and deep habitat settings. Conversely, mesh sizes around 50 mm are predominantly linked with nutritional profile NP-3, mainly within reef and FAD environments. At larger mesh sizes, 60 and 80 mm meshes show strong associations with nutritional profiles NP-4 and NP-5, respectively, especially when fishing occurs across reefs and seagrass areas, with a minor association observed in beach environments for the 80 mm mesh size. These mesh sizes also show a modest connection to NP-1, notably when fishing takes place in deep areas. For mesh sizes exceeding 80 mm, the data indicate a notable shift, with nutritional profile NP-1 becoming the most prevalent prediction, particularly in reef and mangrove habitats. The SHAP values derived from the Mainland data show a more diverse pattern of associations. Mesh sizes smaller than 30 mm, used in beach, reef, and mangrove settings, are linked with nutritional profile NP-2, with a similar association observed in deep habitats. Meshes ranging from 30 to 40 mm are strong indicators of nutritional profile NP-1 across a broad range of environments, especially in deep and reef areas. Increasing the mesh size to between 40 and 70 mm shifts the likelihood towards nutritional profile NP-5 as the most probable outcome for various fishing grounds, including reefs, deep environments, mangroves, and beaches.
At the larger end of the spectrum, mesh sizes above 70 mm are more likely to predict nutritional profile NP-3, particularly in deep and FAD environments where SHAP values are notably high, with reefs, mangroves, and beaches also displaying relatively high values, and a noted increase in the likelihood of NP-3 outcomes in FAD grounds when using a 100 and 130 mm mesh sizes. The SHAP value analysis for all fishing gear types other than gill nets reveals the complex interplay between the habitat where fishing occurs, the type of gear used, and whether the boats are motorised or unmotorised (Figure 5.4, panels C and D). In the Atauro dataset, as shown in panel C, the nutritional profile NP-1 is commonly associated with the use of long lines in deep water habitats, particularly from unmotorised boats. Hand lines in the same deep water habitats, however, shift the prediction towards nutritional profile NP-2. For nutritional profile NP-4, seine nets emerge as the most likely gear to yield this outcome in deep environments, though the use of hand lines also contributes to a lesser degree. Profiles NP-3 and NP-5 display a similarity in that they are both frequently predicted when fishing with spear guns; the former is more associated with unmotorised boats and the latter with motorised vessels. These two profiles are also set apart by their wider spread across various habitats that are connected to coastal areas, in contrast to the other profiles which are predominantly linked with deeper waters. In Mainland, the application of cast nets in reef habitats shows a strong link to nutritional profile NP-1. In FAD settings, nutritional profile NP-2 emerges as the most common outcome regardless of the gear type employed, with the notable exception of long lines, which instead suggest a higher likelihood of resulting in profile NP-3. 
The profiles NP-4 and NP-5 are distinctively aligned with certain fishing practices: NP-4 is closely associated with the use of hand lines in deep habitats, and NP-5 is characteristic of manual collection and spearfishing in littoral zones such as reefs and beaches. Figure 5.4: Differential influence of mesh size and habitat x gear type interaction on the nutritional profile predictions in Atauro (A and C) and in Mainland (B and D). Panels A-B: These panels elucidate the impact of mesh size on the probability of observing various nutritional profiles in Atauro (Panel A) and Mainland (Panel B). Each panel includes five plots corresponding to distinct nutritional profiles (NP1-NP5), as forecasted by gill net XGBoost models. The plots exhibit distributions of SHAP values over a range of mesh sizes. Each data point is color-coded to represent different habitats (Beach, Deep, FAD, Mangrove, Reef and Seagrass), clarifying the mesh size’s influence on the accuracy of predictions within each habitat. The y-axis details mesh size ranges, while the x-axis measures SHAP values, where higher values signal a stronger likelihood of a particular nutritional profile’s presence. The size and opacity of each point are proportionate to the SHAP value’s magnitude, visually indicating the significance of each data point in influencing the model’s predictions. Panels C-D: In these panels, the interplay among habitat, gear type, and vessel type (motorized or unmotorized) is analyzed in relation to nutritional profiles in Atauro (Panel C) and Mainland (Panel D). Each plot showcases SHAP value distributions for the five nutritional profiles (NP1-NP5) predicted by XGBoost models applied to datasets encompassing all gear types, excluding gill nets. Data points are color-coded to differentiate between motorized and unmotorized vessels, shedding light on how vessel type, alongside habitat and gear interactions, modulates nutritional profile predictions. 
As in Panels A-B, higher SHAP values on the x-axis indicate a higher probability of a specific nutritional profile, and the size and opacity of the points correspond to the SHAP values, denoting their relative impact on the predicted outcome. 5.3 Preliminary considerations A profiling approach could help avoid overfishing and habitat depletion: instead of focusing on a single species, fishing effort can be spread across multiple fish groups when sourcing a particular nutrient. The results suggest that a given nutritional supply (for example, iron-rich foods) can be obtained from a diversified combination of gear types and habitats. They also suggest that gathering more information, particularly from less represented environments and fishing practices, could open new opportunities to improve the supply of foods targeting specific nutritional needs. "],["simple.html", "6 In simple terms 6.1 ML model interpretation 6.2 ML model explanation", " 6 In simple terms 6.1 ML model interpretation ROC Curve: The curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings. The true positive rate is on the y-axis, and the false positive rate is on the x-axis. Performance: A perfect classifier would have a point in the upper left corner of the graph, where the true positive rate is 1 (or 100%) and the false positive rate is 0. The closer the curve follows the left-hand border and then the top border of the ROC space, the better the classifier discriminates between classes. Diagonal Line: The dotted diagonal line represents a no-skill classifier (e.g., random guessing). A good classifier stays as far away from this line as possible (toward the upper left corner). Area Under the Curve (AUC): The area under each ROC curve (AUC) summarises the model’s ability to discriminate between classes across all thresholds. It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so an AUC of 0.5 suggests no discrimination (no better than random chance), while an AUC of 1.0 suggests perfect discrimination. 
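The probabilistic reading of the AUC can be checked directly. The sketch below is a minimal one-vs-rest AUC for a single nutritional profile, computed by comparing every positive case with every negative case (ties count as half). The scores are invented, and this is not the implementation used for the figures; library implementations compute the same quantity more efficiently.

```python
from itertools import product


def auc_one_vs_rest(labels, scores, positive):
    # labels: observed profile of each trip; scores: predicted probability of `positive`
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError('need at least one positive and one negative case')
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive case correctly ranked above the negative one
        elif p == n:
            wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Invented probabilities of NP-1 for six trips
labels = ['NP-1', 'NP-1', 'NP-2', 'NP-3', 'NP-1', 'NP-2']
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
print(auc_one_vs_rest(labels, scores, 'NP-1'))  # 8 of 9 pairs ranked correctly
```

For a multiclass model, one such value is computed per profile, which corresponds to one curve per cluster in Figure 5.3.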
6.2 ML model explanation SHAP values: SHAP values show how each predictor in the dataset contributed to each individual prediction. A positive SHAP value for a feature increases the predicted likelihood of a given outcome, while a negative SHAP value decreases it; the larger the absolute value, the stronger the effect. "],["afigures.html", "7 Other figures", " 7 Other figures Figure 7.1: To define Figure 7.2: To define "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "Modelling scenarios for nutrient-sensitive fisheries management 1 Content", " Modelling scenarios for nutrient-sensitive fisheries management Lorenzo Longobardi Last update: 2024-05-25 1 Content This book contains analyses and reports of the paper ‘Modelling scenarios for nutrient-sensitive fisheries management’. All data and code used to generate the analyses are organised in https://github.com/WorldFishCenter/timor.nutrients. "],["data.html", "2 Data 2.1 Catch weight and nutritional content 2.2 Checks and limitations", " 2 Data The research presented in this book relies on two primary sources of data: Recorded Catch (RC): This dataset comprises detailed records of fishing trips that were documented by data collectors in the coastal municipalities of East Timor starting from January 2018. Estimated Catch (EC): This dataset provides a broader view of catch data on a regional level. It is created by combining RC with additional information, including the frequency of fishing trips made by each fishing boat and the total number of boats surveyed (censused) in each municipality. This combination extrapolates the recorded catch data to a larger scale. 
2.1 Catch weight and nutritional content The total estimated catch weight is determined by the number of individuals and the length range of each catch. Specifically, during the initial phase of the Peskas project (July 2017 - April 2019), the standard length measurement used was the fork length (FL), which later changed to the total length (TL) in the subsequent and current version of the project. We used the API service offered by the FishBase database to incorporate length-to-length and length-to-weight conversion tables, and used information from survey landings to calculate the weight in grams with the following formula: W = a × L^b Here, W represents the weight in grams, L is the total length (TL) in centimeters, and a and b are the conversion parameters obtained from FishBase for each fish species. The FishBase database provides length-to-length and length-to-weight relationships for over 5,000 fish species. Typically, there are multiple records for the parameters a and b for each species. Since the length measurements in Peskas’ first version pertained to FL, we first standardized all length measurements to TL using the FishBase length-to-length conversion tables. We then applied the TL-to-weight conversion tables to estimate the weights. The FishBase length-to-weight conversion tables offer species-level taxonomic resolution. To derive a single length-to-weight relationship for each fish group, we calculated the median values of parameters a and b for all species within a particular fish group. To ensure relevance to the region of interest, we refined the species list using FAO country codes (https://www.fao.org/countryprofiles/iso3list/en/) pertinent to Timor-Leste and Indonesia (country codes 626 and 360, respectively). For instance, to estimate the weight of a catch categorized under the fish group labeled ECN (representing the Echeneidae family), we first identified the species within ECN documented in Timor-Leste and Indonesia. 
We then computed the median values of the parameters a and b for the identified species, which in this case were Echeneis naucrates and Remora remora (as illustrated in the figure below); with only two species, the median coincides with the mean. Measured nutrient values for fish are scarce and typically limited to a few species and countries. To overcome this data limitation, MacNeil et al. developed a Bayesian hierarchical model that leverages both phylogenetic and trait-based information to predict concentrations of seven essential nutrients (calcium, iron, omega-3 fatty acids, protein, selenium, vitamin A, and zinc) for both marine and inland fish species globally (see Hicks et al. 2019). For each catch, the nutritional yield was calculated by combining the validated weight estimates for each fish group with the modelled nutrient concentrations. Specifically, we used the highest posterior predictive density values for each of the seven nutrients, which can be found in the repository (https://github.com/mamacneil/NutrientFishbase). For non-fish groups (octopuses, squids, cockles, shrimps, crabs, and lobsters), nutritional yield information was not available in the NutrientFishbase repository models. We retrieved the necessary data for these groups from the Global food composition database, using the same methodological approach as for the fish groups to estimate their nutritional content. To represent the nutrient concentration associated with each fish group, we used the median value as a summarizing metric. Figure 2.1: Distribution of nutrients’ concentration for each fish group. Dots represent the median, bars represent the 95% confidence interval. 2.2 Checks and limitations Check groups with higher dispersion… Do we need to narrow species grouping? 
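The weight and nutritional-yield steps described in section 2.1 can be sketched as below. This is a minimal illustration under stated assumptions: the per-species a and b values, the linear FL-to-TL conversion and its coefficients, and the nutrient concentration (assumed to be expressed per 100 g) are hypothetical placeholders, not values from FishBase or NutrientFishbase, and all function names are illustrative.

```python
from statistics import median

# Hypothetical length-weight parameters per species (W = a * L**b, L = TL in cm, W in g)
ECN_SPECIES = {
    'Echeneis naucrates': (0.0030, 3.10),
    'Remora remora': (0.0120, 3.00),
}


def group_parameters(species_params):
    # One relationship per fish group: median a and median b across its species
    a_values = [a for a, _ in species_params.values()]
    b_values = [b for _, b in species_params.values()]
    return median(a_values), median(b_values)


def fork_to_total_length(fl_cm, intercept=0.0, slope=1.1):
    # Hypothetical linear length-length conversion: TL = intercept + slope * FL
    return intercept + slope * fl_cm


def weight_g(tl_cm, a, b):
    # Length-weight relationship W = a * L^b
    return a * tl_cm ** b


def nutrient_yield(catch_weight_g, concentration_per_100g):
    # Amount of a nutrient in a catch, assuming concentrations are given per 100 g
    return catch_weight_g / 100.0 * concentration_per_100g


a, b = group_parameters(ECN_SPECIES)
tl = fork_to_total_length(30.0)         # a fish recorded as 30 cm FL in the first Peskas phase
catch_g = 4 * weight_g(tl, a, b)        # four individuals of that length
zinc_mg = nutrient_yield(catch_g, 1.2)  # hypothetical 1.2 mg of zinc per 100 g
print(round(a, 4), round(b, 2), round(catch_g, 1), round(zinc_mg, 2))
```

Summarising a group with the median rather than the mean limits the influence of species with extreme a or b estimates, which matters more for groups containing many species.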
"],["highlight.html", "3 Highlight statistics 3.1 Timor-Est SSF nutritional scenario", " 3 Highlight statistics 3.1 Timor-Est SSF nutritional scenario The table uses the EC dataset and summarizes the main statistics on nutrient supply for each region in relation to WRA, the number of women of reproductive age (15-49 years old). Below is a description of each table column: MUNICIPALITY (POPULATION): Municipality and its WRA population in 2022. NUTRIENT: Nutrient of reference. ANNUAL SUPPLY: Aggregated annual value in kg. These values represent municipal-level estimates based on the number of fishing boats recorded in the 2021 Timor-Leste boat census, the average number of fishing trips per boat and average landing weight values for each fish group. N. PEOPLE SUPPLIED DAILY: The number of people whose daily RNI for the nutrient can be met in each municipality. RNI values used (in grams) are the following: Selenium 0.000026; Zinc 0.0049; Protein 46; Total omega-3 PUFA 2.939; Calcium 1; Iron 0.0294; Vitamin A 0.0005. The reference was set at 20% of each RNI value, considering that an ‘adequate diet’ is expected to comprise five food groups. RNIs were then converted from grams to kg (dividing by 1000) and the number of people supplied daily was calculated as: \\(\\frac{Annual\\ supply\\ (kg)}{(RNI\\times 0.20) \\ / 1000} /365\\) POPULATION MEETING RNI REQUIREMENTS: Percentage of the WRA population meeting the RNI requirements in each municipality: \\(\\frac{Number\\ of\\ people\\ supplied\\ daily}{Municipality\\ population} \\times 100\\) "],["distribution.html", "4 Nutrients distribution 4.1 Fish groups 4.2 Habitat and gear type 4.3 Nutritional contribution and economic profiling", " 4 Nutrients distribution This section presents the analyses that illustrate the distribution of nutrients within various components of small-scale fisheries in East Timor. 
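The people-supplied and population-percentage formulas from the Highlight statistics chapter can be sketched in Python as follows. The RNI values are those listed for the table (in grams) and the 20% share follows the text; the dictionary keys, the annual supply and the WRA population in the usage lines are hypothetical.

```python
# RNI values in grams, as listed for the Highlight statistics table
RNI_G = {
    'selenium': 0.000026,
    'zinc': 0.0049,
    'protein': 46,
    'omega3_pufa': 2.939,
    'calcium': 1,
    'iron': 0.0294,
    'vitamin_a': 0.0005,
}
RNI_SHARE = 0.20  # share of the RNI attributed to fish (one of five food groups)


def people_supplied_daily(annual_supply_kg, rni_g, share=RNI_SHARE):
    # Per-person daily target converted from grams to kg, then annual supply spread over 365 days
    daily_target_kg = rni_g * share / 1000
    return annual_supply_kg / daily_target_kg / 365


def percent_meeting_rni(people_supplied, wra_population):
    # Share of the WRA population whose target could be met
    return people_supplied / wra_population * 100


# Hypothetical municipality: 73 kg of calcium landed per year, 20,000 WRA
people = people_supplied_daily(73, RNI_G['calcium'])
print(round(people), round(percent_meeting_rni(people, 20000), 1))  # 1000 5.0
```

Dividing by 365 implicitly assumes that the annual supply is spread evenly over the year.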
4.1 Fish groups Figure 4.1: The bar chart illustrates the contribution of a variety of marine food sources to the Recommended Nutrient Intake (RNI) for six fundamental nutrients, based on a 100g portion. Each bar is a color-segmented stacked visual, with distinct hues corresponding to individual nutrients, and white numbers within indicating the specific percentage contribution of each nutrient. The chart incorporates the mean annual catch in metric tons for each marine species from 2018 to 2023, presented at the end of each bar, providing a view of both the nutritional value and the harvest volume of these essential food sources. The transparency of these values is adjusted to reflect each species’ relative contribution to the mean annual catch. 4.2 Habitat and gear type Figure 4.2: Sankey diagram showing the relative distribution of key nutrients across various marine habitats and the corresponding extraction by different fishing gear types used in Timor-Leste small-scale fisheries. 4.3 Nutritional contribution and economic profiling Figure 4.3: Nutritional and economic profiling of key fish groups within the Timor-Leste fishery. Panel A, Distribution of nutritional content among different functional fish groups: Small pelagics, Large pelagics, Small demersals, Large demersals, Sharks and rays, and Other groups, which include shrimps, molluscs, cephalopods and crustaceans. The plot shows the ranked contribution of each functional group to the supply of calcium, omega-3, iron, protein, vitamin A, and zinc during the period 2018-2023. Panel B, Comparative analysis of nutritional score versus economic accessibility for key fish groups. This scatter plot displays the relationship between the cumulative nutritional score and the market price for various fish groups within the Timor-Leste fishery. 
The x-axis quantifies the cumulative contribution to the Recommended Nutrient Intake (RNI) for six essential nutrients (zinc, protein, omega-3, calcium, iron, vitamin A) from a 100g portion of each fish group. The y-axis represents the average market price per kilogram for each group. Dot size and the accompanying numerical labels reflect the relative catch percentage of each group, serving as an index of accessibility and availability. Panel C, the bar chart illustrates the contribution of each habitat to the Recommended Nutrient Intake (RNI) for six fundamental nutrients, based on a 100g portion. Each bar is a color-segmented stacked visual, with distinct hues corresponding to individual nutrients, and white numbers within indicating the specific percentage contribution of each nutrient. "],["profiles.html", "5 Timor SSF nutritional profiles 5.1 Methods 5.2 Results 5.3 Preliminary considerations", " 5 Timor SSF nutritional profiles 5.1 Methods In this section, we identified recurrent nutritional profiles based on RC data, and then predicted and explained these profiles on the basis of fishing strategy and environmental factors. 5.1.1 Data analysis design and subset division As a first step we addressed the inherent imbalance in the RC data, a critical aspect for ensuring accurate and unbiased analysis. Notably, more than 40% of the data come from Atauro, and gill nets are the most frequently reported gear type across all municipalities. To mitigate the skew caused by this overrepresentation, we divided the dataset into four distinct subsets: Atauro GN: data from Atauro using gill nets. Atauro AG: data from Atauro using fishing methods other than gill nets. Mainland GN: gill net data from all municipalities excluding Atauro. Mainland AG: data from all municipalities excluding Atauro using fishing methods other than gill nets. 
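The four-way split can be expressed as a single labelling rule. The sketch below assumes that each trip record carries a municipality name and a gear type string; the field values (for example 'gill net') and the municipality names in the usage lines are illustrative, not the dataset’s actual codes.

```python
from collections import Counter


def assign_subset(municipality, gear):
    # Atauro vs Mainland crossed with gill nets (GN) vs all other gears (AG)
    region = 'Atauro' if municipality.strip().lower() == 'atauro' else 'Mainland'
    gear_group = 'GN' if gear.strip().lower() == 'gill net' else 'AG'
    return f'{region} {gear_group}'


trips = [
    ('Atauro', 'gill net'),
    ('Atauro', 'hand line'),
    ('Dili', 'gill net'),
    ('Baucau', 'spear gun'),
    ('Atauro', 'gill net'),
]
print(Counter(assign_subset(m, g) for m, g in trips))
```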
This subdivision of the dataset was intended to reduce biases and enhance analytical precision. Furthermore, by isolating gill net data, we were able to specifically examine the impact of mesh size on the prediction of nutritional profiles in gill net catches, providing a more focused analysis of this gear type’s influence on nutritional outcomes. 5.1.2 Clustering and Classification After partitioning the data, we identified recurrent nutritional profiles for each dataset. We assessed the total within-cluster sum of squares (WSS) of six nutrient concentrations (excluding selenium) to identify the optimal number of clusters (distinctive nutritional profiles). Once the optimal number of clusters had been established for each dataset, we applied K-means clustering to organize the data into distinct groups based on similarities in nutrient concentrations. Each trip was grouped according to its nutrient concentration profile, enabling us to discern patterns and categorize trips by nutritional profile. The K-means algorithm assigns each data point to the cluster with the nearest centroid (the mean of the points currently in that cluster) and then recomputes each centroid from its assigned points. This iterative process continues until the assignment of points to clusters no longer changes, at which point the within-cluster sum of squares has reached a (possibly local) minimum. The result is a set of clusters that represent distinct nutritional profiles, each characterized by a specific combination of nutrient concentrations. Following the clustering, we conducted a Permutational Multivariate Analysis of Variance (PERMANOVA) to validate the clustering in each of the four datasets: Atauro AG, Atauro GN, Mainland AG, and Mainland GN. PERMANOVA is a non-parametric statistical test that evaluates whether there are significant multivariate differences between groups. 
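The cluster-number choice and the K-means step described above can be sketched in pure Python as follows. This is a minimal version of Lloyd’s algorithm with a deterministic farthest-point initialisation, chosen here only to make the example reproducible (an assumption; the analysis itself may have used a different implementation and random starts). The two-dimensional points are invented and stand in for the six nutrient concentrations.

```python
def squared_distance(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def mean_point(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_point_init(points, k):
    # Start from the first point, then repeatedly add the point farthest from all chosen centroids
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


def wss(points, labels, centroids):
    # Total within-cluster sum of squares
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Invented trips described by two standardised nutrient concentrations
trips = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
         (10.0, 0.1), (9.8, 0.0), (10.1, 0.2)]
for k in range(1, 5):
    labels, centroids = kmeans(trips, k)
    print(k, round(wss(trips, labels, centroids), 3))
```

Plotting the printed WSS against k gives the elbow curve: here the WSS drops sharply up to k = 3 and only slightly afterwards, the same elbow logic behind the choice of 4 or 5 clusters reported in the Results.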
Unlike traditional ANOVA, PERMANOVA does not rely on assumptions of normality and is therefore suitable for ecological data, which often do not follow normal distributions. Our PERMANOVA analysis was conducted on each of the four subsets using a distance matrix representing pairwise dissimilarities in nutrient concentrations across all fishing trips. This approach allowed us to test the hypothesis that the nutritional profiles of fishing trips within the same cluster are more similar to each other than to trips in different clusters. Finally, we fitted an XGBoost model to each data subset to predict the nutritional profiles based on fishing strategy, habitat and season. We chose the XGBoost algorithm for its built-in regularisation, which helps limit overfitting, and its ability to highlight key predictors. We used mesh size, habitat, quarter of the year, and vessel type as predictors for the gill net subsets. For other gear types, the models used the habitat x gear interaction, habitat, gear type, quarter of the year, and vessel type as predictors. Hyperparameters were tuned, including the number of trees, tree depth, loss reduction, sample size, and early stopping. Each of the four data subsets was split into training (80%) and testing (20%) sets, with 10-fold cross-validation applied to the training set to obtain more reliable performance estimates during tuning and improve generalizability. Model performance was assessed using accuracy, ROC AUC, sensitivity, and specificity, providing an overall picture of each model’s ability to distinguish between nutritional profiles. The ROC curves and AUC values offered an additional layer of evaluation. We used SHapley Additive exPlanations (SHAP) values to quantify the influence of the various predictors on the nutritional profiles predicted by our XGBoost models. SHAP values, rooted in cooperative game theory, offer a principled way to interpret machine learning model outputs. 
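Because SHAP values are Shapley values from cooperative game theory, their meaning can be shown by brute force on a tiny model. The sketch below enumerates every coalition of features and, as a simplifying assumption, fills in absent features from a single baseline trip; TreeSHAP, which tree ensembles such as XGBoost typically rely on, obtains the values from the tree structure rather than by enumeration. The toy scoring function, its features and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    # Exact Shapley values; features outside a coalition take their baseline value
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_np2_score(z):
    # Hypothetical score for profile NP-2 from (mesh size in mm, deep habitat flag, motorised flag)
    mesh, deep, motorised = z
    small_mesh = 1.0 if mesh < 40 else 0.0
    return 2.0 * small_mesh + 1.0 * small_mesh * deep - 0.5 * motorised


trip = [30, 1, 0]      # small mesh, deep habitat, unmotorised
baseline = [60, 0, 0]  # reference trip
phi = shapley_values(toy_np2_score, trip, baseline)
print([round(v, 3) for v in phi])  # [2.5, 0.5, 0.0]
```

The contributions add up to the difference between the trip’s score and the baseline score (3.0 here), and the interaction between small mesh and deep habitat is split equally between the two features. This additivity is what allows a SHAP plot to be read as a decomposition of each prediction.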
SHAP values decompose a model’s prediction into contributions from each feature, revealing not only the importance of these features but also the direction of their effect on the prediction. Specifically, for the gill net subsets (Atauro GN and Mainland GN), we focused on the impact of mesh size. For the other subsets (Atauro AG and Mainland AG), which include different fishing methods, we focused on how habitat and gear type interact to influence the nutritional profile predictions. 5.2 Results 5.2.1 Clusters The WSS analysis indicated that either 4 or 5 clusters were optimal for each subset of our data. We chose 5 clusters for all subsets to maintain uniformity across analyses and to better represent the varied patterns in nutritional profiles. The bar chart in Figure 5.1 shows nutrient adequacy across nutritional profiles as the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch for each nutrient. The profiles are the result of k-means clustering, reflecting distinct groupings based on the type and quantity of nutrients present in the catch, and they capture the variation in nutrient content obtained through different fishing gear types and locations. For Atauro, considering all gear types (Panel a), nutrient adequacy varies across clusters. Notably, clusters 5 and 4 stand out for their high vitamin A content. Calcium, by contrast, is more consistently distributed across all clusters, reflecting a degree of nutritional stability for this element. Protein content also appears uniformly spread, with a slight elevation in cluster 1. Cluster 3 stands out for its zinc content, which is markedly higher than in other clusters, while iron content is highest in cluster 1, clearly distinguishing it from the others. 
When the focus narrows to gill net gear in Atauro (Panel b), a distinct distribution pattern emerges, with calcium notably more abundant in cluster 3. Additionally, clusters 2 and 4 are characterized by a higher concentration of vitamin A, suggesting that gill net gear may selectively capture species with higher amounts of these nutrients. The mainland dataset using all gear types (Panel c) also reveals a distinct distribution of nutrients. Clusters 1 and 2 have higher levels of calcium, with cluster 2 showing a particularly high value that surpasses other clusters, while cluster 1 is particularly rich in omega-3. Vitamin A shows a pronounced peak in cluster 2, indicating a distinct subset of catch composition for this nutrient. Finally, for the mainland using only gill net gear (Panel d), the data suggest a more even distribution of omega-3 across the clusters, with clusters 4 and 2 showing marginally higher values. Calcium is higher in clusters 2 and 4, while zinc is considerably more prevalent in cluster 5. Iron, although present in all clusters, is most concentrated in cluster 4. Figure 5.1: Distribution of nutrient adequacy across k-means clusters. The bar chart shows the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1kg of catch within the identified k-means clusters. Each bar is divided into six segments corresponding to the evaluated nutrients. The clusters are enumerated on the y-axis, each representing a group with a distinct nutritional profile as determined by the cluster analysis. The x-axis and the white labels in the bars quantify the count of individuals within each cluster that meet the RNI for the respective nutrients, underlining the variability in nutritional adequacy across clusters. 
Panels (A) through (D) compare these distributions across different fishing practices and locations, namely Atauro and the Mainland, using all gear types or exclusively gill nets. The k-means scatter plot (Figure 5.2) shows the distribution of nutritional profiles across clusters in each data subset. The first two principal components explained a substantial portion of the variance, indicating distinct groupings in nutritional profiles among the fishing trips. Figure 5.2: Nutritional profile clustering of fishing trips by region and gear type. Each plot presents a k-means clustering analysis of fishing trip observations, grouped by their nutritional contributions to the Recommended Nutrient Intake (RNI) for six nutrients. The four panels, labeled (A) through (D), display data subsets for Atauro and the Mainland, using all gear types or gill nets only. The scatter plots within each panel are charted in a two-dimensional space defined by the first two principal components, with the axes denoting the percentage of explained variance. Points are color-coded to denote distinct nutritional profile clusters derived from the k-means algorithm. Convex hulls around each cluster outline its periphery, helping to visualize cluster density, separation, and the delineation of nutritional profile groupings across fishing methods and geographic areas. The PERMANOVA analyses (Table 5.1) revealed statistically significant differences between clusters, suggesting robust groupings based on the nutritional profiles. The pseudo-F statistics were high in all cases, indicating strong differentiation between clusters. Specifically, the R² values were 0.87, 0.88, 0.84, and 0.81 for Atauro AG, Atauro GN, Mainland AG, and Mainland GN respectively, indicating that between 81% and 88% of the variance in nutrient concentrations was explained by the clusters. 
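The quantities reported in Table 5.1 can be illustrated in miniature. The sketch below is a one-way PERMANOVA in pure Python, assuming squared Euclidean distances between trips (the dissimilarity used in the analysis is not specified here): the pseudo-F compares between- and within-cluster sums of squares, R² is the between-cluster share of the total, and the p-value comes from permuting cluster labels. The points and labels are invented.

```python
import random
from itertools import combinations


def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pseudo_f_and_r2(d2, groups):
    # d2: matrix of squared distances; groups: cluster label of each observation
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(d2[i][j] for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d2[i][j] for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    n_groups = len(labels)
    f_stat = (ss_between / (n_groups - 1)) / (ss_within / (n - n_groups))
    return f_stat, ss_between / ss_total


def permanova(points, groups, permutations=999, seed=42):
    d2 = [[squared_euclidean(p, q) for q in points] for p in points]
    f_obs, r2 = pseudo_f_and_r2(d2, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break the link between trips and clusters
        if pseudo_f_and_r2(d2, shuffled)[0] >= f_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return f_obs, r2, p_value


trips = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
         (10.0, 0.1), (9.8, 0.0), (10.1, 0.2)]
clusters = [1, 1, 1, 2, 2, 2, 3, 3, 3]
f_stat, r2, p = permanova(trips, clusters)
print(round(f_stat, 1), round(r2, 4), p)
```

With this estimator the smallest attainable p-value is 1/(permutations + 1), for example 0.001 with 999 permutations.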
The high R² values underscore the distinctness of the clusters, reinforcing the validity of the K-means clustering. These findings were consistent across all the datasets, with p-values below 0.001, providing clear evidence to reject the null hypothesis of no difference between clusters. Hence, the PERMANOVA results support the effectiveness of the K-means algorithm in capturing meaningful patterns in nutritional profiles. Table 5.1: Results of PERMANOVA analysis testing for differences in nutritional profiles among fishing trip clusters. The analysis was conducted across four datasets: Atauro with all gears (atauro_AG), Atauro with gill nets (atauro_GN), Mainland with all gears (mainland_AG), and Mainland with gill nets (mainland_GN). For each dataset, the ‘clusters’ row gives the between-cluster sum of squares (SUMOFSQS), i.e. the variation in nutrient concentrations explained by cluster membership, while the ‘Residual’ row gives the within-cluster sum of squares, i.e. the variation remaining within nutritional profiles. Degrees of Freedom (DF), R-squared values (R2), and associated statistics indicate the strength and significance of the clustering. The R2 value quantifies the proportion of variance explained by the clusters. 5.2.2 XGBoost model performance The XGBoost model’s predictive performance was assessed both quantitatively and visually, as detailed in Table 5.2 and Figure 5.3, respectively. The Receiver Operating Characteristic (ROC) curves (see ML model interpretation) presented in Figure 5.3 offer a graphical evaluation of the model’s sensitivity and specificity across four subsets of fishing data, categorized by region and gear type. These curves plot the true positive rate against the false positive rate for each nutritional profile group identified within the data. An examination of the ROC curves reveals variability in the model’s ability to distinguish between nutritional profile groups. 
The areas under the curves (AUC) provide a numerical measure of the model’s discriminative power, with a value of 1 representing perfect prediction and 0.5 indicating no discriminative power. While none of the profile groups reach perfection, several show high AUC values, indicating a robust ability to classify observations accurately. These visual findings are consistent with the statistics in Table 5.2: the Atauro subsets (both all gears and gill nets) yield higher AUC, accuracy, and kappa statistics, suggesting a more consistent and accurate classification of nutritional profiles. These subsets also show higher sensitivity and specificity, indicating a balanced predictive capability for identifying true positives and true negatives. Conversely, the Mainland subsets exhibit lower performance metrics, indicating a more challenging classification scenario. This is reflected in the ROC curves, where the lines for the Mainland subsets lie farther from the top-left corner, indicating a lower true positive rate for a given false positive rate than in the Atauro subsets. The positive predictive value (PPV) and negative predictive value (NPV), which provide insight into the model’s precision and reliability, also align with the ROC curve analysis, showing higher values for the Atauro subsets. This indicates that when the model predicts a particular nutritional profile for these subsets, it is more likely to be correct. The Matthews correlation coefficient (MCC), a balanced measure of classification quality that extends to multiclass problems, corroborates the ROC analysis, indicating that the Atauro subsets maintain a higher quality of prediction across classes. The integrated analysis of Table 5.2 and Figure 5.3 reveals a differentiated performance of the XGBoost model across the subsets of fishing data. 
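The PPV, NPV, kappa and MCC discussed above can all be derived from the multiclass confusion matrix. The sketch below computes them in pure Python, using one-vs-rest counts macro-averaged across profiles for sensitivity, specificity, PPV and NPV, and the multiclass generalisation of the MCC. Whether Table 5.2 averages in exactly this way is an assumption, and the labels are invented.

```python
from math import sqrt


def confusion_matrix(truth, pred, classes):
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, pred):
        m[index[t]][index[p]] += 1  # rows: observed profile, columns: predicted profile
    return m


def macro(values):
    return sum(values) / len(values)


def multiclass_metrics(m):
    k = len(m)
    s = sum(map(sum, m))
    correct = sum(m[i][i] for i in range(k))
    t = [sum(row) for row in m]                             # observed count per class
    p = [sum(m[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    accuracy = correct / s
    chance = sum(pk * tk for pk, tk in zip(p, t))
    kappa = (accuracy - chance / s ** 2) / (1 - chance / s ** 2)
    mcc = (correct * s - chance) / sqrt((s ** 2 - sum(x * x for x in p)) * (s ** 2 - sum(x * x for x in t)))
    sens, spec, ppv, npv = [], [], [], []
    for c in range(k):
        tp = m[c][c]
        fn, fp = t[c] - tp, p[c] - tp
        tn = s - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
        ppv.append(tp / (tp + fp))  # raises ZeroDivisionError if a class is never predicted
        npv.append(tn / (tn + fn))
    return {'accuracy': accuracy, 'kap': kappa, 'mcc': mcc,
            'sens': macro(sens), 'spec': macro(spec), 'ppv': macro(ppv), 'npv': macro(npv)}


truth = ['NP-1', 'NP-1', 'NP-2', 'NP-2']
pred = ['NP-1', 'NP-2', 'NP-2', 'NP-2']
print(multiclass_metrics(confusion_matrix(truth, pred, ['NP-1', 'NP-2'])))
```

For two classes the multiclass MCC reduces to the familiar binary formula; in this example it equals 1/sqrt(3), about 0.577.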
The model showcases commendable predictive strength in the Atauro subsets, with high AUC, accuracy, and kappa metrics indicating a reliable classification of nutritional profiles. The ROC curve analysis further supports this, with curves for Atauro subsets nearer to the desired top-left corner, denoting higher sensitivity and specificity. In contrast, the Mainland subsets, despite achieving moderate success, suggest an area for improvement, as seen by their relative distance from the optimal point on the ROC curves and lower performance metrics. This suggests that while the model is effective in identifying nutritional profiles in certain contexts, its performance is not uniformly high across all subsets. Figure 5.3: Receiver Operating Characteristic (ROC) Curves for evaluating the performance of a cluster-based XGBoost classification model across four distinct fishing datasets: Atauro with all gears (a), Atauro with gill nets (b), Mainland with all gears (c), and Mainland with gill nets (d). Each curve represents one of the five clusters obtained from the classification, with different colors marking each cluster. Data points on the curves indicate the trade-off between sensitivity (true positive rate) and 1-specificity (false positive rate) for each cluster. The proximity of the curves to the top-left corner reflects the accuracy of the model in classifying the nutritional profiles into the correct clusters. Table 5.2: Performance Metrics for XGBoost Model Across Fishing Data Subsets. This table provides a comprehensive overview of the predictive performance of an XGBoost classification model for four distinct subsets of fishing data: Atauro with all gears (ATAURO AG), Atauro with gill nets (ATAURO GN), Mainland with all gears (MAINLAND AG), and Mainland with gill nets (MAINLAND GN). 
Key performance indicators include ROC-AUC (area under the receiver operating characteristic curve), accuracy, kappa (kap), sensitivity (sens), specificity (spec), positive predictive value (ppv), negative predictive value (npv), Matthews correlation coefficient (mcc), Youden’s J index (j_index), balanced accuracy (bal_accuracy), detection prevalence, precision, recall, and F measure (f_meas). Together, these metrics reflect the model’s ability to discriminate between nutritional profiles, its overall accuracy, and the balance between sensitivity and specificity for each subset. 5.2.2.1 Models explanation The SHAP values (see ML model explanation) from the gill net models (Figure 5.4, panels A-B) show how mesh size and habitat jointly shape the predicted nutritional profiles in Atauro (panel A) and Mainland (panel B). In Atauro, mesh sizes smaller than 40 mm are strongly associated with a higher likelihood of predicting nutritional profile NP-2. This holds across diverse habitats, including reefs, deep habitats and, to a lesser extent, Fish Aggregating Devices (FADs). These smaller mesh sizes are less associated with NP-3, particularly in mangrove environments, and with NP-1 in beach and deep habitats. Conversely, mesh sizes around 50 mm are predominantly linked with NP-3, mainly in reef and FAD environments. Larger meshes of 60 and 80 mm are strongly associated with NP-4 and NP-5, respectively, especially when fishing on reefs and in seagrass areas. The 80 mm mesh also shows a minor association in beach environments. These mesh sizes also show a modest association with NP-1, notably when fishing in deep areas. 
For mesh sizes above 80 mm, the pattern shifts: NP-1 becomes the most likely prediction, particularly when fishing in reef and mangrove habitats. The SHAP values from the Mainland data reveal a more diverse pattern of associations. Mesh sizes smaller than 30 mm used in beach, reef, and mangrove settings are linked with NP-2, with a similar association in deep habitats. Meshes from 30 to 40 mm are strong indicators of NP-1 across a broad range of environments, especially deep and reef areas. Meshes between 40 and 70 mm shift the prediction towards NP-5 as the most probable outcome for various fishing grounds, including reefs, deep environments, mangroves, and beaches. At the larger end of the range, meshes above 70 mm are more likely to predict NP-3, particularly in deep and FAD environments, where SHAP values are notably high. Reefs, mangroves, and beaches also show relatively high values, and the likelihood of NP-3 in FAD grounds increases further with 100 and 130 mm meshes. For all fishing gear types other than gill nets, the SHAP values reveal a complex interplay between three factors: the habitat where fishing occurs, the type of gear used, and whether the boats are motorised or unmotorised (Figure 5.4, panels C and D). In the Atauro dataset (panel C), nutritional profile NP-1 is commonly associated with long lines in deep-water habitats, particularly from unmotorised boats. Hand lines in the same deep-water habitats, however, shift the prediction towards NP-2. For NP-4, seine nets are the gear most likely to yield this outcome in deep environments, although hand lines also contribute to a lesser degree. 
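SHAP values are Shapley values from cooperative game theory. A feature's value is its average marginal contribution to the prediction, taken over all coalitions of the other features. For one observation, the values sum to the difference between its prediction and a baseline prediction.

The brute-force sketch below makes that definition concrete for a toy model. It uses three hypothetical features (`mesh_mm`, `reef`, `motorised`) and a single reference observation as the baseline. It is only an illustration of the definition. The SHAP values in Figure 5.4 were most likely produced by a dedicated SHAP implementation, which typically uses much faster tree-specific or sampling algorithms and background data rather than one reference point. Enumerating coalitions is only feasible for a handful of features.

```r
# Minimal sketch: exact Shapley values by enumerating feature coalitions.
# v(S) is the model output when features in S take the explained
# observation's values and all other features take the baseline values.
shapley_exact <- function(f, x, baseline) {
  p <- length(x)
  phi <- setNames(numeric(p), names(x))
  value <- function(in_s) {            # in_s: integer indices of features in S
    z <- baseline
    z[in_s] <- x[in_s]
    f(z)
  }
  for (i in seq_len(p)) {
    others <- setdiff(seq_len(p), i)
    for (k in 0:length(others)) {
      # all coalitions S of size k that exclude feature i
      subsets <- if (k == 0) {
        list(integer(0))
      } else {
        lapply(combn(length(others), k, simplify = FALSE),
               function(idx) others[idx])
      }
      weight <- factorial(k) * factorial(p - k - 1) / factorial(p)
      for (s in subsets) {
        phi[i] <- phi[i] + weight * (value(c(s, i)) - value(s))
      }
    }
  }
  phi
}

# Hypothetical toy model with a mesh size x reef interaction
toy_model <- function(z) {
  0.02 * z[["mesh_mm"]] + 1.5 * z[["reef"]] +
    0.01 * z[["mesh_mm"]] * z[["reef"]] - 0.5 * z[["motorised"]]
}

x        <- c(mesh_mm = 80, reef = 1, motorised = 0)
baseline <- c(mesh_mm = 40, reef = 0, motorised = 1)

phi <- shapley_exact(toy_model, x, baseline)  # mesh_mm 1.0, reef 2.1, motorised 0.5
# Local accuracy: the attributions add up to f(x) - f(baseline) = 3.6
sum(phi)
toy_model(x) - toy_model(baseline)
```

The interaction term contributes 0.8 at `x`, and this is split between `mesh_mm` (0.2) and `reef` (0.6). That is why `reef` receives 2.1 rather than its main effect of 1.5. It is a small illustration of how Shapley values share interaction effects among the features involved.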
Profiles NP-3 and NP-5 are both frequently predicted when fishing with spear guns; NP-3 is more associated with unmotorised boats and NP-5 with motorised vessels. These two profiles also stand apart from the others by spreading across a wider range of coastal habitats, whereas the remaining profiles are predominantly linked with deeper waters. In Mainland, the use of cast nets in reef habitats is strongly linked to NP-1. In FAD settings, NP-2 is the most common outcome regardless of gear type. The notable exception is long lines, which are instead more likely to result in NP-3. NP-4 and NP-5 are each aligned with distinct fishing practices: NP-4 is closely associated with hand lines in deep habitats, and NP-5 is characteristic of manual collection and spearfishing in littoral zones such as reefs and beaches. Figure 5.4: Differential influence of mesh size and of the habitat × gear type interaction on nutritional profile predictions in Atauro (A and C) and in Mainland (B and D). Panels A-B: These panels show the effect of mesh size on the probability of observing each nutritional profile in Atauro (panel A) and Mainland (panel B). Each panel includes five plots corresponding to the nutritional profiles (NP1-NP5) predicted by the gill net XGBoost models. The plots show the distribution of SHAP values over a range of mesh sizes. Each point is color-coded by habitat (Beach, Deep, FAD, Mangrove, Reef, and Seagrass), showing how the influence of mesh size on the predictions varies between habitats. The y-axis shows mesh size ranges and the x-axis shows SHAP values, where higher values indicate a stronger likelihood of a particular nutritional profile. 
The size and opacity of each point are proportional to the magnitude of its SHAP value, indicating how strongly that observation influences the model’s prediction. Panels C-D: These panels analyze the interplay among habitat, gear type, and vessel type (motorized or unmotorized) in relation to nutritional profiles in Atauro (panel C) and Mainland (panel D). Each plot shows the SHAP value distributions for the five nutritional profiles (NP1-NP5) predicted by XGBoost models fitted to datasets covering all gear types except gill nets. Points are color-coded to distinguish motorized from unmotorized vessels, showing how vessel type, together with habitat and gear, modulates nutritional profile predictions. As in panels A-B, higher SHAP values on the x-axis indicate a higher probability of a specific nutritional profile. The size and opacity of the points scale with the SHAP values, denoting their relative impact on the prediction. 5.3 Preliminary considerations By using a profiling approach, we can help avoid overfishing and habitat depletion: instead of focusing on just one species, we can spread fishing effort across multiple fish groups when sourcing a particular nutrient. The results suggest that a given nutritional supply (for example, iron-rich foods) can be obtained from a diversified combination of gear types and habitats. The results also suggest that gathering more information, particularly from under-represented environments and fishing practices, could open new opportunities to improve the supply of foods that target specific nutritional needs. "],["simple.html", "6 In simple terms 6.1 ML model interpretation 6.2 ML model explanation", " 6 In simple terms 6.1 ML model interpretation ROC Curve: The curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings. 
The true positive rate is on the y-axis, and the false positive rate is on the x-axis. Performance: A perfect classifier would have a point in the upper left corner of the graph, where the true positive rate is 1 (or 100%) and the false positive rate is 0. The closer the curve follows the left-hand border and then the top border of the ROC space, the better the classifier discriminates between classes. Diagonal Line: The dotted diagonal line represents a no-skill classifier (e.g., random guessing). A good classifier stays as far above this line as possible (toward the upper left corner). Area Under the Curve (AUC): The area under each ROC curve (AUC) summarizes the classifier’s discriminative ability in a single number. It equals the probability that a randomly chosen positive case is ranked above a randomly chosen negative case. An AUC of 0.5 suggests no discrimination (no better than random chance), while an AUC of 1.0 suggests perfect discrimination. 6.2 ML model explanation SHAP values: SHAP values show how much each predictor in the dataset contributed to each individual prediction. A large positive SHAP value for a feature increases the predicted likelihood of a given outcome, while a large negative SHAP value decreases it. For each observation, the SHAP values add up to the difference between the model’s output for that observation and its average (baseline) output. "],["afigures.html", "7 Other figures", " 7 Other figures Figure 7.1: To define Figure 7.2: To define "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. 
"]]
diff --git a/docs_book/04-profiles.Rmd b/docs_book/04-profiles.Rmd
index 1fddf10..8eafe36 100644
--- a/docs_book/04-profiles.Rmd
+++ b/docs_book/04-profiles.Rmd
@@ -182,20 +182,97 @@ plot_profiles <- function(x) {
     scale_color_manual(values = timor.nutrients::palettes$clusters_palette)
 }
-plots <- purrr::map(data$data_raw, plot_profiles)
+plots1 <- plot_profiles(data$data_raw$timor_AG_raw) #purrr::map(data$data_raw$timor_AG_raw, plot_profiles)
+
+means_dat <-
+  data$data_raw$timor_GN_raw %>%
+  dplyr::rename_with(~ stringr::str_to_title(.x), .cols = c(.data$zinc:.data$vitaminA)) %>%
+  dplyr::rename(
+    "Vitamin-A" = .data$Vitamina,
+    "Omega-3" = .data$Omega3
+  ) %>%
+  tidyr::pivot_longer(c(Zinc:"Vitamin-A")) %>%
+  dplyr::group_by(clusters, name) %>%
+  dplyr::summarise(
+    mean = mean(value, na.rm = TRUE),
+    sd = sd(value, na.rm = TRUE),
+    n = dplyr::n(),
+    se = sd / sqrt(n),
+    ci_lower = mean - qt(0.99, df = n - 1) * se,
+    ci_upper = mean + qt(0.99, df = n - 1) * se
+  )
+
+all_dat <-
+  data$data_raw$timor_GN_raw %>%
+  dplyr::rename_with(~ stringr::str_to_title(.x), .cols = c(.data$zinc:.data$vitaminA)) %>%
+  dplyr::rename(
+    "Vitamin-A" = .data$Vitamina,
+    "Omega-3" = .data$Omega3
+  ) %>%
+  tidyr::pivot_longer(c(Zinc:"Vitamin-A"))
+
+plots2 <-
+  ggplot() +
+  theme_bw() +
+  geom_jitter(data = all_dat, mapping = aes(x = value, y = name, color = clusters), alpha = 0.01, size = 1, position = position_dodge(width = 0.5)) +
+  geom_point(data = means_dat, mapping = aes(x = mean, y = name, color = clusters), size = 5, position = position_dodge(width = 0.5)) +
+  labs(
+    x = "",
+    y = "",
+    color = "Profiles"
+  ) +
+  ggplot2::theme(
+    legend.position = "",
+    plot.margin = unit(c(0, 0, 0, 0), "cm"),
+    panel.grid = ggplot2::element_blank()
+  ) +
+  coord_cartesian(xlim = c(0, 5)) +
+  scale_fill_manual(values = timor.nutrients::palettes$clusters_palette) +
+  scale_color_manual(values = timor.nutrients::palettes$clusters_palette) +
+  annotate(
+    'text',
+    x = 3.5,
+    y = 2.5,
+    label = 'On average NP-3 provides enough calcium\nfor 2.2 people per 1kg of catch,\nwhile NP-1 and NP-2 support\n1.5 and 1.7 people, respectively',
+    size = 2.5
+  ) +
+  annotate(
+    'rect',
+    xmin = 0,
+    xmax = 3,
+    ymin = 0.5,
+    ymax = 1.5,
+    #alpha = 0.5,
+    color = rgb(0, 0, 0, alpha = 0.85),
+    linewidth = 0.3,
+    fill = "transparent",
+    linetype = 2
+  ) +
+  annotate(
+    'curve',
+    x = 2, # Play around with the coordinates until you're satisfied
+    y = 2.4,
+    yend = 1.7,
+    xend = 1.5,
+    col = 'black',
+    curvature = 0.25,
+    linewidth = 0.3,
+    arrow = arrow(length = unit(0.25, 'cm'))
+  )
+
 plots <- list(
-  plots$timor_GN_raw + ggplot2::labs(subtitle = "Gill nets"),
-  plots$timor_AG_raw + ggplot2::labs(subtitle = "Other gears")
+  plots2 + ggplot2::labs(subtitle = "Gill nets"),
+  plots1 + ggplot2::labs(subtitle = "Other gears")
 )
 legend_plot <- cowplot::get_legend(plots[[1]] +
-  ggplot2::theme(
-    legend.position = "right",
-    legend.key.size = ggplot2::unit(0.55, "cm"),
-    legend.title = ggplot2::element_text(size = 12)
-  ))
+  ggplot2::theme(
+    legend.position = "right",
+    legend.key.size = ggplot2::unit(0.55, "cm"),
+    legend.title = ggplot2::element_text(size = 12)
+  ))
 combined_plots <- cowplot::plot_grid(plotlist = plots, ncol = 2, labels = "AUTO")
 x_label <- cowplot::draw_label("NDS per 1kg of catch", x = 0.5, y = 0.05)
diff --git a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png
index 917ace7..445b750 100644
Binary files a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png and b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/model-settings-1.png differ
diff --git a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png
index 02b85f0..c3696e8 100644
Binary files a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png and b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-6-1.png differ
diff --git a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png
index 03bc91b..a534338 100644
Binary files a/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png and b/docs_book/Timor-nutrient-sensitive-fisheries-management_files/figure-html/unnamed-chunk-7-1.png differ
diff --git a/inst/config.yml b/inst/config.yml
index 533b0ed..14a7f27 100644
--- a/inst/config.yml
+++ b/inst/config.yml
@@ -25,4 +25,4 @@ default:
 vitaminA: 0.0007
 omega3: 1.1
 pal_nutrients: ["#E07A5F", "#3D405B", "#81B29A", "#F2CC8F", "#0a9396", "#b392ac"]
-pal_clusters: ["#258ea6","#ac3931","#c9c94e","#e2b1b1","#b9c6ae"]
+pal_clusters: ["#c3b99e", "#527995", "#b76366"]
diff --git a/rfish-table__20240525005951_de58c54__.rds b/rfish-table__20240525005951_de58c54__.rds
new file mode 100644
index 0000000..b11683c
Binary files /dev/null and b/rfish-table__20240525005951_de58c54__.rds differ
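The `means_dat` summary in the `docs_book/04-profiles.Rmd` change computes a t-based interval around each profile mean as `mean ± qt(0.99, df = n - 1) * se`. `qt(0.99, ...)` is the 99th percentile of the t distribution, so it leaves 1% in each tail, and the resulting interval has 98% two-sided coverage.

The sketch below uses a hypothetical helper, `t_interval()`, that takes the confidence level as an explicit argument. It uses the same standard error (`sd / sqrt(n)`). One difference is an assumption of this sketch: it drops missing values before counting `n`, whereas `dplyr::n()` in the diff counts all rows in each group.

```r
# Hypothetical helper: two-sided t confidence interval for a mean,
# using the standard error sd / sqrt(n) as in `means_dat`.
t_interval <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                             # drop missing values first
  n <- length(x)
  se <- sd(x) / sqrt(n)
  crit <- qt(1 - (1 - level) / 2, df = n - 1)   # upper critical value
  c(mean = mean(x),
    ci_lower = mean(x) - crit * se,
    ci_upper = mean(x) + crit * se)
}

# qt(0.99, df) is the critical value of a two-sided 98% interval
all.equal(qt(1 - (1 - 0.98) / 2, df = 9), qt(0.99, df = 9))  # TRUE

# Toy values (made up), mean 1.68
ci <- t_interval(c(1.2, 1.5, 1.7, 2.1, 1.9), level = 0.98)
```

Passing `level = 0.99` instead would use `qt(0.995, ...)` and widen the interval accordingly.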