-
Clusters
-
The within-cluster sum of squares (WSS) analysis indicated that either 4 or 5 clusters were optimal for each subset of our data. We chose 5 clusters for all subsets to keep the analyses uniform and to better represent the varied patterns in nutrient profiles.
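-
As an illustration of how a WSS curve can guide the choice of the number of clusters, the sketch below runs a plain Lloyd's k-means on hypothetical toy data and reports the WSS for each candidate k. The data, the function names (`kmeans_wss`, `wss_curve`) and the number of random starts are assumptions made for this example and do not reproduce the software or settings used in the study.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wss(points, k, n_iter=100, seed=0):
    """Run Lloyd's k-means once and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen observations
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def wss_curve(points, k_values, n_starts=20):
    """Best (lowest) WSS over several random starts for each candidate k."""
    return {
        k: min(kmeans_wss(points, k, seed=s) for s in range(n_starts))
        for k in k_values
    }

# Hypothetical toy data: three well-separated groups of two-nutrient profiles.
_rng = random.Random(42)
toy_points = [
    (cx + _rng.gauss(0, 0.5), cy + _rng.gauss(0, 0.5))
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for _ in range(30)
]
wss = wss_curve(toy_points, range(1, 7))
for k, value in wss.items():
    print(k, round(value, 1))
```

On data like this the WSS drops sharply up to the true number of groups and flattens afterwards. When the curve flattens over two adjacent values, as reported here for 4 and 5, the final choice is a judgment call.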
-
The bar chart (Figure 5.1) shows nutrient adequacy across nutrient profiles, expressed as the number of individuals meeting the Recommended Nutrient Intake (RNI) per 1 kg of catch for each nutrient. The profiles are the result of k-means clustering and reflect distinct groupings based on the type and quantity of nutrients present in the catch. For the Atauro dataset using all gear types (Panel a), nutrient adequacy is distributed unevenly across the profiles. Specifically, clusters 3 and 5 exhibit a notably higher vitamin A content than the other clusters, whereas calcium and protein are more evenly distributed among the nutrient profiles. Zinc varies greatly, with cluster 1 showing the highest concentration. Iron is most abundant in cluster 4, distinguishing it from the rest.
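-
To make the adequacy unit concrete, the sketch below converts a nutrient concentration into the number of daily RNIs covered by 1 kg of catch. It assumes concentrations are expressed per 100 g and that the RNI is a daily amount in the same unit. Both the conversion and every number in it are hypothetical and are not taken from the study data.

```python
def people_meeting_rni(conc_per_100g, rni_per_day):
    """Number of daily RNIs covered by 1 kg of catch.

    conc_per_100g and rni_per_day must share the same unit (e.g. mg).
    """
    per_kg = conc_per_100g * 10  # 1 kg = 10 x 100 g
    return per_kg / rni_per_day

# Hypothetical concentrations and RNIs, for illustration only.
catch = {"calcium_mg": 50.0, "iron_mg": 1.5}
rni = {"calcium_mg": 1000.0, "iron_mg": 15.0}

adequacy = {name: people_meeting_rni(catch[name], rni[name]) for name in catch}
print(adequacy)
```

With these made-up values, 1 kg of catch would cover half of one person's daily calcium RNI and exactly one person's daily iron RNI.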
-
For the subset of data from Atauro using only gill net gear (Panel b), the distribution is characterized by higher proportions of calcium in clusters 2 and 4. Additionally, clusters 1 and 4 stand out due to their higher vitamin A content….etc…etc…
-
-
The scatter plot from the k-means clustering (Figure 5.2) shows the distribution of nutrient profiles across clusters in each data subset. The first two principal components explained a substantial portion of the variance, and the clusters form distinct groups of fishing trips along these components.
-
-
The PERMANOVA analyses (Table 5.1) revealed statistically significant differences between clusters, suggesting robust groupings based on the nutrient profiles. The pseudo-F statistics were high in all cases, indicating strong differentiation between clusters. The R² values were 0.86, 0.82, 0.85, and 0.92 for Atauro AG, Atauro GN, Mainland AG, and Mainland GN, respectively, indicating that between 82% and 92% of the variance in nutrient concentrations was explained by the clusters. These high R² values underscore the distinctness of the clusters and support the validity of the k-means clustering.
-
These findings were consistent across all four datasets, with p-values below 0.001, providing clear evidence to reject the null hypothesis of no difference between clusters. Hence, the PERMANOVA results support the effectiveness of the k-means algorithm in capturing meaningful patterns in nutrient profiles.
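-
For readers unfamiliar with how the pseudo-F statistic, R² and permutation p-value relate to one another, the following sketch computes them from pairwise squared Euclidean distances using the standard sum-of-squares partitioning. The distance measure, the number of permutations, the toy data and the function names are assumptions for illustration. The study's own analysis may have used a different distance and software.

```python
import random
from itertools import combinations

def euclid_sq(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def permanova_stats(points, labels):
    """Return (pseudo_F, R2) from the distance-based sum-of-squares partition."""
    n = len(points)
    groups = sorted(set(labels))
    a = len(groups)
    # Total SS: sum of squared pairwise distances divided by n.
    ss_total = sum(
        euclid_sq(points[i], points[j]) for i, j in combinations(range(n), 2)
    ) / n
    # Within-group (residual) SS: same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(
            euclid_sq(points[i], points[j]) for i, j in combinations(members, 2)
        ) / len(members)
    ss_between = ss_total - ss_within  # SS attributed to the grouping
    pseudo_f = (ss_between / (a - 1)) / (ss_within / (n - a))
    r2 = ss_between / ss_total
    return pseudo_f, r2

def permanova(points, labels, n_perm=199, seed=0):
    """Permutation test: shuffle labels and count pseudo-F values >= observed."""
    rng = random.Random(seed)
    f_obs, r2 = permanova_stats(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if permanova_stats(points, shuffled)[0] >= f_obs:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, r2, p_value

# Hypothetical toy data: two clearly separated clusters of 10 trips each.
_rng = random.Random(1)
toy = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(10)]
toy += [(_rng.gauss(6, 1), _rng.gauss(6, 1)) for _ in range(10)]
toy_labels = ["c1"] * 10 + ["c2"] * 10

f_obs, r2, p_value = permanova(toy, toy_labels)
print(round(f_obs, 1), round(r2, 2), p_value)
```

Note that the smallest attainable p-value is 1 / (n_perm + 1), so reporting p < 0.001 requires at least 999 permutations.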
-
-
-
Table 5.1: Results of PERMANOVA analysis assessing differences in nutrient profiles among fishing trip clusters. The analysis was conducted across four datasets: Atauro with all gears (atauro_AG), Atauro with gill nets (atauro_GN), Mainland with all gears (mainland_AG), and Mainland with gill nets (mainland_GN). For each dataset, the sum of squares (SUMOFSQS) for the term ‘clusters’ is the between-group component, i.e. the variance in nutritional profiles attributable to cluster membership, while ‘Residual’ is the within-group variance left unexplained. Degrees of freedom (DF), R-squared values (R2), and associated statistics indicate the strength and significance of the clustering. The R2 value quantifies the proportion of variance explained by the clusters.
-
-
XGBoost model
-
In the analysis of the XGBoost model’s predictive performance, both quantitative and visual assessments were conducted, detailed in Table 5.2 and Figure 5.3, respectively. The Receiver Operating Characteristic (ROC) curves (see ML model interpretation) presented in Figure 5.3 offer a graphical evaluation of the model’s sensitivity and specificity across four subsets of fishing data, categorized by region and gear type. These curves plot the true positive rate against the false positive rate for each nutritional profile group identified within the data.
-
An examination of the ROC curves reveals variability in the model’s ability to distinguish between nutritional profile groups. The areas under the curves (AUC) provide a numerical measure of the model’s discriminative power, with a value of 1 representing perfect prediction and 0.5 indicating no discriminative power. While none of the profile groups reach perfection, several demonstrate substantial AUC values, indicating a robust ability to classify observations accurately.
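-
The AUC for a single profile group can be read as the probability that a randomly chosen trip from that group receives a higher predicted score than a randomly chosen trip from any other group. The sketch below computes this one-vs-rest AUC directly from that definition. The scores and labels are hypothetical, and the function name `auc_one_vs_rest` is introduced only for this example.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of profile NP1 for six trips.
labels = ["NP1", "NP1", "NP2", "NP3", "NP2", "NP1"]
np1_scores = [0.9, 0.8, 0.3, 0.2, 0.85, 0.4]

auc_np1 = auc_one_vs_rest(np1_scores, labels, positive="NP1")
print(round(auc_np1, 3))
```

Here 7 of the 9 positive-negative pairs are ranked correctly, giving an AUC of about 0.778. A scorer that ranks every NP1 trip above every other trip would reach 1, and one that assigns the same score to all trips would give 0.5.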
-
Comparing these visual findings with the statistics in Table 5.2, the Atauro subsets (both all gears and gill nets) yield higher AUC, accuracy, and kappa values, suggesting a more consistent and accurate classification of nutritional profiles. These subsets also show higher sensitivity and specificity, indicating a balanced ability to identify true positives and true negatives. Conversely, the Mainland subsets exhibit lower performance metrics, indicating a more challenging classification scenario. This is reflected in the ROC curves, where the lines for the Mainland subsets lie farther from the top-left corner, meaning a lower true positive rate at a given false positive rate than in the Atauro subsets.
-
The positive predictive value (PPV) and negative predictive value (NPV), which provide insight into the model’s precision and reliability, also align with the ROC curve analysis, showing higher values for the Atauro subsets. This indicates that when the model predicts a particular nutritional profile for these subsets, it is more likely to be correct. The Matthews correlation coefficient (MCC), a balanced measure of quality for binary classifications, corroborates the ROC analysis by indicating that the Atauro subsets maintain a higher quality of prediction across classes.
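-
The metrics discussed above all derive from the same confusion-matrix counts. The sketch below computes them for a single one-vs-rest comparison (one nutritional profile against all others) using their standard definitions. The counts are hypothetical, and how the study aggregated these metrics across the five profiles is not stated here, so no averaging step is shown.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Standard classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)            # sensitivity (recall)
    spec = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)             # positive predictive value (precision)
    npv = tn / (tn + fn)             # negative predictive value
    accuracy = (tp + tn) / n
    # Cohen's kappa: agreement beyond what the marginal totals would give by chance.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sens": sens,
        "spec": spec,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "kap": kappa,
        "mcc": mcc,
        "j_index": sens + spec - 1,          # Youden's J
        "bal_accuracy": (sens + spec) / 2,
    }

# Hypothetical counts for one profile versus all others.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With these counts the PPV is 0.8 and the NPV is 0.9, while kappa (0.7) and MCC (about 0.70) are both lower than the raw accuracy of 0.85 because they discount agreement expected by chance.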
-
In summary, Table 5.2 and Figure 5.3 together show that the XGBoost model performs differently across the subsets of fishing data. The model shows strong predictive performance in the Atauro subsets, with high AUC, accuracy, and kappa values indicating a reliable classification of nutritional profiles. The ROC curves support this, with the curves for the Atauro subsets lying nearer to the top-left corner, denoting higher sensitivity and specificity. In contrast, the Mainland subsets achieve only moderate success, as shown by their greater distance from the optimal point on the ROC curves and their lower performance metrics. The model is therefore effective at identifying nutritional profiles in some contexts, but its performance is not uniformly high across all subsets.
-
-
-
-
Table 5.2: Performance Metrics for XGBoost Model Across Fishing Data Subsets. This table summarizes the predictive performance of an XGBoost classification model for four subsets of fishing data: Atauro with all gears (ATAURO AG), Atauro with gill nets (ATAURO GN), Mainland with all gears (MAINLAND AG), and Mainland with gill nets (MAINLAND GN). Key performance indicators include ROC-AUC (area under the receiver operating characteristic curve), accuracy, kappa (kap), sensitivity (sens), specificity (spec), positive predictive value (ppv), negative predictive value (npv), Matthews correlation coefficient (mcc), Youden’s J index (j_index), balanced accuracy (bal_accuracy), detection prevalence, precision, recall, and F measure (f_meas). Together, the metrics reflect the model’s ability to discriminate between nutritional profiles, its overall accuracy, and the balance between sensitivity and specificity for each subset.
-
The analysis of SHAP values from the gill net models reveals the interaction between mesh size and habitat in predicting nutrient profiles. In the Atauro region, as depicted in Figure 5.4, smaller mesh sizes (below 40 mm) consistently push the model toward predicting nutrient profile NP1 across various habitats, especially reefs, beaches, and mangroves. This suggests that small mesh sizes are a reliable predictor of NP1 across these diverse marine environments.
-
For nutrient profile NP2, there is a noticeable increase in SHAP values within the 40 to 60 mm mesh size range, with reefs and beaches showing this pattern most clearly. This indicates that medium mesh sizes are particularly predictive of NP2 in these ecological settings.
-
Larger mesh sizes, specifically those between 60 and 70 mm, are associated with nutrient profiles NP3 and NP4 across several habitats, including reefs, beaches, and mangroves. A more specific association is observed for mesh sizes between 70 and 80 mm, which are predominantly linked to predicting NP4. For the largest mesh sizes analyzed in the Atauro subset, nutrient profile NP5 emerges as the most likely prediction.
-
The SHAP values derived from the mainland data present a more varied pattern. Small mesh sizes (less than 35 mm) used in deep water and FAD environments are linked with the prediction of nutrient profiles NP3 and NP4, with the latter also being associated with reef and beach habitats. Mesh sizes in the range of 40 to 65 mm are strong predictors for nutrient profiles NP1 and NP5. Profile NP1 is most commonly predicted in reef and FAD settings, while NP5 is typically associated with deeper waters. At the larger end of the mesh size spectrum, nutrient profile NP2 becomes the most probable prediction, particularly when fishing occurs in deeper habitats.
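-
To clarify what a SHAP value measures when two features interact, the sketch below computes exact Shapley values for a deliberately simple scoring function with a mesh size by habitat interaction. The scoring function, its coefficients, the single baseline observation and all names are hypothetical. SHAP implementations for tree ensembles typically average over a background dataset rather than a single baseline, so this is a simplified illustration and not the procedure used for the models above.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    'Absent' features take their baseline value; each feature's contribution
    is its marginal effect averaged over all orders of adding features.
    """
    features = list(x)
    orders = list(permutations(features))
    phi = {f: 0.0 for f in features}
    for order in orders:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = x[f]
            updated = model(current)
            phi[f] += (updated - previous) / len(orders)
            previous = updated
    return phi

def toy_np1_score(row):
    """Hypothetical score for profile NP1 with a mesh-by-habitat interaction."""
    small_mesh = row["mesh_mm"] < 40
    on_reef = row["habitat"] == "reef"
    return 0.1 + 0.4 * small_mesh + 0.1 * on_reef + 0.2 * (small_mesh and on_reef)

trip = {"mesh_mm": 30, "habitat": "reef"}       # observation to explain
reference = {"mesh_mm": 60, "habitat": "deep"}  # hypothetical baseline trip

phi = shapley_values(toy_np1_score, trip, reference)
print({name: round(value, 3) for name, value in phi.items()})
```

In this toy case the interaction term (0.2) is split equally between the two features, giving mesh size a contribution of 0.5 and habitat 0.2. The contributions sum to the difference between the score for the trip (0.8) and the score for the baseline (0.1).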
-
-
SHAP results of all gears models …
-
-
-