Introduction

Since 1985, human settlements have rapidly expanded into flood-prone zones through urban and industrial development1,2. In recent years, the intensity and frequency of floods, especially in densely populated and economically active areas, have increased, indicating a shift in flood occurrence patterns. Research shows that areas with high population density, active economies, heavy rainfall, and low elevation are most vulnerable to flood impacts3. Studies indicate that climate change may contribute to an increased risk of flooding, with evidence pointing to a potential link between rising temperatures and the likelihood of more frequent or intense storms in some regions4. In addition to changes to the frequency and intensity of storms, rapid population growth and uncontrolled development have increased exposure to floods, leading to higher societal vulnerability to natural hazards5,6,7,8,9. Therefore, understanding the mechanisms influencing floods and gaining deeper insight into their main causes can significantly help communities adapt, build resilience, manage and plan appropriately, protect human lives and ecosystems, and reduce potential damage10.

Flood occurrence probability maps, which consider a set of factors and criteria influencing them, such as climatic, hydrological, land cover, and topographic features11,12,13,14, provide reliable flood zoning. Accurate assessment of flood zones requires simultaneously considering a wide range of variables using advanced analytical methods. Recent advances in machine learning (ML) enable models to analyze complex data, identify hidden patterns, and uncover linear/nonlinear relationships between variables15,16. These capabilities make ML pivotal for flood prediction, zoning, and susceptibility management. One of the key outputs of these models is the production of flood occurrence probability maps, which, as essential tools in flood engineering and management, provide precise spatial information, enabling the identification and classification of high-susceptibility areas. These maps not only aid in planning and sustainable development but also play a vital role in increasing public awareness and facilitating decision-making for preventive measures and policy-making17. In recent years, numerous studies have investigated the efficiency of ML models in flood mapping and identifying areas under susceptibility15,18,19,20,21,22,23,24,25,26,27,28,29,30. The use of ML models for flood mapping and classification of flood-prone areas has emerged as an innovative and effective approach in recent studies.

Overall, these studies have used various ML techniques and models, such as Tree-based models, etc., to map flood probabilities and analyze variables. The results of these studies indicate that the use of ML models can improve prediction accuracy and provide a better understanding of the impacts of various variables on flood occurrence. Chowdhury et al., focusing on northeastern Bangladesh, evaluated the vulnerability to flash floods via ML models such as ANN, RNN, RFGB, and CatBoost, alongside rainfall indices, topographic features, and satellite data (Sentinel-1). A comparison of the model performance revealed that the ANN, with its greater accuracy in predicting high-susceptibility areas, was the most suitable model for this analysis31. Prasad et al., on Majuli Island, India, aimed to produce a flood susceptibility map via multitemporal radar images and six powerful ensemble ML models. After creating a six-year flood inventory map, 12 key variables (e.g., elevation, slope, rainfall, land use) were selected from 17 initial candidates using the Boruta algorithm and multicollinearity analysis. The RF model was identified as the most accurate model for flood zoning32. Gholami et al. developed a flood risk map in southern Iran by integrating a bidirectional long short-term memory (bLSTM) deep learning model for flood susceptibility mapping and the COPRAS multi-criteria decision-making (MCDM) model for flood vulnerability assessment. The results showed that topographic variables were the most influential factors, and the final model accurately classified flood risk into five categories, from very low to very high33. In another study, Li et al. applied ML models including K-means, BP, SVM, and RF along with hydrodynamic simulation of a heavy rainfall event using the TELEMAC-2D model to assess urban flood susceptibility. Their results demonstrated that the RF model outperformed the others in accurately identifying flood-prone areas, with its risk map closely matching observed inundation zones. This approach provides a reliable scientific basis for urban flood risk management and prevention34. Also, in another study conducted by Norallahi and Seyed Kaboli, the aim of the research was to develop an urban flood susceptibility map for Kermanshah, Iran, using limited data and ML models including RF, MaxEnt, Naïve Bayes, and Genetic Algorithm Rule-Set Production. The results showed that the RF model performed best with evaluation metrics such as Area Under the Curve (AUC)-ROC (99.5%) and Kappa (98%), and that distance to canal and land use were the most influential factors35. In another study aimed at developing a flood susceptibility zoning map, 22 influential factors and five ML models including Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Naïve Bayes (NB), Decision Tree (DT), and a hybrid model were used. The models were trained and evaluated using data from flood and non-flood locations. The results showed that the RF, GBM, and hybrid models performed best, each achieving an AUC value of 97%36. Wei and colleagues assessed urban flood susceptibility in Shanghai using five ML models (such as CatBoost and XGBoost). All models showed high accuracy, with CatBoost performing best at 95% accuracy. Key factors influencing flooding included high road network density, high population, proximity to rivers, and low land slope. The results showed that central areas of the city are at higher flood risk than peripheral areas. This research demonstrates that ensemble models can be useful tools for urban planning and flood risk reduction37.

Overall, although previous studies have widely employed individual ML models and their ensembles for flood susceptibility mapping, few have comprehensively assessed their combined predictive performance across multiple algorithms in large-scale, data-scarce, and geomorphologically diverse basins. A major contribution of this study lies in the selection of the Great Karun River Basin as a complex and strategically critical study area. This basin, which spans several provinces in southwestern Iran, has experienced numerous devastating floods in recent years, resulting in severe infrastructural and socioeconomic damages. Yet, despite its significance, it has remained underexplored in terms of advanced flood modeling using ML approaches. In this study, factors that have not been previously examined in flood susceptibility mapping for the study area such as distance to dams and snow cover were included. The significance of these factors was analyzed to determine whether they should be considered as key variables in future flood-zoning models. Thus, this research aims to evaluate the actual role of these factors in the modeling process and their impact on improving the accuracy of flood susceptibility mapping.

Our study introduces an innovative Voting-based ensemble framework that integrates the outputs of five different ML models including RF, XGB, LightGBM, DT, and GLM, at the pixel level. Ultimately, this integrated voting model outperforms all the individual models used in this study in terms of accuracy. Unlike previous approaches that relied on single-model predictions or simple model averaging, our method incorporates spatial agreement metrics among models during the integration process to enhance the reliability of predictions and reduce uncertainty in high-susceptibility areas. Given that the study area is large and complex, the proposed framework can also be useful for similarly complex basins. This framework is developed using multi-source data and various ML models, and the accuracy of GFD data have been validated against observational flood data. In addition to advancing innovation in ML-based flood modeling, this approach provides a transferable tool for assessing flood susceptibility in complex basins worldwide.

Materials and methods

Study area

The Great Karun River Basin, encompassing ten sub-basins, spans an area of approximately 66,700 square kilometers. It is considered a sub-basin within the larger drainage basin of the Persian Gulf and the Gulf of Oman. Within this basin lies the Karun River, the most voluminous, largest, and longest river in Iran, which plays a vital role in supplying the region’s water and hydrological resources. Approximately 74.6% of the Great Karun Basin consists of mountainous and elevated areas, whereas 25.4% comprises plains and low-lying regions. This basin is one of Iran’s most significant and strategically complex watersheds, as it encompasses parts of six major provinces: Khuzestan, Lorestan, Chaharmahal and Bakhtiari, Kohgiluyeh and Boyer-Ahmad, Markazi, and Isfahan. Khuzestan Province accounts for the largest portion of the basin, covering an area of approximately 27,320 square kilometers, whereas the smallest share belongs to Markazi Province, with approximately 980 square kilometers. The basin is characterized by substantial climatic diversity, ranging from arid and semi-arid areas in the southern zones to regions with higher precipitation in the central and western zones, as well as snow-prone areas in the east and northeast. These climatic variations result in significant spatial variability in precipitation patterns, temperatures, and evaporation rates across the basin. In addition, the basin’s diverse geography—which includes vast plains, high mountains, deep valleys, and lowlands, has a major influence on the flow and hydrological behavior of its rivers. Extensive dam construction has taken place within the Great Karun Basin, with over 40 dams built to date, significantly impacting its hydrological regime. Nevertheless, the changes caused by dam construction and large-scale inter-basin water transfer projects from the river’s main headwaters have introduced considerable challenges and complexities in the management and distribution of water resources in the basin. Moreover, the region has periodically experienced severe flooding, among which the devastating flood of spring 2019 stands out, causing substantial damage to infrastructure and local livelihoods. From a strategic perspective, the Karun River holds particular significance. It is the primary source of drinking, agricultural, and industrial water for Khuzestan, Chaharmahal, Bakhtiari, and other provinces within the basin. It also plays a crucial role in waterborne transportation and hydroelectric power generation via its dams. Consequently, the basin is recognized as one of the most critical, complex, and strategically important watersheds in Iran, playing a key role in national policy-making and planning. The geographical location of the study area is illustrated in Fig. 1.

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Map of the study area (created using QGIS v3.44 (https://qgis.org). Background basemap sources: Esri, Garmin, USGS, and NPS).

Datasets

To assess flood susceptibility in the study area, the spatial data utilized were categorized into four main groups: climatic data, hydrological data, land characteristics and land cover data, and topographic and geo-morphological data. All the data were defined within the WGS 84 UTM Zone 39 N coordinate system, and a spatial resolution of 300 m was adopted. All datasets used in this research were collected and processed from 2000 to 2019. A portion of these data was collected through official data-producing organizations and made freely available via reference websites. Additionally, more data was extracted and processed via the Google Earth Engine platform, which enables extensive and precise spatial data analysis. Table S1 and Table 1 presents the complete specifications of the data sources used, the types of data, and the criteria and sub-criteria associated with each category. This table outlines the systematic structure of the research data and illustrates how a combination of various data sources, including shapefiles, raster files, and computed datasets, was employed to analyze flood susceptibility. Additionally, for model validation, observed flood points were also used to evaluate the performance of the ML models.

Table 1 Details of the main criteria and sub-criteria in the Great Karun Basin.

Extraction of flood-prone areas and their spatial distribution

In this study, the Google Earth Engine platform was used to extract flood extent areas of the study region from the Global Flood Database (GFD)38. This database contains spatial maps of flood events and their extents for the period from 2000 to 2018. Each pixel in the images, with a resolution of 250 m, was accurately classified as either flooded or non-flooded/permanent water. Figure 1 shows the flooded areas in the study region during the period from 2000 to 2018.

As one of the most prominent examples in the history of flood events within the Great Karun River Basin, around the March 2019 flood in Great Karun basin with a particular emphasis on the Khuzestan Province stands as a vivid and alarming illustration of the consequences of unsustainable water resource management and unbalanced urban development in flood-prone regions of Iran. This event is considered one of the most severe and widespread hydrological phenomena in recent decades, triggered by intense, simultaneous, and widespread rainfall across major southwestern catchments. According to expert assessments, the flood caused extensive inundation of cities, villages, transportation infrastructure, railways, and agricultural lands, with cascading impacts reaching critical sectors such as oil, energy, commerce, industry, and agriculture. Key aggravating factors included the poor spatial planning of urban and rural settlements within riverbeds and floodplains, excessive reliance on upstream dam performance without contingency planning for emergency reservoir releases, disregard for the natural role of floodplains, and lack of flood-resilient urban planning practices.

Estimates indicate that the return period of this flood exceeded 500 years, classifying it as a rare but extremely high-susceptibility event in terms of intensity and spatial extent. The disaster highlighted, once again, the crucial importance of flood susceptibility mapping as a strategic tool for identifying high-susceptibility zones, managing flood susceptibility, guiding land-use planning, and informing long-term policy-making in natural disaster management. In this context, the integration of advanced technologies such as ML-based models plays a vital role in enhancing the accuracy, automation, and real-time updating of flood susceptibility modeling and vulnerability prediction processes.

In order to accurately map and delineate the flooded areas of the Great Karun River Basin, inundated during a catastrophic flood event occurred in March 2019, we applied a Normalized Difference Water Index (NDWI) change detection technique in Google Earth Engine (GEE). This methodology employed imagery from the Sentinel-3 OLCI sensor to contrast surface water conditions before and after the floods. The index was estimated for two time points, before (April 2018) and after flood end (April 2019). Following a positive increased NDWI (NDWI > 0.1) were new flooded areas that are related to flood affected areas were described. Therefore, we produced a new flood extent data set for the years 2000–2019 in the Great Karun Basin. The dataset applied in this study is available from Google Earth Engine, and it does not necessitate any particular copyright or permission for academic use.

Method

The overall methodology of the research is illustrated in Fig. 2. In the first stage, the factors influencing flood susceptibility assessment in the study area were identified. This stage involved selecting indicators and criteria that have the greatest impact on flood occurrence based on climatic, topographic, hydrological, and land cover characteristics. In the second step, spatial maps related to each influencing factor were generated via spatial analyses within a GIS environment. These maps were then used as inputs for ML models. In the third stage, training and testing datasets were created, and various ML models were trained. After training, the models were applied to maps of the influencing criteria, and flood susceptibility maps were generated for the study area based on each model. Consequently, the susceptibility maps produced by different models were combined using a voting models approach, where the final and integrated flood susceptibility map for the study area was generated based on majority voting among the models.Also, to examine the correlation between the criteria influencing flood occurrence, Pearson’s correlation test was used. This test measures the strength and direction of the linear relationship between two quantitative variables, and the Pearson correlation coefficient was used to numerically express the correlation between different criteria. The correlation results will be displayed using a heatmap to visually illustrate the relationships and the strength of the correlations.

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Flowchart of the proposed method (Generated using Microsoft PowerPoint version 2016).

Influential criteria

The criteria influencing flood susceptibility mapping for each geographical location are categorized into four main groups: climatic criteria, hydrological criteria, land cover and land characteristics criteria, and topographic criteria. The selection of these effective criteria was based on a comprehensive review of relevant literature as well as the expert opinions of specialists familiar with the region. Details of these criteria are provided in Table 1.

  • Climatic

This category includes four sub-criteria: precipitation, temperature, evapotranspiration, and snow cover. These sub-criteria represent the climatic parameters that influence the likelihood of flooding. Each of these sub-criteria may contribute directly or indirectly to flood occurrence. The combination of these variables leads to specific climatic patterns, and analyzing their interrelationships is essential for improving the accuracy of flood susceptibility assessments.

  • Hydrological

This criterion includes the sub-criteria of distance from rivers, distance from dams, drainage density, stream power index, and flow accumulation. This group focuses on the hydrological and physiographic characteristics of the watershed, each of which influences surface water flow behavior and the basin’s response to rainfall and runoff generation. Analyzing and integrating these sub-criteria enable the identification of flood-prone areas based on hydrological conditions.

  • Land Cover and Land Characteristics

This group examines the influence of surface features and land cover types on hydrological processes. Its sub-criteria include soil type, geological type, vegetation cover, and land use/land cover. These factors affect the infiltration and water retention capacity of the soil, the direction and intensity of surface runoff, and ultimately the likelihood and severity of floods.

  • Topographic

This category addresses the physical and morphological features of the terrain and their role in the movement and distribution of surface water. The sub-criteria include elevation, slope, topographic wetness index, curvature, and sediment transport index. These sub-criteria shape runoff patterns and flow behavior and play a key role in flood susceptibility assessment and optimal flood susceptibility management.

Mapping of influential criteria

To generate proximity maps for various locations, the Euclidean distance function in the Spatial Analyst tool was used. The Reclassify function in the Spatial Analyst tool was applied to the DEM to create the landform map. To generate maps of evapotranspiration, temperature, and precipitation, the Topo to Raster function (Spatial Analyst) was utilized. Other maps, such as distance from rivers, drainage density, land curvature, and various indices, including the Topographic Wetness Index39, Stream Power Index19, and Sediment Transport Index40 , were also generated via commands in QGIS. Various tools have been applied to raw data in this software to create criteria maps. For spatial analysis, modeling with ML algorithms requires standardization of the different criteria into a common unit for comparison. Criteria can have either a positive or negative relationship with the research objectives. Positive criteria prefer higher values, whereas negative criteria prefer lower values. Therefore, the minimum method was used for standardizing positive criteria, and the maximum method was applied for negative criteria.

Machine learning models

In this study, six ML models were employed for flood susceptibility modeling and analysis, including five base models and one voting model that integrates the outputs of all base learners. Most of these models are based on DT architectures, while the Generalized Linear Model (GLM) serves as a linear and interpretable benchmark. The selection of this model ensemble was intended to cover a wide range of analytical approaches from simple and interpretable models such as GLM to more complex and nonlinear tree-based algorithms. The selection of the machine learning model ensemble in this study was conducted deliberately and based on data-driven and analytical considerations. Within this framework, although Random Forest is itself an ensemble method composed of multiple decision trees, the simultaneous inclusion of Decision Tree as a base model plays an important role in analyzing decision-making behavior and interpreting the rule-based structure underlying the data. The DT model enables direct examination of decision paths, thresholds, and the relative contribution of variables, thereby serving as an interpretable reference for comparison with more complex models. In contrast, RF reduces variance and enhances prediction stability by averaging over an ensemble of trees, resulting in greater robustness against noise and overfitting. The concurrent use of these two models allows for the evaluation of differences between single-model rule-based learning and ensemble learning, thereby facilitating a clearer understanding of the impact of aggregation on prediction accuracy and stability. Overall, the selection of this ensemble of models was grounded in theoretical considerations within the framework of statistical learning theory and the analysis of the functional structure of the data. The Generalized Linear Model (GLM), as a representative of parametric models, assumes a linear structure in the feature space and enables formal statistical inference, significance testing of coefficients, and direct interpretation of variable effects. Accordingly, it was employed as a benchmark to assess the adequacy of linear relationships in the data. In contrast, the Decision Tree (DT) constitutes a nonparametric, piecewise approximation of the response function, capable of modeling nonlinear relationships, threshold effects, and high-order interactions without requiring distributional assumptions. Tree-based ensemble algorithms, including Random Forest (RF), XGBoost, and LightGBM, leverage the principles of bagging and gradient boosting to theoretically reduce variance, control bias, and improve convergence in the error space, thereby establishing a more optimal bias–variance trade-off. Moreover, these algorithms are computationally efficient for high-dimensional data and complex structural patterns, and they demonstrate greater robustness to noise and overfitting in terms of generalization performance. Consequently, the deliberate integration of parametric linear models, rule-based nonparametric methods, and advanced ensemble algorithms enables a comprehensive evaluation of both linear and nonlinear functional structures, enhances predictive stability, and achieves maximal predictive accuracy while preserving interpretability and generalization capability.

All in all, employing this ensemble of models allows for a comprehensive comparison of performance and accuracy across different ML approaches and contributes to the extraction of more reliable spatial patterns for identifying and classifying flood-prone and low-risk areas. This integrated approach provides a scientifically and technically sound foundation for developing flood susceptibility maps with high accuracy and interpretability. Finally, employing this ensemble of models enables a systematic comparison of performance and accuracy across different ML approaches, leading to more reliable spatial pattern extraction for identifying and classifying flood-prone and low-risk zones. This integrated approach provides a strong scientific and technical foundation for developing high-accuracy and interpretable flood susceptibility maps. Furthermore, to optimize the hyperparameter values of all models, the Grid Search algorithm41,42,43 was applied to ensure optimal model performance and parameter tuning. In addition, the dataset was randomly divided into two subsets, with 80% of the data used for training and 20% reserved for testing the models’ predictive performance. It is important to note that, due to the nature of natural hazard modeling such as floods and the inherent class imbalance between flooded and non-flooded areas, a common statistical random oversampling technique44,45,46 was applied in the training stage to prevent the model from becoming biased toward the non-flooded class.

  • Extreme Gradient Boosting (XGBoost)

XGBoost is a state of the art ML algorithm which is a type of tree boosting model. Based on the construction of Gradient Boosting, XGBoost improves its model fitting performance by optimizing more advanced methods including regularization, shrinkage and using a technically sound way to handle missing values. In classification tasks, XGBoost generates a series of weak DT and iteratively combines them to create a strong model capable of explaining complex nonlinear relationships in the data accurately. These algorithms are designed to avoid overfitting and to support parallel computation, making them widely adopted in data science research47,48,49. In this study, the hyperparameters of XGBoost were optimized using Grid Search. The three most important hyperparameters were set as follows based on standard practice for raster-based classification: the number of estimators (n_estimators) was 200 to provide sufficient ensemble size, the maximum tree depth (max_depth) was 6 to balance model complexity and overfitting, and the learning rate (eta) was 0.1 to ensure gradual and stable convergence of the boosting process.

  • Decision Trees (DT)

DT is a supervised learning algorithm used for classification and regression problems. This model makes decisions based on the features of the data in a hierarchical, tree-like structure50. In general, DT algorithms are among the most commonly used and accurate ML methods for flood susceptibility mapping51. By analyzing environmental and hydrological data, these models identify flood-prone areas and generate flood susceptibility maps. In this modeling, the three most critical hyperparameters were set as follows: the maximum tree depth (max_depth) was 8 to capture sufficient complexity of spatial and environmental patterns without overfitting, the minimum number of samples required to split an internal node (min_samples_split) was 4 to prevent creating nodes from very small pixel subsets, and the minimum number of samples required at a leaf node (min_samples_leaf) was 2 to ensure stable predictions while preserving spatial detail.

  • Random Forest (RF)

The RF algorithm is an advanced and widely used supervised ML model that leverages an ensemble of decision trees, providing high capability for classifying and predicting complex and nonlinear data. This model generates multiple independent DT through random sampling of data and features and then aggregates their results via average or majority voting, which results in stable, accurate, and overfitting-resistant performance52. In flood susceptibility mapping, RF can uncover hidden patterns in the relationships among variables by integrating spatial, hydrological, and climatic data and generating high-accuracy spatial susceptibility maps53. These features make RF a powerful tool for susceptibility management, regional planning, and reducing flood damage. The three primary hyperparameters were set as follows: the number of trees (n_estimators) was 250 to ensure robust ensemble performance, the maximum depth of each tree (max_depth) was limited to 10 to balance complexity and overfitting risk, and the minimum number of samples required at a leaf node (min_samples_leaf) was set to 2 to maintain spatial resolution while stabilizing predictions.

  • Light Gradient Boosting Machine (LightGBM)

The LightGBM is an advanced gradient boosting framework, released by Microsoft which has been designed to be used for classification and regression type problems. It is built upon DT based learning, with emphasis on speed and memory efficiency, which makes it suitable for use even on very large datasets. Distinguished from level-wise growth, LightGBM grows the tree leaf-wise and can lead to lower loss and higher accuracy. It is also armed with features like Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which can handle diverse types of data, perform extremely well across all kinds of datasets54. For the present study, hyperparameters of LightGBM were optimized and the three most critical hyperparameters were set as follows: the number of boosting iterations (n_estimators) was 200 to provide sufficient ensemble strength, the maximum depth of each tree (max_depth) was 8 to prevent overfitting while capturing spatial patterns, and the learning rate (learning_rate) was 0.05 to ensure stable convergence and accurate modeling of complex flood-prone patterns.

  • Generalized Linear Model (GLM)

The GLM is a flexible and important statistical framework in supervised learning that allows for modeling relationships between independent and dependent variables, even when the distribution of the response variable is not normal55,56. Unlike classical linear regression, which assumes a normal distribution, GLM can accommodate different distributions (such as Poisson, binomial, gamma, and Gumbel distributions)57, making it a highly suitable option for environmental, climatic, and hydrological data, which are often non-normal and scattered. For the GLM applied in this research, three key hyperparameters were optimized using Grid Search: the family of the distribution (family) was set to binomial to suit the binary classification of flood-prone versus non-flooded pixels, the link function (link) was set to logit to model the probability of occurrence appropriately, and the regularization parameter (alpha) was set to 0.01 to prevent overfitting while maintaining interpretability.

Performance assessment

After generating flood susceptibility maps based on the outputs of different machine learning models, the study area was classified into five flood susceptibility classes: very low susceptibility (0–0.2), low susceptibility (0.2–0.4), moderate susceptibility (0.4–0.6), high susceptibility (0.6–0.8), and very high susceptibility (0.8–1.0). This classification was performed using equal interval thresholds to ensure consistent and fair comparison among the different models. The primary objective of this approach was to facilitate model comparison and to enhance the practical usability of the results for decision-making purposes.To assess the accuracy of the models used in producing these maps, the Prediction Rate and AUC index were employed. First, the area of each susceptibility class within the study area was calculated, and the percentage of each class relative to the total area of the region was determined. Next, the percentage of areas where actual flood occurrences were recorded and which fell into each class was calculated as the Producer’s Accuracy for that class. Finally, the prediction rate for each class was computed by dividing the Producer’s Accuracy of that class by the percentage of the class area in the entire study region. The general formula for calculating the prediction rate is presented as follows.

$${\text{Prediction Rate}} = \frac{{{\text{Producers}} {\text{Accuracy}}}}{{{\text{Area }}\;{\text{Percentage}}\; {\text{of}}\;{\text{the}} \;{\text{Class}}}}$$
(1)

The higher this rate is for high and very high-susceptibility zones, the better the model is at accurately identifying flood-prone areas.

In classification tasks, each model prediction can fall into one of four categories: TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative). In imbalanced datasets, where the number of samples in one class greatly exceeds the other, simple metrics can be misleading.This is why AUC serves as a key metric for evaluating model performance. AUC measures the model’s ability to distinguish between classes and is largely unaffected by class imbalance. The closer the AUC value is to 1, the better the model is at correctly identifying positive and negative instances, providing a more reliable overall assessment of performance. The higher the AUC value close to 1, the better model performance; when around 0.5, indicate that model classification ability is random or poor. The AUC (Area Under the Curve) formula, expressed as

$$\mathbf{A}\mathbf{U}\mathbf{C}= \frac{(\sum \text{TP}+\sum \text{TN})}{\text{P}+\text{N}}$$

Additionally, in this study, the Global Flood Database (GFD) dataset was carefully evaluated and compared with local observational data to ensure that the generated flood susceptibility maps were consistent with observed flood events, thereby enhancing the scientific validity and reliability of the results.

Ensemble of machine learning models (voting approach)

In this research, a voting ensemble ML technique (soft voting) was used to develop the ultimate flood susceptibility map in this study. Rather than fusing binary classification maps of the various models, this approach combines the probability outputs produced by different algorithms to improve predictive performance and robustness. Five base classifiers were applied: GLM, DT, RF, XGB, and LightGBM. This method enhances predictive performance, reduces individual model bias, and provides more stable and reliable spatial predictions, making it particularly suitable for flood classification tasks with complex and nonlinear relationships among environmental and hydrological factors.

All input conditioning factors were digitized as raster maps and converted into vectors, whereas the output variable was obtained from the inundation inventory map. The dataset was divided into training, validation and test sets by stratifying them, while handling class imbalance in the training data with the random oversampling of the minority (flooded) class. All these base models were trained separately and with equally-balanced weights, and their outputs were finally ensembled via a soft voting manner, i.e., the final flood probability of each pixel was calculated as the average of all predicted probabilities for them across the models.

The performance of the ensemble model was assessed in terms of performance, as well as the AUC of the receiver operating characteristics curve for all two sets – training, and test. ROC curves were drawn to represent the classification performance of ensemble model at various thresholds. Finally, the probabilistic predictions of the model were reconstructed to generate a spatially continuous flood probability map that was exported in the form of a GeoTIFF file. This ensemble collective voting method results in a more stable and accurate engine for flood susceptibility analysis by utilizing the advantages of several ML algorithms as well as reducing each model s bias58,59.

Result

The results obtained from the Pearson correlation matrix, as shown in Fig. 3, display the statistical linear relationships between various parameters influencing the probability and susceptibility of flooding. This matrix directly indicates how changes in one parameter can affect other parameters. For a more detailed examination of these relationships, the sub-factors affecting flood susceptibility were analyzed within four main groups: climatic factors, hydrological factors, land cover, and surface features (topography). The results of these analyses are presented in four separate correlation matrices, which are discussed in detail below. In the climatic factor group, relatively high correlations between some sub-factors are observed. For example, evaporation and temperature have a high correlation of 0.88, indicating a strong relationship between these two climatic variables. Evapotranspiration also has a moderate correlation with precipitation, with a value of 0.63. On the other hand, snow cover has a relatively weaker correlation with other factors, ranging from 0.43 to 0.61. This level of correlation is expected in this group, as climatic variables typically exhibit high interdependence. In the hydrological factor group, most sub-factors show low correlations with each other. The highest negative correlation in this group is between the cumulative flow and the flow power index, with a value of -0.43. Other sub-factors, such as distance to rivers and drainage density, have much lower correlations, generally approximately 0.09 or less. This indicates the relative independence of these variables from one another, which is considered advantageous for accurate modeling. In the land cover and surface features group, the correlations are generally very low. The highest positive correlation observed is between geology classification and land cover, with a value of 0.26. The other correlation coefficients in this group mostly range from -0.15–0.19, indicating a high degree of independence between these factors. This low spread in correlation coefficients contributes to improving the predictive ability of each variable independently.

Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

Pearson correlation heatmap between sub-criteria’s influencing flood susceptibility mapping (Generated using Python 3.11.4 (www.python.org) within the within the Google Colaboratory environment (https://colab.google/)).

Finally, in the topographic feature group, the range of correlations is broader. The highest negative correlation in this group is between the DEM and the TWI, with a value of -0.50. Additionally, a significant positive correlation exists between the STI and the curvature of the terrain, with a value of 0.56. The other correlation coefficients in this group range from 0.25–0.46. These correlation patterns are logical, as certain topographic features naturally correlate with each other.

A map of the flood occurrence probability criteria used in ML models is presented in Figure S1. This image includes a collection of environmental and geomorphological parameter maps generated for spatial analyses and hydrological and environmental modeling within a specific watershed. These maps were developed via remote sensing data, Geographic Information Systems (GIS), and DEM, covering a range of factors influencing natural and human processes at the regional scale. The top rows of the maps display indicators such as the DEM, Precipitation, and Evapotranspiration, which are critical in hydrological analyses, water resources potential assessments, and evaluation of the impacts of climate change on weather-related processes. The spectral variations in these maps represent the spatial variability of each parameter and are differentiated via standard color codes. Subsequent maps illustrate parameters like Temperature, the NDVI, Snow Cover, and Distance to Water Sources (streams and dams). These datasets play key roles in assessing ecological conditions, analyzing vegetation cover, and evaluating surface runoff potential. Additionally, maps of the TWI, SPI, and STI are used to model erosion and sedimentation, analyze surface processes, and gain better insights into watershed dynamics. Finally, maps such as slope, curvature, and landcover provide comprehensive information about the region’s morphological and anthropogenic land-use features. In particular, land-use maps, with detailed classifications, including forests, agricultural land, residential areas, water bodies, and rangeland cover, serve as key tools for land-use analysis, spatial planning, and assessing human impacts on the environment. To evaluate the relationships among the sub-criteria affecting flood occurrence, correlation analysis was conducted across four main groups: climatic, hydrological, land cover, and topographic factors. The results show that in the climatic group, there is a strong and significant correlation of 0.88 between evaporation and temperature, indicating the direct influence of increasing temperature on evaporation in the region. Additionally, precipitation shows a moderate correlation with both evaporation (0.63) and temperature (0.51), reflecting a triangular relationship among humidity, precipitation, and evapotranspiration processes in the area. In contrast, snow cover exhibited lower correlations with other climatic parameters, which may be due to its more independent behavior in response to seasonal changes. Within the hydrological group, correlations are generally low, showing relative independence among the parameters. The most significant negative correlation is observed between the cumulative flow and stream power index, with a value of -0.43, which is logical, greater flow accumulation leads to reduced flow concentration and diminished erosive force at local scales. The other correlation coefficients in this section are very low, close to zero, indicating that each variable contributes unique and non-redundant information to the model. In the land cover and surface characteristics group, the correlations are also very low. The highest positive correlation, 0.26, is observed between geology class and land cover, whereas other criteria, such as soil class and the NDVI, have very weak correlations. These results highlight the independence of these variables and their high informational value in flood susceptibility modeling. In the topographic feature group, the range of correlations is broader. The highest negative correlation is between elevation and the topographic wetness index, with a value of -0.50, indicating that moisture accumulation decreases with increasing elevation. Furthermore, the highest positive correlation is found between slope and terrain curvature, with a value of 0.56, reflecting a strong dependency of these features on the region’s physical structure. Overall, the correlations in this group are acceptable, and the relative independence of the criteria is maintained. The results of these correlation analyses indicate that the selected dataset is sufficiently diverse and independent, making it well suited for flood probability modeling. The presence of certain correlations, particularly within the climatic and topographic groups, is natural due to the inherent characteristics of the data and does not pose issues for modeling processes, especially given the ability of ML models to handle intra-group correlations.

Overall, this set of maps illustrates the influence of factors affecting flood occurrence across the watershed. As observed, elevation, slope, hydrological indices, and vegetation cover create varying patterns of flood susceptibility in different parts of the region. Downstream areas, particularly the southern and southwestern sections of the watershed, show higher values in elevation and related indices, indicating runoff concentration and increased flood risk. In contrast, maps of precipitation, temperature, evapotranspiration, and snow cover indicate that the northern and northwestern parts are more sensitive to flooding due to climatic conditions and surface characteristics. The spectral variability in these maps reflects the intensity and spatial distribution of each parameter, enabling a comprehensive analysis of the combined effects of natural and environmental factors on flood hazard patterns within the region. The spatial distribution of factors and correlation analyses confirm that the dataset is appropriate, both in terms of spatial variation and statistical structure, for flood susceptibility modeling and can serve as an effective input for predictive models.

Analysis of the spatial distribution maps of land cover, soil classes, and geological classes in the study area provides valuable information about the region’s physical and environmental characteristics and their impacts on flood susceptibility condition (Figure S2). The Great Karun River Basin exhibits a remarkable diversity in its physiographic, land cover, soil, and geological characteristics, all of which play a crucial role in shaping its hydrological behavior and flood dynamics. Analysis of environmental and spatial parameters indicates that the basin comprises a balanced combination of various land use types, including agricultural lands, rangelands, urban areas, alluvial plains, and regions with dense vegetation cover. This diversity reflects the influence of climatic variability, topographic complexity, and geological processes across the basin, which collectively affect soil permeability, runoff intensity, and ecosystem stability. From a land cover perspective, the coexistence of both dense and sparse vegetation across different areas has created conditions that, in some regions, facilitate higher water infiltration and reduced surface runoff, while in others with poor or bare surface coverage, the potential for increased runoff and flood vulnerability is more pronounced. Such heterogeneity contributes to a spatially variable hydrological response, where flood behavior is strongly influenced by local land surface and vegetation characteristics. In terms of soil properties, the presence of multiple soil types with distinct physical and chemical characteristics indicates a high variability in water retention and infiltration capacity throughout the basin. Soils with favorable texture and permeability play a key role in reducing flood risk and enhancing hydrological stability, whereas those with lower permeability and less stable structures can lead to greater surface runoff and localized flood susceptibility. Geologically, the basin encompasses a wide range of lithological and structural units, where differences in rock permeability, hardness, and formation structure cause significant spatial variation in surface water movement. The mixture of alluvial deposits, sedimentary formations, and impermeable bedrock layers in different zones contributes to diverse patterns of water storage, infiltration, and surface flow pathways across the basin. Overall, these characteristics demonstrate that the Great Karun Basin possesses a favorable ecological and geomorphological diversity, which, while increasing the complexity of its hydrological processes, provides a valuable foundation for more accurate flood susceptibility modeling and integrated water resource management. This structure underscores the importance of simultaneously considering the interactions among geological, soil, and land cover factors in any comprehensive flood assessment or spatial modeling framework for the basin.

Figure 4 presents a set of flood susceptibility maps for a specific region developed using geographical models to delineate various flood susceptibility levels across different parts of the area. The Figure includes five distinct panels, each classifying flood susceptibility levels via a color-coded system ranging from navy blue (very low susceptibility) to red (very high susceptibility). These maps offer a detailed analysis of flood threats in the region, with high spatial accuracy and scales ranging from a few kilometers to over 180 km. Navy blue areas represent regions with very low flood susceptibility and are typically characterized by geographic or hydrological features that reduce the likelihood of flooding. These may include elevated terrains or natural structures such as forests or dense vegetation, which help mitigate the impact of severe floods. In such areas, the flood susceptibility is significantly reduced because of favorable geological and climatic conditions. The turquoise and green areas correspond to low and moderate flood susceptibility levels, respectively. In these zones, floods may occur under specific conditions, such as intense rainfall or sudden changes in water flow. These areas are often affected by seasonal or environmental changes and may experience minor to moderate floods during certain periods. Careful planning for water resource management and optimal land use is essential in these regions. Finally, orange and red areas indicate high to very high flood susceptibility and are vulnerable to severe flood susceptibility. These zones are typically located near rivers, wetlands, or low-lying plains, where heavy rainfall or snowmelt can easily trigger destructive floods. These regions require urgent attention and the implementation of targeted measures to strengthen infrastructure and mitigate flood damage. These maps are particularly valuable for urban planning and disaster management. Since floods can cause severe damage to infrastructure, water resources, and human populations, incorporating these maps into local and national policymaking is crucial for identifying high-susceptibility zones and allocating resources accordingly. Moreover, similar maps can be used to simulate the impacts of climate change on flood patterns and identify new areas that may face increased susceptibility in the future. Especially in the context of climate change, which is leading to more intense precipitation and shifts in weather patterns, such maps have become essential tools for forecasting and reducing flood susceptibility. To summarize, these maps serve as vital tools for flood susceptibility assessment and can be utilized in scientific and applied research, environmental policymaking, and natural and water resource management. They also play a key role in developing susceptibility mitigation strategies and preparedness plans for managing flood-related disasters.

Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

Flood susceptibility mapping via ML models (XGB, DT, RF, LightGBM, GLM, & Voting). Generated using Python 3.11.4 (www.python.org) and QGIS version 3.44 (https://qgis.org).

The results for six models in Fig. 5 (including RF, XGB, LightGBM, DT, GLM and Voting Model) are strong evidence demonstrate divergence in predicting the flood occurrence probability in the research area since they all have their own feature to recognize areas at different risk level.DT and voting Model both predicted the largest proportions of the study area within the “very low flood probability” category (84.22% and 73.38%, respectively). This finding suggests that these models can accurately predict low-susceptibility areas and represent them appropriately, so as not to overestimate risk. The GLM attributed the smallest proportion of area (38.57%) to this category but a larger proportion of flood likelihood was assigned across the area by it against other models.

Fig. 5
Fig. 5The alternative text for this image may have been generated using AI.
Full size image

Percentage of the study area assigned to each flood susceptibility class by the different machine-learning models (XGB, RF, DT, LightGBM, GLM, and Voting model). Generated using Python 3.11.4 (www.python.org).

For the “low susceptibility” category, GLM has the largest portion at 24.72%, observe that XGB ranks with 22.58%and RF wins only by a strikingly distant from XGB, i.e., 16.25%. DT achieves a minimum value in this class of 6.98% indicating its ability to locate low-Risk areas with precision. Within the “moderate probability” class, GLM outnumbers the other models again with the highest percentage: 15.03%, and DT showed the smallest among them :1.48%, indicating that GLM can classify mid range flood susceptibility better than other models; other models are more useful for separating low risk areas from high risk ones. At “high probability” class, LightGBM with 4.67% is ahead GLM with 10.82%. DT has the middle position with 2.69%, but RF, Voting, XGB have comparable portions in this class (3.33%, 3.04% and 4.13% respectively). These findings show that the models differ in their response, i.e., sensitivity for the high flood susceptibility areas, while ensemble approaches such as voting compromise between sensitivity and correctness. At last of the classes, in “very high probability”, GLM has biggest fraction with 10.86%, followed by LightGBM and XGB respectively: 5.45% and 4.45%. RF, voting and DT – 5.04%, 4.39% and 4.63%, respectively. These results illustrate that the voting model is effective at identifying high-specific-susceptibility areas, as well giving reasonable and fair proportions to all probability classes, thus allowing for its use in flood-related management and planning. In general, the results show that each model has its own strengths and sensitivities in delineating areas with different flood probabilities. The voting model performs well and not too optimistic, aggregating various information from several models, good to cover low-, medium-, and high- susceptibility regions. This property makes it a quite applicable tool in urban planning, water resources exploitation and flood prevention, which reflects its great practicability.

The performance results from six models RF, XGB, LightGBM, DT, GLM and the voting model indicate that they are quite different in predicting flood occurrences with their true locations within the studied area (Fig. 6). These numbers are the ratio of actual flood points by flood susceptibility class (very low, …, very high) In the very low susceptibility category the XGB model has least percentage of 0.42%, followed by voting model with 0.47% then DT Models gives highest value at 4.89%. This suggests that the XGB and voting models successfully identified very low-risk areas, demonstrating good precision in separating truly safe-from-flood areas or not flood-inclined zones. The proportion is higher in GLM model (5.75%) and lower in voting model (0.96%) for Low Susceptible label. These results indicate that GLM tends to map out a broader range of low-susceptibility flood zones, while the voting model provides more focused and largely consistent spatial identification for this class.

Fig. 6
Fig. 6The alternative text for this image may have been generated using AI.
Full size image

Percentages of actual flood occurrence locations within each flood probability class in the study area, as predicted by the six ML models (XGB, RF, DT, LightGBM, GLM, and Voting model). Generated using Python 3.11.4 (www.python.org).

The share of GLM is also the largest (8.21%) for this class, followed by RF (5.41%), then XGB (5.11%). The voting model has the lowest percentage (2.58%), which implies its preference for concentrating on higher risk zones rather than spreading across mid-risk propensity range areas. For the high susceptibility class, GLM also has highest accuracy of 18.01%, while voting model can recognize only 8.65% of actual flood locations within this category. That’s a testament to the realism of the voting model, which gives high susceptibility only in places that are really at heightened risk of flooding (unlike other models that may overestimate flood risk across larger areas). Lastly, for the very high susceptibility class – most important category in flood hazard analysis; VM does one of the best performances with accuracy reaching 87.34% of true positive cases (i.e., actual occurrence points which are accurately classified). And then LightGBM, RF and XGB with 82.99%, 80.79% and 80.28%. On the other hand, DT (75.03%), and GLM (64.67%) results are less accurate in this key class.

Overall, the findings indicate that each model had different sensitivities and accuracies in classifying areas with different flood susceptibilities. Although GLM and LightGBM run quite well in medium–high looking classes, the voting model ultimately strikes the best trade-off between sensitivity and specificity. Because the voting model performance is consistent with reality and provides realistic flood outcomes, it picks up most of the real flood locations in the very high susceptibility class without significantly overestimating other classes. Thus, the voting model can be considered the most accurate and operational model to predict flood susceptibility mapping in the study area. It can serve as the useful and reliable tool for disaster management, urban planning as well as risk assessment of natural hazards.

The final evaluation results of the model performance in terms prediction rates (PR) are provided in Fig. 7. The comparison among models in terms of misclassification is well discrepant from one sensitivity zone to another according to the new updating date. These values are the ratios of the actual flood occurrence point within each susceptibility class (i.e., from very low to very high). The lowest value of 0.006% was achieved by the Voting model in the very low susceptibility class, followed by LightGBM and DT models with 0.007%, and 0.058% respectively. These results suggest that the Voting and LightGBM models successfully found genuinely low-susceptibility or flood-free zones, minimizing the classification of flood susceptibility on safe areas. For the low susceptibility class, in the same table, most percentage is 0.72% with DT model, while least is 0.072% with voting model. This to indicate that the voting model delivers a more reasonable and realistic low-susceptibility classification compared to DT which in general dismisses a wider volume of area in this class. In the moderate susceptibility class, DT model shows relatively higher percentage (2.12%) indicating that it has a tendency to predict more flood-at-risk point in medium susceptibility range. The voting model has the least share (0.43%), indicating that it will more precisely concentrate on high-risk zones rather than generalize relatively to mid-level susceptibility. Other models like RF (0.59%), XGB (0.45%) and LightGBM (0.58%) also show a pretty similar behaviour in this class. For the high vulnerability class, the DT model depicts as the best-performing model with 4.42% and GLM attains its lowest at 1.66%. The voting model ranks in between with 2.85% showing a more conservative behaviour by distributing high susceptibility only on areas which are actually flood prone. In the most susceptible class, i.e. the very high susceptibility class – the first category in flood vulnerability assessment—the voting model had the best classification, predicting 19.87% of true flood occurrence points. It is then XGB (18.05%), DT (16.22%), RF (16.02%) and LightGBM (15.20%). The GLM had the lowest for this main class, with only 5.95%. These findings point out that the voting model not only is effective in correctly pinpointing most danger zones, but it also ensures that risk is not overestimated on scarce-affected areas either.

Fig. 7
Fig. 7The alternative text for this image may have been generated using AI.
Full size image

Prediction Rate for different flood susceptibility classes obtained from various ML models. Generated using Python 3.11.4 (www.python.org).

Figure 8 presents the ROC curves for the six evaluated models (XGB, RF, DT, LightGBM, GLM, and the voting model), illustrating their predictive performance in terms of the AUC metric. From a statistical perspective, the voting model achieved the highest predictive capability with an AUC = 0.994, outperforming all individual models. This value, being very close to one, indicates an exceptionally high level of accuracy and strong discriminative ability in distinguishing actual flood-prone points from non-flooded areas. Following this, the XGBoost (AUC = 0.985), RF (AUC = 0.984), and LightGBM (AUC = 0.982) models also demonstrated very strong and stable performances, highlighting the robustness of tree-based and boosting algorithms in flood susceptibility prediction tasks. In contrast, the DT model, with an AUC = 0.972, showed moderate performance. Although this value still represents acceptable accuracy, its relatively lower performance compared to ensemble models can be attributed to the inherent limitations of single DT and their sensitivity to data noise. The GLM model exhibited the weakest performance with an AUC = 0.879, indicating that this linear model is less capable of capturing the complex and nonlinear relationships among environmental, topographic, and hydrological variables influencing flood occurrence. Overall, the ROC curve results and AUC values confirm that the voting model, by integrating the strengths of multiple ML algorithms, achieved an optimal balance between accuracy, stability, and generalization ability. Furthermore, the XGB, RF, and LightGBM models, with performance levels very close to that of the voting model, can also be considered reliable algorithms for flood susceptibility mapping across various regions. Conversely, simpler models such as GLM and DT demonstrated limited capacity in reproducing the complex spatial and hydrological patterns, especially in heterogeneous basins. In summary, the high accuracy and stability of the voting model indicate that it can serve as a reliable and realistic predictive tool for regional and local-scale flood hazard assessment, contributing effectively to early warning systems and flood risk management applications.

Fig. 8
Fig. 8The alternative text for this image may have been generated using AI.
Full size image

ROC Curve comparison for six models (XGB, RF, DT, LightGBM, GLM, and Voting model). Generated using Python 3.11.4 (www.python.org).

Validation of flood susceptibility mapping models via observe flooded points

To validate the flood susceptibility maps derived from the ML models, historical observed flood event information (2000–2020) was used. These data included a total of ~ 150 flood occurrences, which were field-referenced and documented within the high spatial-accuracy corresponding two authoritative sources: Khuzestan Water and Power Authority (www.kwpa.ir) and the Iran Water Resources Management Company (www.wrm.ir) (Fig. 9). Field flood points were overlain with model predicted flood prone zones in a GIS, during validation, to assess spatial convenience of observed phenomenon and model outputs. This assessment was taken as a reference for measuring their performance, accuracy and generalization of the models to predict flooding-prone areas. The validations indicated that both the employed models could effectively reproduce the spatial patterns of floods and delineate the high-risks areas with reasonable accuracy. Quantitative evaluations and comparison of the models’ performance in field data are reported next.

Fig. 9
Fig. 9The alternative text for this image may have been generated using AI.
Full size image

Assessment of observe flood points distribution and their correspondence with flood susceptibility classes predicted by ML models. Generated using Python 3.11.4 (www.python.org).

The distribution of the number of such observed flood points (Fig. 1a) among susceptibility classes (Very low, Low, Moderate, High, Very high) indicate that ML models ability to identify high susceptibility zones is different. As can be seen in the Figures, most of the models have been able to place most of the corresponding actuals for flood under High and Very High class category which indeed indicates high capability of identifying flood areas. Specifically, the voting ensemble model is the best performer in this regard (percentage of observed points correctly classified in High and Very High classes). To be exact, it has classified 77.7% of the actual/observed flood points into the Very High class. On a larger scale, 87.4% of such field-observed flood points were also categorized by the voting ensemble model as High and Very High and that was rated as a very good accuracy. The results demonstrate that the integration of models by voting form combined generalization to space effectively (superior than individual model) and reduces the possibility for error of single model.

Conversely, models that allocated higher proportion of known observations to Moderate (or Low) classes reveal certain conservativeness or less sensitivity on greater risk regions. In sum, the quantitative analysis and percentage distribution of flood point reveal that a majority of models were successful in mimicking spatial flood occurrence pattern with acceptable accuracy and specifically, voting ensemble which is more focused on capturing high exposed-risk areas can be a valid tool to predict flood-vulnerable zones. Summary These findings suggest that ensemble models to identify flood in high risk areas are highly sensitive and accurate, which further confirms the reliability of using these model types for studies regarding risk management and planning urban development.

The flood data of GFD show satisfactory accuracy and reliability for the application in modeling, hazard assessment and susceptibility mapping. The results indicate that such data sets are powerful for representing the spatio-temporal patterns of flood frequency and have well performance in distinguishing regions with differential risk. In addition, a literature review supports the soundness and good performance of GFD data since it has been extensively adopted for several international scientific studies as reliable and trustful benchmark for flood hydrological analysis. In general, these findings indicate that GFD dataset could be a useful and reliable tool for flood studies60,61,62,63,64,65.

Feature importance

The feature importance analysis of voting model (AUC = 0.994) is a useful interpretation tool in ML, especially for high-dimensional data (Fig. 10). The VM, as an ensemble of multiple models, has outperformed other models with respect to predict flood susceptibility mapping. Also, one of the popular measures to check feature importance like MDI has been used for this model52,66,67. When a feature is used to split a node in decreasing the impurity, this decrease is allocated to the feature, so that we can have an importance score for each feature. The three most important predictor variables based on the voting model of feature importance are Distance to Stream (1.20), Slope (1.14) and NDVI (0.91). All these attributes are core for hydrological, environmental and erosional modelling. Distance from streams and slope are directly related to water movement, surface runoff, and erosion risk. In contrast, NDVI reflects vegetation condition and indirectly affects the ecological and hydrological processes. The strong influence of such terrain and vegetation variables on the model implies a strong need for such environmental constraints in model predictions. At a mid-level of importance additional features such as DEM (0.76), Flow Accumulation (0.75), Distance to Dam (0.70) and climatic variables which are Precipitation (0.66) and Temperature (0.51). These attributes are generally used as secondary variables in terrain and hydrological analyses. For example, DEM and Flow Accumulation combined with slope and proximity to water bodies support the approximation of surface water flow direction and amount. Climatic drivers as rainfall and temperature although not nearly as important on their own, together with the topographic and biological variables, they improve model performance. Their low importance indicate fairness of the model in application of geological and meteorological evidence to the lithological classification. With less importance, the Geology Class (0.20), Soil Class (0.18), and Snow Cover (0.11) features are seen at the lower end of this spectrum. Their low contribution could be related to the limited variability in the study area as well as data quality or coverage restrictions. Further, some of these features may have high correlations with more important variables (i.e., DEM or NDVI), causing the model to select directly dominant predictors. Although such components could conceivably be pruned in dimensionality reduction, the possibility of their acting as complimentary maps under other spatial or temporal conditions should not be dismissed. In summary, the analysis for feature importance of voting model not only verifies the importance ranking other than features but also yields effective implications for model optimization and feature engineering, leading to an acceptable performance in middle and small floods susceptibility prediction by this model.

Fig. 10
Fig. 10The alternative text for this image may have been generated using AI.
Full size image

Voting model feature importance via the MDI algorithm. Generated using Python 3.11.4 (www.python.org).

Discussion

In the present study, climatic, hydrological, topographic and land cover parameters were considered as the main factors during flood susceptibility mapping assessment. Rainfall, temperature, evapotranspiration and snow cover were found to be important for simulating the flood process. Because the two processes are highly correlated, rainfall and evapotranspiration play important roles in the surface flow and soil moisture changes that control flooding. In hydrological terms, distance to river, density of first order streams, stream power index and cumulative flow are good predictors for simulating runoff behavior and delineating flood prone areas that also enhances water movement pattern accurately. Topographic variables including elevation, slope and topographic wetness index affected the structure of land as well as water course, permitting accurate assessment of flood routing and storage in shallow areas. Lastly, land cover factors such as soil texture, vegetative coverage and land use were affecting water infiltration potential and runoff distribution. High vegetation cover or permeable soils decreased the chance of flooding. All of these characteristics enhance the precision of ML algorithms in flood susceptibility mapping and high-risk area modeling. In the performance assessment, AUC = 0.994 was obtained for voting model, which demonstrates that it is the most dependable and stable model to predict flood susceptibility mapping. The XGB (AUC, 0.985), RF (AUC = 0.984), and LightGBM models (AUC = 0.982) had strong discriminative power that was consistent with the validation results; moderate such predictive ability was noted for the DT model (AUC = 0.972), whereas low accuracy was conveyed by the GLM (AUC = 0.879).

The results show that by considering combined multi-dimensional environmental-hydrological and climatic data, the voting model produces a proper mapping of actual flood-prone areas with high susceptibility, without overestimating risk in low-susceptibility areas. XGB, RF and LightGBM models also achieved relatively good prediction results, combining sensitivity to-cumulative amount of drugs administered, the voting model provides a marginally finer balance between sensitiveness in cumulative amount and prediction accuracy.

In general, the results prove that ML models, particularly as an ensemble combine model were strong and powerful to predict flood susceptibility mapping and impact simulation, on the other hand, the successful applications of tree-based algorithms in accurately delineating the boundaries of inundated regions makes it a suitable option for spatial analysis and modeling related to natural disasters53,68,69,70,71,72,73.

Limitations and future works

This study is subject to a set of conceptual, data-related, and methodological limitations that may influence the accuracy, interpretability, and generalizability of the findings and therefore warrant more in-depth consideration in future research. A clear articulation of these limitations not only facilitates a proper understanding of the results but also provides a coherent framework for guiding and advancing subsequent studies. First, this research is primarily based on remote sensing data and data-driven approaches. While such data enable extensive spatial coverage and the integration of multi-source information, they inherently require the acceptance of implicit assumptions regarding data quality, spatiotemporal resolution, and the degree to which complex physical processes are represented. Intrinsic errors in satellite imagery, uncertainties in the derivation of hydrological variables, and limitations in capturing certain subsurface processes may propagate uncertainty into machine learning models and ultimately into the resulting output maps. From a conceptual standpoint, the primary focus of this study is on flood susceptibility, which mainly represents the physical likelihood of flood occurrence based on natural and environmental conditions. However, the distinction between flood susceptibility and flood risk which additionally incorporates exposure and vulnerability components has not been explicitly addressed. This distinction is of fundamental importance, as flooding is a highly dynamic phenomenon shaped by complex interactions among natural drivers, temporal variability, and human influences. Accordingly, future studies are encouraged to extend the analytical framework beyond susceptibility by integrating infrastructural, economic, social, and demographic indicators, thereby enabling a more comprehensive assessment of flood risk and enhancing the practical relevance of the results for disaster management and regional planning.

Furthermore, the Karun River basin examined in this study exhibits considerable climatic heterogeneity, with precipitation patterns showing strong spatial and temporal variability. Under such conditions, both riverine flooding and pluvial (rainfall-induced) flooding may play significant roles. Despite the use of multi-source and multi-resolution datasets, precipitation one of the primary flood-triggering factors was incorporated into the model mainly in the form of averaged values. This may lead to an unavoidable simplification of complex rainfall-runoff dynamics and may fail to capture part of the system’s actual variability. Consequently, the use of high–temporal-resolution precipitation data, extreme rainfall indices, and radar-based observations in future research could improve the representation of hydrological processes. At the methodological level, although the performance of different machine learning models was compared, strengthening the scientific foundation of the analyses requires greater attention to result interpretability and process-oriented support. In this regard, future studies are recommended to integrate or at least benchmark data-driven model outputs against physics-based hydrological or hydraulic models (such as HEC-RAS, HEC-GeoRAS, HEC-HMS, and MIKE packages) to enable a more comprehensive evaluation of model behavior. In line with these considerations, future research directions may focus on advancing and refining machine learning models through hybrid approaches, particularly CNN-Transformer models and CNN-Ensemble Boosting, which are capable of simultaneously capturing temporal and spatial patterns and thus offer substantial potential for improving predictive accuracy. Moreover, hyperparameter optimization using meta-heuristic algorithms such as the INFO optimizer74 and the Gradient-Based Optimizer (GBO)75 can further enhance model performance under varying climatic and geographic conditions and contribute to improved model stability. In addition to the limitations discussed above, in many flood susceptibility studies including the present research continuous model outputs are classified into discrete classes using predefined and largely arbitrary thresholds to facilitate result presentation and interpretation. Although this approach can provide an organized and interpretable visualization of the results, the direct physical meaningfulness of such thresholds with respect to real flood-generating processes is often not explicitly established. As a result, this practice may lead to an oversimplification of complex system dynamics and introduce additional uncertainty into the interpretation of flood susceptibility maps. Therefore, future research may benefit from moving beyond fixed, arbitrary thresholds toward probabilistic or data-driven classification approaches. Methods such as probabilistic distribution analysis of model outputs, statistical clustering techniques, or unsupervised machine learning algorithms could help identify thresholds with stronger statistical support and greater physical interpretability, ultimately improving the accuracy and reliability of flood susceptibility and risk maps.

Moreover, the development of near–real-time monitoring and flood mapping systems represents another important research direction that can be achieved through the deployment of IoT-based smart sensors76 and the integration of multi-source data, including precipitation, runoff, soil moisture, land surface temperature77, and high-temporal-resolution satellite imagery. Such approaches facilitate rapid flood prediction and early warning and play a critical role in reducing flood-related damages and improving disaster management. Finally, flood extent mapping using different satellite data sources particularly Sentinel-1 (Synthetic Aperture Radar) and Sentinel-2 can be explored through comparative studies to systematically evaluate the advantages and limitations of each sensor, especially under cloudy conditions, differences in spatial resolution, and access to long-term historical datasets. The outcomes of such studies can guide optimal data selection for future flood modeling efforts and contribute to improving the accuracy of flood susceptibility and risk maps. Overall, addressing these limitations in parallel with the proposed future research directions can facilitate the development of more advanced flood prediction and crisis management frameworks and support more informed decision-making in urban planning and watershed management.

Conclusions

In this study, a voting ensemble model was developed to enhance the accuracy of flood susceptibility mapping, and its performance was evaluated in comparison with individual ML models such as GLM, DT, XGB, LightGBM, and RF. Two distinct datasets, including observational flood data from Iran’s Karun Basin and satellite-based GFD data, were employed for model training and validation. The results demonstrated that the voting model achieved the highest accuracy and stability (AUC = 0.994) in predicting highly flood-susceptible areas and effectively identified spatial flood-risk patterns. The model showed strong agreement with observed data, particularly in the downstream parts of the basin and the Khuzestan Province. Feature importance analysis revealed that variables such as distance to stream, slope, and NDVI had the greatest influence on flood susceptibility mapping, while precipitation, temperature, DEM, and distance to dam played complementary roles. These findings highlight the significance of integrating hydrological, topographic, and environmental factors for improved flood hazard assessment. Overall, the results indicate that ensemble-based ML models, especially the voting model, are effective tools for flood susceptibility prediction and for supporting risk management, sustainable urban development, and natural hazard assessment. Moreover, combining field-based and satellite-derived data within this framework provides a robust approach applicable to other watersheds in Iran and similar regions. Future research should consider incorporating long-term climatic variations to further improve the model’s predictive performance and generalizability.