Low flows hydrological regionalization and definition of homogeneous regions using multivariate statistical analyses in the Uruguai River watershed, on the Rio Grande do Sul State side, Brazil

Regionalização hidrológica de vazões mínimas e definição de regiões homogêneas usando análise estatística multivariada na bacia hidrográfica do Rio Uruguai, lado do estado do Rio Grande do Sul, Brasil


Introduction
Low flow estimation is essential for water resources management, water quality regulation and habitat protection (Smakhtin, 2001; Copatti and Copatti, 2011; Hirschmann, 2015; Schork and Zaniboni Filho, 2018; Requena et al., 2018). Traditionally, statistics have been widely used to measure the natural variability of low flows (Tsakiris et al., 2011). Several low flow statistics are useful for assessing seasonal water use and habitat needs, with the flow duration curve (FDC) being the most used (Hughes and Smakhtin, 1996; Assani et al., 2011; Mehaiguene et al., 2012; Pugliese et al., 2016; Foulon et al., 2018).
An FDC is a graphical representation of the cumulative distribution of percentile flows in a watershed (Pugliese et al., 2016) that makes it possible to identify the flow equaled or exceeded for a given percentage of time (Fouad et al., 2018). Percentile flows are used to make decisions in a variety of fields, such as environmental regulation (Over et al., 2014), transportation engineering (Dudley, 2015), and drought planning (Miller and Fox, 2017). The higher the percentile, the longer that flow is equaled or exceeded in the river, and the lower the flow value. Despite providing an important characterization of streamflow behavior in a watershed, FDCs require streamflow data to be constructed (Lin and Wang, 2006). According to Pugliese et al. (2016), the problem is greater in ungauged watersheds, where the need to understand streamflow behavior is greatest.
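The construction of an FDC and the reading of Q90 and Q95 from it can be illustrated with a minimal sketch. The Weibull plotting position, the linear interpolation and the synthetic record are assumptions made here for illustration; the text does not state which plotting position was used.

```python
def flow_duration_curve(daily_flows):
    """Return (exceedance_percent, flow) pairs, from the highest to the lowest flow."""
    flows = sorted(daily_flows, reverse=True)
    n = len(flows)
    # Weibull plotting position (assumed here): rank / (n + 1), in percent.
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(flows, start=1)]


def flow_exceeded(daily_flows, percent):
    """Flow equaled or exceeded `percent` of the time (linear interpolation on the FDC)."""
    curve = flow_duration_curve(daily_flows)
    if percent <= curve[0][0]:
        return curve[0][1]
    for (p_lo, q_lo), (p_hi, q_hi) in zip(curve, curve[1:]):
        if p_lo <= percent <= p_hi:
            weight = (percent - p_lo) / (p_hi - p_lo)
            return q_lo + weight * (q_hi - q_lo)
    return curve[-1][1]


# Synthetic record (hypothetical): daily flows of 1, 2, ..., 99 m3/s.
flows = [float(q) for q in range(1, 100)]
q90 = flow_exceeded(flows, 90.0)
q95 = flow_exceeded(flows, 95.0)
print(q90, q95)
```

Because Q95 is exceeded more often than Q90, it is always the smaller of the two values.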
This problem can be addressed through regionalization. Regionalization is a useful tool for extrapolating hydrological information to sites that lack monitoring or an adequate historical data series, using information from previously monitored regions (WMO, 2008). Several works have proposed this type of approach around the world (Farhan and Al-Shaikh, 2017; Fouad et al., 2018; Pagliero et al., 2019; Yang et al., 2020) and in Brazil (Pruski et al., 2012; Elesbon et al., 2015; Comini et al., 2020; Lelis et al., 2020). The technique involves two main tasks: delineating the hydrologically homogeneous regions and developing a regional estimation method (Lin and Wang, 2006). The first is accomplished using a set of data that represents the watershed characteristics related to low flows; watersheds are grouped according to the similarity of these characteristics. The second is the application of a regional regression model to each homogeneous region, which provides well-fitted equations for estimating streamflow values.
Multivariate techniques such as cluster analysis (CA) by hierarchical methods have been widely used in hydrology to group streamflow data and outline hydrologically homogeneous regions (Chiang et al., 2002; Nosrati et al., 2015). Hierarchical methods are a well-known group of algorithms that investigate a data structure at various levels (Chiang et al., 2002; Lin and Wang, 2006; Nosrati et al., 2015; Farhan and Al-Shaikh, 2017; Fouad et al., 2018). One problem with hierarchical methods is determining the exact number of clusters (Lin and Wang, 2006). Charrad et al. (2014) presented a variety of objective functions, called cluster validity indices, to identify the ideal number of clusters in a dataset. In general, the definition of hydrologically homogeneous regions is still much discussed by the scientific community, especially regarding the uncertainty generated by its highly subjective criteria. Elesbon et al. (2015) defined the homogeneous regions using the inertia between jumps criterion, in which the first visible discontinuity of the graph is taken as the cutoff point. However, the authors do not discuss how these regions were validated. Beskow et al. (2016), analyzing the low flow Q90 in Rio Grande do Sul state (Brazil), defined homogeneous regions using artificial intelligence. The clusters formed were validated using the homogeneity test, termed the H test by Hosking and Wallis (1993). Beskow et al. (2016) discuss the importance of validating homogeneous regions to reduce the subjectivity and uncertainties inherent to this process. Regardless of the method chosen, there is a lack of consistency and a certain subjectivity in the definition of hydrologically homogeneous regions from the dendrogram (hierarchical methods) and in the predefinition of the number of groups (nonhierarchical methods).
Another widely used multivariate statistical technique is discriminant analysis (DA). DA has been used to validate homogeneous regions, commonly generated by CA (Lin and Wang, 2006; Farhan and Al-Shaikh, 2017). In addition, as an advantage over cluster validity indices, DA makes it possible to explain the differences between homogeneous regions according to the variables used (Chiang et al., 2002). Therefore, the objective of this paper is to identify the number of homogeneous regions and generate regional regression equations to estimate low flows in the Uruguai River watershed, on the Rio Grande do Sul state side (Brazil), using multivariate statistical analyses.

Material and Methods
The methodological development for low flow hydrological regionalization and definition of homogeneous regions in the Uruguai River watershed, on the Rio Grande do Sul state side (Brazil), consisted of using multivariate statistics, with cluster analysis (CA) coupled with discriminant analysis (DA) and multivariate regression. Several studies in the field of water resources planning and management use similar methodologies (Byzedi et al., 2014; Youssef et al., 2015; Ahmadi et al., 2018).
The methodology makes use of the GIS (Geographic Information System) environment, in order to make the management and analysis of spatial data easier (Fraga et al., 2019). In this study, multivariate techniques were used to increase the confidence of the results obtained, especially about the definitions of homogeneous regions, followed by regression analysis to obtain regional equations. The flowchart of the methodology used is shown in Figure 1. The following topics detail the main steps for applying the methodology.

Study Area
The Uruguai River watershed is located in northwestern Rio Grande do Sul state (72.6%) and southern Santa Catarina state (27.4%). This paper covers only the part of the watershed within the state of Rio Grande do Sul (Figure 2), where the main water uses that depend on low flows are located (e.g. agroindustrial activities and hydroelectric plants). Another reason for working only with Rio Grande do Sul is that the water use rights criteria differ between the two states. While Santa Catarina uses the streamflow equaled or exceeded 98% of the time (Q98), Rio Grande do Sul uses the streamflows equaled or exceeded 90% (Q90) and 95% (Q95) of the time. In this work, regional equations for Q90 and Q95 were obtained.
In this study, the Uruguai River watershed encompasses two biomes. The Alto Uruguai is in the Mata Atlântica biome, with an original cover predominantly of Araucaria forest (Mata de Araucárias). Along the course of the river, however, there is a transition from the Mata Atlântica to the Pampa biome, where the Southern fields prevail. The whole watershed has a subtropical climate, with temperatures below 0°C in winter and above 30°C in summer. Land use and vegetation cover in this watershed (Rio Grande do Sul state) are classified as mixed, comprising agriculture, pasture and forest (Collischonn and Tucci, 2005). The predominant soils are Reddish Brunizem and several types of Laterite, all of them with a clay texture. The northern part of the Uruguai River watershed lies on the Southern Brazilian basaltic flow. According to Collischonn and Tucci (2005), this is hydrologically important because aquifers in this type of rock have a small water storage capacity, except where fracture density is high and in the sedimentary formations located in the southern part of the watershed. The Uruguai River watershed, in Rio Grande do Sul state, is in the geotectonic province of the Paraná Sedimentary Basin. According to Nakase (2008), low flows in the watershed are influenced mainly by the sedimentary formations, which increase the storage area.
The average annual precipitation is 1784 mm, with an average annual temperature between 16 and 20°C and an average annual evapotranspiration of 1041 mm (MMA, 2006). According to Collischonn and Tucci (2005), rainfall is relatively well distributed throughout the year. The natural water availability of the Uruguai River watershed is greatly influenced by important spatial and temporal variations in the rainfall regime, which is reflected in the activities developed, mainly in agriculture (MMA, 2006). The higher streamflow values occur from July to October, with extreme events in September and October, and the lower streamflow values occur from December to April, with extreme events in January and March (Nakase, 2008).
Tormam, M. F.; Guedes, H. A. S.; Bork, C. K.; Fraga, M. S.
The most serious cases were registered in the lower middle section, where rice is irrigated and great harvest losses occurred; in the non-irrigated soybean areas, where the losses were even greater; and, most seriously, in the northern region of the basin, where supply problems led to severe rationing (MMA, 2006). It is thus clear that extreme drought events, when they occur, have had their effects amplified by the intensive use of water resources in the watershed. However, this information has not yet been systematized into a history of these events, apart from the Civil Defense emergency records (MMA, 2006). It is therefore not possible to state with certainty what causes these events, but they are isolated events that have been tending to become cyclical.

Data Selection Criteria
Hydrometeorological data were collected from the HidroWeb (Hydrologic Information System) website, the database of the National Water Agency (ANA, Agência Nacional de Águas e Saneamento Básico). ANA is responsible for implementing the National System of Water Resources Management (SINGREH) in Brazil. The database comprised streamflow historical series with at least 10 years of records (Cupak, 2017) and at most 31 days of missing data per year, so that missing data did not exceed 10% of a series (Garcia et al., 2017). For rainfall historical series, the criterion adopted was at least 10 years of continuous records (Caldeira et al., 2015). Thus, 46 streamflow gauging stations and 183 rainfall gauging stations were chosen. The low flows were then calculated from the daily records, using the Q90 and Q95 quantiles.
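The streamflow selection criteria can be expressed as a simple filter. The sketch below is one possible reading, assumed here: years with more than 31 missing days are discarded, and a station is kept when at least 10 valid years remain. The function name and the example stations are hypothetical.

```python
def passes_screening(missing_days_per_year, min_years=10, max_missing_days=31):
    """Check a streamflow station against the record-length and gap criteria.

    missing_days_per_year: dict mapping year -> number of missing daily values.
    Assumption: years above the gap limit are dropped before counting years.
    """
    valid_years = [year for year, missing in missing_days_per_year.items()
                   if missing <= max_missing_days]
    return len(valid_years) >= min_years


# Hypothetical stations used only to exercise the filter.
station_ok = {year: 5 for year in range(2000, 2012)}      # 12 years, few gaps
station_short = {year: 0 for year in range(2000, 2006)}   # only 6 years
station_gappy = {year: 40 for year in range(2000, 2015)}  # too many gaps each year

for name, record in [("ok", station_ok), ("short", station_short), ("gappy", station_gappy)]:
    print(name, passes_screening(record))
```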
Sub-basins for each streamflow gauging station were delineated using the Digital Elevation Model (DEM) from Shuttle Radar Topography Mission (SRTM) images with a spatial resolution of 30 m. The following physiographic characteristics of each sub-basin were considered in this paper: area (A, in km²), perimeter (P, in km), and slope (S, in %). The basin centroid (X and Y centroids, in km) was used as a positioning variable, which may influence the delimitation of homogeneous regions and has the potential to describe low flow regimes (Vezza et al., 2010; Beskow et al., 2016; Fouad et al., 2018). The climatic variables total annual precipitation (p, in mm) and seasonal total precipitation in summer (PS1, in mm), autumn (PS2, in mm), winter (PS3, in mm) and spring (PS4, in mm) were also used. The climatic variables were obtained using the Thiessen polygon method (Schneider et al., 2017; Araújo et al., 2018). This method originates from Voronoi diagrams (Aurenhammer, 1991) and assumes that precipitation at any point in the basin equals that of the nearest rainfall station, so that the basin value is the area-weighted average of the stations. In this way, it is possible to trace the areas of influence of the stations and characterize the spatial variability of precipitation (Souza et al., 2017).
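The Thiessen weighting can be approximated without building the polygons explicitly. In the sketch below, the basin is sampled on a regular grid, each grid point is assigned to its nearest station, and the share of points per station is used as the area weight. The square basin, the station coordinates and the precipitation totals are hypothetical, and a GIS implementation would use the actual polygon areas.

```python
import math
from collections import Counter


def thiessen_weights(basin_points, stations):
    """Approximate Thiessen weights as the share of basin points nearest to each station."""
    counts = Counter()
    for px, py in basin_points:
        nearest = min(
            stations,
            key=lambda name: math.hypot(px - stations[name][0], py - stations[name][1]),
        )
        counts[nearest] += 1
    total = len(basin_points)
    return {name: counts[name] / total for name in stations}


def basin_precipitation(weights, station_precip):
    """Area-weighted basin precipitation from station totals."""
    return sum(weights[name] * station_precip[name] for name in weights)


# Hypothetical 10 km x 10 km basin sampled at the centres of 1 km cells.
basin_points = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
stations = {"A": (2.0, 5.0), "B": (8.0, 5.0)}   # station coordinates, km
station_precip = {"A": 1600.0, "B": 1800.0}     # annual totals, mm

weights = thiessen_weights(basin_points, stations)
p_basin = basin_precipitation(weights, station_precip)
print(weights, p_basin)
```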
The standard z-scores of all variables were then used in the multivariate statistical analysis to lessen the effects of differences in measurement units and variance and to render the data dimensionless (Gulgundi and Shetty, 2018). The normality of the standardized variables was then assessed by the Anderson-Darling test (p < 0.05), as a condition for performing the multivariate analysis (Gulgundi and Shetty, 2018).
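A minimal sketch of the standardization step follows. The use of the sample standard deviation (n - 1) is an assumption, and the two intermediate area values are hypothetical; only the minimum and maximum areas come from the results reported later in the text. The normality test itself is not reproduced here.

```python
import statistics


def z_scores(values):
    """Standardize a variable to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(v - mean) / sd for v in values]


# Drainage areas in km2: the extremes are reported in the text, the rest is hypothetical.
areas_km2 = [349.7, 1200.0, 5400.0, 10153.8]
z_area = z_scores(areas_km2)
print([round(z, 3) for z in z_area])
```

After this transformation, variables measured in km², km, % and mm share a common dimensionless scale, so no single variable dominates the distance calculations.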

Cluster Analysis (CA)
CA is a multivariate technique that groups objects based on their characteristics, so that each object is similar to the others in its cluster according to a predefined selection criterion. In this study, a hierarchical method was used to form homogeneous groups. This method is composed of a defined number of algorithms that examine the data structure at various levels, and the results are represented as a dendrogram (tree diagram) (Massart and Kaufman, 1989; Chiang et al., 2002).
In the CA, the physiographic (A, P, S), positioning (X and Y centroids), climatic (p, PS1, PS2, PS3, PS4) and low flow (Q90, Q95) variables of each sub-basin were used. Euclidean distance and Ward's method (Ward Jr., 1963) were used to identify homogeneous regions, following various studies on low flow regionalization (Nosrati et al., 2015; Farhan and Al-Shaikh, 2017). The Euclidean distance (Equation 1) expresses the similarity between two samples, and a distance can be represented by the difference between the analytical values of the samples (Taoufik et al., 2017).
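Equation 1 is not reproduced in this text. For reference, a standard form of the Euclidean distance between sub-basins i and j is sketched below; the symbols (m standardized variables indexed by v, with x_{iv} the value of variable v in sub-basin i) are assumptions and need not match the original notation.

```latex
d_{ij} = \sqrt{\sum_{v=1}^{m} \left( x_{iv} - x_{jv} \right)^{2}}
```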
In this paper, the number of cases k was represented by the number of sub-basins. Ward's method uses an analysis of variance approach to evaluate the distances between clusters, minimizing the sum of squares of any two clusters that can be formed at each step (Lin and Wang, 2006; Tziritis et al., 2016).
The spatial variability of the sub-basins in the study area was determined from the Dlink/Dmax ratio, which is the linkage distance for a particular case divided by the maximal linkage distance. To standardize the linkage distance, which is represented on the y-axis, the quotient is then multiplied by 100 (Singh et al., 2005; Gulgundi and Shetty, 2018). The length of the elements demonstrates the proximity of sub-basins. In this study, the data were grouped at a certain level of the dendrogram (Boscarello et al., 2016) using the inertia between jumps criterion to define the homogeneous regions (Elesbon et al., 2015).
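The clustering procedure can be sketched in plain Python as follows. This is an illustrative implementation of agglomerative clustering with Ward's criterion on Euclidean distances, followed by the (Dlink/Dmax) x 100 rescaling; it is not the software used in the study. The six two-dimensional points are hypothetical, and the number of groups is passed in directly rather than derived from the inertia between jumps criterion.

```python
import math


def ward_linkage(points):
    """Agglomerative clustering with Ward's criterion.

    Returns merges as (members_a, members_b, height), where height is
    sqrt(2 * increase in within-cluster sum of squares), so that merging two
    single points gives their Euclidean distance.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * u + len(mb) * v) / n for u, v in zip(ca, cb)]
        merges.append((ma, mb, math.sqrt(2.0 * cost)))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return merges


def cut_into_groups(n_points, merges, n_groups):
    """Replay the first (n_points - n_groups) merges to recover the groups."""
    groups = [[i] for i in range(n_points)]
    for ma, mb, _ in merges[: n_points - n_groups]:
        groups = [g for g in groups if g != ma and g != mb]
        groups.append(ma + mb)
    return sorted(sorted(g) for g in groups)


# Hypothetical standardized data: three tight pairs of sub-basins.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0), (-3.0, 3.0), (-3.0, 3.1)]
merges = ward_linkage(points)
d_max = merges[-1][2]
rescaled = [100.0 * (height / d_max) for _, _, height in merges]  # (Dlink / Dmax) * 100
groups = cut_into_groups(len(points), merges, n_groups=3)
print(groups)
print([round(r, 1) for r in rescaled])
```

The rescaled heights are what would be plotted on the y-axis of the dendrogram, with the last merge always at 100.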

Discriminant Analysis (DA)
The homogeneous regions generated through CA were tested using DA, which is a supervised pattern recognition technique used for the classification of objects into exhaustive and mutually exclusive groups based on a set of independent variables. This technique is suitable when the dependent variable is a categorical variable and the independent variables are metric (Hair Jr. et al., 2009). In this study, DA was used to find the variables that discriminate between two or more groups of expected occurrences (Johnson and Wichern, 2007;Gulgundi and Shetty, 2018), generating a discriminant function (DF) for each group (Equation 2).
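Equation 2 is not reproduced in this text. Based on the symbol definitions that follow, it presumably takes the usual linear form sketched below; the absence of a constant term is an assumption.

```latex
Z_{di} = a_{i1} X_{1} + a_{i2} X_{2} + \dots + a_{ip} X_{p}
```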
where Zdi is the vector of scores for n samples on the i-th discriminant function; X1, X2, ..., Xp are the vectors of the independent variables for all n samples in the entire data set; and ai1, ai2, ..., aip are the discriminant function coefficients for the independent variables in the i-th discriminant function.
The scores of the n samples on the DFs may have multiple correlations with the groups (Chiang et al., 2002). Wilks' lambda was used to validate the significance of the DFs. According to Todorov (2007), it is used in an ANOVA (F) test of mean differences, such that the smaller the lambda for an independent variable, the more that variable contributes to the DF. Lambda varies from 0 to 1, with 0 meaning that the group means differ (so the variable differentiates the groups more) and 1 meaning that all group means are the same. The null hypothesis (H0) was that the population means of the groups are equal. H0 is therefore expected to be rejected, because the means have to be significantly different in order to best discriminate the groups generated by the CA (Hair Jr. et al., 2009).
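The behavior of Wilks' lambda between 0 and 1 can be illustrated for a single variable, where it reduces to the ratio of the within-group sum of squares to the total sum of squares. The sketch below covers only this univariate case, with hypothetical data; the multivariate statistic used in DA is based on determinants of the corresponding matrices and is not reproduced here.

```python
def wilks_lambda_univariate(groups):
    """Wilks' lambda for one variable: within-group SS divided by total SS."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((v - group_mean) ** 2 for v in g)
    return ss_within / ss_total


# Hypothetical standardized values of one variable in three regions.
separated = [[1.0, 1.1, 0.9], [5.0, 5.1, 4.9], [9.0, 9.1, 8.9]]    # distinct group means
overlapping = [[1.0, 5.0, 9.0], [1.1, 5.1, 9.1], [0.9, 4.9, 8.9]]  # similar group means

lam_separated = wilks_lambda_univariate(separated)
lam_overlapping = wilks_lambda_univariate(overlapping)
print(round(lam_separated, 4), round(lam_overlapping, 4))
```

A value near 0 indicates that the variable separates the groups well, while a value near 1 indicates that the group means are almost the same.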
In this paper, DFs were constructed using the stepwise method (Hsu and Huang, 2017), with significance assessed using the F-test, with p-value < 0.05 for entering variables into the function and p-value < 0.1 for removing them, ensuring that errors had a normal distribution. Subsequently, the eigenvalue matrix was generated; the eigenvalue is a relative measure of how different the groups are in the DF, i.e., the farther the values are from 1, the greater the variation among the groups explained by the DF (Hair Jr. et al., 2009). Therefore, in this study, the DA tested the groups previously formed by the CA, creating a function based on the independent variables that best separate the homogeneous regions.

Regional Regression Analysis and Cross-Validation
Regional regression was performed by building a multiple regression model that relates Q90 and Q95 (dependent variables) to morphoclimatic descriptors (independent variables), as in Equation (3), where xi are the morphoclimatic descriptors and βi are the regression coefficients. In this study, the independent variables used to form the regional equations were chosen after selecting the variables that best characterize the homogeneous regions, i.e., after the DA result. The power regression model was chosen because it is one of the most used approaches in studies on the regionalization of low flows (Lopes et al., 2016; Lujano et al., 2017). NSElog, a modification of the Nash-Sutcliffe coefficient (Nash and Sutcliffe, 1970), was used to quantify the performance of the fitted models. According to Razavi and Coulibaly (2017), NSElog is the index that best assesses low flow estimation models. The following ranges were adopted: NSElog > 0.75, the model is considered suitable and good; 0.36 < NSElog ≤ 0.75, the model is considered normal; and NSElog ≤ 0.36, the model is considered unsatisfactory.
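A minimal sketch of a power regression and of the NSElog index is given below. It assumes a single predictor (drainage area), a fit by ordinary least squares on log-transformed data, and the usual definition of NSElog as the Nash-Sutcliffe coefficient computed on the logarithms of the flows. The data are synthetic and the function names are hypothetical; the regional equations of this study are not reproduced.

```python
import math


def fit_power_model(x, q):
    """Least-squares fit of q = b0 * x**b1 on log-transformed data."""
    lx = [math.log(v) for v in x]
    lq = [math.log(v) for v in q]
    mean_lx, mean_lq = sum(lx) / len(lx), sum(lq) / len(lq)
    b1 = (sum((a - mean_lx) * (b - mean_lq) for a, b in zip(lx, lq))
          / sum((a - mean_lx) ** 2 for a in lx))
    b0 = math.exp(mean_lq - b1 * mean_lx)
    return b0, b1


def nse_log(q_obs, q_est):
    """Nash-Sutcliffe coefficient computed on the logarithms of the flows."""
    log_obs = [math.log(v) for v in q_obs]
    log_est = [math.log(v) for v in q_est]
    mean_log_obs = sum(log_obs) / len(log_obs)
    num = sum((o - e) ** 2 for o, e in zip(log_obs, log_est))
    den = sum((o - mean_log_obs) ** 2 for o in log_obs)
    return 1.0 - num / den


def classify_nse_log(value):
    """Performance classes adopted in the text."""
    if value > 0.75:
        return "suitable and good"
    if value > 0.36:
        return "normal"
    return "unsatisfactory"


# Synthetic example: flows generated exactly from Q = 0.02 * A**0.9.
area = [350.0, 900.0, 2500.0, 6000.0, 10000.0]
q90_obs = [0.02 * a ** 0.9 for a in area]
b0, b1 = fit_power_model(area, q90_obs)
q90_est = [b0 * a ** b1 for a in area]
score = nse_log(q90_obs, q90_est)
print(round(b0, 4), round(b1, 4), classify_nse_log(score))
```

Working on logarithms gives more weight to errors in small flows, which is why this variant is commonly preferred for low flow models.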
The regional models were also assessed using cross-validation, which, according to Vezza et al. (2010), has advantages over other techniques for assessing predictive errors, such as robustness and applicability to all regionalization models. The same validation process was applied by Cassalho et al. (2017). The cross-validation was analyzed for the Q90 and Q95 models using the root mean square error (RMSE) and the coefficient of determination (R²), as recommended by Beskow et al. (2016), and the confidence index (c) proposed by Camargo and Sentelhas (1997), to allow comparison with the accuracy of the regional models. The c value was calculated from the correlation coefficient (rcorrel) and the accuracy coefficient (d) through Equations (4) and (5). The confidence index was assessed according to the classification proposed by Camargo and Sentelhas (1997): Optimum (c > 0.85); Very Good (0.76 ≤ c ≤ 0.85); Good (0.66 ≤ c ≤ 0.75); Median (0.61 ≤ c ≤ 0.65); Tolerable (0.51 ≤ c ≤ 0.60); Bad (0.41 ≤ c ≤ 0.50); and Terrible (c ≤ 0.40).
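Equations (4) and (5) are not reproduced in this text. The usual forms of the confidence index of Camargo and Sentelhas (1997) and of the accuracy coefficient d (Willmott's index of agreement) are sketched below; the order and numbering of the two equations in the original are an assumption, and n denotes the number of sites.

```latex
c = r_{correl} \cdot d

d = 1 - \frac{\sum_{i=1}^{n} \left( Q_{est,i} - Q_{obs,i} \right)^{2}}
             {\sum_{i=1}^{n} \left( \left| Q_{est,i} - Q_{obs,m} \right| + \left| Q_{obs,i} - Q_{obs,m} \right| \right)^{2}}
```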
where Qest,i are the estimated Q90 and Q95 values, Qobs,i are the observed Q90 and Q95 values, and Qobs,m are the observed mean Q90 and Q95 values.
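The validation statistics can be computed as in the sketch below, which assumes the usual definitions of RMSE, Pearson's correlation coefficient and Willmott's d. The observed and estimated flows are hypothetical. The published classes leave small gaps between their bounds (for example between 0.75 and 0.76); assigning such values to the lower class is a choice made here.

```python
import math


def rmse(q_obs, q_est):
    """Root mean square error between observed and estimated flows."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(q_obs, q_est)) / len(q_obs))


def pearson_r(q_obs, q_est):
    """Pearson correlation coefficient (rcorrel)."""
    mean_o, mean_e = sum(q_obs) / len(q_obs), sum(q_est) / len(q_est)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(q_obs, q_est))
    var_o = sum((o - mean_o) ** 2 for o in q_obs)
    var_e = sum((e - mean_e) ** 2 for e in q_est)
    return cov / math.sqrt(var_o * var_e)


def willmott_d(q_obs, q_est):
    """Accuracy coefficient d (index of agreement)."""
    mean_o = sum(q_obs) / len(q_obs)
    num = sum((e - o) ** 2 for o, e in zip(q_obs, q_est))
    den = sum((abs(e - mean_o) + abs(o - mean_o)) ** 2 for o, e in zip(q_obs, q_est))
    return 1.0 - num / den


def confidence_index(q_obs, q_est):
    """Confidence index c = rcorrel * d."""
    return pearson_r(q_obs, q_est) * willmott_d(q_obs, q_est)


def classify_c(c):
    """Classes of Camargo and Sentelhas (1997), using the lower bounds only."""
    for lower, label in [(0.85, None), (0.76, "Very Good"), (0.66, "Good"),
                         (0.61, "Median"), (0.51, "Tolerable"), (0.41, "Bad")]:
        if label is None:
            if c > lower:
                return "Optimum"
        elif c >= lower:
            return label
    return "Terrible"


# Hypothetical observed and estimated low flows, in m3/s.
q_obs = [2.0, 4.0, 6.0, 8.0]
q_est = [2.5, 3.5, 6.5, 7.5]
error = rmse(q_obs, q_est)
c_value = confidence_index(q_obs, q_est)
print(round(error, 3), round(c_value, 3), classify_c(c_value))
```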

Results
Table 1 shows the independent variables for each fluviometric gauging station, with their respective codes and descriptive statistics. A great variation was observed both in watershed size (349.7 to 10153.8 km²) and in total annual precipitation (671.9 to 1904.5 mm). The same was observed in seasonal precipitation, which ranged, for example, from 54.7 to 1146.7 mm in autumn.
All variables presented normal distribution (p < 0.05) and could therefore be used in the multivariate statistical analysis. CA generated a dendrogram that indicates the spatial pattern of the hydrologically homogeneous regions (Figure 3). The same dendrogram was generated regardless of the low flow analyzed, and the fluviometric stations were grouped into three groups of similar stations. Table 2 shows the hypothesis test that assessed the significance of the DFs for 3 and 4 homogeneous regions. In the first line of Table 2 (N = 3; function test = 1 to 2), the two DFs were tested together, and it can be concluded that at least the first DF is highly significant (p << 0.05). In the second line (N = 3; function test = 2), only the second function was tested, and it is also significant (p < 0.05). As for Wilks' lambda, the closer it is to zero, the more separated the group means are, i.e., the greater the distance between the groups (Kumar et al., 2018). Wilks' lambda also showed that the first function has the best discrimination power, since its value is the closest to zero. The Chi-square values confirmed the significance of the DFs, i.e., how well each function separates the observations into the groups. A test with four homogeneous regions was then carried out. The last line of Table 2 (N = 4; function test = 3) shows that the value of Wilks' lambda was higher, indicating that three regions is the best configuration for the study site.
Since the DA indicated that three homogeneous regions is the best configuration for the study area, two discriminant functions were generated. The first DF presents a percentage of 72.8%, making it the function that contributes most to demonstrating the difference between the groups.

* A: area; P: perimeter; X: centroid X; Y: centroid Y; S: slope; p: annual total precipitation; PS1: summer total precipitation; PS2: autumn total precipitation; PS3: winter total precipitation; PS4: spring total precipitation; SD: standard deviation.

Figure 3. Hydrologically homogeneous regions generated by Cluster Analysis.

Figure 4 presents the spatial distribution of the three hydrologically homogeneous regions generated by the CA and confirmed using the DA in the study area. After the stepwise entry and removal of variables, the number of variables was reduced from 10 to 5 (A, P, X, Y, and PS1). These are therefore the most important variables in the definition of homogeneous regions in the study area. This result indicates that the other variables were not significant (p > 0.05), i.e., it is possible to reduce the number of variables used in the regional regression without interfering in the formation of homogeneous regions. The results point to the importance of seasonality in the separation of homogeneous regions, since the variable PS1 was present in the DFs (Equations 6 and 7). All equations generated to estimate the Q90 and Q95 streamflows were considered "suitable and good" according to the NSElog index, indicating that the fit of the functions was satisfactory (Table 3). Cross-validation showed that the two regional equations generated for region 1 were considered "very good" according to the c index. The other equations (regions 2 and 3) were considered "good" for estimating the Q90 and Q95 flows in the Uruguai River watershed, on the Rio Grande do Sul state side.
The estimation error (RMSE) ranged from 1.73 m³ s⁻¹ (Q95, region 1) to 3.70 m³ s⁻¹ (Q90, region 3), which was considered satisfactory in this study.
Table 4 presents the limits of application of the variables in each equation. The results show great regularity, since only three variables appear in all the equations (A, area; P, perimeter; and PS1, summer total precipitation). Region 1 showed the best fit with perimeter alone; region 2 with area and perimeter; and region 3 with area and summer total precipitation.
Low flow quantiles regionally estimated by multivariate statistical analysis are visually compared with observed low flow quantiles in Figure 5 and Figure 6. These Q-Q plots show that the regionally estimated quantiles are similar to the observed quantiles, although several exceptions exist. Low flow quantiles in region 1 show one clear outlier, site "74370000". Its low flow underestimation was attributed to a very low perimeter value compared with sites of a similar size. Low flow quantiles in region 3 show two clear outliers, sites "75500000" and "76650000". In this case, low PS1 values were responsible for underestimating the low flow quantiles.

Discussion
This paper proposed to carry out low flow hydrological regionalization in the Uruguai River watershed, on the Rio Grande do Sul state side (Brazil), using cluster analysis (CA) and discriminant analysis (DA). Although this work presents a case study in Brazil, the methodology can be applied in any watershed. The use of robust multivariate techniques to define homogeneous regions is encouraged in order to reduce the uncertainties generated by the regression models.
We chose to analyze morphoclimatic variables as descriptors of low flows because they are easy to obtain and can support water use granting processes, an instrument established by Brazilian legislation for planning the use of water resources (Brasil, 1997). A similar proposal was made by Beskow et al. (2016), who analyzed the Q90 for Rio Grande do Sul state using the drainage area of hydrographic sub-basins. Incorporating other variables may make regionalization models more robust. This was done by Pruski et al. (2012) in a study of the Pará River watershed, a sub-basin of the São Francisco River, Brazil, in which the authors added annual average precipitation as a climatic variable in the regionalization of low flows. The precipitation over the drainage area of a watershed directly affects streamflow behavior, which is why including precipitation as an explanatory variable may greatly improve a streamflow regionalization model (Comini et al., 2020; Yang et al., 2020).
Analyzing low flows in a watershed is a challenge because streamflow is controlled by base flows that may originate beyond the surface boundaries of the watershed (Rumsey et al., 2015). The Uruguai River watershed lies on the border between the states of Rio Grande do Sul and Santa Catarina, as well as on the frontier with Argentina. This increases the complexity of the study, since the base flow in the basin can come from those places. We emphasize that studies of this complexity have not yet been carried out in the region and are a direction for future work. In this study, we considered only the limits of Rio Grande do Sul state in the regionalization and, in the future, it may be used for water use rights criteria analysis, similar to what was considered by Beskow et al. (2016) and Farhan and Al-Shaikh (2017).
Another important point is the location of the fluviometric stations. In studies of low flows, it is important to consider natural streamflows, without the interference of dams. However, dams are notably present in this watershed. All the fluviometric stations made available by ANA that met the statistical criteria were used. We understand that the number of stations is not ideal, but that is the reality of several poorly monitored watersheds in Brazil (Lelis et al., 2020).
The results show that the best configuration of the hydrologically homogeneous regions of the Uruguai River watershed, on the Rio Grande do Sul state side, was the separation into three regions. There is no agreement in the literature about the ideal number of groups, which is a particular characteristic of each region. Modarres (2010) states that, for management purposes, a greater number of groups is recommended in order to interpret the characteristics of the regions and to apply the regional equations with more confidence. However, Vezza et al. (2010) suggest that considering only one homogeneous region is the simplest way of carrying out the regionalization. The same researchers pointed out that applying a single general model generalizes all the different processes that are important for the analysis of low flows. That is why it is important to have mechanisms to verify whether the groups formed suit the particularities of the drainage basin. In this study, DA was used, and the results showed that the three regions formed by the CA are the best configuration for the study site. The regional regression models indicated that area (A), perimeter (P) and summer total precipitation (PS1) were the variables that best distinguished the groups formed.
The presence of the variable area in 4 of the 6 regional equations agrees with several published studies on the regionalization of low flows. Although area did not enter some regional equations, it was to some extent represented by perimeter. According to Gasques et al. (2018), the drainage area correlates well with the other physical characteristics of the basin and influences water availability along the drainage network. In the comprehensive review of Razavi and Coulibaly (2013), the drainage area is one of the most used attributes and one of those that lead to the most satisfactory model results. Lopes et al. (2016) used annual average precipitation, drainage density and area as independent variables and obtained the best model performance with area alone. Beskow et al. (2016) regionalized Q90 for the Uruguai River watershed using only the drainage area as an explanatory variable. The authors found confidence indexes c lower than the ones found in this study, which shows that the methodology used here is satisfactory and that other explanatory variables may affect the estimation of low flows.
The results indicate that the use of climatic variables in low flow regionalization models improves the performance of the equations, as observed with the inclusion of summer seasonal precipitation (PS1). This variable helped provide the best fit in the regional equations of region 3. The lowest flow values in the watershed occur in the summer period, as described by Nakase (2008), which supports the results found. In addition, region 3 is where the sedimentary geological formations are located, which contribute to the formation of the base flow. Similar results were found by Trudel et al. (2016) and Requena et al. (2018). Perimeter is the main variable that characterizes region 1 and also appears in region 2. Both regions are located mainly over crystalline shields, where the water infiltration that generates base flow is lower. Thus, area and perimeter stood out in comparison with PS1. However, this is a hypothesis to be analyzed in the future, since perimeter may have gained importance in the results because of the presence of the variable area.
In this study, only morphoclimatic variables were analyzed to evaluate low flows in the Uruguai River watershed. We emphasize that further studies should consider variables that may better characterize this environment, such as lithology (percentage of permeable formation), vegetation cover and a land use and occupation index. It is important to emphasize that the use of inadequate regional equations can result in unreliable low flow estimates. Such distortions may arise from a series of errors, such as an unsatisfactory choice of independent variables, as well as from errors inherent to the methodology. These factors may produce spurious low flow estimates, which may affect studies related to water resources policy and management.

Conclusions
Based on the results presented and discussed, it can be concluded that:
(i) The methodology used, based on multivariate statistical analyses, proved to be robust and suitable for delineating homogeneous regions, providing regional low flow equations that can be used by water resources management bodies in the Uruguai River watershed, on the Rio Grande do Sul state side, Brazil.
(ii) After performing the CA, the DA increased confidence in defining the three hydrologically homogeneous regions for the regionalization of the Q90 and Q95 streamflows in the Uruguai River watershed.
(iii) The variables area, perimeter and total summer precipitation are responsible for the best fit of the regional regression equations used to estimate the Q90 and Q95 streamflows for the 46 sub-basins in the Uruguai River watershed.
(iv) The adjusted regional equations performed particularly well in estimating Q90 and Q95 streamflows while requiring few variables. This is due to the robust methodology for defining homogeneous regions and to the cross-validation used in this study. Therefore, the proposed equations can be used in the Uruguai River watershed, on the Rio Grande do Sul state side (Brazil), to estimate the Q90 and Q95 streamflows at ungauged sites.