Assessment of CMIP6 Simulations over Tropical South America

This study evaluates 46 models of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and selects those that best simulate the climatology of precipitation and 2 m air temperature over tropical South America (SA) during the historical period (1996-2014). To this end, several statistical measures are computed. Many models have a small bias relative to observations; however, many of them perform poorly in terms of the Willmott agreement index, which indicates a poor representation of temporal variability. Among the 46 models, E3SM-1-0, EC-Earth3, EC-Earth3-AerChem, EC-Earth3-Veg, IPSL-CM6A-LR, MPI-ESM1-2-LR and TaiESM1 perform best in reproducing the SA climate. Compared with the 46-model ensemble, the ensemble of these 7 models reduces the bias of the variables under study in some sectors of SA. This indicates that the 7-model ensemble is sufficient for use in other studies.


Introduction
The need to systematize the analyses of coupled ocean-atmosphere models from multiple climate modeling centers led the World Climate Research Programme (WCRP) to create the Coupled Model Intercomparison Project (CMIP) in September 1995 (Eyring et al., 2016; Carlson et al., 2017). Since then, CMIP has developed climate model experiment protocols to ensure that model output is available to a wide research community (Carlson et al., 2017). CMIP is currently in its sixth phase (CMIP6), and its outputs are freely available through the Earth System Grid Federation (ESGF).
Global climate models (GCMs) of CMIP have provided useful information to the Intergovernmental Panel on Climate Change (IPCC) reports over the years. The projections are produced by different research groups and made freely available online. Researchers publish scientific papers based on these data, and the IPCC collects and synthesizes them in its reports. The last report published was IPCC-AR5 in 2013. Moreover, CMIP outputs have been used as initial and boundary conditions in regional climate models (RCMs; Elguindi et al., 2014; Gutowski Jr. et al., 2016; Ambrizzi et al., 2018).
Revista Brasileira de Geografia Física, ISSN: 1984-2295. Homepage: https://periodicos.ufpe.br/revistas/rbgfe
Different studies have assessed the reliability of the simulations of the present climate over South America (Gulizia and Camilloni, 2014; Tian and Dong, 2020; Vasconcellos et al., 2020), while others focus on climate projections (Blázquez and Nuñez, 2013; Torres and Marengo, 2013; Llopart, Reboita and da Rocha, 2020). In general, they show that the models represent the main characteristics of the South American climate in terms of precipitation and air temperature.
A basic requirement in studies using CMIP GCMs is to justify why certain models were selected. GCM performance therefore needs to be evaluated before the models are applied, for example in dynamical downscaling. In this context, the purpose of this study is to identify the CMIP6 GCMs that best represent the tropical South American climate in terms of precipitation and air temperature.

Data
Simulations of precipitation and air temperature at 2 m from 46 CMIP6-GCMs (Eyring et al., 2016) for the historical period (December 1996-November 2014) are evaluated (model names are presented in Table 1). The simulations were obtained from the Earth System Grid Federation (ESGF) platform (https://esgf-node.llnl.gov/search/cmip6/) and, subsequently, the area of tropical South America (75°W-35°W and 35°S-0°) was extracted from the GCM outputs.
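The extraction of the study box can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `select_box` is hypothetical, and the code assumes a regular latitude-longitude grid whose longitudes may be stored as 0-360, as is common in GCM output.

```python
import numpy as np

def select_box(field, lat, lon, lat_min=-35.0, lat_max=0.0,
               lon_min=-75.0, lon_max=-35.0):
    """Return the part of a (..., lat, lon) field inside the study box.

    Defaults correspond to 75W-35W and 35S-0 (west and south are negative).
    """
    lat = np.asarray(lat, dtype=float)
    # Map 0..360 longitudes to -180..180 so that 75W becomes -75.
    lon = ((np.asarray(lon, dtype=float) + 180.0) % 360.0) - 180.0
    order = np.argsort(lon)                 # keep longitudes increasing after the wrap
    lon = lon[order]
    field = np.asarray(field)[..., order]
    keep_lat = (lat >= lat_min) & (lat <= lat_max)
    keep_lon = (lon >= lon_min) & (lon <= lon_max)
    return field[..., keep_lat, :][..., keep_lon], lat[keep_lat], lon[keep_lon]

# Toy global 5-degree grid (illustrative values only).
lat = np.arange(-90.0, 91.0, 5.0)
lon = np.arange(0.0, 360.0, 5.0)
field = lat[:, None] + 0.0 * lon[None, :]
sub, sub_lat, sub_lon = select_box(field, lat, lon)
print(sub.shape, sub_lat.min(), sub_lat.max(), sub_lon.min(), sub_lon.max())
```

Leading dimensions such as time or model are preserved, so the same call works on a stack of monthly fields.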
Daily precipitation analysis with a horizontal resolution of one degree from the Global Precipitation Climatology Project Version 1.2 (GPCP; Huffman et al., 2001) was used as the observational reference for precipitation.
All datasets were interpolated to a one-degree grid with the bilinear technique (Jones, 1999; Chen and Knutson, 2008; Santos, Martins and Torres, 2017) before the analyses. This procedure allows us to compute the grid point differences between simulation and observation and to represent them spatially.
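As a rough sketch of the regridding step, bilinear interpolation on a regular latitude-longitude grid can be written as two passes of linear interpolation. The function name `regrid_bilinear` and the 2.5-degree source grid are assumptions for illustration; the tools actually used in the study are not stated here.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear interpolation of a (lat, lon) field between regular grids.

    Coordinates must be increasing. np.interp holds the edge value constant
    for target points that fall outside the source range.
    """
    field = np.asarray(field, dtype=float)
    # Pass 1: interpolate along longitude for every source latitude row.
    tmp = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Pass 2: interpolate along latitude for every target longitude column.
    return np.array([np.interp(dst_lat, src_lat, col) for col in tmp.T]).T

# Hypothetical 2.5-degree model grid over the study box.
src_lat = np.arange(-35.0, 0.1, 2.5)
src_lon = np.arange(-75.0, -34.9, 2.5)
# One-degree target grid.
dst_lat = np.arange(-35.0, 1.0, 1.0)
dst_lon = np.arange(-75.0, -34.0, 1.0)

# A plane is reproduced exactly by bilinear interpolation, which gives a simple check.
src = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
out = regrid_bilinear(src, src_lat, src_lon, dst_lat, dst_lon)
expected = 2.0 * dst_lat[:, None] + 3.0 * dst_lon[None, :]
print(out.shape, np.allclose(out, expected))
```

Once model and observation share the same grid, point-by-point differences are a plain array subtraction.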

Statistical Analyses
a) Spatial representation: The first step of this study is to present the seasonal and annual maps of precipitation and air temperature at 2 m for the observation and the ensemble (average of the 46 GCMs), together with the spatial bias between them (eq. 1):
Bias = P − O (eq. 1)
where P and O are, respectively, the CMIP6-GCMs ensemble and the observation.
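Eq. 1 and the ensemble average can be sketched in a few lines. The function names and the toy values are illustrative assumptions, and all fields are assumed to be already on the same grid.

```python
import numpy as np

def ensemble_mean(members):
    """Average a stack of model fields with shape (n_models, n_lat, n_lon)."""
    return np.nanmean(np.asarray(members, dtype=float), axis=0)

def bias(simulated, observed):
    """Eq. 1 at every grid point: Bias = P - O."""
    return np.asarray(simulated, dtype=float) - np.asarray(observed, dtype=float)

# Three toy "models" on a 2x2 grid (values are illustrative, in mm/day).
members = np.array([np.full((2, 2), 4.0),
                    np.full((2, 2), 5.0),
                    np.full((2, 2), 9.0)])
observed = np.full((2, 2), 5.0)

P = ensemble_mean(members)
bias_map = bias(P, observed)
print(bias_map)
```

Positive values mean that the ensemble is wetter (or warmer) than the observation at that grid point.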
We also computed the spatial correlation (r; eq. 2) and the bias of the seasonal average of the ensemble and the observations. The obtained values are included in the titles of the figures.

$$ r = \frac{\sum_{i=1}^{n} (P_i - \bar{P})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2 \, \sum_{i=1}^{n} (O_i - \bar{O})^2}} \quad \text{(eq. 2)} $$

where n is the number of years under study, $P_i$ and $O_i$ are, respectively, each simulated and observed value, and $\bar{P}$ and $\bar{O}$ are their means. Correlation ranges from -1 to +1, where +1 indicates a perfect positive relationship, -1 a perfect negative relationship, and 0 no relationship. Correlation values from 0.91 to 1.0 (-0.91 to -1.0) indicate a very high correlation; from 0.71 to 0.9 (-0.71 to -0.9) a high correlation; from 0.51 to 0.7 (-0.51 to -0.7) a moderate correlation; from 0.31 to 0.5 (-0.31 to -0.5) a weak correlation; and from 0.0 to 0.3 (0.0 to -0.3) a very weak correlation (Hinkle et al., 2003).
b) Selecting better models: The second step of the study is to compute the seasonal and annual bias of precipitation and air temperature for the box of tropical South America (75°W-35°W and 35°S-0°) for each individual CMIP6-GCM. This yields a single bias value per model and period, which is represented visually in a heatmap (Metsalu and Vilo, 2015; Yi, 2019). The same is done for the Willmott agreement index (d; eq. 3; Willmott et al., 2011; Reboita et al., 2018), in which $P_i$ and $O_i$ are, respectively, each simulated and observed value, while $\bar{P}$ and $\bar{O}$ are the averages of the simulated and observed time series, respectively. The agreement index ranges from 0 to 1, where 1 indicates perfect agreement between simulation and observation.
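The two statistics can be sketched as below. The correlation is the standard Pearson coefficient. For the agreement index, the code assumes the original form of Willmott's index (Willmott, 1981), which is bounded by 0 and 1 as described above; the exact form of eq. 3 used in the study may differ, so this is an illustration rather than a reproduction. Function names are hypothetical.

```python
import numpy as np

def pearson_r(p, o):
    """Pearson correlation between simulated (p) and observed (o) values."""
    p, o = np.asarray(p, dtype=float), np.asarray(o, dtype=float)
    dp, do = p - p.mean(), o - o.mean()
    return float(np.sum(dp * do) / np.sqrt(np.sum(dp ** 2) * np.sum(do ** 2)))

def willmott_d(p, o):
    """Agreement index, ASSUMING the original Willmott (1981) form:
    d = 1 - sum((P - O)^2) / sum((|P - Obar| + |O - Obar|)^2).
    Undefined (division by zero) if both series equal a constant Obar.
    """
    p, o = np.asarray(p, dtype=float), np.asarray(o, dtype=float)
    o_mean = o.mean()
    num = np.sum((p - o) ** 2)
    den = np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    return float(1.0 - num / den)

# Toy series: the simulation has the right variability but a constant offset.
o = [1.0, 2.0, 3.0]
p = [2.0, 3.0, 4.0]
r = pearson_r(p, o)
d = willmott_d(p, o)
print(round(r, 3), round(d, 3))
```

In this toy case the correlation is perfect while d is penalized by the offset, which illustrates that the two measures respond to different kinds of error.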
Bias and d for seasonal and annual periods are presented in heatmaps. The seasons are defined as DJF (December-January-February), MAM (March-April-May), JJA (June-July-August) and SON (September-October-November).
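A minimal sketch of the seasonal grouping, assuming a monthly series that starts in December and ends in November, as the December 1996-November 2014 period does. The function name is hypothetical and months are averaged without weighting by month length, which is a simplification.

```python
import numpy as np

SEASONS = ("DJF", "MAM", "JJA", "SON")

def seasonal_means(monthly):
    """Split a December-to-November monthly series into per-year seasonal means.

    Because the series starts in December, consecutive triples of months
    are DJF, MAM, JJA and SON, so a reshape is enough.
    """
    monthly = np.asarray(monthly, dtype=float)
    if monthly.size % 12 != 0:
        raise ValueError("expected whole December-November years")
    by_season = monthly.reshape(-1, 4, 3).mean(axis=2)   # (n_years, 4)
    return {name: by_season[:, k] for k, name in enumerate(SEASONS)}

# 18 toy years in which each month's value equals its position (December = 0).
monthly = np.tile(np.arange(12.0), 18)
seasons = seasonal_means(monthly)
print({name: values[0] for name, values in seasons.items()})
```

Each returned series has one value per year, which is the form needed for the year-to-year statistics.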
The analysis of these statistics allows us to identify the models that best represent the seasonal climatology of precipitation and air temperature over South America. These models are then selected and a new ensemble is built.

Table 1 Horizontal resolution (longitude x latitude) and references of each CMIP6-GCM.
Model               Resolution   Reference
EC-Earth3-Veg-LR    320x160      EC-Earth Consortium (2020)
TaiESM1             288x192      Lee and Liang (2020)

Results
Climatologies
Figure 1 shows the seasonal and annual mean precipitation of the observation (Figure 1 a-e) and of the 46 CMIP6-GCMs (Figure 1 f-j), while Figure 2 shows the bias between them. These figures indicate that the CMIP6-GCMs ensemble simulates well the temporal and spatial variability of precipitation over tropical South America, but with some discrepancies in intensity. For example, during the rainy season of tropical South America, which is DJF (Reboita et al., 2010; Marrafon and Reboita, 2020), the CMIP6-GCMs ensemble displaces the wettest area from northwest Amazonia eastward, to between the north and northeast regions of Brazil. Moreover, in all seasons, the GCMs are drier over southern Brazil and wetter over the Andes. In the latter region, the bias may be related to the representation of topography in the models (Chou et al., 2014; Freire, de Freitas and Coelho, 2015).
The reported biases are also a common problem in regional climate models (Solman et al., 2013; Ambrizzi et al., 2018). Vasconcellos et al. (2020) evaluated the DJF precipitation, in the historical period, over tropical South America simulated by 5 CMIP5 models (CCSM4, GFDL-ESM2G, GFDL-ESM2M, MIROC-ESM-CHEM and CAN-ESM2). As in our results (Figures 2 a-b), the models overestimate the precipitation over northeast Brazil (2-6 mm day⁻¹) and underestimate it in the north region. Tian and Dong (2020) [...]
In summary, the 46 CMIP6-GCMs ensemble simulates drier conditions over the South Atlantic Convergence Zone (Figure 1 f), which is the rainy band from Amazonia to southeast Brazil in DJF (Silva et al., 2019; Pedro et al., 2020). However, comparing our results with those of the CMIP3 and CMIP5 ensembles shown by Gulizia and Camilloni (2014), it is apparent that CMIP6 reduces the bias in precipitation.
For the air temperature at 2 m (Figures 3 and 4), the 46 CMIP6-GCMs ensemble performs well in representing the observation. Differences occur over Amazonia and northern Argentina, where the ensemble overestimates the air temperature. Dufresne et al. (2013) analyzed the bias in the climatology of the annual near-surface temperature (with respect to the period 1961-1990) simulated by the IPSL-CM4 (CMIP3) and the IPSL-CM5A-LR, IPSL-CM5A-MR and IPSL-CM5B-LR models from CMIP5. Except for the IPSL-CM4 model, which overestimated temperature over the entire region of the present study, the other models showed a pattern similar to that found in our Figures 4 i-j. However, those models have larger overestimates, of up to 4.5 °C, in the North and South of Brazil, and underestimates, of up to 2.5 °C, in the Northeast.

GCMs Selection
Working with simulations is not an easy task, since it requires a large amount of disk space to store the data and substantial resources for data processing. Therefore, instead of using all CMIP6 models in studies for South America, we can select those that best simulate the climate, which saves computational resources. This section focuses on the evaluation and selection of the best models.
For precipitation (Figure 5), the bias is, in general, smaller in DJF than in JJA. This is a good result, since DJF is the rainy season in tropical South America (Llopart et al., 2020). EC-Earth3, EC-Earth3-AerChem, EC-Earth3-Veg, EC-Earth3-Veg-LR, INM-CM4-8, INM-CM5-0, IPSL-CM6A-LR and TaiESM1 are the models with the lowest biases. The TaiESM1 model, for example, has a bias of 0.18 and -0.43 mm day⁻¹ in DJF and JJA, respectively. These values correspond to an error of 3% of the DJF climatology and 24% of the JJA climatology, and are within the error interval expected for precipitation in models (Giorgi and Mearns, 1999). Regarding the agreement index for precipitation (Figure 6) [...]
For air temperature (Figure 7), E3SM-1-1, EC-Earth3, EC-Earth3-AerChem, EC-Earth3-Veg, IITM-ESM, and TaiESM1 show a bias between -0.5 and 0.5 °C. This indicates good model performance, since a bias of up to 2 °C in simulated air temperature is considered acceptable (Flato et al., 2013).
Although several models have a small seasonal bias, they perform worse in terms of the agreement index. This is because the index is more sensitive than the bias to the variability of the time series (see the equation in the methodology). For example, in DJF TaiESM1 has bias = 0 and d = 0.58 (Figure 7), which means that the year-to-year variability is not well represented. The maximum d in Figure 8 is 0.73, for the annual period in EC-Earth3-Veg. This model also performs well seasonally, for example with d = 0.63 in JJA. MPI-ESM1-2-HR also shows good skill, with d = 0.68 in MAM and SON.
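The percentages quoted for the TaiESM1 precipitation bias express the absolute bias relative to the observed seasonal mean. In the sketch below, the bias values come from the text, but the observed means (6.0 and 1.8 mm day⁻¹) are assumptions back-calculated from the quoted percentages, not values reported by the study; the function name is hypothetical.

```python
def relative_error(bias, observed_mean):
    """Absolute bias as a percentage of the observed climatological mean."""
    if observed_mean <= 0:
        raise ValueError("observed_mean must be positive")
    return 100.0 * abs(bias) / observed_mean

# Bias values quoted in the text for TaiESM1 (mm/day).
# The observed means are ASSUMED (back-calculated), not taken from the paper.
djf = relative_error(0.18, observed_mean=6.0)
jja = relative_error(-0.43, observed_mean=1.8)
print(f"DJF: {djf:.0f}%  JJA: {jja:.0f}%")   # DJF: 3%  JJA: 24%
```

The same absolute bias weighs much more in the dry season because the observed mean in the denominator is smaller.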
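A small worked example with illustrative numbers (not taken from the study) shows how a zero bias can coexist with poor agreement. It assumes the original form of the agreement index (Willmott, 1981), with the observed mean in the denominator; the form of eq. 3 used in the study may differ.

```latex
\begin{aligned}
O &= (1, 2, 3, 4), \quad P = (4, 3, 2, 1), \quad \bar{O} = \bar{P} = 2.5 \\
\text{Bias} &= \bar{P} - \bar{O} = 0 \\
\sum_i (P_i - O_i)^2 &= 9 + 1 + 1 + 9 = 20 \\
\sum_i \left( |P_i - \bar{O}| + |O_i - \bar{O}| \right)^2 &= 3^2 + 1^2 + 1^2 + 3^2 = 20 \\
d &= 1 - \frac{20}{20} = 0
\end{aligned}
```

The simulated series has the correct mean but reverses the observed year-to-year sequence, so the bias vanishes while the agreement index falls to its minimum.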

Comparison of the Ensembles
In this section, we compare the performance of the ensembles with 46 and 7 members (Figures 1-4), hereafter called 46-GCMs and 7-GCMs, respectively. For precipitation, in DJF, the 7-GCMs (Figure 2b) performs better than the 46-GCMs (Figure 2a) over Amazonas, Acre and Rio Grande do Sul. For MAM, the 46-GCMs (Figure 2c) has a lower bias in the studied domain, except in Acre, Rondônia, Mato Grosso do Sul and São Paulo, where the 7-GCMs ensemble performs better (Figure 2d). In JJA, SON and annually, the 7-GCMs ensemble (Figures 2f, 2h and 2j, respectively) shows, in general, a lower bias over the whole domain than the 46-GCMs (Figures 2e, 2g and 2i, respectively). Analyzing the spatial bias and correlation (Figures 1f-o), it can be noted that in most periods precipitation is better represented statistically in the 7-GCMs. According to the correlation range (Hinkle et al., 2003), the correlation in the 7-GCMs is high in DJF (0.74; Figure 1k), very weak in MAM (-0.15; Figure 1l) and JJA (0.14; Figure 1m), and weak in SON (0.42; Figure 1n) and in the annual period (0.42; Figure 1o). For the 46-GCMs, it is weak in DJF (0.44; Figure 1f) and very weak in MAM (0.09; Figure 1g), JJA (-0.12; Figure 1h), SON (0.26; Figure 1i) and in the annual period (0.18; Figure 1j). Regarding bias, the 7-GCMs ensemble shows better results than the 46-GCMs in all periods, except in DJF, when the bias is 0.01 mm day⁻¹ in the 46-GCMs (Figure 1f) and 0.25 mm day⁻¹ in the 7-GCMs (Figure 1k).
For air temperature, the spatial bias and correlation show higher similarity with the observations for the 7-GCMs than for the 46-GCMs in all periods under study. According to the correlation range (Hinkle et al., 2003) [...] Regarding bias, the 7-GCMs ensemble has better results than the 46-GCMs in all periods, except JJA (Figure 3m). In summary, the 7-GCMs, in general, simulates a smaller bias than the 46-GCMs.
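The correlation classes applied to the precipitation results follow mechanically from the ranges of Hinkle et al. (2003) given in the methodology. A minimal sketch, with a hypothetical function name and the 7-GCMs precipitation correlations quoted in the text as input:

```python
def hinkle_class(r):
    """Label a correlation using the ranges listed in the methodology."""
    a = round(abs(r), 2)          # the ranges are given to two decimals
    if a > 0.9:
        return "very high"
    if a > 0.7:
        return "high"
    if a > 0.5:
        return "moderate"
    if a > 0.3:
        return "weak"
    return "very weak"

# Spatial correlations of the 7-GCMs precipitation ensemble, as quoted in the text.
r_7gcms = {"DJF": 0.74, "MAM": -0.15, "JJA": 0.14, "SON": 0.42, "Annual": 0.42}
labels = {period: hinkle_class(r) for period, r in r_7gcms.items()}
print(labels)
```

The sign is ignored for the label because the ranges are symmetric for negative correlations.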

Conclusions
In this study, we obtained simulations from 46 CMIP6-GCMs for the historical period (1996-2014) over tropical South America. The first goal was to verify the performance of the 46 CMIP6-GCMs ensemble in reproducing the main features of the precipitation and air temperature climatologies over tropical South America. The second goal was to identify the best models in order to construct an ensemble with fewer members but with performance equal to or better than that of the 46-member ensemble.
For both precipitation and air temperature at 2 m, the bias values are, in general, acceptable: according to Giorgi and Mearns (1999), the acceptable error for precipitation is in the range of 5 to 30%, and according to Flato et al. (2013), the acceptable temperature error is up to 2 °C.
In summary, for studies of tropical South America, we recommend the models E3SM-1-0, EC-Earth3, EC-Earth3-AerChem, EC-Earth3-Veg, IPSL-CM6A-LR, MPI-ESM1-2-LR and TaiESM1, since the individual analysis of these models shows good statistical results when compared with the observation, and the ensemble constructed with them provides, in general, a smaller bias than the ensemble with 46 members.