On the determination of the optimal parameters in the CAM model

In the field of complex systems, it is often possible to arrive at simple stochastic or chaotic Low Order Models (LOMs) by exploiting the time scale separation between the leading modes of interest and the fast fluctuations. These LOMs, although approximate, can provide interesting qualitative insights into important aspects such as the average time between two extreme events. Recently, the simplest example of a LOM with multiplicative noise, namely, a linear system with a linearly state dependent noise [also called correlated additive and multiplicative (CAM) model], has been considered as archetypal for numerous phenomena that present markedly non-Gaussian statistics. We show in this paper that the determination of the parameters of a CAM model from the (few) available data is far from trivial and that the actual most likely parameters might differ substantially from the ones determined directly from a (necessarily limited) short sequence of observations. We illustrate how this problem can be tackled, at least to the extent possible, using an approach based on Bayes' theorem. We focus on a CAM model of the El Niño Southern Oscillation, but the methodology can be extended to any phenomenon that can be described by a simplified LOM similar to the one examined here and for which the available sequence of data is relatively short. We conclude that a Bayesian approach can indeed fix the problem.

© 2021 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1063/5.0032267

Since the work of Langevin, simple stochastic differential equations have been used to model the dynamics of some "large scale" variable of interest T of complex systems. Brownian motion and Ornstein–Uhlenbeck processes are notable examples.
If the stochastic process, in the form of a very fast fluctuating force f(t) (a white noise), enters the model additively, then the central limit theorem applies, ensuring a very low probability for "large" (with respect to the standard deviation of the noise) events. In this case, even a quite short record of observational data could be sufficient to infer, with some accuracy, the values of the model parameters through fitting. Unfortunately, as in many large scale geophysical phenomena, the intensity of the fluctuating force is often state dependent, namely, it is of the form g(T)f(t), where f(t) is again a white noise and g(T) is some function of T. In this case, the noise is said to be "multiplicative," and the model may exhibit fat tails. In the case of a weak deviation from the classical additive case, the linear approximation g(T) ≈ 1 + βT applies, where β is a parameter. Even for small β ∼ 0.1–0.3, large fluctuations of T are much more likely than in the additive case (β = 0), but they might fail to show up in a short sequence of observational data; fitting such a sequence then leads to inferred parameter values that differ substantially from the actual most likely ones. Here, we consider the ENSO observation data as a benchmark, and we show that a Bayesian method can be effective in fixing the problem.


I. INTRODUCTION
Large scale geophysical phenomena are the result of complex ocean–atmosphere–human activities–astronomical interactions, leading to a large diversity in the amplitude, spatial pattern, and temporal evolution of events. Some important examples are the El Niño Southern Oscillation (ENSO), the North Atlantic Oscillation (NAO), and the Sea Surface Temperature (SST) variability in the Gulf Stream and other strong currents. These phenomena have large global impacts on society; 2 thus, understanding the main features of their dynamics and the character of the recurrence of strong events is of crucial importance. Changing time and space scales, motile bacteria, migrating cells, and Brownian swimmers are crucial to human life as well. 3-9 They can be modeled as mesoscopic active Brownian particles (ABPs), the dynamics of which is the result of the interaction with a very complex and fast environment. 10 In all these cases, which have a big influence on human activities, and in many others less critical (e.g., laser fluctuations 11,12 ), the complexity and the large time scale separation between the dynamics of the slow variables of interest and the dynamics of the fast fluctuations often allow the mean features of the dynamics of these phenomena to be described by simple few-degrees-of-freedom models, or Low Order Models (LOMs; see, e.g., Refs. 13-21 for ENSO and Ref. 10 for ABPs). This is the case, of course, for the original Brownian motion, chemical reactions, and bistable systems, 22-27 and, in general, for the statistical mechanics of matter.
The different parameters that appear in a LOM aim at encapsulating the intricate dynamics of systems, with typically an enormous number of degrees of freedom, in a few quantities. Observation data over different time spans can be used to derive the LOM parameters as a function of time, and predictions can be made regarding events in the future.
Since the 1990s, it has been emphasized that in systems driven by correlated additive and multiplicative (CAM) noises, the correlation between these noises can greatly change the steady state properties of the systems. In more recent years, the simplest member of this class of LOMs, namely, a linear system with a linearly state dependent noise [see Eq. (1)], has received some interest because it has been shown to represent a physically sound simplification of many large scale geophysical phenomena 18,35-39 and of mesoscopic ABPs. 10 In particular, this LOM is able to capture the main non-Gaussian statistical features observed in these contexts. It has been used (to name just a few studies) to evaluate the impact of correlated noise in an energy depot model 10 and to model mid-latitude ocean–atmosphere local coupling. 35 The same LOM has been exploited in Ref. 37 to discuss the statistics of extreme weather events. In Ref. 18, the equivalent Fokker–Planck equation (FPE) [Eq. (38) there] is proposed to reproduce some statistically relevant ENSO quantities, such as the equilibrium probability density function (PDF) and the timing of the recurrence of strong events. Moreover, it has also been shown, in Ref. 40, that strong ENSO events may not be more predictable than what can be accounted for by a bivariate linear system driven by CAM noise. It is then clear that this model is becoming more and more popular in oceanography (see, e.g., Refs. 26, 27, and 41-46).
It is beyond the scope of the present work to discuss the details of how a linear model with CAM noise might emerge in different contexts. What is relevant here is that, for appropriate parameters, this simple LOM is a continuous process realization of Lévy flights; 47 thus, non-Gaussian fat tails in the PDF are in principle present. This clearly might lead to difficulties in the analysis and interpretation of experimental and observational data.
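The fat-tail effect described above can be illustrated numerically. The sketch below is an illustrative assumption, not the paper's fitted model: it adopts the normalization dT = −γT dt + √(2D)(1 + βT) dW with arbitrary parameter values, integrates a long trajectory with the Euler–Maruyama scheme, and compares the excess kurtosis of the additive (β = 0) and multiplicative (β > 0) cases.

```python
import numpy as np

def simulate_cam(gamma, D, beta, dt=0.01, n_steps=1_000_000, seed=0):
    """Euler-Maruyama integration of dT = -gamma*T dt + sqrt(2D)*(1 + beta*T) dW.

    Illustrative sketch: the SDE normalization and the parameter values are
    assumptions made here, not taken from a fit."""
    rng = np.random.default_rng(seed)
    T = np.empty(n_steps)
    T[0] = 0.0
    dW = rng.normal(0.0, np.sqrt(dt), n_steps - 1)
    for i in range(n_steps - 1):
        T[i + 1] = T[i] - gamma * T[i] * dt + np.sqrt(2.0 * D) * (1.0 + beta * T[i]) * dW[i]
    return T

def excess_kurtosis(x):
    """Fourth standardized central moment minus 3 (zero for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

T_add = simulate_cam(gamma=0.06, D=0.045, beta=0.0)   # additive noise: ~Gaussian
T_cam = simulate_cam(gamma=0.06, D=0.045, beta=0.2)   # multiplicative: fatter tail
print(excess_kurtosis(T_add), excess_kurtosis(T_cam))
```

Even a modest β makes the excess kurtosis clearly positive, whereas the additive trajectory stays close to Gaussian.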
These difficulties have been recognized, for example, in Ref. 35, where the analytical PDF is not known; thus, parameter estimates, with the corresponding errors, are found by comparing the first four moments obtained from the data observations and from the model, leading to substantial errors. The same method, involving the first four moments and a deep analysis of the errors, is found in Ref. 37. Although this approach differs from fitting data with the equilibrium analytical PDF or from evaluating the FPE coefficients directly from their definition (the two methods exploited here), any kind of direct fit of the data does not take into account the fact that the available observations span a relatively short time range; given that the models are characterized by a non-Gaussian tail, the available observations might, in principle, have failed to capture the details of this tail. 37,48 This would lead to a wrong determination of the model parameters and, in turn, a wrong determination of the non-Gaussian tail features, with the consequence that the frequency of extreme events might be badly estimated. A very relevant question is thus the following: how good are the available observational data for deriving the parameters that appear in such a LOM?
Here, we try to estimate the reliability of the inference procedure for such a model by using a Bayesian approach. Of course, the result is case dependent, and here we shall focus on ENSO events and data.
Because of the regularization effect 49 and the often (but not always) better results with respect to other (e.g., frequentist) inference approaches, 50 Bayesian techniques (see Ref. 51) have been widely exploited in high-dimensional estimation problems. 52,53 An application in the field of oceanography, concerning the parameter estimation of a high-dimensional linear stochastic differential equation (SDE) used as a model for high-latitude sea surface temperature variability, can be found in Ref. 54. There, Bayes' approach is used to provide optimal probabilistic estimates for a very high-dimensional system, where the conditional PDF, typically called the conditional marginal likelihood, is assumed to be Gaussian. Other notable examples are in the field of acoustics in underwater ocean environments, where Bayesian treatments have been used to find bounds for field parameter estimations (e.g., to properly incorporate terms for the fluid flow's transverse and normal speeds caused by oceanic streams and swells, 53 to estimate ocean environmental parameters by exploiting wave modeling of acoustic waveguide propagation, 55 or to predict the threshold signal-to-noise ratio 56 ). In these cases, the focus is on deriving bounds on the mean-square error performance of the estimator for high-dimensional systems where the process dynamics and/or the observation equations are nonlinear. Bayesian techniques are also widely used in the multivariate nonlinear case. A good recent introductory paper is Ref. 57, which also provides algorithms and a few examples.
Our task is less ambitious. We are interested in a simple univariate CAM SDE, and the Bayes relation and surrogate sequences are invoked to cure, to the extent possible, the low reliability of the parameter estimates due to both the small number of available data (poor statistics) and the strongly non-Gaussian nature of large fluctuations (rare events matter).

II. THE DYNAMICAL MODEL AND SOME PRELIMINARY CONSIDERATIONS
The typical time series we will focus on is the directly measured anomaly (deviation from the average) of the monthly Niño3 index, available in the NOAA database, starting from 1950, as shown in Fig. 1. This time series is interesting per se but here it is used mainly as a case study.

A. Detrending the data
Before we can carry out any analysis, we need to address the problem of which average is used to derive the anomaly series. According to NOAA, the average is evaluated over the last 30 years of data. We are dealing with a time series, however, which is approximately 70 years long, and it turns out that the average of the whole series is different from zero. We are planning to interpret the anomaly data with a LOM in the form of an oscillator, for which the average anomaly can be expected to vanish in the limit of a very long sequence (whereas for a finite time series the average anomaly can differ from zero). We therefore decided to investigate the possibility of detrending the data before analyzing them (subtracting the average is equivalent to detrending with a constant), in order to bring the dynamics closer to what is expected for an oscillator. We tried low order polynomials, and we eventually opted to detrend the time series with a line, q(t) = a + b(t − 1950), with the time t measured in years, a = −0.531 ± 0.060 °C and b = 0.0104 ± 0.0015 °C/year (shown as a straight line in Fig. 1). Detrending with higher order polynomials yielded coefficients with relative errors on the order of or larger than 100%, and we decided to rule them out. The fact that b ≠ 0 in our detrending line may bear climatic meaning, but investigating its role, beyond its mere use as a detrending tool, is beyond the scope of this paper. In the following, we will use T to refer, for brevity, to the Niño3 anomaly detrended via q(t). The fits mentioned in this paper, including the one used to detrend the data, were done using either MINUIT 58 or NL2SOL: 59,60 in particular, the latter was used in fitting the distributions obtained from the surrogate trajectories (see the following text).
When quoted, the errors on the fitted parameters are standard errors, related to the Hessian matrix at the minimum of the χ² (routine MINOS from MINUIT).
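As a concrete illustration of this step, the following sketch fits and removes a linear trend q(t) = a + b(t − 1950) from a synthetic stand-in series (the real Niño3 anomaly data are not reproduced here; the noise model and seed are arbitrary assumptions):

```python
import numpy as np

# Linear detrending q(t) = a + b*(t - 1950), as described above, applied to a
# synthetic stand-in series (the real Nino3 anomaly data are not bundled here).
rng = np.random.default_rng(1)
t = 1950.0 + np.arange(840) / 12.0                 # monthly samples, ~70 years
anomaly = -0.531 + 0.0104 * (t - 1950.0) + rng.normal(0.0, 0.8, t.size)

b, a = np.polyfit(t - 1950.0, anomaly, deg=1)      # least-squares line: returns [b, a]
T = anomaly - (a + b * (t - 1950.0))               # detrended series used in the analysis

print(a, b)      # roughly -0.53 and 0.0104, up to sampling noise
print(T.mean())  # least-squares residuals with an intercept have zero mean
```

Note that subtracting a least-squares line automatically leaves a zero-mean residual, which is the property wanted for the oscillator interpretation.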

B. The model
As discussed in Sec. I, a relatively simple LOM can be used as a model of ENSO: the purpose of this LOM is to generate time sequences that have statistical properties similar to the ones measured for the observed time series shown in Fig. 1.
The simplified model is described by a simple, one-dimensional stochastic differential equation (SDE),

dT = −γ T dt + √(2D) (1 + βT) dW,    (1)

where W is a standard Wiener process and γ is a frictional coefficient. D controls the intensity of the stochastic forcing, and β is the relative intensity of the multiplicative part of the stochastic forcing. This SDE has the associated Fokker–Planck equation given by Eq. (38) of Ref. 18, which for self-consistency we rewrite here as

∂p(T, t)/∂t = ∂/∂T [γ T p(T, t)] + D ∂²/∂T² [(1 + βT)² p(T, t)],    (2)

which can be solved to find the equilibrium distribution, which reads 20

p_s(T) = [β x²/(µ − 2)] f_µ(x),  x ≡ (µ − 2)/(1 + βT),  T > −1/β,    (3)

with

µ = 2 + γ/(Dβ²)    (4)

and where the Gamma-like density function f_µ(x) is defined as

f_µ(x) = x^(µ−2) e^(−x)/Γ(µ − 1),    (5)

in which Γ(a) is the standard complete Gamma function. The dimensionless parameter µ must be greater than 3 to have both a finite normalization of the reduced PDF and the existence of (at least) the first two moments. This stationary PDF depends only on the value of the β parameter and on the ratio between D and γ. It is easy to check analytically that in the limit β → 0 with D/γ fixed, the stationary PDF in Eqs. (3)-(5) becomes a standard Gaussian with the same variance D/γ. However, for β ≠ 0, the stationary PDF is non-Gaussian. In fact, for large positive T, it has a "power law" tail that makes large fluctuations of positive T (strong El Niño events) possible. From Eq. (4), we have that the µ parameter, which controls the decay of this "heavy" tail of the stationary PDF, strongly depends on the value of the β parameter. Although the average value of T is zero, the maximum of the stationary PDF is found at T_max = −2/(βµ) ≈ −2βD/γ, i.e., for fixed D/γ, proportional to β. Notice also that for negative values of T (La Niña events), the probability of strong events is largely reduced, and there is a threshold at T_min = −1/β. It is apparent that these non-Gaussian features are shared by the Niño3 data, as seen in Fig. 2. The three parameters γ, D, and β should now be estimated from the observations.
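Assuming the stationary PDF has the closed form p_s(T) ∝ (1 + βT)^(−µ) exp[−(µ − 2)/(1 + βT)] with µ = 2 + γ/(Dβ²), a form consistent with the features stated in the text (threshold at −1/β, power-law tail with exponent µ, maximum at −2/(βµ)), the following sketch checks normalization, zero mean, and the position of the mode numerically; the parameter values are of the order of those found for the Niño3 data.

```python
import numpy as np
from math import lgamma

# Numerical check of the stationary PDF: normalization, zero mean, and mode.
# The closed form used here is an assumption reconstructed from the properties
# stated in the text (tail exponent mu, threshold at -1/beta, mode at -2/(beta*mu)).
gam, D, beta = 0.0622, 0.042, 0.164
mu = 2.0 + gam / (D * beta**2)

def p_s(T):
    u = 1.0 + beta * T
    # log of the normalization constant beta*(mu-2)^(mu-1)/Gamma(mu-1)
    log_norm = np.log(beta) + (mu - 1.0) * np.log(mu - 2.0) - lgamma(mu - 1.0)
    return np.exp(log_norm - mu * np.log(u) - (mu - 2.0) / u)

T = np.linspace(-1.0 / beta + 1e-9, 40.0, 400_001)
p = p_s(T)
dT = np.diff(T)
norm = np.sum(0.5 * (p[1:] + p[:-1]) * dT)                       # trapezoidal rule
mean = np.sum(0.5 * (T[1:] * p[1:] + T[:-1] * p[:-1]) * dT)
print(norm, mean)                                                # ~1 and ~0
print(T[np.argmax(p)], -2.0 / (beta * mu))                       # mode vs -2/(beta*mu)
```

Working in log space avoids overflow of (1 + βT)^(−µ) near the −1/β threshold.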
A typical approach to infer these parameters is to find them from the observed stochastic trajectory of Fig. 1. The parameter γ can be evaluated from the drift. Looking at Eq. (1), taking the average over W on both sides after discretizing in time, we have

⟨ΔT⟩_W = −γ T Δt,    (6)

where Δt is the time separation between the T observations and ⟨·⟩_W stands for averages over W.
The parameters D and β can be found from the fluctuations. From Eq. (2), we can derive

d⟨T²⟩/dt = −2γ ⟨T²⟩ + 2D ⟨(1 + βT)²⟩,    (7)

which yields, assuming stationarity and Dβ² ≪ γ,

⟨T²⟩ ≈ D/γ.    (8)

Equation (1), discretized in time, recalling that dW² ∼ dt, squaring the increment and taking the average over W on both sides, gives

⟨(ΔT)²⟩_W ≈ 2D (1 + βT)² Δt.    (9)

From the time series of the T observations, the difference between two subsequent measured temperatures is taken, and fits using Eqs. (6) and (9) are carried out. This approach makes sense as long as Δt (the time separation at which the T's are measured) is much smaller than the other typical time scales in the system (1/γ, in particular). The results are shown in Figs. 3 and 4. The fits yield the parameters γ = 0.0622 ± 0.0039 month⁻¹, D = 0.0445 ± 0.0029 °C²/month, and β = 0.078 ± 0.029 °C⁻¹. We can also check whether Δt is sufficiently small: Δt is 1 month, and it should be (much) smaller than 1/γ ≈ 16.1 months, which is indeed the case. We used a second approach to evaluate γ from the drift data: we binned the temperature range in bins of 0.1 °C and computed the quantity ⟨ΔT/Δt⟩ in each bin. This is shown as red squares in Fig. 3. The fit to the red squares with a function f(T) = a − γT yields a = −0.002 ± 0.011 °C month⁻¹ and γ = 0.051 ± 0.020 month⁻¹, consistent with the former approach although with a much larger error. Another possible approach to determine the unknown parameters is to build the experimental distribution of T, P_exp(T), and to fit this distribution using Eq. (3). The advantage of the latter approach is that it can be used even when Δt is not small compared to the other time scales. In this case, we can only determine β = 0.164 ± 0.021 °C⁻¹ and the ratio D/γ = 0.670 ± 0.024 °C², obtaining D = 0.042 ± 0.004 °C²/month using the already determined value of γ.
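The drift and diffusion fits just described can be sketched as follows on a synthetic trajectory with known parameters (a stand-in for the Niño3 record, assuming the normalization dT = −γT dt + √(2D)(1 + βT) dW):

```python
import numpy as np

# Drift/diffusion estimation on a synthetic trajectory with known parameters,
# assuming the normalization dT = -gamma*T dt + sqrt(2D)*(1 + beta*T) dW.
# The trajectory is generated on a fine grid and subsampled to monthly values.
rng = np.random.default_rng(2)
gam, D, beta, dt = 0.0622, 0.0445, 0.16, 0.01
n_fine = 2_000_000                       # 0.01-month steps -> 20000 months
T = np.empty(n_fine)
T[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), n_fine - 1)
for i in range(n_fine - 1):
    T[i + 1] = T[i] - gam * T[i] * dt + np.sqrt(2.0 * D) * (1.0 + beta * T[i]) * dW[i]
Tm = T[::100]                            # monthly sampling, Delta t = 1 month
dTm = np.diff(Tm)

# Drift fit: <Delta T>_W = -gamma * T * Delta t, so the slope of dTm against
# Tm estimates -gamma (up to a small finite-Delta t bias).
slope, _ = np.polyfit(Tm[:-1], dTm, 1)
g_est = -slope

# Diffusion fit: <(Delta T)^2>_W ~ 2D(1 + beta*T)^2 Delta t
#                               = 2D + 4D*beta*T + 2D*beta^2*T^2.
c2, c1, c0 = np.polyfit(Tm[:-1], dTm**2, 2)
D_est, beta_est = c0 / 2.0, c1 / (2.0 * c0)
print(g_est, D_est, beta_est)            # roughly gamma, D, and beta
```

Even with 20000 months of clean synthetic data, β comes out with a noticeably larger relative error than γ and D, consistent with the remark below that β depends on "finer" features of the statistics.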
Considering the errors on the parameters given by the different approaches, in the following we determine γ by fitting the "gray" drift points of Fig. 3, and β and D/γ by fitting the distribution functions.

The value of γ found is consistent with what is expected on the grounds of phenomenological considerations, and similar values were obtained in the past (γ ≈ 1/16 month⁻¹ ≈ 0.0625 month⁻¹). 17-20 The value of the quantity D/γ [see Eq. (8)] can be determined reasonably reliably as the variance of the data, obtaining D/γ ≈ 0.8 °C². From this ratio and γ, we would get D ≈ 0.046–0.050 °C²/month, consistent with what we found in the fits. The fit to the distribution gives, incidentally, a relatively small error on the quantity D/γ. On the contrary, the value of the β parameter depends on "finer" features of the statistics of the data, such as the tails of the PDF, where the number of available data is small: this parameter can have a sizeable error, which is confirmed by the larger error found when a fit is done. In general, the fit using the diffusion yields D with a minimal error but β with a large (≈ 30%) error; the fit using the distribution yields a much more accurate value of β than the diffusion fit and a very accurate value for D/γ.
The two procedures used to carry out the fit yield similar D values but different βs. Looking at Fig. 2, this results in two different curves: arguably, the curve with the parameters obtained by fitting the distribution is closer to the data than the curve with the parameters obtained by fitting via Eq. (9). Also, for large T > 0, the former curve has a fatter tail than the latter. In the following, we will drop the units, assuming that D, γ, and β are measured in °C²/month, month⁻¹, and °C⁻¹, respectively.

III. THE PROBLEM OF SHORT STOCHASTIC SERIES
Let us assume that the model of Eq. (1) is indeed the "right" model with some known parameters. If we used this model to generate a single stochastic trajectory with the same length and sampling rate as the one shown in Fig. 1, which is then analyzed to derive the model parameters, would we get back the parameters we used for the simulation?
The reason behind this question is that the Niño3 time sequence is only a little longer than 800 months. This means that we have a time series that is approximately 40 times longer than the typical time scale of the system under study, and we may have a problem with the statistical significance of the data set: put in other words, we are using only approximately 40 statistically independent data points to carry out the fits. In the presence of non-Gaussian tails, the fitted parameters might be very far from the "true" ones, much further than what the errors associated with the fitted parameters might suggest.
In order to address this problem, we carried out digital simulations of Eq. (1) with parameters taken from the fit of the available

Niño3 data and generated stochastic trajectories of the same length as the Niño3 sequence. Then, we considered these trajectories as "surrogate" Niño3 data and applied to them the same fitting procedure we used for the original Niño3 time series, starting with the detrending and obtaining a set of fitted parameters. The algorithm used for the numerical integration is the Heun scheme (see Ref. 61 for details) with a time step of 10⁻² months. Of course, each surrogate Niño3 time series will be different, but hopefully the fitted parameters we find will not differ much from the "true" ones used in the SDE. Unfortunately, it turns out that this is not the case. Figure 5 summarizes the results obtained assuming as parameters in the SDE the values γ = 0.06217, D = 0.04163, and β = 0.1637, simulating 3 × 10⁶ surrogate stochastic trajectories, building the corresponding equilibrium distributions, and then fitting them to infer γ, D/γ, and β. The "true" parameters used in the simulations are marked by a cyan dot, and the small rectangle around the dot shows the experimental error we found fitting the El Niño3 data. We notice that the "true" parameters are fairly distant from the maximum of the distributions of the fitted parameters (darkest areas), and the distributions of the fitted parameters have a rather flat maximum. Also, looking at the probability distribution of the fitted β and D/γ parameters, the region where the distribution of the fitted values is highest lies below and to the left of the "true" values. This might have interesting consequences. The observed Niño3 anomalies time series of Fig. 1 is a single realization of an ideal model for which the corresponding parameters are unknown. When we fit it and infer the parameters, we obtain one particular set of β and D/γ. But Fig.
5 shows that if we used these fitted parameters to generate surrogate trajectories, which are in turn fitted to infer the parameters, we get a distribution for the parameters that appears to yield smaller D and β than the ones used in the model. This implies that the parameters we obtain from the observed Niño3 time series are very likely to underestimate the "correct" D/γ and β. Figure 5 also shows the distribution of the γ values we obtained from the drift fit (top left plate). We notice that this distribution has a half width at half maximum of approximately 2.5 × 10⁻², which is one order of magnitude larger than the error we obtained fitting the El Niño3 data, equal to 3.9 × 10⁻³. In other words, the variability in the parameter that we intrinsically expect due to the model is much larger than the error in the model parameter estimation.
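For reference, a minimal sketch of a stochastic Heun (predictor-corrector) step and of the surrogate-generation loop follows. The SDE normalization and the parameter values are assumptions of this sketch, and Ref. 61 describes the scheme actually used; note that for multiplicative noise the Heun scheme converges to the Stratonovich reading of the SDE.

```python
import numpy as np

def heun_step(T, dt, gamma, D, beta, rng):
    """One stochastic Heun step for dT = -gamma*T dt + sqrt(2D)*(1 + beta*T) dW.

    For multiplicative noise this predictor-corrector converges to the
    Stratonovich solution; see Ref. 61 for the scheme actually used."""
    dW = rng.normal(0.0, np.sqrt(dt))
    drift = lambda x: -gamma * x
    diff = lambda x: np.sqrt(2.0 * D) * (1.0 + beta * x)
    T_pred = T + drift(T) * dt + diff(T) * dW                 # Euler predictor
    return (T + 0.5 * (drift(T) + drift(T_pred)) * dt
              + 0.5 * (diff(T) + diff(T_pred)) * dW)          # trapezoidal corrector

def surrogate(n_months=830, dt=1e-2, gamma=0.0622, D=0.0416, beta=0.164, seed=3):
    """One surrogate monthly series of the same length as the Nino3 record."""
    rng = np.random.default_rng(seed)
    steps_per_month = int(round(1.0 / dt))
    out = np.empty(n_months)
    T = 0.0
    for m in range(n_months):
        for _ in range(steps_per_month):
            T = heun_step(T, dt, gamma, D, beta, rng)
        out[m] = T
    return out

traj = surrogate()
print(traj.shape, traj.std())
```

Each call with a different seed produces one surrogate realization to which the detrending and fitting chain can then be applied.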
The following question then arises: what are the "true" parameters we should use in the model, which better characterize the expected dynamics, and how can we determine them? Figure 6 shows the probability density of the fitted parameters obtained assuming two different sets of values (marked by cyan dots) for the parameters D, β, and γ in the SDE when the surrogate trajectories were generated. The distributions are clearly different. In particular, the probability of obtaining D/γ and β at the point marked by a red square is very different in the two cases shown.

IV. THE DETERMINATION OF THE MOST LIKELY MODEL PARAMETERS
In the following, for compactness, we are going to use θ ≡ {D/γ, β, γ}. We denote the parameters used in the simulations by θ̂ and by θ* a specific set of D/γ, β, and γ found in the fit. The distributions shown in Fig. 6 can then be written as

P(θ|θ̂).    (10)

This means that, assuming some parameter values (i.e., θ̂) in the SDE simulations, we obtained a distribution as a function of the parameters θ fitted on the surrogate data. This distribution can be used to find the probability P(θ*) that the fitted parameters are found inside a small area centered on a specific value θ*: with reference to Fig. 6, for example, we may be interested in knowing the probability of finding the values of D/γ, β, and γ marked by a red square. The value of this probability is clearly different in the two cases shown, given that the distributions depend on θ̂ (cyan dot). Because this probability depends on θ̂, we really have that P(θ*) = P(θ*|θ̂), i.e., the probability that we find θ* from the surrogates using θ̂ in the SDE: as we change θ̂, the value of P(θ*) will change. Looking at Fig. 6, the probability P(θ*) of the red square is larger for the plate on the right than for the plate on the left. Now, if we fixed θ* and changed θ̂, we could build a new probability distribution where we plot, as a function of θ̂, the probability of observing θ*: this would be a distribution of the form P(θ̂) = P(θ̂|θ*), meaning the probability that, having found θ* in the surrogate, we had θ̂ in the model.
The distribution P(θ̂|θ*) is the distribution we need in order to find the most likely value of the model parameters: it quantifies the probability that we had θ̂ in the model if we observed θ* in a single trajectory. P(θ̂|θ*) follows from Bayes' theorem,

P(θ̂|θ*) = P(θ*|θ̂) P(θ̂)/P(θ*),    (11)

if we knew the quantities on the right hand side. It will be clear in the following discussion that we only need to compute P(θ*|θ̂) for different θ̂.
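On a discrete grid of model parameters and with a flat prior P(θ̂), the Bayes relation of Eq. (11) reduces to normalizing the likelihood; a toy one-dimensional illustration (the grid, likelihood shape, and numbers here are illustrative assumptions, not the paper's computation):

```python
import numpy as np

# Toy illustration of Eq. (11) on a discrete grid: with a flat prior P(theta_hat),
# the posterior P(theta_hat|theta*) is just the normalized likelihood
# P(theta*|theta_hat), and P(theta*) enters only as a normalization constant.
theta_hat_grid = np.linspace(0.5, 0.9, 41)             # e.g. candidate D/gamma values
likelihood = np.exp(-0.5 * ((theta_hat_grid - 0.715) / 0.05) ** 2)  # toy P(theta*|theta_hat)
prior = np.ones_like(theta_hat_grid)                   # flat prior over the grid
posterior = likelihood * prior
posterior /= posterior.sum()                           # divide out P(theta*)
peak = theta_hat_grid[np.argmax(posterior)]            # most likely model parameter
print(peak)
```

In the actual computation the likelihood itself is not Gaussian and must be estimated by Monte Carlo, as described next.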
The computation of P(θ*|θ̂) is carried out by choosing randomly the model parameters θ̂, generating a trajectory via numerical integration of the SDE of Eq. (1), fitting it, and checking whether the fitted parameters are close to θ*: this loop is repeated until convergence of P(θ*|θ̂) is achieved. Details are given in the Appendix. The distribution P(θ̂|θ*) is shown in Fig. 7. [Fig. 7 caption: P(θ̂|θ*), the probability that, for a given set of parameters θ̂ in the model, we found a trajectory that, after fitting, yielded parameter values close to θ* (cyan dot), the cyan rectangle marking the acceptance region around θ*. The white dots (θ̂ = {D/γ = 0.715 ± 0.01, β = 0.166 ± 0.007, γ = 0.061 ± 0.001}) are the maximum of P(θ̂|θ*), with the associated error in the maximum determination; the blue dot is the {D/γ, β, γ} obtained by fitting the experimental drift and diffusion (the blue curve in Fig. 2). Right: zoom around θ*.] The cyan dot marks θ*, the D/γ, β, and γ parameters obtained from the fit of the Niño3 anomalies time data. The white dot marks the maximum of the distribution, i.e., the D/γ, β, and γ values (θ̂ = {D/γ = 0.715, β = 0.166, γ = 0.061}) that we should insert in the model to maximize the probability of observing θ* = {D/γ = 0.6696, β = 0.1637, γ = 0.0622} in a single realization. Note also that the maximum of this distribution is fairly wide. The determination of the white dot parameters is already a main result of this paper.
One of the reasons to build models of the El Niño dynamics is to be able to predict how often we might expect to experience a given large ENSO event. In analytical terms, the answer to this question comes from evaluating the Mean First Passage Time (MFPT) for the SST to reach a given threshold T from the typical rest state (for details, see Ref. 19).
It is now possible to evaluate the MFPT for the model of Eq. (1); the results are shown in Fig. 8. The MFPT expected on the grounds of the parameters obtained by fitting the drift and the diffusion of the known Niño3 anomalies time series is the curve shown in blue. The MFPT expected on the grounds of the parameters obtained by fitting p_s(T) is the curve shown in green. The MFPT expected on the grounds of the model parameters that maximize the probability of finding the observed parameters in the fit of the El Niño3 data is the curve shown in red. It should be appreciated that the blue curve yields unrealistic MFPTs: according to the blue curve, ENSO events up to T = 3 are expected to happen every 2200 months on average (once every 180 years). On the other hand, the green and red curves yield more reasonable recurrences: we get T = 3 events every 1290 and 1000 months (107 and 80 years), respectively.
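A first-passage time can also be estimated by direct simulation, as a rough cross-check of such curves. The sketch below is a coarse Monte Carlo assumption, not the paper's analytical evaluation: Euler–Maruyama with a coarse time step and few realizations, assuming the normalization dT = −γT dt + √(2D)(1 + βT) dW and parameters close to the "optimal" ones quoted above.

```python
import numpy as np

# Rough Monte Carlo estimate of the mean first-passage time (MFPT) to a
# threshold, by direct integration from the rest state. Coarse time step and
# few realizations: a sketch of the idea, not a precision computation.
rng = np.random.default_rng(4)
gam, beta = 0.061, 0.166
D = 0.715 * gam                              # from the fitted ratio D/gamma
mu = 2.0 + gam / (D * beta**2)
T_rest = -2.0 / (beta * mu)                  # maximum of the stationary PDF
dt, threshold = 0.1, 3.0

times = []
for _ in range(8):                           # a few first-passage realizations
    T, t = T_rest, 0.0
    while T < threshold:
        T += -gam * T * dt + np.sqrt(2.0 * D * dt) * (1.0 + beta * T) * rng.normal()
        t += dt
    times.append(t)
print(np.mean(times))                        # mean first-passage time, in months
```

With so few realizations the estimate is noisy; it is meant only to show that a recurrence time of the right order can be checked against the analytical curves.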
A final word about the detrending we applied to the El Niño3 data. Given the model of Eq. (1), is the observed trend in the El Niño3 data due to poor statistics (a short stochastic sequence is bound to show a trend), or might it have a climatic meaning? In the simulations presented in Sec. III, as we mentioned, we needed to detrend the data. To evaluate whether the observed trend in Fig. 1 is simply due to poor statistics or to some cause not represented by the stationary LOM of Eq. (1), we built, from the surrogate trajectories, the distribution of the observed detrending parameters a and b, which is shown in Fig. 9. We note that this distribution has a peak at the origin, as expected, and that the probability of observing, in a surrogate trajectory, detrending parameters close to the ones found for the El Niño3 data is only 1/3 of the probability of observing detrending parameters close to the origin. This means that the trend given by the straight line in Fig. 1 is likely due to external factors and not just to statistics. This is also consistent with the conclusion that, in the time range of the data used, the ENSO intensity is increasing on average: a behavior that the running 30-year detrending procedure incorporated in the Niño3 anomalies data from NOAA (https://psl.noaa.gov/data/climateindices/) can smooth but not hide, and a well known fact (see, e.g., Ref. 62 and references therein) that can be related to global warming.

V. CONCLUSIONS
In this paper, we showed that attention must be paid to the determination of model parameters in the presence of a non-Gaussian process with a limited number of data. The reliability of the result is case dependent. The ENSO case we have considered here is directly related to some recent works 13-21 where the model of Eq. (1) has been used to infer, from data, the analytical probability distribution and the recurrence time of ENSO events. The main conclusion of these works is that, due to the non-Gaussian nature of the model, the probability of having strong and very strong ENSO events is far larger (by orders of magnitude) than that obtainable from Gaussian statistics. In particular, quoting Ref. 19, "an important result that emerges [. . .] is that the average recurring time of very strong ENSO events is of the order of just some tens of years (about the human average lifetime), with important implications for society. This result is strictly related to the multiplicative nature of the perturbation to the ROM, as expressed by the parameter β." Clearly, because of their simplicity, these LOMs provide only a first order estimate that needs to be corroborated by more complex models. However, since they catch the main large scale features of the physical phenomenon, these simple models should give important information about its statistics. It is then crucial to assess the reliability of such information. At first sight, this reliability seems poor. In fact, if we use for the model the parameters obtained from the fit to observations, the results are most likely incorrect: when we generate surrogate stochastic ENSO data and fit them, the parameters we infer are different from the ones used in the model integrated to generate the trajectories (Fig. 5).
However, this is not the final conclusion. We then successfully addressed the problem of finding the most likely parameters that should be used in the model to maximize the probability of observing the parameters inferred from the real Niño3 time sequence. From these most likely parameters, we get more "optimized" models, which give the results of Fig. 8. The conclusion is surprising because it essentially confirms that the non-Gaussian features of the model lead to the aforementioned findings on the recurrence time of intermediate to very strong ENSO events. We also concluded that additional care is needed in fitting: the parameters obtained by a fit to the data (drift and diffusion) yield unrealistic MFPTs, whereas if the probability distribution is built first and then fitted taking into account the specific model, the MFPTs obtained appear to be more realistic.
Finally, we would like to add that the additional step of finding the optimal parameters via a Bayesian analysis should be regarded as a necessary step when models implying non-Gaussian statistics are involved.

AUTHORS' CONTRIBUTIONS
All authors contributed equally to this work.

ACKNOWLEDGMENTS
Discussions about Bayes theorem with Walter Del Pozzo are warmly acknowledged. The authors thank the referees for their many valuable comments and recommendations that helped improve the quality of the paper.
APPENDIX

We first worked out the region of the model parameters θ̂ that we need to sample. Via numerical integration of the SDE, we established the maximal values of D/γ, β, and γ that could yield parameters not too far from θ* after fitting. Then, we discretized this region, taking steps of size 0.03 along D/γ, steps of size 0.015 along β, and steps of size 2 × 10⁻³ along γ. This means that we covered this region with 48 × 42 × 60 small boxes of side 0.03 × 0.015 × 0.002. To apply Eq. (11), however, we need a distribution of model parameters for our stochastic trajectories. So, within each of the small boxes, we generated 4 × 10⁴ different sets of model parameters according to a flat distribution for D/γ, β, and γ. This defines P(θ̂). For each set of model parameters, a stochastic trajectory, 830 months long, was numerically integrated (the integration time step was taken to be 10⁻² months); it was detrended, and the drift and the distribution p_s(T) (Fig. 2) were built and fitted to obtain the value of θ for that particular trajectory. For each small box, we ended up with 4 × 10⁴ θ values (in total, we generated 48 × 42 × 60 × 4 × 10⁴ stochastic trajectories). We selected a small region around θ* and counted for how many trajectories the fitted parameters "hit" this region: the fraction of hits gave P(θ*|θ̂). The number of hits depends on how wide a region around θ* we select to define a hit. Given that the fit of the Niño3 anomalies yields parameters with some errors, the region was taken equal to the error in the fit, and it is shown as a cyan rectangle in Fig. 7. The last unknown quantity on the right hand side of Eq. (11) is P(θ*): this quantity is proportional to the total number of trajectories hitting the target rectangle divided by the total number of trajectories used in the simulations. Given that we are only interested in relative values of P(θ̂|θ*), we disregarded it.
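A heavily scaled-down sketch of this hit-counting procedure follows. It is an illustrative assumption, not the full computation: only γ is refitted (via the drift), on three "boxes" along the γ axis, with a coarse time step and few trajectories; the full computation covers the whole (D/γ, β, γ) grid.

```python
import numpy as np

# Scaled-down sketch of the Appendix procedure: estimate P(theta*|theta_hat) as
# the fraction of surrogate trajectories whose refitted parameter lands inside
# the acceptance window around theta*. Only gamma is refitted here, for brevity.
rng = np.random.default_rng(5)
gam_star, half_window = 0.0622, 0.004      # target value and acceptance half-width
D, beta, dt, n_months = 0.0416, 0.164, 0.1, 830

def fitted_gammas(gam_hat, n_traj=100):
    """Simulate n_traj surrogate series at once and refit gamma from the drift."""
    n = int(n_months / dt)
    T = np.zeros((n, n_traj))
    dW = rng.normal(0.0, np.sqrt(dt), (n - 1, n_traj))
    for i in range(n - 1):                 # Euler-Maruyama, vectorized over trajectories
        T[i + 1] = T[i] - gam_hat * T[i] * dt + np.sqrt(2.0 * D) * (1.0 + beta * T[i]) * dW[i]
    Tm = T[::10]                           # monthly sampling
    dTm = np.diff(Tm, axis=0)
    g = np.empty(n_traj)
    for j in range(n_traj):                # drift fit per trajectory: slope ~ -gamma
        slope, _ = np.polyfit(Tm[:-1, j], dTm[:, j], 1)
        g[j] = -slope
    return g

fractions = {}
for gam_hat in (0.05, 0.06, 0.07):         # three "boxes" along the gamma axis
    g = fitted_gammas(gam_hat)
    fractions[gam_hat] = np.mean(np.abs(g - gam_star) < half_window)
print(fractions)                            # Monte Carlo estimates of P(theta*|theta_hat)
```

The wide spread of the refitted γ values over an 830-month trajectory is the same effect discussed in Sec. III: the acceptance window is hit for a broad range of θ̂.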

DATA AVAILABILITY
Raw ENSO data used throughout this work were obtained from the NOAA large scale facility at https://psl.noaa.gov/data/climateindices/, Ref. 63. Derived data supporting the findings of this study are available from the corresponding authors upon reasonable request.