## Abstract

Important for many science and engineering fields, meaningful nonlinear models result from fitting such models to data by estimating the value of each parameter in the model. Since parameters in nonlinear models often characterize a substance or a system (e.g., mass diffusivity), it is critical to find the optimal parameter estimators that minimize or maximize a chosen objective function. In practice, iterative local methods (e.g., Levenberg–Marquardt method) and heuristic methods (e.g., genetic algorithms) are commonly employed for least squares parameter estimation in nonlinear models. However, practitioners are not able to know whether the parameter estimators derived through these methods are the optimal parameter estimators that correspond to the global minimum of the squared error of the fit. In this paper, a focused regions identification method is introduced for least squares parameter estimation in nonlinear models. Using expected fitting accuracy and derivatives of the squared error of the fit, this method rules out the regions in parameter space where the optimal parameter estimators cannot exist. Practitioners are guaranteed to find the optimal parameter estimators through an exhaustive search in the remaining regions (i.e., focused regions). The focused regions identification method is validated through two case studies in which a model based on Newton’s law of cooling and the Michaelis–Menten model are fitted to two experimental data sets, respectively. These case studies show that the focused regions identification method can find the optimal parameter estimators and the corresponding global minimum effectively and efficiently.

## 1 Introduction

Fitting nonlinear models to data, also called nonlinear curve fitting, is an important inverse problem for many science and engineering fields [1–3]. A major goal of nonlinear curve fitting is to estimate the value of each parameter in a specified nonlinear model. For example, Fick’s second law is fitted to data collected from standard moisture diffusion tests to estimate the mass diffusivity of various composite materials [4,5]. Michaelis–Menten model is fitted to data measured in enzymatic reactions to estimate the maximum reaction rate and the Michaelis constant of these reactions [6,7].

In nonlinear curve fitting problems, parameters in the nonlinear model usually characterize a substance or a system and have clear interpretations, such as mass diffusivity between two species, viscosity of a fluid, and maximum velocity of a chemical reaction [8]. It is therefore critical to find the optimal parameter estimators that minimize or maximize a chosen objective function, such as the squared error of the fit. In practice, iterative local methods (e.g., Levenberg–Marquardt method [9,10] and Nelder–Mead method [11]) and heuristic methods (e.g., genetic algorithms [12] and simulated annealing [13]) are commonly employed for parameter estimation in nonlinear models to minimize the squared error of the fit. However, practitioners are not able to know whether the parameter estimators derived through these extant methods are the optimal parameter estimators that correspond to the global minimum of the squared error of the fit. The iteration process of these methods also does not shed light on where the global minimum may be located. Recent research introduces a solution interval method for least squares parameter estimation in nonlinear models [14]. Practitioners do not need to guess the initial value of each parameter in a nonlinear model to initialize the method, but the solution interval method is still not guaranteed to find the optimal parameter estimators that correspond to the global minimum of the squared error of the fit.

In this paper, a generally applicable method is introduced to reduce the search space for the optimal least squares parameter estimators in nonlinear curve fitting problems. The method applies to nonlinear models that have a closed-form expression. Using expected fitting accuracy and derivatives of the squared error of the fit, this method rules out the suboptimal regions in parameter space where the optimal parameter estimators cannot exist. Notably, the ruled-out regions may include one or more local minima in which some iterative local methods can be trapped. The remaining regions in the parameter space are defined as the focused regions. If adequate computational power is available, practitioners are guaranteed to find the optimal parameter estimators that correspond to the *global minimum* of the squared error of the fit through an exhaustive search in the focused regions. In addition, a new approach to detect potential outliers using inequalities derived by expected fitting accuracy and derivatives of the squared error of the fit is also proposed for nonlinear curve fitting problems.

This paper begins with a brief review of extant least squares methods that are commonly employed to solve nonlinear curve fitting problems in Sec. 2. Section 3 presents the new method to identify the focused regions and search for the optimal least squares parameter estimators in nonlinear models. Four steps to implement the focused regions identification method in practice are provided in Sec. 4. The applications of the method are demonstrated through case studies of the Rumford cooling experiment and the puromycin experiment in Sec. 5. Section 6 presents an approach to detect potential outliers in data sets of nonlinear curve fitting problems using inequalities derived in Sec. 3. The paper concludes with a discussion of the contribution of the work presented here and future research directions.

## 2 Background and Related Work

Nonlinear models have been applied in many science and engineering fields. A review paper written by Archontoulis and Miguez provides 72 commonly employed nonlinear models that have a closed-form expression, such as exponential decay function, logistic function, Gompertz function, Gaussian function, and Michaelis–Menten model [3]. A major goal of fitting a specified nonlinear model to data, also called nonlinear curve fitting, is to estimate the value of each parameter in the nonlinear model. These parameter estimators should minimize or maximize a chosen objective function, such as squared error of the fit and maximum likelihood [15]. In nonlinear curve fitting problems, the nonlinear model usually has less than ten parameters. Each parameter in the nonlinear model has a clear interpretation (e.g., mass diffusivity and viscosity). Sometimes, the values of one or more parameters in the nonlinear model are bounded. For example, the viscosity of a fluid, *μ*, needs to be a positive constant (*μ* > 0). However, inequality or equality constraints that establish relationships between several parameters in the nonlinear model are usually not applied to nonlinear curve fitting problems. In this section, several methods that are frequently used to solve nonlinear curve fitting problems are briefly reviewed. These methods estimate parameter values in nonlinear models to minimize the *squared error of the fit*. Detailed reviews of extant nonlinear parameter estimation methods that cover various objective functions can be found in books written by Bates and Watts [16], Björck [17], Gelman et al. [18], and Tarantola [8], among others.

*x*and

*y*represent substrate concentration and reaction velocity in a biochemical reaction, respectively; parameter

*θ*

_{1}denotes the maximum reaction velocity, and parameter

*θ*

_{2}is defined as the Michaelis constant. Parameters

*θ*

_{1}and

*θ*

_{2}are, by definition, positive constants. To fit the Michaelis–Menten model to experimental data and estimate the values of parameters

*θ*

_{1}and

*θ*

_{2}in the model, Lineweaver and Burk transform the Michaelis–Menten model into a linear model as [19]

Equation (2) is often called the Lineweaver–Burk equation, and the graphical representation of Eq. (2) (1/*x* versus 1/*y* graph) is known as the Lineweaver–Burk plot or the double reciprocal plot in biochemistry. The values of 1/*θ*_{1} and *θ*_{2}/*θ*_{1} in Eq. (2) are derived by regressing 1/*y* against 1/*x* using all data points collected from the biochemical experiment, and then the values of parameters *θ*_{1} and *θ*_{2} can be calculated accordingly. However, parameter estimators derived through the Lineweaver–Burk equation minimize the squared error of the fit for the reciprocal of the reaction velocity, 1/*y*, rather than for the reaction velocity, *y*. In other words, the experimental data points with low reaction velocity (a low value of *y*), which often have greater percentage error, have a higher impact on the linear regression results compared to the data points with high reaction velocity (a high value of *y*). The parameter estimators derived through the Lineweaver–Burk equation are therefore not the optimal least squares estimators for the original problem of fitting the Michaelis–Menten model represented by Eq. (1) to experimental data. An example using the Lineweaver–Burk equation to estimate parameter values in the Michaelis–Menten model is provided in Sec. 5.2.

**= [**

*θ**θ*

_{1},

*θ*

_{2}, …,

*θ*] is a column vector of

_{n}*n*parameters in a specified nonlinear model

*y*=

*f*(

*x*

_{1},

*x*

_{2}, …,

*x*,

_{q}**) that is fit to**

*θ**m*data points [

*x*

_{1(i)},

*x*

_{2(i)}, …,

*x*

_{q}_{(i)},

*y*

_{(i)}], and

*i*∈ {1, 2, …,

*m*};

*θ*^{(s + 1)}and

*θ*^{(s)}represent parameter values at iteration step

*s*+ 1 and

*s*, respectively;

**Δ**

^{(s)}represents the increment vector at the iteration step

*s*, which is specified differently in these iterative local methods. For example, when the Levenberg–Marquardt method is employed, the increment vector

**Δ**in the iteration step

*s*is computed through [10,23]

Besides these iterative local methods, several heuristic methods, also call guided random search techniques [24], such as genetic algorithms [12], simulated annealing [13], particle swarm optimization [25], and tabu search [26,27], also have been applied for parameter estimation in nonlinear models to minimize the squared error of the fit. These heuristic methods are usually initialized by multiple initial guesses, known as initial population, as the initial estimators for each parameter in the nonlinear model. These parameter estimators are then updated in an iterative process involving random variation [24].

Importantly, in nonlinear curve fitting problems, it is critical to find the optimal estimator for each parameter that minimizes the squared error of the fit (i.e., global minimum) since each parameter in the nonlinear model characterizes a substance or a system (e.g., mass diffusivity between two species and maximum velocity of a chemical reaction). However, it is not guaranteed to find the optimal parameter estimators using these iterative local methods and heuristic methods since these methods are based on sampling over the parameter space. Specifically, using these methods, the iterative process is usually stopped when the result reaches a prespecified step tolerance, function tolerance, or maximum number of iterations. If the initial guess(es) and the stopping criteria are properly specified, these algorithms may converge to a local minimum where the first-order derivatives of the squared error of the fit equal (or approximate) zero [14,24]. However, since multiple or even an infinite number of local minima may exist for nonlinear curve fitting problems, practitioners are not able to know whether the local minimum derived from an iterative local method or a heuristic method is the global minimum that corresponds to the optimal parameter estimators, and thus it is difficult to determine when to stop the search process using these methods in practice. In other words, since the goal is to find the optimal parameter estimators that minimize the squared error of the fit (i.e., global minimum), when practitioners find a local minimum using these extant methods, they do not know whether they should stop the search process or keep searching the parameter space until they exhaust all available computational power (e.g., through using other initial guesses or increasing the population size). In addition, the iteration process of these extant methods also does not inform practitioners about where the global minimum may be located. Notably, in practice, it is also usually not feasible to perform an exhaustive search in the parameter space of nonlinear curve fitting problems since the parameter space is infinite in many cases. Recent research introduces a solution interval method for least squares parameter estimation in nonlinear models [14]. To initialize the solution interval method, practitioners do not need to guess the initial value of each parameter in a nonlinear model. It is also proved that the method can find the optimal estimator for each parameter that minimizes the squared error of the fit when the nonlinear curve fitting problem satisfies the specified monotonic and symmetric conditions [14]. However, many nonlinear curve fitting problems do not fulfill these conditions, such as the curve fitting problem defined in Sec. 5.2; it is therefore not guaranteed to find the optimal parameter estimators and the corresponding global minimum for nonlinear curve fitting problems using the solution interval method.

## 3 Focused Regions Identification for Least Squares Parameter Estimation in Nonlinear Models

Based on the research gap discussed in Sec. 2, a generally applicable method is introduced in this section to identify the focused regions for least squares parameter estimation in nonlinear models. Using expected fitting accuracy and derivatives of the squared error of the fit, this method rules out the suboptimal regions, where the optimal parameter estimators that correspond to the global minimum of the squared error of the fit are impossible to exist, from parameter space. The remaining regions in the parameter space are defined as the focused regions to search for the optimal parameter estimators. If adequate computational power is available, practitioners are guaranteed to find the optimal parameter estimators that correspond to the *global minimum* of the squared error of the fit through an exhaustive search in the focused regions. Notably, the method introduced in this section applies to nonlinear models that have a closed-form expression. The method becomes less efficient as the number of parameters in the nonlinear model increases, and ten parameters are found as a practical limit for the application of the method due to current typical computational resources. As stated in Sec. 2, many nonlinear models, such as exponential decay function, logistic function, Gaussian function, and Michaelis–Menten model, have less than ten parameters.

*m*data points [

*x*

_{1(i)},

*x*

_{2(i)}, …,

*x*

_{q}_{(i)},

*y*

_{(i)}], where

*i*∈ {1, 2, …,

*m*}. The objective is to fit a specified nonlinear model

*y*=

*f*(

**,**

*x***) to the**

*θ**m*data points by finding the optimal values for parameters

**that minimize the squared error of the fit, where**

*θ***= [**

*θ**θ*,

_{1}*θ*, …,

_{2}*θ*] represents

_{n}*n*constant parameters in the nonlinear model, and

**= [**

*x**x*,

_{1}*x*, …,

_{2}*x*]. Here,

_{q}*y*=

*f*(

**,**

*x***) represents the closed-form expression of the nonlinear model using a finite number of standard operations that do not include limit, differentiation, and integration. Fitting a nonlinear model that does not have a closed-form expression (e.g., a differential equation) to data is beyond the scope of this paper, and model selection is also not discussed in this paper. The squared error of the fit for the problem, also called the residual sum of squares (RSS), is**

*θ***= [**

*θ**θ*

_{1},

*θ*

_{2}, …,

*θ*] are the estimated values of parameters

_{n}**. Sometimes there are bounds applied to parameter**

*θ**θ*(e.g., $\theta j(lb)_<\theta j<\theta j(ub)\xaf$, where $\theta j(lb)_$ and $\theta j(ub)\xaf$ are the lower bound and upper bound of parameter

_{j}*θ*, respectively), and the

_{j}*n*-dimensional parameter space is defined by the allowed value range of each parameter

*θ*, where

_{j}*j*∈ {1, 2, …,

*n*}.

The regions that cannot satisfy expected fitting accuracy are ruled out from parameter space first. When an expected fitting accuracy is specified, an upper limit is applied to the squared error of the fit or the squared error of each data point. The regions in which the squared error exceeds the upper limit are ruled out from parameter space. Notably, the ruled-out regions may include several or even an infinite number of local minima in which some iterative local methods, such as the Gauss–Newton method and the Levenberg–Marquardt method, can be trapped. The ruled-out regions identification through expected fitting accuracy is demonstrated in Fig. 1(a), where *n* = 1.

*R*

^{2}, or mean squared error, denoted MSE. The coefficient of determination is defined as

*y*as

_{(i)}*R*

^{2}

_{(e)}, the regions defined by Eq. (9) should be ruled out from the space of parameters

**:**

*θ**m*terms on each side of the inequality. In cases when the ranges of parameters

**cannot be easily derived from Eq. (10), a more conservative criterion, based on the condition that each of the**

*θ**m*terms on the left side of Eq. (10) is larger than its corresponding term on the right side of Eq. (10) can be used. In this case, the regions that should be ruled out from the parameter space can be defined by

Equation (11) represents the intersection of *m* inequalities related to parameters ** θ** after

*m*data points are plugged into the equation. The regions defined by Eq. (11) are the subregions of the regions defined by Eq. (10). In other words, smaller suboptimal regions are ruled out if Eq. (11) is employed instead of Eq. (10). Importantly, as demonstrated in the two case studies in Sec. 5, when Eq. (10) or Eq. (11) is employed to rule out suboptimal regions from the parameter space, sampling over the parameter space is not necessary given the closed-form nature of Eqs. (10) and (11). It is conservative to assign

*R*

^{2}

_{(e)}= 0 in Eqs. (10) and (11) since a

*successful*curve fitting result must have its coefficient of determination,

*R*

^{2}, greater than or equal to zero. In practice, a larger expected coefficient of determination,

*R*

^{2}

_{(e)}, could be employed (e.g., 0.5 or 0.7) at the discretion of the practitioner. Cases when the whole parameter space is ruled out by Eq. (10) or Eq. (11) indicate that the curve fitting result cannot reach the expected coefficient of determination, and practitioners need to check data reliability and the fidelity of the nonlinear model employed to fit the data.

_{(e)}, the regions defined by Eq. (13) should be ruled out from the space of parameters

**:**

*θ**m*terms. In cases when the ranges of parameters

**cannot be easily derived from Eq. (14), a more conservative set of regions that should be ruled out from the parameter space can be defined by**

*θ*Equation (15) is derived based on the condition that each term on the left side of Eq. (14) is greater than the expected MSE_{(e)}. Equation (15) represents the intersection of *m* inequalities related to parameters ** θ** after

*m*data points are plugged into the equation. The regions defined by Eq. (15) are the subregions of the regions defined by Eq. (14). In practice, when the expected fitting accuracy is specified as that the squared error

*for each data point*is expected to be less than or equal to a specified value, denoted as MSE

_{(e)}in Eqs. (14) and (15), the ruled-out regions can be defined by the union of the

*m*inequalities related to parameters

**after**

*θ**m*data points are plugged into Eq. (15). Cases when the whole parameter space is ruled out by Eq. (14) or Eq. (15) indicate that the curve fitting result cannot satisfy the expected mean squared error or the expected squared error for each data point.

**= [**

*θ***θ*,

_{1}**θ*, …,

_{2}**θ*] are the optimal values of parameters

_{n}***,**

*θ**S*(

**) is the global minimum of the squared error of the fit. The squared error of the fit**

*θ***S*(

**) is then written as the Taylor series at**

*θ* + h***as**

*θ****= [**

*h**h*,

_{1}*h*, …,

_{2}*h*] are arbitrary small positive or negative values, and

_{n}*S*(

**) is the global minimum of the squared error of the fit,**

*θ***S*(

**) must be greater than**

*θ* + h**S*(

**) for any small positive or negative values of**

*θ****, and thus**

*h**k*∈ {1, 2, …} based on Eq. (16), where

*k*represents the order of derivatives for the squared error of the fit. The regions in which Eq. (23) is not satisfied should be ruled out from parameter space. In practice, to further reduce the remaining regions in the parameter space derived through expected fitting accuracy, the order of derivatives for the squared error of the fit in Eq. (23) could be iteratively increased (i.e., assign

*k*= 1 in Eq. (23) first, and then

*k*= 2, 3, …) until the new inequalities including the

*k*th-order derivatives of

*S*(

**) derived from Eq. (23) cannot rule out any new area from the remaining regions for parameters**

*θ***. The iteration also could be halted when the remaining regions are small enough for an exhaustive search. For example, when**

*θ**n*= 1, Eq. (16) is simplified as

*h*

_{1}could be any small positive or negative value and each derivative term in Eq. (24) must have a positive value, the regions defined by Eqs. (25)–(28) should be ruled out from parameter space. For instance, as shown in Eq. (25), when

*k*= 1, since

*h*

_{1}could have any small positive or negative value, the first-order derivative of the squared error of the fit must be zero at the global optimum

*θ*

_{1}*, the regions where the first-order derivative is greater or less than zero should be ruled out from the parameter space. In practice, the regions defined by Eqs. (25)–(28) can be ruled out from parameter space one by one until the new inequalities including the

*k*th-order derivative of

*S*(

*θ*

_{1}) cannot rule out any new area from the remaining regions for parameter

*θ*

_{1}or the remaining regions derived through the

*k*th-order derivative of

*S*(

*θ*

_{1}) are small enough for practitioners to perform an exhaustive search using the available computational power

*k*represents the order of derivative for the squared error of the fit. Notably, the squared error of the fit is the summation of

*m*terms. In cases when the ranges of parameters

**cannot be easily derived from an inequality related to the derivative of the squared error of the fit, the subregions of the regions defined by the inequality could be ruled out instead. For example, the first inequality in Eq. (25) is expressed as**

*θ**θ*

_{1}cannot be easily derived from Eq. (29), the regions that should be ruled out from the parameter space can be defined conservatively by

*m*inequalities represented by the equation. Equation (30) is derived based on the condition that each of the

*m*terms on the left side of Eq. (29) has a positive value, and the regions defined by Eq. (30) are therefore the subregions of the regions defined by Eq. (29).

Ultimately, the optimal values for parameters ** θ** that minimize the squared error of the fit could be searched in the focused regions identified through the expected fitting accuracy and the derivatives of the squared error of the fit. As shown in the two case studies in Sec. 5, the search space for the optimal parameter estimators that correspond to the global minimum of the squared error of the fit is significantly reduced using the expected fitting accuracy and the derivatives of the squared error of the fit. The optimal parameter estimators could be found through an exhaustive search, also called grid search, with specified significant digits (e.g., four significant digits) in the focused regions, if adequate computational power is available. If exhaustive search is not feasible, an iterative method that can solve nonlinear least squares problems subject to bounds, such as the trust region reflective method [21,22], could be employed to find the parameter estimators. The initial values of parameters to initialize the chosen iterative method could be specified based on previous experience, expert advice, linear least squares estimators (if linear transformation is feasible for the specified nonlinear model

*y*=

*f*(

**,**

*x***)) [3,30], graphical exploration [31], or the roots of parameter solution equations [14]. Notably, the parameter estimators derived through an iterative least squares method may not be the optimal parameter estimators since one or more local minima may exist in the focused regions.**

*θ*## 4 Four Steps to Estimate Parameter Values in Nonlinear Models

In Sec. 3, a generic method is introduced to identify the focused regions in the parameter space for least squares parameter estimation in nonlinear models. The method rules out the suboptimal regions from the parameter space, and the remaining regions in the parameter space are defined as the focused regions to search for the optimal parameter estimators that correspond to the global minimum of the squared error of the fit. In this section, the procedure to implement the focused regions identification method is summarized in four steps. These four steps are illustrated in Fig. 2. Practitioners can follow these steps to identify the focused regions of their nonlinear curve fitting problems and search for the optimal parameter estimators in the focused regions.

*Step 1—Data collection and model selection*: Research data (*m* data points) are first collected from scientific experiments or computer simulations. An appropriate mathematical model with a closed-form expression including *n* parameters is selected to fit the research data. Practitioners need to check data reliability (e.g., potential outliers) and the fidelity of the mathematical model before fitting the model to data. If the selected mathematical model is linear, linear regression is employed to solve the values of these *n* parameters in the model to minimize the squared error of the fit.

*Step 2—The ruled-out regions delineation using the expected fitting accuracy*: If the selected mathematical model is nonlinear, the regions that cannot satisfy the expected fitting accuracy are ruled out from the parameter space. The *n*-dimensional parameter space is defined by the allowed value range of each parameter *θ _{j}* in the nonlinear model (e.g., $\theta j(lb)_<\theta j<\theta j(ub)\xaf$, where $\theta j(lb)_$ and $\theta j(ub)\xaf$ are the lower bound and upper bound of parameter

*θ*, respectively, where

_{j}*j*∈ {1, 2, …,

*n*}). When the expected fitting accuracy is defined by the coefficient of determination (e.g.,

*R*

^{2}≥ 0), the regions defined by Eq. (10) are ruled out from the parameter space. In cases when the ranges of parameters

**cannot be easily derived from Eq. (10), the regions defined by Eq. (11) are ruled out instead. When the expected fitting accuracy is defined by the mean squared error (e.g., MSE ≤ 10**

*θ*^{−2}) or squared error for each data point, Eq. (14) or Eq. (15) are employed in the similar manner. If the whole parameter space is ruled out in Step 2, it indicates that the fitting result cannot reach the expected fitting accuracy, and practitioners need to double check data reliability (e.g., potential outliers) and the fidelity of the nonlinear model employed to fit the data.

*Step 3—The ruled-out regions delineation using the derivatives of the squared error of the fit*: The remaining regions in the parameter space are further reduced using the derivatives of the squared error of the fit if the whole parameter space is not ruled out in Step 2. For the first-order derivatives of the squared error of the fit (*k* = 1), Eq. (23) leads to 2*n* inequalities. The regions defined by these 2*n* inequalities are ruled out from the parameter space first. In cases when the ranges of parameters ** θ** cannot be easily derived from an inequality, the subregions of the regions defined by the inequality could be ruled out instead, as exemplified by Eq. (30). The order of the derivatives of the squared error of the fit is then iteratively increased (

*k*= 2, 3, 4, …) until the new inequalities derived from Eq. (23) for the

*k*th-order derivatives of the squared error of the fit cannot rule any new area out from the remaining parameter space. Practitioners also can stop the iteration when the remaining parameter space is small enough for an exhaustive search using the available computational power. If the whole parameter space is ruled out in Step 3, it indicates that a global minimum does not exist in the remaining regions derived in Step 2, and practitioners need to consider whether the expected fitting accuracy is overestimated in Step 2.

*Step 4—The optimal parameter estimators search in the focused regions*: If the whole parameter space is not ruled out in Step 3, the regions that are finally left over in the parameter space are defined as the focused regions. An exhaustive search with specified significant digits (e.g., four significant digits) is employed to find the optimal parameter estimators that correspond to the global minimum of the squared error of the fit in the focused regions if adequate computational power is available for the task. If the exhaustive search is not feasible, an iterative least squares method, such as the trust region reflective method [21,22], could be employed to find the parameter estimators in the focused regions. However, an iterative least squares method is not guaranteed to find the optimal parameter estimators (global minimum) in the focused regions.

## 5 Case Studies of the Rumford Cooling Experiment and the Puromycin Experiment

To demonstrate the four-step procedure outlined in Sec. 4, a model based on Newton’s law of cooling and the Michaelis–Menten model are fitted to two data sets collected from the Rumford cooling experiment and the puromycin experiment, respectively, as two case studies. The results of these two case studies validate the effectiveness of the focused regions identification method. Notably, the original parameter space is infinite in both case studies. Using the method, the search space for the optimal least squares parameter estimators is significantly reduced from an infinite space to one or several finite focused regions. In addition, it is proved that the optimal parameter estimators corresponding to the global minimum of the squared error of the fit are located in the focused regions identified through the method in both case studies.

### 5.1 The Rumford Cooling Experiment.

*θ*in the equation:

*x*denotes time,

*y*represents cylinder temperature, and parameter

*θ*is a constant cooling coefficient in the model.

No. | Time x (min) | Temperature y (°F) |
---|---|---|

1 | 4 | 126 |

2 | 5 | 125 |

3 | 7 | 123 |

4 | 12 | 120 |

5 | 14 | 119 |

6 | 16 | 118 |

7 | 20 | 116 |

8 | 24 | 115 |

9 | 28 | 114 |

10 | 31 | 113 |

11 | 34 | 112 |

12 | 37.5 | 111 |

13 | 41 | 110 |

No. | Time x (min) | Temperature y (°F) |
---|---|---|

1 | 4 | 126 |

2 | 5 | 125 |

3 | 7 | 123 |

4 | 12 | 120 |

5 | 14 | 119 |

6 | 16 | 118 |

7 | 20 | 116 |

8 | 24 | 115 |

9 | 28 | 114 |

10 | 31 | 113 |

11 | 34 | 112 |

12 | 37.5 | 111 |

13 | 41 | 110 |

*θ*in the model, the focused regions identification method presented in Secs. 3 and 4 is employed. The squared error of the fit for this problem, also called the RSS, is

_{(i)}and

*y*

_{(i)}represent the 13 data points in Table 1. The regions for the value of parameter

*θ*that cannot satisfy the expected fitting accuracy are ruled out from the parameter space

*θ*∈ (−∞, ∞) first. Here, the expected fitting accuracy is specified as that the coefficient of determination,

*R*

^{2}, is greater than or equal to zero (

*R*

^{2}

_{(e)}= 0) for demonstration purposes. Equation (11) is reduced to

Notably, the regions shown in Eq. (34) are the intersection of the regions defined by the 13 inequalities represented by Eq. (33). The remaining region *θ* ∈ [−0.01699, 0.05102] is shown as the gray area in Fig. 3(a).

_{(i)}> 0 and $e\u2212\theta x(i)>0$, Eq. (25) leads to

*θ*∈ [0.008207, 0.01505] using the first-order derivative of the squared error of the fit. The second-order derivative of the squared error of the fit is

_{(i)}

^{2}> 0 and $e\u2212\theta x(i)>0$, Eq. (26) leads to

Since no new area can be ruled out from the remaining region by Eq. (40), *θ* ∈ [0.008207, 0.01505] is defined as the focused region to search for the optimal parameter estimator that corresponds to the global minimum of the squared error of the fit. The focused region is shown as the gray area in Fig. 3(b). Figure 3(b) indicates that the focused regions identification method presented in this paper significantly reduces the search space for the optimal least squares parameter estimator *θ** in this case study from an infinite space *θ* ∈ (−∞, ∞) to a small region with its length of 0.006843.

Exhaustive search with four significant digits in the focused region *θ* ∈ [0.008207, 0.01505] yields the optimal least squares parameter estimator for the cooling coefficient in the model as *θ** = 0.009415, which is shown as the dot in Fig. 3(b). The coefficient of determination, *R*^{2}, for this result is 0.8682. The RSS is 44.16, and it is the global minimum of the nonlinear least squares parameter estimation problem. Notably, this result is the same with the parameter estimator derived in previous studies [16]. However, previous studies using graphical exploration and iterative methods to search for the least squares parameter estimator cannot guarantee that the result is the optimal estimator that corresponds to the global minimum of the squared error of the fit, while the focused regions identification method does.

The effectiveness of the focused regions identification method could be compared with extant iterative least squares methods. The parameter estimators derived through the Levenberg–Marquardt method using eight different initial guesses are shown in Table 2. Notably, Fig. 3 only shows a small portion of the parameter space *θ* ∈ (−∞, ∞), and several local minima exist in the parameter space. The results in Table 2 show that the parameter estimator derived through the Levenberg–Marquardt method is sensitive to the initial guess. The Levenberg–Marquardt iterative algorithm converges to the optimal parameter estimator *θ** = 0.009415 only when the initial guess employed to initialize the iterative algorithm is close to the optimal estimator. Importantly, the Levenberg–Marquardt iterative algorithm stops when it finds a local minimum in this case study. Practitioners are not able to know whether the parameter estimator derived through the Levenberg–Marquardt method is the optimal estimator that corresponds to the global minimum of the squared error of the fit, so it is difficult for practitioners to determine when the search process should be halted. It is also not feasible to cover the infinite parameter space *θ* ∈ (−∞, ∞) with an unlimited number of initial guesses in this case study. In contrast, using the focused regions identification method in this case study, the regions in which the optimal parameter estimator cannot be located are ruled out from the parameter space, and the remaining region is defined as the focused region. Practitioners are therefore guaranteed to find the optimal parameter estimator by performing an exhaustive search in the focused region, and practitioners can stop the search process once the optimal parameter estimator is derived.

No. | Initial guess θ_{0} | Solution θ | RSS | R^{2} |
---|---|---|---|---|

1 | −100 | Nonconvergence | N/A | N/A |

2 | −10 | −10 | Infinite | −Infinite |

3 | −1 | −0.7073 | 7.588 × 10^{28} | −2.266 × 10^{26} |

4 | 0.01 | 0.009415 | 44.16 | 0.8682 |

5 | 1 | 0.009415 | 44.16 | 0.8682 |

6 | 5 | 4.996 | 4.269 × 10^{4} | −126.5 |

7 | 10 | 10 | 4.269 × 10^{4} | −126.5 |

8 | 100 | 100 | 4.269 × 10^{4} | −126.5 |

No. | Initial guess θ_{0} | Solution θ | RSS | R^{2} |
---|---|---|---|---|

1 | −100 | Nonconvergence | N/A | N/A |

2 | −10 | −10 | Infinite | −Infinite |

3 | −1 | −0.7073 | 7.588 × 10^{28} | −2.266 × 10^{26} |

4 | 0.01 | 0.009415 | 44.16 | 0.8682 |

5 | 1 | 0.009415 | 44.16 | 0.8682 |

6 | 5 | 4.996 | 4.269 × 10^{4} | −126.5 |

7 | 10 | 10 | 4.269 × 10^{4} | −126.5 |

8 | 100 | 100 | 4.269 × 10^{4} | −126.5 |

The effectiveness of the focused regions identification method also could be compared with extant heuristic methods. Genetic algorithm is employed to search for the optimal least squares parameter estimator in this case study. The population size of the genetic algorithm is specified as 1000 for demonstration purposes. Since random variation is involved in the iterative search process of the algorithm, the genetic algorithm initialized by 1000 random seeds is run 100 times and the parameter estimator that has the minimum RSS is chosen as the result. The parameter estimator derived by the genetic algorithm is *θ* = 0.009771, which has 3.781% deviation from the optimal parameter estimator. The coefficient of determination, *R*^{2}, is 0.8605, and the RSS is 46.72. This result indicates that the fitting accuracy of the genetic algorithm with the population size of 1000 is lower than that of the focused regions identification method in this case study. Although the genetic algorithm with a larger population size may be able to find the optimal parameter estimator in this case study, practitioners never know whether they should run the algorithm again because they do not know the result is the global minimum or not. In contrast, using the focused regions identification method, practitioners are guaranteed that the optimal parameter estimator is located in the focused regions, and they can stop the search process once the optimal parameter estimator is derived.

### 5.2 The Puromycin Experiment.

*x*represents substrate concentration,

*y*denotes reaction velocity, parameter

*θ*

_{1}represents the maximum reaction velocity achieved in a biochemical experiment, and parameter

*θ*

_{2}is the Michaelis constant. Both parameters

*θ*

_{1}and

*θ*

_{2}in Eq. (42) are positive constants. These two parameters are often represented by

*V*

_{max}and

*K*

_{M}, respectively, in biochemistry.

No. | Substrate concentration x (parts per million) | Reaction velocity y (counts/min^{2}) |
---|---|---|

1 | 0.02 | 47 |

2 | 0.02 | 76 |

3 | 0.06 | 97 |

4 | 0.06 | 107 |

5 | 0.11 | 123 |

6 | 0.11 | 139 |

7 | 0.22 | 152 |

8 | 0.22 | 159 |

9 | 0.56 | 191 |

10 | 0.56 | 201 |

11 | 1.10 | 200 |

12 | 1.10 | 207 |

No. | Substrate concentration x (parts per million) | Reaction velocity y (counts/min^{2}) |
---|---|---|

1 | 0.02 | 47 |

2 | 0.02 | 76 |

3 | 0.06 | 97 |

4 | 0.06 | 107 |

5 | 0.11 | 123 |

6 | 0.11 | 139 |

7 | 0.22 | 152 |

8 | 0.22 | 159 |

9 | 0.56 | 191 |

10 | 0.56 | 201 |

11 | 1.10 | 200 |

12 | 1.10 | 207 |

*θ*

_{1}and

*θ*

_{2}in Eq. (42). The squared error of the fit for the problem, also called the RSS, is expressed as

**= [**

*θ**θ*

_{1},

*θ*

_{2}],

*x*

_{(i)}and

*y*

_{(i)}represent the 12 data points in Table 3. The regions in the space of parameters

*θ*

_{1}and

*θ*

_{2}that cannot satisfy the expected fitting accuracy are ruled out from the parameter space first. In this case study, the expected fitting accuracy is specified as that the squared error for each data point is smaller than or equal to 1600 (MSE

_{(e)}= 1600) for demonstration purposes. In other words, the absolute error of the predicted reaction velocity

*y*at

*x*

_{(i)}should be smaller than or equal to 40 for each data point in Table 3. Equation (15) is expressed as

Plugging the 12 data points into Eq. (44), the 16 regions that should be ruled out from the parameter space *θ*_{1} ∈ (0, ∞) and *θ*_{2} ∈ (0, ∞) are provided in Appendix A. Notably, the 16 regions provided in Appendix A are the union of the regions defined by the 12 inequalities represented by Eq. (44) since the absolute error for each data point cannot exceed 40. The remaining regions in the space of parameters *θ*_{1} and *θ*_{2} are shown in Fig. 4.

*θ*

_{1}is

*x*

_{(i)}> 0 and (

*θ*

_{2}+

*x*

_{(i)})

^{2}> 0, the ruled-out regions are defined by

*θ*

_{2}is

Since *x*_{(i)} > 0, *θ*_{1} > 0, and *θ*_{2} > 0, the ruled-out regions are also defined by Eqs. (46) and (47), and no new region can be ruled out from the space of parameters *θ*_{1} and *θ*_{2} using the first-order partial derivative for parameter *θ*_{2}.

The overall ruled-out regions derived by the expected fitting accuracy and the first-order derivatives are provided in Appendix C, which are the union of the 16 regions defined in Appendix A and the 12 regions defined in Appendix B. The regions that are finally left over in the space of parameters *θ*_{1} and *θ*_{2} are shown in Fig. 5. Since the search space for the optimal least squares parameter estimators is significantly reduced and it is feasible to perform an exhaustive search in the leftover regions shown in Fig. 5, these leftover regions are defined as the focused regions in this case study. It is also proved that no new region can be ruled out using the second-order derivatives of the squared error of the fit. Detailed mathematical derivations for the second-order derivatives are not included to conserve space.

The optimal least squares estimators for parameters *θ*_{1} and *θ*_{2} are derived through an exhaustive search with four significant digits in the focused regions as *θ*_{1}* = 212.7 and *θ*_{2}* = 0.06412. The optimal estimators are shown as the dot in Fig. 5. The coefficient of determination, *R*^{2}, for this result is 0.9613. The RSS is 1195. This is the same result as previous studies yield [14,16,34]. However, to our knowledge, it is the first time that the result is proved to be the optimal parameter estimators that correspond to the global minimum of the squared error of the fit for this benchmark problem. In contrast, previous studies using other approaches cannot guarantee that the final parameter estimators derived through these approaches correspond to the global minimum of the problem. Specifically, using the focused regions identification method, for all parameter estimators that are located in the ruled-out regions defined in Appendix C, their corresponding squared error of the fit, RSS, is larger than 1600 or the first-order derivatives of the squared error of the fit are not zero, and therefore the parameter estimators *θ*_{1}* = 212.7 and *θ*_{2}* = 0.06412 derived through the exhaustive search in the focused regions are the optimal parameter estimators that correspond to the global minimum of the squared error of the fit in this case study.

The fitting accuracy of the focused regions identification method is compared with the popular linear transformation method introduced by Lineweaver and Burk [19]. The Michaelis–Menten model is transformed into the Lineweaver–Burk equation, as shown in Eq. (2) in Sec. 2, and the values for parameters *θ*_{1} and *θ*_{2} are estimated through linear regression. The parameter estimators derived by the linear transformation method are *θ*_{1} = 195.8 and *θ*_{2} = 0.04841. The coefficient of determination, *R*^{2}, is 0.9378, and the RSS is 1920. These results show that the fitting accuracy of the linear transformation method is lower than that of the focused regions identification method in this case study. Importantly, the parameter estimators derived by the linear transformation method have noticeable deviations from the optimal parameter estimators (7.945% for the maximum reaction velocity, *θ*_{1}, and 24.50% for the Michaelis constant, *θ*_{2}). Such deviations can lead to a significant error when these estimated parameter values are employed to design other biochemical reactions.

The effectiveness of the focused regions identification method could be compared with extant iterative methods. The parameter estimators derived through the Levenberg–Marquardt method using six different initial guesses are found in our previous study [14]. Similar to the results in Sec. 5.1, the parameter estimators derived by the Levenberg–Marquardt method are sensitive to the initial guesses. The Levenberg–Marquardt iterative algorithm converges to the optimal parameter estimators only when the initial guesses for parameter values are close to the optimal estimators. Although practitioners can find reasonable initial guesses using the solution interval method presented in our previous study [14], the solution interval method does not promise that it can output the optimal parameter estimators that minimize the squared error of the fit (global minimum) in this case study, and practitioners are therefore not able to know when to stop the search process using an iterative local method with multiple initial guesses. In contrast, practitioners are guaranteed that the parameter estimators derived by the focused regions identification method in this paper are the optimal estimators that correspond to the global minimum of the squared error of the fit in this case study, and practitioners can stop the search process once the optimal parameter estimators are derived.

The effectiveness of the focused regions identification method also could be compared with extant heuristic methods. The genetic algorithm with a population size of 1000 is run 100 times, and the parameter estimators that have the minimum RSS are selected from the 100 results. The parameter estimators derived by the genetic algorithm are *θ*_{1} = 184.0 and *θ*_{2} = 0.03514. The coefficient of determination, *R*^{2}, is 0.8825, and the RSS is 3626. Notably, the parameter estimators derived by the genetic algorithm with the population size of 1000 have noticeable deviations from the optimal parameter estimators (13.49% for the maximum reaction velocity, *θ*_{1}, and 45.20% for the Michaelis constant, *θ*_{2}), which can lead to significant error in biochemical reaction design. As stated in Sec. 2, the genetic algorithm with a larger population size still cannot guarantee that the results are the optimal parameter estimators that minimize the squared error of the fit, while the focused regions identification method does in this case study.

## 6 Outlier Detection for Nonlinear Curve Fitting Problems

The inequalities derived through the expected fitting accuracy and the derivatives of the squared error of the fit in Sec. 3, such as Eqs. (11), (15), and (23), also could be employed to detect potential outliers in nonlinear curve fitting problems. The principle of deleting one data point at a time is commonly used to detect potential outliers in linear curve fitting problems [35]. Specifically, ** θ** represents the estimated values of parameters in a linear model computed from all the

*m*data points, and

**(**

*θ**i*) represents the estimated values of parameters

**in the linear model computed from**

*θ**m*−1 data points (without the

*i*th data point). The difference between

**and**

*θ***(**

*θ**i*) indicates the impact of the

*i*th data point on the estimated values of parameters

**in the linear model. However, parameter estimation in nonlinear models is usually more computationally expensive compared to linear models, and therefore it may not be feasible to compare**

*θ***and**

*θ***(**

*θ**i*) for each of these

*m*data points in practice when fitting a nonlinear model to data.

As an alternative, the inequalities derived through the expected fitting accuracy and the derivatives of the squared error of the fit in Sec. 3 could be employed to detect potential outliers in nonlinear curve fitting problems. For example, using Eq. (11), the regions that should be ruled out from the parameter space computed from all the *m* data points, denoted C* ^{n}*, could be compared with the ruled-out regions computed from

*m*−1 data points (without the

*i*th data point), denoted C

*(*

^{n}*i*), where

*i*∈ {1, 2, …,

*m*}. When there is a significant difference between C

*and C*

^{n}*(*

^{n}*i*), practitioners are recommended to check whether the

*i*th data point is an outlier.

*x*= 2 min, temperature

*y*= 128 °F) is inserted into the original data set shown in Table 1 as the first data point in the modified data set for demonstration purposes. Equation (33) is employed to detect the potential outliers for the problem. Plugging the 14 data points (the inserted outlier and the 13 original data points in Table 1) into Eq. (33), the two regions defined by Eq. (49) should be ruled out from the parameter space

*θ*∈ (−∞, ∞):

The remaining region is *θ* ∈ [−0.05502, 0.09526] with a length of 0.1503. The length derived by Eq. (33) using all 14 data points is set as a benchmark. The impact of the *i*th data point on the length of the remaining region derived by Eq. (33) is then evaluated through deleting the *i*th data point from the modified data set. Figure 6 shows the absolute deviation from the benchmark length for deleting each data point in the modified data set. The first data point (i.e., the inserted outlier) is identified as a potential outlier in Fig. 6 since its deviation is significantly greater than any other data point. Such a result validates the effectiveness of the potential outliers detection approach presented in this section, especially given the fact that it is difficult to identify the outlier by visual inspection in the case study of the Rumford cooling experiment since the outlier follows the temperature decreasing trend well.

## 7 Conclusions and Future Work

A focused regions identification method is introduced for least squares parameter estimation in nonlinear models. The novelty of this method is that the method uses expected fitting accuracy (e.g., the coefficient of determination, *R*^{2}, and MSE) and derivatives of the squared error of the fit to rule out the suboptimal regions from the parameter space where the optimal parameter estimators cannot exist. A new approach to detect potential outliers based on the principle of deleting one data point at a time is also proposed for nonlinear curve fitting problems. The new approach employs the inequalities derived through expected fitting accuracy and derivatives of the squared error of the fit to measure the deviation that results from deleting each data point. The application of the focused regions identification method is demonstrated through two case studies in which a model based on Newton’s law of cooling and the Michaelis–Menten model are fitted to two data sets collected from the Rumford cooling experiment and the puromycin experiment, respectively. The application of the potential outliers detection approach is demonstrated through the case study of the Rumford cooling experiment.

Using the focused regions identification method, practitioners can significantly reduce the search space for the optimal parameter estimators that correspond to the global minimum of the squared error of the fit for nonlinear curve fitting problems. The ruled-out regions may include one or more local minima in which some iterative local methods, such as the Gauss–Newton method and the Levenberg–Marquardt method, can be trapped. Practitioners can then find the guaranteed optimal parameter estimators and their corresponding global minimum for the nonlinear curve fitting problem through an exhaustive search in the remaining regions in the parameter space, defined as the focused regions, if adequate computational power is available for the exhaustive search. Potential outliers in the data sets of nonlinear curve fitting problems also can be detected efficiently using the new approach proposed in this paper.

This work has limitations that offer opportunities for future research. As stated in Sec. 3, the focused regions identification method presented in this paper only applies to nonlinear models that have a closed-form expression, and the method becomes inefficient when the nonlinear model has more than ten parameters. Future research may extend the method to nonlinear models without a closed-form expression, such as a differential equation, or nonlinear models that have a greater number of parameters. In addition, in both case studies in this paper, the alternative inequalities in which a summation is not included, such as Eqs. (11), (15), and (30), are employed to rule out suboptimal regions from the parameter space. Notably, the suboptimal regions ruled out by these alternative inequalities are the subregions of the regions defined by the original inequalities derived through expected fitting accuracy and derivatives of the squared error of the fit, such as, e.g., Eqs. (10), (14), and (29). These alternative inequalities may be modified in future research to rule out larger suboptimal regions from the parameter space, and then the resulting focused regions become smaller. It will be less computationally expensive to perform an exhaustive search in the reduced focused regions to find the optimal parameter estimators for nonlinear curve fitting problems.

## Acknowledgment

This material is partially supported by the Air Force Office of Scientific Research (Grant No. FA9550-18-0088). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The authors attest that all data for this study are included in the paper.