## Abstract

Fitting a specified model to data is critical in many science and engineering fields. A major task in fitting a specified model to data is to estimate the value of each parameter in the model. Iterative local methods, such as the Gauss–Newton method and the Levenberg–Marquardt method, are often employed for parameter estimation in nonlinear models. However, practitioners must guess the initial value for each parameter to initialize these iterative local methods. A poor initial guess can contribute to non-convergence of these methods or lead these methods to converge to a wrong or inferior solution. In this paper, a solution interval method is introduced to find the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit. To initialize this method, it is not necessary for practitioners to guess the initial value of each parameter in a nonlinear model. The method includes three algorithms that require different levels of computational power to find the optimal parameter estimators. The method constructs a solution interval for each parameter in the model. These solution intervals significantly reduce the search space for optimal parameter estimators. The method also provides an empirical probability distribution for each parameter, which is valuable for parameter uncertainty assessment. The solution interval method is validated through two case studies in which the Michaelis–Menten model and Fick’s second law are fit to experimental data sets, respectively. These case studies show that the solution interval method can find optimal parameter estimators efficiently. A four-step procedure for implementing the solution interval method in practice is also outlined.

## 1 Introduction

Fitting a specified model to data, also called curve fitting, is an inverse problem that has wide application in many science and engineering fields. For example, researchers often fit a fatigue model, such as the stress-life (*S*-*N*) model and the strain-life (*ɛ*-*N*) model, to the cycles to failure experimental data of different metals [1]. Fatigue curves, which play a significant role in machine component design [2], are then constructed accordingly [3,4]. Similarly, Fick’s laws govern the moisture diffusion in composite materials [5]. Practitioners usually fit Fick’s second law to experimental data in order to estimate mass diffusivities of different composite materials [6]. These estimated mass diffusivities are then used to design composite structures (e.g., wings of passenger aircraft and the bodies of racing cars) under different service environments [7].

A major task in fitting a specified model to data is to estimate the value of each parameter in the model. Several objective functions, such as least-squares error and maximum likelihood, are available for parameter estimation [8]. Practitioners need to find the optimal parameter values that minimize or maximize a chosen objective function. Linear regression has been applied successfully to estimate parameter values in linear models [9]. However, many scientific and engineering models are nonlinear [10]. Iterative local methods, such as the Gauss–Newton method [9] and the Levenberg–Marquardt method [11], and Bayesian approaches, are usually employed for parameter estimation in nonlinear models [12–14]. Importantly, practitioners must guess the initial value or the prior distribution of each parameter to initialize these methods. A poor initial guess can contribute to non-convergence of these methods or lead these methods to converge to a wrong solution or a sub-optimal solution (e.g., local optimum) [15]. Moreover, practitioners may not know whether the result is the optimal solution or not even when a good initial guess is employed to initialize these methods.

In this paper, a broadly applicable method is introduced to estimate parameter values in a nonlinear model that minimizes the squared error of the fit when the number of unknown parameters in the model is less than the number of data points. To initialize this method, it is not necessary for practitioners to guess the initial value of each parameter in a nonlinear model. To our knowledge, an initial guess free method for least-squares parameter estimation that can be applied to fit any specified nonlinear models to data is a new contribution to curve fitting. This method establishes an empirical probability distribution for each parameter in a nonlinear model. This method also constructs a solution interval for each parameter in a nonlinear model. It is proved that the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit belongs to its corresponding solution interval under certain conditions. The optimal least-squares estimator for each parameter could be found through exhaustive search with specified significant digits (e.g., four significant digits) in the solution interval. The procedure to compute an initial value for each parameter in a nonlinear model is provided when an iterative local method is preferred for parameter estimation. A stochastic algorithm is also developed to find the optimal parameter estimators efficiently.

This paper begins with a short review of currently available methods for parameter estimation in nonlinear models in Sec. 2. The new method to construct a solution interval and search for the optimal least-squares estimator for each parameter in a nonlinear model is presented in Sec. 3. To ease the implementation of this method in practice, four steps to implement this method are summarized in Sec. 4. In Sec. 5, the application of this method is illustrated by fitting the Michaelis–Menten model and Fick’s second law to experimental data sets, respectively. The paper concludes with a brief discussion of the contribution of the work presented here and future research directions.

## 2 Background and Related Work

The goal of parameter estimation in nonlinear models is to find the optimal estimator for each parameter in a specified nonlinear model that minimizes or maximizes a chosen objective function. Available objective functions include, but are not limited to, least-squares error, weighted least-squares error, maximum likelihood, and robust regression functions for M-estimators [8]. Due to space limitations, this section only summarizes several popular methods to find the optimal parameter estimators in nonlinear models that minimize the squared error of the fit. A brief review of Bayesian approaches for parameter estimation is also provided. A thorough review of available methods for parameter estimation in nonlinear models using different objective functions could be found in books, review papers, and a technical report written by Bates and Watts [16], Björck [13], Jennrich and Ralston [8], Motulsky and Ransnas [15] and Fraley [17], among others.

*y*is the technology performance (e.g., CPU transistor count),

*t*is the time, and

*A*and

*B*are constant parameters. A natural logarithm transformation is often made to Eq. (1) as

*A*and

*B*for technological forecasting. However, the technology performance data,

*y*, have been distorted by the natural logarithm transformation in Eq. (2). The linear least-squares estimators for parameters

*A*and

*B*found from solving Eq. (2) minimize the summation of squared residuals on ln(

*y*) rather than on

*y*; thus, these linear least-squares estimators are not the optimal least-squares parameter estimators for the original problem represented by Eq. (1). A similar example could be found in Sec. 5.1 where the Michaelis–Menten model is fit to experimental data.

An iterative least-squares local method is commonly employed to estimate parameter values in nonlinear models since the linear transformation approaches usually cannot produce the optimal least-squares parameter estimators, and many nonlinear models even cannot be transformed into a linear model. An iterative local approach begins with an initial guess for the value of each parameter in a nonlinear model and then alters the values of parameters iteratively to reduce the squared error of the fit. Among currently available local methods that have been developed to direct these iterations, the Gauss–Newton method and the Levenberg–Marquardt method are two popular methods in science and engineering fields [9,13].

**Δ**=

*θ—θ*^{(s)},

**= [**

*r**r*

_{1},

*r*

_{2}, …,

*r*] is a column vector of residuals,

_{m}**= [**

*θ**θ*

_{1},

*θ*

_{2}, …,

*θ*] is a column vector of parameters in a nonlinear model

_{n}*y*=

*f*(

*x*

_{1},

*x*

_{2}, …,

*x*,

_{q}**) that is fit to**

*θ**m*data points [

*x*

_{1(i)},

*x*

_{2(i)}, …,

*x*,

_{q(i)}*y*], and

_{(i)}**is**

*J**m*×

*n*Jacobian matrix of residuals as

*i*∈ {1, 2, …,

*m*},

*j*∈ {1, 2, …,

*n*}. The squared error of the fit,

*S*, is the squared Euclidean norm of residual vector

**as**

*r***Δ**

^{(s)}is derived through minimizing approximate least-squares error defined by Eq. (3) as

*n*simultaneous linear equations as

*J**is the transpose of the Jacobian matrix*

^{T}**. The vector**

*J***Δ**

^{(s)}may be solved, for example, via a decomposition of the matrix into an orthogonal matrix and a triangular matrix (QR decomposition) at each iteration step [13,19].

The Gauss–Newton method has the advantage that it is only necessary to solve linear equations represented by Eq. (8) in each iteration step. However, the Gauss–Newton method may converge slowly for large residual problems [13,20]. Moreover, the method fails to converge when the matrix *J** ^{T}J* is singular [21]. To solve the convergence issue of the Gauss–Newton method, several modified Gauss–Newton methods, also known as Newton-type methods [13], have been developed. These modified Gauss–Newton methods [22] take second-order derivatives (i.e., Hessian matrices) into account when the residuals of a nonlinear fit are approximated compared to the first-order approximation in Eq. (3). It is usually expensive to compute Hessian matrices of the residuals in each iteration step, and researchers have therefore suggested different methods of approximating Hessian matrices [23,24].

**Δ**in each iteration step, where

*λ*is the Marquardt parameter that controls the trust-region size, and

**is a diagonal scaling matrix. Both parameter**

*D**λ*and matrix

**are updated in each iteration step based on the change of least-squares error [26]. A variety of algorithms [27] have been developed to compute parameter**

*D**λ*and matrix

**in each iteration step and to approximate the residuals of a nonlinear fit after the pioneering work of Marquardt [26] in 1963. The Levenberg–Marquardt method also has been extended and applied to least-squares parameter estimation subject to bounds (e.g.,**

*D**θ*

_{lb}<

*θ*< $\theta ub\xaf$, where

_{j}*θ*

_{lb}and $\theta ub\xaf$ are the lower bound and upper bound of parameter

*θ*respectively, in a nonlinear model) [29].

_{j}Of note, least-squares parameter estimation in nonlinear models could be regarded as a special case of engineering optimization, where least-squares error is the optimization objective and parameters in a specified nonlinear model are the design variables. Parameter estimation methods, especially trust-region methods, have been generalized and commonly employed to solve unconstrained and constrained optimization problems [30,31]. Similarly, several guided random search techniques invented for engineering optimization [32], such as genetic algorithms, simulated annealing, and particle swarm optimization, also have been applied for parameter estimation in nonlinear models [33–36]. These guided random search techniques are iterative methods that usually begin with multiple initial guesses (i.e., initial population) for the value of each parameter in a nonlinear model. A process involving random variation in parameter values is then performed to improve an objective function in each iteration step [32].

**conditional on observed data**

*θ***,**

*B**P*(

**|**

*θ***), also called the posterior distribution, is derived using Bayes’ theorem as**

*B**P*(

**) is a user-specified prior distribution of parameter**

*θ***, and**

*θ**P*(

**|**

*B***) is a data distribution [14]. The posterior expectation of parameter**

*θ***is**

*θ*In most applications involving nonlinear models, it is difficult to evaluate the integrations in Eq. (11) analytically. Laplace approximation [37] and Markov chain Monte Carlo methods [38] have been applied to obtain the posterior expectation for each parameter in a nonlinear model.

Iterative least-squares local methods (e.g., the Gauss–Newton method, the Levenberg–Marquardt method, and guided random search techniques) and Bayesian approaches have been applied to estimate parameter values in nonlinear models in many science and engineering fields [13,14,32,39]. A successful application of any method mentioned previously relies on a smart initial guess. Specifically, an initial value for each parameter in a nonlinear model must be specified when an iterative least-squares local method is employed. Bayesian approaches require a user-specified prior distribution for each parameter. A poor initial guess can contribute to non-convergence of these methods or lead these methods to converge to a wrong solution or a sub-optimal solution (e.g., local optimum) [15,40]. However, there is no standard procedure available to specify initial values for least-squares iterative local methods or prior distributions for Bayesian approaches [15,41,42]. In practice, an initial guess for parameter estimation in nonlinear models is made on a case-by-case basis and is often based on previous experience, expert advice, linear least-squares estimators (if linear transformation is feasible for a nonlinear model), graphical exploration, or a grid search (i.e., “brute force” approach) [15,40–42].

## 3 A Solution Interval Method to Estimate Parameter Values in Nonlinear Models

Based on the research gap discussed in Sec. 2, a solution interval method for least-squares parameter estimation in nonlinear models is introduced in this section. This method includes three algorithms that require different levels of computational power to find the optimal parameter estimators. Importantly, it is not necessary to guess the initial value for each parameter in a nonlinear model using the method.

*m*data points [

*x*

_{1(i)},

*x*

_{2(i)}, …,

*x*,

_{q(i)}*y*], where

_{(i)}*i*∈ {1, 2, …,

*m*}. Practitioners want to fit a nonlinear model

*y*=

*f*(

**,**

*x***) to the**

*θ**m*data points and find the optimal values for parameters

**that minimize the squared error of the fit, where**

*θ***= [**

*θ**θ*

_{1},

*θ*

_{2}, …,

*θ*] represents

_{n}*n*constant parameters (

*n*<

*m*) in the nonlinear model, and

**= [**

*x**x*

_{1},

*x*

_{2}, …,

*x*]. Here,

_{q}*y*=

*f*(

**,**

*x***) is a specified nonlinear model, and model selection is beyond the scope of this paper. The least-squares error of the fit is**

*θ***.**

*θ**parameter solution equations*.

*n*data points $[x1(ik),x2(ik),\u2026,xq(ik),y(ik)]$ are chosen from the

*m*data points [

*x*

_{1(i)},

*x*

_{2(i)}, …,

*x*,

_{q(i)}*y*], where

_{(i)}*k*∈ {1, 2, …,

*n*},

*i*∈ {1, 2, …,

_{k}*m*}, and

*i*∈ {1, 2, …,

*m*}. There are

*h*combinations for choosing

*n*data points $[x1(ik),x2(ik),\u2026,xq(ik),y(ik)]$ from the

*m*data points [

*x*

_{1(i)},

*x*

_{2(i)}, …,

*x*,

_{q(i)}*y*], where

_{(i)}*n*equations are derived when any chosen

*n*data points are plugged into the nonlinear model

*y*=

*f*(

**,**

*x***) respectively as**

*θ*The *n* equations represented by Eq. (14) are called *parameter solution equations*. The unknown variables in these *n* equations are ** θ** = [

*θ*

_{1},

*θ*

_{2}, …,

*θ*]. If the

_{n}*n*equations represented by Eq. (14) only have one solution for one combination as

*θ**= [*

_{(s)}*θ*

_{1(s)},

*θ*

_{2(s)}, …,

*θ*], where

_{n(s)}*s*∈ {1, 2, …,

*h*}, there are

*h*solutions for each parameter

*θ*in total, where

_{j}*j*∈ {1, 2, …,

*n*}. Of note, although

*y*=

*f*(

**,**

*x***) is a nonlinear model, the**

*θ**n*equations represented by Eq. (14) may be linear equations or nonlinear equations that have analytical solutions. Numerical methods, such as the bisection method, Brent’s method, and the Newton–Raphson method [19], could be employed to solve these

*n*equations when analytical solutions are not available. If there are any bounds applied to parameter

*θ*(e.g.,

_{j}*θ*

_{lb}<

*θ*< $\theta ub\xaf$, where

_{j}*θ*

_{lb}and $\theta ub\xaf$ are the lower bound and upper bound of parameter

*θ*), it is only necessary to find the solution of Eq. (14) for

_{j}*θ*that satisfies these bounds.

_{j}*h*solutions for each parameter

*θ*are sorted from the smallest solution

_{j}*θ*

_{j}^{min}to the largest solution

*θ*

_{j}^{max}, where

*θ*

_{j}

^{min}≤

*θ*≤

_{j}_{(s)}*θ*

_{j}^{max},

*j*∈ {1, 2, …,

*n*}, and

*s*∈ {1, 2, …,

*h*}. The empirical probability distribution of each parameter

*θ*could be built through setting the same probability, 1/

_{j}*h*, at each solution

*θ*

_{j(}_{1)},

*θ*

_{j}_{(2)}, …,

*θ*if necessary. The length of the interval [

_{j(h)}*θ*

_{j}^{min},

*θ*

_{j}^{max}] is defined as

*θ*

_{j}^{min},

*θ*

_{j}^{max}] is

It is proven in Appendix A that the optimal estimator for each parameter *θ _{j}* that minimizes the squared error

*S*belongs to the interval [

*θ*

_{j}

^{mid}−

*L*,

_{j}*θ*

_{j}^{mid}+

*L*] when the monotonic conditions and the symmetric condition defined by Eqs. (A8)–(A10) are satisfied. [

_{j}*θ*

_{j}^{mid}−

*L*,

_{j}*θ*

_{j}^{mid}+

*L*] is called the

_{j}*solution interval*of parameter

*θ*. Future research may relax these conditions given by Eqs. (A8)–(A10) in Appendix A and define a solution interval for the more general case or derive the solution interval on a case-by-case basis.

_{j}Here, three algorithms that require different levels of computational power are provided to find the optimal estimator for each parameter in a nonlinear model using the solutions of *parameter solution equations*, *θ _{j(s)}*, and the solution interval [

*θ*

_{j}^{mid}−

*L*,

_{j}*θ*

_{j}^{mid}+

*L*] for each parameter

_{j}*θ*. As the first algorithm, the optimal parameter estimators that minimize the squared error

_{j}*S*could be found through an exhaustive search, also called grid search, with specified significant digits (e.g., four significant digits) in the

*solution interval*of each parameter

*θ*if abundant computational power is available. The second algorithm could be employed if exhaustive search is not feasible. For example, the solution intervals derived by Eqs. (15) and (16) may be extremely large because of outliers in the data set or numerical ill-conditioning in solving the parameter solution equations. The median of the

_{j}*h*solutions of each parameter

*θ*,

_{j}*θ*

_{j}^{med}could be found in that case. The median of parameter solutions,

**= [**

*θ*^{med}*θ*

_{1}

^{med},

*θ*

_{2}

^{med}, …,

*θ*

_{n}^{med}], is supplied as the initial values to a currently available iterative least-squares local method (e.g., the Gauss–Newton method or the Levenberg–Marquardt method) to search the optimal estimator for each parameter.

It is necessary to solve the *n* equations represented by Eq. (14)*h* times using either algorithms mentioned above (i.e., exhaustive search in the solution interval or use the median of *h* parameter solutions as initial values of an iterative least-squares local method). In some cases, it may be expensive to solve these *n* equations *h* times. For example, a nonlinear model with 10 parameters is fit to 1000 data points (i.e., *m* = 1000, and *n* = 10). There are 2.63 × 10^{23} combinations to choose 10 data points from the 1000 data points (i.e., *h* = 2.63 × 10^{23}). Enormous computational power is required to solve the 10 equations represented by Eq. (14) 2.63 × 10^{23} times if analytical solutions of these 10 equations are not available, and a numerical method has to be employed. The following stochastic algorithm, as the third algorithm, is an alternative to search for the optimal value for each parameter in a nonlinear model when there is not enough computational power available to solve these *n* equations *h* times.

A flowchart of the stochastic algorithm appears in Fig. 1. As defined earlier, there are *h* combinations to choose *n* data points from the *m* data points (*n* < *m*). Three combinations are chosen from the *h* combinations randomly without replacement. Each combination corresponds to one solution for each parameter *θ _{j}* through solving the

*n*equations represented by Eq. (14), where

*j*∈ {1, 2, …,

*n*}. A solution interval [

*a*

_{j}^{(1)},

*b*

_{j}^{(1)}] = [

*θ*

_{j}

^{mid}−

*L*,

_{j}*θ*

_{j}^{mid}+

*L*] could be built for each parameter

_{j}*θ*using these three solutions. The median of these three solutions is used as the initial value for each parameter

_{j}*θ*of a specified iterative least-squares local method (e.g., the Gauss–Newton method or the Levenberg–Marquardt method) to search for the optimal parameter estimators. The search result for parameter

_{j}*θ*is denoted by

_{j}*θ*

_{j}^{(1)}. If search result

*θ*

_{j}^{(1)}for each parameter

*θ*belongs to the corresponding solution interval [

_{j}*a*

_{j}^{(1)},

*b*

_{j}^{(1)}],

*θ*^{(}^{1)}= [

*θ*

_{1}

^{(1)},

*θ*

_{2}

^{(1)}, …,

*θ*

_{n}^{(1)}] is the final parameter estimators. If any search result

*θ*

_{j}^{(1)}does not belong to the corresponding solution interval [

*a*

_{j}^{(1)},

*b*

_{j}^{(1)}], another two combinations are then chosen from the (

*h*-3) combinations randomly without replacement. A solution interval [

*a*

_{j}^{(2)},

*b*

_{j}^{(2)}] = [

*θ*

_{j}^{mid—}

*L*,

_{j}*θ*

_{j}^{mid}+

*L*] is built using all five available solutions for each parameter

_{j}*θ*, and the median of the five solutions is used as the initial value for each parameter

_{j}*θ*of the specified iterative least-squares local method. If search result

_{j}*θ*

_{j}^{(2)}for each parameter

*θ*belongs to the corresponding solution interval, [

_{j}*a*

_{j}^{(2)},

*b*

_{j}^{(2)}],

*θ*^{(2)}= [

*θ*

_{1}

^{(2)},

*θ*

_{2}

^{(2)}, …,

*θ*

_{n}^{(2)}] is the final parameter estimators. If any search result

*θ*

_{j}^{(2)}does not belong to the corresponding solution interval [

*a*

_{j}^{(2)},

*b*

_{j}^{(2)}], another two combinations are then chosen from the remaining combinations randomly without replacement repeatedly until the new search result

*θ*

_{j}

^{(t)}for each parameter

*θ*belongs to the corresponding solution interval [

_{j}*a*,

_{j}^{(t)}*b*], where

_{j}^{(t)}*t*<

*h*/2. The convergence rate of the stochastic algorithm depends on the specified nonlinear model and the iterative least-squares local method employed in each iteration step. A case study using the stochastic algorithm could be found in Sec. 5.2 where Fick’s second law is fit to experimental data and the Levenberg–Marquardt method is employed in each iteration step. The stochastic algorithm reaches 100% convergence within five iteration steps, and the parameter solution equation represented by Eq. (14) is solved less than 0.2*

*h*times in the case study. The convergence rate of the stochastic algorithm could be investigated through more case studies in future research.

As stated earlier, the solution interval method provides practitioners three algorithms to choose from based on available computational power. The first algorithm is more computationally expensive than the other two algorithms. However, the first algorithm is complete since exhaustive search in the solution interval of each parameter is performed. It is proven in Appendix A that the optimal estimator for each parameter belongs to its corresponding solution interval when the monotonic conditions and the symmetric condition defined by Eqs. (A8)–(A10) are satisfied. The soundness of the second algorithm (i.e., using median of solutions as initial value) and the third algorithm (i.e., the stochastic algorithm shown in Fig. 1) depends on whether a sound iterative local method is employed in these algorithms. Of note, the third algorithm is equivalent to the second algorithm when the stochastic algorithm does not converge before all *h* combinations are chosen. All *h* combinations have been chosen when the third algorithm terminates in the end, and *h* solutions are derived for each parameter. The median of the *h* parameter solutions is then taken as the initial value for the specified iterative local method, which is identical to the second algorithm.

## 4 Four Steps to Estimate Parameter Values in a Nonlinear Model

In Sec. 3, a generic least-squares method, called the solution interval method, is introduced to estimate parameter values in a nonlinear model. The method includes three algorithms that require different levels of computational power to find the optimal parameter estimators, and practitioners do not need to guess the initial value for each parameter in a nonlinear model using the method. In this section, the procedure to implement the solution interval method is summarized in four steps. These four steps are illustrated in Fig. 2, and the three algorithms appear as the two blocks on the right side of step 2 and step 3 and the block of step 4 in Fig. 2. Practitioners can follow these guidelines to choose an appropriate algorithm and find the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit.

*Step 1—Data collection and model selection*. Practitioners first collect data (i.e.,*m*data points) from scientific experiment, observational study, simulation, or survey. An appropriate mathematical model with*n*unknown parameters (*n*<*m*) is selected from several candidate models to fit the data. If the mathematical model is linear, linear regression is applied to solve the parameter values in the model.*Step 2—Parameter solution equations derivation.*If the chosen mathematical model is nonlinear, the expressions of*n*equations represented by Eq. (14) are derived from the specified nonlinear model. These*n*equations are then solved analytically. If analytic solutions are not available, a numerical method (e.g., the bisection method, Brent’s method, or the Newton–Raphson method) should be specified to solve these*n*equations. Practitioners evaluate whether there is abundant computational power to solve these*n*equations*h*times using the numerical method, where*h*is defined in Eq. (13). If limited computational power is available, practitioners should use the stochastic algorithm shown in Fig. 1 to estimate the value for each parameter in the nonlinear model.*Step 3—Solution interval construction.*If there is abundant computational power to solve the*n*equations represented by Eq. (14)*h*times,*n*data points are chosen from the*m*data points exhaustively and plugged into these*n*equations, respectively.*h*solutions are derived for each parameter in the nonlinear model. These*h*solutions are sorted from the smallest solution to the largest solution, and a solution interval [*θ*_{j}^{mid}−*L*,_{j}*θ*_{j}^{mid}+*L*] is constructed for each parameter_{j}*θ*using Eqs. (15)–(16). Practitioners evaluate whether there is abundant computational power to perform an exhaustive search within the solution interval of each parameter to find the optimal estimator. If limited computational power is available, practitioners should use the median of these_{j}*h*solutions of each parameter as the initial value of an iterative least-squares local method (e.g., the Gauss–Newton method or the Levenberg–Marquardt method) to search for the optimal estimator for each parameter.*Step 4—Optimal parameter estimator search.*If abundant computational power is available for exhaustive search in the solution interval of each parameter, practitioners specify the significant digits (e.g., four significant digits) required for final parameter estimators and perform an exhaustive search, also called grid search, in solution intervals to find the optimal estimator for each parameter in the nonlinear model that minimizes the squared error of the fit.

## 5 Case Studies Using Michaelis–Menten Model and Fick’s Second Law

To illustrate the four-step procedure outlined in Sec. 4, the Michaelis–Menten model and Fick’s second law are fit to experimental data sets, respectively, as two case studies in this section. These two case studies show that the optimal least-squares parameter estimators belong to the solution intervals defined in Sec. 3 (i.e., [*θ _{j}*

^{mid}−

*L*,

_{j}*θ*

_{j}

^{mid}+

*L*] for parameter

_{j}*θ*), although it is difficult to prove analytically whether the Michaelis–Menten model and Fick’s second law satisfy the conditions defined by Eqs. (A8)–(A10). These optimal least-squares parameter estimators have also been found more efficiently using the solution interval method compared with using currently available methods (e.g., the linear transformation method and iterative least-squares local methods) in these case studies. These results validate the solution interval method and associated assumptions.

_{j}### 5.1 Fitting Michaelis–Menten Model to Experiment Data.

*m*= 12) as shown in Table 1, and the Michaelis–Menten model

*θ*

_{1}and

*θ*

_{2}(i.e.,

*n*= 2), where

*x*represents substrate concentration in the unit of parts per million (ppm), and

*y*denotes the initial velocity of the reaction in the unit of counts/min

^{2}.

*θ*

_{1}and

*θ*

_{2}in the Michaelis–Menten model. There are

*h*combinations to choose two data points $[x(i1),y(i1)]$ and $[x(i2),y(i2)]$ from the 12 data points, where

*i*∈ {1, 2, …, 12},

_{1}*i*∈ {1, 2, …, 12}. Of note, as shown in Table 1, there are six combinations where the two chosen data points have different initial velocities of the reaction but the same substrate concentration (i.e., $y(i1)\u2260y(i1),x(i1)=x(i2)$). These six combinations are removed, and the remaining 60 combinations are used in the following steps.

_{2}*θ*

_{1}and

*θ*

_{2}as

*θ*

_{1}and

*θ*

_{2}are derived from Eqs. (21)–(22) as

*θ*

_{1}and

*θ*

_{2}are solved for 60 combinations using Eqs. (23)–(24). The solution distributions of parameters

*θ*

_{1}and

*θ*

_{2}appear in Figs. 3 and 4, respectively, where 112.5 ≤

*θ*

_{1}≤ 295.8, and −0.005646 ≤

*θ*

_{2}≤ 0.1476. The solution intervals for parameters

*θ*

_{1}and

*θ*

_{2}are derived based on Eqs. (15)–(16) as

*θ*

_{1}and

*θ*

_{2}as

*θ*

_{1}= 212.7 and

*θ*

_{2}= 0.06412, which is the same result as previous studies produce [9,16]. The coefficient of determination,

*R*

^{2}, for this result is 0.7206. The residual sum of squares, RSS, is 1195.

Although exhaustive search in solution intervals is feasible in this case study, it is more efficient to find the optimal least-squares parameter estimators by using the median of the 60 solutions of each parameter. It is not necessary to employ the stochastic algorithm shown in Fig. 1 in this case study because Eqs. (19)–(20) have analytic solutions, and it is not expensive to solve Eqs. (19)–(20) 60 times. The median of the 60 solutions for parameter *θ*_{1}, *θ*_{1}^{med}, is 213.7, and the median of the 60 solutions for parameter *θ*_{2}, *θ _{2}*

^{med}, is 0.06693. Of note, these medians are close to the optimal least-squares estimators for parameters

*θ*

_{1}and

*θ*

_{2}derived by exhaustive search. These two medians are supplied as the initial values of the Levenberg–Marquardt method, and the same optimal parameter estimators (i.e.,

*θ*

_{1}= 212.7 and

*θ*

_{2}= 0.06412) are derived by the Levenberg–Marquardt method.

*R*and RSS) of the solution interval method could be compared with the linear transformation method. Equation (17) can be linearized as [9]

^{2}*y**= 1/

*y*and

*u*= 1/

*x*. The result of linear regression is

*β*

_{0}= 0.005107 and

*β*

_{1}= 0.0002472, so

*θ*

_{1}= 195.8 and

*θ*

_{2}= 0.04841. Figure 5 shows the fitted curves with the original data using the solution interval method and the linear transformation method. The original data points have been distorted by the linear transformation shown in Eq. (27), so the linear model, Eq. (28), does not fit the original data well when the substrate has high concentration. The curve fitting accuracy of the linear transformation method is lower than that of the solution interval method. The coefficient of determination,

*R*, for the fit using the linear transformation method is 0.5513, and the residual sum of squares, RSS, is 1920.

^{2}The effectiveness of the solution interval method also could be compared with extant iterative least-squares local methods. Here, the Levenberg–Marquardt method is employed to fit the Michaelis–Menten model to the experimental data set, and 1000 random values are chosen as initial guesses for parameters *θ*_{1} and *θ*_{2}, respectively. The results show that only three out of the 1000 random initial guesses lead the Levenberg–Marquardt iterative algorithm to converge to the optimal parameter estimators (i.e., *θ*_{1} = 212.7 and *θ*_{2} = 0.06412). In other words, practitioners need to run the Levenberg–Marquardt iterative algorithm with a random initial guess 333 times on average to find the optimal parameter estimators. In practice, practitioners may run the Levenberg–Marquardt iterative algorithm with a random initial guess much more than 333 times to check whether the best available parameter estimators are the optimal estimators. The curve fitting results using the Levenberg–Marquardt method with six different initial guesses are shown in Table 2. These results in Table 2 indicate that the estimators of parameters *θ*_{1} and *θ*_{2} derived through the Levenberg–Marquardt method are sensitive to the initial guess. The Levenberg–Marquardt iterative algorithm converges to the optimal parameter estimators (i.e., *θ*_{1} = 212.7 and *θ*_{2} = 0.06412) only when the initial guesses of the parameters are close to their optimal estimators. In contrast, the solution interval method is less computationally expensive than the Levenberg–Marquardt method with random initial guesses in this case study. Practitioners only need to solve the linear Eqs. (19)–(20) 60 times and take the median of the 60 solutions using the solution interval method. As stated earlier, the optimal parameter estimators can be found when the median of parameter solutions provided by the solution interval method is used as the initial value for each parameter.

### 5.2 Fitting Fick’s Second Law to Experiment Data.

*h*exposes on two sides to the same moist environment. The plate is assumed to be infinite in the

*y*and

*z*directions so that the moisture content inside the plate varies only in the

*x*-direction. The temperature in the moist environment and the temperature inside the plate are the same and do not change with time. In Eqs. (29)–(31),

*c*is the moisture concentration;

*c*is the initial moisture concentration;

_{i}*c*is the moisture concentration at the surface; and

_{m}*D*is the mass diffusivity. The solution of Eqs. (29)–(31) is given by [46,47]

*c*over the plate thickness

*W*, for moisture absorption. The specimen is then placed in a conditioning chamber, which has previously reached a specified steady-state moist environment. At the beginning of the test (

_{b}*t*= 0), moisture concentration,

*c*, and the total weight of moisture,

_{i}*m*, are equal to zero because the specimen is dry. The specimen is removed from the conditioning chamber and weighted at the end of each specified weighting time interval. The weight of the specimen in each measurement is recorded as

_{i}*W*. The percent specimen mass change Δ

_{t}*M*is given by

*W*. Effective moisture equilibrium content,

_{f}*M*, is

_{m}*M*, of the epoxy plate. There are 57 data points (i.e.,

*m*= 57), as shown in Fig. 6, and the last data point is employed as effective moisture equilibrium content,

*M*[6].

_{m}*D*, is the only parameter in Eq. (37) (i.e.,

*n*= 1). There are

*h*combinations to choose one data point (

*t*, Δ

*M*) from the 57 data points, where

*D*. Analytic solution is not available in this case. Brent’s method [50], which is a combination of the bisection method, the secant method, and the inverse quadratic interpolation, is employed to solve mass diffusivity,

*D*, in Eq. (37) numerically. A wide range [0, 1] is set as the initial interval for Brent’s method since it is assumed that prior knowledge about the range of the mass diffusivity of the epoxy plate is not available. The solution distribution of mass diffusivity,

*D*(mm

^{2}/s), solving by Brent’s method appears in Fig. 7, where 2.411 × 10

^{−8}≤

*D*≤ 4.045 × 10

^{−7}. Of note, only 43 solutions are obtained from solving Eq. (37) numerically for the 57 combinations because the last data point is employed as

*M*[6], and Eq. (37) has no solution for mass diffusivity,

_{m}*D*, when percent mass change, Δ

*M*, is greater than or equal to

*M*. The solution interval for mass diffusivity is derived based on Eqs. (15)–(16) as

_{m}The left bound of the solution interval could be truncated at *D* = 0 because mass diffusivity is always non-negative. Exhaustive search with four significant digits in this truncated solution interval (i.e., 0 ≤ *D* ≤ 5.947 × 10^{−7}) yields the optimal least-squares estimator for the mass diffusivity as *D* = 6.414 × 10^{−8} mm^{2}/s, which is the same result as exhaustive search in the wide range [0, 1] (i.e., 0 ≤ *D* ≤ 1), as shown in Fig. 8. The coefficient of determination, *R*^{2}, for this result is 0.9555. The residual sum of squares, RSS, is 1.511 × 10^{−4}.

It is expensive to perform an exhaustive search in the solution interval in this case study because Eq. (37) includes the summation of infinite terms. The median of the 43 solutions, *D*^{med} = 5.662 × 10^{−8} mm^{2}/s, could be supplied as the initial value of the Levenberg–Marquardt method, and the same optimal mass diffusivity estimator (i.e., *D* = 6.414 × 10^{−8} mm^{2}/s) is derived by the Levenberg–Marquardt method. Like the case study using the Michaelis–Menten model in Sec. 5.1, the median is close to the optimal estimator for the mass diffusivity, *D*, derived by exhaustive search in this case study.

The stochastic algorithm shown in Fig. 1 could be employed to find the optimal estimator for the mass diffusivity in Eq. (37) more efficiently because the stochastic algorithm usually does not require solving Eq. (37) numerically 57 times. The Levenberg–Marquardt method is used as the iterative least-squares local method in the stochastic algorithm. In each iteration step, a new combination is chosen without replacement and plugged into Eq. (37) to solve *D* if Eq. (37) has no solution for the previous combination. To test the convergence of the stochastic algorithm, the stochastic algorithm shown in Fig. 1 has been performed 100 times, and the results show that the stochastic algorithm always converges to the optimal mass diffusivity estimator (i.e., *D* = 6.414 × 10^{−8} mm^{2}/s). Specifically, the algorithm converges within one iteration step (i.e., using three solutions) 86 times, converges within two iteration steps (i.e., using five solutions) 98 times, and converges within five iteration steps (i.e., using 11 solutions) 100 times. The solution interval derived by five iteration steps using 11 solutions covers 51.36% of the full solution interval using all 43 solutions (i.e., −1.661 × 10^{−7} ≤ *D* ≤ 5.947 × 10^{−7}) on average in this case study.

*R*and RSS) of the solution interval method could be compared with the linear approximation method suggested by ASTM D5229 [6]. Equation (37) approximates a linear relationship between the percent specimen mass change, Δ

^{2}*M*, and a square root of time,

*√t*, for a short time after the specimen is exposed to the moist environment [47]. In ASTM D5229, the mass diffusivity of the specimen is calculated through [6,47]

*h*is the specimen thickness,

*M*is the effective moisture equilibrium content defined by Eq. (36), (

_{m}*M*)/(

_{2}–M_{1}*√t*–

_{2}*√t*) is the slope of moisture absorption plot in the initial linear portion of the

_{1}*√t*–Δ

*M*curve. In practice, it is often difficult to identify the endpoint of the initial linear portion of the

*√t*–Δ

*M*curve visually. Researchers who published the experimental data shown in Fig. 6 take the first nine data points as the linear portion of

*√t*–Δ

*M*curve, which is a reasonable visual pick from the curve. The last data point is employed as effective moisture equilibrium content,

*M*[6]. The mass diffusivity derived through Eq. (40) is

_{m}*D*= 1.224 × 10

^{−7}mm

^{2}/s, which is 1.91 times of the optimal mass diffusivity estimator

*D*= 6.414 × 10

^{−8}mm

^{2}/s. Figure 6 shows the fitted curves using the solution interval method and the linear approximation method suggested by ASTM D5229. The linear approximation method leads to a poor fit in the middle portion of the curve (i.e., 15 <

*√t*< 45) compared with the solution interval method because the linear approximation method only takes the first nine data points into account. In addition, the value of the mass diffusivity derived through Eq. (40) varies significantly if a different data point is chosen as the endpoint of the initial linear portion of the

*√t*–Δ

*M*curve. The mass diffusivity results given by Eq. (40) using seven different endpoints appear in Table 3. These results show that the mass diffusivity derived through Eq. (40) could vary more than 60% using different endpoints to make linear approximations for the initial portion of the

*√t*–Δ

*M*curve. The

*R*

^{2}and RSS results in Table 3 also show that the curve fitting accuracy of the linear approximation method is lower than that of the solution interval method (i.e.,

*R*

^{2}= 0.9555 and RSS = 1.511 × 10

^{−4}).

The effectiveness of the solution interval method also could be compared with extant iterative least-squares local methods. As an example, the Levenberg–Marquardt method is employed to fit Eq. (37) to the experimental data set, and 1000 random values are chosen as initial guesses for mass diffusivity, *D*. The last data point is employed as effective moisture equilibrium content, *M _{m}* [6]. The results show that none of the 1000 random initial guesses lead the Levenberg–Marquardt iterative algorithm to converge to the optimal parameter estimators (i.e.,

*D*= 6.414 × 10

^{−8}mm

^{2}/s). The curve fitting results using the Levenberg–Marquardt method with nine different initial guesses are shown in Table 4. The Levenberg–Marquardt iterative algorithm converges to the optimal mass diffusivity estimator (i.e.,

*D*= 6.414 × 10

^{−8}mm

^{2}/s) only when the initial guess is close to the optimal estimator. As shown in Fig. 8, the curve of RSS is flat when the value of the mass diffusivity is larger than 10

^{−4}mm

^{2}/s, and the Levenberg–Marquardt iterative algorithm therefore cannot converge to the optimal mass diffusivity estimator if the initial guess is chosen in that flat region. In contrast, the solution interval method establishes a solution interval for the mass diffusivity, shown as the gray area in Fig. 8. The solution interval significantly reduces the search space for the optimal estimator of the mass diffusivity in this case study. As stated earlier, any of the three algorithms associated with the solution interval method (i.e., exhaustive search in solution interval, using median of solutions as the initial value, and the stochastic algorithm shown in Fig. 1) can find the optimal estimator of the mass diffusivity more effectively in this case study.

## 6 Conclusions and Future Work

A solution interval method is introduced for least-squares parameter estimation in nonlinear models. The novelty of this method is that practitioners do not need to guess the value for each parameter in a nonlinear model in order to initialize the method. The solution interval method includes three algorithms (i.e., blue blocks in Fig. 2) that require different levels of computational power to find the optimal parameter estimators. The method provides the empirical probability distribution of each parameter in a nonlinear model. A solution interval for each parameter in a nonlinear model is also constructed by the method. It is proved that the optimal estimator for each parameter in a nonlinear model that minimizes the squared error of the fit belongs to its corresponding solution interval under certain conditions. Four steps to implement the solution interval method in practice are outlined. The Michaelis–Menten model and Fick’s second law are fit to experimental data sets respectively as two case studies to illustrate these steps and validate the method.

Based on available computational power, practitioners can find the optimal least-squares parameter estimators in a specified nonlinear model efficiently using one of the three algorithms associated with the solution interval method. The solution intervals constructed by the method significantly reduce the search space for the optimal parameter estimators. The empirical probability distribution of each parameter is valuable for researchers to assess parameter uncertainty that pertains to curve fitting.

This work has limitations that provide opportunities for future research. As stated in Sec. 3, parameter solution equations represented by Eq. (14) are derived when the chosen data points are plugged into the nonlinear model. In this paper, it is assumed that these parameter solution equations only have one solution for each parameter. Future research may develop methodologies to find the optimal parameter estimators when these equations have more than one solution for each parameter. Moreover, the solution interval method presented in this paper provides an empirical probability distribution for each parameter in the nonlinear model. Future research could explore how to utilize these probability distributions. To assess parameter uncertainty, these probability distributions may be employed as prior distributions of a Bayesian approach. These probability distributions may be used to identify potential outliers in a data set as well. For example, if a data point leads to its corresponding parameter solution equation(s) which have no solution or the solution locates at the margin of the probability distribution for each parameter, practitioners need to check whether that data point is an outlier. Practitioners may also allocate the computational resource in an exhaustive search (i.e., the first algorithm in Sec. 3) based on the empirical probability distribution of each parameter in the nonlinear model when the solution interval is extremely large. For example, a finer grid may be applied in the high-density regions of the probability distribution for each parameter. In addition, the solution interval of each parameter is defined as [*θ _{j}*

^{mid}−

*L*,

_{j}*θ*

_{j}^{mid}+

*L*] rather than [

_{j}*θ*

_{j}^{min},

*θ*

_{j}^{max}] in this paper because it is proven in Appendix A that the optimal estimator for each parameter

*θ*that minimizes the squared error of the fit belongs to the interval [

_{j}*θ*

_{j}^{mid}−

*L*,

_{j}*θ*

_{j}

^{mid}+

*L*] when conditions given by Eqs. (A8)–(A10) are satisfied. However, two case studies in Sec. 5 show that the optimal parameter estimator for each parameter belongs to [

_{j}*θ*

_{j}^{min},

*θ*

_{j}^{max}]. It may not be necessary to extend the solution interval from [

*θ*

_{j}^{min},

*θ*

_{j}^{max}] to [

*θ*

_{j}^{mid}−

*L*,

_{j}*θ*

_{j}^{mid}+

*L*]. Future research may relax the conditions given by Eqs. (A8)–(A10), which are stronger than convexity for the least-squares error $R(\theta ^)$, and define a solution interval for the more general case or develop a procedure to derive the solution interval on a case-by-case basis. Lastly, both the number of unknown parameters in the nonlinear models and the number of data points are small (i.e.,

_{j}*n*< 3 and

*m*< 60) in the Michaelis–Menten model case study and Fick’s second law case study in Sec. 5. The solution interval method presented in this paper may be applied to high-dimensional (i.e., large

*n*) and big data (i.e., large

*m*) engineering problems in future research.

## Footnote

There is a typo in Ref. [47] for this equation. The summation should begin with *j* = 0 rather than *j* = 1.

## Acknowledgment

This material is based upon a work supported by the Defense Advanced Research Projects Agency through cooperative agreement No. N66001-17-1-4064. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor.