The focus of this paper is a strategy for making a prediction at a point where a function cannot be evaluated. The key idea is to take advantage of the fact that prediction is needed at one point and not in the entire domain. This paper explores the possibility of predicting a multidimensional function using multiple one-dimensional lines converging on the inaccessible point. The multidimensional approximation is thus transformed into several one-dimensional approximations, which provide multiple estimates at the inaccessible point. The Kriging model is adopted in this paper for the one-dimensional approximation, estimating not only the function value but also the uncertainty of the estimate at the inaccessible point. Bayesian inference is then used to combine multiple predictions along lines. We evaluated the numerical performance of the proposed approach using eight-dimensional and 100-dimensional functions in order to illustrate the usefulness of the method for mitigating the curse of dimensionality in surrogate-based predictions. Finally, we applied the method of converging lines to approximate a two-dimensional drag coefficient function. The method of converging lines proved to be more accurate, robust, and reliable than a multidimensional Kriging surrogate for single-point prediction.
Introduction
In surrogate modeling, it is common to sample a function at scattered points and fit the samples with an explicit function to approximate function values at unsampled points [1,2]. The estimation procedure is called interpolation when target points are inside the convex hull of sampled points, while it is termed extrapolation [3,4] otherwise, as shown in Fig. 1. For one-dimensional samples, for example, the convex hull is set by the lower and upper bounds of samples. Surrogate models are routinely used as interpolation schemes and may be inadequate for extrapolation, which is commonly encountered while approximating high-dimensional functions [5]. A surrogate is especially useful to estimate function value when the point of interest cannot be sampled via simulation or experiment due to extreme conditions, lack of data, limitations of simulation software, or an experiment that is too dangerous to perform. For example, 16 out of the 500 simulations were reported to have failed for an aeroelastic helicopter blade simulation in Glaz et al. [6] during design optimization.
Several valuable efforts have been made toward reliable estimation at inaccessible points. Neural networks have been reported to be misleading when extrapolating beyond the region of the samples [4] and are therefore mostly used for interpolation. Balabanov et al. [7] and Hooker and Rosset [8] demonstrated that regularization methods can improve extrapolation accuracy beyond the sample range even if they compromise interpolation accuracy. Wilson and Adams [9] proposed a Gaussian process kernel function that better uncovers data patterns and thereby improves forecasting accuracy. Richardson extrapolation has been commonly adopted to predict the finite element response at an extremely refined mesh that is unaffordable to analyze [10,11]. A specific polynomial form was used for Richardson extrapolation to incorporate physical knowledge about the convergence order. Hooker [12] proposed an empirical trend function for predicting toward an inaccessible point. In prognosis of the remaining life of mechanical systems, the application of Gaussian process surrogates and neural network models has been discussed based on a heuristic study [13]. In addition, numerous papers address forecasting one-dimensional time series data at inaccessible points (i.e., the future) in finance and weather forecasting; see, e.g., Refs. [14,15].
Based on this literature, it is challenging to estimate prediction accuracy at inaccessible points. Surrogate predictions, in particular, risk increasing uncertainty when prediction points are far from the samples [7,16,17]. This motivated us to estimate a function value by creating multiple independent predictions from different data sets. Introducing multiple predictions may reduce the uncertainty of blind predictions. A new sampling pattern, called the method of converging lines, is proposed in this work. As a first step toward improving long-distance prediction with surrogate models, we explore the possibility of predicting a multidimensional function using multiple one-dimensional lines. The main idea is to select samples along lines toward the inaccessible point. One-dimensional surrogates are then constructed from the samples along each line in order to predict the function value at the inaccessible point. The method of converging lines transforms multidimensional function prediction into a series of one-dimensional predictions, which provide multiple estimates at the inaccessible point. These predictions are then combined using Bayesian inference to enhance the prediction accuracy.
The paper is organized as follows: In Sec. 2, we present the concept and technical details of the method of converging lines. In Sec. 3, we examine numerical properties of the proposed approach using two algebraic functions. In Sec. 4, we apply the method of converging lines to approximate a two-dimensional drag coefficient function to demonstrate a potential application. One-dimensional surrogate prediction is performed using ordinary Kriging because it performs well when interpolating noise-free functions.
Method of Converging Lines
Principal Idea.
The procedure to select locations where the function is to be sampled is called the design of experiments (DOE). For fitting a surrogate, it is common to select space-filling DOEs (e.g., Latin hypercube sampling (LHS) [18]), which are geared to provide a good representation of the design domain. The method of converging lines, in contrast, serves the prediction at a single target point. This method locates sampling points along several lines that all intersect at the target point.
The difference between LHS and the sampling scheme of the method of converging lines in two- and three-dimensional space is illustrated in Fig. 2. In the figure, the target point is at the origin, and the domain within a normalized distance of 0.5 from it is set to be inaccessible. Three converging lines are shown in Fig. 2(b). The orientations of the lines may vary with the application in order to generate a good estimate. Along a given line, the local coordinate $s_j$ of the $j$th sample point $\mathbf{x}_j$ is its Euclidean distance from the target point $\mathbf{x}_t$:

$$s_j = \lVert \mathbf{x}_j - \mathbf{x}_t \rVert$$

where $s_j$ is a non-negative scalar that is zero when $\mathbf{x}_j = \mathbf{x}_t$. The coordinates of each sample, $\mathbf{x}_j$, are first normalized to lie within [0,1] along each dimension to eliminate the effect of different scales.
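The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `line_samples`, the argument `accessible_fraction`, and the choice of evenly spaced samples are assumptions made here for clarity.

```python
import math

def line_samples(target, far_end, n_samples, accessible_fraction=0.5):
    """Place n_samples evenly spaced points on the accessible part of a line.

    The line runs from far_end to target. Only the first
    accessible_fraction of it (the part farthest from the target) is sampled.
    Returns (points, local_coords), where each local coordinate is the
    Euclidean distance from the sample to the target point.
    """
    points, local_coords = [], []
    for j in range(n_samples):
        # t = 0 at far_end, t = accessible_fraction at the edge of the
        # inaccessible part of the line (requires n_samples >= 2)
        t = accessible_fraction * j / (n_samples - 1)
        point = [f + t * (x - f) for f, x in zip(far_end, target)]
        points.append(point)
        local_coords.append(math.dist(point, target))
    return points, local_coords

# Hypothetical 2D example: target at the corner (1, 1), line from (0, 0)
pts, s = line_samples([1.0, 1.0], [0.0, 0.0], n_samples=6)
```

For an interpolation target, where lines pass through the target point, the same helper could be called once for each side of the line.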
The method of converging lines is expected to be ideal under two conditions: the sampling domain is relatively convenient to access in terms of cost and feasibility, and sampling the target point is impossible or far more challenging. In this paper, we focus on long-distance prediction, in which the length of the line segment in the inaccessible domain equals that in the accessible domain. Under these two conditions, it might be desirable to perform pointwise prediction using multiple lines.
One-Dimensional Function Estimation Using Ordinary Kriging.
The underlying assumption for prediction is that the function behaves similarly in the accessible domain and inaccessible domain. Therefore, surrogate models providing acceptable accuracy in sampling domain would be appropriate to estimate the function at inaccessible points within a certain distance from samples. When this assumption fails, the surrogate prediction is likely to fail, whether we use a multidimensional surrogate or the method of converging lines.
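For reference, a common statement of the ordinary Kriging model is sketched below. The Gaussian correlation form and the symbols $\mu$, $\sigma^2$, and $\theta_i$ are assumptions made here for illustration and may differ from the notation of Ref. [20]: the response is modeled as a constant mean plus a zero-mean Gaussian process $Z$ whose covariance decays with the distance between two points $\mathbf{x}$ and $\mathbf{s}$.

```latex
y(\mathbf{x}) = \mu + Z(\mathbf{x}), \qquad
\operatorname{cov}\bigl(Z(\mathbf{x}), Z(\mathbf{s})\bigr)
  = \sigma^{2} \exp\!\Bigl(-\sum_{i=1}^{n} \theta_i \,(x_i - s_i)^{2}\Bigr)
```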
where n is the dimensionality of the variables x and s, and $\theta_i$ is the hyperparameter along the ith direction [20]. For one-dimensional prediction with the method of converging lines, n = 1.
where denotes ith sample point, and is the mean of samples. The hyperparameters and are obtained by maximizing the likelihood that the data come from a Gaussian process. Ordinary Kriging is implemented using the surrogate toolbox of Viana and Goel [21].
The prediction of ordinary Kriging and the corresponding uncertainty are expressed as a conditional probability density function (PDF) given the data set y, which follows a Student's t-distribution [22]. A Student's t-distribution is well approximated by a normal distribution and converges to it as the number of samples increases. In this paper, the surrogate prediction with uncertainty is therefore expressed using a normal distribution for computational efficiency: the conditional PDF of a prediction at x is a normal distribution whose mean is the Kriging predictor and whose standard deviation is the Kriging prediction standard deviation at x.
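To make the one-dimensional predictor and its variance concrete, the sketch below implements textbook ordinary Kriging along a single line in pure Python. It is not the surrogate toolbox of Ref. [21]. Several things are assumptions made here: the Gaussian correlation `exp(-theta * d**2)`, the fixed value `theta=20.0` (the paper instead estimates hyperparameters by maximum likelihood), the helper names, and the quadratic test data.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def ordinary_kriging_1d(s, y, theta):
    """Return predict(s_new) -> (mean, variance) for fixed theta."""
    def corr(a, b):
        return math.exp(-theta * (a - b) ** 2)

    n = len(s)
    R = [[corr(si, sj) for sj in s] for si in s]
    ones = [1.0] * n
    R_inv_ones = solve(R, ones)
    # Generalized least-squares estimate of the constant mean
    mu = dot(ones, solve(R, y)) / dot(ones, R_inv_ones)
    resid = [yi - mu for yi in y]
    R_inv_resid = solve(R, resid)
    sigma2 = dot(resid, R_inv_resid) / n  # process variance estimate

    def predict(s_new):
        r = [corr(s_new, si) for si in s]
        R_inv_r = solve(R, r)
        mean = mu + dot(r, R_inv_resid)
        var = sigma2 * (1.0 - dot(r, R_inv_r)
                        + (1.0 - dot(ones, R_inv_r)) ** 2 / dot(ones, R_inv_ones))
        return mean, max(var, 0.0)  # clamp round-off below zero

    return predict

# Six samples on the accessible half of a line, target at s = 1.0
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
values = [v ** 2 for v in samples]
predict = ordinary_kriging_1d(samples, values, theta=20.0)
mean_at_target, var_at_target = predict(1.0)
```

The predictor reproduces the data at the sample locations with near-zero variance, while the variance at the distant target point is larger, which is the quantity the line combination later relies on.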
Combination of Prediction From Multiple Lines.
For making a prediction in n-dimensional space, a one-dimensional surrogate can achieve higher accuracy than an n-dimensional surrogate built with the same number of samples. The method of converging lines combines multiple predictions at the inaccessible point, obtained from one-dimensional surrogates built independently along lines toward that point, to reduce the chance of a large error. The true response is predicted with multiple one-dimensional Kriging surrogates. The prediction of the ith surrogate and its uncertainty are defined through a conditional PDF given the ith data set, which is a normal distribution whose mean is the Kriging predictor and whose variance is the prediction variance at the target point.
Based on Eqs. (4) and (5), the posterior distribution is also normal. From the expression for its mean in Eq. (6), the surrogate prediction from the line with the minimum prediction variance is expected to be the most influential on the posterior distribution, because the combined prediction is an average of the line predictions weighted according to their prediction variances.
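A minimal sketch of such a combination is given below. It assumes independent normal predictions and a non-informative prior, in which case the posterior is normal with a precision-weighted mean; this is a standard result and may differ in detail from Eqs. (4)-(7). The function name `combine_predictions` is hypothetical. The usage lines reuse the three line predictions reported in Table 4 and treat the second row of that table as standard deviations, which is an assumption.

```python
import math

def combine_predictions(means, std_devs):
    """Precision-weighted combination of independent normal predictions.

    Each line contributes weight 1 / sigma_i**2, so the line with the
    smallest prediction variance dominates the combined estimate.
    Returns (combined_mean, combined_std_dev).
    """
    precisions = [1.0 / sd ** 2 for sd in std_devs]
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    return mean, math.sqrt(1.0 / total)

# Line predictions from Table 4 (second row assumed to be standard deviations)
line_means = [1.021, 0.9352, 0.9755]
line_std_devs = [0.0786, 0.0725, 0.0003]
combined_mean, combined_std = combine_predictions(line_means, line_std_devs)
```

With these inputs the combined mean stays very close to the line 3 prediction, because that line's precision is several orders of magnitude larger than the others.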
Compared with multidimensional approximation, the method of converging lines is essentially a set of predictions from one-dimensional (1D) surrogates. The 1D surrogate is likely to be more robust than multidimensional approximation for two reasons: (a) Only one hyperparameter has to be estimated to build a one-dimensional surrogate, whereas n hyperparameters have to be estimated for an n-dimensional surrogate. Therefore, the cost of evaluating enough samples for a reasonable one-dimensional prediction remains affordable regardless of dimensionality, while the cost of building an n-dimensional surrogate increases excessively (the curse of dimensionality). (b) The 1D surrogate can be easily visualized, reducing the chance that a failure of the surrogate fitting will go unnoticed.
One important feature of the method of converging lines is that lines may differ in performance. Some lines may be friendly to mathematical modeling, that is, they yield a small and reliable prediction variance. These good lines will dominate the combined prediction under the Bayesian combination. By potentially improving the quality of the surrogate models and combining multiple lines, the method of converging lines is expected to be more reliable for pointwise estimation. Another benefit is that the number of sampling points required along the lines for accurate prediction may not depend on the dimensionality of the function. For example, in 20-dimensional space, prediction with a standard LHS may require hundreds of points, while three lines with 20 points may still suffice for prediction with the method of converging lines. The effect of dimensionality is discussed in Sec. 3.
Several conditions may lead to large prediction errors even when the assumption holds that the inaccessible domain has a trend similar to that of the accessible domain:
- (1) Long-distance prediction usually risks large errors. A lack of samples near the inaccessible domain makes it challenging to determine surrogate parameters and to interpret goodness of fit.
- (2) The choice among different surrogate models or parameter settings is a major source of uncertainty. It is more challenging to select appropriate models for predictions toward inaccessible points, especially when no prior information about noise is available.
- (3) Some lines may not be trustworthy because of physical complexity or inappropriate modeling. An outlier prediction can be identified and excluded when most lines give similar predictions but one line is significantly different. An outlier prediction comes from a surrogate model with a misleading estimate and prediction variance.
Note, however, that these difficulties are likely to be even more acute if the same prediction is attempted with a multidimensional surrogate, especially because only a single estimate is obtained in that case.
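The outlier screening mentioned in item (3) is not specified further in the text. One possible way to flag a line whose prediction disagrees with the others is sketched below, using the median absolute deviation (MAD). The rule, the threshold of 3, the function name, and the example numbers are all assumptions introduced here for illustration, not the authors' procedure.

```python
import statistics

def flag_outlier_lines(predictions, threshold=3.0):
    """Return indices of lines whose prediction is far from the median.

    Distance is measured in units of the median absolute deviation (MAD).
    If the MAD is zero, every prediction that differs from the median
    is flagged.
    """
    center = statistics.median(predictions)
    mad = statistics.median(abs(p - center) for p in predictions)
    if mad == 0.0:
        return [i for i, p in enumerate(predictions) if p != center]
    return [i for i, p in enumerate(predictions)
            if abs(p - center) / mad > threshold]

# Hypothetical predictions from five lines; the last one disagrees
flagged = flag_outlier_lines([0.970, 0.980, 0.975, 0.972, 1.500])
```

A rule of this kind is only meaningful when several lines are available; with three lines, visual inspection of the one-dimensional fits is likely to be more informative.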
Numerical Properties of Method of Converging Lines
Although it is difficult to determine in general which functions can be approximated and how to predict toward inaccessible points, we tried to identify and discuss several important factors based on numerical test functions. We examined the effect of line selection on prediction accuracy, compared the prediction from multiple lines with that from a multidimensional approximation, and examined the effect of high dimensionality.
Extrapolation has been reported to be more challenging than interpolation in general [19]. Therefore, the following analysis distinguishes between these two scenarios. We adopt ordinary Kriging for prediction because it has been reported to avoid huge errors when predicting toward inaccessible points [19].
Algebraic Illustration Functions.
For extrapolation, the target point for prediction was selected to be the vertex where all eight variables were at their upper bound, $x_i = 1$, $i = 1, 2, \ldots, 8$, as provided in Table 1. The inaccessible domain was set to have the same length as the accessible domain along all the variable directions. For interpolation, the target point was set to be $x_i = 0.5$ for $i = 1, \ldots, 8$, which is the center of the variable space. The inaccessible domain was set to be the hypercube centered at the target point with a width of 0.5. Therefore, the inaccessible domain had the same length as the accessible domain.
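| | Target point | Variable range | Inaccessible domain |
|---|---|---|---|
| Extrapolation | $x_i = 1$ | $x_i \in [0, 1]$ | $x_i \in (0.5, 1]$ |
| Interpolation | $x_i = 0.5$ | $x_i \in [0, 1]$ | $x_i \in (0.25, 0.75)$ |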
For extrapolation, the target point for prediction was again selected to be the vertex where all the variables were at their upper bounds, as provided in Table 2. The inaccessible domain was set to be a hypercube cornered at the target point and had the same length as the accessible domain along all the variable directions. For interpolation, the target point was set at the center of the variable space. The inaccessible domain was set to be the hypercube centered at the target point with a width of 1. Therefore, the inaccessible domain had the same length as the accessible domain.
Method of Converging Lines Versus Multidimensional Approximation.
We first compared the proposed approach with multidimensional approximation using the Dette function, for extrapolation and interpolation in turn.
Extrapolation.
For the method of converging lines, three lines were selected toward the target point. The other ends of the lines were randomly selected from the rest of the $2^{8}$ vertices of the domain. We selected 100 sets of lines randomly to account for the effect of line selection. Six samples were evenly spaced in the accessible domain along each line. A typical set of extrapolation results based on ordinary Kriging is shown in Fig. 3. The function along line 1 in Fig. 3(a) is unimodal, like a bowl. Figure 3(b) shows line 2, which is wavy. Line 3 in Fig. 3(c) is close to linear and the most friendly for approximation. In the case in Fig. 3, the combined prediction was close to that of line 3, which had a much smaller prediction variance. Note that even without knowledge of the true function, the ability to visualize the data and predictions shown in Fig. 3 is useful for spotting which surrogate is likely to be trusted.
For the multidimensional approximation, 18 samples were selected using LHS with 5000 iterations of the MATLAB function lhsdesign. The sampling domain was restricted to be the same as that used to generate the multiple lines. The inaccessible domain was a hypercube cornered at the target point.
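For readers without MATLAB, the idea of an iterated Latin hypercube design can be sketched in pure Python as follows. This is an illustration in the spirit of a maximin search over random designs, not a reimplementation of lhsdesign. The function names, the seed, and the reduced iteration count in the usage line are assumptions, and the restriction of samples to the accessible domain is not shown.

```python
import math
import random

def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube design in the unit hypercube.

    Each dimension is split into n_points equal strata, and every
    stratum receives exactly one point.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(k + rng.random()) / n_points for k in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_points)]

def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two points of a design."""
    return min(math.dist(p, q)
               for i, p in enumerate(design)
               for q in design[i + 1:])

def maximin_lhs(n_points, n_dims, iterations, seed=0):
    """Keep the best of `iterations` random designs (maximin criterion)."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_points, n_dims, rng)
                  for _ in range(iterations))
    return max(candidates, key=min_pairwise_distance)

# 18 points in 8 dimensions; 50 iterations here to keep the example fast
design = maximin_lhs(n_points=18, n_dims=8, iterations=50)
```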
One hundred sets of predictions were generated using multiple lines and multiple LHS designs. Prediction accuracy was quantified using the absolute percent error and is summarized using boxplots in Fig. 4. Predictions from multiple lines had a median error of 0.018%, and 75% of the predictions had errors below 0.1%. In contrast, the minimum error of the multidimensional approximation was 25%.
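The error metric and the boxplot statistics can be computed as in the short sketch below. The helper names are hypothetical, the numbers in the usage lines are made up for illustration, and the quartile convention of `statistics.quantiles` may differ from the one used to draw the boxplots in the figures.

```python
import statistics

def absolute_percent_error(prediction, truth):
    """Absolute error relative to the true value, in percent."""
    return abs(prediction - truth) / abs(truth) * 100.0

def summarize(errors):
    """Five-number summary of the kind displayed by a boxplot."""
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return {"min": min(errors), "q1": q1, "median": median,
            "q3": q3, "max": max(errors)}

# Made-up predictions of a function whose true value is 1.0
errors = [absolute_percent_error(p, 1.0) for p in [1.02, 0.99, 1.00, 1.05, 0.97]]
summary = summarize(errors)
```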
Interpolation.
For the method of converging lines, three lines were selected to intersect at the target point. The ends of the lines were randomly selected from all the vertices. A typical set of interpolation results using Kriging is shown in Fig. 5. All the lines had a bowl shape with different minima. All the Kriging models had a similar level of prediction variance and slightly overestimated the function value at the target point. Again, we note that even without knowing the exact function, figures such as Fig. 5 provide some protection against the failure of the fitting process.
For the multidimensional approximation, 18 samples were generated using LHS in the accessible domain. One hundred sets of predictions were generated to account for randomness and are summarized in Fig. 6. Predictions using multiple lines had a median error of 3.24% and a maximum error of 3.31%. In contrast, predictions using 8D Kriging had a median error of 22.8% and a minimum error of 19.7%. Multiple lines thus generated more accurate predictions than 8D Kriging for this test function.
Applicability to Very High-Dimensionality.
We adopted the Styblinski–Tang function with 100 dimensions to illustrate applicability to high dimensionality. The dimension is too high for the function to be approximated by regular surrogate models with a reasonable number of samples. Therefore, the method of converging lines might be the only option for a function of such high dimension.
For extrapolation, three lines were randomly selected from the rest of the $2^{100}$ vertices. Six sampling points were evenly spaced in the accessible domain along each line. A typical set of extrapolation results is given in Fig. 7. The test function along all three lines was multimodal. The prediction error was small in the domain close to the samples and increased with prediction distance in the extrapolation domain. This example illustrates the difficulty of extrapolating over a long distance. Not only can the prediction deteriorate with distance, but for a function that changes its behavior, even the uncertainty estimate may be flawed.
One hundred sets of random lines were generated toward the same target point to account for variability. The errors of the 100 predictions are summarized in Fig. 8. The prediction error was around 28%. Because a 100-dimensional surrogate cannot be fitted with 18 samples, the performance of the proposed method of converging lines was not compared with that of multidimensional surrogates; even if such a fit were possible, the prediction error would be expected to be huge.
For interpolation, three lines were selected toward the target point. The other ends of the lines were randomly selected from all the vertices. A typical set of interpolation results is shown in Fig. 9. All the lines had a bowl shape, and the Kriging models overestimated the function value at the target point. One hundred sets of predictions were generated to account for randomness and are summarized in Fig. 10. Predictions using multiple lines had errors between 7.4% and 9.3%. Based on the tests using the 100D Styblinski–Tang function, the method of converging lines enables estimation of a high-dimensional function at one point using multiple lines. Such estimation is practically impossible with a small number of samples using a multidimensional surrogate. In particular, it is impossible to fit a Kriging surrogate with 18 points in 100-dimensional space.
Long-Distance Extrapolation and Interpolation of a 2D Drag Coefficient
A drag coefficient model of a spherical particle [29] is presented in this section to demonstrate long-distance function estimation at an inaccessible point using the method of converging lines.
Introduction to the Drag Coefficient Function.
The drag coefficient considered here is a dimensionless quantity that characterizes the quasi-steady drag on a spherical particle in compressible flow. For a given particle, the drag coefficient, $C_D$, is a function of Mach number, M, and Reynolds number, Re. The range of applicability for $C_D$ in this paper is limited to the supersonic ($M > 1$) domain to avoid the physical discontinuity at M = 1. The Reynolds number is likewise limited to a range that avoids turbulence of the attached boundary layer. It is considered that the governing physics is consistent within these limits. The analytical expression for $C_D$ is given in the Appendix. The dependence of the drag coefficient on Reynolds number and Mach number is shown in Fig. 11, which shows that the behavior can be made smoother by transforming Re to logarithmic coordinates. The target point, variable range for analysis, and inaccessible domain of the drag coefficient function are summarized in Table 3 and plotted in Fig. 12. For extrapolation, the prediction domain is a square located in the upper right corner, as shown in Fig. 12(a). For interpolation, the prediction domain is set to be a square in the center of the variable range, as shown in Fig. 12(b). The lengths of the sampling domain and the prediction domain are the same along each line in both cases.
Extrapolation Results
Extrapolation of Drag Coefficient Function Using the Method of Converging Lines.
Instead of sampling the entire sampling domain, six uniformly spaced samples were selected from half of the sampling domain, as shown in Fig. 12(a). This was because the prediction variance decreased noticeably when samples were closer to the target point. When the inaccessible domain is in the interpolation region, the sampling scheme in Fig. 12(b) can be used, which is considered in Sec. 4.3. Table 4 summarizes statistics of the surrogate predictions. Predictions of the function value along variable M with constant Re (line 3) had the smallest prediction variance, as shown in Table 4 and Fig. 13(c), indicating better accuracy along this line.
| | Line 1 | Line 2 | Line 3 |
|---|---|---|---|
| Kriging prediction | 1.021 | 0.9352 | 0.9755 |
| Prediction uncertainty | 0.0786 | 0.0725 | 0.0003 |
By substituting the extrapolation results of each line from Table 4 into Eq. (7), the most probable estimate is obtained as approximately 0.9755. Note that the posterior distribution was dominated by the prediction along line 3 because its prediction variance ($\sigma^2$) was much smaller than those of the other lines. The small prediction variance indicates high reliability of the extrapolation. Compared with the true function value of 0.9766, the absolute error is 0.001 and the relative error is 0.1%. Note also that the less accurate lines (lines 1 and 2) provide a reasonable warning of their lack of accuracy through their larger prediction uncertainties.
Comparison of 1D and 2D Surrogates.
One interesting question is how sensitive the prediction is to the location of lines. The sensitivity to the selection of lines was tested by varying the angle of lines. In Fig. 12, the angle between lines and the horizontal axis with constant M = 1.75 was set to be , respectively, where in the figure. In order to test the effect of lines, varied from to , from to , and from 85 deg to 90 deg in the increment of 1 deg. Three hundred and ninety six sets of three lines were generated based on different combinations of , , and . Extrapolation results based on each set of lines were plotted using boxplot in Fig. 14(b). All the extrapolation results concentrated near the true function value, 0.9766, varying from 0.9751 to 0.9756.
In order to examine the performance of multidimensional extrapolation, two-dimensional (2D) Kriging was adopted. Eighteen samples were selected in the reduced sampling domain using LHS (5000 iterations maximizing the minimum distance), as shown in Fig. 14(a). Just as the converging lines benefited from the reduced domain, 2D Kriging also benefited from it in our tests. The two-dimensional extrapolation was repeated 20 times to account for the randomness of the sampling pattern. For a typical 2D extrapolation out of the 20 cases, i.e., corresponding to the case with median absolute error, the prediction was . In Fig. 14(b), the boxplot of the 2D Kriging predictions has a median of 0.9799 and a standard deviation of 0.0261. Two-dimensional Kriging generated larger variation and bias than the converging lines, even with 20 sets of data. The method of converging lines proved to be more reliable and robust for one-point extrapolation of the drag coefficient function.
Interpolation Results
Interpolation of Drag Coefficient Function Using the Method of Converging Lines.
Interpolation using the method of converging lines was performed similarly, as shown in Fig. 12(b). We selected samples in half of the sampling domain to reduce the prediction variance. Interpolation results from the three lines are plotted in Fig. 15. Predictions at the target point are provided in Table 5.
| | Line 1 | Line 2 | Line 3 |
|---|---|---|---|
| Kriging prediction | 0.9408 | 0.9943 | 0.9223 |
| Prediction uncertainty | 0.0146 | 0.0742 | $5 \times 10^{-6}$ |
As in the case of extrapolation, the prediction variance on line 3 is by far the smallest. By substituting the prediction results of each line from Table 5 into Eq. (7), the most probable estimate is obtained; it is essentially identical to the result of line 3. Compared with the true function value of 0.9224, the absolute error is 0.0001 and the relative error is 0.01%.
Comparison of 1D and 2D Surrogates.
We have more freedom to select lines for interpolation while fixing the intersection angle between lines. Therefore, the angles between successive lines (line 1 to line 2 and line 2 to line 3) were fixed at 45 deg. The three lines were rotated counterclockwise gradually from 0 deg to 45 deg in 20 steps, and 20 sets of 1D interpolation results were obtained accordingly. For 2D Kriging, 18 samples were selected in the sampling domain using LHS (5000 iterations maximizing the minimum distance) for the following analysis, as shown in Fig. 16(a). For a typical 2D interpolation out of the 20 cases, i.e., corresponding to the case with median absolute error, the prediction was . It is not only less accurate than the method of converging lines but also substantially underestimates the uncertainty through its prediction variance.
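The rotation of the three lines can be sketched as below. The function name and its defaults are hypothetical. The sketch assumes that the 20 rotation angles are evenly spaced with both 0 deg and 45 deg included, and that directions are expressed in the normalized two-dimensional variable space; neither detail is stated in the text.

```python
import math

def rotated_line_sets(n_sets=20, max_rotation_deg=45.0,
                      spacing_deg=45.0, n_lines=3):
    """Unit direction vectors for n_sets groups of lines through the target.

    Within a group, successive lines are spacing_deg apart. The whole
    group is rotated counterclockwise from 0 to max_rotation_deg.
    """
    sets = []
    for k in range(n_sets):
        base = max_rotation_deg * k / (n_sets - 1)
        angles = [base + i * spacing_deg for i in range(n_lines)]
        sets.append([(math.cos(math.radians(a)), math.sin(math.radians(a)))
                     for a in angles])
    return sets

line_sets = rotated_line_sets()
```

Each direction vector defines a line through the target point, along which samples can be placed on both sides of the inaccessible square.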
Predictions of 1D and 2D Kriging are summarized in Fig. 16(b). One-dimensional Kriging predictions varied from 0.9223 to 0.9232. Two-dimensional Kriging predictions had a median of 0.9255 and a standard deviation of 0.0195. The boxplot shows that for some DOEs, the 2D prediction was grossly inaccurate.
Conclusion
This paper presents a method for improving the accuracy of function estimation at an inaccessible point by taking advantage of the fact that prediction is needed at only one point instead of everywhere. The proposed method of converging lines transforms a multidimensional prediction into a series of 1D predictions. The most probable estimate is obtained by combining consistent predictions from different lines using Bayesian inference, under the assumption that all the lines are independent.
The one-dimensional prediction was performed using a Kriging surrogate model. Function prediction is only feasible under the assumption that the function behaves similarly in the sampling domain and the prediction domain. We illustrated the applicability of the method to high-dimensional functions with two algebraic examples, the 8D Dette function and the 100D Styblinski–Tang function. The proposed approach proved to be more accurate and reliable than multidimensional approximation for the 8D Dette function, and it enabled moderately accurate estimation of the 100D Styblinski–Tang function with 18 samples. The two examples also illustrated another advantage of the method, which is the easy visualization of the fitted surrogates. Finally, the examples illustrated that combining multiple predictions using Bayesian inference could increase the chance of obtaining a valid estimate.
Next, the method was illustrated for a physical example, extrapolation and interpolation of a two-dimensional drag coefficient. For this example, as well, the method of converging lines proved to be much more reliable and accurate than 2D Kriging.
In this paper, we assume that predictions from multiple lines are independent in order to apply Bayesian methods. Some correlation among lines exists, however, especially in the region close to the target point where the lines converge. Introducing correlation among lines will be an interesting topic for future work.
Acknowledgment
This work was supported by the U.S. Department of Energy, National Nuclear Security Administration, Advanced Simulation and Computing Program, as a Cooperative Agreement under the Predictive Science Academic Alliance Program, under Contract No. DE-NA0002378.