## Abstract

This paper presents a control framework to co-optimize the velocity and power-split operation of a plug-in hybrid vehicle (PHEV) online in the presence of traffic constraints. The principal challenge in its online implementation lies in the conflict between the long control horizon required for global optimality and the limited available computational power. To resolve the conflict between horizon length and computational complexity, we propose a receding-horizon strategy in which co-states are used to approximate the future cost, which allows the prediction horizon to be shortened. In particular, we update the co-state using a nominal trajectory and the temporal-difference (TD) error based on the co-state dynamics. Our simulation results demonstrate a 12% fuel economy improvement over the sequential/layered control strategy for a given driving scenario. Moreover, its real-time practicality is evidenced by an average computation time of around 80 ms per model predictive controller (MPC) step with a 10 s prediction horizon.

## 1 Introduction

The technical maturity of advanced driver assistance systems (ADAS) encourages research into systematically designing a longitudinal velocity controller from an optimal control perspective. Combined with a powertrain-level controller, advances in ADAS can achieve excellent system-wide efficiency. Among vehicles with different energy sources, a plug-in hybrid vehicle (PHEV) is of particular interest. This is because, for a PHEV with two energy sources (fuel and electricity), energy-efficient driving can improve fuel economy beyond what an optimal power split alone achieves under ordinary human driving.

Over the years, there have been extensive efforts to increase the efficiency of (P)HEVs. The book by Sciarretta and Vahidi [1] comprehensively discusses almost every aspect of connected and automated vehicles using a variety of energy sources. Bae et al. [2] presented an ecological adaptive cruise controller (ECO-ACC) with a two-level framework for a PHEV to minimize energy consumption while avoiding collisions and complying with traffic signals. However, their work focused solely on velocity planning and safety control rather than jointly optimizing the vehicle-following and powertrain dynamics. Heppeler et al. [3] introduced a layered approach for the predictive control of the vehicle and powertrain dynamics of an HEV and demonstrated a 3–10% reduction in fuel consumption compared with a baseline that follows a desired velocity respecting speed limits and uses the equivalent consumption minimization strategy. However, their controller did not account for car-following, i.e., it did not consider a lead vehicle (LV). Therefore, it is unclear how the controller would perform in terms of fuel economy and safety when the traffic constraints induced by the LV must be considered over the short horizon but are ignored in the long-horizon SOC planning.

Most of the existing literature on energy-efficient control of a PHEV employs a layered control framework comprising (1) a velocity-planning (eco-driving) layer, sometimes with an additional ACC layer to guarantee safety in the presence of an LV, in which information on the powertrain dynamics can be partially included to better estimate energy consumption [3], and (2) a powertrain-control layer that coordinates the operation of the engine and electric motors to minimize fuel consumption and guarantee a desired battery state-of-charge (SOC) level at the end of the trip, given the velocity determined by the velocity-planning layer. Unlike existing eco-driving approaches based on the layered control framework, which minimize fuel consumption only indirectly, this study co-optimizes the velocity and powertrain operations to explicitly maximize system-wide efficiency while guaranteeing safety and the desired terminal SOC. Moreover, the proposed solution strategy is potentially implementable in real time.

For PHEVs, the predominant roadblock to adopting a co-optimization control framework online is its numerical implementation. More concretely, to optimize the battery charge-depletion rate and the velocity of a PHEV, one needs to solve a trajectory optimization problem (TOP) for the entire trip with a specified SOC to be satisfied at the end of the trip. In the online implementation, the TOP is posed as an economic model predictive controller (EMPC) problem. To the authors' knowledge, the work of Huang et al. [4] is the first to solve the TOP directly in real time rather than explicitly tracking a reference. However, Ref. [4] considered only the powertrain dynamics, with a single state, the battery SOC. For the co-optimization problem, which has additional controls and states, it is unclear whether real-time implementation remains possible when the EMPC covers the entire trip. Besides, as shown later in the paper, co-optimization requires predicting the LV's driving trace, and such a prediction becomes less accurate as the prediction horizon increases.

The length of the prediction horizon is generally constrained to reduce the computational complexity associated with the TOP, which makes it difficult to achieve global optimality. To resolve this issue, we propose a receding-horizon strategy where the co-states are used to approximate the future cost. In particular, the co-state is updated using a nominal trajectory and the temporal-difference (TD) error based on co-state dynamics. The proposed receding-horizon control framework with co-state correction based on TD error is shown in Fig. 1, which will be detailed in Sec. 3. In the proposed strategy, distance constraints are generated from the prediction of the LV’s trajectory. A reference SOC from a nominal trajectory obtained offline is used to softly update the co-state at the end of each prediction horizon, which is then rolled out backward-in-time to get the control at time *t* = *k*. This is the so-called shooting iteration block and will be discussed in Sec. 3.1. A TD error is used to correct the co-state value sampled from a noisy nominal trajectory to warm-start the single shooting iteration based on its co-state dynamics. This is the future terminal co-state initialization block and detailed in Sec. 3.2.

The remainder of the paper is organized as follows: Sec. 2 briefly summarizes the control-oriented model and the centralized control framework as an EMPC problem with related constraints. Then, an innovative strategy to solve the centralized control problem is detailed in Sec. 3. Section 4 presents the simulation results and compares them against the offline sequential velocity smoothing and power-split optimization. Section 5 concludes the paper with a summary and future work.

## 2 Receding-Horizon Fuel-Efficient Controller Design

This section describes a control framework to co-optimize the velocity and powertrain operation of the PHEV online as an EMPC problem. A detailed discussion of the novel online implementation strategy based on a TD Bellman equation error will be presented in the next section.

### 2.1 Control-Oriented Model.

Two subsystems are considered for co-optimizing the ego PHEV's velocity and powertrain operation to achieve minimum fuel consumption in the presence of an LV [5]: (i) the vehicle-following subsystem and (ii) the hybrid powertrain. Detailed descriptions of the dynamics and their corresponding constraints are omitted for brevity. These two subsystems are connected through the vehicle (longitudinal) velocity $v_k$ and the driver-demanded torque $tp_k$. In this work, the ego vehicle is assumed to drive in a single lane without road grade, and only its longitudinal dynamics are considered.

The state vector is $x_k = [s_k, v_k, \mathrm{SOC}_k, e_k]^T$, denoting the position, velocity, battery SOC, and normalized engine cranking state of the ego PHEV, respectively. The control inputs are $u_k = [r_k, ne_k, te_k, E_k]^T$, denoting the normalized distance-gap regularizer in an ACC-type feedback controller, the engine speed, the engine torque, and the engine on/off command, respectively. The known external disturbance $w_k = s_{l,k}$ is the position of the LV.

### 2.2 Economic Model Predictive Controller Formulation for Co-Optimization.

The objective of the EMPC design for our problem is not to penalize deviation from a pre-defined equilibrium [6] but rather to directly minimize the total fuel consumption (the economic cost) of the PHEV on a given trip by simultaneously optimizing its velocity and its powertrain operation. Despite the lack of explicit reference tracking, the battery SOC is required to be depleted to a specified low level at the end of the entire trip. In this study, the vehicle is assumed to be in traffic and hence is subject to tight time-varying upper and lower bounds on its position depending on its LV.

The EMPC problem $\mathbf{P}_k^{N_f}(\mathrm{SOC}_f)$ is formulated over a horizon $N_f$ large enough to cover the whole trip.

In (2a), $m_c$ is the fuel mass in grams consumed per engine cranking. Note that an absolute value can be used instead of the max-term in (2a). The absolute value would penalize both engine-on and engine-off events, whereas the max-term penalizes only engine-on events (consistent with the high-fidelity evaluation model).
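To make the difference between the two penalty forms concrete, the sketch below counts cranking fuel for a sequence of engine on/off commands. Equation (2a) is not reproduced in this section, so the code assumes, for illustration only, that the cranking term is $m_c$ times a function of the change between consecutive commands $E_k \in \{0, 1\}$; the helper name `cranking_fuel` and the value `m_c=1.0` are hypothetical.

```python
import numpy as np


def cranking_fuel(engine_cmd, m_c, symmetric=False):
    """Cranking fuel (g) for an on/off command sequence E_k in {0, 1}.

    Hypothetical helper: assumes the penalty acts on E_k - E_{k-1}.
    """
    switches = np.diff(np.asarray(engine_cmd, dtype=float))  # +1: off->on, -1: on->off
    if symmetric:
        events = np.abs(switches)  # absolute value: penalizes both on and off events
    else:
        events = np.maximum(switches, 0.0)  # max-term: penalizes only engine-on events
    return m_c * float(events.sum())


cmd = [0, 1, 1, 0, 1, 0]  # two engine starts, two engine stops
print(cranking_fuel(cmd, m_c=1.0))                  # 2.0
print(cranking_fuel(cmd, m_c=1.0, symmetric=True))  # 4.0
```

With the max-term, only the two engine starts are charged; the absolute value also charges the two stops, doubling the penalty for the same command sequence.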


## 3 Approximation Strategy for Solving Economic Model Predictive Controller

In the original EMPC formulation ($\mathbf{P}_k^{N_f}(\mathrm{SOC}_f)$) presented in the previous section, the problem horizon needs to cover the entire trip. However, as mentioned previously, this formulation could be too computationally demanding to solve in real time without a specialized numerical strategy like the one proposed in Ref. [4]. Inspired by the TD update rule [7] in reinforcement learning, we propose an online implementation strategy in which only a short prediction horizon $N_p$, instead of the entire trip horizon $N_f$, needs to be considered. In this strategy, intermediate SOC values and their associated co-state values from nominal trajectories obtained offline are used to warm-start the computation. Note that the nominal SOC trajectory is used only to softly update the co-state; hence, no explicit reference-tracking formulation is needed. Figure 1 presents the receding-horizon implementation strategy, which *approximately* solves $\mathbf{P}_k^{N_f}(\mathrm{SOC}_f)$ by *approximately* solving $\mathbf{P}_k^{N_p}(\mathrm{SOC}_{f,k+N_p})$. Its two critical blocks, (1) the shooting iteration Ⓐ and (2) the future terminal co-state initialization Ⓑ, are explained in detail in this section.

### 3.1 Shooting Iteration.

The numerical algorithm used to solve the associated mixed-integer nonlinear optimal control problem for the entire trip offline, $\mathbf{P}_0^{N_f}(\mathrm{SOC}_f)$, with single shooting is a modified version of the algorithm proposed in Ref. [8]. Since the algorithm for solving $\mathbf{P}_0^{N_f}(\mathrm{SOC}_f)$ is not the focus of this work, $\mathbf{P}_0^{N_f}(\mathrm{SOC}_f)$ is assumed to be solvable in the sequel. Note that, unlike general single shooting based on discretizing the continuous-time Pontryagin's minimum principle, single shooting is applied here with backward-in-time co-state propagation. To be concrete, at time $t = k\Delta t$, it requires (1) initial guesses of the state and control trajectories within the prediction horizon $N_p$, $X^0_{k+N_p}$ and $U^0_{k+N_p}$, where $X^0_{k+N_p} = [x^0_k, \dots, x^0_{k+N_p}]$ and $U^0_{k+N_p} = [u^0_k, \dots, u^0_{k+N_p}]$, with $x_k$ and $u_k$ defined in Sec. 2.1, and (2) an initial guess of the terminal co-state associated with the SOC at the end of the prediction horizon, $p^0_{k+N_p}$.

Since no terminal constraints are imposed on $s_k$, $v_k$, and $e_k$ over the entire trip, their corresponding terminal co-states at the end of the prediction horizon $k + N_p$ can always be set to 0 (no terminal cost is considered in this problem). On the other hand, the co-state associated with the SOC must be properly updated at the end of the prediction horizon because of the terminal SOC constraint at the end of the entire trip. By Bellman's principle of optimality, the SOC at the end of each prediction horizon should ideally be optimal with respect to the remaining trip.

In reality, however, the exact trace of the ego PHEV's LV over the entire trip is not known in advance, making it impossible to accurately predict the exact SOC value at the end of each prediction horizon. Nevertheless, traffic monitoring systems, on-board GPS, and mobile apps make it possible to record velocity traces of the same route. The repeated traces can be averaged and used as the nominal LV velocity trace to form the position constraints for the ego PHEV. Then, the minimum-fuel problem $\mathbf{P}_0^{N_f}(\mathrm{SOC}_f)$ can be (approximately) solved to obtain the nominal SOC and co-state trajectories. As detailed next, the nominal SOC trajectory $\overline{\mathrm{SOC}}$ and its corresponding co-state trajectory $\bar{p}$ are used in Ⓐ and Ⓑ in Fig. 1. As shown in Fig. 1 Ⓑ, at each time $t = k\Delta t$, the nominal SOC value at the end of the prediction horizon, $\overline{\mathrm{SOC}}_{f,k+N_p}$, is taken from the nominal SOC trajectory; this value is then used in Ⓐ to solve $\mathbf{P}_k^{N_p}(\overline{\mathrm{SOC}}_{f,k+N_p})$. However, due to uncertainty in future driving conditions, strictly enforcing the terminal SOC $\overline{\mathrm{SOC}}_{f,k+N_p}$ can result in infeasibility. To avoid this, single shooting is performed with only a fixed number of iterations, and $\mathbf{P}_k^{N_p}(\overline{\mathrm{SOC}}_{f,k+N_p})$ is solved *approximately*.
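The shooting iteration Ⓐ can be illustrated on a deliberately simplified problem. The sketch below is not the authors' algorithm or PHEV model: it assumes a scalar SOC with linear dynamics and a small self-discharge term, a quadratic engine fuel map, and a terminal co-state adjusted at a hypothetical rate `K` (cf. $K$ in the Nomenclature) from the terminal SOC mismatch. Each iteration propagates the co-state backward in time, minimizes the Hamiltonian pointwise, simulates the SOC forward, and corrects the terminal co-state. As in the text, only a fixed number of iterations is run, so the terminal SOC target is met only approximately. All parameter values are illustrative.

```python
import numpy as np


def single_shooting(p_term, soc0, soc_target, p_dem, n_iter=10, K=200.0,
                    a=0.1, b=0.002, c=0.001, leak=1e-4, pb_lim=(-20.0, 40.0)):
    """Toy single shooting with backward-in-time co-state propagation.

    Assumed toy model (not the paper's PHEV model):
      SOC_{i+1} = (1 - leak) * SOC_i - c * Pb_i            (Pb: battery power)
      L_i = a * Pe_i + b * Pe_i**2,  Pe_i = P_dem_i - Pb_i  (fuel rate)
    Engine power limits are ignored in this toy.
    """
    p_dem = np.asarray(p_dem, dtype=float)
    n = len(p_dem)
    for _ in range(n_iter):
        # 1) Backward co-state propagation: p_i = dH_i/dSOC_i = (1 - leak) * p_{i+1}
        p = np.empty(n + 1)
        p[n] = p_term
        for i in range(n - 1, -1, -1):
            p[i] = (1.0 - leak) * p[i + 1]
        # 2) Pointwise minimization of H_i = L_i + p_{i+1} * SOC_{i+1} over Pb
        #    (convex quadratic, so the unconstrained minimizer is clipped to the limits)
        pe_star = -(a + c * p[1:]) / (2.0 * b)
        pb = np.clip(p_dem - pe_star, pb_lim[0], pb_lim[1])
        # 3) Forward simulation of SOC with the updated controls
        soc = np.empty(n + 1)
        soc[0] = soc0
        for i in range(n):
            soc[i + 1] = (1.0 - leak) * soc[i] - c * pb[i]
        # 4) Adjust the terminal co-state at rate K from the terminal SOC mismatch
        p_term = p_term + K * (soc[n] - soc_target)
    return pb, soc, p_term


pb, soc, p_next = single_shooting(-100.0, soc0=0.6, soc_target=0.5,
                                  p_dem=np.full(10, 20.0))
print(abs(soc[-1] - 0.5) < 1e-3)  # True
```

With these toy parameters, each iteration roughly halves the terminal SOC error, so 10 iterations bring the terminal SOC close to, but not exactly onto, the target; the returned `p_next` is the co-state after the final adjustment.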

### 3.2 Future Terminal Co-State Initialization.

As mentioned previously, to solve $\mathbf{P}_k^{N_p}(\overline{\mathrm{SOC}}_{f,k+N_p})$ numerically, initial guesses of the state and control trajectories, as well as an initial guess of the terminal co-state $p^0_{k+N_p}$ at the end of the prediction horizon, are required. The initial guesses of the state and control trajectories are obtained by shifting the trajectories from the previous model predictive controller (MPC) step. The most critical part of the short-horizon implementation strategy is the proper initialization of the terminal co-state $p^0_{k+N_p}$. Although it is possible to warm-start $p^0_{k+N_p}$ with the value $\bar{p}_{k+N_p}$ from the nominal co-state trajectory $\bar{p}$, we observed that (1) this simple interpolation induces a systematic bias in the terminal SOC at the end of the trip when the position constraints within the prediction horizon contain errors (caused by errors in predicting the LV's position), and (2) if the terminal co-state from the previous MPC step is used to warm-start the next MPC step, large oscillations are induced in the resulting co-state trajectory.
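The two simpler warm-start ingredients, shifting the previous MPC solution and reading the nominal SOC at the end of the horizon, amount to array operations. The sketch below assumes trajectories stored as NumPy arrays with time along the first axis; the padding rule (repeating the last sample), the clipping at the trip end, and the function names are illustrative assumptions, not details stated in the text.

```python
import numpy as np


def shift_warm_start(traj):
    """Shift a trajectory (time along axis 0) one step and repeat the last sample."""
    traj = np.asarray(traj, dtype=float)
    return np.concatenate([traj[1:], traj[-1:]], axis=0)


def nominal_terminal_soc(soc_nominal, k, n_p):
    """Nominal SOC at step k + n_p, clipped to the last sample of the trip."""
    return float(soc_nominal[min(k + n_p, len(soc_nominal) - 1)])


# Previous solution, rows = [s, v] at steps k-1, k, k+1 (toy horizon N_p = 2)
X_prev = np.array([[0.0, 10.0], [10.0, 11.0], [21.0, 12.0]])
X_init = shift_warm_start(X_prev)  # initial guess for the MPC step at time k
```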

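To address both issues, the terminal co-state is corrected with a TD error. At time $t = k\Delta t$, by solving $\mathbf{P}_k^{N_p}(\overline{\mathrm{SOC}}_{f,k+N_p})$ approximately, the corresponding terminal co-state $\tilde{p}_{k+N_p}$, state $\tilde{x}_{k+N_p}$, and control $\tilde{u}_{k+N_p}$ are obtained. Afterward, $\hat{p}_{k+N_p+1}$ is obtained as a noisy sample from the nominal co-state trajectory, $\hat{p}_{k+N_p+1} \sim \mathcal{N}(\bar{p}_{k+N_p+1}, \sigma^2)$. Then, the TD error is calculated from the co-state dynamics and used to update $\tilde{p}_{k+N_p}$.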

Since the battery SOC dynamics are slow, the dynamics of the corresponding co-state are also slow [4]. Hence, the updated $\tilde{p}_{k+N_p}$ can be used as the warm start for the next MPC step, $\tilde{p}^0_{k+N_p+1}$.
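The TD-based correction of the terminal co-state can be sketched as follows. The correction equations are not reproduced in this section, so the form below is an assumption modeled on a standard TD update: the TD target is the one-step co-state dynamics for the SOC component, $p_i = p_{i+1}\,\partial f/\partial \mathrm{SOC} + \partial L/\partial \mathrm{SOC}$ (coupling to the other states ignored), evaluated at the sampled $\hat{p}_{k+N_p+1}$, and $\tilde{p}_{k+N_p}$ is moved toward that target by a hypothetical step size `alpha`. The sensitivities `dfdsoc` and `dldsoc` are assumed to be evaluated at $(\tilde{x}_{k+N_p}, \tilde{u}_{k+N_p})$; `sigma=10.0` follows the value used in Sec. 4.

```python
import numpy as np


def td_corrected_costate(p_tilde, p_bar_next, dfdsoc, dldsoc, alpha=0.1,
                         sigma=10.0, rng=None):
    """Hypothetical TD-style correction of the terminal SOC co-state.

    p_tilde    : terminal co-state from the approximate MPC solution
    p_bar_next : nominal co-state one step beyond the horizon
    dfdsoc     : dSOC_{i+1}/dSOC_i at the terminal state/control
    dldsoc     : dL_i/dSOC_i (stage-cost sensitivity) at the same point
    """
    rng = np.random.default_rng() if rng is None else rng
    p_hat = rng.normal(p_bar_next, sigma)  # noisy sample p_hat ~ N(p_bar, sigma^2)
    td_target = p_hat * dfdsoc + dldsoc    # one-step backward co-state dynamics
    td_error = td_target - p_tilde
    p_new = p_tilde + alpha * td_error     # soft correction toward the TD target
    return p_new, td_error


p_new, delta = td_corrected_costate(-100.0, -120.0, dfdsoc=1.0, dldsoc=0.0,
                                    alpha=0.5, rng=np.random.default_rng(0))
```

Because the SOC co-state evolves slowly, `p_new` would then serve as the warm start for the next MPC step. A small `alpha` damps the step-to-step oscillations described above, while pulling toward the sampled nominal co-state counteracts a systematic drift.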

## 4 Simulation Results and Discussions

The sampling time is $\Delta t = 1$ s, and the number of prediction steps is $N_p = 10$, so the total prediction horizon is 10 s. To reduce the computation time, we replace the exact power-split optimization with a selection among several engine operating points close to the potential local minimizers identified through offline power-split optimization. The standard deviation used in co-state sampling is set to $\sigma = 10$. Since predicting the LV's trajectory is not the focus of this work, potential prediction errors are emulated by adding Gaussian white noise to the actual LV's acceleration with a signal-to-noise ratio of 10 and integrating it to compute the predicted trajectory. The LV is assumed to follow the trajectory shown in the first subplot of Fig. 3(a), which represents a trip of approximately 2 h that exceeds the total battery range of the considered PHEV. It is a combination of standard driving cycles, including US06 (high-acceleration aggressive driving), the Urban Dynamometer Driving Schedule (UDDS, city driving), and the Highway Fuel Economy Driving Schedule (HWFET, highway driving). Among the combinations we simulated offline, this trip provides the largest fuel economy benefit from optimizing the powertrain operation, owing to its high power demands at the beginning of the trip. To account for the driver's ride comfort, a surrogate cost $\tilde{\varphi}$, which includes a penalty on acceleration weighted by $\rho$ (see (6)), is used in place of $\varphi$ in (2a).
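The LV prediction used in the simulations, with Gaussian white noise added to the actual acceleration and then integrated, can be sketched as below. The text does not state whether the signal-to-noise ratio of 10 is a power ratio or in decibels, nor which integration scheme is used, so this sketch assumes a linear power ratio (noise variance equal to the mean squared acceleration divided by the SNR) and forward-Euler integration with $\Delta t = 1$ s; the function name is hypothetical.

```python
import numpy as np


def predict_lv_trajectory(s0, v0, accel, snr=10.0, dt=1.0, rng=None):
    """Noisy LV prediction: add white noise to acceleration, then integrate.

    Assumptions: SNR is a linear power ratio; forward-Euler integration.
    Returns predicted positions and velocities at steps 1..len(accel).
    """
    rng = np.random.default_rng() if rng is None else rng
    accel = np.asarray(accel, dtype=float)
    noise_std = np.sqrt(np.mean(accel**2) / snr)  # noise power = signal power / SNR
    a_noisy = accel + rng.normal(0.0, noise_std, size=accel.shape)
    v = v0 + dt * np.cumsum(a_noisy)              # v_{i+1} = v_i + dt * a_i
    v_prev = np.concatenate(([v0], v[:-1]))       # v_0, ..., v_{N-1}
    s = s0 + dt * np.cumsum(v_prev)               # s_{i+1} = s_i + dt * v_i
    return s, v


s_pred, v_pred = predict_lv_trajectory(0.0, 15.0, np.full(10, 0.5),
                                       rng=np.random.default_rng(1))
```

Because the noise enters through the acceleration, position errors accumulate with the horizon, which is one reason the prediction degrades as $N_p$ grows.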

In simulation, each MPC step with prediction horizon $N_p$ is performed with 10 single-shooting iterations (Ⓐ in Fig. 1). The average computation time per MPC step is around 80 ms.^{1} For comparison, one single-shooting iteration over the entire trip offline takes 17.7 s for this 2-h trip.

Previously, we demonstrated the effectiveness of velocity smoothing followed by a power-split optimization (sequential optimization) in fuel economy improvement [9]. Minimizing the acceleration of the velocity profile induces a smoothed driving trace in line with some of the work in eco-driving [10,11]. However, the sequential type of optimization requires a layered implementation, where each layer has its own objective function. Consequently, such an implementation is in essence *decentralized*. By comparison, the direct fuel-efficient MPC strategy in this work adopts a *centralized* control framework. Since one of the objectives is to compare the performance of the centralized and decentralized control framework, in this work, the result obtained by offline sequential optimization is used as the baseline for comparison.

The initial SOC is $\mathrm{SOC}_0 = 0.85$, and the desired terminal SOC is $\mathrm{SOC}_f = 0.14$. In Fig. 2, the first subplot presents the battery SOC trajectories, the second subplot presents the cumulative fuel consumption trajectories (including cranking fuel), and the third subplot presents the cumulative cranking fuel consumption trajectories. Note that, in the online implementation, the SOC at the end of the trip is not guaranteed to match the desired SOC level exactly. For a fair comparison, we correct the total fuel consumption for the deviation of the actual terminal SOC, $\mathrm{SOC}(N_f)$, from the desired terminal SOC, $\mathrm{SOC}_f$, and present the fuel consumption results with and without SOC correction. The corrected value is computed by

$$fc_{cor} = fc + \Delta f \times 100 \times \left[\mathrm{SOC}_f - \mathrm{SOC}(N_f)\right],$$

where $fc$ denotes the actual (uncorrected) total fuel consumption from simulation and $fc_{cor}$ denotes the corrected total fuel consumption based on the terminal SOC difference. $\Delta f$ represents the difference in fuel consumption corresponding to a one-percent difference in SOC, which is chosen to be 16 g in this work based on simulation results. The results are summarized in Table 1. Detailed views of the resulting velocity and acceleration are illustrated in Fig. 3(a). The actual distance gaps between the ego PHEV and its LV during these time periods are shown in Fig. 3(b).

| Test name | Actual terminal SOC | Corrected total fuel (g) | Difference from baseline (%) |
|---|---|---|---|
| Offline seq. opt. (baseline) | 0.14 | 1213 | 0 |
| Co-opt, $\rho = 0$ | 0.13 | 1062 | −12.4 |
| Co-opt, $\rho = 0.01$ | 0.12 | 1146 | −5.5 |
| Co-opt, $\rho = 0.1$ | 0.12 | 1219 | +0.5 |

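The SOC correction and the percentage differences in Table 1 can be checked with a few lines of arithmetic. The correction function assumes the sign convention that a terminal SOC below $\mathrm{SOC}_f$ is charged extra fuel at $\Delta f = 16$ g per percent of SOC; the function names are hypothetical. The percentage differences are computed from the corrected fuel values in Table 1 relative to the offline sequential optimization baseline.

```python
def corrected_fuel(fc, soc_end, soc_target, df_per_percent=16.0):
    """Correct total fuel (g) for the terminal SOC deviation (assumed sign convention)."""
    # Ending below the target SOC used extra battery energy, so fuel is added back
    return fc + df_per_percent * 100.0 * (soc_target - soc_end)


def percent_difference(fuel, baseline):
    """Relative difference (%) of a fuel value with respect to the baseline."""
    return 100.0 * (fuel - baseline) / baseline


baseline = 1213.0  # offline sequential optimization, corrected fuel (g), Table 1
for name, fuel in [("rho=0", 1062.0), ("rho=0.01", 1146.0), ("rho=0.1", 1219.0)]:
    print(name, round(percent_difference(fuel, baseline), 1))
# rho=0 -12.4
# rho=0.01 -5.5
# rho=0.1 0.5
```

The last case evaluates to a positive difference, i.e., slightly more fuel than the baseline, which matches the observation in Sec. 4.2 that a larger acceleration penalty increases fuel consumption.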

### 4.1 Position Constraints Under Uncertainty in Prediction.

At time $t = k$, the positions of the LV $s_{l,j|k}$, $\forall j \le k$, $j \in \mathbb{Z}^+$, are known (although they may be subject to measurement noise). As a result, assuming accurate measurements, the position upper bound $s^{\max}_{i|k}$ in (8) is known exactly for $i = k, \dots, k + 1/\Delta t$, and the position lower bound $s^{\min}_{i|k}$ is known exactly for $i = k, \dots, k + 3/\Delta t$. Beyond these indices, the position constraints depend on the prediction of the LV's trajectory. As discussed above, the design of the position constraints (8) guarantees their accuracy for the first several future time steps, even though the predicted future trajectory of the LV is never exact. As a result, as illustrated in Fig. 3(b), the ego vehicle is always able to keep a desired distance from its LV in all online co-optimization simulations. Note that some mild constraint violations can still be observed in simulation because of the limited number of shooting iterations and because the constraints are handled softly by a smooth exterior penalty method, as shown in (2b). Strict enforcement of the position constraints will be investigated in future work.
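One way to see why the bounds are exact only for the first few steps is to build them from delayed LV positions. The sketch below assumes, purely for illustration, that the upper bound uses the LV position 1 s earlier and the lower bound the LV position 3 s earlier, each minus a hypothetical margin (`gap_up`, `gap_low`). This is consistent with the index ranges stated above, but the actual form of (8) is not reproduced here. Indices whose delayed LV position lies at or before time $k$ use measurements and are therefore exact; later ones rely on the prediction.

```python
import numpy as np


def position_bounds(s_lv_meas, s_lv_pred, n_p, dt=1.0, tau_up=1.0, tau_low=3.0,
                    gap_up=5.0, gap_low=0.0):
    """Hypothetical delayed-LV-position bounds for steps i = k, ..., k + n_p.

    s_lv_meas : measured LV positions at times 0..k (last entry is time k)
    s_lv_pred : predicted LV positions at times k+1..k+n_p
    """
    k = len(s_lv_meas) - 1
    lv = np.concatenate([s_lv_meas, s_lv_pred])  # lv[j] = LV position at time j
    d_up, d_low = round(tau_up / dt), round(tau_low / dt)  # delays in steps
    if k < d_low:
        raise ValueError("need at least tau_low seconds of measured LV history")
    i = np.arange(k, k + n_p + 1)
    s_max = lv[i - d_up] - gap_up    # stay behind the LV's position tau_up seconds ago
    s_min = lv[i - d_low] - gap_low  # do not fall behind its position tau_low seconds ago
    exact_up = (i - d_up) <= k       # True where the bound uses only measurements
    exact_low = (i - d_low) <= k
    return s_max, s_min, exact_up, exact_low


meas = np.arange(0.0, 50.0, 10.0)      # LV at 10 m/s, times 0..4 (k = 4)
pred = 40.0 + 10.0 * np.arange(1, 11)  # predicted positions, times 5..14
_, _, up, low = position_bounds(meas, pred, n_p=10)
print(int(up.sum()), int(low.sum()))   # 2 4
```

With $\Delta t = 1$ s, the upper bound is exact for $i = k, k+1$ and the lower bound for $i = k, \dots, k+3$, matching the ranges $k + 1/\Delta t$ and $k + 3/\Delta t$ given above.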

### 4.2 Influence of the Penalty on Acceleration.

This section investigates the tradeoff between the driver's ride comfort and fuel economy. As can be seen from the dash-dotted curves in Fig. 3(a) (and as observed over the entire driving cycle when the speed is not too high), the acceleration obtained from the direct fuel-efficient EMPC is of a bang-bang type and may be unacceptable because of its abrupt changes. The penalty on acceleration $\rho$ in (6) is set to 0, 0.01, and 0.1, and the MPC results with these penalty values are presented in Fig. 2 and compared with those obtained from offline sequential optimization. As seen from Fig. 3(a), the resulting acceleration becomes smoother as the penalty $\rho$ increases. However, the total fuel consumption increases, as shown in the second subplot of Fig. 2 and in Table 1, meaning that a smoothed driving trace is not the most fuel-efficient one for a PHEV.

## 5 Conclusion

This paper proposes a receding-horizon control framework that determines the powertrain operation and velocity simultaneously for fuel-efficient car-following of a PHEV. To resolve the conflict between horizon length and computational complexity, we propose approximating the future cost with the co-state. The co-state obtained from a nominal trajectory is adjusted with a TD error to prevent oscillations and systematic bias in the co-state estimate. In its online implementation, the proposed control strategy demonstrates an additional 12% fuel economy benefit over the offline solution of a typical layered approach, namely velocity smoothing followed by power-split optimization. Additionally, to accommodate drivability, a penalty on acceleration is considered, and the resulting degradation in fuel economy is quantified.

## Footnote

The computations are performed on a Mac running OS X with an Intel® Core i5 2.7 GHz processor and 8 GB of RAM.

## Acknowledgment

The work was funded in part by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award Number DE-AR0000837, also known as NEXTCAR. The authors would like to thank Southwest Research Institute and Toyota Motor North America Research & Development for continuous feedback and support.

## Conflict of Interest

There are no conflicts of interest.

## Nomenclature

- $K$ = rate of adjustment of the co-state based on the difference in the SOC, used in the single shooting
- $\hat{p}$ = co-state sampled from the nominal trajectory $\bar{p}$
- $\tilde{p}$ = co-state corrected by the TD error
- $\mathbf{P}_k^{N_f (N_p)}(\mathrm{SOC}_f)$ = the MPC problem at time $t = k\Delta t$ with a horizon length $N_f$ equal to the length of the entire trip (or the prediction horizon $N_p$). The resulting terminal state of charge (SOC) at the end of the trip should be equal (or very close) to the desired value $\mathrm{SOC}_f$ ($\mathrm{SOC}_{f,k+N_p}$)
- $\overline{\mathrm{SOC}}$, $\bar{p}$ = nominal SOC and its corresponding co-state trajectories obtained as an average of offline simulation results