Abstract

In recent years, ship hulls have been given increasingly complex shapes in order to reduce frictional resistance and wave resistance during navigation. In particular, curved skin plates with complex shapes are used in the bow and stern. Line heating, a bending technique that uses plastic deformation induced by heating, is used to produce such shapes. The relationship between the heat input and the resulting deformation is nonlinear, which makes it difficult to prepare a heating plan for forming a target shape. Thus, skilled workers are necessary for line heating, and the work time and dimensional accuracy depend on their skills. Another problem is the transfer of this technique to future generations. In order to overcome these problems, automation of the line heating process has been urgently investigated. Meanwhile, artificial intelligence (AI) technology has developed rapidly in recent years. An AI system can deal with nonlinear relationships and ambiguous feature quantities that are difficult to express mathematically, so automation of heating-line planning can be expected through AI. The purpose of the present study is to obtain the optimal heat input conditions for forming an arbitrary shape by line heating. To accomplish this, we constructed an AI system that integrates deep reinforcement learning and line heating simulation. The proposed system was applied to the formation of the fundamental shapes of line heating: the bowl shape, the saddle shape, and the twisted shape. As a result, the proposed system was found to be able to generate heating plans for these shapes with fewer heating lines.

Introduction

The outer curved skin plates at the bow and stern of a ship hull are formed by line heating in order to obtain complex shapes. These curved surfaces, such as bowl-, saddle-, and twist-shaped surfaces, are undevelopable. Therefore, it is necessary to generate shrinkage and angular distortion in the steel plate by line heating. In line heating, curved surfaces are formed by plastic deformation caused by local thermal strain due to heating. Deformation caused by heating can be classified into four types: transverse bending, transverse shrinkage, longitudinal bending, and longitudinal shrinkage. These deformations can be controlled through the heat input to the steel plate. In order to control the heat input, it is necessary to adjust the moving speed of the gas burner, the mixing ratio of the fuel gas and oxygen flowrates, the distance between the burner and the steel plate, etc. However, the relationship between the amount of shrinkage and the resulting deflection and the relationship between the heat input and the angular distortion are nonlinear. Therefore, prediction of deformation is very difficult, and line heating requires experienced skill.

When the line heating process relies on the experience and intuition of workers, it is difficult to manage quality, dimensional accuracy, and work time. Although mechanization has been attempted to solve these problems [1], it is difficult to predict the deformation in a practical computing time because line heating is a nonlinear transient mechanical phenomenon.

On the other hand, artificial intelligence (AI) technology has progressed very rapidly in recent years, and in certain areas AI has surpassed human ability; the Go program AlphaGo [2] is one such example. An AI system can deal with nonlinear relationships and ambiguous feature quantities that are difficult to express mathematically. As the AlphaGo case demonstrates, optimization that combines simulation and AI is highly effective. In place of a game such as Go, simulation technology, including nonlinear finite element analysis (FEA), can be adapted to the line heating problem. Combining it with AI, we propose a new concept for optimizing construction methods at manufacturing sites. Artificial intelligence can characterize nonlinear phenomena, and in the line heating problem this characteristic can be used to capture the nonlinear relationship between heating conditions and deformation. Therefore, by efficiently learning this relationship, it is possible to propose an optimum heating plan using AI and FEA.

The purpose of the present study is to obtain the optimal heating plan for forming an arbitrary shape in line heating. Therefore, we constructed an AI system that integrates deep reinforcement learning and line heating simulation based on the finite element method (FEM). In this system, the AI first sufficiently learns the relationship between the heating position and the resulting deformation using deep reinforcement learning. Then, in order to create a heating plan, the system selects the heating position for which the deformation obtained by heating is closest to the target shape. This system was applied to the creation of heating plans for the three basic shapes of line heating: the bowl shape, the saddle shape, and the twisted shape. The characteristics and usefulness of the proposed system are examined through these investigations.

Proposal of Artificial Intelligence Line Heating System

Overview of Reinforcement Learning.

In the present study, reinforcement learning is used as the learning method to construct a system that automatically learns the relationship between the heating position and the deformation during line heating. Reinforcement learning is a method of learning autonomously through trial and error. In reinforcement learning, correct answers are not given for individual cases. Instead, rewards are provided for the output values, and the AI learns the relationship between the rewards and the inputs. In the present study, the deep Q-network (DQN) [3] is used as a basis. The DQN is a method of approximating the value function Q [4] using a neural network; the value function Q is used in Q-learning, which is a reinforcement learning method. In the present study, learning means adjusting the weights between nodes in each layer so that the relationship between the inputs and outputs used for learning is reproduced.
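For reference, the value function update that the DQN approximates is the standard Q-learning rule from Ref. [4]; in the usual notation, with learning rate α and discount rate γ (generic symbols rather than values from this study), it reads

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

and the DQN replaces the table Q(s, a) with a neural network trained to reduce the error between Q(s_t, a_t) and the target r_{t+1} + γ max_a Q(s_{t+1}, a).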

Figure 1 shows a schematic diagram of the developed learning system based on the DQN. In reinforcement learning, the “reward” is calculated from the change in the “state” when the “agent” takes an “action” on the “environment.” The relation between the action and the reward is learned as a value function. In this system, the environment is a line heating simulation; the agent learns the value function and selects (searches for) the action. When the action selected by the agent, i.e., the heating line, is passed to the environment, the difference from the target shape is calculated based on the result of the line heating simulation, and the state is updated by this difference. The reward is calculated as the degree to which the current shape has approached the target shape as a result of the action, according to the difference between the updated and previous states. The agent learns the relationship between the “state,” “action,” and “reward” as the value function. By repeating searching and learning, the value function comes to express an appropriate relationship between the “state,” “action,” and “reward.” It then becomes possible to select the action that maximizes the value function; i.e., the heating line can be selected so as to obtain the shape closest to the target shape.
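To make the loop in Fig. 1 concrete, a minimal sketch is given below. The environment and agent interfaces (names such as apply_heating_line or difference_from_target) are hypothetical placeholders rather than the authors' implementation, and the reward simply follows the description above, i.e., how much the selected heating line reduced the difference from the target shape.

def run_trial(env, agent, max_lines=20):
    # One trial of the learning loop in Fig. 1 (illustrative sketch only)
    state = env.reset()                            # state: difference between current and target shape
    prev_diff = env.difference_from_target()       # scalar measure of that difference
    for _ in range(max_lines):
        action = agent.select_heating_line(state)  # heating line (start point, end point, surface)
        state = env.apply_heating_line(action)     # run the line heating simulation (elastic FEM)
        new_diff = env.difference_from_target()
        reward = prev_diff - new_diff              # positive if the shape moved closer to the target
        agent.learn(state, action, reward)         # update the value function
        if env.is_terminal(new_diff, prev_diff):
            break
        prev_diff = new_diff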

Fig. 1
Schematic illustration of AI line heating system

In the following sections, we describe the details of the agent, the environment, the action, the state, the reward, and the heating plan creation procedure.

Learning in Line Heating Problems.

In the line heating problem, FEM elastic analysis using inherent strain was adopted as the environment for the line heating simulation because elastic analysis has a shorter calculation time than thermal elastic–plastic analysis. The difference between the out-of-plane displacement obtained by the analysis and the target shape is calculated and set as the new state. The next heating position is taken as the “action.” At the same time, the reward is calculated based on the degree to which the current shape has approached the target shape as a result of the action, according to the difference between the updated and previous states. If the difference from the target shape after heating becomes smaller than the difference before heating, a positive reward is given. On the other hand, if the difference from the target shape after heating becomes larger than that before heating, a negative reward is given. In the agent, the value function learns the relation between the input (the “state”) and the “action,” and the relation between the “action” and the output (the expected value of the “reward”). The value function Q is approximated by a convolutional neural network (CNN) so that the input state can be expressed as a distribution (40 × 40 two-dimensional tensor). In addition, there are innumerable heating lines (pairs of a starting point and an endpoint) that could be taken as actions, and it is difficult to enumerate all of them. Therefore, the distribution of the expected reward for the actions is learned as an expected reward value (40 × 40 × 2 three-dimensional tensor) corresponding to the heating elements on the front and back surfaces of the steel plate. Since the output obtained from the CNN after learning represents the distribution of the reward, the straight line that passes through elements with high rewards is selected as the next heating line. The system passes this straight line to the environment and executes the action, i.e., the elastic FE analysis is conducted. As a result, the state is updated, and the terminal condition is evaluated. If the terminal condition is not satisfied, the next action is selected. Learning advances by repeating these procedures.
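The tensor shapes described above can be illustrated as follows, assuming NumPy arrays and a trained network object named q_network (a placeholder for the CNN of Table 1). The state is the 40 × 40 map of the difference from the target shape, and the output is read as an expected-reward map for the elements of the front and back surfaces; the channel ordering is an assumption.

import numpy as np

# State: out-of-plane difference from the target shape at each of the 40 x 40 elements
state = np.zeros((40, 40, 1), dtype=np.float32)

# q_network is assumed to be the trained CNN of Table 1 (e.g., a Keras model); its output is
# the expected reward per element, with channel 0 for the front surface and channel 1 for the back
expected_reward = q_network.predict(state[np.newaxis, ...])[0]   # shape (40, 40, 2)

# The element and surface with the highest expected reward seed the heating line; the actual
# straight line is extracted with the Hough transform described in the next section
row, col, surface = np.unravel_index(np.argmax(expected_reward), expected_reward.shape)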

Table 1 shows the configuration of the CNN. The sigmoid function is used as the activation function, which adjusts how signals propagate between layers, in all layers except the output layer. Convolution layers ② and ③ compress the data in the x and y directions, and deconvolution layers ② and ③ extend the data in the x and y directions. The Adam method [5] was adopted as the optimization method (the method for updating the CNN weights). In addition, because this system learns from a sequence of experiences, the experience replay method [6] was adopted in order to avoid unstable learning caused by correlations along the time direction. Instead of learning sequentially, this method stores sets of a state, an action, and a reward in memory and, once sufficient data have been accumulated, samples them at random for learning.
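The experience replay mechanism described above can be sketched as a simple buffer; this is a generic illustration of the idea (the capacity and batch size are arbitrary), not the authors' implementation.

import random
from collections import deque

class ReplayBuffer:
    # Stores (state, action, reward, next_state) tuples and samples them at random,
    # which breaks the time-direction correlation of sequentially generated data
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        return random.sample(self.memory, min(batch_size, len(self.memory)))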

Table 1

Configuration of CNN

Layer           Dimension      Layer            Dimension
Input           [40,40,1]      Deconvolution①   [10,10,128]
Convolution①    [40,40,32]     Deconvolution②   [20,20,64]
Convolution②    [20,20,64]     Deconvolution③   [40,40,32]
Convolution③    [10,10,128]    Output           [40,40,2]
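A possible realization of the network in Table 1, written with the Keras API of TensorFlow [8], is sketched below. Only the tensor shapes, the sigmoid activations, and the Adam optimizer follow the paper; the kernel sizes, strides, padding, and loss function are assumptions chosen so that the listed dimensions are reproduced. The input depth is 1 for the small-deformation study and 3 when the heating history is added (see the large-deformation section).

import tensorflow as tf
from tensorflow.keras import layers

def build_q_network(in_channels=1):
    # Encoder-decoder CNN reproducing the dimensions of Table 1
    inputs = tf.keras.Input(shape=(40, 40, in_channels))
    x = layers.Conv2D(32, 3, strides=1, padding="same", activation="sigmoid")(inputs)       # [40,40,32]
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="sigmoid")(x)            # [20,20,64]
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="sigmoid")(x)           # [10,10,128]
    x = layers.Conv2DTranspose(128, 3, strides=1, padding="same", activation="sigmoid")(x)  # [10,10,128]
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="sigmoid")(x)   # [20,20,64]
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="sigmoid")(x)   # [40,40,32]
    outputs = layers.Conv2D(2, 3, strides=1, padding="same")(x)                             # [40,40,2], linear output
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
    return model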

Selection of Heating Lines.

Using the difference between the current shape and the target shape as input, the CNN outputs the distribution of the expected reward. From this distribution, the heating line is extracted using the Hough transform, which is a technique for detecting shapes, such as straight lines and circles, in images. The extraction procedure is as follows:

  1. Select the heating surface from the front or back surfaces. At this time, the expected value of the obtained reward is expressed as a distribution, as shown in Fig. 2(a).

  2. In pre-processing, among the elements with positive rewards, the rewards of elements lower than the 70% level are replaced with 0, and the result is saved as a weight matrix (Fig. 2(b)).

  3. Binarize the weight matrix (Fig. 2(c)) and execute the Hough transform to find a straight line that passes through the specified number of elements continuously.

  4. From the coordinates of the passed elements and the weight matrix, find the straight line that most often passes through high-reward elements among the lines obtained in (3) (Fig. 2(d)).

  5. Based on the linear equation obtained in (4), the segment with a positive value in the binarized matrix (Fig. 2(c)) is determined as the segment representing the heating line (Fig. 2(d)).

  6. Perform steps (1) through (5) on both the front and back sides. The surface for which the heating line has a higher reward is selected.
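Steps (1) through (5) can be sketched with OpenCV as shown below; the interpretation of the 70% cutoff (relative to the maximum positive reward), the minimum line length, and the Hough parameters are assumptions. Step (6) then amounts to calling this function for the front- and back-surface reward maps and keeping the surface whose extracted line has the higher score.

import cv2
import numpy as np

def extract_heating_line(reward_map, min_elements=10):
    # (2) Keep only high positive rewards (here: at least 70% of the maximum positive reward)
    positive = np.clip(reward_map, 0.0, None)
    weight = np.where(positive >= 0.7 * positive.max(), positive, 0.0)

    # (3) Binarize and run the probabilistic Hough transform to find candidate straight lines
    binary = (weight > 0).astype(np.uint8) * 255
    lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180, threshold=min_elements,
                            minLineLength=min_elements, maxLineGap=1)
    if lines is None:
        return None, 0.0

    # (4) Among the candidates, choose the line that collects the largest total weight along its path
    best_line, best_score = None, -1.0
    for x1, y1, x2, y2 in lines[:, 0]:
        mask = np.zeros_like(binary)
        cv2.line(mask, (int(x1), int(y1)), (int(x2), int(y2)), 255, 1)
        score = weight[mask > 0].sum()
        if score > best_score:
            best_line, best_score = (x1, y1, x2, y2), score

    # (5) best_line is the segment taken as the heating line on this surface
    return best_line, best_score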

Fig. 2
Procedure to select an action: (a) output of CNN, (b) preprocessed, (c) binarized, and (d) extraction of line

During learning, trials are repeated using the ε-greedy method [7] for action selection (search). The ε-greedy method selects a random action with a certain probability ε and otherwise selects the action that maximizes the reward according to the value function. In order to learn effectively, the rate of random actions ε is set to 0.7 at the initial stage of learning and is decreased in proportion to the number of trials, so that ε finally becomes 0.1.
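A minimal sketch of this selection is given below; the linear decay of ε from 0.7 to 0.1 follows the text, while the total number of trials and the function names are placeholders.

import random

def epsilon(trial, total_trials, eps_start=0.7, eps_end=0.1):
    # Random-action rate decreasing linearly with the number of trials
    frac = min(trial / float(total_trials), 1.0)
    return eps_start + (eps_end - eps_start) * frac

def select_action(trial, total_trials, greedy_action, random_action):
    # Explore with probability epsilon, otherwise exploit the value function
    if random.random() < epsilon(trial, total_trials):
        return random_action()    # e.g., a randomly generated heating line
    return greedy_action()        # e.g., the extracted line with the highest expected reward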

Setting of Terminal Judgment.

In this research, the terminal condition, which ends one trial, is judged according to whether any of the following conditions is satisfied (a minimal sketch of this check is given after the list). When a terminal condition is satisfied, the trial ends, and the analysis for the first heating line of the next trial is performed again from the initial shape. The terminal conditions are as follows:

  1. The number of heating lines exceeds the specified number of lines (20 lines).

  2. The difference from the target shape is larger after heating than before heating.

  3. The difference from the target shape after heating is sufficiently larger than that of the initial shape (without heating).

  4. The deformation after heating is close enough to the target shape.
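The check sketched below gathers the four terminal conditions into one function; the 20-line limit follows the text, while the remaining thresholds (how much larger than the initial difference counts as “sufficiently larger,” and how small counts as “close enough”) are assumptions.

def is_terminal(n_lines, diff_after, diff_before, diff_initial,
                max_lines=20, blow_up_factor=2.0, tolerance=0.05):
    # Returns True when any of terminal conditions (1)-(4) is satisfied
    if n_lines > max_lines:                          # (1) too many heating lines
        return True
    if diff_after > diff_before:                     # (2) the last heating line made the shape worse
        return True
    if diff_after > blow_up_factor * diff_initial:   # (3) much worse than the unheated (initial) shape
        return True
    if diff_after < tolerance * diff_initial:        # (4) close enough to the target shape
        return True
    return False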

Reward Calculation Method.

In small-deformation analysis, the deformation is represented by the sum of the deformations due to the individual heating lines. Therefore, the reward is calculated according to Eq. (1) for each heating line as an “immediate reward.” The reward is calculated for every heating line, and learning of the CNN is performed sequentially based on this value
(1)
where dn is the difference from the target shape after n heating lines. Equation (1) expresses how much closer the (n + 1)-th heating line brings the shape to the target, starting from the state after n heating lines.

On the other hand, when bending a plate by line heating, a deformation larger than the plate thickness may occur. When such a large deformation occurs, it is necessary to perform large-deformation analysis, which can capture out-of-plane deformation due to in-plane shrinkage. In large-deformation analysis, the heating history affects the deformation, so a reward assigned to each heating line by Eq. (1) cannot accurately evaluate that heating line. Therefore, the system learns combinations of heating lines based on Q-learning, which is a method of learning the series of actions that maximizes the reward obtained at the end, i.e., the “long-term reward.” The reward corresponding to each heating line is calculated based on the following evaluation formula, Eq. (2):

(2)
where dN is the difference from the target shape after all heating lines have been applied, and d0 is the difference from the target shape before heating. The discount rate γ was set to 0.9. For the terminal action, the first term of Eq. (2) gives the value calculated from the result of the complete series of heating lines, i.e., the difference between the final shape and the target shape. For heating lines other than the terminal one, instead of the actual reward given by Eq. (1), the maximum reward obtainable from the current state is estimated using the value function; this estimated value is multiplied by γ and used as Rn. In cases other than the terminal, assuming for simplicity that the maximum reward is calculated only from the first term, Rn can be expressed as follows:
(3)
where RN is the reward obtained at the end. By using RN as the reward in the learning process, the value function comes to select the action that yields the highest reward at the end; i.e., the heating lines that produce the shape closest to the target shape can be selected. Moreover, as shown by the second term, a negative value is also assigned to each heating line, so that the total reward decreases as the number of heating lines increases. In this way, learning is performed such that heating lines that finally approach the target shape with fewer heating lines are rated highly.
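As an illustration of how this long-term reward could be distributed over the heating lines of one trial, a sketch is given below. The form of the terminal reward (relative improvement over the unheated plate minus a per-line penalty) and the penalty coefficient are assumptions made to mirror the description of Eq. (2); only the discount rate γ = 0.9 and the repeated multiplication by γ implied by the text are taken from the paper.

def long_term_rewards(d0, dN, n_lines, gamma=0.9, line_penalty=0.01):
    # Terminal reward (assumed form): improvement relative to the unheated plate,
    # reduced by a penalty that grows with the number of heating lines
    R_N = (d0 - dN) / d0 - line_penalty * n_lines

    # Non-terminal rewards in the spirit of Eq. (3): the terminal reward discounted
    # once for every remaining heating line
    return [gamma ** (n_lines - n) * R_N for n in range(1, n_lines + 1)]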

As described earlier, the authors constructed an AI line heating system that can obtain, from an out-of-plane displacement distribution, the heating positions that bring the plate closer to the target shape. Using this system, once the target shape is set, appropriate heating lines are searched for and learned automatically without human intervention. As a result, more appropriate heating lines are expected to be selected as learning progresses (as the number of learning cases increases). The system was constructed using TensorFlow [8], an AI development library provided by Google.

Heating Plan by Artificial Intelligence Line Heating System and Its Evaluation

Setting of the Analysis Condition.

In this section, the relationship between the heating position and the deformation caused by heating is learned, and heating plans are created using the proposed system. The target shapes are the three most basic shapes in line heating: the bowl, saddle, and twisted shapes.

In the elastic analysis using inherent strain [9], the deformation can be predicted by distributing the inherent deformation along the heating line as inherent strain. Figure 3 shows the analysis model used in this section. Shell elements are used, and the width and length of the model are both 500 mm. The model was divided into 40 elements in both the width and length directions. The material was assumed to be SM490A. In the analysis, constraints for rigid body motion are applied, and the analysis is conducted by assigning inherent strain to the elements through which the heating line passes.
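As an illustration of assigning inherent strain to the elements through which a heating line passes, the element indices crossed by a line on the 40 × 40 grid can be obtained by simple sampling, as sketched below; this mapping is an assumed implementation detail, and the element size of 12.5 mm follows from the 500 mm plate divided into 40 elements.

import numpy as np

def elements_under_line(p_start, p_end, plate_size=500.0, n_elems=40):
    # Return the (row, col) indices of elements crossed by the heating line (approximate sampling)
    elem = plate_size / n_elems                        # element edge length, 12.5 mm here
    length = np.hypot(p_end[0] - p_start[0], p_end[1] - p_start[1])
    n_samples = max(int(length / elem) * 4, 2)         # oversample along the line
    xs = np.linspace(p_start[0], p_end[0], n_samples)
    ys = np.linspace(p_start[1], p_end[1], n_samples)
    cols = np.clip((xs / elem).astype(int), 0, n_elems - 1)
    rows = np.clip((ys / elem).astype(int), 0, n_elems - 1)
    return sorted(set(zip(rows.tolist(), cols.tolist())))

# The inherent deformations of Table 2 would then be converted to inherent strains and
# applied to these elements in the elastic FE model (not shown here).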

Fig. 3

Thermal elastic–plastic analysis is performed on a 3D model with a plate thickness of 16 mm using the idealized explicit FEM [10]. From the results of the thermal elastic–plastic analysis, the inherent deformation for a heat input of Q/h² = 19.5 is determined as shown in Table 2.

Table 2

Inherent strain used in the analysis

δx* (mm)    −1.89 × 10⁻²
δy* (mm)    −2.89 × 10⁻¹
θx* (rad)   −1.29 × 10⁻³
θy* (rad)    1.08 × 10⁻²

Investigation of Heating Plan Assuming Small Deformation.

In this section, a linear displacement–strain relationship (small deformation) is assumed in the FEM analysis using inherent strain. Using Eq. (1) from the previous section, the AI was trained for the bowl shape, the saddle shape, and the twisted shape shown in Fig. 4, and the trained AI was then used to predict the heating plan for each target shape. Figure 5 shows the heating plan obtained by the trained AI for each target shape, and Fig. 6 shows the results of elastic FE analysis using the AI heating plans.

Fig. 4
Target shape: (a) bowl shape, (b) saddle shape, and (c) twist shape
Fig. 5
Heating plan by AI: (a) bowl shape, (b) saddle shape, and (c) twist shape
Fig. 6
Obtained shapes: (a) bowl shape, (b) saddle shape, and (c) twist shape

As shown in Fig. 5(a), for the bowl shape, the front surface was heated vertically and horizontally, including the center, and the bowl shape was formed by the angular distortion produced by this uniform heating. For the saddle shape, heating was applied to the edges, as shown in Fig. 5(b). As shown in Fig. 5(c), for the twisted shape, lines parallel to the diagonal of the corner to be lifted are heated. For both the saddle and twisted shapes, the target shape is formed using angular deformation and vertical bending. The numbers of heating lines needed to create the target shapes were 9, 9, and 13 for the bowl shape, the saddle shape, and the twisted shape, respectively. Figure 7 shows the out-of-plane displacement on lines A–A' and B–B'. Based on these results, the proposed AI line heating system can create heating plans that sufficiently approach the target shapes.

Fig. 7
Displacement in z-direction: (a) on A–A' line and (b) on B–B' line

Investigation of Heating Plan Assuming Large Deformation.

In this section, we assume a plate thickness of 8 mm and a nonlinear displacement–strain relation, so that large deformation is considered in the elastic FE analysis of line heating. In the small-deformation analysis, the deformation is represented by the sum of the deformations produced by the respective heating lines. In the large-deformation analysis, on the other hand, the combination of heating lines and the heating sequence are important, because the amount of deformation caused by a heating line is affected by the preceding deformation. Reinforcement learning, however, is based on the Markov process, i.e., a stochastic process whose future behavior is determined by the current state and is independent of past behavior. Therefore, it is necessary to define the state so that it includes the preceding deformation process in order to consider the combination of heating lines. However, if the entire history of the displacement distribution were included in the learning, the neural network would grow larger as the number of heating lines increases, which complicates the construction of the neural network. Therefore, the heating history is considered in the learning by superimposing the previous heating lines on the front and back surfaces. In other words, learning is performed by adding the heating history of the front and back surfaces as additional tensors, and the size of the CNN input is changed to [40, 40, 3].
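A sketch of how the CNN input could be extended to [40, 40, 3] by stacking the heating histories of the front and back surfaces onto the shape-difference map is given below; the encoding of the history maps and the channel order are assumptions.

import numpy as np

def build_state(diff_map, front_history, back_history):
    # diff_map:      (40, 40) out-of-plane difference from the target shape
    # front_history: (40, 40) map marking elements already heated on the front surface
    # back_history:  (40, 40) map marking elements already heated on the back surface
    return np.stack([diff_map, front_history, back_history], axis=-1).astype(np.float32)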

Using the above learning configuration, heating plans were created using the proposed AI line heating system. Figure 8 shows the heating plans created by the system, and Fig. 9 shows the results of the analysis according to these heating plans. The numbers of heating lines needed to create the target bowl, saddle, and twisted shapes were 6, 4, and 3, respectively. Figure 10 shows the out-of-plane displacement along lines A–A' and B–B' for the target shape and the analysis results. Based on these results, the proposed system can create heating plans that sufficiently approach the target shapes. Therefore, large-deformation analysis results, i.e., nonlinear phenomena, can also be learned.

Fig. 8
Heating plan by AI: (a) bowl shape, (b) saddle shape, and (c) twist shape
Fig. 9
Obtained shapes: (a) bowl shape, (b) saddle shape, and (c) twist shape
Fig. 10
Displacement in z-direction: (a) on A–A' line and (b) on B–B' line

The learning process took 16 h, 2 h, and 6 h for the bowl, saddle, and twisted shapes, respectively, to obtain the heating plans shown in Fig. 8. In contrast, only approximately 10 s is required for the well-trained AI to infer the next heating line.

Characteristics of Artificial Intelligence Line Heating System.

The difference in the heating plans between the small-deformation analysis and the large-deformation analysis is discussed here, focusing on the reward learned for the bowl shape. Figure 11 shows the output of the CNN, i.e., the reward distribution when the first heating line is calculated by the fully trained AI heating system. Figures 11(a) and 11(b) are the distributions of the expected reward for the small- and large-deformation analyses, respectively. From Fig. 11(a), in the small-deformation analysis, heating lines along the diagonal are evaluated highly. On the other hand, as shown in Fig. 11(b), lattice-like heating lines are evaluated highly in the large-deformation analysis.

Fig. 11
Distribution of expected reward: (a) small-deformation analysis and (b) large-deformation analysis

Next, we discuss the learning process in the large-deformation analysis. Figures 12(a)–12(c) show the heating plans created by the AI heating system after training on 200, 280, and 460 cases, respectively; in each case, the trained AI line heating system predicted a heating plan to approach the target shape. Figure 13 shows the results of the analysis using these heating plans. As learning progresses, the heating lines become better organized. In addition, Fig. 14 shows the history of the difference from the target shape with respect to the number of heating lines for 200, 280, and 460 learned cases. The figure confirms that, as learning progresses, heating lines that finally approach the target shape are obtained with a smaller number of heating lines. Figure 15 shows the relationship among the number of learning cases, the difference from the target shape at the end, and the number of heating lines required to obtain a sufficiently close shape. Regarding the number of heating lines, only cases with differences from the target shape of 25% or less were plotted. In the present study, we could not obtain a heating plan better than that obtained after learning 460 cases, even when learning was continued further. Figure 15 shows that a heating plan that sufficiently approaches the target shape could not be obtained at the beginning of learning, but after learning 140 cases, the heating plans gradually approach the target shape. As learning progresses, the number of required heating lines decreases. Therefore, learning was performed properly based on the reward, which rates highly those heating line plans that eventually approach the target shape with a small number of heating lines.

Fig. 12
Change of heating plan with increase in learning cases: (a) 200 cases, (b) 280 cases, and (c) 460 cases
Fig. 13
Change of obtained shapes with increase in learning cases: (a) 200 cases, (b) 280 cases, and (c) 460 cases
Fig. 14
Change of error history
Fig. 15
History of error and number of heating lines

Based on the earlier considerations, the proposed AI line heating system can create heating plans that reproduce the basic shapes, i.e., the bowl shape, the saddle shape, and the twisted shape. In addition, once the proposed system has learned enough cases, it can create a heating plan that obtains a shape close to the target shape in an extremely short computing time.

Conclusions

In the present study, in order to achieve automation of the line heating process, we proposed an AI line heating system by combining reinforcement learning and FEM simulations to deal with nonlinear phenomena of heat and deformation. The usefulness of the proposed system was demonstrated by applying the proposed system to the selection of the heating position for the formation of basic shapes in line heating. We also studied the learning ability of the proposed system. The following results were obtained:

  1. The proposed AI line heating system was applied to the selection of the heating position for the basic shapes, i.e., the bowl shape, the saddle shape, and the twisted shape. As a result, it was confirmed that the heating plan to approach the target shape can be created in the small-deformation analysis and the large-deformation analysis by a sufficiently learned AI.

  2. Learning of the bowl shape, the saddle shape, and the twisted shape took 6, 8, and 16 h, respectively. However, it took only approximately 10 s for the fully trained AI to create a heating plan. This indicates that the proposed system is applicable to practical use, e.g., at a construction site.

  3. The combination of heating lines was considered by calculating the reward based on the final shape and by adding the heating history to the input. It was shown that the CNN can learn from multiple inputs with different properties, i.e., the difference from the target shape and the heating history. In addition, it was shown that nonlinear phenomena (the results of large-deformation analysis) can be learned using rewards that consider the combination of heating lines and by learning the heating history.

Acknowledgment

A part of this research is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Conflict of Interest

There are no conflicts of interest. This article does not include research in which human participants were involved. Informed consent not applicable. This article does not include any research in which animal participants were involved.

Data Availability Statement

The authors attest that all data for this study are included in the paper.

References

1. Tango, Y., Ishiyama, T., and Suzuki, H., 2011, "IHIMU-α: A Fully Automated Steel Plate Bending System for Shipbuilding," IHI Technical Report, 51(1), pp. 24–29.
2. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G., Schrittwieser, J., et al., 2016, "Mastering the Game of Go with Deep Neural Networks and Tree Search," Nature, 529(7587), pp. 484–489.
3. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M., 2013, "Playing Atari With Deep Reinforcement Learning," NIPS Deep Learning Workshop.
4. Watkins, C. J. C. H., and Dayan, P., 1992, "Q-Learning," Mach. Learn., 8, pp. 279–292.
5. Kingma, D. P., and Ba, J. L., 2015, "Adam: A Method for Stochastic Optimization," 3rd International Conference on Learning Representations (ICLR), San Diego, CA, arXiv:1412.6980, https://arxiv.org/abs/1412.6980
6. Schaul, T., Quan, J., Antonoglou, I., and Silver, D., 2016, "Prioritized Experience Replay," ICLR, arXiv:1511.05952, https://arxiv.org/abs/1511.05952
7. Tokic, M., 2010, "Adaptive ε-Greedy Exploration in Reinforcement Learning Based on Value Differences," KI'10: Proceedings of the 33rd Annual German Conference on Advances in Artificial Intelligence, R. Dillmann, J. Beyerer, U. D. Hanebeck, and T. Schultz, eds., Lecture Notes in Computer Science, Vol. 6359, Springer, Berlin/Heidelberg, pp. 203–210.
8. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., et al., 2015, "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," Preliminary White Paper, arXiv:1603.04467, https://arxiv.org/abs/1603.04467
9. Murakawa, H., Luo, Y., and Ueda, Y., 1996, "Prediction of Welding Deformation and Residual Stress by Elastic FEM Based on Inherent Strain," J. Soc. Nav. Archit. Jpn., 180, pp. 739–751.
10. Shibahara, M., Ikushima, K., Itoh, S., and Masaoka, K., 2011, "Computational Method for Transient Welding Deformation and Stress for Large Scale Structure Based on Dynamic Explicit FEM," Q. J. Jpn. Weld. Soc., 29(1), pp. 1–9.