TY - JOUR
T1 - Simulated annealing with reinforcement learning for the set team orienteering problem with time windows
AU - Yu, Vincent F.
AU - Salsabila, Nabila Yuraisyah
AU - Lin, Shih Wei
AU - Gunawan, Aldy
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2024/3/15
Y1 - 2024/3/15
N2 - This research investigates the Set Team Orienteering Problem with Time Windows (STOPTW), a new variant of the well-known Team Orienteering Problem with Time Windows and Set Orienteering Problem. In the STOPTW, customers are grouped into clusters. Each cluster is associated with a profit attainable when a customer in the cluster is visited within the customer's time window. A Mixed Integer Linear Programming model is formulated for STOPTW to maximize total profit while adhering to time window constraints. Since STOPTW is an NP-hard problem, a Simulated Annealing with Reinforcement Learning (SARL) algorithm is developed. The proposed SARL incorporates the core concepts of reinforcement learning, utilizing the ε-greedy algorithm to learn the fitness values resulting from neighborhood moves. Numerical experiments are conducted to assess the performance of SARL, comparing the results with those obtained by CPLEX and Simulated Annealing (SA). For small instances, both SARL and SA algorithms outperform CPLEX by obtaining eight optimal solutions and 12 better solutions. For large instances, both algorithms obtain better solutions for 28 out of 29 instances within shorter computational times compared to CPLEX. Overall, SARL outperforms SA by achieving lower gap percentages within the same computational times. Specifically, SARL outperforms SA in solving 13 large STOPTW benchmark instances. Finally, a sensitivity analysis is conducted to derive managerial insights.
AB - This research investigates the Set Team Orienteering Problem with Time Windows (STOPTW), a new variant of the well-known Team Orienteering Problem with Time Windows and Set Orienteering Problem. In the STOPTW, customers are grouped into clusters. Each cluster is associated with a profit attainable when a customer in the cluster is visited within the customer's time window. A Mixed Integer Linear Programming model is formulated for STOPTW to maximize total profit while adhering to time window constraints. Since STOPTW is an NP-hard problem, a Simulated Annealing with Reinforcement Learning (SARL) algorithm is developed. The proposed SARL incorporates the core concepts of reinforcement learning, utilizing the ε-greedy algorithm to learn the fitness values resulting from neighborhood moves. Numerical experiments are conducted to assess the performance of SARL, comparing the results with those obtained by CPLEX and Simulated Annealing (SA). For small instances, both SARL and SA algorithms outperform CPLEX by obtaining eight optimal solutions and 12 better solutions. For large instances, both algorithms obtain better solutions for 28 out of 29 instances within shorter computational times compared to CPLEX. Overall, SARL outperforms SA by achieving lower gap percentages within the same computational times. Specifically, SARL outperforms SA in solving 13 large STOPTW benchmark instances. Finally, a sensitivity analysis is conducted to derive managerial insights.
KW - Set orienteering problem
KW - Simulated annealing
KW - Team orienteering problem with time windows
UR - http://www.scopus.com/inward/record.url?scp=85174162908&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.121996
DO - 10.1016/j.eswa.2023.121996
M3 - Article
AN - SCOPUS:85174162908
SN - 0957-4174
VL - 238
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 121996
ER -