TY - JOUR
T1 - Teach and Explore
T2 - A Multiplex Information-guided Effective and Efficient Reinforcement Learning for Sequential Recommendation
AU - Yan, Surong
AU - Shi, Chenglong
AU - Wang, Haosen
AU - Chen, Lei
AU - Jiang, Ling
AU - Guo, Ruilin
AU - Lin, Kwei Jay
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/4/29
Y1 - 2024/4/29
N2 - Casting sequential recommendation (SR) as a reinforcement learning (RL) problem is promising and some RL-based methods have been proposed for SR. However, these models are sub-optimal due to the following limitations: (a) they fail to leverage the supervision signals in the RL training to capture users' explicit preferences, leading to slow convergence; and (b) they do not utilize auxiliary information (e.g., knowledge graph) to avoid blindness when exploring users' potential interests. To address the above-mentioned limitations, we propose a multiplex information-guided RL model (MELOD), which employs a novel RL training framework with Teach and Explore components for SR. We adopt a Teach component to accurately capture users' explicit preferences and speed up RL convergence. Meanwhile, we design a dynamic intent induction network (DIIN) as a policy function to generate diverse predictions. We utilize the DIIN for the Explore component to mine users' potential interests by conducting a sequential and knowledge information joint-guided exploration. Moreover, a sequential and knowledge-aware reward function is designed to achieve stable RL training. These components significantly improve MELOD's performance and convergence against existing RL algorithms to achieve effectiveness and efficiency. Experimental results on seven real-world datasets show that our model significantly outperforms state-of-the-art methods.
KW - explicit and potential interests
KW - knowledge graph
KW - reinforcement learning
KW - Sequential recommendation
UR - http://www.scopus.com/inward/record.url?scp=85195053350&partnerID=8YFLogxK
U2 - 10.1145/3630003
DO - 10.1145/3630003
M3 - Article
AN - SCOPUS:85195053350
SN - 1046-8188
VL - 42
JO - ACM Transactions on Information Systems
JF - ACM Transactions on Information Systems
IS - 5
M1 - 120
ER -