Paper Weekly | Latest Research Advances in Recommender Systems

ML_RSer 机器学习与推荐算法 2022-12-14

This post curates the 20 latest recommender-system papers from last week (April 10 to April 17), covering directions such as debiased recommendation, conversational recommendation, negative-sampling-based recommendation, federated recommendation, fairness-aware recommendation, sequential recommendation, accelerated recommender training, fashion recommendation, news recommendation, and content-based collaborative filtering; applications span session-based, sequential, group, and news recommendation. To save you time, only the titles and abstracts are compiled here; if a paper interests you, please turn to the original for a close read.

Paper Titles:

1. Self-Guided Learning to Denoise for Robust Recommendation, SIGIR2022

2. A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems

3. To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set?

4. Optimizing generalized Gini indices for fairness in rankings, SIGIR2022

5. Negative Sampling for Recommendation

6. CARCA: Context and Attribute-Aware Next-Item Recommendation via Cross-Attention

7. Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation, ICML2021

8. Decentralized Collaborative Learning Framework for Next POI Recommendation

9. Heterogeneous Acceleration Pipeline for Recommendation System Training

10. Recommender May Not Favor Loyal Users

11. HAKG: Hierarchy-Aware Knowledge Gated Network for Recommendation, SIGIR2022

12. OutfitTransformer: Learning Outfit Representations for Fashion Recommendation

13. FUM: Fine-grained and Fast User Modeling for News Recommendation, SIGIR2022

14. News Recommendation with Candidate-aware User Modeling, SIGIR2022

15. ProFairRec: Provider Fairness-aware News Recommendation, SIGIR2022

16. Denoising Neural Network for News Recommendation with Positive and Negative Implicit Feedback, NAACL2022

17. IA-GCN: Interactive Graph Convolutional Network for Recommendation

18. GRAM: Fast Fine-tuning of Pre-trained Language Models for Content-based Collaborative Filtering, NAACL2022

19. Positive and Negative Critiquing for VAE-based Recommenders

20. CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU


Paper Abstracts:

1. Self-Guided Learning to Denoise for Robust Recommendation, SIGIR2022

Yunjun Gao, Yuntao Du, Yujia Hu, Lu Chen, Xinjun Zhu, Ziquan Fang, Baihua Zheng

https://arxiv.org/abs/2204.06832

The ubiquity of implicit feedback makes it the default choice for building modern recommender systems. Generally speaking, observed interactions are considered as positive samples, while unobserved interactions are considered as negative ones. However, implicit feedback is inherently noisy because of the ubiquitous presence of noisy-positive and noisy-negative interactions. Recently, some studies have noticed the importance of denoising implicit feedback for recommendation, and enhanced the robustness of recommendation models to some extent. Nonetheless, they typically fail to (1) capture the hard yet clean interactions for learning comprehensive user preferences, and (2) provide a universal denoising solution that can be applied to various kinds of recommendation models.

In this paper, we thoroughly investigate the memorization effect of recommendation models, and propose a new denoising paradigm, i.e., Self-Guided Denoising Learning (SGDL), which is able to collect memorized interactions at the early stage of the training (i.e., "noise-resistant" period), and leverage those data as denoising signals to guide the following training (i.e., "noise-sensitive" period) of the model in a meta-learning manner. Besides, our method can automatically switch its learning phase at the memorization point from memorization to self-guided learning, and select clean and informative memorized data via a novel adaptive denoising scheduler to improve the robustness. We incorporate SGDL with four representative recommendation models (i.e., NeuMF, CDAE, NGCF and LightGCN) and different loss functions (i.e., binary cross-entropy and BPR loss). The experimental results on three benchmark datasets demonstrate the effectiveness of SGDL over the state-of-the-art denoising methods like T-CE, IR, DeCA, and even state-of-the-art robust graph-based methods like SGCN and SGL.
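
To make the memorization idea concrete, here is a minimal sketch (illustrative only, not the authors' code; the loss threshold and window are made-up hyperparameters) of how interactions that the model fits consistently during the early, noise-resistant epochs could be collected as denoising signals:

```python
import numpy as np

def collect_memorized(loss_history, window=3, threshold=0.1):
    """Toy stand-in for collecting 'memorized' interactions.

    loss_history: (num_epochs, num_interactions) per-sample training
    losses from the early epochs. An interaction counts as memorized
    if its loss stayed below `threshold` over the last `window`
    epochs -- a crude proxy for SGDL's memorization test.
    """
    recent = loss_history[-window:]               # last few early epochs
    memorized = (recent < threshold).all(axis=0)  # consistently well fit
    return np.where(memorized)[0]                 # indices of clean-ish samples

# Example: 5 early epochs of losses for 6 interactions.
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=(5, 6))
losses[:, [1, 4]] = 0.05          # two interactions the model fits early
print(collect_memorized(losses))  # -> [1 4]
```

In SGDL, such memorized interactions then guide the later, noise-sensitive training phase in a meta-learning manner.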

2. A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems

Yang Deng, Wenxuan Zhang, Weiwen Xu, Wenqiang Lei, Tat-Seng Chua, Wai Lam

https://arxiv.org/abs/2204.06923

Recent years have witnessed several advances in developing multi-goal conversational recommender systems (MG-CRS) that can proactively attract users' interests and naturally lead user-engaged dialogues with multiple conversational goals and diverse topics. Four tasks are often involved in MG-CRS, including Goal Planning, Topic Prediction, Item Recommendation, and Response Generation. Most existing studies address only some of these tasks. To handle the whole problem of MG-CRS, modularized frameworks are adopted where each task is tackled independently without considering their interdependencies. In this work, we propose a novel Unified MultI-goal conversational recommeNDer system, namely UniMIND. Specifically, we unify these four tasks with different formulations into the same sequence-to-sequence (Seq2Seq) paradigm. Prompt-based learning strategies are investigated to endow the unified model with the capability of multi-task learning. Finally, the overall learning and inference procedure consists of three stages, including multi-task learning, prompt-based tuning, and inference. Experimental results on two MG-CRS benchmarks (DuRecDial and TG-ReDial) show that UniMIND achieves state-of-the-art performance on all tasks with a unified model. Extensive analyses and discussions are provided to shed some new perspectives on MG-CRS.

3. To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set?

Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

https://arxiv.org/abs/2204.06894

Deep Learning (DL) models have been widely used to support code completion. These models, once properly trained, can take as input an incomplete code component (e.g., an incomplete function) and predict the missing tokens to finalize it. GitHub Copilot is an example of code recommender built by training a DL model on millions of open source repositories: The source code of these repositories acts as training data, allowing the model to learn "how to program". The usage of such code is usually regulated by Free and Open Source Software (FOSS) licenses, which establish under which conditions the licensed code can be redistributed or modified. As of today, it is unclear whether the code generated by DL models trained on open source code should be considered as "new" or as "derivative" work, with possible implications on license infringements. In this work, we run a large-scale study investigating the extent to which DL models tend to clone code from their training set when recommending code completions. Such an exploratory study can help in assessing the magnitude of the potential licensing issues mentioned before: If these models tend to generate new code that is unseen in the training set, then licensing issues are unlikely to occur. Otherwise, a revision of these licenses is urgently needed to regulate how the code generated by these models should be treated when used, for example, in a commercial setting. Highlights from our results show that ~10% to ~0.1% of the predictions generated by a state-of-the-art DL-based code completion tool are Type-1 clones of instances in the training set, depending on the size of the predicted code. Long predictions are unlikely to be cloned.

4. Optimizing generalized Gini indices for fairness in rankings, SIGIR2022

Virginie Do, Nicolas Usunier

https://arxiv.org/abs/2204.06521

There is growing interest in designing recommender systems that aim at being fair towards item producers or their least satisfied users. Inspired by the domain of inequality measurement in economics, this paper explores the use of generalized Gini welfare functions (GGFs) as a means to specify the normative criterion that recommender systems should optimize for. GGFs weight individuals depending on their ranks in the population, giving more weight to worse-off individuals to promote equality. Depending on these weights, GGFs minimize the Gini index of item exposure to promote equality between items, or focus on the performance on specific quantiles of least satisfied users. GGFs for ranking are challenging to optimize because they are non-differentiable. We resolve this challenge by leveraging tools from non-smooth optimization and projection operators used in differentiable sorting. We present experiments using real datasets with up to 15k users and items, which show that our approach obtains better trade-offs than the baselines on a variety of recommendation tasks and fairness criteria.
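
As a worked example, a generalized Gini welfare function is just a weighted sum of utilities sorted from worst-off to best-off, with non-increasing weights; the specific weights below are an illustrative choice, not the paper's:

```python
import numpy as np

def ggf_welfare(utilities, weights):
    """Generalized Gini welfare: dot product of utilities sorted in
    increasing order with non-increasing weights, so worse-off
    individuals count more."""
    return float(np.dot(weights, np.sort(utilities)))

utilities = np.array([0.9, 0.2, 0.5])             # e.g., item exposures
weights = 1.0 / np.arange(1, len(utilities) + 1)  # [1, 1/2, 1/3], illustrative
print(ggf_welfare(utilities, weights))            # 0.2*1 + 0.5*0.5 + 0.9/3 = 0.75
```

The non-differentiability the paper tackles comes precisely from the sorting step inside this objective.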

5. Negative Sampling for Recommendation

Bin Liu, Bang Wang

https://arxiv.org/abs/2204.06520

How to effectively sample high-quality negative instances is important for training a recommendation model well. We argue that a high-quality negative should be both informative and unbiased. Although previous studies have proposed some approaches to address informativeness in negative sampling, little has been done to discriminate false negatives from true negatives for unbiased negative sampling, not to mention taking both into consideration. This paper first adopts a parameter learning perspective to analyze negative informativeness and unbiasedness in loss gradient-based model training. We argue that both negative sampling and collaborative filtering include an implicit task of negative classification, from which we report an insightful yet beneficial finding about the order relation in predicted negatives' scores. Based on our finding and by regarding negatives as random variables, we next derive the class-conditional density of true negatives and that of false negatives. We also design a Bayesian classifier for negative classification, from which we define a quantitative unbiasedness measure for negatives. Finally, we propose to use a harmonic mean of informativeness and unbiasedness to sample high-quality negatives. Experimental studies validate the superiority of our negative sampling algorithm over its peers in terms of better sampling quality and better recommendation performance.
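
The final sampling rule lends itself to a direct sketch: score each candidate negative by the harmonic mean of its informativeness and unbiasedness, then pick the highest-scoring one. The scores below are hypothetical inputs; the paper derives them from loss gradients and a Bayesian classifier:

```python
import numpy as np

def harmonic_quality(informativeness, unbiasedness, eps=1e-12):
    """Harmonic mean of the two qualities; both assumed in [0, 1]."""
    return 2 * informativeness * unbiasedness / (
        informativeness + unbiasedness + eps)

info = np.array([0.9, 0.8, 0.2, 0.6])    # high = hard (informative) negative
unbias = np.array([0.1, 0.9, 0.9, 0.7])  # high = likely a true negative
scores = harmonic_quality(info, unbias)
print(scores.round(3), scores.argmax())  # candidate 1 balances both qualities
```

The harmonic mean rewards candidates that are strong on both axes; a negative that is very hard but probably a false negative (like candidate 0) scores poorly.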

6. CARCA: Context and Attribute-Aware Next-Item Recommendation via Cross-Attention

Ahmed Rashed, Shereen Elsayed, Lars Schmidt-Thieme

https://arxiv.org/abs/2204.06519

In sparse recommender settings, users' context and item attributes play a crucial role in deciding which items to recommend next. Despite that, recent works in sequential and time-aware recommendation usually either ignore both aspects or only consider one of them, limiting their predictive performance. In this paper, we address these limitations by proposing a context and attribute-aware recommender model (CARCA) that can capture the dynamic nature of user profiles in terms of contextual features and item attributes via dedicated multi-head self-attention blocks that extract profile-level features and predict item scores. Also, unlike many of the current state-of-the-art sequential item recommendation approaches that use a simple dot product between the most recent item's latent features and the target items' embeddings for scoring, CARCA uses cross-attention between all profile items and the target items to predict their final scores. This cross-attention allows CARCA to harness the correlation between old and recent items in the user profile and their influence on deciding which item to recommend next. Experiments on four real-world recommender system datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of item recommendation, achieving improvements of up to 53% in Normalized Discounted Cumulative Gain (NDCG) and Hit-Ratio. Results also show that CARCA outperforms several state-of-the-art dedicated image-based recommender systems by merely utilizing image attributes extracted from a pre-trained ResNet50 in a black-box fashion.
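
A loose sketch of the scoring idea (plain scaled dot-product cross-attention with no learned projections, so far simpler than CARCA's multi-head blocks): each target item attends over all profile items rather than being matched against only the most recent one.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_scores(profile, targets):
    """Queries = target items, keys/values = profile items."""
    d = profile.shape[-1]
    attn = softmax(targets @ profile.T / np.sqrt(d))  # (targets, profile)
    context = attn @ profile                          # profile summary per target
    return (context * targets).sum(-1)                # one score per target

rng = np.random.default_rng(0)
profile = rng.normal(size=(5, 16))  # 5 items in the user profile
targets = rng.normal(size=(3, 16))  # 3 candidate next items
print(cross_attention_scores(profile, targets))
```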

7. Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation, ICML2021

Chao Chen, Haoyu Geng, Nianzu Yang, Junchi Yan, Daiyue Xue, Jianping Yu, Xiaokang Yang

https://arxiv.org/abs/2204.06517

User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for their effectiveness and relative simplicity. Despite being extensively studied, existing attention mechanisms still suffer from two limitations: i) conventional attention mainly takes into account the spatial correlation between user behaviors, regardless of the distance between those behaviors in continuous time space; and ii) such attention mostly provides a dense and undifferentiated distribution over all past behaviors and then attentively encodes them into the output latent representations. This is, however, not suitable in practical scenarios where a user's future actions are relevant to only a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamics of user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
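
The paper's learned modulation is more sophisticated, but the gist, attention that discounts behaviors by their distance in continuous time, can be sketched with a simple exponential decay (an illustrative stand-in, not the paper's formulation):

```python
import numpy as np

def time_modulated_attention(query, keys, timestamps, t_now, decay=0.1):
    """Attention over past behaviors whose logits shrink with the
    elapsed time, so recent behaviors dominate the summary."""
    logits = keys @ query / np.sqrt(len(query))
    logits -= decay * (t_now - timestamps)  # older behavior -> smaller logit
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ keys                         # time-aware behavior summary

rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))          # 4 past behavior embeddings
query = rng.normal(size=8)
times = np.array([1.0, 3.0, 7.0, 9.0])  # behavior timestamps
print(time_modulated_attention(query, keys, times, t_now=10.0))
```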

8. Decentralized Collaborative Learning Framework for Next POI  Recommendation

Jing Long, Tong Chen, Nguyen Quoc Viet Hung, Hongzhi Yin

https://arxiv.org/abs/2204.06516

Next Point-of-Interest (POI) recommendation has become an indispensable functionality in Location-based Social Networks (LBSNs) due to its effectiveness in helping people decide the next POI to visit. However, accurate recommendation requires a vast amount of historical check-in data, thus threatening user privacy as the location-sensitive data needs to be handled by cloud servers. Although there have been several on-device frameworks for privacy-preserving POI recommendations, they are still resource-intensive when it comes to storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to expand arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server with geographical and categorical correlations of POIs. To facilitate collaborative learning, we innovatively propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communications between devices while requiring only minor engagement from the central server for identifying user groups, and is compatible with common privacy preservation mechanisms like differential privacy.

9. Heterogeneous Acceleration Pipeline for Recommendation System Training

Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

https://arxiv.org/abs/2204.05436

Recommendation systems are unique as they exhibit a conflation of compute and memory intensity due to their deep learning components and massive embedding tables. Training these models typically involves a hybrid CPU-GPU mode, where GPUs accelerate the deep learning portion and the CPUs store and process the memory-intensive embedding tables. The hybrid mode incurs a substantial CPU-to-GPU transfer time and relies on main memory bandwidth to feed embeddings to the GPU for deep learning acceleration. Alternatively, we can store the entire embeddings across GPUs to avoid the transfer time and utilize the GPUs' High Bandwidth Memory (HBM). This approach requires GPU-to-GPU backend communication and scales the number of GPUs with the size of the embedding tables. To overcome these concerns, this paper offers a heterogeneous acceleration pipeline, called Hotline.

Hotline leverages the insight that only a small number of embedding entries are accessed frequently, and these can easily fit in a single GPU's HBM. Hotline implements a data-aware and model-aware scheduling pipeline that utilizes (1) CPU main memory for infrequently accessed embeddings and (2) GPUs' local memory for frequently accessed embeddings. Hotline improves training throughput by dynamically stitching the execution of popular and unpopular inputs through a novel hardware accelerator and feeding them to the GPUs. Results on real-world datasets and recommender models show that Hotline reduces the average training time by 3x and 1.8x in comparison to the Intel-optimized CPU-GPU DLRM and the HugeCTR-optimized GPU-only baseline, respectively. Hotline increases the overall training throughput to 35.7 epochs/hour in comparison to 5.3 epochs/hour for the Intel-optimized DLRM baseline.
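
The placement insight, namely that a small hot set of embedding ids accounts for most accesses and fits in one GPU's HBM, can be illustrated in a few lines (hypothetical capacity; Hotline's actual contribution is the accelerator-driven scheduling built around this split):

```python
from collections import Counter

def split_hot_cold(access_log, hot_capacity):
    """Keep the most frequently accessed embedding ids on GPU HBM and
    leave the long tail in CPU main memory."""
    ranked = [eid for eid, _ in Counter(access_log).most_common()]
    hot = set(ranked[:hot_capacity])    # -> GPU HBM
    cold = set(ranked[hot_capacity:])   # -> CPU main memory
    return hot, cold

accesses = [3, 3, 3, 7, 7, 1, 9, 3, 7, 2]  # toy embedding-id access trace
hot, cold = split_hot_cold(accesses, hot_capacity=2)
print(hot, cold)   # ids {3, 7} on GPU; ids {1, 2, 9} on CPU
```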

10. Recommender May Not Favor Loyal Users

Yitong Ji, Aixin Sun, Jie Zhang, Chenliang Li

https://arxiv.org/abs/2204.05927

In academic research, recommender systems are often evaluated on benchmark datasets without much consideration of the global timeline. Hence, we are unable to answer questions like: Do loyal users enjoy better recommendations than non-loyal users? Loyalty can be defined by the time period a user has been active in a recommender system, or by the number of historical interactions a user has. In this paper, we offer a comprehensive analysis of recommendation results along the global timeline. We conduct experiments with five widely used models, i.e., BPR, NeuMF, LightGCN, SASRec and TiSASRec, on four benchmark datasets, i.e., MovieLens-25M, Yelp, Amazon-music, and Amazon-electronic. Our experimental results give an answer of "No" to the above question: users with many historical interactions suffer from relatively poorer recommendations, and users who stay with the system for a short time period enjoy better recommendations. Both findings are counter-intuitive. Interestingly, users who have recently interacted with the system, with respect to the time point of the test instance, enjoy better recommendations. The finding on recency applies to all users, regardless of users' loyalty. Our study offers a different perspective for understanding recommender performance, and our findings could trigger a revisit of recommender model design.

11. HAKG: Hierarchy-Aware Knowledge Gated Network for Recommendation, SIGIR2022

Yuntao Du, Xinjun Zhu, Lu Chen, Baihua Zheng, Yunjun Gao

https://arxiv.org/abs/2204.04959

Knowledge graphs (KGs) play an increasingly important role in improving recommendation performance and interpretability. A recent technical trend is to design end-to-end models based on information propagation schemes. However, existing propagation-based methods fail to (1) model the underlying hierarchical structures and relations, and (2) capture the high-order collaborative signals of items for learning high-quality user and item representations. In this paper, we propose a new model, called Hierarchy-Aware Knowledge Gated Network (HAKG), to tackle the aforementioned problems. Technically, we model users and items (captured by a user-item graph), as well as entities and relations (captured in a KG), in hyperbolic space, and design a hyperbolic aggregation scheme to gather relational contexts over the KG. Meanwhile, we introduce a novel angle constraint to preserve the characteristics of items in the embedding space. Furthermore, we propose a dual item embedding design to represent and propagate collaborative signals and knowledge associations separately, and leverage gated aggregation to distill discriminative information for better capturing user behavior patterns. Experimental results on three benchmark datasets show that HAKG achieves significant improvements over state-of-the-art methods like CKAN, Hyper-Know, and KGIN. Further analyses of the learned hyperbolic embeddings confirm that HAKG offers meaningful insights into the hierarchies of the data.
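
For readers unfamiliar with hyperbolic embeddings: distances in the Poincaré ball grow rapidly toward the boundary, which is what makes such spaces natural for hierarchies. Below is the standard Poincaré distance (a generic formula, not HAKG's aggregation scheme or angle constraint):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points inside the unit Poincare ball
    (both norms must be < 1)."""
    sq_dist = np.sum((u - v) ** 2)
    nu, nv = np.sum(u ** 2), np.sum(v ** 2)
    return np.arccosh(1 + 2 * sq_dist / ((1 - nu) * (1 - nv) + eps))

u = np.array([0.1, 0.2])    # e.g., a coarse entity near the origin
v = np.array([0.7, -0.55])  # a specific entity near the boundary
print(poincare_distance(u, v))
```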

12. OutfitTransformer: Learning Outfit Representations for Fashion Recommendation

Rohan Sarkar, Navaneeth Bodla, Mariya Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, Gerard Medioni

https://arxiv.org/abs/2204.04812

Learning an effective outfit-level representation is critical for predicting the compatibility of items in an outfit, and retrieving complementary items for a partial outfit. We present a framework, OutfitTransformer, that uses the proposed task-specific tokens and leverages the self-attention mechanism to learn effective outfit-level representations encoding the compatibility relationships between all items in the entire outfit for addressing both compatibility prediction and complementary item retrieval tasks. For compatibility prediction, we design an outfit token to capture a global outfit representation and train the framework using a classification loss. For complementary item retrieval, we design a target item token that additionally takes the target item specification (in the form of a category or text description) into consideration. We train our framework using a proposed set-wise outfit ranking loss to generate a target item embedding given an outfit and a target item specification as inputs. The generated target item embedding is then used to retrieve compatible items that match the rest of the outfit. Additionally, we adopt a pre-training approach and a curriculum learning strategy to improve retrieval performance. Since our framework learns at the outfit level, it allows us to learn a single embedding capturing higher-order relations among multiple items in the outfit more effectively than pairwise methods. Experiments demonstrate that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks. We further validate the quality of our retrieval results with a user study.

13. FUM: Fine-grained and Fast User Modeling for News Recommendation, SIGIR2022

Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang

User modeling is important for news recommendation. Existing methods usually first encode a user's clicked news into news embeddings independently and then aggregate them into a user embedding. However, the word-level interactions across different clicked news from the same user, which contain rich detailed clues for inferring user interest, are ignored by these methods. In this paper, we propose a fine-grained and fast user modeling framework (FUM) to model user interest from fine-grained behavior interactions for news recommendation. The core idea of FUM is to concatenate the clicked news into a long document and transform user modeling into a document modeling task with both intra-news and inter-news word-level interactions. Since the vanilla Transformer cannot efficiently handle long documents, we apply an efficient transformer named Fastformer to model fine-grained behavior interactions. Extensive experiments on two real-world datasets verify that FUM can effectively and efficiently model user interest for news recommendation.
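
The core trick is easy to picture: flatten the click history into one token sequence so a single efficient transformer (Fastformer in the paper) sees intra-news and inter-news word interactions at once. A minimal sketch, with an assumed separator token and token budget:

```python
def build_user_document(clicked_news, sep="[SEP]", max_tokens=512):
    """Concatenate a user's clicked news into one long token sequence;
    the separator and budget are illustrative choices."""
    tokens = []
    for title in clicked_news:
        tokens.extend(title.split())
        tokens.append(sep)
    return tokens[:max_tokens]

clicks = ["stock markets rally on strong earnings",
          "local team wins the cup final"]
print(build_user_document(clicks))
```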

14. News Recommendation with Candidate-aware User Modeling, SIGIR2022

Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang

https://arxiv.org/abs/2204.04726

News recommendation aims to match news with personalized user interests. Existing methods for news recommendation usually model user interest from historical clicked news without considering the candidate news. However, each user usually has multiple interests, and it is difficult for these methods to accurately match a candidate news article with a specific user interest. In this paper, we present a candidate-aware user modeling method for personalized news recommendation, which can incorporate candidate news into user modeling for better matching between candidate news and user interest. We propose a candidate-aware self-attention network that uses candidate news as a clue to model candidate-aware global user interest. In addition, we propose a candidate-aware CNN network to incorporate candidate news into local behavior context modeling and learn candidate-aware short-term user interest. Besides, we use a candidate-aware attention network to aggregate previously clicked news, weighted by their relevance to the candidate news, to build a candidate-aware user representation. Experiments on real-world datasets show the effectiveness of our method in improving news recommendation performance.
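
The third component, candidate-aware attention pooling, reduces to a few lines in its bare form (no learned projections, unlike the paper's network): clicked-news vectors are aggregated with weights given by their relevance to the candidate.

```python
import numpy as np

def candidate_aware_user_vector(clicked, candidate):
    """Weight clicked-news embeddings by relevance to the candidate,
    then pool them into a candidate-aware user representation."""
    logits = clicked @ candidate / np.sqrt(len(candidate))
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ clicked

rng = np.random.default_rng(0)
clicked = rng.normal(size=(6, 12))  # 6 clicked-news embeddings
candidate = rng.normal(size=12)     # one candidate-news embedding
user_vec = candidate_aware_user_vector(clicked, candidate)
print(float(user_vec @ candidate))  # matching score for this candidate
```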

15. ProFairRec: Provider Fairness-aware News Recommendation, SIGIR2022

Tao Qi, Fangzhao Wu, Chuhan Wu, Peijie Sun, Le Wu, Xiting Wang, Yongfeng Huang, Xing Xie

https://arxiv.org/abs/2204.04724

News recommendation aims to help online news platform users find their preferred news articles. Existing news recommendation methods usually learn models from historical user behaviors on news. However, these behaviors are usually biased with respect to news providers. Models trained on biased user data may capture and even amplify these provider biases, and are unfair to some minority news providers. In this paper, we propose a provider fairness-aware news recommendation framework (named ProFairRec), which can learn news recommendation models that are fair to different news providers from biased user data. The core idea of ProFairRec is to learn provider-fair news representations and provider-fair user representations to achieve provider fairness. To learn provider-fair representations from biased data, we employ provider-biased representations to inherit provider bias from the data. Provider-fair and provider-biased news representations are learned from news content and provider IDs respectively, and are further aggregated to build fair and biased user representations based on user click history. All of these representations are used in model training, while only the fair representations are used for user-news matching to achieve fair news recommendation. Besides, we propose an adversarial learning task on news provider discrimination to prevent the provider-fair news representations from encoding provider bias. We also propose an orthogonal regularization on the provider-fair and provider-biased representations to further reduce provider bias in the provider-fair representations. Moreover, ProFairRec is a general framework and can be applied to different news recommendation methods. Extensive experiments on a public dataset verify that our ProFairRec approach can effectively improve the provider fairness of many existing methods while maintaining their recommendation accuracy.
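
One simple way to realize the orthogonal regularization described above (the exact form in ProFairRec may differ) is to penalize the squared cosine similarity between the fair and biased representations of each sample:

```python
import numpy as np

def orthogonal_penalty(fair, biased):
    """Push normalized fair/biased representations toward zero inner
    product so the fair ones carry no provider signal."""
    f = fair / np.linalg.norm(fair, axis=-1, keepdims=True)
    b = biased / np.linalg.norm(biased, axis=-1, keepdims=True)
    return float(np.mean(np.sum(f * b, axis=-1) ** 2))

rng = np.random.default_rng(0)
fair = rng.normal(size=(4, 8))           # provider-fair news representations
biased = rng.normal(size=(4, 8))         # provider-biased counterparts
print(orthogonal_penalty(fair, biased))  # add this term to the training loss
```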

16. Denoising Neural Network for News Recommendation with Positive and Negative Implicit Feedback, NAACL2022

Yunfan Hu, Zhaopeng Qiu, Xian Wu

https://arxiv.org/abs/2204.04397

News recommendation differs from movie or e-commerce recommendation in that people usually do not rate news. Therefore, user feedback for news is always implicit (click behavior, reading time, etc.). Inevitably, there is noise in implicit feedback. On one hand, a user may exit immediately after clicking a news article because he dislikes its content, leaving noise in his positive implicit feedback; on the other hand, a user may be recommended multiple interesting news articles at the same time and only click one of them, producing noise in his negative implicit feedback. The two opposite kinds of implicit feedback can together construct a more complete picture of user preferences and help each other minimize the influence of noise. Previous works on news recommendation only used positive implicit feedback and suffered from the noise impact. In this paper, we propose a denoising neural network for news recommendation with positive and negative implicit feedback, named DRPN. DRPN utilizes both kinds of feedback for recommendation, with a module that denoises both positive and negative implicit feedback to further enhance performance. Experiments on a real-world large-scale dataset demonstrate the state-of-the-art performance of DRPN.

17. IA-GCN: Interactive Graph Convolutional Network for Recommendation

Yinan Zhang, Pei Wang, Xiwei Zhao, Hao Qi, Jie He, Junsheng Jin, Changping Peng, Zhangang Lin, Jingping Shao

https://arxiv.org/abs/2204.03827

Recently, Graph Convolutional Networks (GCNs) have become the new state of the art for Collaborative Filtering (CF)-based Recommender Systems (RS). It is a common practice to learn informative user and item representations by performing embedding propagation on a user-item bipartite graph, and then provide users with personalized item suggestions based on these representations. Despite their effectiveness, existing algorithms neglect precious interactive features between user-item pairs in the embedding process. When predicting a user's preference for different items, they still aggregate the user tree in the same way, without emphasizing target-related information in the user's neighborhood. Such a uniform aggregation scheme easily leads to suboptimal user and item representations, limiting model expressiveness to some extent.

In this work, we address this problem by building bilateral interactive guidance between each user-item pair and proposing a new model named IA-GCN (short for InterActive GCN). Specifically, when learning the user representation from its neighborhood, we assign higher attention weights to those neighbors similar to the target item. Correspondingly, when learning the item representation, we pay more attention to those neighbors resembling the target user. This leads to interactive and interpretable features, effectively distilling target-specific information through each graph convolutional operation. Our model is built on top of LightGCN, a state-of-the-art GCN model for CF, and can be combined with various GCN-based CF architectures in an end-to-end fashion. Extensive experiments on three benchmark datasets demonstrate the effectiveness and robustness of IA-GCN.

18. GRAM: Fast Fine-tuning of Pre-trained Language Models for Content-based Collaborative Filtering, NAACL2022

Yoonseok Yang, Kyu Seok Kim, Minsam Kim, Juneyoung Park

https://arxiv.org/abs/2204.04179

Content-based collaborative filtering (CCF) provides personalized item recommendations based on both users' interaction histories and items' content information. Recently, pre-trained language models (PLMs) have been used to extract high-quality item encodings for CCF. However, it is resource-intensive to fine-tune a PLM in an end-to-end (E2E) manner in CCF due to its multi-modal nature: optimization involves redundant content encoding for interactions from users. To address this, we propose GRAM (GRadient Accumulation for Multi-modality): (1) single-step GRAM, which aggregates gradients for each item while maintaining theoretical equivalence with E2E, and (2) multi-step GRAM, which further accumulates gradients across multiple training steps, with less than 40% of the GPU memory footprint of E2E. We empirically confirm that GRAM achieves a remarkable boost in training efficiency on five datasets from two task domains, Knowledge Tracing and News Recommendation, where single-step and multi-step GRAM achieve 4x and 45x training speedups on average, respectively.
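
The redundancy that single-step GRAM removes can be seen in a tiny sketch: if a batch touches the same item several times, the item's content only needs one PLM forward pass (and its gradients one accumulation), not one per interaction. Variable names are illustrative:

```python
import numpy as np

def deduplicate_batch_items(interaction_item_ids):
    """Return the unique item ids to encode once, plus the index map
    that scatters the encodings back to each interaction."""
    unique_ids, inverse = np.unique(interaction_item_ids, return_inverse=True)
    return unique_ids, inverse

batch = np.array([5, 2, 5, 5, 9, 2])  # items hit by one training batch
unique_ids, inverse = deduplicate_batch_items(batch)
print(unique_ids)  # [2 5 9] -> 3 content encodings instead of 6
print(inverse)     # [1 0 1 1 2 0] -> gather back per interaction
```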

19. Positive and Negative Critiquing for VAE-based Recommenders

Diego Antognini, Boi Faltings

https://arxiv.org/abs/2204.02162

Providing explanations for recommended items allows users to refine the recommendations by critiquing parts of the explanations. As a result of revisiting critiquing from the perspective of multimodal generative models, recent work has proposed M&Ms-VAE, which achieves state-of-the-art performance in terms of recommendation, explanation, and critiquing. M&Ms-VAE and similar models allow users to negatively critique (i.e., explicitly disagree). However, they share a significant drawback: users cannot positively critique (i.e., highlight a desired feature). We address this deficiency with M&Ms-VAE+, an extension of M&Ms-VAE that enables positive and negative critiquing. In addition to modeling users' interactions and keyphrase-usage preferences, we model their keyphrase-usage dislikes. Moreover, we design a novel critiquing module that is trained in a self-supervised fashion. Our experiments on two datasets show that M&Ms-VAE+ matches or exceeds M&Ms-VAE in recommendation and explanation performance. Furthermore, our results demonstrate that representing positive and negative critiques differently enables M&Ms-VAE+ to significantly outperform M&Ms-VAE and other models in positive and negative multi-step critiquing.

20. CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU

Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qing, Youlong Cheng, Yang You

https://arxiv.org/pdf/2204.06240

The click-through rate (CTR) prediction task is to predict whether a user will click on a recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical for keeping models up-to-date and reducing training cost. One approach to increasing training speed is large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from accuracy loss. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first theoretically show that the different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times the original without accuracy loss. In particular, for the DeepFM CTR prediction model trained on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/zhengzangw/LargeBatchCTR.
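
As an illustration of column-wise clipping (a simplified rule in the spirit of CowClip, not the paper's exact formula, which also accounts for per-id frequencies), each embedding id's gradient can be clipped relative to the norm of that id's current weights:

```python
import numpy as np

def columnwise_clip(grad, weights, ratio=0.01, eps=1e-12):
    """Clip each id's embedding gradient so its norm is at most
    `ratio` times the norm of that id's weight vector."""
    g_norm = np.linalg.norm(grad, axis=1, keepdims=True)
    w_norm = np.linalg.norm(weights, axis=1, keepdims=True)
    scale = np.minimum(1.0, ratio * w_norm / (g_norm + eps))
    return grad * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8))      # embedding rows for 4 ids
grad = 10.0 * rng.normal(size=(4, 8))  # an aggressive large-batch gradient
print(np.linalg.norm(columnwise_clip(grad, weights), axis=1))  # bounded per id
```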

Recommended Reading

When Recommender Systems Meet Multimodal Embedding
KDD2022@Workshop | Call for Papers on Trustworthy Recommender Systems
Latest Research Advances in Recommender Systems
