Poster 114: Optimizing Recommendation System Inference Performance Based on GPU

SC19 Proceedings

Authors: Xiaowei Shen (Alibaba Inc), Junrui Zhou (Alibaba Inc), Kan Liu (Alibaba Inc), Lingling Jin (Alibaba Inc), Pengfei Fan (Alibaba Inc), Wei Zhang (Alibaba Inc), Jun Yang (University of Pittsburgh)

Abstract: Neural network-based recommendation models have been widely applied to tracking, personalization, and recommendation tasks at large Internet companies such as e-commerce and social media companies. The Alibaba recommendation system deploys WDL (wide and deep learning) models for product recommendation. A WDL model consists of two main parts: an embedding lookup, and a neural network-based feature ranking model that ranks different products for different users. As the number of products and users the model must rank grows, the feature length and batch size of the model increase. The computational cost grows with them, so a traditional CPU-based inference implementation can no longer meet the QPS (queries per second) and latency requirements of recommendation tasks. In this poster, we develop a GPU-based system to speed up recommendation system inference. Through model quantization and graph transformation, we achieve a 3.9x speedup over a baseline GPU implementation.
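To make the two-part WDL structure and the idea of model quantization concrete, here is a minimal CPU sketch in plain Python. It is not the authors' GPU system. Everything below is an illustrative assumption not stated in the poster: the tiny hand-picked weights, sum-pooling of embedding rows, a single ReLU hidden layer, a sigmoid output, and symmetric per-tensor int8 quantization (q = round(v / scale), scale = max|v| / 127). The helper names (`embedding_lookup`, `quantize_int8`, `wdl_score`, and so on) are hypothetical, and graph transformation is not shown.

```python
"""Hypothetical sketch of Wide & Deep (WDL) scoring with int8 weight quantization.

Assumptions (not from the poster): sum-pooled embeddings, one ReLU hidden layer,
sigmoid output, and symmetric per-tensor int8 quantization.
"""
import math
from typing import List, Sequence, Tuple


def embedding_lookup(table: Sequence[Sequence[float]], ids: Sequence[int]) -> List[float]:
    """Gather the embedding rows for sparse feature ids and sum-pool them."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for i in ids:
        for d in range(dim):
            pooled[d] += table[i][d]
    return pooled


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(v / scale), scale = max|v| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def wdl_score(wide_x, wide_w, wide_b, emb_table, ids, hidden_w, hidden_b, out_w, out_b) -> float:
    # Wide part: a linear model over (cross) features.
    wide_logit = sum(w * x for w, x in zip(wide_w, wide_x)) + wide_b
    # Deep part: embedding lookup followed by a small MLP with ReLU.
    h = [max(0.0, v) for v in dense(embedding_lookup(emb_table, ids), hidden_w, hidden_b)]
    deep_logit = sum(w * v for w, v in zip(out_w, h)) + out_b
    # Joint logit -> probability used to rank candidate products.
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit)))


if __name__ == "__main__":
    emb = [[0.10, -0.20], [0.40, 0.05], [-0.30, 0.25]]
    # Quantize the whole embedding table as one tensor, then dequantize for lookup.
    flat = [v for row in emb for v in row]
    q, scale = quantize_int8(flat)
    deq = dequantize(q, scale)
    emb_q = [deq[i:i + 2] for i in range(0, len(deq), 2)]
    args = dict(wide_x=[1.0, 0.0], wide_w=[0.3, -0.1], wide_b=0.05, ids=[0, 2],
                hidden_w=[[0.5, -0.4], [0.2, 0.6]], hidden_b=[0.0, 0.1],
                out_w=[0.7, -0.3], out_b=0.0)
    p_fp = wdl_score(emb_table=emb, **args)
    p_q = wdl_score(emb_table=emb_q, **args)
    print(f"float score {p_fp:.4f}, int8-quantized score {p_q:.4f}")
```

Storing the table as int8 codes plus one float scale cuts its memory footprint roughly 4x relative to float32, at the cost of a bounded rounding error of at most scale / 2 per value. Real deployments often use finer-grained (per-row or per-channel) scales and keep the arithmetic in integer form on the GPU, which this sketch does not attempt.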

Best Poster Finalist (BP): no