
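Registration | Call for Attendees: Symposium on Frontier Technologies in Intelligent Media and the Tsinghua University "Computing the Future" (计算未来) Doctoral and Master's Forum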

2017-06-15 数据派THU


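Time: June 17

Venue: Room 1-312, FIT Building, inside the East Gate of Tsinghua University

Symposium chair: Sun Lifeng (孙立峰)

Forum chairs: Ma Ming (马茗), Zhang Wenpeng (张文鹏)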


Schedule



Speakers


Dr. Wang Tao (王涛)

Chief Scientist, iQIYI


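Speaker: Dr. Wang Tao is a council member and Distinguished Member of the China Computer Federation (CCF), deputy director of the Technical Committee on Computer Vision, and Chief Scientist at iQIYI. He received his bachelor's degree from the Department of Electronic Engineering at the University of Science and Technology of China in 1996, his master's degree from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, in 1999, and his Ph.D. from the Department of Computer Science and Technology at Tsinghua University in 2003. In 2003 he joined the Applications Research Lab of Intel Labs China as a senior researcher. Since joining iQIYI in 2014, he has worked on multimedia intelligent analysis, computer vision, pattern recognition, virtual reality, data mining, and related technologies, and has led and taken part in iQIYI projects including shop-while-you-watch (随视购), video clipping (视频拆条), video moderation, search-by-image for shows (以图搜剧), dialogue search, scene highlights, audio watermarking, and VR rendering and live streaming. He has published more than sixty papers in international journals and conferences such as IJCV, ACM MMSJ, CVPR, and ACM Multimedia, has filed more than thirty patents, and has translated one book, 《软件优化技术》 (Software Optimization Techniques).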


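Talk title: Video Understanding Based on Deep Learning


Abstract: In recent years, with the rapid development of cloud computing, big data analytics, artificial intelligence, and related technologies, Internet video has grown quickly, bringing many innovations and challenges in intelligent video processing and user experience. The talk will cover the current state of video media; iQIYI's deep-learning-based shop-while-you-watch (随视购), search-by-image for shows (以图搜剧), dialogue search, and intelligent description technologies; and the applications and requirements of the iQIYI Paopao (爱奇艺泡泡) social network.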


Li Xiaobo (李晓波)

Senior Expert, Alibaba Wireless Technology


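Speaker: Li Xiaobo, known within Alibaba by the nickname Liyou (篱悠), received his master's degree from the School of Software at Peking University in 2009. Since joining Alibaba after graduation, he has worked in several departments, including B2B, Alibaba Cloud, and the Wireless Business Unit. He currently works in Taobao's Product Technology Department, where he is mainly responsible for applying and deploying audio and video algorithms on client devices. His team has implemented on-device real-time H.265 decoding, real-time face detection and filter effects, real-time pedestrian detection, and other algorithms, supporting the healthy growth of Mobile Taobao (手淘) audio and video services such as live streaming and short video.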

 

Talk title: Applications and Practice of Audio and Video Technology in Mobile Taobao

 

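Abstract: Over the past two years, audio and video have accounted for a growing share of total network traffic. It is predicted that by 2019 audio and video will make up about 80% of all network traffic. Against this backdrop, what challenges and opportunities will Mobile Taobao, Alibaba's largest traffic entry point, face? And what challenges arise when audio and video algorithms are applied on client devices with relatively limited computing power? This talk reports on the concrete work we have done on real-time algorithm optimization in support of the healthy growth of Mobile Taobao's audio and video services.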


Ma Lin (马林)

Senior Research Engineer, Tencent AI Lab


Speaker bio: Lin Ma is now a Senior Research Engineer with Tencent AI Lab, Shenzhen, China. Previously, he was a Researcher with Huawei Noah's Ark Lab, Hong Kong, from Aug. 2013 to Sep. 2016. He received his Ph.D. degree from the Department of Electronic Engineering at the Chinese University of Hong Kong (CUHK) in 2013. He received the B.E. and M.E. degrees from Harbin Institute of Technology, Harbin, China, in 2006 and 2008, respectively, both in computer science. His current research interests lie in the areas of deep learning and multimodal learning, specifically for image and language, image/video understanding, and quality assessment.

 

Dr. Ma received the Best Paper Award at the Pacific-Rim Conference on Multimedia (PCM) 2008. He was awarded the Microsoft Research Asia Fellowship in 2011. He was a finalist for the HKIS Young Scientist Award in engineering science in 2012.

 

Talk title: Video Processing and Understanding

 

Abstract: In this talk, we will give a brief introduction to our recent work on video processing and understanding. First, we will introduce our proposed video style transfer, in which a convolutional neural network is employed to preserve the original image content, introduce the style information, and maintain temporal consistency between consecutive frames. Second, we will introduce video classification based on neural networks. Video representations can be learned from scratch from the raw pixel information for effective classification, or off-the-shelf frame representations can be used to yield the final classification. Finally, we will discuss computer vision research conducted at Tencent AI Lab.


Li Cheng (李诚)

Senior Researcher, SenseTime (商汤科技)


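Speaker: Li Cheng is a senior researcher at SenseTime and studied in the Department of Physics at Tsinghua University. He joined SenseTime after graduating in 2013 and is one of its co-founders. He has published multiple papers at top conferences such as CVPR and ICCV, and currently leads SenseTime's research on face-related algorithms and its large-scale data acquisition and analysis.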


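Talk title: Building a Bridge from Technology to Users: A First Look at Bringing Computer Vision Products to Market


Abstract: With the rapid progress of computer vision over the past five years, more and more algorithms have made their way into real products. Yet how many problems must be solved to turn an algorithm that scores well on a dataset into a product in customers' hands? Does a more accurate algorithm necessarily deliver a better user experience? Drawing on some of the experience and lessons SenseTime has accumulated in recent years while delivering computer vision algorithms to customers, this talk discusses these questions and offers an outlook on technologies that may emerge next.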


Student papers


Session 1: Networked Multimedia


Mobile Contextual Recommender System for Online Social Media(杨博文)

 

Abstract: Exponential growth of media consumption in online social networks demands effective recommendation to improve the quality of experience, especially for on-the-go mobile users. By means of large-scale trace-driven measurements over mobile Twitter traces from users, we reveal the significance of affective features in shaping users' social media behaviors. Existing recommender systems, however, rarely support this psychological effect in real life. To capture this effect, in this paper we propose Kaleido, a real mobile system that achieves affect-aware, learning-based social media recommendation. Specifically, we design a machine learning mechanism to infer the affective feature within media contents. Furthermore, a cluster-based latent bias model is provided for jointly training the affect, behavior, and social contexts. Our comprehensive experiments on an Android prototype show a prediction accuracy of 82%, with more than 20% accuracy improvement over existing mobile recommender systems. Moreover, by enabling users to offload their machine learning procedures to the deployed edge-cloud testbed, our system achieves a speed-up by a factor of 1,000 over local training on smartphones.

 

CP-Operated DASH Caching via Reinforcement Learning(庞峥元)

 

Abstract: In recent years, Dynamic Adaptive Streaming over HTTP (DASH) has gained momentum as an effective solution for delivering videos on the Internet. This trend is further driven by the deployment of existing HTTP cache infrastructures in DASH systems to reduce the traffic load as well as to serve clients better. However, deploying conventional cache servers in DASH systems still suffers from low cache hit ratios and bitrate oscillations, which makes it challenging for content providers (CPs) to balance the user-perceived quality of experience (QoE) and the operating cost in cache-enabled DASH systems. To address this challenge, we propose a CP-operated DASH caching framework to provide good user QoE at low cost. In particular, we first formulate the caching decision problem as a stochastic optimization problem over a finite time horizon. The objective of this problem is to maximize a weighted sum of the user QoE and the operating cost, termed the utility. Then we design a reinforcement-learning-based online algorithm that can obtain an approximately optimal solution to this problem. Through extensive trace-driven experiments, we show that our approach not only achieves a 40% average improvement in overall utility compared to baseline approaches, but also adapts to the server load.

 

Joint Request Balancing and Content Aggregation in Crowdsourced CDN(马茗)

 

Abstract: Recent years have witnessed a new content delivery paradigm named crowdsourced CDN, in which devices deployed at the network edge can prefetch contents and provide content delivery service. Crowdsourced CDN offers a high-quality experience to end users by reducing their content access latency, and alleviates the load on the network backbone by making use of network and storage resources at millions of edge devices. In such a paradigm, redirecting content requests to proper devices is critical for user experience. Request redirection in crowdsourced CDN is unique in two respects: on the one hand, the bandwidth capacity of the crowdsourced CDN devices is limited, hence devices located at a crowded place can be easily overwhelmed when serving nearby user requests; on the other hand, the contents requested at one device can be significantly different from those at another, which makes the request redirection strategies used in conventional CDNs, which aim only to balance request loads, ineffective. In this paper, we explore request redirection strategies that take both the workload balance of devices and the content requested by users into consideration. Our contributions are as follows. First, we conduct measurement studies, covering 1.8M users watching 0.4M videos, to understand request patterns in crowdsourced CDN. We observe that the loads of nearby devices can be very different and the contents requested at nearby devices can also be significantly different. These observations lead to our design for request balancing at nearby devices. Second, we formulate the request redirection problem by taking both the content access latency and the content replication cost into consideration, and propose a request balancing and content aggregation solution. Finally, we evaluate the performance of our design using trace-driven simulations, and observe that our scheme outperforms the traditional strategies on many metrics; for example, it reduces content access latency by 50% compared with traditional mechanisms such as the Nearest/Random request routing scheme.

 

Session 2: Graphics

 

Multiphase SPH Simulation for Interactive Fluids and Solids(严枭)


Abstract: This work extends existing multiphase-fluid SPH frameworks to cover solid phases, including deformable bodies and granular materials. In our extended multiphase SPH framework, the distribution and shapes of all phases, both fluids and solids, are uniformly represented by their volume fraction functions. The dynamics of the multiphase system is governed by conservation of mass and momentum within different phases. The behavior of individual phases and the interactions between them are represented by corresponding constitutive laws, which are functions of the volume fraction fields and the velocity fields. Our generalized multiphase SPH framework does not require separate equations for specific phases or tedious interface tracking. As the distribution, shape, and motion of each phase are represented and resolved in the same way, the proposed approach is robust, efficient, and easy to implement. Various simulation results are presented to demonstrate the capabilities of our new multiphase SPH framework, including deformable bodies, granular materials, interaction between multiple fluids and deformable solids, flow in porous media, and dissolution of deformable solids.

 

Interactive Image-Guided Modeling of Extruded Shapes(曹炎培)

 

Abstract: A recent trend in interactive modeling of 3D shapes from a single image is designing minimal interfaces, and accompanying algorithms, for modeling a specific class of objects. Expanding upon the range of shapes that existing minimal interfaces can model, we present an interactive image-guided tool for modeling shapes made up of extruded parts. An extruded part is represented by extruding a closed planar curve, called base, in the direction orthogonal to the base. To model each extruded part, the user only needs to sketch the projected base shape in the image. The main technical contribution is a novel optimization-based approach for recovering the 3D normal of the base of an extruded object by exploring both geometric regularity of the sketched curve and image contents. We developed a convenient interface for modeling multi-part shapes and a method for optimizing the relative placement of the parts. Our tool is validated using synthetic data and tested on real-world images.

 

Fast Multiple-Fluid Simulation Using Helmholtz Free Energy(杨涛)


Abstract: Multiple-fluid interaction is an interesting and common visual phenomenon we often observe. In this paper, we present an energy-based Lagrangian method that expands the capability of existing multiple-fluid methods to handle various phenomena, such as extraction and partial dissolution. Based on our user-adjusted Helmholtz free energy functions, the simulated fluid evolves from high-energy states to low-energy states, allowing flexible capture of various mixing and unmixing processes. We also extend the original Cahn-Hilliard equation to better simulate complex fluid-fluid interaction and rich visual phenomena such as motion-related mixing and position-based patterns. Our approach is easily integrated with existing state-of-the-art smoothed particle hydrodynamics (SPH) solvers and can be further implemented on top of the position-based dynamics (PBD) method, improving the stability and incompressibility of the fluid during Lagrangian simulation under large time steps. Performance analysis shows that our method is at least 4 times faster than the state-of-the-art multiple-fluid method. Examples are provided to demonstrate the new capability and effectiveness of our approach.


Session 3: Frontier Applications

 

ViVo: Video-Augmented Dictionary for Vocabulary Learning (朱叶霜)

 

Abstract: Research on Computer-Assisted Language Learning (CALL) has shown that the use of multimedia materials such as images and videos can facilitate interpretation and memorization of new words and phrases by providing richer cues than text alone. We present ViVo, a novel video-augmented dictionary that provides an inexpensive, convenient, and scalable way to exploit huge online video resources for vocabulary learning. ViVo automatically generates short video clips from existing movies with the target word highlighted in the subtitles. In particular, we apply a word sense disambiguation algorithm to identify the appropriate movie scenes with adequate contextual information for learning. We analyze the challenges and feasibility of this approach and describe our interaction design. A user study showed that learners were able to retain nearly 30% more new words with ViVo than with a standard bilingual dictionary days after learning. They preferred our video-augmented dictionary for its benefits in memorization and enjoyable learning experience.

 

Intra Frame Flicker Reduction for Parallelized HEVC Encoding(温子煜)

 

Abstract: Existing intra flicker artifact reduction approaches, which target one of the major artifacts in current video encoding techniques, are not compatible with the distributed encoding structure, which is increasingly important in modern computing systems. To solve this problem, we propose a flicker reduction approach that is effective, standard compliant, and especially suitable for parallel and distributed systems. Experimental results show that the proposed approach can reduce the flicker artifact by up to 60% on x265 and 14% on HM.

 

 

Projection-free Distributed Online Learning in Networks(张文鹏)

 

Abstract: The conditional gradient algorithm has regained a surge of research interest in recent years due to its high efficiency in handling large-scale machine learning problems. However, none of the existing studies has explored it in the distributed online learning setting, where locally light computation is assumed. In this paper, we fill this gap by proposing the distributed online conditional gradient algorithm, which eschews the expensive projection operation by exploiting much simpler linear optimization steps. We give a regret bound for the proposed algorithm as a function of the network size and topology, which is smaller on smaller graphs or "well-connected" graphs. Experiments on two large-scale real-world datasets for a multiclass classification task confirm the computational benefit of the proposed algorithm and also verify the theoretical regret bound.
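To make the "linear optimization instead of projection" idea concrete, here is a minimal sketch of the classical, single-machine conditional gradient (Frank-Wolfe) method. It is not the distributed online algorithm from the paper. The quadratic objective, the L1-ball constraint, the 2/(t+2) step size, and the function names `lmo_l1_ball` and `frank_wolfe` are all assumptions chosen for illustration.

```python
def lmo_l1_ball(grad, radius):
    """Linear minimization oracle: a minimizer of <grad, s> over the L1 ball.

    The minimizer is a signed vertex of the ball, so no projection is needed.
    """
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius if grad[i] > 0 else radius
    return s


def frank_wolfe(target, radius=1.0, steps=200):
    """Minimize 0.5 * ||x - target||^2 subject to ||x||_1 <= radius."""
    x = [0.0] * len(target)  # feasible starting point
    for t in range(steps):
        grad = [xi - bi for xi, bi in zip(x, target)]  # gradient of the quadratic
        s = lmo_l1_ball(grad, radius)                  # cheap linear step
        gamma = 2.0 / (t + 2.0)                        # standard step-size schedule
        # A convex combination of two feasible points stays feasible.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


if __name__ == "__main__":
    print(frank_wolfe([0.3, -2.0, 1.0], radius=1.0))
```

Each iterate is a convex combination of feasible points, so the constraint holds throughout without ever projecting back onto the feasible set.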

 

FinePar: Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures (张峰)

 

Abstract: The integrated architecture that features both CPU and GPU on the same die is an emerging and promising architecture for fine-grained CPU-GPU collaboration. However, the integration also brings forward several programming and system optimization challenges, especially for irregular applications. The complex interplay between heterogeneity and irregularity leads to very low processor utilization when running irregular applications on integrated architectures. Furthermore, fine-grained co-processing on the CPU and GPU is still an open problem. In particular, we show in this paper that previous workload partitioning for CPU-GPU co-processing is far from ideal in terms of resource utilization and performance. To solve this problem, we propose a system software named FinePar, which considers architectural differences of the CPU and GPU and leverages fine-grained collaboration enabled by integrated architectures. Through irregularity-aware performance modeling and online auto-tuning, FinePar partitions irregular workloads and achieves both device-level and thread-level load balance. We evaluate FinePar with 8 irregular applications on an AMD integrated architecture and compare it with state-of-the-art partitioning approaches. Results show that FinePar demonstrates better resource utilization and achieves an average of 1.38X speedup over the optimal coarse-grained partitioning method.

