May 29, 2017
Orlando, Florida, USA
|8:30-9:30am||Keynote 1: Why Tables and Graphs for Knowledge Discovery Systems||John Feo, Northwest Institute for Advanced Computing|
|10:00-10:30am||ExtDict: Extensible Dictionaries for Data- and Platform-Aware Large-Scale Learning (ParLearning-01)||Azalia Mirhoseini, Bita Rouhani, Ebrahim Songhori and Farinaz Koushanfar|
|10:30-11:00am||Coded TeraSort (ParLearning-02)||Songze Li, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali and Salman Avestimehr|
|11:00-11:30am||Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (ParLearning-03)||Nitin Gawande, Joshua Landwehr, Jeff Daily, Nathan Tallent, Abhinav Vishnu and Darren Kerbyson|
|11:30-12:00pm||Efficient and Portable ALS Matrix Factorization for Recommender Systems (ParLearning-04)||Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen and Canqun Yang|
|1:30-2:30pm||Keynote 2: Matrix Factorization on GPUs: A Tale of Two Algorithms||Wei Tan, IBM T. J. Watson Research Center, NY, USA|
|2:30-3:00pm||Large-Scale Stochastic Learning using GPUs (ParLearning-05)||Thomas Parnell, Celestine Duenner, Kubilay Atasu, Manolis Sifalakis and Haris Pozidis|
|3:30-3:50pm||Distributed and in-situ machine learning for smart-homes and buildings: application to alarm sounds detection (ParLearning-06)||Amaury Durand, Yanik Ngoko and Christophe Cérin|
|3:50-4:10pm||The New Large-Scale RNNLM System Based On Distributed Neuron (ParLearning-07)||Dejiao Niu, Rui Xue, Tao Cai, Hai Li and Effah Kingsley|
|4:10-4:30pm||A Cache Friendly Parallel Encoder-Decoder Model without Padding on Multi-core Architecture (ParLearning-08)||Yuchen Qiao, Kenjiro Taura, Kazuma Hashimoto, Yoshimasa Tsuruoka and Akiko Eriguchi|
|4:30-4:45pm||Discussion of ParLearning 2018|
Dr. John Feo, Northwest Institute for Advanced Computing
Why Tables and Graphs for Knowledge Discovery Systems
Abstract: The availability of data is changing the way science, business, and law enforcement operate. Economic competitiveness and national security depend increasingly on the insightful analysis of large data sets. The breadth of analytic processes is forcing knowledge discovery platforms to supplement traditional table-based methods with graph methods that provide better support for sparse data and dynamic relationships among typed entities. While storing data only in tables makes it difficult to discover complex patterns of activities in time and space, tables are the most efficient data structures for storing dense node and edge attributes and for executing simple select and join operations. Consequently, knowledge discovery systems must support both in a natural way without preference. In this talk, I will describe the hybrid data model, SHAD, that we are developing to support both graphs and tables. I will present several high-performance, scalable analytic platforms developed by PNNL for graph analytics, machine learning, and knowledge discovery. I will include an overview of HAGGLE, our proposed graph analytic platform for new architectures.
Bio: DR. JOHN FEO is the Director of the Northwest Institute for Advanced Computing, a joint institute established by Pacific Northwest National Laboratory and the University of Washington. Previously, he managed a large DOD research project in graph algorithms, search, parallel computing, and multithreaded architectures. Dr. Feo received his Ph.D. in Computer Science from The University of Texas at Austin. He began his career at Lawrence Livermore National Laboratory, where he managed the Computer Science Group and was the principal investigator of the Sisal Language Project. Dr. Feo then joined Tera Computer Company (now Cray Inc.), where he was a principal engineer and product manager for the MTA-1 and MTA-2, the first two generations of Cray’s multithreaded architecture. He has taken short sabbaticals to work at Sun Microsystems, Microsoft, and Context Relevant. Dr. Feo has held academic positions at UC Davis and Washington State University.
Dr. Wei Tan, IBM T. J. Watson Research Center, NY, USA
Matrix Factorization on GPUs: A Tale of Two Algorithms
Abstract: Matrix factorization (MF) is an approach to derive latent features from observations. It is at the heart of many algorithms, e.g., collaborative filtering, word embedding and link prediction. Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular methods for solving MF. SGD converges fast, while ALS is easy to parallelize and able to deal with non-sparse ratings. In this talk, I will introduce cuMF, a CUDA-based matrix factorization library that accelerates both ALS and SGD to solve very large-scale MF. cuMF uses a set of techniques to maximize performance on single and multiple GPUs. These techniques include smart access of sparse data leveraging the memory hierarchy, combining data parallelism with model parallelism, and approximate algorithms and storage. On a single machine with up to four Nvidia GPU cards, cuMF can be 10 times as fast, and 100 times as cost-efficient, compared with state-of-the-art distributed CPU solutions. In this talk I will also share lessons learned in accelerating compute- and memory-intensive kernels on GPUs.
Bio: DR. WEI TAN is a Research Staff Member at IBM T. J. Watson Research Center. His research interests include big data, distributed systems, NoSQL and services computing. Currently he works on accelerating machine learning algorithms using scale-up (e.g., GPU) and scale-out (e.g., Spark) approaches. His work has been incorporated into the IBM patent portfolio and software products such as Spark, BigInsights and Cognos. He received the IEEE Peter Chen Big Data Young Researcher Award (2016), Best Paper Award at ACM/IEEE ccGrid 2015, IBM Outstanding Technical Achievement Award (2017, 2016 and 2014), Best Student Paper Award at IEEE ICWS 2014, Best Paper Award at IEEE SCC 2011, Pacesetter Award from Argonne National Laboratory (2010), and caBIG Teamwork Award from the National Institutes of Health (2008). For more information, please visit http://researcher.ibm.com/person/us-wtan and https://github.com/cumf.
Scaling up machine learning (ML), data mining (DM) and reasoning algorithms from Artificial Intelligence (AI) for massive datasets is a major technical challenge in the era of "Big Data". The past ten years have seen the rise of multi-core and GPU-based computing. In parallel and distributed computing, several frameworks such as OpenMP, OpenCL, and Spark continue to facilitate scaling up ML/DM/AI algorithms using higher levels of abstraction. We invite novel works that advance the three fields of ML, DM and AI through the development of scalable algorithms or computing frameworks. Ideal submissions would be characterized as scaling up X on Y, where potential choices for X and Y are provided below.
Proceedings of the ParLearning workshop will be distributed at the conference and will be submitted for inclusion in the IEEE Xplore Digital Library after the conference.
Travel awards: Students with accepted papers have a chance to apply for a travel award. Please find details on the IEEE IPDPS web page.
Submitted manuscripts should be up to 10 single-spaced, double-column pages in 10-point font on 8.5x11 inch pages (IEEE conference style), including figures, tables, and references. Format requirements are posted on the IEEE IPDPS web page.
All submissions must be uploaded electronically at https://easychair.org/conferences/?conf=parlearning2017