The 5th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

May 27, 2016
Chicago, USA

In Conjunction with 30th IEEE International Parallel & Distributed Processing Symposium
May 23-27, 2016
Chicago Hyatt Regency
Chicago, Illinois USA

Best Paper Award

A Best Paper Award, consisting of a $300 prize and a certificate, will be presented at the workshop. The award is sponsored by Huawei Technologies Co. Ltd.

Program

Title Presenter/Authors

8:30-8:40 AM

Opening remarks

 

8:40-9:30 AM

Keynote: Comparative Performance of a Big Data Problem on a Variety of Highly Parallel Architectures

Peter Kogge (University of Notre Dame, USA)

9:30-10:00 AM

Break

10:00-10:30 AM

A novel scalable DBSCAN algorithm with Spark

Dianwei Han (Northwestern University, USA)

10:30-11:00 AM

A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

Alex Gittens (International Computer Science Institute, USA); Jey Kottalam (University of California, Berkeley, USA); Jiyan Yang (Stanford University, USA); Michael F Ringenburg (Cray Inc., USA); Jatin Chhugani (HiPerform Inc., USA); Evan Racah (Lawrence Berkeley National Laboratory, USA); Mohitdeep Singh (Georgia Institute of Technology, USA); Yushu Yao (LBNL, USA); Curt Fischer, Oliver Ruebel and Benjamin Bowen (Lawrence Berkeley National Laboratory, USA); Norman Lewis (Washington State University, USA); Michael W Mahoney (University of California, Berkeley, USA); Venkat Krishnamurthy (Cray Inc., USA); Prabhat (LBNL, USA)

11:00-11:30 AM

Cache-Aware Approximate Computing for Decision Tree Learning

Orhan Kislal and Mahmut Taylan Kandemir (Penn State University, USA); Jagadish Kotra (The Pennsylvania State University, USA)

11:30-12:00 PM

Accelerating Support Count for Association Rule Mining on GPUs

Vasileios Zois (University of Southern California, USA); Anand Panangadan (California State University, Fullerton, USA); Viktor K. Prasanna (University of Southern California, USA)

12:00-1:30 PM

Lunch

1:30-2:00 PM

A Scheduling Algorithm for Hadoop MapReduce Workflows with Budget Constraints in the Heterogeneous Cloud

Andrew Wylie (Carleton University, Canada); Wei Shi (University of Ontario Institute of Technology, Canada); Jean-Pierre Corriveau (Carleton University, Canada); Yang Wang (Shenzhen Institute of Advanced Technology, P.R. China)

2:00-2:30 PM

An automatic tuning system for solving NP-hard problems in clouds

Yanik Ngoko (University of Paris 13, France); Denis R. Trystram and Valentin Reis (Grenoble Institute of Technology, France); Christophe Cerin (Universite de Paris 13, France)

2:30-3:00 PM

GraQL: A Query Language for High-Performance Attributed Graph Databases

Daniel Gerardo Chavarria and Vito Giovanni Castellana (Pacific Northwest National Laboratory, USA); Alessandro Morari (Pacific Northwest National Laboratory & Universitat Politecnica de Catalunya, USA); David J Haglin and John Feo (Pacific Northwest National Laboratory, USA)

3:00-3:30 PM

Scalable Overlapping Community Detection

Ismail ElHelw (Vrije Universiteit, The Netherlands); Rutger Hofman (Vrije Universiteit Amsterdam, The Netherlands); Wenzhe Li (University of Southern California, USA); Sungjin Ahn (Agency for Defense Development, Korea); Max Welling (University of California, Irvine, USA); Henri Bal (Vrije Universiteit, The Netherlands)

3:30-4:00 PM

Break

4:00-4:20 PM

An Efficient Parallel Nonlinear Clustering Algorithm using MapReduce

Xiang-You Peng, Yu-Bo Yang and Chang-Dong Wang (Sun Yat-sen University, P.R. China); Dong Huang (South China Agricultural University, P.R. China); Jianhuang Lai (Sun Yat-sen University, P.R. China)

4:20-4:40 PM

A New Evaluation System for Scholars and Majors Based on Big-Data Techniques

Wenhua Yu (Jiangsu Big-Data Key Lab for Education Science and Engineering, P.R. China); Wenhua Yu (Harbin Engineering University, P.R. China)

4:40-5:00 PM

Distributed Real-time Data Analytic Frameworks

Sarwar Morshed (Linnaeus University, Sweden); Juwel Rana (Telenor Group Research, Norway and Linnaeus University, Sweden); Marcelo Milrad (Linnaeus University, Sweden)

5:00-5:15 PM

Concluding discussion

 

Keynote talk by Dr. Peter Kogge

Comparative Performance of a Big Data Problem on a Variety of Highly Parallel Architectures

Abstract: Non Obvious Relationship Analysis (NORA) is one of the most stressing classes of Big Data Analytics problems. This talk extends an earlier paper on scaling such problems on a variety of highly parallel architectures, including a newly emerging one that supports mobile threads within a large PGAS memory. Each step of this implementation is sized in terms of how much of four different resources (CPU, memory, disk, and network) might be used. From this, a parameterized model projecting both execution time and resource utilization is used to identify the “tall poles” in performance. A “thought experiment” then uses this model to discover the parameters of a system that would provide both a near 100X speedup and a balanced design in which no resource is badly over- or underutilized.

Bio: PETER M. KOGGE received his Ph.D. in EE from Stanford in 1973. From 1968 until 1994 he was with IBM's Federal Systems Division, and was appointed an IBM Fellow in 1993. In August 1994 he joined the University of Notre Dame as first holder of the endowed McCourtney Chair in Computer Science and Engineering. He has served as both Department Chair and Associate Dean for Research, College of Engineering. He is an IEEE Fellow, a Distinguished Visiting Scientist at JPL, and a founder of Emu Solutions, Inc. He holds over 40 patents and is author of two books, including the first text on pipelining. His Ph.D. thesis led to the Kogge-Stone adder used in many microprocessors. Other projects included EXECUBE - the world's first multi-core processor and first processor on a DRAM chip, the IBM 3838 Array processor, which was for a time the fastest floating point machine marketed by IBM, and the IOP - the world’s second multi-threaded parallel processor which flew on every Space Shuttle. In 2008, he led DARPA’s Exascale technology study group, which resulted in a widely referenced report on technologies for exascale computing, and has had key roles on many other HPC programs. He has received the Daniel Slotnick best paper award (1994), the IEEE Seymour Cray award for high performance computer engineering (2012), the IEEE Charles Babbage award for contributions to the evolution of massively parallel processing architectures (2014), the IEEE Computer Pioneer award (2015), and the Gauss best paper award (2015). His interests are in massively parallel computing paradigms, processing in memory, and the relationship between emerging technology and computer architectures.

Call for Papers

Scaling up machine-learning (ML), data mining (DM) and reasoning algorithms from Artificial Intelligence (AI) for massive datasets is a major technical challenge in the era of "Big Data". The past ten years have seen the rise of multi-core and GPU-based computing. In distributed computing, frameworks such as Mahout, GraphLab and Spark continue to appear, facilitating the scaling up of ML/DM/AI algorithms through higher levels of abstraction. We invite novel works that advance the three fields of ML/DM/AI through the development of scalable algorithms or computing frameworks. Ideal submissions would be characterized as scaling up X on Y, where potential choices for X and Y are provided below.

Scaling up

On

Organization

Program Committee

Important Dates

Paper Guidelines

Submitted manuscripts should be 6-10 single-spaced, double-column pages in 10-point font on 8.5x11-inch pages (IEEE conference style), including figures, tables, and references. Format requirements are posted on the IEEE IPDPS web page.

All submissions must be uploaded electronically at http://edas.info/newPaper.php?c=21782

Past workshops