Large-Scale Multilevel Streaming Data Analytics

Workshop Chair(s): Farhana Zulkernine, Haruna Isah
Theme: Data and Analytics
Date: Tuesday, November 5th (Afternoon)
Room: Primrose
Format: Speakers
Level: Beginner
Description: Motivation and Justification:

There is a monumental shift happening in how data powers organizational and business operations. The shift is away from traditional batch analytics and towards real-time and hybrid analytics over both static and continuous data, which avoids delays in generating insights and the need to store massive amounts of streaming data. Many analytics systems now use stream processing, without storing the data, to quickly ingest, analyze, and correlate information as it arrives from thousands of real-time sources (devices, sensors, and applications). Such systems often provide real-time dashboards and critical alerts, and are therefore required to be fast, efficient, effective, scalable, and reliable.

In most cases, stream processing is followed by batch processing for deeper analysis. Modern streaming analytics systems therefore try to unify batch and streaming analytics into a seamless data processing pipeline. A general architecture of a large-scale multilevel analytics system consists of (i) an ingestion mechanism at the front-end, (ii) streaming and batch data processing engines for data transformation, scoring, modelling of historical data, and real-time prediction, (iii) data storage units for persisting, indexing, searching, and knowledge management, (iv) a resource management unit for coordinating distributed compute and storage resources, and (v) visualization units to present results and knowledge for decision support.

Some of the deeper analytics on streaming data require longer execution times and can choke the data processing pipeline. Combining stream and batch analytics solves that problem. However, as we move towards the Internet of Things (IoT), this approach will face serious computational and storage challenges. Innovative solutions are needed to selectively store streaming data, enable near-real-time micro-batch processing, and perform multilevel in-memory analytics.

Large-scale multilevel analytics on a unified platform is gaining attention in industry because it can potentially enhance business and operational decision making. However, it faces the following challenges: (a) implementing an efficient front-end for ingestion and integration of massive data streams from across the globe, (b) combining streaming and in-memory data analytics, (c) developing a knowledge management strategy to store, manage, and link big data and distributed knowledge, and (d) other challenges including cluster management, knowledge representation, and visualization. Together, these make the development of methods, algorithms, and infrastructures for multilevel streaming analytics a difficult but interesting research problem.
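
As a rough illustration of the selective storage and micro-batch processing mentioned above, the following minimal sketch groups a stream of numeric readings into small batches, analyzes each batch in memory, and keeps only the batches whose mean exceeds a threshold. The function names, the batch size, and the mean-above-threshold rule are assumptions made for this example only; they are not taken from any particular streaming engine or from the workshop material.

```python
from statistics import mean
from typing import Iterable, Iterator, List, Tuple


def micro_batches(stream: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Group an incoming stream into fixed-size micro-batches (hypothetical helper)."""
    batch: List[float] = []
    for reading in stream:
        batch.append(reading)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def summarize_stream(
    stream: Iterable[float], batch_size: int = 4, threshold: float = 10.0
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Analyze each micro-batch in memory and store only the interesting ones.

    Assumption for this sketch: a batch is "interesting" when its mean
    exceeds `threshold`. All other readings are discarded after analysis.
    """
    alerts: List[Tuple[int, float]] = []  # (batch index, batch mean)
    stored: List[float] = []              # selectively persisted readings
    for index, batch in enumerate(micro_batches(stream, batch_size)):
        batch_mean = mean(batch)
        if batch_mean > threshold:
            alerts.append((index, batch_mean))
            stored.extend(batch)
    return alerts, stored
```

For example, calling `summarize_stream([1.0, 2.0, 3.0, 4.0, 20.0, 22.0, 21.0, 19.0, 5.0, 6.0])` raises one alert for the second batch and stores only its four readings; the other six readings are analyzed and then dropped. A production system would replace the in-memory lists with a real ingestion layer and storage unit.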

Goals and Outcomes:

This workshop aims to provide a forum for researchers and industry practitioners to discuss new ideas and share their experiences in the areas of streaming data analytics. Participants will present their work on topics including methods, models, algorithms, infrastructures, quality issues, applications, and open problems for large-scale streaming data analytics. The workshop can serve as a guide for organizations and individuals planning to implement a real-time data stream processing and multilevel data analytics framework.

Workshop Structure:

The half-day workshop will feature invited talks by experts, practitioners, researchers, and industry partners working on massive streaming analytics research. Time is set aside for discussion after each presentation so that attendees can share their comments and views and ask the speaker questions.
Agenda: Large-scale Multilevel Streaming Data Analytics
Workshop on Tuesday Oct 30th 3:15-5:15 PM
Room: Primrose
1. Introductory note at 3:15 PM by the workshop chair and co-chair

Workshop Chair
Farhana Zulkernine, PhD, PEng
Assistant Professor and Director, BAM Lab
Coordinator, Cognitive Science Program
School of Computing, Queen's University

Workshop Co-chair
Haruna Isah, PhD
IBM SOSCIP Postdoctoral Fellow
Bigdata Management and Analytics Laboratory
School of Computing, Queen's University

2. Title: Interpreting Data for Robotics
By Sidney Givigi
Associate Professor
Royal Military College of Canada (RMC)

Robots operate based on sensors, such as cameras and lidars, that generate large amounts of data. To make decisions, robots need to interpret these data. However, interpretation depends on how robots represent their world, i.e., on the models robots use to interact with their environment. This talk will discuss the relationships between sensor interpretation, models, and machine learning. The emphasis will be on how models based on machine learning techniques can be used to allow robots to interact with their environments through sensors and actuators.

3. Title: Parallelizing Trajectory Stream Analysis on Cloud Platforms
By Yan Liu
Associate Professor, Electrical and Computer Engineering
Faculty of Engineering and Computer Science
Concordia University

Advances in location-acquisition technologies have generated massive spatio-temporal trajectory streams, which represent the mobility over time of diverse moving objects such as people, vehicles, and animals. Discovery of traveling companions in trajectory data has many real-world applications. A key characteristic of trajectory streams is that large volumes of time-stamped data are constantly generated from diverse and geographically distributed sources. Techniques for handling high-speed trajectory streams should therefore scale on distributed computing clusters. The main issues span three aspects: the design of discovery algorithms to represent the continuous trajectory data, the parallel implementation on a cluster, and the optimization techniques.

The goal of this talk is to provide a practical view of the techniques and challenges of parallelizing the analysis of big trajectory streams. In this talk, we present the following topics: (1) the discovery algorithms to identify the trends in data trajectories; (2) the parallel programming frameworks for both batch and streaming implementation; and (3) the optimization techniques on data partition and data shuffling on clustered nodes running on a public cloud.

4. Title: Everyday Soft Items Turned into Connected Systems to Improve Patient Care Efficiency
By Edward Shim
Managing Director of Studio 1 Labs

Analytics and accurate forecasting depend on the quality of the information available. Healthcare faces fragmented bits of information that can hinder well-informed decisions about health outcomes. IoT has helped with low-cost sensors and more powerful computing capability, yet fragmented information has remained a product of technological limitations. The future of patient monitoring is an intelligent bed sheet system with fabric sensor technology that requires no change in behaviour to adopt. Because it offers the comfort of an ordinary sheet, users keep using it over the long term, which leads to more consistent results overall.

5. Title: Anomaly Detection and Monitoring as Security Analytics
By Marwa Elsayed
Postdoctoral Fellow
School of Computing, Queen's University

Big data and analytics have become cornerstones not only of the modern IT landscape but also of the security field. As cyber-attacks grow in sophistication and frequency, security analytics grows even more in importance and potential. In this talk, I will present real-time security monitoring as a service (SMaaS). SMaaS is a novel framework that aims to detect security anomalies in cloud analytical applications running on Hadoop clusters. It aims to detect vulnerable, malicious, and misconfigured applications that violate data integrity and confidentiality. To achieve this goal, we leverage a streaming data pipeline that combines advanced software technologies (Apache NiFi, Hive, and Zeppelin) to automate the collection, management, analysis, and visualization of log data from multiple sources, making it cohesive and comprehensive for security inspection. SMaaS monitors a candidate application by collecting log data in real time. It then uses log data analysis to model the application's execution in terms of information flow. The information flow model is crucial for profiling the processing activities conducted throughout the application's execution. Such a model, in turn, enables the detection of various types of security anomalies with high accuracy and speed.

Workshop Speaker(s)
Sidney Givigi
Institution: Royal Military College of Canada (RMC)
Bio: Sidney Givigi is currently an Associate Professor with the Royal Military College of Canada (RMC). His main interests are in control theory, autonomous robotics, unmanned aerial vehicles, sensors, and machine learning.
Topic: Interpreting Data for Robotics
Yan Liu
Institution: Concordia University
Bio: Yan Liu is an Associate Professor in the Faculty of Engineering and Computer Science, Concordia University. Dr. Liu has over 15 years of research experience developing data-intensive algorithms on distributed and parallel computing systems. Before her faculty position, she was a senior research scientist at the Pacific Northwest National Laboratory (PNNL) in Washington State, delivering high-performance and scalable data analysis platforms for the domains of power systems, scientific computing, and engineering simulation. Her recent research focuses on parallel and distributed machine learning, and on automatically scaling back-end computing resources, also by means of machine learning.
Topic: Parallelizing Trajectory Stream Analysis on Cloud Platforms
Edward Shim
Institution: Studio 1 Labs
Bio: Edward Shim is Managing Director of Studio 1 Labs, a global leader in fabric sensor technology. Edward oversaw 12 projects to validate a newly developed sensor used in a health application, and achieved clinical validation within two years, supported by academic collaborations, industry resources with high-performance computing, and funding across all levels of government in Canada.
Topic: Everyday Soft Items Turned into Connected Systems to Improve Patient Care Efficiency
Marwa Elsayed
Institution: Queen's University
Bio: Marwa Elsayed is a postdoctoral fellow in the School of Computing, Queen's University, Canada, where she is a member of the Queen's Reliable Software Technology (QRST) research group. Dr. Elsayed received her PhD in Computing from Queen's University in September 2018. She received her B.Sc. and M.Sc. degrees in Computer Science from the School of Computer and Information Sciences, Ain Shams University, Cairo, Egypt. She has 10+ years of interdisciplinary academic teaching and research experience. Her research interests span software security, cloud computing, security analytics, security services, and the Internet of Things. Dr. Elsayed has received several recognitions and two best paper awards at top international conferences. More information about her research work and publications is available at∼marwa.
Topic: Anomaly Detection and Monitoring as Security Analytics