21st Italian Symposium on Advanced Database Systems

(21mo Convegno Nazionale su Sistemi Evoluti per Basi di Dati)

June 30th - July 3rd, 2013, Roccella Jonica

Invited Speakers and Tutorialists

Minos Garofalakis

ECE Department, Technical University of Crete
Invited Talk: Querying Big, Dynamic, Distributed Data

Abstract: Effective Big Data management and analysis pose several difficult challenges for modern database architectures. One such key challenge arises from the naturally streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this talk, we introduce the distributed data streaming model, and discuss some of our recent results on tracking complex queries over massive distributed streams, as well as new research directions in this space.
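To give a feel for the one-pass, limited-memory setting described in the abstract, here is a minimal sketch of the Misra-Gries frequent-items summary, a standard textbook streaming technique. It is not taken from the talk and does not cover the distributed aspects; the function name `misra_gries` and the sample stream are illustrative assumptions.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to survive in the returned dictionary; the stored
    counts are underestimates of the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # Free slot available: start tracking the item.
            counters[item] = 1
        else:
            # No free slot: decrement every counter, dropping those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of site identifiers, each seen once and in order.
stream = ["a", "b", "a", "c", "a", "d", "a", "b"]
summary = misra_gries(stream, k=3)
print(summary)
```

The summary never holds more than k - 1 counters regardless of stream length, which is the kind of memory bound the streaming model calls for.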

Speaker Bio: Minos Garofalakis received the Diploma degree in Computer Engineering and Informatics (School of Engineering Valedictorian) from the University of Patras, Greece in 1992, and the MSc and PhD degrees in Computer Science from the University of Wisconsin-Madison in 1994 and 1998, respectively. He worked as a Member of Technical Staff at Bell Labs, Lucent Technologies in Murray Hill, NJ (1998-2005), as a Senior Researcher at Intel Research Berkeley in Berkeley, CA (2005-2007), and as a Principal Research Scientist at Yahoo! Research in Santa Clara, CA (2007-2008). In parallel, he also held an Adjunct Associate Professor position at the EECS Department of the University of California, Berkeley (2006-2008). As of October 2008, he is a Professor of Computer Science at the Department of Electronic & Computer Engineering of the Technical University of Crete, and the Director of the Software Technology and Network Applications Laboratory (SoftNet); he is also the current ECE Department Chair (2011-2013). Prof. Garofalakis' research interests include database systems, data streams, data synopses and approximate query processing, probabilistic databases, and data mining. His work has resulted in over 120 published scientific papers in these areas, and 35 US Patent filings (27 patents issued) for companies such as Lucent, Yahoo!, and AT&T. Harzing's Publish-or-Perish gives over 7500 citations to his work, and an h-index value of 46. Prof. Garofalakis is an ACM Distinguished Scientist (2011), and a recipient of the IEEE ICDE Best Paper Award (2009), the Bell Labs President's Gold Award (2004), and the Bell Labs Teamwork Award (2003).

Amit Sheth

Kno.e.sis - Wright State University
Invited Talk: Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

Abstract: Big Data has captured much interest in research and industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on technology that handles volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc.), and the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity. However, the most important feature of data, the raison d'etre, is neither volume, variety, velocity, nor veracity, but value. In this talk, I will emphasize the significance of Smart Data, and discuss how it can be realized by extracting value from Big Data. Accomplishing this task requires organized ways to harness and overcome the original four V-challenges; and while the technologies currently touted may provide some necessary infrastructure, they are far from sufficient. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and leverage some of the extensive work that predates Big Data. For Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration, and discuss how this cannot simply be wished away using NoSQL. Lastly, for Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships to better understand new cues in the data that capture rapidly evolving events and situations.
More information, including the talk, is available at:
http://wiki.knoesis.org/index.php/Smart_Data
http://www.slideshare.net/apsheth/big-data-to-smart-data-keynote

Bio Sketch: Amit P. Sheth is an educator, researcher, and entrepreneur. He is a LexisNexis Eminent Scholar (an endowed faculty position, funded by LexisNexis and the Ohio Board of Regents) at Wright State University. He directs the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), which conducts research in Web 3.0 and applications to healthcare and life sciences, cognitive science, and defense/intelligence. Kno.e.sis' activities have resulted in Wright State University being recognized as a top organization in the world on World Wide Web in research impact. Prof. Sheth's research has led to several commercial products, many real-world applications, and two companies which he founded and managed in various executive roles (President/CEO/CTO): Infocosm, and Taalee/Voquette/Semagix, which was likely the first company that developed Semantic Web applications and application development platforms. Professor Sheth is an IEEE Fellow and has received recognitions such as the IBM Faculty award. He is among the 100 most cited authors in Computer Science (h-index of 80) and among the top authors in WWW and databases. He is on several journal editorial boards, is the Editor-in-Chief of the International Journal on Semantic Web and Information Systems (IJSWIS), and is the joint-EIC of Distributed & Parallel Databases Journal.

Fabrizio Angiulli and Luigi Palopoli

DIMES, University of Calabria (Italy)
Tutorial: Outlier detection: tasks and techniques

Abstract: Outlier detection is a premier family of tasks in data mining. Intuitively, outlier detection amounts to recovering, within a given database, those individuals that significantly differ from any (homogeneous) group of observations stored therein. Because of its ample applicability over real domains (fraud detection, healthcare and so on), this family of tasks has gained increasing popularity in the research literature. This tutorial aims to describe some of the main specific outlier detection problems and show the techniques that have been developed in order to approach them. To illustrate, we shall concentrate first on outlier detection from data, by discussing the principal definitions introduced in the data mining literature (including, e.g., those of distance-based and density-based outliers) and then presenting some of the associated efficient algorithms available to date. Then, the issue of outlier explanation is dealt with, that is, given an outlier and a reference dataset, singling out the properties justifying the outlier's abnormality with respect to the provided dataset. In this case too, definitions and techniques will be illustrated by looking at the problem as a special kind of outlier detection in (attribute) subspaces. Finally, the problem of outlier detection in knowledge bases will be considered.
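As a rough illustration of the distance-based notion mentioned in the abstract, the sketch below scores each point by the distance to its k-th nearest neighbour and reports the highest-scoring points. This is one common formulation from the literature, written as a naive quadratic-time baseline; it is not the tutorial's own material, and the function names and sample data are illustrative assumptions.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean the point lies farther from its neighbourhood.
    Naive O(n^2 log n) version; assumes len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest scores."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical data: a tight cluster near the origin plus one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(data, k=2, n=1))  # prints [4]
```

The efficient algorithms referred to in the abstract avoid computing all pairwise distances; the sketch only fixes the definition they aim to evaluate.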

Bio Sketch: Fabrizio Angiulli is an associate professor of computer engineering at DIMES, University of Calabria, Italy. Previously, he held a research and development position at ICAR of the National Research Council of Italy and, after that, a tenured assistant professor position at DEIS, University of Calabria. His main research interests are in the area of data mining, notably outlier detection and classification techniques, knowledge representation and reasoning, and database management and theory. He has authored more than sixty papers appearing in premier journals and conference proceedings including ACM TOCL, ACM TODS, ACM TKDD, IEEE TPAMI, IEEE TKDE, IEEE TNN, AIJ, DAMI, TCS, PODS, AAAI, IJCAI, ICDM, ICML and others. He regularly serves on the program committees of several conferences and, as an associate editor, on the editorial board of AI Communications. Fabrizio Angiulli is a Senior Member of the IEEE.

Bio Sketch: Luigi Palopoli has been a full professor of computer engineering at DIMES, University of Calabria, since 2003, where he leads the Artificial Intelligence and Data Analysis research group. Previously he held an assistant professor position (1991-1998) and an associate professor position (1998-2000) at DEIS, University of Calabria, and a full professor position (2000-2003) at DIMET, University "Mediterranea" of Reggio Calabria. He was a visiting scholar at UCLA, Technical University of Wien, AT&T Laboratories and Oxford University. His research interests are in the area of data mining, knowledge representation, artificial intelligence, game theory and bioinformatics. He has authored more than 150 papers appearing in research journals and conference proceedings including ACM TODS, ACM TOCL, IEEE TKDE, AIJ, JAIR, ACM PODS, IJCAI and others. He has served on the program committees of many conferences and is on the editorial board of AI Communications. He is a co-founder and member of the board of directors of Exeura, a spin-off company of the University of Calabria working in the area of knowledge management.