The Rise and Fall of Data Science: from Noisy and Missing to Dark and Computationally Useful Data
Time: November 19, 2020, 9:00am – 11:00am EST
Co-Chairs: Radmila Juric, Sang Suh, Elisabetta Ronchieri
Brief description:
One of the most intriguing buzzwords of the second decade of the 21st century is DATA SCIENCE. It overflows academic curricula all over the world and appears to be a driving force behind what we perceive today to be artificial intelligence. However, looking at the practices, computational models, software tools and development environments used by data scientists, there is hardly any evidence that data science provides a scientific approach to resolving modern computational problems. However convenient it was to place the two words DATA and SCIENCE together without agreeing on what their joint meaning would be, we are now creating computations, under the umbrella of data science, which diverge from the basic principles of computer science built over the last 70 years. The problems are numerous. They start with a complete lack of formalism in defining exactly which type of computational model data science generates. They continue with the legitimised distortion of the semantics of huge data sets, which are desperately needed for training algorithms, in order to “clean” or prepare/format data to suit pre-prepared algorithms. Such practices assume that noisy data have no semantics; they treat missing values in data sets as places to be either eliminated or filled with “something”, without knowing their meaning and without any semantic justification for filling them with any data. Tools allegedly perform data science operations upon your data without telling you exactly how they deal with the semantics of the data. They obscure the algorithms behind them in order to automate these computations (without your involvement), perform faster data processing, or give answers to any question one may have.
In this workshop we would like to debate the important problems of:
collecting, formatting and preparing data for further processing, as prescribed in data science
understanding the impact of missing data on the overall semantics of the data set and on the performance of the algorithms which use that data set
analysing the existence and the role of dark data, which has various interpretations in data science and consequently affects the results of computations upon it
believing that imputations in the data set bring trustworthy solutions in predictive algorithms and as such have to be performed
eliminating noisy data without understanding its semantics and purpose/role within the data set
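To make the concern about missing values concrete, the following minimal Python sketch contrasts the two common treatments named above: eliminating missing entries and filling them with "something" (here, the mean of the observed values). The column `readings` and the helper names `drop_missing` and `impute_mean` are hypothetical and chosen only for illustration; they are not taken from any particular data science tool.

```python
from statistics import mean, pvariance

# Hypothetical toy column; None marks a missing value whose meaning is unknown.
readings = [4.0, None, 6.0, 8.0, None, 10.0]


def drop_missing(values):
    """Eliminate missing entries (listwise deletion)."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Fill every missing entry with the mean of the observed values."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]


observed = drop_missing(readings)
imputed = impute_mean(readings)

# Compare size, mean and population variance under the two treatments.
print("deleted:", len(observed), mean(observed), pvariance(observed))
print("imputed:", len(imputed), mean(imputed), pvariance(imputed))
```

In this toy case deletion discards a third of the records, while mean imputation keeps the mean unchanged but shrinks the variance. Neither treatment says anything about why the values were missing, which is exactly the semantic question raised for debate.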
Questions to be answered at the workshop:
Predictive versus logic inference: Would computer scientists agree to using predictive technologies and algorithms and claim the power of predictive inference, without thinking about the manipulation of the meaning of data and using logic with reasoning, in order to claim “real” inference?
Is predictive inference a backbone of what is today called AI?
Does predictive inference really mean marching forward in modern computing because data science favours it?
Is Data Science a culprit which stops us from galloping forward in computer science?
Is there anything else Data Science can give us (apart from predictive inference)?
Does Data Science eliminate exactly what it should be: a SCIENCE which opens doors to many other innovative views on how to manipulate semantics of modern DATA?
Could we agree that predictive technologies are possible companions and not a “sine qua non” of our computations which are expected to pave the way towards artificial intelligence?
Radmila Juric
Dr Radmila Juric has been a lecturer at the Department of Science and Industry Systems, Faculty of Technology, Natural Sciences, and Maritime Sciences, at University of South East Norway since 2015. From 1990 to 2015 she worked in UK higher education, teaching computer science, information systems and business studies. From the late 70s to the 80s, Dr Juric worked as a software engineer in the banking industry in Croatia, and was employed at Zagreb University Computing Centre on projects funded by the banking and public sectors. From 1990 to 2003 she taught at the South Bank University Business School in London, and in 2003 she moved to the Department of Computing at the Westminster University in London. Dr Juric’s UK HE experience ranges from the development, validation and delivery of BSc and MSc curricula and quality assurance, to BSc/MSc project and PhD supervision, research management and interdisciplinary collaborative work. Dr Juric has published 130+ papers, contributed to academic conferences in various roles, delivered talks in academia and in industry, and encouraged new scholars to undertake interdisciplinary research. Her research interests are in Predictive and Learning Technologies, Semantic Computing, Pervasive Computational Spaces across problem domains, Biomedical Semantic and Translational Informatics, e-Health/m-Health, personalized healthcare, precision medicine, and Intelligent Engineering.
Sang Suh
Dr. Sang Suh is a Regents Professor of the Texas A&M University System and a Professor of Computer Science at Texas A&M University-Commerce. His primary research interests are in the areas of artificial intelligence, human computer interaction, data mining, knowledge and data engineering, big data and visual analytics. He has published 5 books and more than 120 research papers in those areas. Dr. Suh has over 25 years of experience in teaching at various universities and in scholarly and research activities, developing practical industrial applications in the areas of data mining, big data and visual analytics, with a special focus on convergence in transdisciplinary areas. He has been recognized for his international leadership in building a strong computational and transdisciplinary science and education community through the founding and various activities of SDPS.
Elisabetta Ronchieri
Elisabetta Ronchieri received her PhD degree in Automation, Robotics and Bioengineering from the University of Pisa, Italy, in 2007. She has been a Computer Science Engineer at INFN (Istituto Nazionale di Fisica Nucleare - National Institute of Nuclear Physics) CNAF in Bologna, Italy, since 2001. In 2020 she obtained the position of adjunct professor at the University of Bologna, teaching courses on Machine Learning. Her research interests include the relationship between the world of software engineering and algorithmic computing/artificial intelligence. Recently she has started to study the correlation among pollution, clinical data and other types of data in relation to the spread of the COVID-19 virus. She has collaborated on various European projects, such as DEEP-Hybrid-DataCloud, The European Middleware Initiative (EMI), SIENA, ETICS 2, ETICS, EGEE and DataGrid. Recently, she has started to collaborate on IOTwins and other national projects, such as PLANET, IDataLib and ML@INFN.