Data Science

Data Science covers all work on the automated or semi-automated generation of information from raw, potentially unknown and/or heterogeneous data flows, as well as the various ways this information can be visualized. The aim is to bring out situational models that lead to decision-making.

Scientific objectives

Data Science also pursues objectives of its own that directly enrich the other scientific disciplines in all the areas addressed:

  1. Automated detection of opportunities or threats, depending on the applied research axis, and real-time modeling of the context in which they arise, using data from sensors or social networks enriched with external knowledge bases.
  2. Full integration of data into business processes through a dual approach: generating and consuming data while the process runs, and adapting the process accordingly, or even building it on the fly.
  3. By combining and generalizing the previous objectives, the aim is to offer a real-time decision-support framework built on feedback from data that have become easily accessible but remain underexploited in current practice.

Data Science: 4 "business" levels studied

  • Data collection and cleaning, which are systematically combined in the data science approach

Internet of Things; deployment of an event-driven architecture (EDA) with loose publish/subscribe coupling between data sources and processing services; machine learning and statistical methods for data cleansing.
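To make the publish/subscribe decoupling and the statistical cleansing step concrete, here is a minimal sketch in standard-library Python. The in-memory `EventBus`, the topic names `sensor/raw` and `sensor/clean`, and the `ZScoreCleaner` service are all hypothetical stand-ins for a real message broker and cleansing pipeline. The cleaner discards missing readings and readings more than three standard deviations from a short rolling history, a simple z-score filter chosen here only for illustration.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Hypothetical in-memory broker standing in for a real publish/subscribe system."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict[str, Any]) -> None:
        # Publishers never reference consumers directly: this is the loose coupling.
        for handler in self._subscribers[topic]:
            handler(event)


class ZScoreCleaner:
    """Hypothetical cleansing service: drops missing values and statistical outliers."""

    def __init__(self, bus: EventBus, window: int = 20, threshold: float = 3.0) -> None:
        self.bus = bus
        self.history: deque = deque(maxlen=window)  # recent accepted values
        self.threshold = threshold
        bus.subscribe("sensor/raw", self.on_raw)

    def on_raw(self, event: dict[str, Any]) -> None:
        value = event.get("value")
        if value is None:
            return  # missing reading: discard
        if len(self.history) >= 5:  # only test once a minimal history exists
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                return  # outlier relative to recent history: discard
        self.history.append(value)
        self.bus.publish("sensor/clean", event)
```

A downstream service only needs to subscribe to `sensor/clean`; neither the sensors nor the cleaner need to know it exists:

```python
bus = EventBus()
cleaner = ZScoreCleaner(bus)
clean = []
bus.subscribe("sensor/clean", clean.append)
for v in [20.1, 20.3, None, 19.9, 20.0, 20.2, 95.0, 20.1]:
    bus.publish("sensor/raw", {"sensor": "t1", "value": v})
```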

  • Data aggregation

Use of supervised or unsupervised machine learning methods, as well as semantic analysis tools related to knowledge base engineering (ontologies, taxonomies).
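One simple way a taxonomy can drive aggregation is to roll each record up to a chosen level of abstraction and count records per concept. The sketch below assumes a toy taxonomy stored as a child-to-parent mapping; the concept names, the `roll_up` helper and the `unclassified` bucket are hypothetical and are not taken from any specific ontology used by the team.

```python
from collections import Counter

# Hypothetical toy taxonomy: child concept -> parent concept (assumed to be a tree).
TAXONOMY = {
    "temperature_sensor": "environmental_sensor",
    "humidity_sensor": "environmental_sensor",
    "environmental_sensor": "sensor",
    "tweet": "social_post",
    "forum_post": "social_post",
    "social_post": "social_data",
}


def ancestors(concept: str, taxonomy: dict[str, str]) -> list[str]:
    """Return the concept followed by all its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in taxonomy:
        chain.append(taxonomy[chain[-1]])
    return chain


def roll_up(records: list[dict], level: set[str], taxonomy: dict[str, str]) -> Counter:
    """Count each record under its first ancestor that belongs to the target level."""
    counts: Counter = Counter()
    for record in records:
        for concept in ancestors(record["type"], taxonomy):
            if concept in level:
                counts[concept] += 1
                break
        else:
            counts["unclassified"] += 1  # no ancestor at the requested level
    return counts
```

Changing the `level` set (for example to `{"sensor", "social_data"}`) aggregates the same records at a coarser granularity without touching the data.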

  • Interpretation of data in order to obtain informational models that can then be used to support decision-making processes.

Interpretation relies on knowledge engineering techniques, using meta-models and business knowledge bases (in the form of ontologies or graph databases). It is implemented through, on the one hand, rule-based systems (in an EDA framework that includes a complex event processing (CEP) module) and, on the other hand, machine learning algorithms. Both supervised and unsupervised techniques (classification, clustering, association and regression) are used, and work on natural language processing (NLP) is also carried out.
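As an illustration of the rule-based side, a typical CEP rule detects a pattern over a time window rather than reacting to single events. The following sketch, written under the assumption that events arrive in timestamp order, implements one hypothetical rule: raise an alert when at least `count` readings from the same source exceed `threshold` within `window` seconds. The `Event` type, the `ThresholdBurstRule` class and the alert format are illustrative only and do not reflect the API of any particular CEP engine.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    source: str
    timestamp: float  # seconds
    value: float


class ThresholdBurstRule:
    """Hypothetical CEP-style rule over a sliding time window, one window per source."""

    def __init__(self, threshold: float, count: int, window: float) -> None:
        self.threshold = threshold
        self.count = count
        self.window = window
        self._recent: dict[str, deque] = defaultdict(deque)  # source -> matching timestamps

    def process(self, event: Event) -> Optional[dict]:
        """Feed one event (assumed in timestamp order); return an alert dict or None."""
        if event.value <= self.threshold:
            return None  # event does not match the rule's condition
        times = self._recent[event.source]
        times.append(event.timestamp)
        # Evict matching events that have fallen out of the window.
        while times and event.timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.count:
            alert = {"source": event.source, "first": times[0],
                     "last": times[-1], "matches": len(times)}
            times.clear()  # reset so that one burst produces one alert
            return alert
        return None
```

Keeping one deque per source means each event is handled in amortized constant time, which suits the streaming setting of an EDA; a production CEP engine would additionally handle out-of-order events and richer pattern languages.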

  • Data visualization supporting human decision-making in real time

Use of libraries and tools for real-time data visualization. In particular, matching the visualization format to the user's decision-support needs is paramount. The data collected and interpreted can be textual or numerical, used separately or in combination. This scientific discipline is part of the "Reasoning on Data" (RoD) theme, shared by the MaDICS (Mass of Data, Information and Knowledge in Science) and AI (Artificial Intelligence) projects.