Data Management for AI

Research Focus

The Data Management for AI (DAI) research area aims to tame the datasets behind AI applications by providing high-level data science abstractions and developing systems that execute them efficiently and at scale. We do this while accounting for increasing specialization at all levels: hardware, software, and domain-specific applications.

Integrated data analysis pipelines

We are building an open and extensible system infrastructure for complex integrated data analysis pipelines. It combines data management, machine learning pipelines (feature engineering, model training, debugging, and scoring), and high-performance computing. The goal is to optimize hardware usage and to deploy this infrastructure in related cross-enterprise projects in areas such as energy and manufacturing.

Automatic data reorganization

Our goal is to reduce the increasing redundancy in complex data science workflows. This involves combining building blocks for data preparation and cleaning, data enrichment, feature engineering, hyperparameter optimization, and model training. Our focus is on automatic data reorganization through compression, caching, and fine-grained lineage-based reuse.
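
As a rough illustration of how caching and reuse can remove redundant work in a workflow, the sketch below stores each intermediate result under a key that records how it was computed (the operation name plus the keys of its inputs), so a repeated step is served from the cache instead of being recomputed. This is a minimal sketch under stated assumptions, not the research area's implementation: `Traced`, `ReuseCache`, `source`, and the example data are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Traced:
    """A value together with a hashable record of how it was produced (hypothetical)."""
    value: Any
    lineage: Tuple


def source(name: str, value: Any) -> Traced:
    """Wrap raw input data; its record is just the source name."""
    return Traced(value, ("source", name))


class ReuseCache:
    """Caches intermediate results keyed by operation name and input records (hypothetical)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple, Any] = {}
        self.hits = 0
        self.misses = 0

    def run(self, op: str, fn: Callable[..., Any], *inputs: Traced) -> Traced:
        # The key captures the operation and the full history of every input.
        key = (op,) + tuple(i.lineage for i in inputs)
        if key in self._results:
            self.hits += 1  # same computation seen before: reuse the stored result
        else:
            self.misses += 1
            self._results[key] = fn(*(i.value for i in inputs))
        return Traced(self._results[key], key)


def drop_missing(xs):
    """Toy data-cleaning step: remove missing readings."""
    return [x for x in xs if x is not None]


def mean(xs):
    """Toy feature: arithmetic mean."""
    return sum(xs) / len(xs)


cache = ReuseCache()
raw = source("sensor_readings", [3.0, None, 5.0, 4.0])

# Run the same two-step workflow twice, as happens when a pipeline is re-executed
# with only later steps changed.
for _ in range(2):
    cleaned = cache.run("drop_missing", drop_missing, raw)
    avg = cache.run("mean", mean, cleaned)

print(avg.value, cache.misses, cache.hits)
```

The second pass through the loop computes nothing new: both steps are found in the cache. A real system would also have to bound cache size and decide which intermediates are worth keeping, which this sketch leaves out.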

Data engineering

We simplify data engineering through a new hierarchy of primitives for data preparation and data cleaning, aimed at different users (machine learning experts, data scientists, domain experts). In doing so, we create not only efficient and scalable execution processes for these new primitives, but also the basis for better decisions through analytics in business and society.


Lucas Iacono

Research Area Manager, Data Management for AI