Research Focus
Integrated data analysis pipelines
We are building an open and extensible system infrastructure for complex integrated data analysis pipelines. These pipelines combine data management, machine learning (feature engineering, ML model training, debugging, and scoring), and high-performance computing. The goals are to make better use of the available hardware and to deploy this infrastructure in related cross-enterprise projects in areas such as energy and manufacturing.
Automatic data reorganization
Our goal is to reduce the increasing redundancy in complex data science workflows. These workflows combine building blocks for data preparation and data cleaning, data enrichment, feature engineering, hyperparameter optimization, and model training. Our focus is on automatic data reorganization through compression, caching, and fine-grained lineage-based reuse.
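To make the idea of reusing intermediate results concrete, here is a minimal sketch in Python. It is an illustration only, not our system's implementation: the names Traced, LineageCache, source, and normalize, as well as the "sensor_readings" dataset, are hypothetical. The sketch assumes every intermediate can be identified by a hash of the operation that produced it and of the identifiers of its inputs, so that a repeated step is served from a cache instead of being recomputed.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Traced:
    """A value together with an identifier of how it was produced (hypothetical)."""
    value: Any
    lineage: str


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def source(name: str, value: Any) -> Traced:
    """Wrap an input dataset; its identifier is derived from its name only."""
    return Traced(value, _digest("source:" + name))


class LineageCache:
    """Caches results keyed by operation name plus the identifiers of all inputs."""

    def __init__(self) -> None:
        self._results = {}
        self.hits = 0
        self.misses = 0

    def apply(self, op_name: str, fn: Callable[..., Any], *inputs: Traced) -> Traced:
        key = _digest(op_name + "(" + ",".join(t.lineage for t in inputs) + ")")
        cached = self._results.get(key)
        if cached is not None:
            self.hits += 1  # same operation on the same inputs: reuse
            return cached
        self.misses += 1
        result = Traced(fn(*(t.value for t in inputs)), key)
        self._results[key] = result
        return result


def normalize(xs):
    top = max(xs)
    return [x / top for x in xs]


cache = LineageCache()
raw = source("sensor_readings", [3.0, 1.0, 2.0])
first = cache.apply("normalize", normalize, raw)
second = cache.apply("normalize", normalize, raw)  # served from the cache
```

The key design choice in this sketch is that the cache key depends on how a value was derived rather than on its contents, so a lookup never has to inspect or compare the data itself. A real system would also have to handle non-deterministic operations, memory limits, and eviction, which are left out here.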
Data engineering
We simplify data engineering through a new hierarchy of data primitives for data preparation and data cleaning, aimed at different users (machine learning experts, data scientists, domain experts). In doing so, we create not only efficient and scalable execution processes for these new data primitives, but also the basis for better decisions through analytics in business and society.
