Description
The Cluster of Excellence "Machine Learning - New Perspectives for Science" at the University of Tübingen offers a position as
Research Data Steward / Data Architect
(m/f/d, E13 TV-L, 100%)
The position is available in the team of the Machine Learning Science Cloud and runs until 31 December 2032.
Help us build a modern HPC architecture for training large-scale scientific research and foundation models. The Machine Learning Science Cloud is part of the AI/ML compute ecosystem in Tübingen. Our users work on diverse research and transfer projects ranging from generative climate science to large language models. This involves large, structured, and occasionally sensitive databases for training and benchmarking.
As part of a motivated team, you will work and communicate with our users and help to efficiently scale our largest machine learning experiments. You will enable an ambitious research agenda through a FAIR data strategy for the cluster's research projects, covering the entire pipeline from project planning, data curation, data storage, building and maintaining distributed data loading pipelines, metadata description, and documentation, through to archiving and subsequent reuse of the data. You will also interact with other entities involved in data management at the University.
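To make the metadata-description part of the role concrete, here is a minimal sketch of what a FAIR-style dataset record could look like in Python. The class and field names (DatasetRecord, access_url, and so on) are illustrative assumptions, not a schema used by the Machine Learning Science Cloud.

```python
from dataclasses import dataclass, field

# Hypothetical minimal metadata record illustrating FAIR-style
# dataset description; all field names are illustrative only.
@dataclass
class DatasetRecord:
    identifier: str            # persistent ID, e.g. a DOI (Findable)
    title: str
    license: str               # terms of reuse (Reusable)
    access_url: str            # retrieval endpoint (Accessible)
    file_format: str           # e.g. "HDF5" (Interoperable)
    checksum_sha256: str       # integrity check for archived copies
    keywords: list[str] = field(default_factory=list)

record = DatasetRecord(
    identifier="doi:10.xxxx/example",
    title="Example climate training set",
    license="CC-BY-4.0",
    access_url="https://example.org/data/climate-v1",
    file_format="HDF5",
    checksum_sha256="<sha256>",
    keywords=["climate", "training data"],
)
print(record)
```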
Your tasks:
Develop storage strategies and data lifecycle management (hot/warm/cold); see the sketch after this list.
Design and implement data pipelines for research datasets on HPC infrastructure.
Establish data governance policies and quality standards.
Create and maintain dataset documentation and metadata schemas.
Advise researchers on FAIR principles and data management plans.
Coordinate with legal/compliance on data protection requirements.
Work closely with users to support scalable AI experiments through stable, accessible, and high-performance data infrastructure.
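As a toy illustration of the hot/warm/cold lifecycle management mentioned in the first task, the following sketch classifies files by last-access age. The thresholds and the use of atime are assumptions made for illustration; real tiering policies on HPC systems are usually driven by the filesystem or archive software.

```python
import time
from pathlib import Path

DAY = 86400  # seconds per day

def storage_tier(path: Path, now: float | None = None) -> str:
    """Return 'hot', 'warm', or 'cold' based on last access time.

    The 30/180-day thresholds are placeholders, not site policy.
    """
    now = time.time() if now is None else now
    age_days = (now - path.stat().st_atime) / DAY
    if age_days < 30:
        return "hot"   # keep on the fast parallel filesystem
    if age_days < 180:
        return "warm"  # candidate for capacity storage
    return "cold"      # candidate for tape/archive

# Example: report a tier for every HDF5 file in the current directory.
for f in Path(".").glob("*.h5"):
    print(f, storage_tier(f))
```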
Your profile:
Master's degree in information technology, applied computer science, or computer engineering (or a comparable degree).
Experience with data engineering.
Familiarity with research data management and FAIR principles.
Experience with HPC clusters.
Experience with Linux OS (e.g. SLES/RHEL/CentOS/Ubuntu etc.).
Experience with data pipelines/data streaming.
Knowledge of relevant file formats (e.g. HDF5); see the sketch after this list.
Independent, results-driven work, demonstrating ownership and accountability.
Proficiency in English. German is helpful but not required.
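To illustrate the kind of data-pipeline and HDF5 work referred to above, here is a minimal sketch that streams batches from sharded, chunked HDF5 files with h5py, reading only the slices it needs rather than whole files. The shard layout and the dataset name "samples" are assumptions for illustration.

```python
import h5py
import numpy as np

def iter_batches(shard_paths, batch_size=256):
    """Stream fixed-size batches from a list of HDF5 shards."""
    for shard in shard_paths:
        with h5py.File(shard, "r") as f:
            dset = f["samples"]  # chunked dataset on disk
            for start in range(0, len(dset), batch_size):
                # h5py reads only the requested slice from disk
                yield dset[start:start + batch_size]

# Example: compute a running mean without loading everything into RAM.
if __name__ == "__main__":
    total, count = 0.0, 0
    for batch in iter_batches(["shard_000.h5", "shard_001.h5"]):
        total += float(np.sum(batch))
        count += batch.size
    print("mean:", total / max(count, 1))
```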
Relevant experience in some of the following technologies:
Advanced shell scripting skills.
Experience with storage and databases (SQL, object storage) and parallel file systems such as GPFS/Lustre/Ceph/BeeGFS/Weka.
Experience with automation tools for configuration management (e.g. Ansible, Puppet, Chef) and revision control systems (e.g. Git).
Experience with containers (Docker/Singularity/Podman/Kubernetes).
Experience with Ethernet, InfiniBand, RDMA network technologies.
Knowledge of CPU/GPU/memory/RAID/storage/data center technologies.
Knowledge of current technological developments/trends in area of expertise.
We offer:
Exciting tasks in a dedicated, international team fully committed to an ambitious research agenda.
Access to modern HPC systems and hardware.
A vibrant working environment with more than 200 researchers from all over the world.
Family-friendly working environment.