Data Engineer - Databricks (all genders)

Frankfurt, Hesse, Germany and remote
Contract type
Permanent
Years experience
3+
Business car included
No
Industry
Financial Services

Company description

Our client is a globally active asset management firm with a strong presence across institutional and private client segments. With a clear commitment to active investment management and a broad international footprint, the company serves a diverse client base spanning individuals, families, and large-scale institutional investors. Professionals joining the organisation can expect to become part of an empowering and growth-oriented culture, where individual contributions are valued and there is genuine scope to make a broader impact — both for clients and beyond.

Purpose of job

The role designs, builds and operates data products and reusable data preparation components on a Databricks-based Data and AI Platform, and acts as a technical enabler for internal platform users by providing expert guidance and advanced-level support. It ensures adherence to security, privacy and regulatory requirements through a compliance-by-design approach, keeps data pipelines and data quality aligned with established best practices, and drives continuous platform improvement by integrating new features and standardised, reusable pipelines. The role also oversees the evolution of the platform's data engineering capabilities, keeping them scalable and well governed so that platform customers can reliably source, ingest, transform, validate and serve high-quality, compliant data for AI and ML use cases on a self-service basis.

Responsibilities

  • Develop and maintain a library of modular, reusable pipeline components — covering data intake, processing, verification and enrichment — to produce consistently structured, AI/ML-ready datasets from both structured and unstructured sources.

  • Architect and operate reliable data workflows on Databricks that ingest data from a wide range of internal and external sources and produce clean, validated data assets for AI, ML and reporting use cases.

  • Govern the lifecycle of layered data assets across maturity tiers (for example, a bronze/silver/gold medallion architecture), upholding quality and timeliness standards while maintaining purpose-specific, analytics- and model-ready output datasets.

  • Build and maintain feature engineering pipelines that derive meaningful predictive features from raw data, together with a centrally managed feature store with clear versioning, ownership and service-level agreements.

  • Work with AI and ML engineers to build and maintain data pipelines for retrieval-augmented generation (RAG) systems, covering document chunking, embedding refreshes and vector index synchronisation.

  • Embed governance controls into the platform design, covering access permissions, lineage and traceability, data lifecycle management, encryption and audit trails, to meet both internal policies and external regulatory requirements.

  • Introduce automated data quality checks across pipelines, define measurable reliability and timeliness targets, and implement end-to-end observability to detect and resolve issues proactively.

  • Engage cross-functional stakeholders, including analytics, engineering, security and operations teams, to align on data needs, share proven patterns and foster self-service platform adoption through enablement materials and reviews.

Qualifications

Required:

  • At least 3 years of hands-on experience designing and running large-scale data pipelines and data products on Databricks — batch and/or streaming — preferably in regulated or governance-heavy environments.

  • Advanced proficiency in Databricks, Spark, Python and SQL, complemented by sound software engineering practices such as CI/CD workflows and infrastructure-as-code tooling (e.g., Terraform) on Azure.

  • Solid grasp of data engineering principles relevant to AI/ML contexts, including data modelling, quality assurance, feature sharing and reproducibility, as well as an understanding of how data characteristics influence model behaviour.

  • Strong command of modern data architecture patterns — including Lakehouse principles, layered data organisation and data product thinking — and the ability to put them into practice within a governed enterprise setting.

  • Degree in computer science or a comparable discipline.

  • Strong analytical mindset paired with structured planning, clear documentation and the ability to effectively transfer knowledge to colleagues.

Preferred:

  • Professional background in the financial sector, ideally within asset management, combined with international work experience and relevant Databricks certifications.

Benefits

  • Hybrid and flexible working arrangements

  • Company pension and long-term savings plans

  • Relocation assistance and childcare support

  • Employee share purchase programme

  • Mental health and wellbeing initiatives

  • Subsidised public transport and bicycle leasing

  • Career opportunities across the wider group

  • Self-directed learning and development resources