Automating Large-Scale ETL Migration with AI

Our client needed to migrate approximately 300 data pipeline jobs from Talend, a proprietary ETL platform, to a modern Python and Spark-based stack. Manual migration would have required months of repetitive, error-prone work. We built NextGen, an AI-assisted migration tool that performs SQL-centric extraction and Java code simplification.

Implementation time: 2-3 months
Technologies: Machine Learning
Industry: Banking, Financial Services, and Insurance
Team on this project:
Adriano Campinho, Data Scientist
Inês Ferreira, Senior Data Scientist


300 Talend jobs processed
30% estimated reduction in migration time
0.4 GB total size of all jobs

The Opportunity

Opaque, auto-generated pipelines

Talend’s visual jobs compile into massive Java files with embedded SQL and complex variable substitutions, making the real business logic hard to inspect and extract.

Manual migration doesn’t scale

Reverse-engineering each job by hand requires hours of repetitive work per pipeline, turning large migrations into months of error-prone effort.

High risk at enterprise scale

At hundreds of jobs, small inconsistencies compound into operational risk, higher costs, and long delivery timelines that are difficult to justify or maintain.

The Solution

We built NextGen, an AI-assisted migration tool that supports two complementary migration approaches: SQL-centric extraction and Java code logic simplification. Together, they automate the most time-consuming parts of the process while keeping engineers in control of validation and integration.

Approach 1 – SQL-Centric Migration

In the first approach, NextGen parses Talend XML files to extract embedded SQL queries. It cleans and normalizes the SQL, resolves variable substitutions, and sends only compact, high-signal SQL to Azure OpenAI for translation into Spark SQL and PySpark.

By avoiding raw Talend exports and auto-generated Java noise, this approach reduces prompt size, improves translation accuracy, and keeps LLM usage cost-effective.
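The extraction steps described above can be sketched roughly as follows. This is a minimal illustration, not NextGen's actual code: it assumes a simplified job file in which each component is a `node` element carrying its query in an `elementParameter` named `QUERY`, and that the query is stored as a Java string expression concatenating literals with `context.<name>` variables. The sample XML and the function names `extract_queries`, `resolve_expression`, and `normalize_sql` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified job file; real exports are larger and namespaced.
SAMPLE_ITEM = '''<process>
  <node componentName="tOracleInput">
    <elementParameter field="MEMO_SQL" name="QUERY" value="&quot;SELECT id,   name FROM &quot; + context.schema + &quot;.customers&quot;"/>
  </node>
  <node componentName="tLogRow">
    <elementParameter field="CHECK" name="PRINT_HEADER" value="true"/>
  </node>
</process>'''

# Matches either a Java string literal (group 1) or a context variable (group 2).
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|context\.(\w+)')


def extract_queries(xml_text):
    """Return (component name, raw query expression) pairs found in the job XML."""
    root = ET.fromstring(xml_text)
    queries = []
    for node in root.iter("node"):
        for param in node.iter("elementParameter"):
            if param.get("name") == "QUERY":
                queries.append((node.get("componentName"), param.get("value")))
    return queries


def resolve_expression(expr, context):
    """Join string literals and substitute context variables.

    Unknown variables raise KeyError on purpose, so gaps are noticed early.
    Java escape sequences inside literals are left untouched in this sketch.
    """
    parts = []
    for match in TOKEN.finditer(expr):
        if match.group(2) is not None:
            parts.append(str(context[match.group(2)]))
        else:
            parts.append(match.group(1))
    return "".join(parts)


def normalize_sql(sql):
    """Collapse runs of whitespace (also inside SQL string literals, a known limitation)."""
    return " ".join(sql.split())


for component, expr in extract_queries(SAMPLE_ITEM):
    sql = normalize_sql(resolve_expression(expr, {"schema": "sales"}))
    print(component, "->", sql)
```

Only the resolved, normalized SQL string would then go into the translation prompt, which is what keeps the prompt small compared with sending the full export.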

Approach 2 – Java Logic Reduction & Translation

In the second approach, NextGen operates directly on the Talend-generated Java code. Instead of translating it as-is, the tool programmatically strips away redundant constructs and non-essential code, while preserving the original execution logic and data flow.

The resulting simplified Java representation is then sent to the LLM for translation into Python and Spark constructs.
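One way to picture the reduction step is a filter that drops comments, blank lines, and bookkeeping statements before anything reaches the LLM. The sketch below is an assumption-laden illustration rather than NextGen's rule set: the `NOISE_PATTERNS` list, the sample Java snippet, and the function names `strip_comments` and `reduce_java` are all hypothetical, and a production tool would more likely work on a parsed syntax tree than on regular expressions.

```python
import re

# Matches a string literal, a block comment, or a line comment, in that priority.
# Matching string literals first keeps "//" inside strings (e.g. JDBC URLs) intact.
# Limitation: a char literal containing a double quote would confuse this pattern.
JAVA_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|/\*.*?\*/|//[^\n]*', re.DOTALL)

# Hypothetical examples of statements treated as non-essential bookkeeping.
NOISE_PATTERNS = [
    re.compile(r"^\s*(ok_Hash|start_Hash|end_Hash)\.put\("),
]

SAMPLE_JAVA = '''
/**
 * [tOracleInput_1 begin ] start
 */
ok_Hash.put("tOracleInput_1", false);
start_Hash.put("tOracleInput_1", System.currentTimeMillis());
String url = "jdbc:oracle:thin:@//db-host:1521/ORCL"; // connection string
int nb_line = 0;

while (rs.next()) {
    nb_line++;
}
'''


def strip_comments(source):
    """Remove comments while leaving string literals unchanged."""
    def keep_strings(match):
        text = match.group(0)
        return text if text.startswith('"') else ""
    return JAVA_TOKEN.sub(keep_strings, source)


def reduce_java(source):
    """Drop comments, blank lines, and lines matching any noise pattern."""
    kept = []
    for line in strip_comments(source).splitlines():
        if not line.strip():
            continue
        if any(pattern.search(line) for pattern in NOISE_PATTERNS):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept)


reduced = reduce_java(SAMPLE_JAVA)
print(reduced)
print(f"{len(SAMPLE_JAVA)} -> {len(reduced)} characters")
```

Because the filter only removes whole statements and comments, the remaining lines keep their original order, so control flow and data flow in the surviving code are unchanged. Whether a given statement is truly non-essential still needs an engineer's judgment.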

The Impact

NextGen turned a months-long manual migration into an automated, repeatable pipeline. Human review is still required, but the most time-consuming parts of the process are now automated.

Across nearly 300 jobs, the tool reduced overall migration time by an estimated 30%, improved consistency across pipelines, and allowed teams to focus on higher-value engineering tasks instead of repetitive rewrites.


A word from our customers

Real enterprises solving real problems with AI systems built for reliability, transparency, and scale.


"From day one, the DareData team earned our trust through outstanding communication and responsiveness."

Head of AI Tech Lab @ Euronext

"We were very pleased with the training. The materials were adjusted to our needs and, in the end, we could take home some ideas that we could apply to our business."

Data Coordinator @ Worten

"DareData Engineering has the resilience to make the effort in improving our development and production processes."

Lead Data Manager @ NOS Comunicações

"Their ability to bring clarity to the application of models in practice is amazing."

Revenue & Margin Growth Manager @ Heineken
TRUSTED BY THE WORLD'S LARGEST ENTERPRISES