From Talend to Python & Spark, without months of manual work
Our client needed to migrate approximately 300 data pipeline jobs from Talend, a proprietary ETL platform, to a modern Python and Spark-based stack. Manual migration would have required months of repetitive, error-prone work. We built NextGen, an AI-assisted migration tool that performs SQL-centric extraction and Java code simplification.
Implementation time: 2-3 months
Talend jobs processed: approximately 300
Estimated reduction in migration time: approximately 30%
size of all jobs
Talend migrations needed a scalable alternative to manual rewrites.
Talend’s visual jobs compile into massive Java files with embedded SQL and complex variable substitutions, making the real business logic hard to inspect and extract.
Reverse-engineering each job by hand requires hours of repetitive work per pipeline, turning large migrations into months of error-prone effort.
At hundreds of jobs, small inconsistencies compound into operational risk, higher costs, and long delivery timelines that are difficult to justify or maintain.
We built NextGen, an AI-assisted migration tool that supports two complementary migration approaches: SQL-centric extraction and Java logic simplification. Together, they automate the most time-consuming parts of the process while keeping engineers in control of validation and integration.
Approach 1 – SQL-Centric Migration
In the first approach, NextGen parses Talend XML files to extract embedded SQL queries. It cleans and normalizes the SQL, resolves variable substitutions, and sends only compact, high-signal SQL to Azure OpenAI for translation into Spark SQL and PySpark.
By avoiding raw Talend exports and auto-generated Java noise, this approach reduces prompt size, improves translation accuracy, and keeps LLM usage cost-effective.
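The extraction step described above can be illustrated with a minimal Python sketch. It is not NextGen's actual implementation: the sample XML is a simplified, hypothetical stand-in for a Talend job export, the `node` / `elementParameter` layout and the `QUERY` and `UNIQUE_NAME` parameter names are assumptions about the export format, and `resolve_expression` and `extract_queries` are illustrative names. The sketch only handles queries written as Java string concatenations with `context.<name>` variables.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Talend job export (assumed layout).
SAMPLE_ITEM = """<job>
  <node componentName="tOracleInput">
    <elementParameter name="UNIQUE_NAME" value="tOracleInput_1"/>
    <elementParameter name="QUERY"
      value="&quot;SELECT id,   amount  FROM &quot; + context.schema + &quot;.orders&quot;"/>
  </node>
  <node componentName="tLogRow">
    <elementParameter name="UNIQUE_NAME" value="tLogRow_1"/>
  </node>
</job>"""

# Matches either a Java string literal or a context variable reference.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|context\.(\w+)')


def resolve_expression(expr, context):
    """Flatten a Java string concatenation into plain SQL text."""
    parts = []
    for literal, var in TOKEN.findall(expr):
        if var:
            # Unknown variables stay visible as ${name} placeholders.
            parts.append(str(context.get(var, "${" + var + "}")))
        else:
            parts.append(literal)
    return "".join(parts)


def extract_queries(item_xml, context):
    """Return {component name: normalized SQL} for every node with a QUERY."""
    root = ET.fromstring(item_xml)
    queries = {}
    for node in root.iter("node"):
        params = {p.get("name"): p.get("value", "")
                  for p in node.iter("elementParameter")}
        if "QUERY" in params:
            sql = resolve_expression(params["QUERY"], context)
            # Collapse runs of whitespace so the prompt stays compact.
            queries[params.get("UNIQUE_NAME", "unknown")] = " ".join(sql.split())
    return queries


if __name__ == "__main__":
    for name, sql in extract_queries(SAMPLE_ITEM, {"schema": "sales"}).items():
        print(name, "->", sql)
```

Only the resulting compact SQL strings would be placed in the LLM prompt. Leaving unresolved variables as explicit placeholders is a design choice of this sketch, so that a reviewer can see what still needs a value.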
Approach 2 – Java Logic Reduction & Translation
In the second approach, NextGen operates directly on the Talend-generated Java code. Instead of translating it as-is, the tool programmatically strips away redundant constructs and non-essential code, while preserving the original execution logic and data flow.
The resulting simplified Java representation is then sent to the LLM for translation into Python and Spark constructs.
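A rough idea of what "programmatically stripping" generated code can look like is sketched below in Python. This is an assumption-laden illustration, not the tool's real rule set: the noise patterns (`ok_Hash`, `start_Hash`, `currentComponent`, `tos_count_*`) and the sample Java snippet are hypothetical examples of bookkeeping code, and `reduce_java` is an illustrative name. A production tool would more likely use a proper Java parser, since the regex-based comment removal here would also match `/* ... */` sequences inside string literals.

```python
import re

# Block comments and full-line comments carry no execution logic.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"^[ \t]*//.*$", re.MULTILINE)

# Hypothetical bookkeeping statements treated as noise in this sketch.
NOISE_PATTERNS = [
    re.compile(r"^\s*(ok_Hash|start_Hash)\.put\("),
    re.compile(r"^\s*currentComponent\s*="),
    re.compile(r"^\s*tos_count_\w+\+\+"),
]

# Hypothetical fragment in the style of generated job code.
SAMPLE_JAVA = '''/**
 * [tDBInput_1 begin ] start
 */
ok_Hash.put("tDBInput_1", false);
start_Hash.put("tDBInput_1", System.currentTimeMillis());
currentComponent = "tDBInput_1";

// run the query
String dbquery_tDBInput_1 = "SELECT id FROM orders";
rs_tDBInput_1 = stmt_tDBInput_1.executeQuery(dbquery_tDBInput_1);
tos_count_tDBInput_1++;
'''


def reduce_java(source):
    """Drop comments, blank lines, and known bookkeeping statements."""
    source = BLOCK_COMMENT.sub("", source)
    source = LINE_COMMENT.sub("", source)
    kept = []
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if any(p.search(line) for p in NOISE_PATTERNS):
            continue  # skip bookkeeping noise
        kept.append(line.rstrip())
    return "\n".join(kept)


if __name__ == "__main__":
    print(reduce_java(SAMPLE_JAVA))
```

In this sketch the statements that carry data flow (the query definition and its execution) survive unchanged, which is the property the reduction step needs to preserve before the code is handed to the LLM.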
Meet the team of this project
Adriano Campinho
Data Scientist

Inês Ferreira
Senior Data Scientist
NextGen transformed a months-long manual migration into an automated, repeatable pipeline. Human review is still required, but the most time-consuming parts of the process are now automated.
Across nearly 300 jobs, the tool reduced overall migration time by approximately 30%, improved consistency across pipelines, and allowed teams to focus on higher-value engineering tasks instead of repetitive rewrites.