Data Engineer
Pyyne Digital
Let’s build Pyyne together
Are you interested in working on complex, high-impact data projects while actively contributing to the growth of an international tech consultancy? At Pyyne, you combine hands-on technical work with real influence over how the company evolves.
We are looking for a Data Engineer to join us and work from Porto, collaborating with our teams in Brazil, Argentina, the United States, and Sweden.
About the role
As a consultant at Pyyne, you will integrate into client teams, working to deliver high-quality solutions. You will have the opportunity to collaborate directly with stakeholders to design and implement robust systems. We are not just building exceptional software; we are also cultivating a great company culture. Your ideas, voice, and initiative will play a crucial role in shaping how we work.
About the first project
Your initial assignment will focus on designing and implementing a Customer Master Data Management (MDM) solution for a large organization operating at scale.
The project involves consolidating customer data from legacy relational databases, as well as file-based sources maintained by partners and business teams, into a modern Azure Databricks Lakehouse architecture.
The core challenge is not only data migration but also data standardization and consolidation. You will design logic to reconcile inconsistent customer records into a reliable Golden Record, enabling downstream integration with CRM and analytical platforms.
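To make the reconciliation challenge concrete, here is a minimal, illustrative Python sketch of matching and Golden Record creation. It uses the standard library's difflib as a stand-in for production fuzzy-matching tooling, and the record fields, sample data, and "first non-missing value wins" survivorship rule are assumptions for illustration, not part of the actual project design:

```python
from difflib import SequenceMatcher

# Hypothetical customer records from two source systems; the field names
# (name, email, source) are illustrative only.
records = [
    {"name": "Maria Silva", "email": "maria.silva@example.com", "source": "legacy_db"},
    {"name": "María Silva", "email": "maria.silva@example.com", "source": "partner_file"},
    {"name": "Joao Santos", "email": "j.santos@example.com", "source": "legacy_db"},
]

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy string comparison; a stand-in for production matching logic."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def build_golden_records(records):
    """Cluster records that share an email or a fuzzy-matching name, then
    merge each cluster into one Golden Record (first non-missing field wins)."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if any(rec["email"] == r["email"] or similar(rec["name"], r["name"])
                   for r in cluster):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])  # no match found: start a new cluster

    golden = []
    for cluster in clusters:
        merged = {}
        for rec in cluster:
            for key, value in rec.items():
                merged.setdefault(key, value)  # survivorship: first value wins
        merged["sources"] = sorted({r["source"] for r in cluster})
        golden.append(merged)
    return golden
```

In practice this logic would run as distributed PySpark transformations rather than in-memory Python, and the matching, survivorship, and exception-handling rules would be far richer, but the shape of the problem is the same: cluster probable duplicates, then merge each cluster into a single authoritative record.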
Responsibilities
- Design and implement automated data ingestion pipelines using Azure Databricks and PySpark, capable of handling heterogeneous data sources.
- Develop entity resolution and fuzzy matching logic (e.g. names, addresses, identifiers) to reduce duplication and improve data consistency.
- Implement Golden Record creation logic, including exception handling and manual review workflows.
- Define and enforce data quality and validation rules, ensuring traceability and auditability.
- Collaborate with downstream system owners (CRM, analytics, reporting) to ensure clean master data consumption.
- Document architecture, pipelines, and key technical decisions to support knowledge transfer.
Requirements
- Professional experience as a Data Engineer, working with production-grade data pipelines
- Strong hands-on experience with Apache Spark (PySpark) and modern data processing frameworks
- Experience working with Azure Databricks or similar cloud-based data platforms
- Solid understanding of data modeling, SQL, and large-scale data transformation
- Exposure to data quality, deduplication, or record matching challenges
- Ability to work independently, take ownership of technical decisions, and communicate clearly with stakeholders
Nice to have
- Experience with Master Data Management (MDM) or similar data consolidation initiatives
- Familiarity with enterprise data warehouses or analytical platforms
- Experience working in regulated or data-sensitive environments
- Background in projects involving multi-source or multi-region data
Academic background
- Degree in Computer Science, Engineering, or a related field, or equivalent professional experience
Languages
English – required