Data Engineer
Pyyne Digital
Let's build Pyyne together!
Are you excited by the idea of working on cutting-edge technology while shaping the direction of the company you're part of? At Pyyne, you'll do both. We're seeking a motivated colleague to join our consultant team: someone eager to take ownership of their projects and actively contribute to building an international, people-first tech consultancy.
About the project:
As a consultant at Pyyne, you’ll be embedded in client teams, working on-site or in hybrid setups to deliver high-quality solutions. You’ll have full ownership of your projects and be trusted to collaborate directly with stakeholders to design and implement robust systems.
For this first project, we are looking for an engineer to architect and build a new data lake for a MedTech education company. The work involves consolidating data from two distinct PostgreSQL databases (one legacy, one modern) and building new pipelines for third-party data. The resulting data lake must support future GenAI/ML model development, LLM-generated content, and platform-wide analysis of trends and projections.
Responsibilities:
- Design, architect, and implement scalable data pipelines and data lake solutions using Python.
- Build and manage data infrastructure using core AWS services (e.g., S3, Glue, Lambda, Redshift/Athena).
- Ensure high standards of code quality, system performance, and reliability, adhering to data engineering best practices.
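To give a flavor of the work, here is a minimal sketch of what a first ingestion step could look like: a batch extract from one of the PostgreSQL databases landed in an S3 raw zone. It is purely illustrative; the connection string, bucket, and table names are hypothetical placeholders, and a production pipeline would be orchestrated (e.g., via Airflow or Glue) rather than run as a standalone script.

```python
import csv
import io
from datetime import date

import boto3
import psycopg2

# All of the following values are hypothetical placeholders.
PG_DSN = "host=legacy-db.internal dbname=courses user=etl_reader password=..."
RAW_BUCKET = "medtech-data-lake-raw"
TABLE = "public.enrollments"


def extract_table_to_s3(dsn: str, table: str, bucket: str) -> str:
    """Dump one table to CSV and land it in the raw zone, partitioned by load date."""
    buf = io.StringIO()
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Table name comes from trusted pipeline config, not user input.
        cur.execute(f"SELECT * FROM {table}")
        writer = csv.writer(buf)
        writer.writerow([desc[0] for desc in cur.description])  # header row
        writer.writerows(cur)                                   # data rows

    key = f"raw/{table}/load_date={date.today().isoformat()}/part-0000.csv"
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8")
    )
    return key


if __name__ == "__main__":
    print("wrote", extract_table_to_s3(PG_DSN, TABLE, RAW_BUCKET))
```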
Requirements:
- Experience designing, building, and maintaining scalable ETL/ELT pipelines with tools such as Apache Airflow, dbt, Prefect, or Dagster, and managed services such as AWS Glue, Azure Data Factory, or GCP Dataflow
- Data ingestion from multiple sources (APIs, databases, streaming data, flat files, etc.)
- Transformation and enrichment using SQL and/or Spark, ensuring data quality and lineage
- Schema design for raw, processed, and curated layers (e.g., Bronze/Silver/Gold architecture; see the sketch after this list)
- Performance tuning for ingestion and transformation jobs (partitioning, indexing, parallelization)
- Hands-on experience building and maintaining data lakes, preferably on AWS
- Experience integrating data lakes with data warehouses such as Snowflake, BigQuery, Redshift, Synapse, or Databricks Lakehouse
- Building data models optimized for analytics
- Proficiency in SQL
- Knowledge of metadata management, lineage tracking, and data privacy regulations (GDPR, CCPA)
- Implementation of data quality checks and validation frameworks
- Familiarity with data cataloging tools
- Solid experience with AWS
- Understanding of cost optimization and security best practices for data storage and movement
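For illustration only, the sketch below shows one way a Bronze-to-Silver step in that layered layout might look in PySpark: read raw CSV, apply light cleaning, and write partitioned Parquet to a curated zone. The bucket paths and column names are hypothetical, and partitioning by date is just one example of the performance tuning mentioned above; a real job would also record lineage and run data quality checks.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_enrollments").getOrCreate()

# Hypothetical lake paths for the raw (Bronze) and curated (Silver) layers.
bronze_path = "s3://medtech-data-lake-raw/raw/public.enrollments/"
silver_path = "s3://medtech-data-lake-curated/silver/enrollments/"

df = (
    spark.read.option("header", True).csv(bronze_path)
    # Basic cleaning: normalise types, drop exact duplicates, keep valid rows only.
    .withColumn("enrolled_at", F.to_timestamp("enrolled_at"))
    .dropDuplicates(["enrollment_id"])
    .filter(F.col("enrollment_id").isNotNull())
    .withColumn("enrolled_date", F.to_date("enrolled_at"))
)

# Partition by date so downstream analytics and ML feature jobs can prune efficiently.
df.write.mode("overwrite").partitionBy("enrolled_date").parquet(silver_path)
```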
Academic Requirements:
A Bachelor's degree in Computer Science, Engineering, or a related field
Languages:
English - Must Have