Data Engineer

  • Led the team’s Data Quality Initiative. Created best practices and guided implementation of automated data quality monitoring. Automated monthly identification of data quality issues across 15+ datasets.
  • Decreased runtime for an insurance re-rating process from 4 days to 1.5 hours by distributing the workload across multiple Kubernetes pods orchestrated with Airflow.
  • Developed a fully automated ETL pipeline with Python, SQL, Airflow and GitLab CI/CD to produce a monthly product analysis dataset with 600+ attributes.
  • Built an Airflow-based process for ad-hoc AWS Athena to GCP BigQuery data transfers and used it to transfer terabytes of data.