Led my team’s Data Quality Initiative. Created best practices and guided implementation of automated processes for monitoring data quality. Automated identification of data quality issues monthly across 15+ datasets.
Decreased runtime for an insurance re-rating process from 4 days to 1.5 hours by distributing the workload across multiple Kubernetes pods orchestrated with Airflow.
Developed a fully automated ETL pipeline with Python, SQL, Airflow and GitLab CI/CD to produce a monthly product analysis dataset with 600+ attributes.
Built a process for ad-hoc AWS Athena to GCP BigQuery data transfers using Airflow. Transferred terabytes of data using this process.