Data Team Lead
Role: Lead/Architect Data Engineer
Location: Remote (US-based; anywhere in the US)
Duration: Contract
Job Summary
We are looking for a results-driven Lead Data Engineer (Contractor) to architect, develop, and guide the implementation of modern data pipelines and cloud-native analytics solutions. The ideal candidate will lead end-to-end delivery across engineering, analytics, and product teams, bringing deep experience in Databricks, PySpark, and Azure cloud platforms. This role also requires strong hands-on experience in Databricks architecture, administration, and performance optimization.
Key Responsibilities
- Lead the architecture, design, and development of scalable ETL/ELT pipelines using Databricks, PySpark, and SQL across distributed data environments.
- Architect and manage Databricks workspaces, including provisioning and maintenance of clusters, cluster policies, and job compute environments in accordance with enterprise standards.
- Collaborate with platform and infrastructure teams to define Databricks architecture strategy and ensure secure, scalable, and cost-effective implementation.
- Define and enforce cluster policies to ensure proper resource utilization, cost control, and access control based on workload patterns and team requirements.
- Lead performance tuning of Spark jobs, Databricks SQL queries, and notebooks, ensuring optimal execution and minimizing latency.
- Build modular, reusable Python libraries using Pandas, NumPy, and PySpark for scalable data processing.
- Develop optimized Databricks SQL queries and views to power Tableau dashboards, React and .NET-based applications, and ad-hoc and real-time analytics use cases.
- Work closely with frontend and backend development teams to deliver use-case-specific, query-optimized datasets.
- Leverage Unity Catalog for fine-grained access control, data lineage, and metadata governance.
- Drive DevOps best practices using Azure DevOps, Terraform, and CI/CD automation pipelines.
- Mentor junior engineers and perform architectural reviews to ensure consistency and alignment with best practices.
Required Skills & Qualifications
- 7+ years of experience in data engineering, with a strong background in cloud-native data architecture.
- Deep hands-on experience with Databricks architecture, workspace administration, and cluster management.
- Experience defining and managing cluster policies, pools, and autoscaling strategies.
- Strong knowledge of Spark performance tuning and job optimization.
- Proven expertise in Databricks SQL, PySpark, Delta Lake, and large-scale data pipelines.
- Skilled in building reusable Python libraries with Pandas, openpyxl, XlsxWriter, and PySpark.
- Practical experience working with Unity Catalog for security and governance.
- Strong experience collaborating with front-end and back-end development teams on application integration.
- Strong SQL expertise and hands-on experience with PostgreSQL, SQL Server, or similar.
- DevOps expertise with tools like Azure DevOps, Git, and pipeline automation.
- Excellent communication skills with the ability to lead discussions with cross-functional teams and stakeholders.
Tools & Technologies
- Cloud Platforms: Azure (preferred), AWS
- Big Data & Analytics: Databricks, PySpark, Delta Lake, Databricks SQL, Spark Connect, Delta Live Tables
- Programming & Frameworks: Python, Pandas, PySpark, Flask
- Visualization & BI: Tableau
- App Integration: React, .NET, REST APIs
- DevOps & CI/CD: Azure DevOps, Git
- Databases: Databricks SQL, Azure SQL DB, or similar