Data Engineer - Hybrid
Title: Data Engineer - Hybrid

Mandatory skills: streaming data pipelines, Python, SQL, data streaming technologies, Kafka, Flink, Spark Streaming, Google Cloud Platform, GCP services, BigQuery, Dataflow, Dataproc, Pub/Sub, Vertex AI, AI Platform, MLOps pipelines, scaling ML models, ETL, ELT workflows, data quality, CI/CD, automated retraining, data engineering, design patterns, package configuration, package deployment, distributed frameworks, secure solutions

Description:
Data Engineer / Senior Engineer, Fraud and Abuse Data Engineering

- Design, build, and operate scalable batch and streaming data pipelines (Kafka) and ETL/ELT workflows across Hadoop and Google Cloud Platform (GCP); implement monitoring/alerting to meet reliability and SLA targets.
- Develop high-performance distributed processing with Python, Spark, and Hive; optimize jobs, storage, and throughput for large-scale, high-volume datasets and cost efficiency.
- Deliver curated, trustworthy datasets for analytics, reporting, and ML with strong data quality, lineage, and governance.
- Partner with data scientists to operationalize ML on GCP (e.g., Vertex AI), building MLOps pipelines for training, deployment, CI/CD, monitoring, and automated retraining.
- Integrate on-prem Hadoop data lakes with GCP services to enable seamless hybrid data and model workflows.
- Collaborate with analysts and product engineers to ensure data is accessible, high-quality, and actionable; provide technical mentorship to junior engineers.
- Uphold security, privacy, and regulatory compliance across all data engineering practices.
- Continuously evaluate technologies and design patterns, and drive improvements in performance, scalability, and cost across Hadoop and GCP environments.

Qualifications:
- BA/BS or equivalent; 4 years building large-scale data systems.
- Proficient in core platforms; writes organized, maintainable code across multiple languages and distributed frameworks.
- Skilled in package configuration/deployment and building custom solutions.
- Designs robust tests; troubleshoots and resolves routine and non-routine issues independently.
- Delivers high-performance, scalable, secure solutions (high throughput/low latency).
- Operates effectively in Agile: communicates clearly with partners, aligns team priorities, and understands guest/business impact.
- Influences and applies data/engineering standards and policies; maintains expertise and stays current through ongoing learning.

Technical Skills:
- Strong proficiency in Python and SQL.
- Hands-on experience with Apache Spark and the Hadoop ecosystem (HDFS, Hive, Pig, Oozie, Sqoop, YARN).
- Experience with real-time data streaming technologies (Kafka, Flink, Spark Streaming).
- Expertise with GCP services: BigQuery, Dataflow, Dataproc, Pub/Sub, and Vertex AI (AI Platform).
- Experience building and maintaining MLOps pipelines for deploying and scaling ML models.

Experience: 4-5 years

Notes: Hybrid - two days a week in the office

VIVA USA is an equal opportunity employer and is committed to maintaining a professional working environment that is free from discrimination and unlawful harassment. The management, contractors, and staff of VIVA USA shall respect others without regard to race, sex, religion, age, color, creed, national or ethnic origin, physical, mental, or sensory disability, marital status, sexual orientation, or status as a Vietnam-era veteran, recently separated veteran, active duty wartime or campaign badge veteran, Armed Forces service medal veteran, or disabled veteran. Please contact us at [email protected] with any complaints, comments, or suggestions.

Contact Details:
Account coordinator: Ramadas Kumaresan
Phone: (847) 786-5904
Email: [email protected]
VIVA USA INC.
3601 Algonquin Road, Suite 425
Rolling Meadows, IL 60008
[email protected] | http://www.viva-it.com