Full-Time Data Engineer
SoulCycle is seeking a Data Engineer who is passionate about working on teams that solve interesting, large-scale problems at a rapid pace. This role will contribute heavily to the evolution of SoulCycle’s next-generation data platform that empowers our people and our ever-expanding set of product offerings. As a member of our Data Engineering team, you will be in a position to have a direct, lasting impact everywhere that data empowers our business.
Ideal candidates should have a keen mind for solving tough problems with the right solution, partnering effectively with team members and stakeholders along the way. They should be personable, efficient, flexible, and communicative, and have a passion for what they do. As a Data Engineer at SoulCycle, you'll report to the Director of Data. You will integrate data from sources across the company into our data lake and warehouse, which serve as a single source of truth for our reporting and analytics needs. You will also collaborate with our Product Management, Engineering, Marketing, Retail, Finance, and DevOps teams along the way.
ROLES AND RESPONSIBILITIES:
Build ETL pipelines using Python, Spark, and Apache Airflow.
Manage our data lake, data warehouse, and Airflow instance.
Build tools to manage, automate, and monitor our data and data-processing infrastructure.
Work closely with cross-functional agile teams to leverage our data in creating rich, data-driven customer experiences.
Work with business stakeholders across the company to understand their data and reporting needs.
Assess the value and quality of new data sources.
Extend and improve our ETL framework as needed.
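In production these pipelines are orchestrated with Airflow and run on Spark; as a rough, hypothetical illustration of the extract-transform-load shape the role involves (the function names, field names, and sample records below are invented for this sketch, not SoulCycle's actual schema):

```python
# Hypothetical ETL sketch: extract raw records from a source, normalize
# field types and names, and load the result into a destination store
# (here an in-memory dict standing in for a warehouse table).

def extract(raw_rows):
    """Simulate extraction: yield rows from a source system."""
    yield from raw_rows

def transform(rows):
    """Normalize types and casing; drop rows missing a rider id."""
    for row in rows:
        if not row.get("rider_id"):
            continue
        yield {
            "rider_id": int(row["rider_id"]),
            "studio": row.get("studio", "").strip().lower(),
            "rides": int(row.get("rides", 0)),
        }

def load(rows, warehouse):
    """Upsert rows into a dict keyed by rider_id."""
    for row in rows:
        warehouse[row["rider_id"]] = row
    return warehouse

warehouse = load(transform(extract([
    {"rider_id": "1", "studio": " SoHo ", "rides": "12"},
    {"rider_id": "", "studio": "tribeca"},        # dropped: no rider id
    {"rider_id": "2", "studio": "Union Square"},  # rides defaults to 0
])), {})
```

In an Airflow deployment, each stage would typically become a task in a DAG, with Spark handling the heavy transforms.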
QUALIFICATIONS:
Extensive experience writing and debugging Python 3 code.
Experience developing jobs in Apache Spark (PySpark).
Experience creating ETL pipelines, preferably with Apache Airflow.
Effective written and oral communication skills when interacting with both technical and non-technical staff.
Deep understanding of SQL, dimensional/relational databases and data access patterns.
Ability to work with structured, semi-structured and unstructured data sources.
Significant experience modeling information for relational, dimensional, and NoSQL datastores.
Experience with Google Cloud Platform (GCP), specifically BigQuery and Dataproc.
Experience with unit testing.
Understanding of Docker and CI/CD deployment pipelines.
Detail-oriented and able to enforce consistency in naming conventions, standards, and reporting nomenclature.
Experience successfully implementing machine learning frameworks such as Spark ML/MLlib or TensorFlow is a plus.
Active membership in the open source or New York tech community is a plus.
Personal passion for fitness and/or the SoulCycle brand is a plus.
How to Apply
Please apply via https://boards.greenhouse.io/soulcycle/jobs/4422045002