Become a Data Engineer at Ontruck
At Ontruck, data engineering focuses on practical applications of data collection and analysis. The team takes care of the mechanisms for collecting and validating large sets of information.
What we do
- Build, maintain and evolve our data platform
- Collect data from internal and external sources
- Transform the data into usable and understandable formats
- Load this data into controlled areas where other teams can use it
What we do NOT do
- Analyze data to provide the business team with data-driven insight
- Create or train machine learning models
- Create features in other Ontruck products
- Validate or invalidate analyses, or experiment with data analysis
What Data Engineering creates
At Ontruck, Data Engineering exposes a data lake to the Data Analytics and Data Science teams. The data lake is meant to be a place of discovery for these teams. Since the data is raw, it takes less work for the Data Engineering team to manage, but it doesn’t eliminate data that could be useful for skilled explorers.
More broadly within Ontruck, Data Engineering exposes a data warehouse of tables that are structured to be queried quickly and only contain a subset of all the data in the lake. For Ontruck, all of our data goes through the lake before it gets to the warehouse, and only the data that we know is useful and worth cleaning gets to the warehouse.
These tables are meant to be more easily understood and allow for varying levels of access to sensitive data through different schemas.
The warehouse data has already been cleaned: it is organized into tables that make sense for known use cases, and you can get answers out of it quickly.
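The lake-then-warehouse flow described above can be sketched in plain Python. This is a hypothetical illustration only; the field names and the `promote_to_warehouse` function are assumptions for the example, not Ontruck’s actual pipeline (which runs on tools like StreamSets and Airflow):

```python
# Minimal sketch of promoting raw lake records into warehouse rows.
# Everything in the lake is kept as-is; only records that are known
# to be useful, with cleaned and typed fields, reach the warehouse.
# Field names here are illustrative assumptions, not a real schema.

def promote_to_warehouse(lake_records):
    """Keep only the fields the warehouse schema needs, cleaned and typed."""
    warehouse_rows = []
    for record in lake_records:
        # Records missing required fields stay in the lake only
        if "order_id" not in record or "price" not in record:
            continue
        warehouse_rows.append({
            "order_id": str(record["order_id"]),
            "price_eur": round(float(record["price"]), 2),
        })
    return warehouse_rows


raw = [
    {"order_id": 1, "price": "19.99", "debug_payload": "..."},  # promoted
    {"sensor_blob": "xyz"},  # useful to explorers, but lake-only
]
print(promote_to_warehouse(raw))
```

The design choice this illustrates is the one stated above: the lake accepts everything at low management cost, while the warehouse carries only the curated subset worth cleaning and structuring for fast queries.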
What you will be doing day to day:
- Maintain, expand and improve our data processes and computational infrastructure. At Ontruck we use a mix of tools in constant evolution: StreamSets, Tableau, Airflow, Superset, Druid, Spark, MLeap, TensorFlow.
- Develop our core data infrastructure, carrying out and reporting on proofs of concept over different tools and ideas
- Take responsibility for initiatives related to building, maintaining and orchestrating all the components of our data platform
- Work with different sources of data (internal and external) to process and transform them and make them accessible to other teams
Your skills and experience:
- Degree in Computer Science or related technical field
- At least 3 years of development experience in Python (or another object-oriented language applied to data processing)
- Experience as a Data Engineer or related specialty (e.g., Software Engineer, Business Intelligence Engineer, Data Scientist) with a track record of manipulating, processing, and extracting value from large datasets
- Demonstrated strength in data modeling, scalable ETL development, and data warehousing. We use StreamSets, Airflow and Superset.
- Strong experience optimising and performance-tuning Postgres: hands-on experience configuring and supporting replication, and a background in building large database infrastructure that supports a high volume of transactions in a high-demand environment
- Hands-on experience with container orchestration using Kubernetes and Docker
- Experience building/operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets
- Experience building data products incrementally and integrating and managing datasets from multiple sources
- Understanding of engineering best practices: write tests, use automation, build continuous integration pipelines, etc
- Experience with data streaming platforms (Spark, Kafka, Kinesis, etc.) would be an advantage!
Other Relevant information
- Opportunities for personal growth and learning, every single day.
- A flat, laid-back culture. Everybody is encouraged to participate in discussions and contribute.
- High-trust environment. We believe in giving autonomy to all our employees.
- Competitive compensation packages. We are looking for the very best talent, and will reward accordingly.
- Awesome offices in central Madrid. We are easily accessible by public transport, as well as close to public bike stations.
- Flexible schedule.
To apply for this job, please visit www.linkedin.com.