Toni Cebrián
Verified Expert in Engineering
Machine Learning Developer
A rare mixture of data scientist and data engineer, Toni is able to lead projects from conception and prototyping to deploying at scale in the cloud.
Portfolio
Experience
Availability
Preferred Environment
Linux
The most amazing...
...experience has been giving a talk on typeclasses in Scala at a local Scala meetup group.
Work Experience
Consultant
Self-employed
- Ingested a Bitcoin transaction graph into a Neo4j database, using Airflow to periodically crawl BigQuery tables containing Bitcoin transactions.
- Created asyncio web crawlers in Python to scrape websites with newsworthy content.
- Maintained and evolved Scala and Haskell SDKs for customers accessing web APIs from those languages.
- Created a tool for translating package addresses to different routing zones in a serverless architecture.
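The asyncio crawlers mentioned above follow a standard pattern: bounded concurrency via a semaphore plus `asyncio.gather`. A minimal sketch, with `fetch_page` as a stand-in for a real HTTP client such as aiohttp:

```python
import asyncio

# Illustrative sketch of a concurrent asyncio crawler. fetch_page is a
# stub: a real crawler would issue HTTP requests (e.g., with aiohttp).
async def fetch_page(url: str) -> str:
    await asyncio.sleep(0)  # simulate network I/O
    return f"<html>content of {url}</html>"

async def crawl(urls, max_concurrency: int = 5):
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def bounded_fetch(url):
        async with sem:
            return url, await fetch_page(url)

    # gather runs all fetches concurrently and preserves input order
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

if __name__ == "__main__":
    for url, body in asyncio.run(crawl(["https://example.com/a",
                                        "https://example.com/b"])):
        print(url, len(body))
```

The semaphore keeps the crawler polite toward target sites while still overlapping network waits across pages.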
Data Engineering Consultant
WalletConnect
- Defined the data pipeline to ingest raw WebSocket data into an S3 data lake.
- Created the data warehouse that reads data from the data lake into a star schema in Athena, with data movement handled by dbt models.
- Created all dashboards and data definitions for analyzing the data in the warehouse.
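The star-schema modeling described above splits raw events into one fact table plus deduplicated dimension tables. In the project this was done in dbt SQL over Athena; the following Python sketch only illustrates the split, with hypothetical field names:

```python
# Illustrative star-schema split: raw events become a fact table plus
# a deduplicated dimension table. Field names are hypothetical, not
# WalletConnect's actual schema.
def build_star_schema(raw_events):
    dim_clients = {}   # client_id -> dimension attributes
    fact_messages = []
    for ev in raw_events:
        client_id = ev["client_id"]
        # dimension rows are deduplicated by their natural key
        dim_clients.setdefault(client_id,
                               {"client_id": client_id, "sdk": ev["sdk"]})
        # fact rows keep measures plus foreign keys to dimensions
        fact_messages.append({"ts": ev["ts"],
                              "client_id": client_id,
                              "payload_bytes": ev["payload_bytes"]})
    return fact_messages, list(dim_clients.values())

events = [
    {"ts": 1, "client_id": "c1", "sdk": "js", "payload_bytes": 120},
    {"ts": 2, "client_id": "c1", "sdk": "js", "payload_bytes": 80},
    {"ts": 3, "client_id": "c2", "sdk": "swift", "payload_bytes": 64},
]
facts, clients = build_star_schema(events)
```

Keeping measures on the fact table and descriptive attributes on dimensions is what lets dashboard queries join cheaply and aggregate along any dimension.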
Full-stack Data Engineer
Greeneffort
- Defined, researched, and selected the provider for performing OCR on invoices. Created the data pipeline that moved invoices from our systems through the OCR service and stored the extracted metadata in the database.
- Created the entire server architecture for the frontend, using Akka HTTP for the REST API and Slick for database access. The services ran in a GKE cluster.
- Created an ontology mapping the Life Cycle Impact Assessment (LCIA) of different products to our internal data definitions, enabling richer queries on the CO2 impact of different products.
Semantic Web Consultant
Dow Jones and Company
- Developed the ontologies for data modeling in the area of US bankruptcies, built on the Common Core Ontology and extended to accommodate all other required concepts.
- Created a compiler that reads an OWL schema definition and generates Scala code for managing the concepts in that ontology programmatically and in a fully typed way.
- Implemented the Cloud Dataflow pipelines that read the firehose of articles at Dow Jones, processed them, and ingested the semantic data into the semantic data store.
Lead Data Engineer
Nansen
- Implemented the dbt models that populated Nansen's data warehouse.
- Evaluated the blockchain-etl library to determine how to ingest data from different blockchains into the raw data lake.
- Performed data analyses with the TigerGraph graph database to trace where ETH from a well-known 2018 scam ended up.
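Tracing funds through a transaction graph, as in the scam analysis above, is at heart a reachability query. In practice this ran as graph queries in TigerGraph; a plain breadth-first search over made-up (sender, receiver) edges conveys the idea:

```python
from collections import deque

# Sketch of fund tracing as BFS over directed (sender, receiver)
# edges. Addresses and transactions here are purely illustrative.
def reachable_addresses(edges, source):
    """Return every address reachable from `source` via outgoing edges."""
    adjacency = {}
    for sender, receiver in edges:
        adjacency.setdefault(sender, []).append(receiver)
    seen, queue = {source}, deque([source])
    while queue:
        addr = queue.popleft()
        for nxt in adjacency.get(addr, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {source}  # everywhere the funds could have flowed

txs = [("scam", "a"), ("a", "b"), ("b", "exchange"), ("c", "d")]
```

A real analysis also weighs edges by amount and timestamp, but reachability alone already narrows down which exchange addresses to investigate.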
Lead Data Engineer
Coinfi
- Created the ETL orchestration systems using Airflow with Composer in Google Cloud.
- Created scraping services for getting crypto data (prices, events, and news) to ingest into the platform.
- Set up dbt models to report on blockchain data publicly available in BigQuery datasets.
Head of Data Science
Stuart
- Designed the company's data warehouse using Redshift.
- Created a forecasting model to predict driver logins to the platform and the deliveries to be served.
- Architected an event sourcing system for complex event processing.
- Deployed a route optimization algorithm for picking drivers based on route and package size.
- Created the data science team from scratch, led the hiring process, created role definitions, and established OKRs.
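The driver-picking optimization above can be reduced to its core decision: filter drivers whose vehicle fits the package, then rank by route cost. A toy sketch with illustrative names and numbers (the production optimizer was considerably richer):

```python
# Toy sketch of driver selection by capacity fit and route cost.
# All field names and values are illustrative, not Stuart's actual model.
def pick_driver(drivers, package_size):
    """Pick the cheapest-route driver whose capacity fits the package."""
    feasible = [d for d in drivers if d["capacity"] >= package_size]
    if not feasible:
        return None  # no driver can carry this package
    # route_cost could be estimated distance or ETA from a router
    return min(feasible, key=lambda d: d["route_cost"])["name"]

drivers = [
    {"name": "bike-1", "capacity": 5,  "route_cost": 2.0},
    {"name": "van-1",  "capacity": 50, "route_cost": 6.5},
]
```

Separating the feasibility filter from the cost ranking keeps the two concerns (what *can* deliver vs. what *should*) independently tunable.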
Chief Data Officer
Enerbyte
- Architected the infrastructure for ingesting data from IoT devices.
- Researched algorithms for energy disaggregation from a single point of measure.
- Created the data science team from scratch, leading the hiring process, role definitions, and quarterly OKRs.
Head of Data Science
Softonic
- Created a recommender system based on textual content from app reviews.
- Developed an improved search engine using machine learning and Solr.
- Created the data science team from scratch. Hired all relevant profiles and set up the OKRs and managerial tasks.
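Content-based recommendation from review text, as above, boils down to comparing apps by the similarity of their review vocabulary. A minimal stdlib sketch using cosine similarity over word counts (the production system used richer NLP features; the reviews are made up):

```python
import math
from collections import Counter

# Minimal content-based recommender: apps are ranked by cosine
# similarity of their review word counts. Reviews are illustrative.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(app, reviews):
    """Rank other apps by textual similarity of their reviews to `app`."""
    vectors = {name: Counter(text.lower().split())
               for name, text in reviews.items()}
    target = vectors[app]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != app]
    return sorted(scores, key=lambda s: s[1], reverse=True)

reviews = {
    "photo-editor": "great filters and photo editing tools",
    "camera-pro":   "photo filters with manual camera tools",
    "pdf-reader":   "reads pdf documents quickly",
}
```

Raw counts stand in for TF-IDF weights here; swapping in IDF weighting would downrank words common to all reviews.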
Experience
Type Classes Talk
http://github.com/tonicebrian/typeclasses-talk
Education
Master's Degree in Artificial Intelligence
Universitat Politècnica de Catalunya - Barcelona, Spain
Postgraduate Degree in Quantitative Techniques for Financial Products
Universitat Politècnica de Catalunya - Barcelona, Spain
Certifications
Cloudera Certified Hadoop Professional
Cloudera
Skills
Languages
Python 3, Scala, SQL, RDF, Haskell, C++, OWL
Frameworks
Spark, Akka, Hadoop
Libraries/APIs
Spark Streaming, Pandas, NumPy, PubSubJS, Python Asyncio, TensorFlow, XGBoost, Stanford NLP, OpenAPI, Slick
Tools
Apache Airflow, Cloud Dataflow, Apache Beam, Amazon Athena, Solr, Apache Avro, Protégé, Google Kubernetes Engine (GKE), AWS Glue, BigQuery
Paradigms
Functional Programming, Data Science, Reactive Programming
Platforms
Google Cloud Platform (GCP), Apache Kafka, Linux, Kubernetes, Amazon Web Services (AWS), Blockchain
Other
Machine Learning, Akka HTTP, Data Mining, Data Engineering, Technical Leadership, Leadership, Consulting, Mentorship & Coaching, Google BigQuery, Big Data, Artificial Intelligence (AI), Crypto, NEO, Data Flows, Recommendation Systems, Word2Vec, Semantic Web, Web Scraping, Natural Language Processing (NLP), Deep Learning, Financial Modeling, Monte Carlo Simulations, Time Series, Data Build Tool (dbt), TigerGraph, Stardog, RDFox, Ontologies, OCR, Invoice Processing
Storage
Redshift, Cassandra, PostgreSQL, Google Cloud, Redis, Neo4j, Amazon S3 (AWS S3)