Toni Cebrián
Verified Expert in Engineering
Machine Learning Developer
A rare mixture of data scientist and data engineer, Toni is able to lead projects from conception and prototyping to deploying at scale in the cloud.
Portfolio
Experience
Availability
Preferred Environment
Linux
The most amazing...
...experience has been giving a talk on typeclasses in Scala at a local Scala meetup group.
Work Experience
Consultant
Self-employed
- Ingested a Bitcoin transaction graph into a Neo4j database, using Airflow to periodically crawl BigQuery tables containing Bitcoin transactions.
- Created asyncio web crawlers in Python to scrape websites with newsworthy content.
- Maintained and evolved Scala and Haskell SDKs for customers accessing web APIs from those languages.
- Created a tool for translating package addresses to different routing zones in a serverless architecture.
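The asyncio crawlers mentioned above follow a standard pattern: bounded concurrency via a semaphore plus `asyncio.gather`. A minimal sketch, with `fetch_page` as a stand-in for a real HTTP client such as aiohttp:

```python
import asyncio

# Illustrative sketch of a concurrent asyncio crawler. fetch_page is a
# stub: a real crawler would issue HTTP requests (e.g., with aiohttp).
async def fetch_page(url: str) -> str:
    await asyncio.sleep(0)  # simulate network I/O
    return f"<html>content of {url}</html>"

async def crawl(urls, max_concurrency: int = 5):
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def bounded_fetch(url):
        async with sem:
            return url, await fetch_page(url)

    # gather runs all fetches concurrently and preserves input order
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

if __name__ == "__main__":
    for url, body in asyncio.run(crawl(["https://example.com/a",
                                        "https://example.com/b"])):
        print(url, len(body))
```

The semaphore keeps the crawler polite toward target sites while still overlapping network waits across pages.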
Data Engineering Consultant
WalletConnect
- Defined the data pipeline to ingest raw WebSocket data into an S3 data lake.
- Created the data warehouse that reads data from the data lake into a star schema in Athena, with data movement handled by dbt models.
- Created all dashboards and data definitions for analyzing the data in the warehouse.
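The star-schema modeling described above splits raw events into one fact table plus deduplicated dimension tables. In the project this was done in dbt SQL over Athena; the following Python sketch only illustrates the split, with hypothetical field names:

```python
# Illustrative star-schema split: raw events become a fact table plus
# a deduplicated dimension table. Field names are hypothetical, not
# WalletConnect's actual schema.
def build_star_schema(raw_events):
    dim_clients = {}   # client_id -> dimension attributes
    fact_messages = []
    for ev in raw_events:
        client_id = ev["client_id"]
        # dimension rows are deduplicated by their natural key
        dim_clients.setdefault(client_id,
                               {"client_id": client_id, "sdk": ev["sdk"]})
        # fact rows keep measures plus foreign keys to dimensions
        fact_messages.append({"ts": ev["ts"],
                              "client_id": client_id,
                              "payload_bytes": ev["payload_bytes"]})
    return fact_messages, list(dim_clients.values())

events = [
    {"ts": 1, "client_id": "c1", "sdk": "js", "payload_bytes": 120},
    {"ts": 2, "client_id": "c1", "sdk": "js", "payload_bytes": 80},
    {"ts": 3, "client_id": "c2", "sdk": "swift", "payload_bytes": 64},
]
facts, clients = build_star_schema(events)
```

Keeping measures on the fact table and descriptive attributes on dimensions is what lets dashboard queries join cheaply and aggregate along any dimension.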
Full-stack Data Engineer
Greeneffort
- Defined, researched, and selected the provider for performing OCR on invoices. Created the data pipeline that moved invoices from our systems through the OCR service and stored the extracted metadata in the database.
- Created the entire server architecture for the frontend, using Akka HTTP for the REST API and Slick for database access. The services ran in a GKE cluster.
- Created an ontology mapping the Life Cycle Impact Assessment (LCIA) of different products to our internal data definitions, enabling richer queries on the CO2 impact of different products.
Semantic Web Consultant
Dow Jones and Company
- Developed the ontologies for data modeling in the area of US bankruptcies, built on the Common Core Ontology and extended to accommodate all other required concepts.
- Created a compiler that reads an OWL schema definition and generates Scala code for managing the concepts in that ontology programmatically and in a fully typed way.
- Implemented the Cloud Dataflow pipelines that read the firehose of articles at Dow Jones, processed them, and ingested the semantic data into the semantic data store.
Lead Data Engineer
Nansen
- Implemented the dbt models that populated Nansen's data warehouse.
- Evaluated the blockchain-etl library to determine how to ingest data from different blockchains into the raw data lake.
- Performed data analyses with the TigerGraph graph database to trace where ETH from a well-known 2018 scam ended up.
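Tracing funds through a transaction graph, as in the scam analysis above, is at heart a reachability query. In practice this ran as graph queries in TigerGraph; a plain breadth-first search over made-up (sender, receiver) edges conveys the idea:

```python
from collections import deque

# Sketch of fund tracing as BFS over directed (sender, receiver)
# edges. Addresses and transactions here are purely illustrative.
def reachable_addresses(edges, source):
    """Return every address reachable from `source` via outgoing edges."""
    adjacency = {}
    for sender, receiver in edges:
        adjacency.setdefault(sender, []).append(receiver)
    seen, queue = {source}, deque([source])
    while queue:
        addr = queue.popleft()
        for nxt in adjacency.get(addr, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {source}  # everywhere the funds could have flowed

txs = [("scam", "a"), ("a", "b"), ("b", "exchange"), ("c", "d")]
```

A real analysis also weighs edges by amount and timestamp, but reachability alone already narrows down which exchange addresses to investigate.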
Lead Data Engineer
Coinfi
- Created the ETL orchestration systems using Airflow with Composer in Google Cloud.
- Created scraping services for getting crypto data (prices, events, and news) to ingest into the platform.
- Set up dbt models to report on blockchain data publicly available in BigQuery datasets.
Head of Data Science
Stuart
- Designed the company's data warehouse using Redshift.
- Created a forecasting model to predict driver logins to the platform and the deliveries to be served.
- Architected an event sourcing system for complex event processing.
- Deployed a route optimization algorithm for picking drivers based on route and package size.
- Created the data science team from scratch, led the hiring process, created role definitions, and established OKRs.
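The driver-picking optimization above can be reduced to its core decision: filter drivers whose vehicle fits the package, then rank by route cost. A toy sketch with illustrative names and numbers (the production optimizer was considerably richer):

```python
# Toy sketch of driver selection by capacity fit and route cost.
# All field names and values are illustrative, not Stuart's actual model.
def pick_driver(drivers, package_size):
    """Pick the cheapest-route driver whose capacity fits the package."""
    feasible = [d for d in drivers if d["capacity"] >= package_size]
    if not feasible:
        return None  # no driver can carry this package
    # route_cost could be estimated distance or ETA from a router
    return min(feasible, key=lambda d: d["route_cost"])["name"]

drivers = [
    {"name": "bike-1", "capacity": 5,  "route_cost": 2.0},
    {"name": "van-1",  "capacity": 50, "route_cost": 6.5},
]
```

Separating the feasibility filter from the cost ranking keeps the two concerns (what *can* deliver vs. what *should*) independently tunable.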
Chief Data Officer
Enerbyte
- Architected the infrastructure for ingesting data from IoT devices.
- Researched algorithms for energy disaggregation from a single point of measure.
- Created the data science team from scratch, leading the hiring process, role definitions, and quarterly OKRs.
Head of Data Science
Softonic
- Created a recommender system based on textual content from app reviews.
- Developed an improved search engine using machine learning and Solr.
- Created the data science team from scratch. Hired all relevant profiles and set up the OKRs and managerial tasks.
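Content-based recommendation from review text, as above, boils down to comparing apps by the similarity of their review vocabulary. A minimal stdlib sketch using cosine similarity over word counts (the production system used richer NLP features; the reviews are made up):

```python
import math
from collections import Counter

# Minimal content-based recommender: apps are ranked by cosine
# similarity of their review word counts. Reviews are illustrative.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(app, reviews):
    """Rank other apps by textual similarity of their reviews to `app`."""
    vectors = {name: Counter(text.lower().split())
               for name, text in reviews.items()}
    target = vectors[app]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != app]
    return sorted(scores, key=lambda s: s[1], reverse=True)

reviews = {
    "photo-editor": "great filters and photo editing tools",
    "camera-pro":   "photo filters with manual camera tools",
    "pdf-reader":   "reads pdf documents quickly",
}
```

Raw counts stand in for TF-IDF weights here; swapping in IDF weighting would downrank words common to all reviews.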
Experience
Type Classes Talk
http://github.com/tonicebrian/typeclasses-talk
Education
Master's Degree in Artificial Intelligence
Universitat Politècnica de Catalunya - Barcelona, Spain
Postgraduate Degree in Quantitative Techniques for Financial Products
Universitat Politècnica de Catalunya - Barcelona, Spain
Certifications
Cloudera Certified Hadoop Professional
Cloudera
Skills
Languages
Python 3, Scala, SQL, RDF, Haskell, C++, OWL
Frameworks
Spark, Akka, Hadoop
Libraries/APIs
Spark Streaming, Pandas, NumPy, PubSubJS, Python Asyncio, TensorFlow, XGBoost, Stanford NLP, OpenAPI, Slick
Tools
Apache Airflow, Cloud Dataflow, Apache Beam, Amazon Athena, Solr, Apache Avro, Protégé, Google Kubernetes Engine (GKE), AWS Glue, BigQuery
Paradigms
Functional Programming, Data Science, Reactive Programming
Platforms
Google Cloud Platform (GCP), Apache Kafka, Linux, Kubernetes, Amazon Web Services (AWS), Blockchain
Other
Machine Learning, Akka HTTP, Data Mining, Data Engineering, Technical Leadership, Leadership, Consulting, Mentorship & Coaching, Google BigQuery, Big Data, Artificial Intelligence (AI), Crypto, NEO, Data Flows, Recommendation Systems, Word2Vec, Semantic Web, Web Scraping, Natural Language Processing (NLP), Deep Learning, Financial Modeling, Monte Carlo Simulations, Time Series, Data Build Tool (dbt), TigerGraph, Stardog, RDFox, Ontologies, OCR, Invoice Processing
Storage
Redshift, Cassandra, PostgreSQL, Google Cloud, Redis, Neo4j, Amazon S3 (AWS S3)