Andrew Collier, Developer in Newbury, United Kingdom
Andrew is available for hire

Andrew Collier

Verified Expert in Engineering

Data Scientist and Software Developer

Location
Newbury, United Kingdom
Toptal Member Since
March 17, 2016

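Andrew acquired his programming and data analysis skills while working as an experimental physicist. He now works as a data scientist. His tools of choice are R and Python, along with a generous helping of SQL. Andrew also uses Docker extensively and has worked with AWS and Azure. He has a particular passion for web scraping and is an accomplished speaker and trainer.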

Portfolio

Unrival Limited
Python, Unstructured Data Analysis, Web Crawlers, Data Analysis...
Fathom Data
Linux, SQL, Python, R, Web Scraping, Machine Learning, Bash, RStudio Shiny...
Toptal
Python, R, Git, Amazon Web Services (AWS), Docker, Web Scraping...

Experience

Availability

Full-time

Preferred Environment

Bash, Linux, Git, Jupyter, Docker, Python, Amazon Web Services (AWS), SQL, R

The most amazing...

...system I've developed has been running autonomously in Antarctica for more than a decade.

Work Experience

Web Crawling Specialist

2020 - PRESENT
Unrival Limited
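  • Developed web scrapers that extract data from major social media platforms for a B2B marketing product.
  • Generated automated reports in HTML and PDF formats from the scraped data.
  • Used the Watson APIs to parse and analyze the scraped data.
  • Used the Bing Maps API to geolocate places found in the scraped data.
  • Developed a flexible web scraping framework to collect data from the executive pages of more than 100 different companies.
Technologies: Python, Unstructured Data Analysis, Web Crawlers, Data Analysis, Amazon Web Services (AWS), MySQL, SQL, Bing API, Large-scale Web Crawlers, APIs, Amazon S3 (AWS S3), Data Visualization, Pandas, Algorithms, Flask, JavaScript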

Founder | Data Scientist

2017 - PRESENT
Fathom Data
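  • Cleaned, prepared, and analyzed data, working in both R and Python.
  • Built machine learning and deep learning models in R and Python. Many of these models were subsequently deployed behind APIs.
  • Managed a team of data scientists and handled coordination and communication with clients.
  • Automated documentation. Used R Markdown to generate reports and presentations automatically.
  • Developed and managed packages: built and maintained numerous packages for R and Python.
  • Prepared lectures and talks, and delivered training and presentations at conferences and workshops.
Technologies: Linux, SQL, Python, R, Web Scraping, Machine Learning, Bash, RStudio Shiny, Data Science, Automation, ArcGIS, Geospatial Data, Technical Writing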

Freelance Data Scientist

2016 - PRESENT
Toptal
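  • Built robust web scrapers to extract data on individuals and organizations from LinkedIn and Sales Navigator.
  • Built a PostgreSQL database for storing medical drug data. Implemented an ETL pipeline.
  • Used Python and spaCy to extract key information from LinkedIn profiles and blog posts.
Technologies: Python, R, Git, Amazon Web Services (AWS), Docker, Web Scraping, Machine Learning, Bash, RStudio Shiny, Data Science, SQL, Automation, Amazon S3 (AWS S3)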

Founder/Data Scientist

2008 - PRESENT
Exegetic Analytics
  • Conducted data analyses for clinical trials.
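  • Developed a conformance analysis system for the printing industry.
  • Implemented a Kagi Charts indicator in MQL4.
  • Analyzed the effect of news events on foreign exchange trading, using data obtained from myfxbook.
  • Initiated the Durban R Users Group and the Durban Data Science Meetup.
Technologies: Linux, SQL, Python, R, Web Scraping, Machine Learning, Bash, RStudio Shiny, Data Science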

Python Engineer

2023 - 2024
HumanOS
  • Designed and implemented a database. Set up on Amazon RDS.
  • Created a Flask API to connect the database to desktop and mobile applications.
  • Integrated the API with a third-party (WeFitter) API to collect data from wearables.
Technologies: Python, PostgreSQL, Flask, APIs, Amazon EC2, Amazon S3 (AWS S3), WebSockets, Amazon RDS

Python Data Analyst and Technical Writer | Loom Tutorial Videos

2022 - 2023
Domino Data Lab
  • Created video and tutorial content for existing and new features.
  • Updated and maintained documentation. Added automation to the website build.
  • Provided feedback and bug reports on new features.
Technologies: R, Python, Pandas, Technical Writing, JavaScript

R Engineer - Shiny App

2019 - 2022
BluePath Solutions LLC.
  • Developed multiple Shiny applications for interacting with data.
  • Developed a web crawler to extract drug pricing data.
  • Designed and built a database using PostgreSQL; deployed on Amazon RDS.
Technologies: R, Data Science, Machine Learning, Amazon S3 (AWS S3), Data Visualization, Algorithms, Flask

Content Creator

2018 - 2019
Datacamp
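  • Designed the content for an online course on machine learning with Spark.
  • Developed course content, scripts, and associated materials.
  • Created slides, recorded video and audio, and edited content.
  • Continued to maintain the course and respond to questions from students.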
Technologies: Spark, Python

Senior Data Scientist

2013 - 2017
Derivco
  • Coded a game recommendation engine.
  • Developed a game/player anomaly detection system.
  • Automated routine analyses.
  • Automated report generation.
  • Initiated Data Science Working Group.
Technologies: Linux, Microsoft Excel, SQL, Python, R, Web Scraping, Bash, RStudio Shiny, Data Science, Data Visualization, Pandas

Honorary Senior Lecturer

2004 - 2015
University of KwaZulu-Natal
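  • Developed autonomous observation systems for experiments in Antarctica.
  • Applied machine learning techniques to lightning distributions.
  • Mentored students in R and data analysis.
  • Presented analysis results at numerous international conferences.
  • Published research results in international journals.
Technologies: Linux, MATLAB, Octave, R, Technical Writing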

{emayili}

http://github.com/datawookie/emayili
An R package for sending emails.

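The package has minimal dependencies and exposes a tidy API for composing and sending emails. It has detailed documentation and an extensive test suite.

The package has also been the subject of numerous blog posts and conference/meetup talks.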

Trundler R Package

http://github.com/datawookie/trundler
An R wrapper for the Trundler API.

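Trundler is a service that gathers retail price data via web scraping. The data are available via an API. This package provides a consistent set of functions for accessing the API from R.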

Trundler Python Package

http://github.com/datawookie/trundlerpy
A Python wrapper for the Trundler API.

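Trundler is a service that gathers retail price data via web scraping. The data are available via an API. This package provides a consistent set of functions for accessing the API from Python.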

Scientific Advisor

Supervised two Ph.D. and numerous M.Sc. theses in the field of space physics.

Languages

Python, SQL, Bash, R, Octave, C++, CSS, HTML, Sed, JavaScript

Libraries/APIs

REST API, Beautiful Soup, Bing API, ArcGIS, Pandas

Platforms

Linux, RStudio, Docker, Amazon Web Services (AWS), Amazon EC2

Other

Machine Learning, Web Scraping, Task Automation, Regular Expressions, Visualization, Statistics, Data Analysis, Artificial Intelligence (AI), Technology Consulting, Data Visualization, Technical Writing, Algorithms, Bayesian Statistics, Unstructured Data Analysis, Web Crawlers, Large-scale Web Crawlers, APIs, Geospatial Data, WebSockets, Amazon RDS

Frameworks

Selenium, Scrapy, Flask, Django, RStudio Shiny, Spark

Tools

Microsoft Excel, Jupyter, Git, MATLAB

Paradigms

Automation, Data Science

Storage

Amazon S3, MongoDB, Neo4j, MySQL, PostgreSQL

2001 - 2006

Ph.D. Degree in Space Physics

Royal Institute of Technology - Stockholm, Sweden

1996 - 1998

M.Sc. Degree in Nuclear Physics

Potchefstroom University - Potchefstroom, South Africa

1990 - 1993

B.Sc. (Hons) Degree in Physics & Mathematics

University of Natal - Durban, South Africa

JUNE 2006 - PRESENT

PhD

Royal Institute of Technology
