Hiring: Machine Learning Engineer vs. Data Engineer vs. Data Scientist
Apr 22 · 7 min read
Co-founder and CEO of Ideamotive. Entrepreneur, mentor and startup advisor.
The big data explosion of the early 21st century was only a humble prelude to the breathtaking effects of the machine learning revolution. First, computers were used to enhance the human ability to extract information from large amounts of data, and the tools to store, process and analyze mind-boggling volumes of data matured. Then came the technology that automates this process and supports machine-powered data harvesting.
The era of artificial intelligence has officially started, and the technology itself is becoming prevalent. According to an IDC analysis, artificial intelligence has entered the business mainstream because it allows organizations to process ever larger amounts of data in a feasible time. The forecast is that worldwide data will grow 61% to 175 zettabytes by 2025.
To show how big a “zetta” of anything is, consider these three examples:
The mass of Earth’s atmosphere is about 5 zettagrams.
There are 1.369 zettalitres of water in Earth’s oceans.
The diameter of the Milky Way galaxy is between 0.9 and 1.7 zettametres.
Now consider that there will be 175 zettabytes of data out there in the cloud, on hard drives and in every other component of the World Wide Web. The greater the stockpile, the messier it gets, and the greater the need for people skilled in harvesting insights from it.
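For a feel of the arithmetic, here is a small sketch in Python. The only facts it relies on are the SI prefixes (zetta is 10 to the 21st power, tera is 10 to the 12th); the one-terabyte drive is just a hypothetical yardstick chosen for illustration, not something taken from the IDC forecast.

```python
# SI prefixes are powers of ten: zetta = 10**21, tera = 10**12.
ZETTA = 10**21
TERA = 10**12

# The 175 ZB figure is the forecast quoted in the article.
forecast_bytes = 175 * ZETTA

# Hypothetical yardstick: how many 1 TB drives would hold that much data?
drives_needed = forecast_bytes // TERA

print(f"{drives_needed:,} one-terabyte drives")
```

In other words, the forecast works out to 175 billion such drives.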
What do machine learning engineers, data engineers, and data scientists do in their daily jobs?
These three roles are all closely related to data and analysis, but they are far from being the same. Their responsibilities rarely overlap, yet the three usually need to cooperate to deliver a project efficiently.
Who is a data engineer?
The data engineer tends to be overshadowed by the data scientist when people think about data-related jobs. So here is a short data engineer job description:
Sometimes referred to as a business intelligence architect, this specialist is responsible for delivering the infrastructure and framework used to process data. The role requires a deep understanding of business intelligence tools and involves managing the database to ensure easy access to the information it contains.
A data engineer is responsible for collecting, moving, storing and preparing data for further use. The data engineer is therefore not directly involved in building artificial intelligence, but this work is essential for any data science or machine learning process. When people talk about AI-readiness in an organization, they usually mean the work that a data engineer needs to do.
The infrastructure and architecture designed by the data engineer are also a solid foundation for non-AI-powered systems like data visualizations and triggered process automation. Last but not least, the data engineer ensures that there is no “dark data”, that is, data the company stores although nobody knows what it relates to or refers to.
What tools of the trade do data engineers have?
Data engineers, and the people who did similar work before the term was established, used mostly SQL queries, but databases evolved with the big data explosion. Now, the data engineer role calls for:
Experience with cloud services like Amazon Web Services or Google Cloud
Knowledge of Java and/or Scala, which is an advantage
An understanding of database architecture concepts like data modeling and data warehousing, which is also an asset
The list above is only an example of basic skills and can be extended with additional tools and frameworks like Hadoop, Kafka, Spark, Tableau or Elasticsearch. A data engineer working for a big IT or finance company will look like a shock trooper compared to a data engineer from a small startup, mostly due to the vast amount of data to process.
What is the role of a data engineer in the project?
First of all, a data engineer is needed not only in artificial intelligence projects but in every company that has to process data. This specialist is the first to enter the field. The role is to design infrastructure that gives the project the performance it needs and enables the company to process or store data effectively for use in future projects.
In big data projects that is no easy task: one needs to design data pipelines and streams that connect multiple sources into a single data warehouse. The system also has to be unified and let multiple tools process whatever is delivered, no matter how different the data are.
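To make the idea of a pipeline more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the two sources (a shop CSV export and JSON events from an app), their field names, and the in-memory SQLite database standing in for a real data warehouse. A production pipeline would more likely rely on tools such as the ones listed earlier.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export from an online shop.
SHOP_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,5.00
"""

# Hypothetical source 2: JSON events from a mobile app, with different field names.
APP_JSON = '[{"id": 3, "user": "carol", "total": 12.5}]'


def extract_shop(text):
    """Yield unified (order_id, customer, amount, source) rows from the CSV source."""
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row["order_id"]), row["customer"], float(row["amount"]), "shop"


def extract_app(text):
    """Yield rows in the same unified shape from the JSON source."""
    for event in json.loads(text):
        yield int(event["id"]), event["user"], float(event["total"]), "app"


def load(rows, conn):
    """Create the unified table if needed and insert all rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(list(extract_shop(SHOP_CSV)) + list(extract_app(APP_JSON)), conn)

# Any downstream tool can now query one table regardless of where a row came from.
total_orders, revenue = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders"
).fetchone()
sources = [r[0] for r in conn.execute("SELECT DISTINCT source FROM orders ORDER BY source")]
print(total_orders, round(revenue, 2), sources)
```

The point of the design is that each source gets its own small extract function, while everything downstream sees a single, unified table.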
It is especially challenging when the project has to process and combine totally different types of data to produce a meaningful outcome. Just think about an app that recognizes a bird from a photograph or a recording of its call. The database needs to store different types of data for training the neural network, and the system must be able to find the links between various features and deliver the outcome. This example extends to countless other industries, be it healthcare, social care, the public sector or manufacturing.
So if the app is a music performance, the data engineer is the crew that delivers the sound system and makes sure everything is ready for the show - a roadie, perhaps. That makes data engineers crucial to delivering the project, even though they are not in the spotlight.
Who is a data scientist?
A data scientist is a rare beast, named the hottest profession of 2019 but with a long tradition of being the specialist companies look for the most. Harvard Business Review described this role as “the sexiest job of the 21st century”, and IBM published a study in 2017 predicting that the world would need 28% more data scientists by 2020 to meet demand.
The data scientist is part mathematician, part computer scientist and part data analyst. While a data engineer is responsible for delivering the architecture and infrastructure of the solution, the data scientist delivers its soul - basically the model or algorithm that puts the data to sensible use or automates a task. The data scientist also delivers the model that automatically scrapes and orders the data in the dataset.
The specialist is responsible for annotating data and training the models that will perform the desired task. Depending on the company and the goal, the training can be done in a standard way or may need a more sophisticated approach. Training a model can therefore take anywhere from a day to weeks of constant computing.
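To make “training a model” a little less abstract, here is a toy sketch in plain Python. It fits a straight line to a handful of annotated examples with gradient descent, which is one standard technique and not necessarily what any particular team uses. The data, the learning rate and the number of epochs are all invented for illustration; real models have far more parameters, which is where the days or weeks of computing come from.

```python
# Toy annotated data: inputs x with labels y that follow y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Model parameters to be learned: prediction = w * x + b.
w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, small enough to converge on this data

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y
        # Gradients of the mean squared error with respect to w and b.
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # Step against the gradient to reduce the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))
```

After training, `w` and `b` end up very close to 2 and 1, the values hidden in the labels.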
What tools of the trade do data scientists have?
Currently, nearly all data science is done in Python, one of the most versatile and flexible programming languages, with a strong emphasis on data processing and statistical analysis. So:
The data scientist needs to be a Python ninja, or at least know the language well.
Data scientists also frequently read research papers and analyses to stay on the cutting edge and keep their knowledge of AI development fresh. Many top-notch models or components are published as part of research, so staying up to date also makes business sense - it reduces delivery time and cost.
A solid mathematical background is also welcome; many specialists have a degree in maths, physics, chemistry or another science field.
Last but not least, a data scientist needs to be cunning and resilient, especially in the frustrating moments when the model is not delivering the desired results.
Apart from their skills, data scientists need a tremendous amount of computing power to train a model. According to OpenAI, delivering a cutting-edge model can cost up to $245,000. So to get results from a data scientist’s work, one needs not only to find the specialist and pay his or her wage but also to provide enough computing power. That is an additional, not-that-small cost to take into account.
What is the role of a data scientist in a project?
Where the data engineer is a roadie, the data scientist is a conductor - and that’s why these specialists get much more of the spotlight than data engineers.
The data engineer delivers the infrastructure and architecture that enable the model to work properly and prepares the data to be used in an actionable way. The data scientist delivers the brain - the model and the neural network that power the solution.
So there is neither machine learning nor artificial intelligence without the data scientist. But it would be highly unfair to say that only data scientists are behind AI-related projects.
Who is a machine learning engineer?
This last role is both the simplest and the most complicated to describe. The specialist combines the skills of a data engineer and a data scientist in the way a DevOps engineer combines coding with administration skills. He or she is a jack-of-all-trades in machine learning and can sometimes deliver a project single-handedly.
But in fact, that’s not what the machine learning engineer is employed for. The specialist is responsible for implementation: he or she takes the architecture and logic of the system and the model designed by the data scientist, and combines the two to deliver business results.
What tools of the trade do machine learning engineers have?
Again, they need the skills of both a data engineer and a data scientist. One needs to know frameworks, programming and cloud tools, as well as how to train neural networks and evaluate models.
Where the data engineer is a roadie and the data scientist is a conductor, the machine learning engineer is a solo guitarist who goes to the club to perform - usually tweaking the sound system to his or her needs, then giving the performance, and sometimes slightly adjusting the playlist to better fit the audience’s needs.
What is the role of a machine learning engineer in a project?
The answer is highly dependent on the project at hand. The machine learning engineer stands between data science and data engineering and is thus able to support and play both roles. A deep understanding of the matter also enables this specialist to deliver unique insight that helps avoid mistakes at an early stage and makes the whole solution more stable and reliable.
Machine learning engineers can also be responsible for tweaking and polishing the model delivered by the data scientist to make it fit the project. It is not uncommon for a data scientist to deliver a proof of concept or a high-level model that works - and that’s all. The role of the machine learning engineer is to make this work actually usable and suitable for the project. That’s why data science skills are so important in this position: without them, one wouldn’t be able to tweak the algorithm to make it more suitable for the project, the infrastructure or the solution.
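As a rough illustration of what “making a proof of concept usable” can mean in practice, here is a small Python sketch. All names in it are hypothetical: `poc_model` stands in for whatever the data scientist delivered, and `ProductionModel` is an invented wrapper that adds the input validation and fallback behaviour a real system would need. Actual hardening work usually goes much further and covers monitoring, performance and deployment.

```python
def poc_model(features):
    """Stand-in for a proof-of-concept model that assumes perfectly clean input."""
    return 0.8 * features["temperature"] + 0.2 * features["humidity"]


class ProductionModel:
    """Hypothetical wrapper that validates input before calling the model."""

    REQUIRED = ("temperature", "humidity")

    def __init__(self, model, fallback=None):
        self.model = model
        self.fallback = fallback  # returned when the input cannot be used

    def predict(self, features):
        for name in self.REQUIRED:
            value = features.get(name)
            # Reject missing fields and non-numeric values (bool is excluded on purpose).
            if not isinstance(value, (int, float)) or isinstance(value, bool):
                return self.fallback
        return self.model(features)


service = ProductionModel(poc_model)
good = service.predict({"temperature": 20.0, "humidity": 50.0})
bad = service.predict({"temperature": "n/a"})
print(good, bad)
```

Called directly, the proof of concept would crash on the second input; the wrapper returns the fallback value instead, which is the kind of difference that separates a demo from something a project can rely on.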
How much do they make?
Of course, there is no universal and always-relevant answer to how much one makes. It depends on factors like one’s skills and achievements, the company’s size, market and resources, and one’s preferences. Some companies try hard to attract a data scientist or a data engineer with wagons of cash and have no visible success, while many specialists would probably work for a few months for free just to be able to cooperate with the guys and girls from DeepMind or Google Brain.
Various estimates show that a specialist with 2-5 years of experience earns:
Data engineer: $120k
Data scientist: $110k
Machine learning engineer: $140k
The data scientist earns the least because he or she is the least independent. The data engineer can deliver significant advantages for the company by designing the data architecture and the application logic. The machine learning engineer can do the same and deliver the AI model as a bonus. So when thinking about data science vs. data engineering, the latter is usually the better pick.
The data scientist, however brilliant, designs only the model, which is neither usable without the architecture nor ready to deploy. That doesn’t mean the data scientist is obsolete, not at all. Focused on machine learning development and neural networks, he or she usually has the deepest knowledge of the matter.
Machine learning-related roles are a fresh addition to the landscape of computer engineering work. Their names are neither descriptive nor intuitive, so a short primer on them is essential to understanding the ecosystem behind the AI-powered applications used in modern business.
So if you are interested in hiring an AI-related specialist or training one in-house, don’t hesitate to contact us and talk about potential cooperation. We are more than sure that there are a lot of things we can discuss and problems we can solve!
Robert is a co-founder and CEO of Ideamotive. An entrepreneur who passionately spreads the digital revolution all around the internet. Mentor and advisor at startup accelerators. Loves to learn and discover new business models.