What Skills Do You Need to Become a Data Engineer

People often ask me what skills are needed to become a data engineer. Before answering that question, let’s take a look at what data engineers do. According to Coursera:

Data engineering is the practice of designing and building systems for collecting, storing, and analyzing data at scale.

Data engineering has become the backbone of many applications across industries, and data engineers are indispensable assets for many organizations.

I like using data to answer questions. I extracted 550 United States data engineer job postings from indeed.com and ran some quick analyses on job description, location, and salary range. Although the sample size is not big, it should be sufficient to reveal some insights and trends.
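For readers who want to try a similar analysis, here is a minimal sketch of how skill mentions could be counted across job descriptions. It is not the code behind the charts in this post: the keyword list and the function names `count_skill_mentions` and `share_of_postings` are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list; the list used for the charts in this post is not given.
SKILLS = ["sql", "python", "java", "scala", "spark"]


def count_skill_mentions(descriptions, skills=SKILLS):
    """Count how many job descriptions mention each skill at least once."""
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for skill in skills:
            # The lookarounds stop "java" from matching "javascript"
            # and "sql" from matching "mysql".
            pattern = r"(?<![\w+#])" + re.escape(skill) + r"(?![\w+#])"
            if re.search(pattern, lowered):
                counts[skill] += 1
    return counts


def share_of_postings(counts, total):
    """Turn raw counts into the fraction of postings that mention each skill."""
    return {skill: n / total for skill, n in counts.items()}
```

Counting each posting once per skill, rather than counting every occurrence, keeps a single long description from inflating the numbers.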

What do DEs do?

In general, a DE is expected to have experience in cloud platforms, ETL, data warehousing, and data modeling. Some organizations also require producing reports, visualizations, and dashboards. DevOps and API experience are also preferred skills.

What programming languages do DEs use?

Evidently, SQL is the most in-demand language for DEs. Python is also required for many data engineer jobs, followed by Java, Scala, and C/C++.

What databases do DEs use?

Snowflake, Redshift, and Databricks are the most popular choices for building data warehouses. Oracle and MySQL also appear often in job descriptions.

What big data tools do DEs use?

Spark dominates the big data space and is required by many DE positions. Apache Kafka and Airflow are in high demand for building streaming data pipelines and managing workflows.

What education do DEs need?

Most DE positions require an undergraduate degree, while a significant number of jobs (28%) prefer a Master’s degree.

How much do DEs earn?

It really depends on the location of the job.

This is a box plot of annual salary per state. The big white dots indicate the average salaries. Each box spans the range between the first and third quartiles of salaries, and the vertical bar in the middle is the median. NC appears to have the highest average pay, $118K, followed by CO and WA. Remote, the location at the bottom, shows an average of $100K.
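The numbers behind a box plot like this one can be computed with a few lines of standard-library Python. This is only a sketch under assumptions: the function name `salary_summary` and the input shape (pairs of location and annual salary) are hypothetical, and the quartile method is one common choice, not necessarily the one the plotting tool used.

```python
from collections import defaultdict
from statistics import mean, quantiles


def salary_summary(rows):
    """Summarize salaries per location.

    rows: iterable of (location, annual_salary) pairs (assumed input shape).
    Returns the first quartile, median, third quartile, and mean per location.
    """
    by_location = defaultdict(list)
    for location, salary in rows:
        by_location[location].append(salary)

    summary = {}
    for location, salaries in by_location.items():
        if len(salaries) < 2:
            continue  # quantiles() needs at least two data points
        # n=4 returns the three cut points that split the data into quarters.
        q1, median, q3 = quantiles(salaries, n=4, method="inclusive")
        summary[location] = {
            "q1": q1,
            "median": median,
            "q3": q3,
            "mean": mean(salaries),
        }
    return summary
```

Comparing the mean with the median for each location is a quick way to spot states where a few high-paying postings pull the average up.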

How many DE jobs are there across the US?

DEs are in high demand in CA and TX. It is interesting to note that almost half of the DE positions allow remote work, a trend since the pandemic started.

There you have it! The role of data engineering is evolving, and so are the skills required to deliver the work. What skills and tools do you think a DE should master to succeed at work?