The data space is undeniably buzzing right now. Thanks to the AI movement—and before that, when data science was crowned "the sexiest job of the 21st century"—professionals from business, technical, and scientific domains are flocking to data roles (like I once did myself). But here's what I've learned after years in this space: the skills that actually make or break your success as a data professional aren't the ones people usually talk about.
Let me break down what I think are the most crucial skills to master, starting with something that might surprise you.
The Soft Skills that actually matter
Requirement Gathering: Your Secret Weapon
This is one of the most important skills you can develop, yet it's rarely mentioned in "how to become a data scientist" tutorials.
Here's why: You'll constantly encounter the XY problem. Someone comes to you saying, "I just want to do X"—they've already drafted a solution in their head. But X isn't the real problem. The real question is why they think they need X in the first place.
Your job isn't to build what they asked for. Your job is to extract what problem your stakeholder, colleague, or client actually has, then draft a solution that solves that problem in the best (often easiest) way possible. This skill alone will set you apart from a lot of professionals who just execute requests without questioning them.
Stakeholder Management: Not just for Managers
Even if you're in a "back office" role like data engineering, stakeholder management is crucial. You have internal stakeholders—analysts, product managers, other engineers—and you need to know how to:
Tell them when something is broken (and how to say it)
Communicate that something isn't possible without burning bridges
Turn down requests
Communicate when you're behind schedule (and when to communicate it)
Upskill or onboard team members effectively
Not every topic above strictly belongs to stakeholder management, but I think you can group them under one umbrella. These conversations happen daily, and how you handle them determines whether you're seen as a partner or just another resource. Stakeholder management can be a really tough discipline, but it's necessary.
Documentation: The boring and overlooked Game-Changer
This one's often overlooked and nobody enjoys doing it, but documenting your work, both for yourself and for others, is transformative. You don't need to write a diary of what you accomplished each day. I'm talking about guides, tutorials, or breaking down complicated concepts for business users on a slide.
Make it a habit for things you expect to recur (e.g. steps to upgrade package versions in your architecture, or debugging a pipeline failure) or that took you a long time to figure out. Yes, you might document something that later turns out to be unnecessary. But I'm convinced this habit will be net positive.
Also, always be aware of who you're documenting for. Documentation for a fellow engineer and documentation for the whole organization look very different, and it's not unusual to have two or more versions, varying in depth and detail.
Unfortunately, there aren't many tools or tutorials for soft skills in general. But simply being aware that documentation is a crucial step will get you far. Common sense can take you the rest of the way.
The Technical Skills that actually matter
Here's where I might ruffle some feathers: You don't need to learn everything. In fact, learning the wrong things can waste precious time you could spend mastering the fundamentals.
The Dynamic Duo: SQL + Python
SQL is the lingua franca of data. Don't get demotivated by text-to-SQL bots or some new pipe syntax launched by the big platforms (1, 2). If you want to work in and with data, you must learn SQL. It gets you the data you want, in the format you want, when you want it. Non-negotiable.
Python is your Swiss Army knife. It handles orchestration, analytics, visualization, machine learning—you can build entire backend applications with it. But most importantly for data professionals, it automates processes and helps you get data from one place to another, like a database, where you can then work with SQL on top of it.
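As a small illustration of that glue role, here's a minimal sketch using only the standard library (the CSV content and table name are made up): Python moves the data into a database, and SQL takes over from there.

```python
import csv
import io
import sqlite3

# Hypothetical export: a small CSV, e.g. pulled from an API or a file share.
raw = io.StringIO("order_id,amount\n1,19.90\n2,5.00\n3,12.50\n")

# Python's job: get the data into a database...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
rows = [(int(r["order_id"]), float(r["amount"])) for r in csv.DictReader(raw)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# ...and from there, SQL does what SQL does best.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 37.4
```

In a real pipeline the in-memory buffer would be a file or API response and SQLite would be your warehouse, but the division of labor stays the same.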
With AI-powered coding tools like Cursor, Lovable, and Windsurf, you might think you can skip understanding the fundamentals. Don't make this mistake. From experience, you need to know what the AI is doing. You'll hit problems, the AI will get stuck, and if you don't understand what's happening under the hood, it's impossible to code with confidence or ship anything to production in a business context, where things matter.
The surprising Must-Have: Spreadsheets
Even if you're a data engineer, know the basics of Google Sheets or Excel and learn when to use them. If there's one thing to learn there:
Get familiar with pivot tables! They're incredibly helpful for quick analysis.
Add basic filtering, VLOOKUP, or INDEX/MATCH, and that's it.
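If you later move from the spreadsheet into code, the same idea exists in pandas: `pivot_table` is essentially a programmatic pivot table. A minimal sketch (the sales numbers are made up):

```python
import pandas as pd

# A tiny sales table, like one you'd paste into a sheet.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 150, 80, 120],
})

# The code analogue of a spreadsheet pivot table:
# rows = region, values = revenue, aggregated by sum.
pivot = df.pivot_table(index="region", values="revenue", aggfunc="sum")
print(pivot)
```

Knowing both forms lets you pick whichever is faster for the question at hand.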
I once ran into a data engineer who opened a 200MB CSV in Spark just to look at column values. Please don't be that person.
(Yes, I am aware that TextEdit or Notepad would have done the job too here, but I hope you get the point :D)
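For the record, peeking at a file like that takes a few lines of standard-library Python (the sample file is generated on the fly just to keep the sketch self-contained):

```python
import csv
import itertools
import tempfile

# Create a throwaway CSV so the sketch is runnable as-is;
# in practice you'd point `path` at the file you want to inspect.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("user_id,country,signup_date\n")
    for i in range(1000):
        f.write(f"{i},DE,2024-01-01\n")
    path = f.name

# Peek at the header and the first few rows. No cluster required.
with open(path, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    sample = list(itertools.islice(reader, 3))

print(header)     # ['user_id', 'country', 'signup_date']
print(sample[0])  # ['0', 'DE', '2024-01-01']
```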
Role-specific orientation
If you want to go into Data Analytics, you’ll learn BI or Dashboarding Tools.
In my opinion, all of these tools have pros and cons, and they come with different concepts, best practices, and workflows. But usually, you just need a few minutes to get rolling and build your first dashboard.
(let me know if you want a post dedicated to BI tools)
Concepts of Data Visualization and Data Storytelling will come in handy, too!
Most importantly, I would expect a solid business understanding and a certain curiosity from analysts. You'll need to dive deep into certain business processes or understand new business models from scratch (e.g. after switching jobs).
If you want to contribute to the data pipeline hands-on, git fundamentals for version control will be necessary.
But don’t go crazy: Branching and reverting commits will get you far.
In Data Engineering-oriented roles, general software engineering principles (testing, CI/CD, etc.) should be on your reading list. As you'll work with infrastructure, some basic bash and Linux commands are required to interact with servers or virtual machines. But I wouldn't recommend learning them all upfront: know the top five commands, and google (or ask your LLM of choice) for the rest. Instead, look into deployment mechanisms like Docker containers to solve the infamous “but it works on my machine” problem.
For Data Science and AI-oriented profiles, math and statistics are obviously a good foundation. But if you start straight in a Data Science job, you probably have those skills anyway. From there, you can lean more toward the business and impact side (learn some “Analyst” topics) or toward deploying your own work (“ML Engineer” topics). Either way, get an understanding of why scheduling your Jupyter Notebook to run every night might not be a good idea.
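One reason, sketched below: scheduled jobs want plain, testable entry points rather than a stateful notebook. A minimal illustration (the `score` function is a placeholder for whatever your notebook cells actually compute):

```python
# Instead of scheduling a notebook, move the logic into plain functions
# with a single entry point that a scheduler (cron, Airflow, ...) can call.
# `score` stands in for the notebook cell that produced predictions.

def score(value: float, threshold: float = 0.5) -> str:
    """Placeholder for whatever prediction logic lived in the notebook."""
    return "high" if value >= threshold else "low"

def main() -> list[str]:
    inputs = [0.2, 0.7, 0.5]  # in reality: loaded fresh on every run
    return [score(x) for x in inputs]

if __name__ == "__main__":
    print(main())  # ['low', 'high', 'high']
```

Plain functions can be unit-tested, version-controlled, and retried on failure; hidden notebook cell state can't.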
What NOT to learn (and why)
Skip R
You might stumble into R if you look into data science or analytics, especially since it comes from academia. My advice? Don't bother.
Everything you can do in R, you can do in Python*. I know data scientists who write both languages, yet when the same package was available in both, they chose Python even when the R version was technically superior. Why? Python integrates better into the production stack. You can build your entire pipeline—data extraction, transformation, loading, automation—all in Python. You just install a package instead of translating between languages.
*There might be some very particular industries where R still dominates, but in the vast majority of business settings (not academia or research!), Python rules.
Skip Spark
You might also run into people who say that a good data engineer needs to know Spark to build robust pipelines. Here, it really comes down to the organization you work in. If you work in Big Tech and handle petabytes of data, you will need it. But chances are you won't work at Netflix, Uber, or Meta.
In that case, simpler pipelines with less overhead in the common cloud data warehouses (Snowflake, BigQuery, Databricks) will do the job perfectly fine.
Don't Get Distracted by Rust
Rust gets a lot of hype because it's super fast, and some people are amazed by its resource efficiency. But unless you're a very technical data engineer building tooling for other engineers, save your time.
If you use Polars over pandas, congratulations—you're already using Rust under the hood and maybe didn't even know it. I don't need to understand how a clock works to use it, and the same principle applies here.
For data use cases, Rust isn't necessary.
The Career-Defining Principle
Here's the most important thing I can tell you:
Technologies change, concepts don't.
Take data modeling, for example. The foundational books by Kimball and Inmon are roughly 30 years old, and their concepts are still used every day in companies worldwide. In 10 years, these concepts will still be applied, regardless of what language you're modeling your data in.
Whether you're using SQL stored procedures, Informatica, SSIS, dbt, or SQLMesh, you can apply the same fundamental concepts. This makes your career future-proof and tool-agnostic.
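To show how tool-agnostic these concepts are, here's a minimal Kimball-style star schema sketched with SQLite from the standard library (table and column names are illustrative): a fact table referencing a dimension table, plus the join-and-aggregate query pattern the model is designed for. The same shape translates to any of the tools above.

```python
import sqlite3

# A minimal star schema: one fact table referencing one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE fct_orders (
    order_key    INTEGER PRIMARY KEY,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL
);
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
conn.executemany("INSERT INTO fct_orders VALUES (?, ?, ?)",
                 [(1, 1, 9.5), (2, 1, 20.0)])

# The analytical query pattern the model is built for: facts joined to dims.
row = conn.execute("""
    SELECT d.customer_name, SUM(f.amount)
    FROM fct_orders f JOIN dim_customer d USING (customer_key)
    GROUP BY d.customer_name
""").fetchone()
print(row)  # ('Acme', 29.5)
```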
My rule of thumb: if old things are still used, they’re apparently quite good.
(Just applies to concepts, not technologies like floppy disks)
Focus on learning principles that have already survived specific technologies. Master the fundamentals that will serve you regardless of what shiny new tool becomes popular next month.
The Bottom Line
The data field is full of people chasing the latest technology.
Don’t do “Resume Driven Development”. Instead, focus on:
Soft skills that help you understand problems and work with people
Core technical skills (SQL + Python + Spreadsheets) that handle 95% of your needs
Fundamental concepts that remain relevant across decades and technologies
What do you think? What did I miss, or what do you disagree with? I'd love to hear your perspective on what skills actually matter in data careers.