How to fill the first Data Role in a Company
Today, we're diving into a crucial topic: making your first data hire.
If you're scratching your head wondering what role to look for, what skills to prioritize, and how to navigate this complex landscape, you're in the right place. Let's break it down step by step.
Why a Generalist is your best bet
When it comes to your first data hire, think generalist. You need someone who can wear multiple hats and adapt to various tasks.
An Analytics Engineer or a tech-savvy Data Analyst who's comfortable with Python is a great fit. This person should be able to identify the necessary data sources, transform data, and produce dashboards or analyses based on often vague stakeholder requirements. A general understanding of testing, monitoring, and cloud cost optimization is a plus too, so that the data is trustworthy and you don't wake up to a huge monthly cloud bill.
Technical Skills to look for
Data Transformation: Your hire should be well practiced at transforming raw data into meaningful insights with SQL (and Python).
Dashboard Creation: Dashboards and automated reports are usually the first deliverables. Excel/Google Sheets knowledge is expected, since sometimes a pivot table that simply refreshes is perfectly enough. Some knowledge of Jupyter Notebooks is helpful for advanced analysis.
Infrastructure Knowledge: While they don't need to be an expert, familiarity with some parts of the infrastructure (GitHub, cloud provider permissions, etc.) is helpful, even though managed services can and should take away a lot of that.
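To make the first two points a bit more tangible, here is a minimal sketch of the kind of task that tends to land on a first data hire's desk: clean a raw export, run a basic quality check, and aggregate it into a pivot-style report. The data, column names, and function names are all made up for illustration; in practice this would more likely be a SQL model or a dbt test, but the idea is the same.

```python
from collections import defaultdict

# Hypothetical raw export; column names and values are made up for illustration.
RAW_ORDERS = [
    {"order_id": 1, "channel": "ads", "amount": "120.50"},
    {"order_id": 2, "channel": "organic", "amount": "80.00"},
    {"order_id": 3, "channel": "ads", "amount": "39.50"},
    {"order_id": 3, "channel": "ads", "amount": "39.50"},  # duplicate row
    {"order_id": 4, "channel": "organic", "amount": ""},   # missing amount
]


def clean(rows):
    """Drop duplicate order IDs and rows without an amount; cast amount to float."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen or not row["amount"]:
            continue
        seen.add(row["order_id"])
        cleaned.append({**row, "amount": float(row["amount"])})
    return cleaned


def check_quality(rows):
    """Basic tests: order IDs are unique and no amount is negative."""
    ids = [row["order_id"] for row in rows]
    assert len(ids) == len(set(ids)), "duplicate order_id found"
    assert all(row["amount"] >= 0 for row in rows), "negative amount found"


def revenue_by_channel(rows):
    """Sum revenue per channel, similar to a simple pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["channel"]] += row["amount"]
    return dict(totals)


orders = clean(RAW_ORDERS)
check_quality(orders)
report = revenue_by_channel(orders)
print(report)
```

None of this is hard to write. The judgment lies in deciding whether dropping the row with the missing amount is acceptable, or whether someone should first find out why it is missing.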
While these technical skills might seem straightforward on paper, knowing when and how to apply them effectively requires seasoned judgment that only comes with experience.
Experience matters
In an era of seemingly endless tutorials, blogs, and AI assistants, it's easy to get lost. You need to be able to distinguish useful information from noise. But how?
“Theory is taught, experience is earned,” they say.
Someone with 3-5 years of experience can navigate these resources more effectively and make better decisions. Take my last post as an example:
Somebody who has done a lot of this kind of debugging can spot the issue in minutes, while others spend their whole day on it. Also, since no tooling is established yet, you need to pick some. Somebody who has either done this before or already knows the pros and cons of Snowflake vs BigQuery vs Databricks can help you more than somebody who needs to look it up or learn it the hard way.
The Experience Range
Ideally, someone with 5 years of experience or more. They might come with a higher salary, but I would say their expertise is worth the investment.
Also, make sure the experience is relevant:
You’re a Series A Start Up and want to hire somebody with 10 years of experience in a Corporate? 🤷
You’re a mature family-run SME and want to hire somebody who has spent their whole life in VC-backed Start Ups? 🤷
The Role of a Data Scientist
Everybody wants to do AI. And the people who can give you that must be smart enough to do the rest too, right? You might be tempted to hire a data scientist as your first data hire, but hold that thought.
There is a joke (source unknown) in the data industry:
“A Data Scientist is a Data Engineer until they have clean data”.
Trust me: No data will be clean right at the start (or stay that way for long).
Therefore, Data Scientists often spend a significant amount of time cleaning and transforming data, which might not be their core expertise, their preference, or what they hoped for (or what you promised them).
As a result, either you or they may want to end things earlier than planned because expectations didn't match.
People without much tech knowledge often assume that you will spend the first few weeks or months on setup and infrastructure, build some automated reporting along the way, and can then finally focus on the cool ML stuff. But data pipelines are not a skyscraper you build once and then just maintain a bit. Business logic changes, there will be new marketing channels (data sources) to integrate, stakeholders want more use cases covered, etc.
It’s rather like a garden that needs constant quality checks and maintenance to stay up to date with the business.
Also, they might not be that familiar with the tooling around the platform, or used to taking care of things like pipeline monitoring and testing and production deployments.
When a Data Scientist makes sense
If your Core Product is an Algorithm.
If your Start Up's core product is an algorithm (e.g. the matching algorithm of a dating app), a data scientist might be necessary. However, make sure you have clean data for them to work with, and let your Backend Engineers take over the deployment and infrastructure duties.
Building an AI Start Up based on LLMs?
If you plan to use any of the public models from OpenAI, Anthropic, Google, etc., chances are rather high that you need a Data Engineering profile rather than a Data Science one. (Again, generalist experience is required, no matter where the particular focus or strength lies.)
Soft Skills: The Icing on the Cake
Technical skills are crucial, but soft skills can make or break your first data hire.
(I assume you are a smaller company or Start Up hiring for data.)
Here are some key traits to look for:
Pragmatism: In a Start Up environment, perfectionism can be a roadblock.
Building the perfect solution doesn’t matter if it’s too late to close the funding round. You need someone who can deliver quick and effective solutions. 60/40 is your friend, as long as you keep an eye on Tech Debt.
But also in Non-Start Up settings, you need to know when 80/20 is a must, and when you want to win a Kaggle Competition.
Proactivity: In small companies, proactivity is usually rewarded. Roles aren’t clearly defined and things are very loose. You can identify opportunities, bring in new ideas, and drive projects forward, if you want to.
Would you rather stay in “your lane” (your role description) and in the background? Then maybe a bigger Org is a better fit.
Communication: Your data hire will need to communicate with stakeholders, founders, and potentially even customers. Strong communication skills are essential for prioritizing tasks and saying “No” when necessary. (Here, too, experience helps in relating to stakeholders and having the guts to say “No” to a Co-Founder or Managing Director.)
Practical Tips for Your First Data Hire
Outsource Infrastructure: Consider using managed services for infrastructure needs and orchestration. This allows your data hire to focus on customer-facing datasets, applications and reports, rather than getting bogged down by technical setup. Datacoves, Orchestra or Paradime might be worth a look.
Best Practice != Best Practice: Best practices and tooling from Netflix or Lyft look impressive, but these companies operate at a different scale and with different resources than you. When screening content about example stacks or solutions, always check who’s behind it (advertorial, scale, expertise?) and whether the situation also applies to you.
Consulting: If you cannot find a more senior generalist, you might still do well by hiring a junior and supplementing their skills with consulting hours from a veteran to ensure a well-rounded approach.
Conclusion
Making your first data hire is a significant milestone for any company. By focusing on a generalist with a mix of technical and soft skills, you can set your data domain on the path to data-driven success. Remember, experience matters, but a proactive and communicative hire can bring immense value to your team.
Happy hiring, and here's to your data-driven future! 🚀