Data & AI Report – Data Science Trends July 2024

August 5, 2024

Trends in data science have brought a fresh wave of excitement to the data and analytics landscape this July. We’re seeing major moves towards scalability, efficient governance, and AI capabilities. Additionally, Dr. Randy Olson shows us just how far creative data use can take you, literally! It turns out data science isn’t just about numbers: it can plan one epic road trip too!

Firstly, Discord’s transition to Dagster and dbt for data orchestration

This month, Discord announced a major overhaul of their data orchestration infrastructure, moving from their in-house system, Derived, to a combination of Dagster and dbt. As their platform and user base expanded, the need for enhanced self-service capabilities and robust observability became evident. This decision was driven by the necessity for declarative automation, a modern unified interface, reliability on Kubernetes, and seamless integration with existing tools. 

After evaluating open-source options like Argo and Prefect, Discord chose Dagster for orchestration and dbt for data modeling. This transition has enabled them to support over 2,000 dbt tables, enhancing their ability to deliver seamless service and insightful data analytics while scaling efficiently. 

Read about it here

Meta unveils Llama 3.1 

This month, Meta introduced Llama 3.1, a massive leap in open-source AI. The Llama 3.1 405B model brings unmatched flexibility and state-of-the-art capabilities, unlocking new workflows like synthetic data generation and model distillation. Additionally, Meta is enhancing the Llama ecosystem with new security tools and a reference system. Over 25 partners, including AWS and Google Cloud, will offer services from day one. 

Llama 3.1 models feature expanded context lengths to 128K, multilingual support, and strong performance across benchmarks. Upgraded 8B and 70B models enhance capabilities in general knowledge, tool use, and translation. 

Read Meta’s full update

Building a data-driven analytics team at DoorDash 

Jessica Lachs, DoorDash’s VP of Analytics & Data Science, shares insights on what it means to be truly data-driven and how to structure an analytics team. Having joined DoorDash as the first General Manager in 2014, Lachs has built the analytics team from the ground up and now leads global analytics, including the Wolt Analytics team post-acquisition. 

Lachs highlights that the term “analytics” can be ambiguous, encompassing data science, business intelligence, product analytics, machine learning, and BizOps. She also emphasizes that to build a data-driven organisation, founders should focus on desired outcomes rather than semantics. At DoorDash, the role of analytics has evolved with the company’s growth, shifting from gut-instinct decisions to data-centric strategies. Initially, DoorDash used quasi-experimental methods due to limited data, but as the company matured, it invested in scalable data models and advanced experimentation capabilities, expanding its analytics scope to drive better decision-making.

Read the full post here

Databricks’ migration to Unity Catalog for data governance

In a recent blog post, the Data Platform team at Databricks shared insights into their migration to Unity Catalog for enhanced data governance. As the company grows, establishing secure, compliant, and cost-effective data operations has become a priority. With thousands of employees analysing data, consistent governance standards are essential, which made the move to Unity Catalog a pressing one.

The blog outlines the challenges and benefits of migrating from the default Hive Metastore (HMS) to Unity Catalog (UC). While HMS lacked fine-grained access controls, lineage support, audit logs, and effective search integration, UC provided these features out of the box. The team chose a transformational approach, selectively migrating datasets to establish a structured governance framework. This strategy required more effort initially, but enabled clear data ownership, naming conventions, and intentional access, setting the stage for future governance policies.

Read the blog

Finally, some creative data use!

Dr. Randy Olson, a full-stack data scientist and AI researcher, used his expertise in machine learning to plan an optimal road trip across the U.S.

He framed the task as a Traveling Salesman Problem (TSP), which asks for the shortest route that visits each stop exactly once and returns to the starting point.
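To make the TSP idea concrete, here is a minimal sketch in Python. It is not necessarily the method Dr. Olson used: it swaps in a simple nearest-neighbour heuristic followed by 2-opt improvement, and compares the result with a brute-force search. The stops and their coordinates are hypothetical points on a flat plane, where a real trip planner would use driving distances between actual landmarks.

```python
import math
from itertools import permutations

# Hypothetical stops as (x, y) points on a flat plane (an assumption for
# illustration; real planning would use road distances between landmarks).
STOPS = {
    "A": (0.0, 0.0),
    "B": (0.0, 3.0),
    "C": (4.0, 3.0),
    "D": (4.0, 0.0),
    "E": (2.0, 5.0),
}


def dist(a, b):
    """Straight-line distance between two named stops."""
    (x1, y1), (x2, y2) = STOPS[a], STOPS[b]
    return math.hypot(x2 - x1, y2 - y1)


def tour_length(tour):
    """Length of the closed tour, including the leg back to the start."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def nearest_neighbour(start):
    """Greedy tour: always drive to the closest stop not yet visited."""
    unvisited = set(STOPS) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda s: dist(tour[-1], s))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour):
    """Reverse segments of the tour for as long as doing so shortens it."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):  # index 0 (the start) stays fixed
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate) < tour_length(best) - 1e-12:
                    best = candidate
                    improved = True
    return best


def brute_force(start):
    """Exact answer by trying every ordering; only feasible for a few stops."""
    others = [s for s in STOPS if s != start]
    return min(([start, *p] for p in permutations(others)), key=tour_length)


heuristic = two_opt(nearest_neighbour("A"))
optimal = brute_force("A")
print("heuristic:", heuristic, round(tour_length(heuristic), 2))
print("optimal:  ", optimal, round(tour_length(optimal), 2))
```

Brute force only works for a handful of stops because the number of orderings grows factorially, which is why routes with dozens of stops are usually tackled with heuristics like the one sketched here.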

Dr. Olson applied three specific restrictions:  

  1. The trip must stop in all 48 contiguous U.S. states
  2. It must only visit National Natural Landmarks, National Historic Sites, National Parks, or National Monuments
  3. It must be taken entirely by car without leaving the U.S.

Want to take the trip? The route spans 13,699 miles and requires 224 hours (or 9.33 days) of driving, assuming no traffic. You can find the full itinerary here. 

Dr. Randy Olson used data to design the optimum road trip across the U.S., showing how useful data can be. Data science trends really are everywhere!

Olson’s epic road trip

To conclude 

July highlighted several key trends in data and analytics. The push for scalability is evident in Discord’s adoption of Dagster and dbt, and Databricks’ migration to Unity Catalog for better data governance. The importance of building effective data teams was underscored by DoorDash’s approach to analytics leadership. Another notable trend is the growing emphasis on enhanced self-service capabilities and robust observability in data platforms. These themes point towards a future focused on scalable infrastructure, efficient governance, structured teams, and innovative AI applications.

If you’re interested in how we can help scale your data team, get in touch.

Dutch Recruitment Trends and Skills to Watch 👀

May 21, 2024

If you’ve had anything to do with recruitment in the Netherlands over the past year, you’re probably well aware that it’s been a bumpy ride. 

However, it looks like the clouds are parting (finally!). We’re seeing more roles on the market, and lots of highly qualified candidates to fill them! We’re on the brink of some very exciting times, with lots of new roles and exciting developments coming this year. Read on for our take on the Dutch recruitment trends and skills you need to watch for the second half of the year. ⬇️

What are the key trends? 

We’re seeing a major shift in the Dutch labour market, fueled by automation, artificial intelligence (AI), and data analytics.  

Data and AI are being increasingly integrated into daily business operations, and only the companies that act fast will be able to get ahead. 

GIF of Patrick Swayze's character, Johnny Castle in Dirty Dancing, with the caption 'nobody puts baby in the corner' - the word 'baby' is covered and replaced by the word 'data'. Nobody puts data in the corner. This relates to the fact that data used to be separate within companies, but is being increasingly included in business operations.

Gone are the days of data being a secluded department in the shadows.

What Does This Mean for the Dutch IT Market? 

DATA TRENDS

In an effort to keep up with technological advancements and shifts towards more data-driven practices, we’re seeing more and more companies focusing on digital transformation projects, modernisation and streamlining of outdated processes. 

AI/ML TRENDS

AI and ML technologies are taking over the world, and it seems like companies are starting to realise that AI isn’t just a ‘fad’. It’s no longer about just keeping up with trends: businesses that want to stay competitive and innovative have to get ahead of their competitors and make use of all the technology available. There’s also huge potential for companies to boost team productivity, innovate and elevate customer experiences.

In particular, we’re seeing companies in the FinTech, EdTech and EnviroTech spaces leaning into AI/ML. As a result, it’s likely that early adopters, who invest in this technology now, could end up being the unicorns of the future.

CYBERSECURITY TRENDS

The Netherlands is home to almost a third of Europe’s data centres (with Google starting work on a new data centre in Groningen this year!), and clients recognise huge opportunities for growth.

On the flipside, with so much more of our data being processed online, the target for hackers is getting bigger and bigger. With this in mind, clients are voicing concerns about the need to improve and invest in protection against cyber-terrorism.

The Netherlands’ cybersecurity market is projected to continue increasing, from USD 2.16 billion in 2024 to USD 3.27 billion by 2029 (Security Insight).  

Does this mean we need to adopt a load of new languages or frameworks?  

Don’t panic!  

Python continues to reign supreme across all these areas, making it indispensable for most tasks. Candidates with a strong Python background often excel due to its simplicity and versatility, supported by over 135,000 libraries. 

Python’s role in data science, AI, ML, and cybersecurity is significant, with essential libraries like NumPy, Pandas, Scikit-learn, TensorFlow, PyTorch, and Scapy driving advancements.

Candidates and clients we’ve spoken to recently highlight the importance of proficiency in C, Bash, and PowerShell for cybersecurity roles. These languages provide the low-level access and automation capabilities necessary for system programming, exploit development, and managing security operations.

Hiring managers frequently emphasise the value of candidates skilled in these areas to ensure teams are equipped to handle challenges effectively. 

However, staying ahead means being aware of other key players gaining traction: 

Data: 
  • Apache Spark, Snowflake, and dbt for data transformation.
  • Emerging tools like Polars and DuckDB. 
AI/ML: 
  • Hugging Face Transformers for NLP work. 
  • Julia for high-performance numerical tasks. 
  • Ray for scaling Python applications. 
Cybersecurity: 
  • Rust for secure application development. 
  • Tools like Darktrace and Cylance are leading in AI-driven threat detection. 
  • Burp Suite remains essential for web app security testing. 
  • Swift for system programming and security applications. 

What’s our advice? 

For candidates:

We’re advising Data candidates to focus on building skills in essential libraries like NumPy, Pandas, and TensorFlow. Engaging in data-centric projects also helps build a strong portfolio, and online courses help you stay up to date.

AI and Machine Learning are particularly sought-after areas, so we recommend enhancing your visibility by participating in competitions and contributing to open-source projects.

For cybersecurity, focus on the tools we listed above, using these in personal projects where possible. What’s more, we’ve found that obtaining certifications like CISSP (Certified Information Systems Security Professional) or CEH (Certified Ethical Hacker) can be beneficial. 

For clients:  

Promote a culture of knowledge sharing through internal workshops and encourage team participation in industry events and conferences. This approach ensures your engineering team remains competitive and well-equipped to handle emerging challenges.

Investing in continuous learning and development not only keeps your team at the forefront of industry standards, it also aids team collaboration and idea sharing, which helps keep staff engaged and is crucial for retention.

Finally, we recommend leveraging consultative recruiters (like Foxtek) to navigate technological advancements and avoid hiring mistakes. We focus our search on candidates familiar with key tools relevant to upcoming trends in each industry.

Wrapping up 

GIF of Lando Calrissian in Solo: A Star Wars Story with the caption 'you might want to buckle up, baby'. The GIF is a joke referencing a line in the blog about how this is going to be a wild ride, given the trends we're expecting to see in the Dutch market.

In conclusion, the next year is shaping up to be a wild ride. For tech professionals, the opportunities are vast and varied, with a strong push towards innovation, sustainability, and efficiency. Expect to see a load more companies looking to hire roles related to Data, AI, Machine Learning and Cybersecurity. 

For companies, adapting to these trends early will help you attract the right talent and stay competitive.

Want to get ahead of the curve? Reach out to discuss your requirements, or check out our open roles.

Data & AI Report – April 2024

May 1, 2024


Welcome to our first monthly update on data and AI. No need to scroll endlessly through news sites: we’ve compiled the month’s must-know developments right here!

April saw important developments in technology, highlighting investments and partnerships that emphasize the Netherlands’ involvement in the tech sector.

Google’s €640 Million Dutch Data Centre Project

Google announced a €640 million investment in a new data centre in Groningen, creating 125 jobs. This adds to Google’s total investment of over €3.8 billion in Dutch digital infrastructure since 2014. Read more

KLM Partners with Utrecht University AI Labs

KLM Royal Dutch Airlines is collaborating with Utrecht University’s AI Labs to refine operational efficiency and minimize disruptions.

PhD students are developing algorithms to optimize crew and aircraft scheduling, and improve ground processes like baggage handling and passenger boarding. This partnership aims to enhance KLM’s ability to quickly adapt to changes, ensuring smoother operations and prioritising flights effectively through data. Read more

Google Launches Training Programs for AI, Cybersecurity, and Data Analytics

The U.S. Treasury and Google Cloud are partnering to boost data analytics and cybersecurity hiring, aligning with President Biden’s AI Executive Order.

New training programs, accessible via YouTube and Google Cloud Skills Boost, include courses on generative AI, cybersecurity, and data analytics, and aim to equip individuals with the skills needed for digital transformation in the public sector.

Learners also get free access to generative AI tools, including Google’s interview prep tool, Interview Warmup. Read more

Gif showing Google Cloud's new Generative AI Interview Warmup tool.

Source: Google Cloud

AI Breakthrough in Breast Cancer Risk Assessment

Danish and Dutch researchers have advanced breast cancer risk assessment by combining an AI diagnostic tool with a mammographic texture model, under the leadership of Dr. Andreas D. Lauritzen.

This integrated approach improves the prediction of both short- and long-term breast cancer risks, identifying high-risk women more effectively. The innovation promises earlier cancer detection and could alleviate the strain on healthcare systems caused by a shortage of specialist breast radiologists. Read more

These developments underscore a growing need for expert Data, AI, and ML talent. Reach out to discuss how we can help to drive your innovation forward.

Contact our team.