Data & AI Report – Data Trends from November 2024

December 10, 2024

November’s data trends see a huge focus on AI developments & investments. 

We’ve delved into some of the biggest movements below – check them out 👇 

Big Tech’s Big Bets on AI 

In 2024, the world’s leading tech giants—including Microsoft, Amazon, Alphabet, and Meta—are significantly increasing their investments in AI, with their combined AI spending projected to exceed $240 billion this year. The demand for AI-powered tools continues to skyrocket, driving this unprecedented level of investment. 

With forecasts suggesting that AI could contribute an additional $20 trillion to the global economy by 2030, it’s no surprise that the largest corporations are heavily investing in this space. Beyond the economic potential, such investments create additional revenue opportunities. For example, Microsoft’s AI products are expected to generate $10 billion annually. 

For Big Tech, AI transcends being just a trendy concept; it’s a strategic, long-term commitment poised to transform entire industries.  

Samsung’s Second-Generation AI Model: Gauss2 

In November, Samsung held its virtual Samsung Developer Conference Korea 2024, where it unveiled its latest software innovations and future-focused vision. 

A highlight of the conference was the introduction of Samsung Gauss2, the second-generation AI model that promises improved performance, efficiency, and broader applications. Gauss2 is a multimodal AI system capable of processing language, code, and images, and is available in three versions tailored for different use cases: 

  • Compact: Optimized for on-device use in environments with limited computing resources, maximizing device performance. 
  • Balanced: Offers a blend of performance, speed, and efficiency suitable for various tasks. 
  • Supreme: Provides top-tier performance with Mixture of Experts technology, which reduces computational costs during training and inference while maintaining high efficiency. 
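To illustrate why a Mixture of Experts design can reduce computational cost, here is a minimal sketch of top-k expert routing. It is not Samsung's implementation: the experts, gate scores, and the `moe_forward` helper are all hypothetical, and a real model learns its gate scores rather than hard-coding them. The point is that only `k` of the available experts run for a given input.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_scores, experts, k=2):
    """Route input x to the k highest-scoring experts and mix their outputs."""
    # Pick the indices of the k experts with the highest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Normalise the selected scores so the mixing weights sum to 1.
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts are evaluated; the rest are skipped entirely.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Hypothetical experts: toy functions standing in for neural sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
# Hypothetical gate scores; a real model computes these from the input.
gate_scores = [0.1, 2.0, -1.0, 2.0]

out, used = moe_forward(3.0, gate_scores, experts, k=2)
print(used, out)
```

With four experts and `k=2`, half of the experts are never evaluated for this input, which is where the compute saving comes from.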

Samsung aims to boost productivity through Gauss2 by enhancing its code.i service, enriching the natural language Q&A capabilities of the Samsung Gauss Portal, and supporting multimodal functions such as table and chart analysis and image creation. 

Read more from Samsung here. 

LinkedIn & The Journey to Their GenAI Tech Stack 

LinkedIn has published a detailed blog post that explores the evolution of its reimagined product portfolio and how Generative AI has been integrated into its features. They identify several key focus areas and insights gained from developing their GenAI capabilities: 

  • Prompt Management: Efficiently managing prompts at scale requires systems for templating, versioning, and structuring to support complex applications 
  • Task Automation via Skills: GenAI-driven task automation can unlock significant value but demands advanced tools to scale effectively 
  • Contextual Awareness & Personalisation: Memory plays a crucial role in personalising GenAI experiences and must be thoughtfully integrated into the tech stack 
  • Model Inference & Fine-Tuning: Balancing quality, cost, and latency requires flexible infrastructure that accommodates various models and use cases 
  • Migration to the New Stack: Adopting new technologies through incremental migration and cross-training is essential for stability 
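The prompt management point above (templating and versioning) can be sketched in a few lines. This is only an illustration under stated assumptions: `PromptRegistry`, the prompt name, and the parameters are hypothetical and do not reflect LinkedIn's actual tooling.

```python
from string import Template

class PromptRegistry:
    """Hypothetical in-memory store of versioned prompt templates."""

    def __init__(self):
        # Maps a prompt name to its list of versions (version n is at index n - 1).
        self._templates = {}

    def register(self, name, text):
        """Add a new version of a prompt and return its version number."""
        versions = self._templates.setdefault(name, [])
        versions.append(Template(text))
        return len(versions)

    def render(self, name, version=None, **params):
        """Fill in a prompt; uses the latest version unless one is pinned."""
        versions = self._templates[name]
        template = versions[-1] if version is None else versions[version - 1]
        return template.substitute(**params)

registry = PromptRegistry()
v1 = registry.register("summarise_profile", "Summarise the profile of $member_name.")
v2 = registry.register(
    "summarise_profile",
    "Summarise the profile of $member_name in $max_words words.",
)

latest = registry.render("summarise_profile", member_name="Ada", max_words=50)
pinned = registry.render("summarise_profile", version=1, member_name="Ada")
print(latest)
print(pinned)
```

Pinning a version lets an existing feature keep its tested prompt while a newer version is trialled elsewhere.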

Since the introduction of their GenAI tools in early 2023, LinkedIn has continued to refine these capabilities, moving towards their vision of a robust GenAI tech stack that balances rapid development with long-term scalability. 

Read the full blog from LinkedIn here. 

Google’s Accidental Unveiling of Project Jarvis 

Google appears to be developing an advanced AI assistant that goes beyond traditional chatbots and voice assistants. Known as Project Jarvis, this initiative aims to perform tasks autonomously, rather than waiting for user commands. It can manage tasks such as handling emails, conducting research, and scheduling appointments, setting itself apart by proactively collaborating with users as a digital partner. 

While the technology holds exciting potential, it also raises important questions. On the skeptical side, concerns include the potential for job automation, which could impact routine roles, and security risks, given that Jarvis would have access to users’ sensitive data. However, the technology could also provide significant benefits, enhancing accessibility for people with disabilities and those with busy, on-the-go schedules. 

We’re excited to see where this innovative project leads! 

Conclusion 

As we move through 2024, one thing is clear: AI has firmly cemented itself as the cornerstone of innovation and economic growth. From groundbreaking AI models like Samsung Gauss2 to LinkedIn’s evolution of its product tech stack, and even Google’s latest AI assistant, the landscape is rapidly evolving, and the world’s leading tech companies are investing billions to aid in further development. 

Check back next month to see how we round up the data trends for the end of 2024. 

Data & AI Report – Data Trends in October 2024

November 13, 2024

Data trends in October saw impressive investments, cleantech advancements, AI assistants and most excitingly, robots! 

We’ve delved into the most exciting news from the month – check them out below 👇

European Innovation Council to Boost Deep Tech Innovation with €1.4 Billion Investment in 2025 

The EIC has announced an impressive €1.4 billion investment: a huge boost to deep tech research and strategic technology start-ups across Europe for 2025. This increase represents a massive €200 million boost compared to 2024, underscoring the EU’s commitment to nurturing high-potential tech ventures that will shape Europe’s technological future. 

By improving access to capital, the EIC is actively working to bridge the funding gap that often limits the growth of Europe’s tech pioneers and hinders their global competitiveness. This offers a critical opportunity for European startups – we’re excited to see what this brings! 

Read more about the 2025 EIC programme here. 

Lunar’s AI Voice Assistant to Handle 75% of Customer Calls, Revolutionising Fintech Support 

Danish challenger bank Lunar has taken a major step forward in customer service by launching an AI-powered voice assistant. Aiming to handle 75% of customer calls, Lunar’s AI assistant promises a seamless 24/7 experience, with answers available even in the middle of the night!

Lunar’s move echoes a growing trend among fintechs (including Klarna and Bunq) who are using AI to streamline customer support without cutting jobs. With its forward-thinking approach and a valuation of $2.2 billion, Lunar is positioning itself at the forefront of fintech innovation in the Nordics, aiming to enhance customer service without sacrificing the personal touch. 

See Lunar’s press release here. 

Cleantech Companies Secure a Huge €13.2 billion in Funding in First Three Quarters of 2024 

Cleantech companies are at the forefront of Europe’s drive for sustainability: they are spearheading efforts to reduce carbon emissions and transition toward a circular economy. Their innovative work in renewable energy, sustainable materials, and resource management is not only crucial for environmental resilience but also fuels job creation and economic growth across the continent. 

These investments highlight the strong momentum behind green technologies and signal continued interest in sustainable growth from investors. With support like this, Europe is laying the groundwork for a cleaner, more sustainable future, one that aligns environmental priorities with economic opportunity. 

Check out a few of the companies that were involved in raising the funds: 

Northvolt // Avantium Technologies // BioBTX 

Starship and Bolt Team Up for Robot Grocery Deliveries in Tallinn 

Make way for the robot! 

Estonian-founded tech leaders Starship Technologies & Bolt have joined forces to launch Europe’s first robot-powered grocery delivery service – a huge push forward! 

This groundbreaking service combines Starship’s autonomous delivery robots with Bolt’s popular delivery app. Starship’s robots consume minimal energy, about the amount needed to boil a kettle for a cup of tea. The robots offer a more sustainable alternative to conventional deliveries – we have no doubt this will quickly spread throughout the rest of Europe! 

Conclusion

Europe is rapidly advancing through strategic investments, cutting-edge AI applications, and innovative green solutions. It’s setting a strong foundation for future growth in high-impact sectors and the developments signal a promising era of sustainable innovation, economic opportunity, and technological leadership across Europe. 

Check back to see what data trends we see develop next month.

Data & AI Report – Data Science Trends September 2024

October 9, 2024

September’s data science trends included migrations to advanced search engines, internal tooling developments, and protecting data from potential AI threats.

We’ve covered some exciting news from some big names.

Read on for more on all these exciting developments!

Vinted migrate to Vespa – how the online second hand shopping phenomenon is keeping up with the growth & complexity of data

After hitting the limits of their previous search engine, Vinted were on a mission to find a more scalable alternative.

Introducing Vespa: an open-source search engine & vector database. Vespa supports vector search, keyword search, and search within structured data, all in one query. It also integrates machine learning, which enables real-time AI insights from their data. It’s proven to handle thousands of queries per second, making it the front-runner for managing large & complex data.

Already used by others including Spotify & Yahoo and with continuous application improvements being delivered, will we continue to see increased use of Vespa?

Read more about Vinted’s migration here.

QueryGPT – allowing easier and faster data analysis for Uber

Uber’s data platform handles a huge 1.2 million interactive queries each month. The idea of QueryGPT is to better manage real-time data analytics & to query massive datasets. It works with a combination of Presto (an open-source SQL query engine) and Apache Hudi, which is capable of handling upserts and managing large volumes of data in a cloud-based or distributed environment.

The system is part of Uber’s broader efforts to handle large-scale, real-time data streaming and querying – integral to its data-driven approach to decision-making. It’s said to cut the time needed to generate reliable queries from 10 minutes down to 3, which is a massive productivity gain for Uber.

Using advanced AI, QueryGPT fits smoothly into Uber’s data system, cutting query time and increasing accuracy to handle their complex data needs.

Read more about the advancements here.

Dropbox & Lakera Guard securing LLMs

In a blog posted this September, Dropbox delved into how they’re using Lakera Guard to protect their LLMs from potential security threats posed by AI.

Citing the importance of maintaining the trust of their millions of users to protect their content, Dropbox talk about how they chose Lakera Guard last year to protect user data & uphold the reliability and trustworthiness of their intelligent features, as outlined in their AI principles.

What were Dropbox looking for in their quest for protection? They concluded that a solution had to be deployable on their existing infrastructure and offer low latency, strong confidence scores and scope for continuous improvement.

Dropbox have since invested in Lakera Guard, proving their strong belief in its abilities. Furthermore, they’ve collaborated with Lakera’s teams to develop improvements to the software itself. Working closely together, Dropbox have helped Lakera improve the product whilst achieving their own security goals, too!

Read the full blog here.

Conclusion

In today’s fast-paced digital landscape, companies like Vinted, Uber and Dropbox are navigating many complexities. Vinted’s switch to Vespa demonstrates the importance of scalable search engines as companies grow. Uber’s QueryGPT highlights the need for faster and more accurate data analytics. Meanwhile, Dropbox’s partnership with Lakera Guard emphasises the need to secure AI systems to ensure data remains protected as AI technologies advance.

Data Jobs in the Netherlands

Interested in a new adventure within the data world?
For information on data jobs in the Netherlands, get in touch.

Data & AI Report – Data Science Trends July 2024

August 5, 2024

Trends in data science have brought a fresh wave of excitement to the data and analytics landscape this July. We’re seeing major moves towards scalability, efficient governance, and AI capabilities. Additionally, Dr. Randy Olson shows us just how far creative data use can take you—literally! Turns out, data science isn’t just about numbers, it can plan one epic road trip too! 

Firstly, Discord’s transition to Dagster and dbt for data orchestration 

This month, Discord announced a major overhaul of their data orchestration infrastructure, moving from their in-house system, Derived, to a combination of Dagster and dbt. As their platform and user base expanded, the need for enhanced self-service capabilities and robust observability became evident. This decision was driven by the necessity for declarative automation, a modern unified interface, reliability on Kubernetes, and seamless integration with existing tools. 

After evaluating open-source options like Argo and Prefect, Discord chose Dagster for orchestration and dbt for data modeling. This transition has enabled them to support over 2,000 dbt tables, enhancing their ability to deliver seamless service and insightful data analytics while scaling efficiently. 

Read about it here

Meta unveils Llama 3.1 

This month, Meta introduced Llama 3.1, a massive leap in open-source AI. The Llama 3.1 405B model brings unmatched flexibility and state-of-the-art capabilities, unlocking new workflows like synthetic data generation and model distillation. Additionally, Meta is enhancing the Llama ecosystem with new security tools and a reference system. Over 25 partners, including AWS and Google Cloud, will offer services from day one. 

Llama 3.1 models feature expanded context lengths to 128K, multilingual support, and strong performance across benchmarks. Upgraded 8B and 70B models enhance capabilities in general knowledge, tool use, and translation. 

Read Meta’s full update

Building a data-driven analytics team at DoorDash 

Jessica Lachs, DoorDash’s VP of Analytics & Data Science, shares insights on what it means to be truly data-driven and how to structure an analytics team. Having joined DoorDash as the first General Manager in 2014, Lachs has built the analytics team from the ground up and now leads global analytics, including the Wolt Analytics team post-acquisition. 

Lachs highlights that the term “analytics” can be ambiguous, encompassing data science, business intelligence, product analytics, machine learning, and BizOps. She also emphasizes that to build a data-driven organisation, founders should focus on desired outcomes rather than semantics. At DoorDash, the role of analytics has evolved with the company’s growth, shifting from gut instinct decisions to data-centric strategies. Initially, DoorDash used quasi-experimental methods due to limited data, but as the company matured, they invested in scalable data models and advanced experimentation capabilities, expanding their analytics scope to drive better decision-making. 

Read the full post here

Databricks’ migration to Unity Catalog for data governance 

In a recent blog post, the Data Platform team at Databricks shared insights into their migration to Unity Catalog for enhanced data governance. As the company grows, establishing secure, compliant, and cost-effective data operations has become a priority. With thousands of employees analysing data, consistent governance standards are essential, making the migration to Unity Catalog a top priority. 

The blog outlines the challenges and benefits of migrating from the default Hive Metastore (HMS) to Unity Catalog. While HMS lacked fine-grained access controls, lineage support, audit logs, and effective search integration, UC provided these features out-of-the-box. Therefore, the team chose a transformational approach, selectively migrating datasets to establish a structured governance framework. This strategy required more effort initially, but enabled clear data ownership, naming conventions, and intentional access, setting the stage for future governance policies.

Read the blog

Finally, some creative data use!

Dr. Randy Olson, a full stack data scientist and AI researcher, utilised his expertise in machine learning to compute an optimal road trip across the U.S.

He framed the task as the Traveling Salesman Problem (TSP), which aims to find the shortest route that visits each stop exactly once and returns to the starting point.

Dr. Olson applied three specific restrictions:  

  1. The trip must stop in all 48 contiguous U.S. states 
  2. Only visit National Natural Landmarks, National Historic Sites, National Parks, or National Monuments
  3. Be taken entirely by car without leaving the U.S. 

Want to take the trip? The route spans 13,699 miles and requires 224 hours (or 9.33 days) of driving, assuming no traffic. You can find the full itinerary here. 
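To make the Traveling Salesman framing concrete, here is a minimal brute-force sketch on four made-up stops. It is not Dr. Olson's method or data: the coordinates are hypothetical points on a flat grid, and straight-line distance stands in for driving distance.

```python
from itertools import permutations
from math import dist

def tour_length(order, points):
    """Total length of a closed tour that returns to its starting point."""
    n = len(order)
    return sum(dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))

def shortest_tour(points):
    """Brute-force TSP: try every ordering, with stop 0 fixed as the start."""
    rest = range(1, len(points))
    candidates = ((0, *perm) for perm in permutations(rest))
    best = min(candidates, key=lambda order: tour_length(order, points))
    return list(best), tour_length(best, points)

# Hypothetical stops on a flat grid (not real landmarks).
stops = [(0, 0), (0, 3), (4, 3), (4, 0)]
order, length = shortest_tour(stops)
print(order, length)
```

Checking every ordering only works for a handful of stops, because the number of orderings grows factorially. A trip with dozens of landmarks needs a heuristic or approximate solver instead.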

Dr. Randy Olson used data to design the optimum road trip across the U.S., showing how useful data can be. Data science trends really are everywhere!

Olson’s epic road trip

To conclude 

July highlighted several key trends in data and analytics. The push for scalability is evident in Discord’s adoption of Dagster and dbt, and Databricks’ migration to Unity Catalog for better data governance. The importance of building effective data teams was underscored by DoorDash’s approach to analytics leadership. Another notable trend is the growing emphasis on enhanced self-service capabilities and robust observability in data platforms. These themes point towards a future focused on scalable infrastructure, efficient governance, structured teams, and innovative AI applications.

If you’re interested in how we can help scale your data team, get in touch.

Data & AI Report – June 2024

July 2, 2024

Welcome to our June Data & AI report!  

We’re covering some exciting news this month… who knew data catalogs could be so competitive? We’ve also got some interesting updates from NVIDIA and Netflix. 

Let’s get stuck in! 

Open Source battles: Databricks open sources Unity Catalog…

…live at the Data & AI summit

Following Snowflake’s announcement that it would open source its Polaris Catalog “within the next 90 days”, Matei Zaharia, Databricks’ CTO & Cofounder, went one better: during his keynote speech at the Data & AI Summit he opened the repo on his laptop, navigated to the “danger zone” and, in front of everyone, made the repo public there and then, making Databricks the first in the industry to go open source. 

Gif of someone eating popcorn wearing 3D glasses: a joke referencing the drama of Databricks open-sourcing its data catalog at the Data & AI Summit before Snowflake launched theirs.

Watch a video of the moment it went live here.

In Databricks’ announcement, they shared the reasoning behind their decision to make this public, explaining that “most data platforms today are walled gardens”, going on to say “By open-sourcing Unity Catalog, we are giving organisations an open foundation for their current and future workloads.” 

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training LLMs 

NVIDIA has launched an open synthetic data generation pipeline for training large language models. The Nemotron-4 340B family offers advanced instruct and reward models, along with a dataset for generative AI training.  

NVIDIA's Synthetic Response Data diagram

This system provides developers with a free and scalable solution to create synthetic data for building powerful LLMs, enhancing performance and accuracy. The models are designed to work seamlessly with NVIDIA NeMo and TensorRT-LLM for efficient model training and inference. 

Read NVIDIA’s full blog here.

Netflix share a recap of their Data Engineering Open Forum 

Netflix released a summary this month of the sessions from their Data Engineering Open Forum back in April (along with recordings of all the talks!). 

One session introduced Netflix’s “Auto Remediation” feature, which uses machine learning to handle job errors more efficiently. Jide Ogunjobi talked about using generative AI to help organizations easily manage and query their large data systems. 

Tulika Bhatt explained how Netflix manages 18 billion daily impressions and the importance of real-time data for recommendations. We found Tulika’s talk particularly interesting as it highlighted the creative solutions Netflix employs to balance scalability and cost while delivering real-time data. 

Jessica Larson shared her experience building a new data platform after GDPR, focusing on data protection and compliance. Clark Wright from Airbnb discussed their new Data Quality Score to improve data quality. 

You can read about and watch all of the talks here.

How Machine Learning is transforming Online Banking security 

Zachary Amos’ recent blog explores how behavioral biometrics can drastically reduce online banking fraud. This ML-driven technology works in the background, monitoring user behavior like mouse movements and keystrokes to spot anything unusual. It processes data in real-time, handling multiple users at once, making it a more streamlined and user-friendly security solution than traditional Multi-Factor Authentication. 
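As a rough illustration of how behaviour monitoring can flag something unusual, the sketch below runs a simple z-score check on keystroke timing. It is a statistical stand-in, not the ML approach described in the blog: the `is_unusual` helper, the threshold, and all timing values are hypothetical.

```python
from statistics import mean, stdev

def is_unusual(baseline_ms, session_ms, threshold=3.0):
    """Flag a session whose mean keystroke interval is far from the user's baseline.

    Returns (flag, z), where z is how many baseline standard deviations
    the session mean sits away from the baseline mean.
    """
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    z = abs(mean(session_ms) - mu) / sigma
    return z > threshold, z

# Hypothetical intervals between keystrokes, in milliseconds.
baseline = [110, 120, 115, 125, 118, 122, 112, 119]
same_user = [117, 121, 114, 120]
someone_else = [210, 190, 205, 220]

print(is_unusual(baseline, same_user))
print(is_unusual(baseline, someone_else))
```

A production system would combine many signals (mouse movement, typing rhythm, navigation patterns) in a trained model rather than relying on a single threshold.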

Zachary’s insights show the power of machine learning in boosting security. As cyber threats become more sophisticated, using technology like this ensures accounts stay secure and protected. 

To conclude 

June has been a month full of exciting open-source updates. Databricks made waves by open-sourcing Unity Catalog live on stage, while NVIDIA launched a synthetic data generation pipeline for training large language models.  

We’re especially interested in these open-source developments. They represent a move towards greater collaboration and accessibility in the tech world. 

Want to discuss how we can help you or your data team? Get in touch, or check out our open roles. 

Data & AI Report – May 2024

June 7, 2024

Welcome to our May Data & AI report! This month, we explore developments in data, AI, and machine learning, from transforming data warehouses to pioneering AI/ML applications.

Discover how industry leaders are pushing the boundaries of what’s possible⤵️

Are data warehouses evolving beyond analytics?

Mikkel Dengsøe, founder of Synq, released some interesting research about the evolving role of data warehouses this month. Once the domain of reporting and analytics, we’re increasingly seeing them underpin crucial functions like AI/ML, automated marketing, and regulatory reporting. This evolution ups the stakes significantly and means that data accuracy is a top concern for most companies.

Mikkel also points out how data teams and their stacks are growing rapidly. Companies today manage thousands of models and juggle numerous daily jobs to keep things running smoothly. With more business-critical data and a surge in data assets, effective testing approaches are more vital than ever. It seems that basic tests won’t cut it anymore and that niche solutions will be essential to maintain data reliability.

Chart from Mikkel's blog that shows companies using data warehouses more and more for business critical operations, like AI, ML, Business ops and Reporting.

Data warehouses now have to support business-critical uses like AI/ML, automated marketing, and regulatory reporting.

You can check out Mikkel’s full blog here.

What’s next for Uber’s ML platform, Michelangelo?

In a recent blog from Uber, the company shares the strides it’s made so far in machine learning (ML). Since 2016, Michelangelo, Uber’s centralised ML platform, has been leveraging data to drive key functions like ETA predictions, rider-driver matching, rankings, and fraud detection. With around 400 active ML projects and over 5K models in production, Michelangelo manages 20K model training jobs monthly and delivers up to 10 million real-time predictions per second.

Screenshot from Uber's Blog, showing how real-time Machine Learning underpins the UberEats app's core user flow.

Real-time ML underpins Eater app core user flow.

The blog goes on to explain Uber’s plans to use generative AI and large language models (LLMs), with the Gen AI Gateway at the forefront of its mission, aiming to improve security, efficiency, and cost-effectiveness.

Read the full blog here.

LinkedIn launches LakeChime

This month, LinkedIn introduced LakeChime, a powerful data trigger service designed to enhance the efficiency of their extensive data lake. Handling billions of data points daily, LakeChime streamlines data processing by unifying data trigger semantics across both modern and traditional table formats like Hive and Iceberg.

Central to LakeChime is the Data Change Event (DCE), which captures updates within data tables and triggers downstream workflows via platforms like dbt or Airflow. This innovation ensures timely data availability and enhances pipeline efficiency.
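The idea of a data change event triggering downstream workflows can be sketched as a small publish/subscribe loop. This is only an illustration: `DataChangeEvent`, `TriggerService`, and every field and table name below are hypothetical and are not LakeChime's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataChangeEvent:
    """Hypothetical record of an update to a table."""
    table: str
    snapshot_id: int
    rows_changed: int

class TriggerService:
    """Hypothetical service that runs workflows when a table changes."""

    def __init__(self):
        # Maps a table name to the workflows that depend on it.
        self._subscribers = {}

    def subscribe(self, table, workflow):
        self._subscribers.setdefault(table, []).append(workflow)

    def publish(self, event):
        """Run every workflow subscribed to the changed table."""
        return [workflow(event) for workflow in self._subscribers.get(event.table, [])]

service = TriggerService()
service.subscribe("orders", lambda e: f"rebuild daily_orders from snapshot {e.snapshot_id}")

results = service.publish(DataChangeEvent("orders", snapshot_id=42, rows_changed=1000))
ignored = service.publish(DataChangeEvent("members", snapshot_id=7, rows_changed=5))
print(results, ignored)
```

Downstream jobs run only when their input table actually changes, rather than polling on a fixed schedule.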

Looking forward, LinkedIn plans to integrate LakeChime with dbt and Coral to automate incremental view maintenance, simplifying the creation of high-performance data pipelines.

Discover more about LakeChime in LinkedIn’s full blog post.

Comic Strip by Todd Comics, making a joke about data lakes that look well constructed and organised above the surface, but underneath the surface is an angry octopus with the Excel for a head, with the file name 'orders_final.xlsx'.

Spotlight on Slack’s female data engineers

Slack shared a blog last month highlighting the incredible work of their female data engineers across their various data teams.

One team is optimising data workflows with Apache Airflow and Apache Pinot, ensuring sub-second query latency. Senior Software Engineer Jessica’s team is migrating from virtual machines to Kubernetes, using custom Python tools and automated deployments to boost efficiency.

Senior Software Engineer Ramya talks about leading the migration from Spark 2 to Spark 3 on AWS EMR6, explaining how it enhances performance and reduces reliance on legacy systems.

Shrushti, another Senior Software Engineer, transitioned Slack’s data ingestion from Secor to Bedrock and is now moving to Kafka Connect for real-time streaming, a shift that aligns with industry standards and improves system adaptability.

It’s a really interesting read and shines a light on Slack’s dedication to diversity and inclusion, as well as some of the incredible ways they’re using data. Read the full blog post to meet more inspiring engineers and discover the innovative projects shaping the future at Slack.

In conclusion…

As data continues to grow in volume and complexity, the strategies and technologies we employ must evolve. How will these innovations from Synq, Uber, LinkedIn, and Slack shape your business?

To stay ahead, organisations must keep pace with technological advancements through a culture of continuous learning and adaptation.

Want to discuss how we can help you or your data team? Get in touch, or check out our open roles.

Data & AI Report – April 2024

May 1, 2024

Welcome to our first monthly update on data and AI. No need to scroll endlessly through news sites: we’ve compiled the month’s must-know developments right here!

April saw important developments in technology, highlighting investments and partnerships that emphasize the Netherlands’ involvement in the tech sector.

Google’s €640 Million Dutch Data Centre Project

Google announced a €640 million investment in a new data centre in Groningen, creating 125 jobs. This adds to Google’s total investment of over €3.8 billion in Dutch digital infrastructure since 2014. Read more

KLM Partners with Utrecht University AI Labs

KLM Royal Dutch Airlines is collaborating with Utrecht University’s AI Labs to refine operational efficiency and minimize disruptions.

PhD students are developing algorithms to optimize crew and aircraft scheduling, and improve ground processes like baggage handling and passenger boarding. This partnership aims to enhance KLM’s ability to quickly adapt to changes, ensuring smoother operations and prioritising flights effectively through data. Read more

Google Launches Training Programs for AI, Cybersecurity, and Data Analytics

The U.S. Treasury and Google Cloud are partnering to boost data analytics and cybersecurity hiring, aligning with President Biden’s AI Executive Order.

New training programs, accessible via YouTube and Google Cloud Skills Boost, include courses on generative AI, cybersecurity, and data analytics, and will equip individuals with the skills needed for digital transformation in the public sector.

Learners also get free access to generative AI tools, including Google’s interview prep tool, Interview Warmup. Read more

Gif showing Google Cloud's new Generative AI Interview Warmup tool.

Source: Google Cloud

AI Breakthrough in Breast Cancer Risk Assessment

Danish and Dutch researchers have advanced breast cancer risk assessment by combining an AI diagnostic tool with a mammographic texture model, under the leadership of Dr. Andreas D. Lauritzen.

This integrated approach improves the prediction of both short- and long-term breast cancer risks, identifying high-risk women more effectively. The innovation promises earlier cancer detection and could alleviate the strain on healthcare systems caused by a shortage of specialist breast radiologists. Read more

These developments underscore a growing need for expert Data, AI, and ML talent. Reach out to discuss how we can help to drive your innovation forward.

Contact our team.