Data & AI Report – AI Making Waves Again

April 3, 2025

Welcome back! 

Last month’s Data & AI Report focused heavily on AI advancements – this month is no different! From open-source breakthroughs to policy debates and autonomous innovations, here are March’s top AI stories.

🥇 DeepSeek Dominates Again 🥇

For the third month in a row, DeepSeek is making headlines! The latest model, DeepSeek V3-0324, now leads as the top-performing non-reasoning AI, surpassing Google’s Gemini 2.0 Pro & Meta’s Llama 3.3. 

While still catching up to reasoning models, its instant response speed makes it ideal for real-time applications like chatbots. This is a major win for open-source AI, proving its growing competitiveness, and a notable success for DeepSeek despite ongoing controversies. 

🫡 OpenAI & Google Push for U.S. AI Leadership 🫡

As China advances in AI, OpenAI & Google are urging the U.S. government to act swiftly to maintain its global AI leadership. Their joint recommendations include:

✅ Stronger export controls 

✅ Increased AI investment 

✅ Infrastructure development 

✅ Government AI adoption 

Both companies stress that policy decisions made now will shape the future of AI dominance, impacting innovation, security & economic growth.

👋 Meet Carl: AI’s First Automated Research Scientist 👋

The Autoscience Institute has introduced Carl, an AI system capable of independently producing academic research that passes rigorous peer review. Carl’s papers were accepted at the ICLR conference, marking a breakthrough in AI-driven scientific research. 

How does Carl work, you might ask? 

  • Generating ideas: Generates hypotheses by analysing academic literature 
  • Experimentation: Runs code, tests ideas, and visualises data 
  • Presentation: Writes polished research papers with clear conclusions 

There’s no doubt that as AI developments unfold, we’ll be seeing more Carl-like creations. The academic community will need to adapt as this technology develops to ensure integrity, safeguarding and relevance.

🚗 Japan’s Autonomous Vehicle Push 🚗

Nissan has launched driverless cars in Yokohama, using a system of 14 cameras, nine radars and six LiDAR sensors. Whilst Japan has been lagging behind America & China in this area so far, it is catching up now.

By 2029, Nissan aims to have 20 fully autonomous vehicles on the road – without human intervention. With Japan’s reputation for precision engineering, expectations are high, and failure is not an option. 

Conclusion 

Another month of AI evolution at breakneck speed! Check back for next month’s Data & AI report to see how things are progressing.

Data & AI Report – AI Remains Center of Attention

March 10, 2025

Welcome back! 

In January’s Data & AI Report, we highlighted the growing emphasis on AI – and this month is no exception! 

AI remains the center of attention, and we’ve gathered some of the top stories for you. Dive in below! 👇 

🔔 AI Action Summit raises alarms 🔔

The summit in Paris showed a collective understanding amongst global leaders, industry experts & academics to address AI’s challenges & opportunities.  

Key themes include: 

  • Governance & Regulation: leaders stressed the urgency of unified global AI guidelines to ensure safety, ethics, and fairness
  • Workforce & Inclusion: experts emphasise upskilling workers for an AI-driven economy to ensure AI benefits extend beyond Silicon Valley to all communities 
  • Public Interest & Bias: academics advocate for AI development that serves the public good, tackling bias and pushing back against big tech’s dominance
  • Evaluation & Ethics: calls for systematic AI evaluation to balance innovation with ethical considerations

The summit marked a crucial moment in shaping AI’s future, with a strong focus on equity, transparency, and responsible innovation. 

💸 Big Tech’s Big Investments 💸

2025 is set to witness the largest AI investment in history, with tech giants Amazon, Microsoft, Google, and Meta committing a staggering $320 billion to AI infrastructure – a 30% increase from 2024. 

Last month’s Data & AI Report touched on the DeepSeek saga, which challenged the notion that developing advanced AI requires sky-high costs. However, big tech is showing no signs of scaling back.

The message is clear: the AI revolution is in full swing, and only the biggest players with the biggest investments are leading the charge. 

🤫 Meta’s Rumoured $200 billion AI Data Centre 🤫

Rumors are swirling about Meta’s potential $200 billion AI data center project, adding to its already massive investment in AI infrastructure. Reports suggest they’re exploring locations in Louisiana, Wyoming, or Texas, with site visits allegedly underway. 

However, a Meta spokesperson has denied these claims, stating that aside from its previously announced $65 billion investment, no additional projects are in the works. 

With speculation still lingering, we’re watching closely to see how this story develops. 

🛵 DeepSeek AI Revolutionising China’s Transport Sector 🛵

DeepSeek’s AI technology is rapidly transforming China’s transportation sector, with major car and e-scooter manufacturers integrating it into their products. Initially adopted by EV giants like BYD and Geely, the technology is now making its way into electric two-wheelers.

BYD plans to incorporate DeepSeek into its software platform, enabling self-driving capabilities across multiple models without increasing costs. Meanwhile, e-scooter brands are leveraging AI for rider assistance, voice interaction, and personalised services. 

Industry experts predict that by 2025, two-thirds of new cars in China will feature autonomous driving capabilities, highlighting the technology’s growing impact on mobility. 

Read more via DeepSeek here. 

Conclusion 

AI continues to dominate headlines, shaping industries and economies worldwide. From global policy discussions at the AI Action Summit to record-breaking investments by tech giants, the momentum behind AI innovation is undeniable.  

As we move further into 2025, one thing is clear: AI is not just the future – it’s the present, evolving at an unprecedented pace. Stay tuned for next month’s update as we continue tracking the latest developments in this ever-changing landscape! 

Data & AI Report – Trends from January 2025

February 17, 2025

Welcome back! Our previous Data & AI report highlighted significant AI advancements, and our January round-up is no different. 

AI continues to evolve at a rapid pace, with new developments emerging constantly. We’ve compiled the top stories from January below – take a look 👇   

🤖 DeepSeek vs OpenAI 🤖

The AI landscape saw a major shake-up this January! 

Chinese startup DeepSeek launched its AI chatbot, quickly soaring to the top as the highest-rated free app in the U.S., surpassing ChatGPT for the number one spot. 

What followed, however, has turned into a full-fledged battle of the bots. OpenAI, along with U.S. government officials, has accused DeepSeek of replicating responses from its models, alleging that this allowed the company to develop its chatbot at a fraction of the cost.

Yet, no concrete evidence has been presented. We’ll be watching closely to see how this unfolds! 

💻💄L’Oréal & AI – a merging of two worlds 💄💻

January marked an unexpected fusion of beauty and AI. L’Oréal announced a partnership with IBM, whose GenAI technology it will use to revolutionize its innovation and development process.

With a focus on sustainability and product quality, the custom AI model will be built on a vast database of cosmetic formulas. This will streamline new product creation, enhance existing ones, and accelerate crucial tasks. 

If successful, L’Oréal and IBM could lead a broader industry shift, demonstrating AI’s transformative power even in unexpected sectors. 

Check out more on IBM’s website here.

🚫 Alterya & Chainalysis on the fight against scammers  🚫

The AI-powered fraud detection solution, Alterya, identifies scammers before they can act. It already works with top cryptocurrency exchanges & fintechs, currently monitoring more than $8 billion in transactions per month.

Chainalysis’s acquisition of Alterya means Chainalysis can double down on its strategy of investing in the prevention of fraudulent transactions. Integrating Alterya technology with the Chainalysis blockchain data platform will strengthen their effectiveness, creating network effects across blockchains and digital payment systems while enhancing Alterya’s capability to identify fraudulent activity.

This could be a powerful acquisition to continue fighting scammers! 

Read more via Chainalysis here. 

🧠 Human AI data “exhausted” says Musk 🧠

Elon Musk has claimed that AI companies have run out of data for training their models. Human knowledge has been exhausted. 

What does this mean for the building of new systems? 

Well, AI firms are going to have to look toward synthetic data (data created by AI models) if they want to further develop their models. This isn’t all that new, though. Some of the giants, including Meta, Microsoft & Google, have already used synthetic data in their developments.

As developments progress, so does the skepticism around the ethics of AI. OpenAI openly admits that it would be impossible to create its tools without access to copyrighted material, which is prompting challenges from the creative & publishing worlds.

How this unfolds is something we’ll be keeping an eye on! 

Conclusion 

That’s it for this month’s Data & AI Report. From industry shake-ups to groundbreaking partnerships and ambitious national strategies, AI continues to redefine what’s possible. Whether it’s a battle for dominance in the chatbot space, beauty brands embracing AI-driven innovation, or how AI is going to gather data, one thing is clear—this technology is moving fast, and the stakes are higher than ever. 

As sustainability, ethics, and innovation collide, the coming months will be crucial in shaping the future of AI. Who will lead, who will adapt, and who will be left behind? We’ll be watching closely. Stay tuned! 🚀 

Data & AI Report – Data Trends from November 2024

December 10, 2024

November’s data trends see a huge focus on AI developments & investments. 

We’ve delved into some of the biggest movements below – check them out 👇 

Big Tech’s Big Bets on AI 

In 2024, the world’s leading tech giants—including Microsoft, Amazon, Alphabet, and Meta—are significantly increasing their investments in AI, with their combined AI spending projected to exceed $240 billion this year. The demand for AI-powered tools continues to skyrocket, driving this unprecedented level of investment. 

With forecasts suggesting that AI could contribute an additional $20 trillion to the global economy by 2030, it’s no surprise that the largest corporations are heavily investing in this space. Beyond the economic potential, such investments create additional revenue opportunities. For example, Microsoft’s AI products are expected to generate $10 billion annually. 

For Big Tech, AI transcends being just a trendy concept; it’s a strategic, long-term commitment poised to transform entire industries.  

Samsung’s Second-Generation AI Model: Gauss2 

In November, Samsung held its virtual Samsung Developer Conference Korea 2024, where it unveiled its latest software innovations and future-focused vision. 

A highlight of the conference was the introduction of Samsung Gauss2, the second-generation AI model that promises improved performance, efficiency, and broader applications. Gauss2 is a multimodal AI system capable of processing language, code, and images, and is available in three versions tailored for different use cases: 

  • Compact: Optimized for on-device use in environments with limited computing resources, maximizing device performance. 
  • Balanced: Offers a blend of performance, speed, and efficiency suitable for various tasks. 
  • Supreme: Provides top-tier performance with Mixture of Experts technology, which reduces computational costs during training and inference while maintaining high efficiency. 
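
To illustrate why Mixture of Experts can reduce compute, here is a minimal sketch of top-k expert routing in plain Python. It is not Samsung's implementation: the toy expert functions, the gate scores and the top_k value are all hypothetical, chosen only to show that each input runs through a few experts rather than all of them.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_scores, top_k=2):
    """Route input x to the top_k highest-scoring experts only.

    experts: list of callables (hypothetical stand-ins for expert networks)
    gate_scores: one raw score per expert (a real gate would compute these from x)
    Returns the weighted output and the indices of the experts that ran.
    """
    probs = softmax(gate_scores)
    # Pick the indices of the top_k experts by gate probability.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise the chosen weights so they sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = sum((probs[i] / norm) * experts[i](x) for i in chosen)
    return output, chosen

# Four toy "experts"; only two of them are evaluated per input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
output, used = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, -1.0, 2.0])
```

With four experts and top_k=2, only half of the expert functions are called for this input, which is the source of the compute saving the Supreme description refers to.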

Samsung aims to boost productivity through Gauss2 by enhancing its code.i service, enriching the natural language Q&A capabilities of the Samsung Gauss Portal, and supporting multimodal functions such as table and chart analysis and image creation. 

Read more from Samsung here. 

LinkedIn & The Journey to Their GenAI Tech Stack 

LinkedIn has published a detailed blog post that explores the evolution of its reimagined product portfolio and how Generative AI has been integrated into its features. They identify several key focus areas and insights gained from developing their GenAI capabilities: 

  • Prompt Management: Efficiently managing prompts at scale requires systems for templating, versioning, and structuring to support complex applications 
  • Task Automation via Skills: GenAI-driven task automation can unlock significant value but demands advanced tools to scale effectively 
  • Contextual Awareness & Personalisation: Memory plays a crucial role in personalising GenAI experiences and must be thoughtfully integrated into the tech stack 
  • Model Inference & Fine-Tuning: Balancing quality, cost, and latency requires flexible infrastructure that accommodates various models and use cases 
  • Migration to the New Stack: Adopting new technologies through incremental migration and cross-training is essential for stability 
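
As a rough illustration of the prompt management point above, templating and versioning can be sketched with Python's standard library. This is not LinkedIn's system: the PromptRegistry class, the prompt names and the template text are hypothetical.

```python
from string import Template

class PromptRegistry:
    """Stores prompt templates by name and version (hypothetical helper)."""

    def __init__(self):
        self._templates = {}  # (name, version) -> Template

    def register(self, name, version, text):
        key = (name, version)
        if key in self._templates:
            raise ValueError(f"{name} v{version} is already registered")
        self._templates[key] = Template(text)

    def latest_version(self, name):
        versions = [v for (n, v) in self._templates if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)

    def render(self, name, version=None, **variables):
        """Fill a template; uses the latest version when none is given."""
        if version is None:
            version = self.latest_version(name)
        # substitute() raises KeyError if a placeholder is left unfilled.
        return self._templates[(name, version)].substitute(**variables)

registry = PromptRegistry()
registry.register("summarise_profile", 1, "Summarise this profile: $profile")
registry.register("summarise_profile", 2, "Summarise this profile in $tone tone: $profile")

prompt_v1 = registry.render("summarise_profile", version=1, profile="Data engineer")
prompt_latest = registry.render("summarise_profile", profile="Data engineer", tone="a friendly")
```

Keeping old versions addressable lets an application pin a known-good prompt while a newer one is being evaluated.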

Since the introduction of their GenAI tools in early 2023, LinkedIn has continued to refine these capabilities, moving towards their vision of a robust GenAI tech stack that balances rapid development with long-term scalability. 

Read the full blog from LinkedIn here. 

Google’s Accidental Unveiling of Project Jarvis 

Google appears to be developing an advanced AI assistant that goes beyond traditional chatbots and voice assistants. Known as Project Jarvis, this initiative aims to perform tasks autonomously, rather than waiting for user commands. It can manage tasks such as handling emails, conducting research, and scheduling appointments, setting itself apart by proactively collaborating with users as a digital partner. 

While the technology holds exciting potential, it also raises important questions. On the skeptical side, concerns include the potential for job automation, which could impact routine roles, and security risks, given that Jarvis would have access to users’ sensitive data. However, the technology could also provide significant benefits, enhancing accessibility for people with disabilities and those with busy, on-the-go schedules. 

We’re excited to see where this innovative project leads! 

Conclusion 

As we move through 2024, one thing is clear: AI has firmly cemented itself as the cornerstone of innovation and economic growth. From groundbreaking AI models like Samsung Gauss2 to LinkedIn’s evolution of its product tech stack, and even Google’s latest AI assistant, the landscape is rapidly evolving, and the world’s leading tech companies are investing billions to aid in further development. 

Check back next month to see how we round up the data trends for the end of 2024. 

Data & AI Report – Data Trends in October 2024

November 13, 2024

Data trends in October saw impressive investments, cleantech advancements, AI assistants and most excitingly, robots! 

We’ve delved into the most exciting news from the month – check them out below 👇

European Innovation Council to Boost Deep Tech Innovation with €1.4 Billion Investment in 2025 

The EIC has announced an impressive €1.4 billion investment: a huge boost to deep tech research and strategic technology start-ups across Europe for 2025. This increase represents a massive €200 million boost compared to 2024, underscoring the EU’s commitment to nurturing high-potential tech ventures that will shape Europe’s technological future. 

By improving access to capital, the EIC is actively working to bridge the funding gap that often limits the growth of Europe’s tech pioneers and hinders their global competitiveness. This offers a critical opportunity for European startups – we’re excited to see what this brings!

Read more about the 2025 EIC programme here. 

Lunar’s AI Voice Assistant to Handle 75% of Customer Calls, Revolutionising Fintech Support 

Danish challenger bank Lunar has taken a major step forward in customer service by launching a voice assistant powered by AI. Aiming to handle 75% of customer calls, Lunar’s AI assistant promises a seamless 24/7 experience, with accessible answers even in the middle of the night!

Lunar’s move echoes a growing trend among fintechs (including Klarna and Bunq) who are using AI to streamline customer support without cutting jobs. With its forward-thinking approach and a valuation of $2.2 billion, Lunar is positioning itself at the forefront of fintech innovation in the Nordics, aiming to enhance customer service without sacrificing the personal touch. 

See Lunar’s press release here. 

Cleantech Companies Secure a Huge €13.2 billion in Funding in First Three Quarters of 2024 

Cleantech companies are at the forefront of Europe’s drive for sustainability: they are spearheading efforts to reduce carbon emissions and transition toward a circular economy. Their innovative work in renewable energy, sustainable materials, and resource management is not only crucial for environmental resilience but also fuels job creation and economic growth across the continent. 

These investments highlight the strong momentum behind green technologies and signal continued interest in sustainable growth from investors. With support like this, Europe is laying the groundwork for a cleaner, more sustainable future, one that aligns environmental priorities with economic opportunity. 

Check out a few of the companies that were involved in raising the funds: 

Northvolt // Avantium Technologies // BioBTX 

Starship and Bolt Team Up for Robot Grocery Deliveries in Tallinn 

Make way for the robot! 

Estonian-founded tech leaders Starship Technologies & Bolt have joined forces to launch Europe’s first robot-powered grocery delivery service – a huge push forward! 

This groundbreaking service combines Starship’s autonomous delivery robots with Bolt’s popular delivery app. Starship’s robots consume minimal energy, about the amount needed to boil a kettle for a cup of tea. The robots offer a more sustainable alternative to usual deliveries – we have no doubt this will quickly spread throughout the rest of Europe!

Conclusion

Europe is rapidly advancing through strategic investments, cutting-edge AI applications, and innovative green solutions. It’s setting a strong foundation for future growth in high-impact sectors and the developments signal a promising era of sustainable innovation, economic opportunity, and technological leadership across Europe. 

Check back to see what data trends we see develop next month.

Data & AI Report – Data Science Trends September 2024

October 9, 2024

September trends in data science saw introductions into advanced search engines, internal developments & data protection from potential AI threats.

We’ve covered some exciting news from some big names.

Read on for more on all these exciting developments!

Vinted migrate to Vespa – how the online second-hand shopping phenomenon is keeping up with the growth & complexity of data

After hitting the limits of their previous search engine, Vinted were on a mission to find a more scalable alternative.

Introducing Vespa: an open-source search engine & vector database. Vespa supports vector search, keyword search and searches within structured data, all in one query. It also integrates machine learning, which enables real-time AI insights from their data. It’s proven to handle thousands of queries per second, making it the front-runner for managing large & complex data.
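
To make "vector, keyword and structured search in one query" concrete, here is a small conceptual sketch in plain Python. It does not use Vespa or its query language: the catalogue, the embeddings, the scoring functions and the alpha weighting are all hypothetical.

```python
import math

# Hypothetical catalogue: each item has text, a structured field, and an embedding.
ITEMS = [
    {"id": 1, "title": "red wool coat", "price": 40, "vec": [0.9, 0.1]},
    {"id": 2, "title": "blue denim jacket", "price": 25, "vec": [0.2, 0.8]},
    {"id": 3, "title": "red summer dress", "price": 90, "vec": [0.8, 0.3]},
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, title):
    """Fraction of query words that appear in the title."""
    words = query.lower().split()
    title_words = set(title.lower().split())
    return sum(w in title_words for w in words) / len(words) if words else 0.0

def hybrid_search(query, query_vec, max_price, alpha=0.5):
    """Filter on a structured field, then rank by blended keyword and vector similarity."""
    results = []
    for item in ITEMS:
        if item["price"] > max_price:  # structured-data filter
            continue
        score = (alpha * keyword_score(query, item["title"])
                 + (1 - alpha) * cosine(query_vec, item["vec"]))
        results.append((score, item["id"]))
    return [item_id for _, item_id in sorted(results, reverse=True)]

ranked = hybrid_search("red coat", query_vec=[1.0, 0.0], max_price=50)
```

Here the price filter removes the dress before ranking, and the coat outranks the jacket on both the keyword and the vector signal.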

Already used by others including Spotify & Yahoo and with continuous application improvements being delivered, will we continue to see increased use of Vespa?

Read more about Vinted’s migration here.

QueryGPT – allowing easier and faster data analysis for Uber

Uber’s data platform handles a huge 1.2 million interactive queries each month. The idea of QueryGPT is to better manage real-time data analytics & to query massive datasets. It relies on a combination of Presto (an open-source SQL query engine) and Apache Hudi, which handles upserts and manages large volumes of data in a cloud-based or distributed environment.

The system is part of Uber’s broader efforts to handle large-scale, real-time data streaming and querying – integral to its data-driven approach to decision-making. It’s said to cut the time needed to generate a reliable query from 10 minutes to 3, which is a massive productivity gain for Uber.

Using advanced AI, QueryGPT fits smoothly into Uber’s data system, cutting query time and increasing accuracy to handle their complex data needs.

Read more about the advancements here.

Dropbox & Lakera Guard securing LLMs

In a blog posted this September, Dropbox delved into how they’re using Lakera Guard to protect their LLMs from potential security threats posed by AI.

Citing the importance of maintaining the trust of their millions of users to protect their content, Dropbox talk about how they chose Lakera Guard last year to protect user data & uphold the reliability and trustworthiness of their intelligent features, as outlined in their AI principles.

What were Dropbox looking for in their quest for protection? Their considerations concluded that it has to be deployable on their existing infrastructure, have low latency, strong confidence scores and scope for continuous improvement.

Dropbox have since invested in Lakera Guard, proving their strong belief in its abilities. They’ve also collaborated with Lakera’s teams to develop improvements to the software itself. Working closely with Lakera, Dropbox have been able to help them meet their requirements whilst achieving their own security goals, too!

Read the full blog here.

Conclusion

In today’s fast-paced digital landscape, companies like Vinted, Uber and Dropbox are navigating many complexities. Vinted’s switch to Vespa demonstrates the importance of scalable search engines as companies grow. Uber’s QueryGPT highlights the need for faster and more accurate data analytics. Meanwhile, Dropbox’s partnership with Lakera Guard emphasises the need to secure AI systems to ensure data remains protected as AI technologies advance.

Data Jobs in the Netherlands

Interested in a new adventure within the data world?
For information on data jobs in the Netherlands, get in touch.

Data & AI Report – Data Science Trends July 2024

August 5, 2024

Trends in data science have brought a fresh wave of excitement to the data and analytics landscape this July. We’re seeing major moves towards scalability, efficient governance, and AI capabilities. Additionally, Dr. Randy Olson shows us just how far creative data use can take you—literally! Turns out, data science isn’t just about numbers, it can plan one epic road trip too! 

Firstly, Discord’s transition to Dagster and dbt for data orchestration

This month, Discord announced a major overhaul of their data orchestration infrastructure, moving from their in-house system, Derived, to a combination of Dagster and dbt. As their platform and user base expanded, the need for enhanced self-service capabilities and robust observability became evident. This decision was driven by the necessity for declarative automation, a modern unified interface, reliability on Kubernetes, and seamless integration with existing tools. 

After evaluating open-source options like Argo and Prefect, Discord chose Dagster for orchestration and dbt for data modeling. This transition has enabled them to support over 2,000 dbt tables, enhancing their ability to deliver seamless service and insightful data analytics while scaling efficiently. 

Read about it here

Meta unveils Llama 3.1 

This month, Meta introduced Llama 3.1, a massive leap in open-source AI. The Llama 3.1 405B model brings unmatched flexibility and state-of-the-art capabilities, unlocking new workflows like synthetic data generation and model distillation. Additionally, Meta is enhancing the Llama ecosystem with new security tools and a reference system. Over 25 partners, including AWS and Google Cloud, will offer services from day one. 

 

Llama 3.1 models feature expanded context lengths to 128K, multilingual support, and strong performance across benchmarks. Upgraded 8B and 70B models enhance capabilities in general knowledge, tool use, and translation. 

Read Meta’s full update

Building a data-driven analytics team at DoorDash 

Jessica Lachs, DoorDash’s VP of Analytics & Data Science, shares insights on what it means to be truly data-driven and how to structure an analytics team. Having joined DoorDash as the first General Manager in 2014, Lachs has built the analytics team from the ground up and now leads global analytics, including the Wolt Analytics team post-acquisition. 

Lachs highlights that the term “analytics” can be ambiguous, encompassing data science, business intelligence, product analytics, machine learning, and BizOps. She emphasizes that to build a data-driven organisation, founders should focus on desired outcomes rather than semantics. At DoorDash, the role of analytics has evolved with the company’s growth, shifting from gut instinct decisions to data-centric strategies. Initially, DoorDash used quasi-experimental methods due to limited data, but as the company matured, they invested in scalable data models and advanced experimentation capabilities, expanding their analytics scope to drive better decision-making.

Read the full post here

Databricks’ migration to Unity Catalog for data governance

In a recent blog post, the Data Platform team at Databricks shared insights into their migration to Unity Catalog for enhanced data governance. As the company grows, establishing secure, compliant, and cost-effective data operations has become a priority. With thousands of employees analysing data, consistent governance standards are essential, making the migration to Unity Catalog a top priority. 

The blog outlines the challenges and benefits of migrating from the default Hive Metastore (HMS) to Unity Catalog. While HMS lacked fine-grained access controls, lineage support, audit logs, and effective search integration, UC provided these features out-of-the-box. Therefore, the team chose a transformational approach, selectively migrating datasets to establish a structured governance framework. This strategy required more effort initially, but enabled clear data ownership, naming conventions, and intentional access, setting the stage for future governance policies.

Read the blog

Finally, some creative data use!

Dr. Randy Olson, a full stack data scientist and AI researcher, utilised his expertise in machine learning to develop an optimal search strategy.  

He approached this task as a Traveling Salesman Problem (TSP), which aims to find the shortest route that visits each city exactly once and returns to the starting point.

Dr. Olson applied three specific restrictions:  

  1. The trip must stop in all 48 contiguous U.S. states 
  2. Only visit National Natural Landmarks, National Historic Sites, National Parks, or National Monuments
  3. Be taken entirely by car without leaving the U.S. 

Want to take the trip? The route spans 13,699 miles and requires 224 hours (or 9.33 days) of driving, assuming no traffic. You can find the full itinerary here. 
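
For a feel of how a route like this can be searched for, here is a minimal sketch of a common TSP heuristic: a nearest-neighbour tour improved with 2-opt swaps. This is an illustrative technique, not necessarily the method Dr. Olson used, and the stops are hypothetical points on a plane rather than real landmark coordinates or driving distances.

```python
import math

def tour_length(points, order):
    """Total length of a closed tour that returns to its starting point."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def nearest_neighbour(points, start=0):
    """Greedy tour: always travel to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def two_opt(points, order):
    """Repeatedly reverse segments of the tour while doing so shortens it."""
    best = order[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best

# Hypothetical stops on a plane (not real landmark coordinates).
stops = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2), (5, 1)]
route = two_opt(stops, nearest_neighbour(stops))
```

Heuristics like this do not guarantee the shortest possible route, but they return a good one quickly, which matters because the number of possible orderings grows factorially with the number of stops.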

Dr. Randy Olson used data to design the optimum road trip across the U.S., showing how useful data can be. Data science trends really are everywhere!

Olson’s epic road trip

To conclude 

July highlighted several key trends in data and analytics. The push for scalability is evident in Discord’s adoption of Dagster and dbt, and Databricks’ migration to Unity Catalog for better data governance. The importance of building effective data teams was underscored by DoorDash’s approach to analytics leadership. Another notable trend is the growing emphasis on enhanced self-service capabilities and robust observability in data platforms. These themes point towards a future focused on scalable infrastructure, efficient governance, structured teams, and innovative AI applications.

If you’re interested in how we can help scale your data team, get in touch.

Data & AI Report – June 2024

July 2, 2024

Welcome to our June Data & AI report!  

We’re covering some exciting news this month… who knew data catalogs could be so competitive? We’ve also got some interesting updates from NVIDIA and Netflix. 

Let’s get stuck in! 

Open Source battles: Databricks open sources Unity Catalog…

…live at the Data & AI summit

Following Snowflake’s announcement that it would open source its Polaris Catalog “within the next 90 days”, Matei Zaharia, Databricks’ CTO & Cofounder, went one better. During his keynote speech at the Data & AI Summit, he opened the repo on his laptop, navigated to the “danger zone” and, in front of everyone, made the repo public there and then. That made Databricks the first of the two to open source its catalog.

Watch a video of the moment it went live here.

In Databricks’ announcement, they shared the reasoning behind their decision to make this public, explaining that “most data platforms today are walled gardens” going on to say “By open-sourcing Unity Catalog, we are giving organisations an open foundation for their current and future workloads.” 

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training LLMs 

NVIDIA has launched an open synthetic data generation pipeline for training large language models. The Nemotron-4 340B family offers advanced instruct and reward models, along with a dataset for generative AI training.  

NVIDIA's Synthetic Response Data diagram

This system provides developers with a free and scalable solution to create synthetic data for building powerful LLMs, enhancing performance and accuracy. The models are designed to work seamlessly with NVIDIA NeMo and TensorRT-LLM for efficient model training and inference. 

Read NVIDIA’s full blog here.

Netflix share a recap of their Data Engineering Open Forum 

Netflix released a summary this month of the sessions from their Data Engineering Open Forum back in April (along with recordings of all the talks!).

One session introduced Netflix’s “Auto Remediation” feature, which uses machine learning to handle job errors more efficiently. Jide Ogunjobi talked about using generative AI to help organizations easily manage and query their large data systems. 

Tulika Bhatt explained how Netflix manages 18 billion daily impressions and the importance of real-time data for recommendations. We found Tulika’s talk particularly interesting as it highlighted the creative solutions Netflix employs to balance scalability and cost while delivering real-time data. 

Jessica Larson shared her experience building a new data platform after GDPR, focusing on data protection and compliance. Clark Wright from Airbnb discussed their new Data Quality Score to improve data quality. 

You can read about and watch all of the talks here.

How Machine Learning is transforming Online Banking security 

Zachary Amos’ recent blog explores how behavioral biometrics can drastically reduce online banking fraud. This ML-driven technology works in the background, monitoring user behavior like mouse movements and keystrokes to spot anything unusual. It processes data in real-time, handling multiple users at once, making it a more streamlined and user-friendly security solution than traditional Multi-Factor Authentication. 
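
As a simplified illustration of the idea, the sketch below flags a session whose typing rhythm differs sharply from a user's baseline. It is not the approach described in the blog: real behavioral biometrics systems use far richer features and models, and the data, the threshold and the is_anomalous function here are hypothetical.

```python
from statistics import mean, stdev

def is_anomalous(baseline_intervals, session_intervals, threshold=3.0):
    """Flag a session whose average keystroke interval is far from the user's baseline.

    baseline_intervals: past gaps between keystrokes in milliseconds (hypothetical data)
    session_intervals: gaps observed in the current session
    threshold: how many baseline standard deviations count as unusual (assumed value)
    """
    mu = mean(baseline_intervals)
    sigma = stdev(baseline_intervals)
    if sigma == 0:
        # No variation in the baseline: any difference at all is unusual.
        return mean(session_intervals) != mu
    z = abs(mean(session_intervals) - mu) / sigma
    return z > threshold

baseline = [120, 130, 125, 118, 135, 128, 122, 131]
same_user = [124, 129, 121, 133]
different_typist = [60, 55, 70, 65]

flag_same = is_anomalous(baseline, same_user)
flag_other = is_anomalous(baseline, different_typist)
```

Because a check like this runs silently on signals the user already produces, it adds no extra login step, which is the usability advantage over traditional Multi-Factor Authentication that the blog highlights.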

Zachary’s insights show the power of machine learning in boosting security. As cyber threats become more sophisticated, technology like this can help keep accounts secure and protected. 

To conclude 

June has been a month full of exciting open-source updates. Databricks made waves by open-sourcing Unity Catalog live on stage, while NVIDIA launched a synthetic data generation pipeline for training large language models.  

We’re especially interested in these open-source developments. They represent a move towards greater collaboration and accessibility in the tech world. 

Want to discuss how we can help you or your data team? Get in touch, or check out our open roles. 

Data & AI Report – May 2024

June 7, 2024

1510

Welcome to our May Data & AI report! This month, we explore developments in data, AI, and machine learning, from transforming data warehouses to pioneering AI/ML applications.

Discover how industry leaders are pushing the boundaries of what’s possible⤵️

Are data warehouses evolving beyond analytics?

Mikkel Dengsøe, founder of Synq, released some interesting research about the evolving role of data warehouses this month. Once the domain of reporting and analytics, data warehouses increasingly underpin crucial functions like AI/ML, automated marketing, and regulatory reporting. This evolution ups the stakes significantly and means that data accuracy is a top concern for most companies.

Mikkel also points out how data teams and their stacks are growing rapidly. Companies today manage thousands of models and juggle numerous daily jobs to keep things running smoothly. With more business-critical data and a surge in data assets, effective testing approaches are more vital than ever. It seems that basic tests won’t cut it anymore and that niche solutions will be essential to maintain data reliability.

Chart from Mikkel's blog that shows companies using data warehouses more and more for business critical operations, like AI, ML, Business ops and Reporting.

Data warehouses now have to support business-critical uses like AI/ML, automated marketing, and regulatory reporting.

You can check out Mikkel’s full blog here.

What’s next for Uber’s ML platform, Michelangelo?

In a recent blog from Uber, the company shares the strides it’s made so far in machine learning (ML). Since 2016, Michelangelo, Uber’s centralised ML platform, has been leveraging data to drive key functions like ETA predictions, rider-driver matching, rankings, and fraud detection. With around 400 active ML projects and over 5K models in production, Michelangelo manages 20K model training jobs monthly and delivers up to 10 million real-time predictions per second.

Screenshot from Uber's Blog, showing how real-time Machine Learning underpins the UberEats app's core user flow.

Real-time ML underpins Eater app core user flow.

The blog goes on to explain Uber’s plans to use generative AI and large language models (LLMs), with the Gen AI Gateway at the forefront of its mission. The aim is to improve security, efficiency, and cost-effectiveness.

Read the full blog here.

LinkedIn launches LakeChime

This month, LinkedIn introduced LakeChime, a powerful data trigger service designed to enhance the efficiency of their extensive data lake. Handling billions of data points daily, LakeChime streamlines data processing by unifying data trigger semantics across both traditional and modern table formats like Hive and Iceberg.

Central to LakeChime is the Data Change Event (DCE), which captures updates within data tables and triggers downstream workflows via platforms like dbt or Airflow. This innovation ensures timely data availability and enhances pipeline efficiency.
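The event-driven pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LinkedIn's implementation: the `DataChangeEvent` fields and the `TriggerRegistry` class are hypothetical, and the handler stands in for whatever would start a downstream workflow.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class DataChangeEvent:
    # Hypothetical fields; the real DCE schema is not described here.
    table: str
    snapshot_id: int
    change_type: str  # e.g. "append" or "overwrite"


Handler = Callable[[DataChangeEvent], None]


class TriggerRegistry:
    """Maps table names to the workflows that should run when they change."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, table: str, handler: Handler) -> None:
        self._handlers[table].append(handler)

    def publish(self, event: DataChangeEvent) -> int:
        """Call every handler subscribed to the event's table; return how many ran."""
        handlers = self._handlers.get(event.table, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


registry = TriggerRegistry()
started: List[str] = []

# Stand-in for kicking off a downstream workflow run.
registry.subscribe("orders", lambda e: started.append(f"refresh_orders@{e.snapshot_id}"))

count = registry.publish(DataChangeEvent(table="orders", snapshot_id=42, change_type="append"))
```

The point of the design is that downstream jobs react to a change in the table itself rather than polling on a schedule, so the same subscription works regardless of which table format produced the event.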

Looking forward, LinkedIn plans to integrate LakeChime with dbt and Coral to automate incremental view maintenance, simplifying the creation of high-performance data pipelines.

Discover more about LakeChime in LinkedIn’s full blog post.

Comic Strip by Todd Comics, making a joke about data lakes that look well constructed and organised above the surface, but underneath the surface is an angry octopus with the Excel for a head, with the file name 'orders_final.xlsx'.

Spotlight on Slack’s female data engineers

Slack shared a blog last month highlighting the incredible work of their female data engineers across their various data teams.

Their work includes optimising data workflows with Apache Airflow and Apache Pinot to ensure sub-second query latency. Senior Software Engineer Jessica’s team is migrating from virtual machines to Kubernetes, using custom Python tools and automated deployments to boost efficiency.

Senior Software Engineer Ramya talks about leading the migration from Spark 2 to Spark 3 on AWS EMR6, explaining how it enhances performance and reduces reliance on legacy systems.

Shrushti, another Senior Software Engineer, transitioned Slack’s data ingestion from Secor to Bedrock and is now moving to Kafka Connect for real-time streaming, a shift that aligns with industry standards and improves system adaptability.

It’s a really interesting read and shines a light on Slack’s dedication to diversity and inclusion, as well as some of the incredible ways they’re using data. Read the full blog post to meet more inspiring engineers and discover the innovative projects shaping the future at Slack.

In conclusion…

As data continues to grow in volume and complexity, the strategies and technologies we employ must evolve. How will these innovations from Synq, Uber, LinkedIn, and Slack shape your business?

To stay ahead, organisations must keep pace with technological advancements through a culture of continuous learning and adaptation.

Want to discuss how we can help you or your data team? Get in touch, or check out our open roles.

Data & AI Report – April 2024

May 1, 2024

1510
Welcome to our first monthly update on data and AI. No need to scroll endlessly through news sites: we’ve compiled the month’s must-know developments right here!

April saw important developments in technology, highlighting investments and partnerships that emphasize the Netherlands’ involvement in the tech sector.

Google’s €640 Million Dutch Data Centre Project

Google announced a €640 million investment in a new data centre in Groningen, creating 125 jobs. This adds to Google’s total investment of over €3.8 billion in Dutch digital infrastructure since 2014. Read more

KLM Partners with Utrecht University AI Labs

KLM Royal Dutch Airlines is collaborating with Utrecht University’s AI Labs to refine operational efficiency and minimize disruptions.

PhD students are developing algorithms to optimize crew and aircraft scheduling, and improve ground processes like baggage handling and passenger boarding. This partnership aims to enhance KLM’s ability to quickly adapt to changes, ensuring smoother operations and prioritising flights effectively through data. Read more

Google Launches Training Programs for AI, Cybersecurity, and Data Analytics

The U.S. Treasury and Google Cloud are partnering to boost data analytics and cybersecurity hiring, aligning with President Biden’s AI Executive Order.

New training programs, accessible via YouTube and Google Cloud Skills Boost, include courses on generative AI, cybersecurity, and data analytics, and will equip individuals with the skills needed for digital transformation in the public sector.

Learners also get free access to generative AI tools, including Google’s interview prep tool, Interview Warmup. Read more

Gif showing Google Cloud's new Generative AI Interview Warmup tool.

Source: Google Cloud

AI Breakthrough in Breast Cancer Risk Assessment

Danish and Dutch researchers have advanced breast cancer risk assessment by combining an AI diagnostic tool with a mammographic texture model, under the leadership of Dr. Andreas D. Lauritzen.

This integrated approach improves the prediction of both short- and long-term breast cancer risks, identifying high-risk women more effectively. The innovation promises earlier cancer detection and could alleviate the strain on healthcare systems caused by a shortage of specialist breast radiologists. Read more

These developments underscore a growing need for expert Data, AI, and ML talent. Reach out to discuss how we can help to drive your innovation forward.

Contact our team.