Data & AI Report – Data Trends from November 2024

December 10, 2024

November’s data trends see a huge focus on AI developments & investments. 

We’ve delved into some of the biggest movements below – check them out 👇 

Big Tech’s Big Bets on AI 

In 2024, the world’s leading tech giants—including Microsoft, Amazon, Alphabet, and Meta—are significantly increasing their investments in AI, with their combined AI spending projected to exceed $240 billion this year. The demand for AI-powered tools continues to skyrocket, driving this unprecedented level of investment. 

With forecasts suggesting that AI could contribute an additional $20 trillion to the global economy by 2030, it’s no surprise that the largest corporations are heavily investing in this space. Beyond the economic potential, such investments create additional revenue opportunities. For example, Microsoft’s AI products are expected to generate $10 billion annually. 

For Big Tech, AI transcends being just a trendy concept; it’s a strategic, long-term commitment poised to transform entire industries.  

Samsung’s Second-Generation AI Model: Gauss2 

In November, Samsung held its virtual Samsung Developer Conference Korea 2024, where it unveiled its latest software innovations and future-focused vision. 

A highlight of the conference was the introduction of Samsung Gauss2, the second-generation AI model that promises improved performance, efficiency, and broader applications. Gauss2 is a multimodal AI system capable of processing language, code, and images, and is available in three versions tailored for different use cases: 

  • Compact: Optimized for on-device use in environments with limited computing resources, maximizing device performance. 
  • Balanced: Offers a blend of performance, speed, and efficiency suitable for various tasks. 
  • Supreme: Provides top-tier performance with Mixture of Experts technology, which reduces computational costs during training and inference while maintaining high efficiency. 
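To make the Mixture of Experts idea more concrete, here is a minimal sketch of top-k routing in plain Python. It is not Samsung's implementation: the toy experts, the gate scores, and the `route` helper are all hypothetical and exist only for illustration. The point it shows is that only the selected experts run for a given input, which is where the computational savings come from.

```python
import math

def softmax(scores):
    """Convert raw gate scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(x, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts and blend their outputs.

    experts: list of callables (hypothetical stand-ins for sub-networks).
    gate_scores: one raw score per expert, as a gating network might emit.
    Returns (blended output, indices of the experts that actually ran).
    """
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy "experts"; a real model would use neural sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
output, used = route(3.0, experts, gate_scores=[0.1, 2.0, -1.0, 2.0], k=2)
```

In this toy run only two of the four experts are evaluated, so the work per input is roughly halved while the model as a whole keeps all four.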

Samsung aims to boost productivity through Gauss2 by enhancing its code.i service, enriching the natural language Q&A capabilities of the Samsung Gauss Portal, and supporting multimodal functions such as table and chart analysis and image creation. 

Read more from Samsung here. 

LinkedIn & The Journey to Their GenAI Tech Stack 

LinkedIn has published a detailed blog post that explores the evolution of its reimagined product portfolio and how Generative AI has been integrated into its features. They identify several key focus areas and insights gained from developing their GenAI capabilities: 

  • Prompt Management: Efficiently managing prompts at scale requires systems for templating, versioning, and structuring to support complex applications 
  • Task Automation via Skills: GenAI-driven task automation can unlock significant value but demands advanced tools to scale effectively 
  • Contextual Awareness & Personalisation: Memory plays a crucial role in personalising GenAI experiences and must be thoughtfully integrated into the tech stack 
  • Model Inference & Fine-Tuning: Balancing quality, cost, and latency requires flexible infrastructure that accommodates various models and use cases 
  • Migration to the New Stack: Adopting new technologies through incremental migration and cross-training is essential for stability 
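As an illustration of the prompt management point, the sketch below shows one way templating and versioning could fit together. It is an assumption-laden toy, not LinkedIn's system: the `PromptRegistry` class, the prompt name, and the template text are all hypothetical.

```python
from string import Template

class PromptRegistry:
    """Hypothetical in-memory store of versioned prompt templates."""

    def __init__(self):
        self._templates = {}  # (name, version) -> Template

    def register(self, name, version, text):
        key = (name, version)
        if key in self._templates:
            raise ValueError(f"{name} v{version} is already registered")
        self._templates[key] = Template(text)

    def latest_version(self, name):
        versions = [v for (n, v) in self._templates if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)

    def render(self, name, version=None, **fields):
        if version is None:
            version = self.latest_version(name)
        # substitute() raises KeyError if a placeholder is left unfilled
        return self._templates[(name, version)].substitute(**fields)

registry = PromptRegistry()
registry.register("summarise_profile", 1, "Summarise this profile: $profile")
registry.register("summarise_profile", 2, "Summarise this profile in a $tone tone: $profile")

# Callers can pin an old version or pick up the newest one by default.
pinned = registry.render("summarise_profile", version=1, profile="Data engineer, 5 years")
current = registry.render("summarise_profile", profile="Data engineer, 5 years", tone="friendly")
```

Pinning a version lets an existing feature keep its tested prompt while a newer version is rolled out elsewhere.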

Since the introduction of their GenAI tools in early 2023, LinkedIn has continued to refine these capabilities, moving towards their vision of a robust GenAI tech stack that balances rapid development with long-term scalability. 

Read the full blog from LinkedIn here. 

Google’s Accidental Unveiling of Project Jarvis 

Google appears to be developing an advanced AI assistant that goes beyond traditional chatbots and voice assistants. Known as Project Jarvis, this initiative aims to perform tasks autonomously, rather than waiting for user commands. It can manage tasks such as handling emails, conducting research, and scheduling appointments, setting itself apart by proactively collaborating with users as a digital partner. 

While the technology holds exciting potential, it also raises important questions. On the skeptical side, concerns include the potential for job automation, which could impact routine roles, and security risks, given that Jarvis would have access to users’ sensitive data. However, the technology could also provide significant benefits, enhancing accessibility for people with disabilities and those with busy, on-the-go schedules. 

We’re excited to see where this innovative project leads! 

Conclusion 

As we move through 2024, one thing is clear: AI has firmly cemented itself as the cornerstone of innovation and economic growth. From groundbreaking AI models like Samsung Gauss2 to LinkedIn’s evolution of its product tech stack, and even Google’s latest AI assistant, the landscape is rapidly evolving, and the world’s leading tech companies are investing billions to aid in further development. 

Check back next month to see how we round up the data trends for the end of 2024. 

Data & AI Report – Data Science Trends September 2024

October 9, 2024

1510

September's data science trends featured moves to advanced search engines, internal tooling developments & protection against potential AI security threats.

We’ve covered some exciting news from some big names.

Read on for more on all these exciting developments!

Vinted migrate to Vespa – how the online second-hand shopping phenomenon is keeping up with the growth & complexity of data

After hitting the limits of their previous search engine, Vinted were on a mission to find a more scalable alternative.

Introducing Vespa: an open-source search engine & vector database. Vespa supports vector search, keyword search, and search within structured data, all in one query. It also integrates machine learning, which enables real-time AI insights from the data. It has been shown to handle thousands of queries per second, making it a front-runner for managing large & complex data.
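To show what "all in one query" can mean in practice, here is a toy sketch that combines a structured filter, keyword matching, and vector similarity in a single ranking pass. It does not use Vespa's API or its ranking functions: the `hybrid_search` helper, the scoring formula, and the sample listings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(items, keywords, query_vec, max_price):
    """Filter on a structured field, then rank by keyword plus vector score."""
    results = []
    for item in items:
        if item["price"] > max_price:  # structured-data filter
            continue
        words = set(item["title"].lower().split())
        keyword_score = len(words & keywords) / max(len(keywords), 1)
        vector_score = cosine(item["embedding"], query_vec)
        results.append((keyword_score + vector_score, item["title"]))
    return [title for _, title in sorted(results, reverse=True)]

# Made-up listings with two-dimensional embeddings for readability.
items = [
    {"title": "Red wool coat", "price": 40, "embedding": [0.9, 0.1]},
    {"title": "Blue denim jacket", "price": 25, "embedding": [0.2, 0.8]},
    {"title": "Red leather jacket", "price": 120, "embedding": [0.8, 0.3]},
]
ranked = hybrid_search(items, keywords={"red", "jacket"}, query_vec=[1.0, 0.0], max_price=50)
```

Here the leather jacket is excluded by the price filter, and the remaining two listings are ordered by their combined keyword and vector scores.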

Already used by others including Spotify & Yahoo and with continuous application improvements being delivered, will we continue to see increased use of Vespa?

Read more about Vinted’s migration here.

QueryGPT – allowing easier and faster data analysis for Uber

Uber’s data platform handles a huge 1.2 million interactive queries each month. The idea of QueryGPT is to better manage real-time data analytics & to query massive datasets. The setup combines Presto (an open-source SQL query engine) with Apache Hudi, which can handle upserts and manage large volumes of data in a cloud-based or distributed environment.

The system is part of Uber’s broader efforts to handle large-scale, real-time data streaming and querying, which is integral to its data-driven approach to decision-making. It’s said to cut the time needed to produce a reliable query from 10 minutes down to 3, which is a massive productivity gain for Uber.

Using advanced AI, QueryGPT fits smoothly into Uber’s data system, cutting query time and increasing accuracy to handle their complex data needs.

Read more about the advancements here.

Dropbox & Lakera Guard securing LLMs

In a blog posted this September, Dropbox delved into how they’re using Lakera Guard to protect their LLMs from potential security threats posed by AI.

Citing the importance of maintaining the trust of their millions of users to protect their content, Dropbox talk about how they chose Lakera Guard last year to protect user data & uphold the reliability and trustworthiness of their intelligent features, as outlined in their AI principles.

What were Dropbox looking for in their quest for protection? They concluded that the solution had to be deployable on their existing infrastructure, with low latency, strong confidence scores and scope for continuous improvement.

Dropbox have since invested in Lakera Guard, a sign of their strong belief in its abilities. They’ve also collaborated with Lakera’s team to improve the software itself. Working closely with Lakera, Dropbox have helped shape the product to meet their requirements whilst achieving their own security goals, too!

Read the full blog here.

Conclusion

In today’s fast-paced digital landscape, companies like Vinted, Uber and Dropbox are navigating many complexities. Vinted’s switch to Vespa demonstrates the importance of scalable search engines as companies grow. Uber’s QueryGPT highlights the need for faster and more accurate data analytics. Meanwhile, Dropbox’s partnership with Lakera Guard emphasises the need to secure AI systems to ensure data remains protected as AI technologies advance.

Data Jobs in the Netherlands

Interested in a new adventure within the data world?
For information on data jobs in the Netherlands, get in touch.

Data & AI Report – Data Science Trends July 2024

August 5, 2024

Trends in data science have brought a fresh wave of excitement to the data and analytics landscape this July. We’re seeing major moves towards scalability, efficient governance, and AI capabilities. Additionally, Dr. Randy Olson shows us just how far creative data use can take you—literally! Turns out, data science isn’t just about numbers, it can plan one epic road trip too! 

Firstly, Discord’s transition to Dagster and dbt for data orchestration 

This month, Discord announced a major overhaul of their data orchestration infrastructure, moving from their in-house system, Derived, to a combination of Dagster and dbt. As their platform and user base expanded, the need for enhanced self-service capabilities and robust observability became evident. This decision was driven by the necessity for declarative automation, a modern unified interface, reliability on Kubernetes, and seamless integration with existing tools. 

After evaluating open-source options like Argo and Prefect, Discord chose Dagster for orchestration and dbt for data modeling. This transition has enabled them to support over 2,000 dbt tables, enhancing their ability to deliver seamless service and insightful data analytics while scaling efficiently. 

Read about it here

Meta unveils Llama 3.1 

This month, Meta introduced Llama 3.1, a massive leap in open-source AI. The Llama 3.1 405B model brings unmatched flexibility and state-of-the-art capabilities, unlocking new workflows like synthetic data generation and model distillation. Additionally, Meta is enhancing the Llama ecosystem with new security tools and a reference system. Over 25 partners, including AWS and Google Cloud, will offer services from day one. 

 

Llama 3.1 models feature context lengths expanded to 128K tokens, multilingual support, and strong performance across benchmarks. Upgraded 8B and 70B models enhance capabilities in general knowledge, tool use, and translation. 

Read Meta’s full update

Building a data-driven analytics team at DoorDash 

Jessica Lachs, DoorDash’s VP of Analytics & Data Science, shares insights on what it means to be truly data-driven and how to structure an analytics team. Having joined DoorDash as the first General Manager in 2014, Lachs has built the analytics team from the ground up and now leads global analytics, including the Wolt Analytics team post-acquisition. 

Lachs highlights that the term “analytics” can be ambiguous, encompassing data science, business intelligence, product analytics, machine learning, and BizOps. She also emphasizes that, to build a data-driven organisation, founders should focus on desired outcomes rather than semantics. At DoorDash, the role of analytics has evolved with the company’s growth, shifting from gut instinct decisions to data-centric strategies. Initially, DoorDash used quasi-experimental methods due to limited data, but as the company matured, they invested in scalable data models and advanced experimentation capabilities, expanding their analytics scope to drive better decision-making. 

Read the full post here

Databricks’ migration to Unity Catalog for data governance 

In a recent blog post, the Data Platform team at Databricks shared insights into their migration to Unity Catalog for enhanced data governance. As the company grows, establishing secure, compliant, and cost-effective data operations has become a priority. With thousands of employees analysing data, consistent governance standards are essential, making the migration to Unity Catalog a top priority. 

The blog outlines the challenges and benefits of migrating from the default Hive Metastore (HMS) to Unity Catalog (UC). While HMS lacked fine-grained access controls, lineage support, audit logs, and effective search integration, UC provided these features out of the box. Therefore, the team chose a transformational approach, selectively migrating datasets to establish a structured governance framework. This strategy required more effort initially, but enabled clear data ownership, naming conventions, and intentional access, setting the stage for future governance policies.

Read the blog

Finally, some creative data use!

Dr. Randy Olson, a full stack data scientist and AI researcher, utilised his expertise in machine learning to plan an optimal road trip across the U.S.  

He framed the task as the Traveling Salesman Problem (TSP), which seeks the shortest route that visits each stop exactly once and returns to the starting point.  

Dr. Olson applied three specific restrictions:  

  1. The trip must stop in all 48 contiguous U.S. states 
  2. Only visit National Natural Landmarks, National Historic Sites, National Parks, or National Monuments 
  3. Be taken entirely by car without leaving the U.S. 
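For readers unfamiliar with the Traveling Salesman Problem, the sketch below solves a tiny instance by brute force. It is not Dr. Olson's method: the stop names and grid coordinates are made up, and trying every ordering is only feasible for a handful of stops, so a route with dozens of stops would need a heuristic or approximate solver instead.

```python
import math
from itertools import permutations

def tour_length(order, coords):
    """Total length of a closed tour that returns to its starting stop."""
    return sum(
        math.dist(coords[order[i]], coords[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def shortest_tour(coords):
    """Exhaustively try every ordering; only practical for a few stops."""
    start, *rest = list(coords)
    best = min(
        ((start, *perm) for perm in permutations(rest)),
        key=lambda order: tour_length(order, coords),
    )
    return best, tour_length(best, coords)

# Hypothetical stops on a flat grid (not real landmark coordinates).
stops = {"A": (0, 0), "B": (0, 3), "C": (4, 3), "D": (4, 0)}
route, length = shortest_tour(stops)
```

With four stops there are only six orderings to check, but the count grows factorially with each added stop, which is why exact search quickly becomes impractical.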

Want to take the trip? The route spans 13,699 miles and requires 224 hours (or 9.33 days) of driving, assuming no traffic. You can find the full itinerary here. 
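The driving figures can be sanity-checked with a few lines of arithmetic. The average speed below is derived from the two quoted numbers and is not stated in the original itinerary.

```python
miles = 13_699
driving_hours = 224

days = driving_hours / 24            # continuous driving, no stops
average_mph = miles / driving_hours  # implied average speed
print(f"{days:.2f} days at about {average_mph:.1f} mph on average")
# prints "9.33 days at about 61.2 mph on average"
```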

Dr. Randy Olson used data to design the optimum road trip across the U.S., showing how useful data can be. Data science trends really are everywhere!

Olson’s epic road trip

To conclude 

July highlighted several key trends in data and analytics. The push for scalability is evident in Discord’s adoption of Dagster and dbt, and Databricks’ migration to Unity Catalog for better data governance. The importance of building effective data teams was underscored by DoorDash’s approach to analytics leadership. Another notable trend is the growing emphasis on enhanced self-service capabilities and robust observability in data platforms. These themes point towards a future focused on scalable infrastructure, efficient governance, structured teams, and innovative AI applications.

If you’re interested in how we can help scale your data team, get in touch.

Data & AI Report – April 2024

May 1, 2024

Welcome to our first monthly update on data and AI. No need to scroll endlessly through news sites: we’ve compiled the month’s must-know developments right here!

April saw important developments in technology, highlighting investments and partnerships that emphasize the Netherlands’ involvement in the tech sector.

Google’s €640 Million Dutch Data Centre Project

Google announced a €640 million investment in a new data centre in Groningen, creating 125 jobs. This adds to Google’s total investment of over €3.8 billion in Dutch digital infrastructure since 2014. Read more

KLM Partners with Utrecht University AI Labs

KLM Royal Dutch Airlines is collaborating with Utrecht University’s AI Labs to refine operational efficiency and minimize disruptions.

PhD students are developing algorithms to optimize crew and aircraft scheduling, and improve ground processes like baggage handling and passenger boarding. This partnership aims to enhance KLM’s ability to quickly adapt to changes, ensuring smoother operations and prioritising flights effectively through data. Read more

Google Launches Training Programs for AI, Cybersecurity, and Data Analytics

The U.S. Treasury and Google Cloud are partnering to boost data analytics and cybersecurity hiring, aligning with President Biden’s AI Executive Order.

New training programs, accessible via YouTube and Google Cloud Skills Boost, include courses on generative AI, cybersecurity, and data analytics, and will equip individuals with the skills needed for digital transformation in the public sector.

Learners also get free access to generative AI tools, including Google’s interview prep tool, Interview Warmup. Read more

Gif showing Google Cloud's new Generative AI Interview Warmup tool.

Source: Google Cloud

AI Breakthrough in Breast Cancer Risk Assessment

Danish and Dutch researchers have advanced breast cancer risk assessment by combining an AI diagnostic tool with a mammographic texture model, under the leadership of Dr. Andreas D. Lauritzen.

This integrated approach improves the prediction of both short- and long-term breast cancer risks, identifying high-risk women more effectively. The innovation promises earlier cancer detection and could alleviate the strain on healthcare systems caused by a shortage of specialist breast radiologists. Read more

These developments underscore a growing need for expert Data, AI, and ML talent. Reach out to discuss how we can help to drive your innovation forward.

Contact our team.