Entrepreneurship

Data is everywhere - but can we trust it?

Data is vital to most business activities; it's among a company's most valuable resources, especially in the age of AI. Which is precisely why we must avoid falling into uncritical use of bits and bytes. The rule still holds: garbage in, garbage out. Low-quality datasets mislead even the best algorithms, potentially resulting in higher costs and poor outcomes.


In the age of big data and AI, data determine not only opportunities but also risks for businesses. © istock/MF3d

Summary

  • Data has become a central resource of the economy and is growing exponentially - driven by digitalisation, the Internet of Things and artificial intelligence.
  • The economic value of data depends critically on its quality - flawed datasets can lead to misjudgements and significant financial losses.
  • As data usage increases, so do risks and complexity - ranging from data protection and regulation to difficult-to-integrate systems.
  • Artificial intelligence amplifies existing challenges and introduces new ones - including hallucinations, biased training data and a growing disconnect from reality.

"The world's most valuable resource is no longer oil, but data." This is how The Economist declared the dawn of a new data age on 6 May 2017. Today, data has become even more central, with the volume of raw data well over 1000 % greater than when the article appeared. A 2017 study by the International Data Corporation (IDC) estimated a total of 16.1 zettabytes of stored data on internet-connected servers. For 2025, the study projected 175 zettabytes - a figure later revised upwards by Forbes to 181 zettabytes worldwide.

To put this into perspective: one zettabyte equals 10²¹ bytes - that is, a one followed by 21 zeros, or one sextillion bytes. That a single byte consists of 8 bits and originally encoded a single character as the smallest unit of computer storage hardly carries any weight at this scale. And these are only stored data. According to the International Data Corporation, roughly 2.5 quintillion bytes (a 2.5 followed by 18 zeros) circulate across the internet every day - orders of magnitude that are difficult to grasp.
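For readers who prefer the arithmetic spelled out, the scale comparison above can be reproduced in a few lines of Python. The figures are the IDC and Forbes estimates quoted in the text; the script simply converts between units:

```python
# Sanity-checking the storage figures quoted above.
zettabyte = 10**21  # bytes; a one followed by 21 zeros

stored_2017 = 16.1 * zettabyte     # IDC estimate for 2017
projected_2025 = 181 * zettabyte   # Forbes' revised worldwide figure

# Growth factor between the 2017 estimate and the 2025 projection
growth = projected_2025 / stored_2017
print(f"Roughly {growth:.1f}x more stored data than in 2017")

# Daily internet traffic: 2.5 quintillion bytes (2.5 followed by 18 zeros)
daily_traffic = 2.5 * 10**18
print(f"Daily traffic is {daily_traffic / zettabyte:.4f} zettabytes")
```

The exercise makes the gap vivid: an entire day of global internet traffic amounts to only a fraction of a single zettabyte, yet stored data is measured in the hundreds of them.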

Behind the data economy lie vast infrastructures - they enable growth, but also drive resource demand and complexity. © Shutterstock/cybrain

Not only is data growing rapidly (by over 20 % annually according to IDC), the business connected with data is also booming. Industry analyst Netguru forecasts the big data market to grow from just under USD 200 billion in 2024 to over USD 500 billion by 2032. Expectations of rapid growth and heavy investment in the AI industry - from chip manufacturers like Nvidia to AI tools like OpenAI's ChatGPT, Anthropic's Claude, Google's Gemini, and xAI's Grok - are fuelling equity markets worldwide.

At the same time, the data business itself is driving the creation of new data. Digital products like smartphones, cars, TVs, and computers constantly capture data on consumption behaviour, feeding it back as customer information and user profiles. Unlike fossil fuels, raw data faces no threat of scarcity, and the growth of AI will keep this trend accelerating. It is hardly surprising that companies like Apple and OpenAI have announced investments of USD 500 billion each in new data centres over the next four years.

Nevertheless, there are important questions to answer. For example, how will we meet the enormous energy demands of data centres, server farms, and AI applications? The MIT Technology Review estimates that by 2028, AI applications alone could consume as much electricity as 22 % of all US households. However these challenges are addressed, one thing is clear: in a fully digitised economy, nothing works without data. Those who use it intelligently and quickly increase their competitiveness. Data is not only a highly valuable resource, it's also a key corporate asset and driver of growth.

But the promise of limitless data growth also entails risks for businesses. Increasing dependence on data implies constraints on decision-making and a loss of control. So it's worth taking a look at some of the fundamental problems presented by the data world. Numerous data disasters, where the handling of data, and blind trust in it, led to costly consequences, show just how real the risks are and why a degree of caution is advisable from a business perspective.

Problem #1: Garbage in, Garbage out

The quality, completeness, and timeliness of data are not a given. If data is stored redundantly or duplicated, is incorrectly categorised, or is incomplete or outdated, the software and algorithms using it will run into problems. Poor data quality leads inevitably to poor outputs; hence the well-known phrase "garbage in, garbage out".
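The defects listed above - duplication, incompleteness, staleness - are exactly what routine data-quality checks are designed to catch before a dataset reaches an algorithm. The following is a minimal sketch of such a check; the record fields, dates, and thresholds are hypothetical, chosen only to illustrate the idea:

```python
from datetime import date

# Hypothetical customer records exhibiting the three defects named above.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2025, 6, 1)},
    {"id": 1, "email": "a@example.com", "updated": date(2025, 6, 1)},  # duplicate
    {"id": 2, "email": None,            "updated": date(2025, 5, 20)},  # incomplete
    {"id": 3, "email": "c@example.com", "updated": date(2019, 1, 2)},   # outdated
]

def quality_report(rows, today=date(2025, 7, 1), max_age_days=365):
    """Flag duplicated, incomplete, and stale records by id."""
    seen, duplicates, incomplete, stale = set(), [], [], []
    for row in rows:
        key = (row["id"], row["email"])
        if key in seen:
            duplicates.append(row["id"])
        seen.add(key)
        if any(value is None for value in row.values()):
            incomplete.append(row["id"])
        if (today - row["updated"]).days > max_age_days:
            stale.append(row["id"])
    return {"duplicates": duplicates, "incomplete": incomplete, "stale": stale}

print(quality_report(records))
# → {'duplicates': [1], 'incomplete': [2], 'stale': [3]}
```

Real pipelines use far more elaborate validation, but the principle is the same: defects that are trivial to detect at ingestion become expensive once an algorithm has been trained on them.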

One example is provided by Unity Technologies, which sells technologies and applications for game developers and players. In spring 2022, its "Audience Pinpointer" product, designed to help developers acquire new players and place personalised advertising, ingested a large volume of poor-quality data from a third-party provider. Because the machine-learning algorithms were trained incorrectly by the poor data, this led to faulty predictions about potential future players, and advertising that was incorrectly tailored to active users. The damage caused by the resulting lost revenue, reputational harm among partners and developers, and the cost of reprogramming the tool, amounted to around USD 110 million. Unity's share price fell by 37 % - all because a faulty dataset entered the system.

Data is only as valuable as its quality - and can quickly become a risk when errors occur.

Another example is Equifax, a US data analytics company that assesses consumer creditworthiness. When determining customers' credit scores, which form the basis for granting or denying loans such as for car or home purchases, and for calculating interest rates, Equifax made a coding error that went unnoticed and supplied incorrect data for three weeks in spring 2022. Around 300,000 people were affected by misjudgements of their creditworthiness. After the error became public, Equifax's share price fell by 5 %. Shortly thereafter, the company faced a class-action lawsuit from a consumer who had been denied a car loan, with significant consequences for the company's credibility.

Problem #2: Data privacy, protection, and international law

In the case of Equifax, the data error weighed all the more heavily as it followed an earlier incident. In 2017, sensitive personal information on nearly 150 million Equifax customers was exposed in a data breach. To settle the class-action lawsuit resulting from this data protection violation, Equifax had to set aside USD 700 million for those affected.

But when it comes to breaking international law governing the use of data, Equifax is not even the most prominent case. Due to its unlawful (under EU law) transfer of user data to the United States, Meta, the parent company of Facebook and Instagram, was fined USD 1.3 billion in Ireland. Amazon, TikTok, Didi (a Chinese Uber), and T-Mobile have also been fined hundreds of millions, or forced into out-of-court settlements. Inadequate data security and protection, as well as regulatory violations, can become costly and damaging to businesses through both reputational harm and financial penalties.

Problem #3: Complexity + incompatibility = data chaos

AI models recognise patterns in data – but flawed or biased training data can lead to misleading, "hallucinated" outputs. © Shutterstock/ARTEMENKO VALENTYN

Less spectacular, but still highly relevant, are problems that arise from the complexity and specialisation of data-processing systems. To illustrate: the seamless analysis and integration of data on customers, consumption behaviour, marketing, supply chains, production, and inventories into business processes is highly complex. It often requires data scientists, specialist programmers, and platform engineers to collect, measure, and continuously optimise and update data.

Data is no longer found in simple spreadsheets and databases. It moves through so-called "pipelines", or is stored in "data lakes", "clouds", or "warehouses". Goodbye Excel! Today, some tasks are outsourced to large industry providers, some are handled by in-house legacy systems, and others are performed using machine learning, large language models (LLMs), and other AI tools. Coordinating these data-processing systems is a mammoth task for management - and can rapidly become inefficient and costly.

Upgrading or implementing new software systems is rarely simple or easy. If new versions of CRM (customer relationship management) or ERP (enterprise resource planning) systems do not function as fully or quickly as needed, supply chains, inventory processes, and HR planning can be thrown off course. Failed implementation can cost multiples of the originally budgeted software upgrade. There are plenty of horror stories: a city planning project in Birmingham, UK, ultimately cost taxpayers GBP 90 million more than planned; an avocado supply system at Mission Produce, USA, collapsed due to data chaos, resulting in losses of USD 22.5 million; and German supermarket chain Lidl had to abandon a system overhaul after spending EUR 500 million on it because the new system valued inventory at purchase prices, while the old system used selling prices - an inconsistency that could not be reconciled.

Problem #4: From AI hallucinations to the curse of recursion

The training of LLMs is specifically aimed at high-level pattern recognition, and enabling generative conversation through pre-training, fine-tuning, and human feedback. It involves ingesting vast quantities of text and data, and combining this with autonomous learning and behaviourist conditioning to deliver human-like, ethical responses. In ideal cases, the process should eliminate individual data errors. Nevertheless, LLMs are not immune to mistakes, resulting in incorrect or fabricated outputs.

Examples are readily available. For instance, if human trainers intervene too strongly in shaping the ethical behaviour of chatbots, this can lead to absurd outputs. In 2024, Google's AI Gemini, when asked to generate images of historical events, produced strikingly inaccurate results. Presumably in an attempt to follow contemporary ethical principles regarding ethnic and gender equality, it depicted Wehrmacht soldiers in the Second World War as Asian women or Black people, US presidents as Native Americans, and the Pope as a woman. Human-imposed diversity, equity and inclusion principles guided pattern recognition and generation, but collided with historical and empirical realities.

Hallucinations were also at play when an AI generated a recommended summer reading list for a journalist at the Chicago Sun-Times, and invented a number of book titles. These fabricated "beach reads" were then published without verification. More worryingly, in 2023, when lawyer Steven Schwartz relied on a chatbot for legal research, it presented him with a series of non-existent precedents. He stated later that he had been unaware that ChatGPT could mislead him. After the hallucinated basis of his pleading was exposed, the court imposed a fine of USD 5,000. His reputation suffered a far greater loss.

According to Google, hallucinations can be caused by faulty or biased training data, or a lack of understanding of real-world information, physical properties, or factual knowledge. The loss of a reliable knowledge base poses a significant risk for data-driven business.

Problem #5: Epistemic loops, epistemic crisis

Beyond LLM hallucinations or overly human biases, researchers also warn of a problem inherent in AI itself: circularity. Roberto Simanowski, currently a Distinguished Fellow at the Free University of Berlin, describes in his recent book "Sprachmaschinen. Eine Philosophie der künstlichen Intelligenz" (C.H. Beck, 2025) a phenomenon known in various forms as the "curse of recursion", the Ouroboros effect (named after the ancient image of a snake devouring its own tail), or "text incest".

The underlying idea is as follows: since generative AI continuously and iteratively optimises its knowledge on the basis of existing information, which may itself have been generated by AI, rare phenomena, assessed as less probable, gradually fall out of the AI's representation of the world. In this recursive process, in which AI learns from content generated by AI, the system effectively begins to feed on itself. Less widely disseminated knowledge held by experts, historians, or niche specialists, as well as exceptions to general rules, disappear from the body of knowledge through recursive data cleansing.

Such a closed AI loop could lead to an impoverishment of available knowledge if we rely exclusively on AI-optimised data. We thus run the risk of losing parts of the real world without even being aware of it. This, in turn, increases the likelihood of being caught off-guard by what's known as a black swan: a statistically unlikely event with potentially far-reaching consequences.
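The dynamic can be illustrated with a toy simulation - not a formalisation of Simanowski's argument, just a sketch under illustrative assumptions. A "world model" holding a common and a rare category is repeatedly re-estimated from its own generated samples; over generations, the rare category's share tends to drift away:

```python
import random
from collections import Counter

random.seed(0)  # reproducible run of an inherently random process

# Illustrative starting point: 3 % of the "knowledge" is rare.
probs = {"common": 0.97, "rare": 0.03}

for generation in range(20):
    # Generate synthetic data from the current model ...
    sample = random.choices(list(probs), weights=list(probs.values()), k=200)
    # ... then re-train the model on nothing but its own output.
    counts = Counter(sample)
    probs = {k: counts.get(k, 0) / len(sample) for k in probs}

print(probs)  # the 'rare' share typically drifts; once it hits zero, it never returns
```

The key property is the absorbing state: as soon as a rare category fails to appear in one generation's sample, its probability becomes zero and no later generation can recover it - a crude analogue of niche knowledge disappearing from recursively trained systems.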

Reality cannot be fully captured in data - it must be continually reassessed. © istock/SolStock

Human judgement will remain essential

Data chaos, hallucinations, circularity, recursion, and black swans have one thing in common: these failures arise when we lose sight of the real world and become absorbed in the world of data alone. While scientific theories are tested against reality through empirical and natural-science methods, the need for real-world validation doesn't appear to be as well understood when it comes to our new and seemingly inexhaustible data resources. Control mechanisms fall short, or are lacking altogether, allowing faulty training data to find its way even into simple data-processing tasks. And datasets created for a specific purpose and processed with particular algorithms may simply be too biased and too random. They may reflect only partial aspects of reality, while losing sight of the real world.

Blind trust in data can obscure the view of reality.

How best then to deal with data and avoid drifting into the problem areas outlined above? Even if the much-heralded "artificial general intelligence" were one day to become a perfectly functioning reality, the use of data must be guided by rigorous scrutiny of data quality and relevance, and careful assessment of how the algorithms function. AI will only truly succeed if the modelling of the data world is continuously informed by up-to-date, real-world knowledge, and by sound human judgement.
