Data is the revolutionary tool of our age, but it is not the objective measure many think: it is inextricably tied up in bias, prejudice and subjectivity

We live in a data-fixated age (the "age of big data", as some have called it), one in which the consensus seems to be that data can solve all of our problems. It is true we have access to an unprecedented scale and depth of information; this is in fact one of the key challenges we face.

But more on that later. Let's start by considering the very nature of the data at our disposal. We collect it through scientific means, using experiments assumed to yield objective findings. The assumption is that, if you design the experiment correctly, ask the right questions, or measure the right behaviour, you will gain insights into human behaviour: how people react to a product or even a medicine, how they behave in certain situations, or what they value or believe.

This, however, is intellectually lazy, because bias can be extremely difficult to circumvent when collecting, measuring or reading the data from experiments. You'll be familiar with the placebo effect in medicine; in business terms, we might consider the Hawthorne effect, which highlights the difficulty of conducting large-scale workplace experiments: people who know an experiment is taking place change their behaviour.

But this effect was named after a series of experiments conducted in the 1930s; surely, you might say, with the scope of big data, we needn't worry about such clouding of the dataset. Bias, however, applies even with an infinite quantity of data. Data points are not inert, or exogenous: they are not chosen by nature, but always by the analyst.
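The point can be sketched in a few lines of Python. This is a toy simulation with entirely hypothetical numbers: the "population" and the analyst's collection rule are made up, but they show how a biased collection rule stays biased no matter how much data is gathered.

```python
import random

random.seed(0)

# Hypothetical population: a preference score for some product,
# centred on zero (half the population likes it, half does not).
population = [random.gauss(0, 1) for _ in range(1_000_000)]

# True average preference: very close to zero.
true_mean = sum(population) / len(population)

# The analyst, however, only records data from people who engaged
# with the product at all, say those with a positive preference.
# The collection rule, not nature, chooses the data points.
collected = [x for x in population if x > 0]
biased_mean = sum(collected) / len(collected)

print(round(true_mean, 2))    # close to 0.0
print(round(biased_mean, 2))  # around 0.8, still biased despite huge n
```

Doubling or centupling the sample changes nothing here: the gap between the two estimates comes from how the points were chosen, not from how many were collected.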

People then assume the problem is not with the experiment itself, but with the sample: if only they could get the perfect sample, the experiment would yield reliable results. It is not so simple, though. The bias can equally well exist in the methodology itself, in methods that fail to take into account that people have expectations and emotions.

To give an example of how results can be affected, consider a well-known survey experiment from Princeton. When Americans were asked if they felt the government spent too little on "welfare", only 25 per cent said they did. Swap welfare for "assistance to the poor", however, and the figure shot up to 65 per cent.
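A back-of-the-envelope check makes clear that no sampling quirk can explain a gap that size. Assuming, purely for illustration, a poll of around 1,000 respondents per wording, the standard error of each proportion is tiny compared with the 40-point difference:

```python
import math

# Assumed poll size per question wording (a hypothetical round number).
n = 1000

p_welfare = 0.25   # "welfare" wording
p_assist = 0.65    # "assistance to the poor" wording

# Standard error of the difference between two proportions:
# sqrt(p1*(1-p1)/n + p2*(1-p2)/n)
se = math.sqrt(p_welfare * (1 - p_welfare) / n +
               p_assist * (1 - p_assist) / n)
gap = p_assist - p_welfare

# The 40-point gap is roughly twenty standard errors wide: no
# plausible sampling fluke explains it, so the wording itself must.
print(round(gap / se, 1))
```

That is why chasing a "perfect sample" misses the point: the methodology, here the framing of the question, is doing the work.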

This is not to say it is impossible to gain truly objective insights, only that understanding why people act as they do requires rigorous analysis of the entire pattern of behaviour.

Humans have intuition: robots don't

For managers and decision makers in organisations of any size, it is deeply problematic to set out one's stall on decisions made purely on the basis of so-called objective data. What's the solution, then? Well, actually it's quite simple: alongside the data, or sometimes even in lieu of it, the best bet can be to place faith in the intuition gained from your experience (or from someone who has it, if you do not).

Steve Jobs, as problematic a figure as he might have been, famously placed little faith in market research when it came to product innovation or development. This might be read as an awareness of the fallibility of data, which could only be coloured by bias and limited by the imaginative scope of the potential customer base. The latter could scarcely be expected to know they wanted something they were as yet unaware existed.

The same challenge also applies to AI and machine learning. Machine learning is limited to past data, so what AI can do without humans is, by definition, limited. It is only from humans that new ways of thinking, which can give rise to truly original ideas, can arise.
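A toy sketch of that limitation, with made-up numbers: a model fitted by least squares on past observations can only restate the pattern it has seen, so when the world changes, it confidently extrapolates the old regime.

```python
# Past data: a clean linear relationship, y = 2x.
xs = [0, 1, 2, 3, 4]
ys = [0, 2, 4, 6, 8]

# Ordinary least-squares fit of a straight line.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Suppose the world changes: beyond x = 4 the true value saturates at 8.
predicted = slope * 10 + intercept   # the model extrapolates the old line
actual = 8                           # the new regime

print(predicted, actual)
```

The model has no way of knowing the regime changed; only a human observing the world, rather than the historical dataset, can supply that insight.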

Artificial intelligence might then be viewed as a potential impediment to true innovation. It also has huge ramifications for employment at all levels, a problem that is often acknowledged but, as yet, rarely addressed. Which leads to the second half of our case…

Data as the abuse of power

Leaving aside for one moment the difficulty of gaining access to completely objective and reliable data, let's consider another related though somewhat different phenomenon, a key danger associated with the forensic levels of data gathering taking place in the modern world: privacy.

Perhaps you'll already be familiar with the story of the American retailer Target predicting a teenager's pregnancy, from her shopping data, before her father knew of it. This story broke in 2012, the relative dark ages in terms of data science.

Today, we see companies promising to trace your ancestry or assess your susceptibility to certain medical conditions using nothing more than a tiny sample of your DNA. Amazing, no doubt, but who else might be interested in such data? What about insurance companies, for whom such deep genetic information would be of no small interest?

That's something into which one must quite proactively opt, but what about the gradual spread of the Internet of Things? Devices wired to respond to your needs in real time, and to build up a repository of data on your behaviour to feed back to the manufacturers. Often, said manufacturers are somewhat coy about what exactly is being done with this information.

Without wanting to get carried away: when you have devices that can monitor your vital signs, collecting data that can give indications of your mental health, it's important to ensure that information doesn't fall into unscrupulous hands. Think of the scope of data held by the likes of Google or Facebook (we hardly need reminding), which now includes your actual face.

And let's not discount the companies themselves. Perhaps GDPR will ameliorate the situation, but tech companies are run by intelligent people: would it be a huge surprise if we saw some clever circumvention of the new regulations? It's not just business, either: certain governments and public bodies also hold what might be considered a worrying amount of information. China's social credit system and the unregulated, error-prone facial recognition databases used by US law enforcement (in which 50 per cent of adults unwittingly feature) are two such examples.

It would be wrong to say we should shy away from data; when collected and analysed scientifically, it can yield insights and connections that would previously have been impossible to glean. But it is equally wrong to get carried away and fail to take a considered approach to the collection and usage of data: there are ramifications for the economy, for privacy, and even potentially for democracy. We also need to think carefully about the way we collect, measure, and read data, because much as we assume data is immutable, when it comes to measuring human behaviour in particular, it is intrinsically subject to bias, prejudice and subjectivity.

About Gilles Chemla

Professor of Finance, Co-Director of the Centre for Financial Technology
Gilles is a Professor of Finance at Imperial Business School, a research fellow at Centre National de la Recherche Scientifique, a research fellow at Centre for Economic Policy Research (CEPR), a programme director at CEPREMAP (a French equivalent to CEPR), and a member of the American Finance Association, American Economic Association, Western Finance Association, and European Finance Association. He has also worked in corporate finance at BNP Paribas, as an independent consultant for a variety of corporate, financial, and governmental institutions and professional and international organisations, and as an Assistant Professor of Finance at the Sauder School of Business, University of British Columbia. Gilles holds a PhD in economics from the London School of Economics, an MSc in economics from the Paris School of Economics, a degree in mathematics from the University Paris-Diderot, and is a graduate engineer from the Ecole Nationale des Ponts et Chaussées.
