If you’re not quite sure what ‘big data’ means, you’re not alone. In fact, the definition itself is something of a moving target: the phrase is used in many different contexts to emphasise different aspects of data science, and the criteria used to define it are continually being updated as technology advances.
“Very often, ‘big data’ is equated with ‘big brother’. It is also seen as something quite monolithic. But it is very multifaceted – in different domains, it means different things,” explains Christian Jensen, president of the steering committee for the Swiss National Science Foundation project NRP 75: Big Data.
But there are some common themes, most obviously volume: at the most basic level, big data refers to very large quantities of digital information – or data. And with modern technologies like fast internet, smartphones and satellites, we’re generating several billion gigabytes of data every single day.
Breaking down big data jargon
An algorithm is a set of steps or a procedure that a computer uses – often repeatedly – to solve a problem. Machine learning is a branch of computer science concerned with developing programs that use algorithms to “learn”, adapting how they operate as they encounter new data. Machine learning is considered one approach to artificial intelligence, the simulation of human intelligence by machines.
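To make the idea of a program that “learns” by adapting to new data a little more concrete, here is a minimal Python sketch. It assumes the simplest possible case: the program tries to discover a hidden straight-line rule (y = w·x + b) and nudges its two numbers a little every time a new example arrives. The technique shown is online stochastic gradient descent, one common approach among many; the function names, the learning rate and the sample data are illustrative assumptions, not taken from any particular system.

```python
def sgd_update(w, b, x, y, lr=0.01):
    """One learning step: nudge w and b to shrink the squared error on (x, y)."""
    error = (w * x + b) - y   # how far the current prediction is off
    w -= lr * 2 * error * x   # gradient of error**2 with respect to w
    b -= lr * 2 * error       # gradient of error**2 with respect to b
    return w, b


def learn_from_stream(stream, lr=0.01):
    """Start knowing nothing, then adapt as each new (x, y) example arrives."""
    w, b = 0.0, 0.0
    for x, y in stream:
        w, b = sgd_update(w, b, x, y, lr)
    return w, b


if __name__ == "__main__":
    # Hypothetical examples following the hidden rule y = 2x + 1, seen many times over.
    examples = [(x, 2 * x + 1) for x in range(5)] * 2000
    w, b = learn_from_stream(examples)
    print(f"learned rule: y = {w:.3f}x + {b:.3f}")
```

The program is never told the rule; it only sees examples one at a time, and after enough of them its estimates of w and b should settle close to 2 and 1.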
This explosion in digital data has been accompanied by a corresponding leap forward in computing power, allowing us to store and analyse data faster and at a larger scale than ever before. This combination of data and computing has created unprecedented opportunities to extract value from data, Jensen says.
“People talk about the digital universe as all the data available in electronic form, and this universe has grown exponentially in this decade – 90% of all data that is available was created in the past two years,” Jensen tells swissinfo.ch, citing a report by IBM.
Big data also tends to be gathered and analysed very rapidly, or in real-time. Often too large to be stored on a single computer, big data may be distributed over many computers and computing facilities all over the world. The term also covers a broad range of formats, including text, images, audio and video.
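To illustrate what it means for data to be spread over many computers, here is a minimal Python sketch in the spirit of the MapReduce model, a well-known approach to distributed processing (the article does not say which systems are used in practice). Each “machine” counts the words in its own share of the data, and the partial results are then merged into one answer. The function names and sample text are hypothetical, and the machines are only simulated on a single computer.

```python
from collections import Counter
from functools import reduce


def count_words(shard):
    """Map step: one machine counts the words in its own slice of the data."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(shards):
    """Reduce step: merge the partial counts into one overall result."""
    # Here the shards are processed one after another on a single computer;
    # in a real cluster each count_words call would run on a different machine.
    partial_counts = [count_words(shard) for shard in shards]
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical text, split into three shards as if stored on three machines.
    shards = [
        ["Big data is big"],
        ["Data everywhere"],
        ["Big ideas"],
    ]
    print(distributed_word_count(shards).most_common(2))  # [('big', 3), ('data', 2)]
```

The key design point is that no single machine ever needs to hold all the data: each one only reports a small summary, and the summaries are cheap to combine.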