Every day, Google processes more than 3.5 billion searches. That is an enormous amount of data from search queries alone, and it contains a lot of valuable information about the people searching.
Back in 2008, Google launched a project meant to take advantage of this data. Called “Google Flu Trends”, it was designed to use search query data to forecast flu outbreaks. However, although the ambitions were high and the data was there, Google Flu Trends failed after a few years, in 2013.
Five years later, we see another attempt to forecast influenza epidemics, this time using social network data. In a pre-print paper posted on Tuesday on arXiv, researchers from Finland describe their method for predicting flu outbreaks using Artificial Intelligence and Instagram posts.
They report evidence for their hypothesis that Instagram posts show a statistically significant correlation with flu outbreaks. In the paper, they explain a method that relies on Artificial Intelligence to correlate the number of hashtag references in Instagram posts with the official incidence of flu as recorded by Finland’s National Institute for Health and Welfare.
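To make the idea of correlating hashtag counts with official incidence concrete, here is a minimal sketch that computes a Pearson correlation coefficient between two weekly series. The function name, the variable names, and all of the numbers are assumptions made up for illustration; they are not taken from the paper, and the researchers' actual statistical procedure may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT data from the paper.
weekly_hashtag_posts = [3, 5, 9, 14, 20, 17, 10, 6]
weekly_official_cases = [40, 65, 120, 210, 300, 260, 150, 80]

r = pearson(weekly_hashtag_posts, weekly_official_cases)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would mean that weeks with more flu-related posts also tend to be weeks with more recorded cases. A high correlation alone does not show that the posts can forecast an outbreak ahead of time.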
Big (Instagram) Data
They report that they collected more than 22,000 Instagram posts from 2012 to 2018. All of the data was public, gathered by searching for hashtags containing words such as “flu” and by analyzing the image content of posts, for example pictures showing boxes and bottles of flu drugs.
They used public health data on past influenza outbreaks as the reference for their predictions. Their method employs convolutional neural networks, such as Inception and ResNet, together with a gradient-boosted decision tree algorithm called XGBoost. In their article, they show that the method is able to predict flu outbreaks in the final year of data, using only data from previous years.
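The evaluation described above, predicting the final year from earlier years only, can be sketched as a split by year. The example below is a simplified illustration, not the researchers' pipeline: it swaps in a one-variable least-squares line as a stand-in for the CNN and XGBoost models, and every name and number in it is a made-up assumption.

```python
def split_by_year(rows, test_year):
    """Train on rows strictly before test_year, test on rows from test_year."""
    train = [r for r in rows if r["year"] < test_year]
    test = [r for r in rows if r["year"] == test_year]
    return train, test


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustrative records, NOT data from the paper.
rows = [
    {"year": 2016, "posts": 4, "cases": 60},
    {"year": 2016, "posts": 10, "cases": 150},
    {"year": 2016, "posts": 18, "cases": 270},
    {"year": 2017, "posts": 5, "cases": 80},
    {"year": 2017, "posts": 12, "cases": 170},
    {"year": 2017, "posts": 20, "cases": 310},
    {"year": 2018, "posts": 6, "cases": 85},
    {"year": 2018, "posts": 11, "cases": 160},
    {"year": 2018, "posts": 19, "cases": 290},
]

train, test = split_by_year(rows, test_year=2018)

# Fit on earlier years only, so the final year never leaks into training.
slope, intercept = fit_line([r["posts"] for r in train], [r["cases"] for r in train])

# Mean absolute error of the predictions on the held-out final year.
errors = [abs(slope * r["posts"] + intercept - r["cases"]) for r in test]
mae = sum(errors) / len(errors)
print(f"Mean absolute error on held-out year: {mae:.1f} cases")
```

Splitting by time rather than at random matters here: a random split would let the model see posts from the same flu season it is asked to predict, which would make the results look better than a real forecast could be.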
This suggests that social network data holds valuable information. However, we still have to be careful about how we extract this information and how much we rely on it when making decisions. There is also a privacy concern when dealing with public data, especially data from social networks.