Facebook AI Research (FAIR) has published an article explaining the AI behind Instagram’s Explore recommender system.
In the article, they describe the challenges and the three major needs that Facebook engineers had to address when building a large-scale recommender system such as Explore: the ability to experiment rapidly at scale, a stronger signal on people’s interests, and an efficient way to provide fresh recommendations.
With these needs in mind, they developed a foundational tool for each one. The first is a domain-specific meta-language that provides a level of abstraction over the retrieval of candidates for recommendations. The language is called IGQL and is implemented in C++.
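To illustrate what an abstraction over candidate retrieval can look like, here is a minimal Python sketch of a chainable query. It is not IGQL syntax and not FAIR's implementation: the `CandidateQuery` class, its method names, and the toy data are all hypothetical and exist only to show the idea of composing retrieval steps declaratively.

```python
from typing import Callable, Iterable, List


class CandidateQuery:
    """Hypothetical chainable query; each method appends one retrieval step."""

    def __init__(self, seeds: Iterable[str]) -> None:
        self._seeds = list(seeds)
        self._steps: List[Callable[[list], list]] = []

    def expand(self, fn: Callable[[str], Iterable[str]]) -> "CandidateQuery":
        # Replace every item with the items that fn derives from it.
        self._steps.append(lambda items: [out for item in items for out in fn(item)])
        return self

    def where(self, predicate: Callable[[str], bool]) -> "CandidateQuery":
        # Keep only the items that satisfy the predicate.
        self._steps.append(lambda items: [item for item in items if predicate(item)])
        return self

    def limit(self, n: int) -> "CandidateQuery":
        # Keep at most n items, in their current order.
        self._steps.append(lambda items: items[:n])
        return self

    def run(self) -> list:
        items = list(self._seeds)
        for step in self._steps:
            items = step(items)
        return items


# Toy data (made up): posts per account and a small blocklist.
POSTS_BY_ACCOUNT = {
    "ceramics_a": ["vase", "bowl", "mug"],
    "surfing_a": ["wave", "board"],
}
BLOCKED = {"bowl"}

media = (
    CandidateQuery(["ceramics_a", "surfing_a"])
    .expand(lambda account: POSTS_BY_ACCOUNT.get(account, []))
    .where(lambda post: post not in BLOCKED)
    .limit(3)
    .run()
)
print(media)  # ['vase', 'mug', 'wave']
```

Because every step is just an entry in a list, trying a new candidate source in this sketch means adding or swapping one line of the query rather than rewriting the pipeline, which is the kind of rapid experimentation the article describes as a goal.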
The second tool is account embeddings for personalized ranking. Put simply, the retrieval pipeline was designed and built to focus on account-level rather than media-level information.
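As a rough sketch of how account-level embeddings can drive retrieval, the Python snippet below finds the accounts closest to a given account by cosine similarity. The three-dimensional vectors and account names are made up for illustration, and cosine similarity is an assumed choice of distance; the summary above does not say how the embeddings are trained or compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_accounts(seed, embeddings, k=2):
    """Return the k accounts whose embeddings are closest to the seed's."""
    seed_vec = embeddings[seed]
    scored = [
        (cosine_similarity(seed_vec, vec), name)
        for name, vec in embeddings.items()
        if name != seed
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical embeddings: accounts on similar topics get nearby vectors.
ACCOUNT_EMBEDDINGS = {
    "ceramics_a": [0.9, 0.1, 0.0],
    "ceramics_b": [0.8, 0.2, 0.1],
    "surfing_a": [0.0, 0.9, 0.3],
    "surfing_b": [0.1, 0.8, 0.4],
}

print(similar_accounts("ceramics_a", ACCOUNT_EMBEDDINGS, k=1))  # ['ceramics_b']
```

Working at the account level keeps the lookup table small relative to the number of individual posts, and new media from a similar account becomes a candidate without needing its own embedding first.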
Third, they introduced a ranking distillation model that pre-selects candidates before the more complex (and computationally expensive) ranking models are applied. Distillation models are typically lightweight models trained to mimic the behavior of more complicated ones.
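The idea of distillation can be sketched in a few lines of Python: record the scores of an expensive "teacher" model, fit a cheap "student" to reproduce them, and use the student to build a shortlist. Everything here is an assumption made for illustration: the teacher formula, the two input features, the linear student, and the mean-squared-error objective are stand-ins, not the models or loss used for Explore.

```python
import math
import random


def teacher_score(features):
    """Stand-in for an expensive ranking model (hypothetical formula)."""
    x0, x1 = features
    return 1.0 / (1.0 + math.exp(-(3.0 * x0 - 2.0 * x1)))


def student_score(weights, features):
    """Cheap linear student: w0 * x0 + w1 * x1 + bias."""
    w0, w1, bias = weights
    return w0 * features[0] + w1 * features[1] + bias


def mse(weights, data, targets):
    """Mean squared error between student scores and recorded teacher scores."""
    return sum((student_score(weights, x) - y) ** 2
               for x, y in zip(data, targets)) / len(data)


def distill(data, targets, steps=200, lr=0.1):
    """Fit the student to the teacher's scores with full-batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in zip(data, targets):
            err = student_score(weights, x) - y
            grad[0] += 2 * err * x[0] / n
            grad[1] += 2 * err * x[1] / n
            grad[2] += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = [(rng.random(), rng.random()) for _ in range(200)]
recorded = [teacher_score(c) for c in candidates]  # logged teacher outputs

weights = distill(candidates, recorded)
loss_before = mse([0.0, 0.0, 0.0], candidates, recorded)
loss_after = mse(weights, candidates, recorded)

# The cheap student pre-selects a shortlist for the expensive model to re-rank.
shortlist = sorted(candidates,
                   key=lambda c: student_score(weights, c),
                   reverse=True)[:20]
print(loss_before > loss_after, len(shortlist))
```

The trade-off is that the student only approximates the teacher, so the shortlist has to be larger than the final result set to leave room for the expensive model to correct its mistakes.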
Using these tools and techniques, researchers designed the system to work in two main stages: a candidate generation stage and a ranking stage. In the first stage, a large number of candidate accounts are retrieved based on the user's interactions. Content from these accounts is then filtered on criteria such as safety and privacy. Of the thousands of eligible media that remain, only 500 candidates are sent downstream to the next stage.
In the ranking stage, the 500 candidates are narrowed down in three passes, each using a different neural network model, to the final 25 highest-quality and most relevant candidates. These candidates are then shown to users in the Explore tab of the Instagram app.
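The funnel described above can be sketched as follows in Python. The 500 and 25 figures come from the text; the intermediate cut-offs of 150 and 50, the scoring formulas, the field names, and the synthetic pool are assumptions made for this example. Plain functions stand in for the neural network models, and taking the first 500 eligible items stands in for whatever selection the real system applies.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[dict], float]


def keep_top(candidates: List[dict], scorer: Scorer, k: int) -> List[dict]:
    """Sort by score, highest first, and keep the best k."""
    return sorted(candidates, key=scorer, reverse=True)[:k]


def rank_in_passes(candidates: List[dict],
                   passes: List[Tuple[Scorer, int]]) -> List[dict]:
    """Apply each (scorer, k) pass in order, shrinking the pool every time."""
    for scorer, k in passes:
        candidates = keep_top(candidates, scorer, k)
    return candidates


# Hypothetical scorers, from cheapest to most expensive.
def cheap_score(c: dict) -> float:
    return c["likes"]


def medium_score(c: dict) -> float:
    return c["likes"] + 2 * c["saves"]


def full_score(c: dict) -> float:
    return c["likes"] + 2 * c["saves"] - 5 * c["see_fewer"]


# Synthetic pool of media; every tenth item fails the (made-up) safety check.
pool = [
    {"id": i, "likes": (i * 37) % 101, "saves": (i * 17) % 53,
     "see_fewer": (i * 7) % 11, "safe": i % 10 != 0}
    for i in range(2000)
]

# Stage 1: candidate generation, filter for eligibility, then keep 500.
eligible = [c for c in pool if c["safe"]]
candidates = eligible[:500]

# Stage 2: three ranking passes, 500 -> 150 -> 50 -> 25 (150 and 50 assumed).
top = rank_in_passes(
    candidates,
    [(cheap_score, 150), (medium_score, 50), (full_score, 25)],
)
print(len(candidates), len(top))  # 500 25
```

Ordering the passes from cheapest to most expensive means the costliest scorer only ever sees 50 items in this sketch, which is the point of the staged design.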
More details about the whole recommendation system can be found in the official blog post from FAIR.