A new, large-scale product image dataset was recently published by researchers from JD AI Research.
Motivated by the lack of precise, large-scale datasets of product images, the researchers set out to collect and curate a dataset of images of products purchased by customers on JD.com.
The new dataset, called Products-10K, contains 10,000 fine-grained, frequently bought products from a wide range of categories: fashion, food, healthcare, household items, etc. The dataset features more than 150,000 images divided into these product categories, which are also organized in a graph that captures the hierarchy and relationships between different products. All images were manually labeled and inspected by experts from JD.com, making Products-10K a high-quality dataset with a labeling error rate of less than 0.5%.
The researchers conducted several experiments to evaluate the new dataset. They trained an EfficientNet-B3 model on Products-10K for a recognition task, and the results confirmed the dataset's effectiveness.
The dataset has been released and can be used freely for non-commercial research and educational purposes. Details about the data collection process, the experiments, and the results can be found in the paper published on arXiv. The researchers are also hosting a Kaggle challenge based on this dataset.