Book genre categorization
Replacing mundane manual labor with AI: see how we helped Nextory categorize books more efficiently.
Keywords
Manual work replacement
Categorization
Machine learning
AI Pipelines
Python
Introduction
Nextory is one of Europe's largest audiobook and e-reading platforms. The company offers thousands of book titles in multiple languages and has one of the most elaborate categorization systems of all audiobook providers.
Having an elaborate multi-hierarchy categorization system helps end users find niche books in very specific categories and is a unique selling point of their product.
Challenge
Nextory had to classify every book according to THEMA, a very elaborate categorization standard with a hierarchical structure.
This categorization was done manually, and Nextory wanted to explore using machine learning to automate the process.
Goal
Use existing information and data about books (author, title, description, cover photo) to implement a machine learning model that can predict the hierarchical book categories.
Solution
After analyzing the data and the previously hand-labelled book categories, we studied the THEMA specification and worked out a way to produce multi-hierarchical predictions. We used image recognition on the cover images and NLP tools for feature extraction on the text descriptions. We then implemented a machine learning model that could accurately predict book categories according to the THEMA standard. The model was put into production in a cloud environment, where engineers at Nextory could access it.
As a result, staff who previously hand-labelled categories could focus on more skilled tasks instead.
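To give a feel for what a multi-hierarchical prediction target can look like, here is a minimal Python sketch. It is not the model built for Nextory. It assumes subject codes in which each additional character denotes one level deeper in the hierarchy, so every prefix of a code is one of its ancestors. The function name `expand_with_ancestors` and the example codes are illustrative only.

```python
def expand_with_ancestors(codes):
    """Return the given codes plus all of their prefix ancestors.

    Assumption: the hierarchy is encoded in the code itself, so "FBA"
    sits under "FB", which sits under "F".
    """
    expanded = set()
    for code in codes:
        # Every prefix of the code, from length 1 up to the full code.
        for depth in range(1, len(code) + 1):
            expanded.add(code[:depth])
    # Sort by depth first, then alphabetically, for a stable order.
    return sorted(expanded, key=lambda c: (len(c), c))


# A book labelled with two leaf categories (illustrative codes).
labels = expand_with_ancestors(["FBA", "FM"])
print(labels)  # ['F', 'FB', 'FM', 'FBA']
```

Expanding labels this way lets a classifier be trained and checked at every level of the hierarchy, not only at the most specific category.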
Results
Overall, the model predicted book categories correctly 96% of the time. A prediction counts as correct only when every category assigned to a book is right.
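The all-or-nothing metric described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the evaluation code used in the project: the function name `exact_match_accuracy` and the sample data are made up, and each book's categories are represented as a set of codes.

```python
def exact_match_accuracy(true_labels, predicted_labels):
    """Share of books whose predicted category set equals the true set."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both inputs must cover the same number of books.")
    if not true_labels:
        raise ValueError("At least one book is required.")
    # A book only counts as a hit when all of its categories match.
    hits = sum(
        set(truth) == set(pred)
        for truth, pred in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)


# Made-up example: the second book has one wrong category, so it is a miss.
y_true = [{"F", "FB"}, {"F", "FM"}, {"D", "DC"}]
y_pred = [{"F", "FB"}, {"F", "FB"}, {"D", "DC"}]
score = exact_match_accuracy(y_true, y_pred)
print(f"{score:.2f}")  # 0.67
```

This is a strict measure: a book with several categories is scored as wrong even if only one of them is off.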