Streaming data analytics systems are being implemented to meet the specific challenges of real-time data mining and processing. Real-time recommendations are one possible application, given the continuous, large-volume nature of the data generated by user activity and interaction with a platform. Recommendation systems play an important role in a world of ever-growing consumer-facing databases because they offer users a way out of the Paradox of Choice. There are two methodological approaches to building the recommendation algorithm: content-based and collaboration-based. Likewise, there are two approaches to the data architecture: historical/offline and online/real-time.
What are the Benefits of a Recommender System?
The basic idea of a recommender system is very straightforward. It arises out of the intersection of very large databases and the limitations of human decision-making in processing an overwhelming number of options. Traditional decision-making heuristics for people involve relying on hearsay from peers or the opinions of critics. These, however, are not always dependable when dealing with the web platforms, digitised services and e-commerce sites that have dramatically increased the number of consumer options directly available to individuals.
This development may seem like a good thing. However, when presented with increasingly large numbers of options, individuals are faced with what psychologist Barry Schwartz refers to as the Paradox of Choice. Within commerce, this paradox refers to the contradiction between two things:
- The belief among marketers that increasing the number of products available and saturating as many niches and preferences as possible is a good sales strategy.
- The fact that too much choice causes people to experience “psychological distress”.
According to Schwartz, this distress is the result of the bittersweetness of modernity’s success - a form of depression that is derived from a sense of helplessness. This helplessness, however, is a result of increasing levels of personal autonomy and a utility maximising mindset conflicting with greater amounts of choice in all levels of a consumer society. “Unlimited choice”, he writes, “can produce genuine suffering”.
To manage this distress, one approach that Schwartz suggests is to focus time and energy on the most important choices. This is where recommendation engines can prove their serious worth. Although they do, in a sense, decrease a user’s autonomy, this is more than offset by the psychological benefits of restricting choice.
Why are there Different Types of Recommender System?
Moving beyond the psychological theory, recommendation systems simply work, and they work well for some of the biggest players around. Around 35% of Amazon’s revenue has been attributed directly to its recommendation system. Likewise, 75% of videos watched on Netflix and 60% on YouTube are reported to be directly attributable to their recommendation systems. Netflix has even offered a 1 million dollar prize to anyone who could improve its recommendation engine.
There is no single approach to building a recommendation engine that works in every case. Each large user-facing database will typically have its own bespoke solution that is constantly being revised. However, there are two high-level ways of thinking about recommendations that make a useful starting point.
A content-based recommendation takes the items being viewed, bought or otherwise consumed by the user as its starting point. A system built on this model will break down the items by feature and description at as fine a granularity as possible. Following this, the relevance of each individual feature in relation to a user’s stated preferences or behaviour is established. Finally, a recommendation is made by comparing this weighting against the database items themselves with a similarity heuristic.
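The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the catalogue, the feature names and the choice of cosine similarity as the "similarity heuristic" are all assumptions made for the example.

```python
import math

# Hypothetical catalogue: each item is described by hand-picked feature weights.
ITEMS = {
    "space_documentary": {"science": 1.0, "documentary": 1.0},
    "robot_thriller": {"science": 0.6, "action": 1.0},
    "cooking_show": {"food": 1.0, "documentary": 0.4},
    "heist_movie": {"action": 1.0, "crime": 0.8},
}


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(consumed):
    """Weight each feature by how highly the user rated the items carrying it."""
    profile = {}
    for item_id, rating in consumed.items():
        for feature, value in ITEMS[item_id].items():
            profile[feature] = profile.get(feature, 0.0) + rating * value
    return profile


def recommend_by_content(consumed, top_n=2):
    """Rank the items the user has not consumed by similarity to their profile."""
    profile = build_profile(consumed)
    candidates = [item_id for item_id in ITEMS if item_id not in consumed]
    ranked = sorted(
        candidates,
        key=lambda item_id: cosine(profile, ITEMS[item_id]),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # A user who rated one science documentary highly.
    print(recommend_by_content({"space_documentary": 5.0}))
```

Note that only this one user's history is needed to produce a ranking. The quality of the result depends entirely on how well the feature weights describe the items.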
The success of this type of system depends on having a significant number of features for each item in a well-structured format. Additionally, these features need to exist within a “data ontology”, which means that they must have well defined relationships between each other that can be processed algorithmically and have meaning in relation to the aims of the recommendation engine.
A benefit of this approach is that it does not require the data of many user interactions in order to produce results. It is also comparatively transparent in the fact that the features and values are all known and can be visualised in a rational, relational way.
Unlike the feature focus of the content-based system, a collaboration or social-based system focuses on users. In particular, such an approach will utilise the subjective user feedback scoring or rating of the database items. The assumption of this approach is that users that exhibit a similar ratings pattern will be likely to share tastes.
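As a rough sketch of this idea, the snippet below predicts a user's rating for unseen items from the ratings of other users, weighted by how similar their rating patterns are. The users, items, scores and the use of cosine similarity over co-rated items are assumptions chosen for illustration; real systems typically use more robust similarity measures and far larger rating matrices.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
RATINGS = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 5, "B": 5, "C": 1, "D": 5, "E": 2},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1, "E": 5},
}


def user_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = RATINGS[u].keys() & RATINGS[v].keys()
    if not shared:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in shared)
    norm_u = math.sqrt(sum(RATINGS[u][i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(RATINGS[v][i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict_ratings(user):
    """Similarity-weighted average rating for every item the user has not rated."""
    totals, weights = {}, {}
    for other in RATINGS:
        if other == user:
            continue
        sim = user_similarity(user, other)
        if sim <= 0:
            continue
        for item, rating in RATINGS[other].items():
            if item in RATINGS[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    return {item: totals[item] / weights[item] for item in totals}


if __name__ == "__main__":
    predictions = predict_ratings("ana")
    # Items sorted from most to least promising for this user.
    print(sorted(predictions, key=predictions.get, reverse=True))
```

Here "ana" rates items much as "ben" does, so ben's opinion of the unseen items counts for more than cat's. No item features are involved at any point.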
An advantage of this system is that it is much less reliant on high-quality structured data being present to describe every little feature of the item. It also doesn’t require all of those features to exist within a complex data ontology. It is much more likely to capture the nuanced reasons why individuals like certain items, which may not be captured by breaking the object down into weighted feature sets. However, this means that it requires substantial amounts of user data before it can provide accurate recommendations.
A further advantage of this system is that it can be utilised as part of a machine learning model. Rather than the computationally demanding process of plotting every new user against the data rich profiles of every existing user, a model can be trained based on this data. This model can then offer predictions without requiring ongoing access to the source data.
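One common way to build such a model is matrix factorisation, which is named here as an illustrative choice and not as the technique any particular platform uses. The sketch below learns a small vector for each user and each item with stochastic gradient descent. The ratings, hyperparameters and function names are assumptions made for the example.

```python
import random

# Hypothetical (user, item, rating) triples on a 1-5 scale.
RATINGS = [
    ("ana", "A", 5), ("ana", "B", 4), ("ana", "C", 1),
    ("ben", "A", 5), ("ben", "B", 5), ("ben", "D", 5),
    ("cat", "A", 1), ("cat", "C", 5), ("cat", "D", 1),
]


def train_factors(ratings, n_factors=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn user and item latent vectors with stochastic gradient descent."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Move both vectors to reduce the error, with L2 regularisation.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def predict(p, q, user, item):
    """Predicted rating is the dot product of the two learned vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def rmse(p, q, ratings):
    """Root mean squared error of the model on a set of known ratings."""
    squared = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (squared / len(ratings)) ** 0.5


if __name__ == "__main__":
    p, q = train_factors(RATINGS)
    # "ana" never rated "D"; the prediction uses only the learned vectors.
    print(predict(p, q, "ana", "D"))
```

Once training has finished, only the two small dictionaries of vectors are needed to score a user against an item. The original ratings can be archived or discarded.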
What are Real-Time Recommender Systems?
Another important concept to consider when thinking about recommendation engines at a high level is the data architecture that underpins the entire system. There are two main approaches to this issue.
Offline/Historical Data System
The classic/offline method relies on the accumulation of historical datasets generated by users and on extrapolation from them. This is based on the idea that the larger the dataset, the more insights are potentially contained within. One advantage of this approach is that the dataset is more substantial and can be analysed with conventional statistical models that rely on having whole and complete datasets.
Another advantage of such an approach is that it does not require access to a direct feedback loop with the platform that the user is interacting with. This is advantageous for less sophisticated data systems or ones that, because of some non-digital, non-quantifiable aspect of the process, don’t lend themselves to directly feeding back into a real-time data system. For a recommendation system, this would be a better approach when the number of options is relatively limited and does not change significantly over time. Additionally, this would be an option when there are no strict latency requirements and data storage is plentiful, with periodic aggregation of large volumes of statistics.
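A minimal sketch of this batch style is shown below, assuming a hypothetical event log of (user, item) pairs that has already been collected in full. A scheduled job scans the whole log and produces a static table that the platform serves until the next run.

```python
from collections import Counter


def build_popularity_table(event_log, top_n=3):
    """Scan the complete historical log once and return the most-consumed items."""
    counts = Counter(item for _user, item in event_log)
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical log, e.g. loaded from last month's stored events.
    log = [
        ("u1", "kettle"), ("u2", "kettle"), ("u3", "kettle"),
        ("u1", "toaster"), ("u2", "toaster"),
        ("u3", "blender"),
    ]
    print(build_popularity_table(log, top_n=2))
```

Nothing in this job reacts to what a user is doing right now. The table is only as fresh as the most recent run.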
Online/Real-Time Data System
A real-time approach to data assumes that, although large historical datasets provide great input for extrapolating certain types of statistics, direct interaction provides much more immediately valuable insights. This relies on an ongoing interplay of analysis and action, with a direct feedback loop in which action immediately informs the analysis and likewise, the analysis will immediately impact the subsequent action.
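The feedback loop can be illustrated with a small in-memory sketch in which every interaction updates the model immediately and the very next recommendation reflects it. The class name, the co-occurrence scoring and the session size are assumptions for the example; a real deployment would sit behind a stream processing layer.

```python
from collections import defaultdict, deque


class StreamingRecommender:
    """Keeps item co-occurrence counts that are updated one event at a time."""

    def __init__(self, session_size=5):
        # Only the last few items per user are held, and only in memory.
        self.recent = defaultdict(lambda: deque(maxlen=session_size))
        self.co_counts = defaultdict(lambda: defaultdict(float))

    def observe(self, user, item):
        """Action informs analysis: fold a single interaction into the counts."""
        for previous in self.recent[user]:
            if previous != item:
                self.co_counts[previous][item] += 1.0
                self.co_counts[item][previous] += 1.0
        self.recent[user].append(item)

    def recommend(self, user, top_n=3):
        """Analysis informs action: score unseen items against the recent session."""
        seen = set(self.recent[user])
        scores = defaultdict(float)
        for item in seen:
            for other, count in self.co_counts[item].items():
                if other not in seen:
                    scores[other] += count
        return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    rec = StreamingRecommender()
    events = [
        ("u1", "phone"), ("u1", "case"),
        ("u2", "phone"), ("u2", "case"), ("u2", "charger"),
    ]
    for user, item in events:
        rec.observe(user, item)
    rec.observe("u3", "phone")
    print(rec.recommend("u3"))
```

There is no separate training step here. Each call to `observe` changes what `recommend` returns on the next request.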
One way in which this approach excels over non-real-time systems is its adaptability. When models are built out of historical datasets they are, as a matter of course, outdated by the time they are actually implemented. In a large, frequently updated database with large volumes of users, waiting for data to enter hard storage and be processed in batch can fall short of both SLAs and an optimal user experience.
Furthermore, an online recommendation system has the benefit of not storing large volumes of user data to disk. This is beneficial both in reducing spend on storage infrastructure and in not holding on to potentially sensitive customer data any longer than is absolutely necessary.