Harnessing Hundreds of Millions of Cases: Case-Based Prediction at Industrial Scale

Vahid Jalali, David B. Leake

2018

Building predictive models is central to many big data applications, but model building is computationally costly at scale. An appealing alternative is to bypass model building by applying case-based prediction, which reasons directly from data. To our knowledge, however, case-based prediction has not yet been applied at true industrial scale. In previous work we introduced a knowledge-light, data-intensive approach to case-based prediction that uses ensembles of automatically generated adaptations. We developed foundational scale-up methods, using Locality Sensitive Hashing (LSH) for fast approximate nearest neighbor retrieval of both cases and adaptation rules, and tested them on millions of cases. This paper presents research on extending these methods to address the practical challenges raised by case bases of hundreds of millions of cases in a real-world industrial e-commerce application. Handling this application required addressing how to keep LSH practical for skewed data; the resulting efficiency gains in turn made it possible to apply an adaptation generation strategy that was previously computationally infeasible. Experimental results show that our case-based reasoning (CBR) approach achieves accuracy comparable to or better than that of commonly applied state-of-the-art machine learning methods, while avoiding their model-building cost. This supports the opportunity to harness CBR for industrial-scale prediction.
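
As a rough illustration of the retrieval idea mentioned in the abstract, the following minimal sketch buckets cases with LSH and predicts by averaging the values of the nearest cases found in the query's bucket. It assumes random-hyperplane (sign) hashing and a single hash table; the abstract does not say which LSH family or table layout the authors use. The adaptation-rule ensemble and the handling of skewed data are omitted. All function names and parameters here are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random Gaussian hyperplanes (assumed LSH family, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """Hash a vector to a bit tuple: one bit per hyperplane, set by the sign of the dot product."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(cases, hyperplanes):
    """Group (features, value) cases into buckets keyed by their LSH hash."""
    index = defaultdict(list)
    for features, value in cases:
        index[lsh_key(features, hyperplanes)].append((features, value))
    return index


def predict(query, index, hyperplanes, k=3):
    """Average the values of the k nearest cases in the query's bucket.

    Returns None when the bucket is empty; a fuller system would probe
    additional tables or neighboring buckets instead.
    """
    candidates = index.get(lsh_key(query, hyperplanes), [])
    if not candidates:
        return None
    nearest = sorted(
        candidates,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )[:k]
    return sum(value for _, value in nearest) / len(nearest)


if __name__ == "__main__":
    cases = [([1.0, 2.0], 10.0), ([1.1, 2.1], 12.0), ([-1.0, -2.0], 20.0)]
    planes = make_hyperplanes(num_bits=4, dim=2)
    index = build_index(cases, planes)
    print(predict([1.0, 2.0], index, planes, k=1))
```

Only the cases that share the query's bucket are compared exactly, which is what keeps retrieval cheap as the case base grows. The trade-off is that a true nearest neighbor that hashes to a different bucket is missed, which is why retrieval is approximate.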
