Large Scale Decision Forests: Lessons Learned

August 25, 2015 8:51 pm Published by

At Sift Science, we use a variety of popular machine learning models to detect fraud for our customers. However, until recently we relied exclusively on a combination of linear models and sophisticated feature engineering. As we were reaching the limits of this setup, we began experimenting with our first non-linear model: random decision forests. Several months and over 100 experiments later, we were thrilled to announce the addition of random decision forests to the ensemble of models we use to fight fraud. Along the way we learned quite a few things about designing a random decision forest classifier for the fraud detection use case. Here we detail several of these lessons, including how we handled sparse and missing features, useful model visualization techniques, heuristics we used to improve class separation, specialized feature engineering, and how we combined our random decision forest with our existing models. All told, applying these lessons resulted in an 18% reduction in error for our customers.

Turn Up the Bayes, Part 2

August 12, 2015 5:10 pm Published by

We really love tech talks. At Sift Science, sharing knowledge and facilitating great discussion are two of our favorite things (just behind fraud-fighting, board games, ML, and really beautiful data visualization). In that vein, we've been delighted to host a summer tech talk series titled Turn Up The Bayes, where we invite awesome engineers to chat about the interesting things that they're working on. To set the mood, we provide delicious pizza and refreshing beverages, and set aside plenty of time for discussion, questions, and more pizza.