Telecom Customer Retention Analysis

A telecom company wants to know what it can do to retain customers and has provided data on 3300 customers, some of whom have left.

The list below shows the data provided about each customer.

The variables above will help predict whether a customer will leave (churn), but they have potential issues. Much of the data is connected and correlated. The number of minutes you spend talking at night is likely related to the number of calls you make at night, which in turn is likely related to the total amount you are charged at night.

As can be seen above, a handful of variables had near-perfect correlations with each other. Given the strength of these correlations, I dropped one variable from each set. This cuts the number of predictive variables by roughly a third.
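The pruning step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column names, the toy values, and the 0.95 threshold are hypothetical, and the post does not say which tool or cutoff was actually used.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.95):
    """Keep the first column of every near-perfectly correlated pair, drop the second."""
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) >= threshold:
            dropped.add(b)
    return [name for name in columns if name not in dropped]


# Hypothetical toy data: the night charge is an exact multiple of night minutes.
data = {
    "night_minutes": [100.0, 150.0, 200.0, 250.0, 180.0],
    "night_charge": [4.5, 6.75, 9.0, 11.25, 8.1],
    "night_calls": [90, 70, 110, 80, 100],
}
kept = drop_correlated(data)
print(kept)
```

Keeping the first column of each pair is an arbitrary choice; either member of a near-perfectly correlated pair carries almost the same information.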

On the flip side, I compared my remaining predictive variables to Churn to check for any obvious connections and found none (see table below).

To find a better connection between my predictive variables and my target, I decided to switch my target variable from Churn to “account length.” Assuming some features caused some people to stay longer than others, this should help. To simplify, I made account length a dummy variable that looks only at whether a customer stayed longer than average.

Additionally, in some states, all or most customers had accounts that lasted longer or shorter than average. Almost no one in RI, NM, or LA had an account lasting shorter than average, while no one in IA stayed longer than average.
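The longer-than-average flag described above amounts to comparing each account length with the mean. A minimal sketch, assuming account length is a plain number per customer; the values and the function name are made up for illustration:

```python
def longer_than_average(account_lengths):
    """Return 1 for accounts longer than the mean length, otherwise 0."""
    mean_length = sum(account_lengths) / len(account_lengths)
    return [1 if length > mean_length else 0 for length in account_lengths]


# Hypothetical account lengths; their mean is 90.
lengths = [30, 120, 95, 60, 145]
flags = longer_than_average(lengths)
print(flags)
```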

From there I built 9 different basic models predicting Churn to see which would do best. I then built a table (see below) to compare all the metrics on both the train and test samples.

As you can see, there were some general trends, most notably overfitting, but some models did perform noticeably better than others. The KNN and Decision Tree models (circled in red) did not do well and appear to have overfit more than the others. XG Boost and Gradient Boost (circled in green) did the best overall.

More specifically, I chose to focus on the accuracy and recall metrics (circled in blue). I chose accuracy as a general test of my model's ability to correctly predict outcomes, and recall because a false negative (guessing a customer will stay when they actually leave) is worse than a false positive (a customer staying who I thought would leave).

Given this, I chose to move forward with the XG Boost and Gradient Boost models.
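For reference, the two metrics behind this choice can be computed directly from predictions. The sketch below assumes the label 1 means "customer leaves" (the positive class) and uses made-up predictions; the function name is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # left, predicted to leave
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayed, predicted to stay
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # left, predicted to stay
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical labels: 1 = customer leaves, 0 = customer stays.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(accuracy, recall)
```

Recall only penalizes missed leavers (false negatives), which is why it suits a setting where missing a departing customer is the costlier mistake.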

Starting with Gradient Boost, I used a grid search to test a few ranges of parameters to see if I could improve the results of my model. As you can see below, there were some minor improvements.

The red bars, representing the newly tuned Gradient Boost model, are higher for both accuracy and recall than both the initial Gradient Boost and the baseline models.

Gradient Boost Metrics comparison
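The grid search used for this tuning is, at its core, an exhaustive loop over parameter combinations. The post most likely relied on a library routine such as scikit-learn's GridSearchCV, but that is an assumption; the plain-Python sketch below only shows the idea. The parameter names, the ranges, and `toy_score` (a stand-in for "train a model with these settings and return its validation score") are all hypothetical.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Score every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(learning_rate, max_depth):
    """Hypothetical scoring function that peaks at learning_rate=0.1, max_depth=3."""
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(max_depth - 3)


# Hypothetical parameter ranges; the post does not list the ones it tried.
grid = {"learning_rate": [0.01, 0.1, 0.5], "max_depth": [2, 3, 5]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

In a real run, the scoring function would fit the model on training folds and return a cross-validated metric such as recall, so the cost grows with the product of all range sizes.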

After a similar process, I got improved results for the XG Boost model (see below).

Moving on to the results, I graphed the top 10 most significant features for both models (see below) to see which had the most impact on customers leaving or staying.

While quite a few different features made the top ten for both models, a few stood out. The number of customer service calls someone made, whether they had an international plan, and whether they had a voicemail plan all made the top 5 for both models.

From the prior data and models, a few steps are advisable to increase customer retention, reduce customer churn, and generally improve business.

They should invest in, or generally improve, their customer service. The number of customer service calls stood out in both improved models, and a higher number of these calls is loosely correlated with customers leaving.

They should target domestic customers, particularly in Rhode Island, New Mexico, and Louisiana. Both models pointed to the international plan feature, and having an international plan was most correlated with customers leaving. While the exact state importances were drawn from a limited sample size, they are worth keeping in mind.

Lastly, they should promote their voicemail plan. Customers with a voicemail plan were least likely to leave.
