Telco Customer Churn Prediction Project

01

I started with the retention decision, not the algorithm

The case begins with a management problem: acquisition can look healthy while revenue stalls because existing customers quietly leave. I framed churn prediction as a resource-allocation problem for a retention team. The goal was to identify accounts worth acting on, explain why they were risky, and keep the model transparent enough that the recommendation could be defended in business language.

02

I turned the analysis into a reproducible workflow

I did not want the work to live only as a final report. The R Markdown workflow in the GitHub repository made the project rerunnable: load the raw customer table, clean the data, engineer features, split the sample, fit the model, evaluate performance, and save the prediction file. That structure made the case closer to a small analytics system than a one-off spreadsheet exercise.

03

I made the customer table model-ready without hiding assumptions

The source data contained 7,043 customer accounts with demographic, service, contract, billing, and churn fields. The important cleaning decision was TotalCharges: it arrived as text and produced 11 missing values after conversion. I imputed those with the median to preserve the sample size, recoded categorical fields into factors, converted SeniorCitizen into readable yes/no labels, and removed customerID so the model could focus on behavior rather than identifiers.
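The cleaning step can be illustrated with a small sketch. The project itself runs in R Markdown; the Python below is only an illustration, and the four sample rows, the `to_number` helper, and the `clean` function are hypothetical stand-ins rather than the repository's code. It assumes the missing TotalCharges values arrive as blank text.

```python
from statistics import median

# Hypothetical miniature of the customer table; the real table has 7,043 rows.
rows = [
    {"customerID": "A-001", "SeniorCitizen": 0, "TotalCharges": "29.85"},
    {"customerID": "A-002", "SeniorCitizen": 1, "TotalCharges": " "},
    {"customerID": "A-003", "SeniorCitizen": 0, "TotalCharges": "108.15"},
    {"customerID": "A-004", "SeniorCitizen": 0, "TotalCharges": "1889.50"},
]


def to_number(text):
    """Return the text as a float, or None when it is blank or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows):
    """Impute missing TotalCharges with the median and drop the identifier."""
    parsed = [to_number(r["TotalCharges"]) for r in rows]
    # The median is computed from the observed (non-missing) values only.
    fill = median(v for v in parsed if v is not None)
    cleaned = []
    for r, value in zip(rows, parsed):
        cleaned.append({
            # customerID is deliberately left out of the model-ready row.
            "SeniorCitizen": "Yes" if r["SeniorCitizen"] == 1 else "No",
            "TotalCharges": fill if value is None else value,
        })
    return cleaned


cleaned = clean(rows)
```

Median imputation keeps every account in the sample, at the cost of assigning the affected rows a typical value that was never observed for them. That is the assumption the cleaning step makes visible.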

04

I used EDA to locate the shape of churn risk

The exploratory layer showed that churn was not randomly scattered across the customer base. About 26.5% of customers churned, with risk concentrated among early-tenure customers, month-to-month contracts, and higher monthly charges. The monthly-charge density plot made that last signal visible: churners were more concentrated in the higher-charge range, while retained customers had a stronger low-charge cluster.

Figure: Density plot of monthly charges split by churn status. Customers who churned were more concentrated at higher monthly charges, especially compared with the low-charge retained group.
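The segment view behind this kind of EDA reduces to a churn rate per group. The sketch below is an illustrative Python version of that calculation, not the project's R code. The `records` list is made-up data and `churn_rate_by` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up (contract type, churned?) pairs for illustration only.
records = [
    ("Month-to-month", True),
    ("Month-to-month", True),
    ("Month-to-month", False),
    ("One year", False),
    ("One year", True),
    ("Two year", False),
    ("Two year", False),
]


def churn_rate_by(records):
    """Return the share of churned customers within each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [churned, total]
    for segment, churned in records:
        counts[segment][0] += int(churned)
        counts[segment][1] += 1
    return {seg: churned / total for seg, (churned, total) in counts.items()}


rates = churn_rate_by(records)
for segment, rate in rates.items():
    print(f"{segment}: {rate:.1%}")
```

The same grouping can be applied to tenure bands or monthly-charge bands to check whether churn is concentrated or evenly spread.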

05

I used logistic regression because the drivers mattered

This case needed interpretation, not only prediction. Logistic regression let me describe how tenure, contract type, internet service, charges, and lifecycle stage changed churn odds. That mattered because management could act on drivers such as contract commitment and fiber-optic service risk. The odds-ratio view made the result easier to explain than a black-box score.
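The odds-ratio reading follows directly from the model form: exponentiating a logistic regression coefficient gives the multiplicative change in churn odds for a one-unit change in that predictor. The Python sketch below illustrates the arithmetic. The coefficient values and predictor names are hypothetical and are not the fitted values from this project, which was estimated in R.

```python
import math

# Hypothetical coefficients for illustration only; not the project's estimates.
coefficients = {
    "contract_two_year": -1.4,
    "internet_fiber_optic": 0.9,
    "tenure_months": -0.03,
}


def odds_ratios(coefs):
    """Convert log-odds coefficients into odds ratios."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


def apply_odds_ratio(probability, odds_ratio):
    """Shift a baseline probability by multiplying its odds by an odds ratio."""
    odds = probability / (1 - probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


ratios = odds_ratios(coefficients)
for name, ratio in ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```

An odds ratio above 1 raises churn odds and one below 1 lowers them. Note that the ratio acts on odds, not on probability, so the probability change depends on the customer's baseline risk.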

06

I validated performance as a decision tradeoff

The holdout set contained 1,760 customers. At the default 0.50 threshold, the model achieved 77.6% accuracy and high specificity, meaning it was strong at recognizing customers who would stay. Sensitivity, the share of actual churners the model flagged, was lower, so many churners would be missed if the company used the default cutoff. That is why I treated threshold tuning as part of the business decision rather than a technical footnote.
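The threshold tradeoff can be made concrete with a small sketch. The Python below is illustrative only: the labels and scores are made-up toy values, not the project's holdout predictions, and `confusion_metrics` is a hypothetical helper. It assumes both classes are present so that neither denominator is zero.

```python
def confusion_metrics(y_true, scores, threshold):
    """Accuracy, sensitivity, and specificity when scores >= threshold count as churn."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of churners caught
        "specificity": tn / (tn + fp),  # share of stayers correctly left alone
    }


# Toy data: 1 = churned, 0 = stayed; scores are predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.80, 0.45, 0.35, 0.60, 0.30, 0.20, 0.10, 0.05]

at_default = confusion_metrics(y_true, scores, 0.50)
at_lowered = confusion_metrics(y_true, scores, 0.30)
```

On this toy data, lowering the cutoff from 0.50 to 0.30 raises sensitivity from one in three churners to all three, while specificity falls from 0.8 to 0.6. Which side of that trade is worth more depends on the cost of a retention offer relative to the cost of a lost customer.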

Figure: ROC curve for the Telco churn logistic regression model. The curve shows good discriminatory performance, with an AUC of 0.822.
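One way to read an AUC value is as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. The Python sketch below computes AUC from that pairwise definition. It is an illustration on made-up toy data, not the method used to produce the 0.822 figure, and `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (churner, non-churner) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = churned, 0 = stayed; scores are predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.80, 0.45, 0.35, 0.60, 0.30, 0.20, 0.10, 0.05]

toy_auc = pairwise_auc(y_true, scores)
```

Because AUC depends only on how customers are ranked, it does not change when the action threshold moves. That is why the threshold decision has to be evaluated separately.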

07

I converted scores into a retention operating model

The model becomes useful when it changes how a team works. I translated the evidence into four operating moves: move month-to-month customers toward longer contracts, strengthen onboarding during the first year, monitor fiber-optic service quality, and lower the action threshold when the objective is to catch more churners early. The score is the prioritization layer; the driver tells the team what action to take.

08

I kept the final artifacts reviewable

The final case is supported by the GitHub repository, the written report, the presentation deck, and the R Markdown analysis file. Together they show the business framing, the statistical workflow, and the reproducible implementation. That matters because an analytics case should be inspectable from multiple angles: manager, reviewer, and technical reader.

09

What this project says about how I work

This case is smaller than the Trading Systems literature-review project, but I approached it with the same discipline: make the assumptions visible, preserve the evidence, explain the model in business terms, and connect the output to decisions. The professional value is not only the AUC. It is the chain from raw customer records to a retention action list that someone else can follow.
