Churn Prediction

Churn prediction with PySpark


The goal is to develop a machine learning model that predicts which customers will leave the company.

About the Dataset

The dataset consists of 10,000 observations and 12 variables.

The independent variables contain information about customers.

The dependent variable indicates whether the customer has churned.


  • Surname – Customer surname
  • CreditScore – Customer's credit score
  • Geography – Country where the customer is located
  • Gender – Customer's gender
  • Age – Customer's age
  • Tenure – Number of years the person has been a customer
  • NumOfProducts – Number of bank products the customer uses
  • HasCrCard – Credit card status (0=No,1=Yes)
  • IsActiveMember – Active Membership status (0=No,1=Yes)
  • EstimatedSalary – Customer's estimated salary
  • Exited – Whether the customer has left (0=No,1=Yes)