The project began by loading the dataset containing information about credit card customers, including various features such as customer demographics, transaction details, and account attributes. The dataset was then explored to gain a better understanding of its structure and contents. This included checking the number of records, identifying the available features, and inspecting the data types. To gain insights into the data, exploratory data analysis (EDA) techniques were employed. This involved examining the distribution of different features, identifying any missing values, and understanding the relationships between variables. Visualizations were created to represent the distribution of features. These visualizations helped identify any patterns, outliers, or potential correlations in the data.
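The exploration steps described above can be sketched with pandas. This is a minimal illustration, not the book's own code: the data frame is synthetic, and the column names (`customer_age`, `gender`, `credit_limit`, `attrition_flag`) are hypothetical placeholders rather than the actual columns of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the credit card customer data. In practice the
# frame would be loaded from the dataset file, e.g. with pd.read_csv.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "customer_age": rng.integers(21, 70, size=n),
    "gender": rng.choice(["F", "M"], size=n),
    "credit_limit": rng.normal(8000, 2500, size=n).round(2),
    "attrition_flag": rng.choice([0, 1], size=n, p=[0.84, 0.16]),
})
# Inject a few missing values so the missing-value check has something to find.
df.loc[[3, 17, 42], "credit_limit"] = np.nan

# Structure: number of records, number of features, and data types.
n_records, n_features = df.shape
dtypes = df.dtypes

# Missing values per column.
missing_per_column = df.isna().sum()

# Distribution summaries for numeric features and class balance of the target.
summary = df.describe()
target_share = df["attrition_flag"].value_counts(normalize=True)

# Pairwise correlations between numeric columns only.
correlations = df.select_dtypes("number").corr()

print(n_records, n_features)
print(missing_per_column)
print(target_share)
```

Histograms or box plots of individual columns (for example via `df.hist()`) would be the natural next step for spotting outliers.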
The target variable for prediction was the attrition flag, which indicated whether a customer had churned. The dataset was split into input features (X) and the target variable (y) accordingly. Machine learning algorithms were then applied to predict the attrition flag. The classifiers used were Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Gradient Boosting, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting (LightGBM). These models were trained on the training dataset and evaluated using appropriate performance metrics.
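As a rough sketch of this step, the following trains several of the listed classifiers with scikit-learn on synthetic data. The data, the split ratio, and all hyperparameters are assumptions made for illustration. XGBoost and LightGBM are left out only because they live in separate packages; they expose a similar `fit`/`predict` interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for features X and target y.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.84, 0.16], random_state=0,
)

# Stratify so both splits keep the same share of churned customers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative settings only; none of these values come from the book.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and record test-set accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Because the classes are imbalanced, accuracy alone can look good even for a model that rarely predicts churn, which is why precision, recall, and F1-score are also reported.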
Model evaluation involved measuring the accuracy, precision, recall, and F1-score of each classifier. These metrics provided insights into how well the models performed in predicting customer attrition. Additionally, a confusion matrix was created to analyze the true positive, true negative, false positive, and false negative predictions. This matrix allowed for a deeper understanding of the classifier's performance and potential areas for improvement.
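To make the relationship between the confusion matrix and these metrics concrete, here is a small sketch that derives all four scores from the true/false positive/negative counts. The labels are made up for illustration, with 1 standing for a churned customer, and the function names are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 1 = churned, 0 = not churned.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

print(confusion_counts(y_true, y_pred))        # (3, 5, 1, 1)
print(classification_metrics(y_true, y_pred))  # accuracy 0.8, others 0.75
```

In this example one churned customer is missed (a false negative) and one loyal customer is flagged (a false positive), so precision and recall are both 3/4.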
Next, a deep learning approach using an artificial neural network (ANN) was employed for attrition flag prediction. Preprocessing included feature normalization, one-hot encoding of categorical variables, and splitting the data into training and testing sets. The ANN architecture consisted of an input layer, one or more hidden layers, and an output layer. The number of nodes and the activation function for each layer were determined through experimentation and best practices. The model was compiled by specifying the loss function, optimizer, and evaluation metrics. Common choices for binary classification problems include binary cross-entropy loss and the Adam optimizer. The model was then trained on the training dataset. Training involved feeding the input features through the network, comparing the outputs with the target variable, updating the weights and biases using backpropagation, and repeating this process for multiple epochs. During training, the model's performance on both the training and validation sets was monitored. This allowed overfitting or underfitting to be detected and hyperparameters, such as the learning rate or the number of hidden layers, to be adjusted if necessary.
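The training loop described above can be illustrated without a deep learning framework. The sketch below is not the book's model: it is a one-hidden-layer network written in NumPy, trained on synthetic data, and it swaps in plain full-batch gradient descent for the Adam optimizer to keep the code short. The layer size, learning rate, and epoch count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 400 customers, 6 numeric features, binary label.
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] - X[:, 2] > 0.3).astype(float).reshape(-1, 1)

# Split, then normalize using statistics from the training split only.
X_train, X_val = X[:300], X[300:]
y_train, y_val = y[:300], y[300:]
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_val = (X_val - mean) / std

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y_true, p):
    """Binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def forward(X, params):
    """Hidden ReLU layer followed by a single sigmoid output unit."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    return h, p

hidden = 8  # arbitrary hidden-layer width
params = [
    rng.normal(scale=0.5, size=(6, hidden)), np.zeros(hidden),
    rng.normal(scale=0.5, size=(hidden, 1)), np.zeros(1),
]

lr = 0.3  # learning rate (assumption)
history = {"loss": [], "val_loss": []}
for epoch in range(300):
    h, p = forward(X_train, params)
    n = X_train.shape[0]
    # Backpropagation: with sigmoid + cross-entropy, the gradient at the
    # output pre-activation simplifies to (p - y) / n.
    dz2 = (p - y_train) / n
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ params[2].T) * (h > 0)  # ReLU derivative
    dW1 = X_train.T @ dz1
    db1 = dz1.sum(axis=0)
    # Gradient descent update of weights and biases (in place).
    for param, grad in zip(params, [dW1, db1, dW2, db2]):
        param -= lr * grad
    # Monitor both splits after each epoch.
    history["loss"].append(bce(y_train, forward(X_train, params)[1]))
    history["val_loss"].append(bce(y_val, forward(X_val, params)[1]))

val_pred = forward(X_val, params)[1] >= 0.5
val_accuracy = float(np.mean(val_pred == (y_val == 1)))
print(f"final loss {history['loss'][-1]:.3f}, "
      f"val loss {history['val_loss'][-1]:.3f}, val acc {val_accuracy:.2f}")
```

A validation loss that rises while the training loss keeps falling is the usual sign of overfitting that the monitoring step is meant to catch.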
The accuracy and loss values were plotted over the epochs to visualize the training and validation performance of the ANN. These plots provided insights into the model's convergence and potential areas for improvement. After training, the model was used to make predictions on the test dataset. A threshold of 0.5 was applied to the predicted probabilities to classify the predictions as either churned or not churned customers. The accuracy score was calculated by comparing the predicted labels with the true labels from the test dataset. Additionally, a classification report was generated, including metrics such as precision, recall, and F1-score for both churned and not churned customers.
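The thresholding and per-class reporting steps can be sketched as follows. The probabilities and labels are made up, the function names are hypothetical, and the strict `>` comparison at 0.5 is an assumption, since the text does not say how a probability of exactly 0.5 was handled.

```python
import numpy as np

def labels_from_probabilities(probs, threshold=0.5):
    """Map predicted churn probabilities to 0/1 labels (1 = churned)."""
    return (np.asarray(probs) > threshold).astype(int)

def per_class_report(y_true, y_pred):
    """Precision, recall, and F1 computed separately for each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {}
    for label, name in [(0, "not churned"), (1, "churned")]:
        tp = int(np.sum((y_pred == label) & (y_true == label)))
        predicted = int(np.sum(y_pred == label))
        actual = int(np.sum(y_true == label))
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Made-up model outputs and true labels for eight test customers.
probs = [0.91, 0.12, 0.48, 0.55, 0.05, 0.77, 0.33, 0.64]
y_true = [1, 0, 1, 1, 0, 0, 0, 1]

y_pred = labels_from_probabilities(probs)
accuracy = float(np.mean(y_pred == np.asarray(y_true)))
report = per_class_report(y_true, y_pred)

print(y_pred.tolist())  # [1, 0, 0, 1, 0, 1, 0, 1]
print(accuracy)         # 0.75
print(report)
```

Lowering the threshold below 0.5 would trade precision for recall on the churned class, which can be worthwhile when missing a churning customer is costlier than a false alarm.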
To further evaluate the model's performance, a confusion matrix was created. This matrix visualized the true positive, true negative, false positive, and false negative predictions, allowing for a more detailed analysis of the model's predictive capabilities. Finally, a custom function was utilized to create a plot comparing the predicted values to the true values for the attrition flag. This plot visualized the accuracy of the model and provided a clear understanding of how well the predictions aligned with the actual values.
Through this analysis and prediction process, valuable insights were gained into attrition among credit card customers. The machine learning and ANN models provided predictions and performance metrics that can support decision-making and the development of strategies to mitigate attrition. Overall, this project demonstrated the power of machine learning and deep learning techniques in understanding and predicting customer behavior. By leveraging the available data, it was possible to uncover patterns, make accurate predictions, and guide business decisions aimed at retaining credit card customers and reducing attrition.
Vivian Siahaan is a fast learner who enjoys trying new things. She was born and raised in Hinalang Bagasan, Balige, on the banks of Lake Toba, and completed her high school education at SMAN 1 Balige. She taught herself Java, Android, JavaScript, CSS, C++, Python, R, Visual Basic, Visual C#, MATLAB, Mathematica, PHP, JSP, MySQL, SQL Server, Oracle, Access, and other programming languages. She studied programming from scratch, starting with the most basic syntax and logic, by building several simple, practical GUI applications. Animation and games are the areas of programming she most wants to keep developing. Besides studying mathematical logic and programming, the author also enjoys reading novels. Vivian Siahaan has written dozens of ebooks published by Sparta Publisher: Data Structure with Java; Java Programming: Cookbook; C++ Programming: Cookbook; C Programming For High Schools / Vocational Schools and Students; Java Programming for SMA / SMK; Java Tutorial: GUI, Graphics and Animation; Visual Basic Programming: From A to Z; Java Programming for Animation and Games; C# Programming for SMA / SMK and Students; MATLAB For Students and Researchers; Graphics in JavaScript: Quick Learning Series; JavaScript Image Processing Methods: From A to Z; Java GUI Case Study: AWT & Swing; Basic CSS and JavaScript; PHP / MySQL Programming: Cookbook; Visual Basic: Cookbook; C++ Programming for High Schools / Vocational Schools and Students; Concepts and Practices of C++; PHP / MySQL For Students; C# Programming: From A to Z; Visual Basic for SMA / SMK and Students; C# .NET and SQL Server for High School / Vocational School and Students.
At the ANDI Yogyakarta publisher, Vivian Siahaan also wrote a number of books including: Python Programming Theory and Practice; Python GUI Programming; Python GUI and Database; Build From Zero School Database Management System In Python / MySQL; Database Management System in Python / MySQL; Python / MySQL For Management Systems of Criminal Track Record Database; Java / MySQL For Management Systems of Criminal Track Records Database; Database and Cryptography Using Java / MySQL; Build From Zero School Database Management System With Java / MySQL.
Rismon Hasiholan Sianipar was born in Pematang Siantar, in 1994. After graduating from SMAN 3 Pematang Siantar, the writer traveled to the city of Jogjakarta. In 1998 and 2001 the author completed his Bachelor of Engineering (S.T) and Master of Engineering (M.T) degrees in Electrical Engineering at Gadjah Mada University, under the guidance of Prof. Dr. Adhi Soesanto and Prof. Dr. Thomas Sri Widodo, focusing on research into non-stationary signals by analyzing their energy using time-frequency maps. Because of a signal's non-stationary nature, the distribution of its energy becomes very dynamic on a time-frequency map. By mapping the distribution of energy in the time-frequency domain using discrete wavelet transforms, one can design non-linear filters to analyze the patterns in the data. In 2003, the author received a Monbukagakusho scholarship from the Japanese Government. In 2005 and 2008, he completed his Master of Engineering (M.Eng) and Doctor of Engineering (Dr.Eng) degrees at Yamaguchi University, under the guidance of Prof. Dr. Hidetoshi Miike. In both his master's and doctoral theses, R.H. Sianipar combined the SR-FHN (Stochastic Resonance FitzHugh-Nagumo) filter with a 4096-bit ECC (elliptic curve cryptography) cryptosystem, both to suppress noise in digital images and video and to maintain their authenticity. The results of this study have been documented in international scientific journals and officially patented in Japan. One of the patents was published in Japan with registration number 2008-009549. He is active in collaborating with several universities and research institutions in Japan, particularly in the fields of cryptography, cryptanalysis, and audio/image/video digital forensics. R.H. Sianipar also has experience in applying code-breaking methods (cryptanalysis) to a number of intelligence data sets that are the object of research studies in Japan.
R.H. Sianipar holds a number of Japanese patents and has written a number of national and international scientific articles and dozens of national books. He has also participated in a number of workshops related to cryptography, cryptanalysis, digital watermarking, and digital forensics. In a number of these workshops, R.H. Sianipar helped Prof. Hidetoshi Miike create applications related to digital image/video processing, steganography, cryptography, watermarking, non-linear screening, intelligent descriptor-based computer vision, and others, which were used as training materials. R.H. Sianipar's research interests are multimedia security, signal, digital image, and video processing, cryptography, digital communication, digital forensics, and data compression/coding. R.H. Sianipar continues to develop applications for the analysis of signals, images, and digital video, for both research and commercial purposes, using Python, MATLAB, C++, C, VB.NET, C# .NET, R, and Java.