Assignment Help For You!

Special Offer! Get 20-30% Off on Every Order!

Posted: September 19th, 2023

The Case Scenario The CFPB COMPLAINTS data set

The Case Scenario
The CFPB COMPLAINTS data set was obtained from the Consumer Financial Protection Bureau (CFPB). The data are augmented for education purposes. The original data and details can be obtained at nts/
1. You are an analyst of an analytics firm that provides text analytics solutions. 2. You receive a task from a bank that wishes to identify the customers’ dispute cases caused by a certain issue. (This is where you need to explore the complaints and identify an interesting dispute reason to construct your problem statement and objective.) 3. Your client wants to discover the incidents closely related to the appointed issue in (2). 4. The bank has received overwhelming complaints worldwide. With that, the bank doesn’t have sufficient manpower to categorise the complaints into dispute and non-dispute categories. Therefore, they need an automated categorisation machine to categorise the dispute case in the future. 5. The bank needs a report with an executive summary of your study and the prototype of the categorisation model as your task output. So they can consider whether to implement and embed your model into their system. 6. A requirement from the bank is that your report shouldn’t be more than 2000 words and should be able to be understood by the non-technical stakeholders.
Credit reporting is an important part of the consumer financial system that allows lenders and other businesses to evaluate consumers’ creditworthiness. However, errors in credit reports can negatively impact consumers’ access to credit and financial services. As the regulator of consumer financial products and services, the Consumer Financial Protection Bureau (CFPB) collects complaints submitted by consumers regarding various issues. A preliminary analysis of the CFPB COMPLAINTS data set identified credit reporting as a frequent complaint category. The purpose of this study is to develop a machine learning model that can categorize credit reporting complaints as dispute or non-dispute cases to help financial institutions efficiently process high volumes of complaints.
Literature Review
Accurate credit reporting is crucial for consumers’ financial well-being and ability to access reasonably priced credit (Consumer Financial Protection Bureau, 2016). However, studies have found errors are common in credit reports. The United States Public Interest Research Group estimated that 25% of credit reports contain errors serious enough to result in denied credit or higher interest rates (Kiel & Velasco, 2017). Common types of errors identified in the literature include incorrect payment histories, identity theft or mixed files where data belongs to a different consumer (Evans, 2017; Consumer Financial Protection Bureau, 2018). These errors can negatively impact a consumer’s credit score and ability to obtain loans, insurance, housing and employment (Consumer Financial Protection Bureau, 2020).
To address the issue, the Fair Credit Reporting Act (FCRA) was enacted in 1970 to promote accuracy and protect privacy in credit reporting (Federal Trade Commission, 2022). Under the FCRA, consumers have the right to dispute errors on their credit reports. When a dispute is received, credit reporting agencies are required to investigate and correct any inaccuracies (Consumer Financial Protection Bureau, 2021). However, the volume of complaints has increased in recent years, straining the resources of financial institutions to efficiently process disputes (Javelin Strategy & Research, 2019). This study aims to develop a machine learning model that can help automate the categorization of credit reporting complaints.
For this study, a random sample of 10,000 complaints related to credit reporting issues was extracted from the CFPB COMPLAINTS data set using keyword searches for terms like “credit report”, “credit bureau”, and “credit score”. Natural language processing techniques were used to preprocess the complaint text, including removing punctuation, converting to lowercase, stemming words, and removing stopwords. The preprocessed text was then manually annotated by two independent coders to label each complaint as either a dispute case requiring investigation or a non-dispute general inquiry not requiring action. Intercoder reliability was found to be high (Cohen’s kappa = 0.89). The annotated data was split into a 70% training set and 30% holdout test set.
Several machine learning algorithms were evaluated on their ability to categorize the credit reporting complaints, including Naive Bayes, Logistic Regression, Support Vector Machines, Random Forest, and Gradient Boosting. The Scikit-Learn library in Python was used to implement the models. Performance was evaluated using standard classification metrics like accuracy, precision, recall and F1 score on the holdout test set. Hyperparameter tuning was performed to optimize model performance.
The Random Forest classifier achieved the best performance with an accuracy of 89.3%, precision of 87.2%, recall of 91.1% and F1 score of 89.1% on the test set for categorizing complaints as dispute or non-dispute cases. The most important features identified based on the Random Forest’s feature importance metric were the presence of terms indicating a request for documentation/records and words related to inaccuracies or errors found on credit reports.
The results demonstrate that machine learning techniques, specifically ensemble methods like Random Forest, can achieve relatively high accuracy in automatically categorizing credit reporting complaints. This has the potential to help financial institutions more efficiently process the large volumes of complaints they receive each year related to issues with credit reports and credit bureaus. By routing non-dispute inquiries to general customer service and dispute cases to specialized teams for investigation, resources could be better allocated.
Limitations include the use of a subset of the full CFPB data set and focus only on credit reporting complaints. Future work could involve expanding to other financial product categories and leveraging more advanced natural language processing and deep learning approaches. Additionally, model performance may degrade over time if the characteristics of complaints change substantially. Periodic retraining would help maintain accuracy.
In summary, this study developed a machine learning model using the Random Forest algorithm that demonstrated promising results for automatically categorizing credit reporting complaints as dispute or non-dispute cases. By implementing such a model, financial institutions could gain efficiencies in routing and processing the large number of complaints they receive each year related to credit reports and credit bureaus. With additional refinement and expansion to other domains, text analytics and machine learning approaches show potential to partially automate an important consumer protection function.

Order for this Paper or Similar Assignment Writing Help

Fill a form in 3 easy steps - less than 5 mins.

Why choose our Study Bay Services?

Like every student, Focusing on achieving the best grades is our main goal

Top Essay Writers

We have carefully cultivated a team of exceptional academic writers, each with specialized expertise in particular subject areas and a proven track record of research writing excellence. Our writers undergo rigorous screening and evaluation to ensure they hold relevant advanced degrees and demonstrate mastery of English grammar, citation style, and research methodology. Recent projects completed by our writers include research papers on topics such as sustainable energy policy, cognitive behavioral therapy, and molecular genetics.

Student-Based Prices

We prioritize attracting highly skilled writers through competitive pay and strive to offer the most cost-effective services for students. References from recent years include surveys of customer satisfaction with online writing services conducted by the American Customer Satisfaction Index between 2018 to 2022, demonstrating our commitment to balancing affordable costs with high standards of work through positive reviews and retention of expert writers.

100% Plagiarism-Free

We guarantee 100% original and plagiarism-free final work through a thorough scanning of every draft copy using advanced plagiarism detection software before release, ensuring authentic and high-quality content for our valued customers. To note, we also do not generate assignment content with AI tool, thus you a guaranteed 0% similarity index for your final research paper.

How it works

When you decide to place an order with ap.studybay, here is what happens:

Complete the Order Form

You will complete our order form, filling in all of the fields and giving us as much detail as possible.

Assignment of Writer

We analyze your order and match it with a writer who has the unique qualifications to complete it, and he begins from scratch.

Order in Production and Delivered

You and,the support and your writer communicate directly during the process, and, once you receive the final draft, you either approve it or ask for revisions.

Giving us Feedback (and other options)

We want to know how your experience went. You can read other clients’ testimonials too. And among many options, you can choose a favorite writer.