Project Report - Consumer Financial Protection Bureau

The Consumer Financial Protection Bureau (CFPB) [1] is a United States government organisation focused on consumer protection in the area of financial products. The CFPB handles complaints from consumers about issues that they have with financial organisations, brings those complaints to the attention of the companies, and assists in getting the issues resolved. The Bureau tracks a number of metrics, the primary one being a 3-month rolling average, which it compares to the same period in the previous year. The available Bureau data, along with additional public data, was used to help answer a number of questions of interest to the Bureau, members of the public, and elected officials.
Data Visualisation
PROJECT REPORT
Stephen Redmond, 15021815 | MSCDA2 | 10th December 2016
Contents
Introduction
  Objectives
  Technical Overview
    Interactive data visualisation
    Advanced analytics and visualisation
Methods and Implementation
  Data Acquisition and Initial Analysis
    CFPB Data
    Consumer Credit data
    Population data
  Extract, Transform and Load (ETL)
  Design
    Dashboard
    Analysis
    Reporting
  Advanced Analytics
    R Analysis
Results and Conclusions
  Answers to questions based on the data
    Are the number of complaints increasing or decreasing over time?
    What products are driving the change in complaint volume?
    What companies are people complaining about the most?
    From which States are the highest volume of complaints coming?
    Are there correlations between different complaint types and market conditions?
  Conclusion
References
Introduction
The Consumer Financial Protection Bureau (CFPB) [1] is a United States government organisation that is
focused on consumer protection in the area of financial products. It was created by the Dodd-Frank Act
of 2010 [2] which imposed several regulations on financial institutions. The CFPB handles complaints from
consumers about issues that they have with financial organisations and brings those to the attention of
the companies and assists in getting those issues resolved. It issues a monthly report showing a
high-level snapshot of trends, and also makes anonymised data available to any interested parties.
The Bureau has a number of metrics that it is interested in, the primary one being a 3-month rolling
average, which it compares to the same period in the previous year. It is also interested in
month-to-month trends. Other metrics of interest are the time it takes for the Bureau to send complaints
on to companies (a measure of its own efficiency), the percentage of company responses tagged as
“timely”, and the percentage of complaints that are disputed.
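The primary metric described above can be illustrated with a minimal Python sketch. It assumes monthly complaint counts are already available, keyed by (year, month); the function names and the example counts are hypothetical and are not taken from the CFPB data or its published method.

```python
def rolling_three_month_average(monthly_counts, year, month):
    """Mean complaint count for the three months ending at (year, month)."""
    window = []
    y, m = year, month
    for _ in range(3):
        window.append(monthly_counts[(y, m)])
        # Step back one calendar month, wrapping across the year boundary.
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    return sum(window) / len(window)


def year_on_year_change(monthly_counts, year, month):
    """Percentage change of the rolling average versus the same period a year earlier."""
    current = rolling_three_month_average(monthly_counts, year, month)
    previous = rolling_three_month_average(monthly_counts, year - 1, month)
    return 100.0 * (current - previous) / previous


# Hypothetical monthly complaint counts keyed by (year, month); not CFPB figures.
example_counts = {
    (2015, 6): 100, (2015, 7): 110, (2015, 8): 120,
    (2016, 6): 120, (2016, 7): 130, (2016, 8): 146,
}

current_avg = rolling_three_month_average(example_counts, 2016, 8)
change_pct = year_on_year_change(example_counts, 2016, 8)
```

With these illustrative counts, the June to August 2016 average is 132 complaints per month, 20% higher than the 110 for the same months of 2015.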
This project investigates how publicly available data can be used to extend the reporting
options beyond just the numbers of complaints. By bringing in additional public information, it is shown
that some complaint types seem to track the consumer credit market and some do not. This insight
was discovered using an interactive visualisation tool. A more detailed examination of the
data was performed using the open-source R programming language.
The available data was used to help answer a number of questions that are useful for the Bureau,
members of the public, and elected officials to have answered.
Objectives
There are several objectives in this project.
- Acquire data suitable to calculate the metrics issued by the CFPB.
- Acquire additional data to deliver additional insight.
- Prepare the data and load it into a suitable data repository.
- Design a number of interactive visualisations to present the data to a business user.
- Create more advanced statistical analyses and present them visually.
- Answer a number of questions based on the data:
o Are the number of complaints increasing or decreasing over time?
o What products are driving the change in complaint volume?
o What companies are people complaining about the most?
o From which States are the highest volume of complaints coming?
o Are there correlations between different complaint types and market conditions?
Technical Overview
To achieve the objectives above, it was important to choose technologies that supported the full end-to-
end implementation. In particular, the tool used for presenting the interactive visualisations should fully
support the seven categories of interactivity proposed by Yi et al [3].
Interactive data visualisation
The proposed solution also needs to easily accept data from more than one source. It is necessary, for
example, to be able to bring in both the CFPB complaint information as well as population information
from the US Census Bureau [4] and consumer credit information from the Federal Reserve [5].
Four products were initially examined for the purpose of this project:
- Tableau
- QlikView
- Qlik Sense
- Microsoft PowerBI
The list of products selected is consistent with the industry view of the marketplace, as demonstrated by
the Gartner Group’s “Magic Quadrant” report for business intelligence and analytics
platforms [6]. An image of this quadrant is shown in Figure 1.
Figure 1. Gartner "magic quadrant" for BI products, 2016, showing Tableau, Qlik and Microsoft as leaders.
Table 1 below compares the features of each product as evaluated for this project.
Table 1. Visualisation product comparison against ETL and Yi’s interactivity categories
| Feature | Tableau | QlikView | Qlik Sense | PowerBI |
|---|---|---|---|---|
| Extract, Transform and Load (ETL) | Limited | Yes | Yes | Limited |
| Select | Yes | Some | Yes | Yes |
| Explore | Yes | Yes | Yes | Yes |
| Reconfigure | Yes | Yes | Yes | Yes |
| Encode | Yes | Yes | Yes | Yes |
| Abstract/Elaborate | Some | Some | Yes | Some |
| Filter | Some | Yes | Yes | Some |
| Connect | Yes | With coding | Yes | Yes |
Two features here were important in deciding which product to use for this
project: ETL (used to load data into the tool) and Filter.
In both Tableau and PowerBI, it is quite straightforward to import the raw CSV files for analysis. However,
if anything other than minor transformation is required, then an external pre-processing step
must be performed. On the other hand, both of the Qlik products come with a rich, fully-featured ETL
scripting language.
When filtering in Tableau, the default behaviour is to filter only the object with which the filter is
associated. A configuration step is required to make the filter global. Also, when data is
filtered in one selector, that filter is not reflected in the other selectors. A user could select, for example,
the continent of Europe in one selector box and then the country of Canada in another and have no
visual cue as to why there are no results. PowerBI does not share filtering across separate pages.
Further, selections in charts only brush; there appears to be no function to turn a brushed selection
into a more permanent one.
In this evaluation, Qlik Sense scores well in every category. It is also a more modern looking tool than
some of its rivals, and delivers a better user experience. This is borne out by the recent BARC survey on
business intelligence products [7].
Advanced analytics and visualisation
To create the required advanced analytics, the R programming language was used. R is well suited to such
analytical requirements as it is cross-platform, low cost, and easy-to-use and can help with finding insights
in data [8].
There is always a decision to be made about whether to aggregate more, to remove influence from low-
level counting errors, versus the potential loss of detail due to the aggregation. In this case, it is suitable
to aggregate the data up to month and product level. We can use the BI product to perform that
aggregation for us so as to export it to a format that can be read into R.
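In the project this aggregation was performed inside the BI product. Purely as an illustration of what aggregating to month and product level means, the following Python sketch counts complaints per (month, product) pair. The record layout and the field name `date_received` are assumptions made for the example, and the sample records are invented.

```python
from collections import Counter


def aggregate_by_month_and_product(complaints):
    """Count complaints per (year-month, product) pair."""
    counts = Counter()
    for record in complaints:
        # Assumes an ISO-formatted date string; the first 7 characters are "YYYY-MM".
        year_month = record["date_received"][:7]
        counts[(year_month, record["product"])] += 1
    return counts


# Hypothetical complaint records; field names are illustrative only.
example_complaints = [
    {"date_received": "2016-06-03", "product": "Mortgage"},
    {"date_received": "2016-06-17", "product": "Mortgage"},
    {"date_received": "2016-06-21", "product": "Credit reporting"},
    {"date_received": "2016-07-02", "product": "Mortgage"},
]

monthly_product_counts = aggregate_by_month_and_product(example_complaints)
```

Aggregating at this level trades per-complaint detail for one count per product per month, which is the grain at which the complaint series can be compared with monthly credit figures.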
Methods and Implementation
This section details the technical implementation of the project. Firstly, the data acquisition is described
and how the data was analysed. Then we discuss the ETL process. Following that there is a description of
the design methodology and how it was applied. Finally, there is a discussion on some advanced analytics
that were performed.
Data Acquisition and Initial Analysis
There were three data sources used in this project, as shown in Table 2.
Table 2. List of data sources used in the project
| Data Source | Acquired from |
|---|---|
| CFPB Complaints database [1] | http://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data |
| Federal Reserve Consumer Credit Report (G.19) [5], historical data | https://www.federalreserve.gov/releases/g19/HIST/cc_hist_sa_levels.html |
| US Census Bureau population estimates [4] | https://www.census.gov/popest/ |
CFPB Data
The CFPB data contains 14 fields of data and 627,557 rows.
To help analyse the contents, the data was loaded into the Trifacta tool created by Heer et al [9]. A
screenshot of the data in this tool is presented in Figure 2.
Figure 2. Screenshot of the Trifacta tool showing analysis of the Complaints data.
The Trifacta tool allows the data to be explored and information to be discovered. This
exploration revealed useful information about each field, as shown in Table 3.
Table 3. Fields in the CFPB data
| Field | % Density | # Distinct | Most Common |
|---|---|---|---|
| Product | 100% | 12 | Mortgage (46%) |
| Sub-product | 70.60% | 47 | Other Mortgage (18%) |
| Issue | 100% | 95 | Loan.Modification…(25%) |
| Sub-issue | 39.90% | 68 | Account status (4%) |
| Consumer complaint narrative | 16.20% | 103632 | n/a |
| Company public response | 20.50% | 10 | No public response (8%) |
| Company | 100% | 3847 | Bank of America (13%) |
| State | 99.20% | 62 | CA (16%) |
| ZIP code | 99.20% | 27699 | 48382 (0.35%) |
| Tags | 14% | 3 | Older American (9%) |
| Consumer consent provided? | 98.10% | 5 | N/A (68%) |
| Submitted via | 100% | 6 | Web (54%) |
| Company response to consumer | 100% | 8 | Closed with explanation (75%) |
| Timely response? | 100% | 2 | Yes (98%) |
| Consumer disputed? | 93.70% | 2 | No (77%) |
The information in this table helped inform the design of the solution.
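The per-field figures in Table 3 were read from the Trifacta tool. As a rough illustration of what such a profile measures, the Python sketch below computes density, distinct count and most common value for one field. It assumes that empty strings and `None` count as missing and that the "most common" percentage is a share of all rows; both are assumptions for this example rather than documented Trifacta behaviour, and the sample values are invented.

```python
from collections import Counter


def profile_field(values):
    """Summarise one field: density, distinct count and most common value."""
    total = len(values)
    if total == 0:
        raise ValueError("cannot profile an empty field")
    # Treat None and empty strings as missing (an assumption for this sketch).
    populated = [v for v in values if v not in (None, "")]
    counts = Counter(populated)
    top = counts.most_common(1)
    most_common_value, most_common_count = top[0] if top else (None, 0)
    return {
        "density_pct": 100.0 * len(populated) / total,
        "distinct": len(counts),
        "most_common": most_common_value,
        # Share of all rows, including rows where the field is missing.
        "most_common_pct": 100.0 * most_common_count / total,
    }


# Hypothetical values for a sparsely populated field such as Tags.
example_tags = ["Older American", "", "Servicemember", "Older American", None]
tag_profile = profile_field(example_tags)
```

A low density, as seen for Tags or Sub-issue in Table 3, signals that a field may be of limited use as a dashboard dimension.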
Consumer Credit data
The consumer credit table has only 4 fields. The details are shown in Table 4.
Table 4. Consumer Credit table
| Field | Description |
|---|---|
| Month | Calendar month |
| Revolving | Revolving credit, such as credit card or overdraft. |
| Nonrevolving | Non-revolving credit, such as motor vehicle and student loans (the G.19 series excludes loans secured by real estate, such as mortgages). |
| Total | The total credit for that month. |
Population data
The population data contains only two fields: State and Population. It represents the most up-to-date
estimate of the state populations from the US Census Bureau.
Extract, Transform and Load (ETL)
The data was imported into the Qlik Sense in-memory data store using Qlik’s in-built ETL scripting tool.
This was relatively straightforward as the data was coming from well formatted text files.
The population count data was associated with the complaint data on the State Name field. The consumer
credit (G.19) data is associated using the year and month. A screenshot of Qlik Sense’s Data Model viewer
is shown in Figure 3.
Figure 3. Qlik in-memory data model showing population and consumer credit (G19) data associated to
the complaints data.
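The associations in this data model were built with Qlik’s script. As a language-neutral illustration of the same idea, the Python sketch below attaches the state population and the monthly total credit to each complaint record using the two keys described above. All field names and values are hypothetical, and keeping unmatched complaints with `None` is a design choice of this sketch only.

```python
def associate(complaints, population_by_state, credit_by_month):
    """Attach state population and monthly total credit to each complaint record."""
    enriched = []
    for record in complaints:
        # Assumes an ISO-formatted date string; "YYYY-MM" is the year-month key.
        year_month = record["date_received"][:7]
        enriched.append({
            **record,
            # .get() leaves None where no matching state or month exists.
            "population": population_by_state.get(record["state"]),
            "total_credit": credit_by_month.get(year_month),
        })
    return enriched


# Hypothetical inputs; the numbers are illustrative, not real Census or G.19 values.
example_complaints = [
    {"date_received": "2016-06-03", "state": "AA", "product": "Mortgage"},
    {"date_received": "2016-07-11", "state": "ZZ", "product": "Credit card"},
]
example_population = {"AA": 1_000_000}
example_credit = {"2016-06": 3500.0, "2016-07": 3520.0}

associated = associate(example_complaints, example_population, example_credit)
```

The second record shows why the lookup is tolerant: the State field may contain codes that have no counterpart in the population table.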
Design
Qlik, the company that created the Qlik Sense application, proposes a method of designing data
applications called DAR: Dashboard, Analysis, Reporting [10].
This design method proposes structuring the application with a dashboard designed for users who need
a high-level overview, analysis views for users who need more detail, and report-style tabular views
for users who want lower-level information. This method has been followed in this project to create
the interactive visualisation.
Dashboard
The design for the dashboard is that it should display the most important metrics that a high-level user
would be interested in. The user should be able to see the information that they need to see very quickly
and not have to dive into too much detail.
Pre-attentive perception is supported by using both length and colour on the main metrics. Keeping too
much detail off the dashboard makes it easy to find the information quickly.
The finished dashboard design is shown in Figure 4.
Figure 4. Qlik Sense dashboard showing the main four metrics plus sparklines for trend visualisation and
bar charts showing top products and companies for complaints.
The dashboard presents the four metrics (average complaint numbers, average time to send, percentage
timely and percentage disputed) for the period under examination, June to August 2016. The four bar
charts show, at a glance, the difference between the current period and the previous period.
The bar chart is ideal for this purpose because, as discussed by Ward et al., we have high visual accuracy
for length comparisons [11]. Judgement of length was identified by Cleveland and McGill, in their
seminal paper on the subject, as one of the primary elementary tasks [12]. A pie chart might have been
considered; however, this is not a part-to-whole comparison, so a pie chart would not be appropriate for
this representation. Few argues that a bar graph is better for measuring magnitudes [13].
The dashboard also presents sparklines, originally proposed by Tufte as “intense continuous time-series”
[14]. These give the user a view of how each metric has changed over the 15 months from June 2015 to
August 2016.
Finally, the dashboard gives the user two sorted bar charts that show the top products and top companies.
These charts are interactive and the user can drill into a particular company or product. They can
also change the metric displayed from number of complaints to percentage change since the same period
last year. An alternative option for this display might be a treemap [15]. However, the percentage change
metric can have negative values, which cannot be represented in a treemap.
Analysis
Following the DAR design methodology, several additional analysis screens were designed to allow users
to discover more information. Figure 5 shows four of these views.
Figure 5. Four analysis visualisations: a time series trend of complaints, a map showing the states with
highest complaints per person, a scatter showing population versus complaints and a correlation of
complaints to consumer credit.
These show examples of some of the other common visualisations identified by Ward et al.: line charts,
scatterplots and maps [11].
Reporting
The final stage of the DAR methodology is reporting. A number of reports were created to allow users to
view the information in a tabular fashion. These also allow the user to export data to another format
(e.g. Excel). The reports are shown in Figure 6.
Figure 6. Tabular reports showing the four main metrics by both company.
Advanced Analytics
One of the line charts that was created for the analytics (Figure 5) compares the normalised credit total
(using a min/max formula), a number that has steadily increased over the years represented in the data,
with the normalised complaints total.
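The report does not spell out the min/max formula. A common form, assumed here, rescales each value x to (x - min) / (max - min), so that every series runs from 0 to 1 and series with very different magnitudes can share one axis. A minimal Python sketch with illustrative values:

```python
def min_max_normalise(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # A constant series has no range; map every point to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Illustrative numbers only; not taken from the credit or complaints data.
normalised = min_max_normalise([200, 250, 400])
```

Because the rescaling is linear, it changes the scale of each series but not the shape of its trend.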
Selecting different products in this view appears to demonstrate that the trend of the number of
complaints does not always follow the trend of the credit total. So as to examine this in more detail, the
data was exported to a format that could be read into R and statistical processes were applied.
R Analysis
Once the table of data was loaded into R, it could be iterated over to calculate the correlation and
covariance values for each metric versus the Total Credit value. These are shown in Table 5.
Table 5. Correlation and Covariance of metrics to Total Credit
| Metric | Correlation | Covariance |
|---|---|---|
| Other financial service | 0.615158 | 0.022053 |
| Student loan | 0.653834 | 0.041918 |
| Consumer Loan | 0.961389 | 0.091858 |
| Debt collection | 0.492888 | 0.022036 |
| Money transfers | 0.823979 | 0.048367 |
| Payday loan | -0.065520 | -0.003150 |
| Prepaid card | 0.242387 | 0.006837 |
| Bank account or service | 0.681000 | 0.045585 |
| Credit card | 0.595281 | 0.039000 |
| Mortgage | -0.040260 | -0.001990 |
| Credit reporting | 0.929540 | 0.061012 |
| Total Complaints | 0.928317 | 0.070846 |
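The analysis itself was carried out in R. The Python sketch below is not the author’s script; it only illustrates the calculation described above, iterating over each metric and computing its Pearson correlation and sample covariance against the Total Credit series. The function names and the example series are hypothetical.

```python
import math


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (the convention of R's cov())."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def compare_to_total_credit(series_by_metric, total_credit):
    """Return {metric: (correlation, covariance)} for each series versus total credit."""
    return {
        name: (correlation(values, total_credit), covariance(values, total_credit))
        for name, values in series_by_metric.items()
    }


# Hypothetical monthly series; not the report's data.
total_credit = [1.0, 2.0, 3.0, 4.0]
example_series = {
    "rising": [2.0, 4.0, 6.0, 8.0],
    "falling": [8.0, 6.0, 4.0, 2.0],
}

results = compare_to_total_credit(example_series, total_credit)
```

Correlation is unit-free and bounded between -1 and 1, whereas covariance depends on the scale of both series, which is why the two columns of Table 5 differ in magnitude.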
It can be difficult to see exactly what is going on from these figures alone, although Payday loan and
Mortgage stand out: both show correlation and covariance values that are slightly negative and close to
zero, suggesting little or no linear relationship with total credit. As discussed by Tufte when presenting
Anscombe’s quartet, graphics can be more precise and revealing than statistics [14]. The R plot outputs of
four of these analyses are shown in Figure 7.
Figure 7. Visualisation of consumer credit versus complaints by customer with correlation and covariance
calculations. Student loan, Mortgage, Credit reporting and Total for all products are shown.
Results and Conclusions
At the beginning of this report a number of objectives were established.
- Acquire data suitable to calculate the metrics issued by the CFPB.
- Acquire additional data to deliver additional insight.
- Prepare the data and load it into a suitable data repository.
- Design a number of interactive visualisations to present the data to a business user.
- Create more advanced statistical analyses and present them visually.
- Answer a number of questions based on the data.
The required datasets were acquired and loaded into the Qlik Sense in-memory data repository. This
allowed a number of highly interactive visualisations to be generated.
To enable the more advanced statistical analyses of the data, the BI tool was used to extract aggregated
data for export to the R programming language.
Answers to questions based on the data
In the objectives, a number of questions were identified to be answered. The results of the analyses allow
those questions to be answered as follows:
Are the number of complaints increasing or decreasing over time?
For most products, the number of complaints is increasing over time. However, Mortgages, traditionally
one of the most complained-about products, has seen a year-on-year reduction. It would be interesting
to examine further what is happening here.
Student loan had an unusual spike in 2016. This may be related to changes in the way that federal loans,
mostly processed by Navient, began to be handled by the CFPB this year.
What products are driving the change in complaint volume?
The largest volume of complaints is around credit reporting, which also had a 24.6% year-on-year increase.
It has been steadily increasing since reporting began in 2012.
Debt collection is the second most complained-about product, but it shows a 4.9% year-on-year
decrease.
What companies are people complaining about the most?
Equifax, Experian and TransUnion are the most complained-about companies, together accounting for almost
25% of all complaints in the June to August period. All of them have shown an increase in 2016 over 2015.
This reflects the fact that these three organisations have by far the biggest share of the credit reporting
market.
From which States are the highest volume of complaints coming?
As would be expected, the most complaints come from California, Texas, Florida and New York as these
are the states with the highest populations. However, Georgia shows the highest number of complaints
per head of population. This is something that could be looked into further.
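The difference between ranking states by raw volume and by complaints per head can be sketched as follows. The state codes and figures are invented for illustration, and the rate per 100,000 residents is simply a convenient scale chosen for this example.

```python
def complaints_per_capita(complaints_by_state, population_by_state, per=100_000):
    """Complaints per `per` residents for every state with a known population."""
    rates = {}
    for state, count in complaints_by_state.items():
        population = population_by_state.get(state)
        if population:  # skip states with a missing or zero population
            rates[state] = per * count / population
    return rates


# Hypothetical states and figures; not taken from the CFPB or Census data.
example_complaints_by_state = {"AA": 500, "BB": 300, "CC": 10}
example_population_by_state = {"AA": 1_000_000, "BB": 200_000}

rates = complaints_per_capita(example_complaints_by_state, example_population_by_state)
highest_volume = max(example_complaints_by_state, key=example_complaints_by_state.get)
highest_rate = max(rates, key=rates.get)
```

In this example the state with the most complaints ("AA") is not the state with the highest rate ("BB"), which mirrors the contrast between the large states and Georgia described above.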
Are there correlations between different complaint types and market conditions?
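For total complaints and for several individual products, there appears to be a very close correlation
between the amount of credit outstanding on the market and the number of complaints being made
(Table 5). Mortgage and Payday loan are the notable exceptions.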
This close correlation and covariance would appear to indicate that companies are being complained
about at the same rate as they have been over several years. There is no improvement apparent. Further
study is needed to see what can be done about this situation.
Conclusion
This project set out to acquire a number of datasets, create an interactive visualisation, and then answer
several questions.
It was important to establish the correct tool in which to implement the interactive visualisation. In the
evaluation carried out for this project, Qlik Sense compared favourably with its competitors, especially
when considering both the Extract, Transform and Load (ETL) process and the seven categories
of interactivity proposed by Yi et al. [3].
When designing dashboards, the principles of supporting pre-attentive processing, using colour and
length, have been followed. Best practices recommended by the literature, such as Tufte [14] and Few
[13], have been followed.
When designing the interactive analysis views, following Qlik’s DAR [10] methodology, chart types
recommended by Ward et al. [11] (bar charts, line charts, scatterplots and maps) have been used.
To perform the more advanced correlation and covariance analysis, the data, aggregated by product and
month, was exported to the R programming language. R was used to create the statistical analyses and
produce visualisations to support them.
This project reveals some interesting findings around particular products. The
finding that Mortgage product complaints are not increasing with the increase in consumer credit is
particularly interesting and will require additional information to establish the cause.
The increase of complaints about credit reports is concerning as there are just three companies who
command the majority of the market and these three companies are complained about more often than
any other companies in the data. As there is almost a monopoly of this product space between these three
companies, it would be hoped that the number of complaints would cease to co-vary with the increasing
market as time goes by, showing that the companies are improving their processes, but this has not
happened. Further work is needed to look into the issues that are being complained about so as to
improve this.
References
[1] “Consumer Financial Protection Bureau,” 2016. [Online]. Available: http://www.consumerfinance.gov/. [Accessed: 14-Oct-2016].
[2] United States Government Publishing Office, “Dodd-Frank Wall Street Reform and Consumer Protection Act,” 2010. [Online]. Available: https://www.gpo.gov/fdsys/pkg/PLAW-111publ203.
[3] J. S. Yi, Y. ah Kang, J. Stasko, and J. Jacko, “Toward a deeper understanding of the role of interaction in information visualization,” IEEE Trans. Vis. Comput. Graph., vol. 13, no. 6, pp. 1224-1231, 2007.
[4] United States Census Bureau, “Vintage 2015 Population Estimates: Population Estimates,” 2016. [Online]. Available: https://www.census.gov/popest/. [Accessed: 15-Oct-2016].
[5] Board of Governors of the Federal Reserve System, “Consumer Credit - G.19,” Economic Research & Data, 2016. [Online]. Available: https://www.federalreserve.gov/releases/g19/. [Accessed: 05-Nov-2016].
[6] J. Parenteau, R. L. Sallam, C. Howson, J. Tapadinhas, K. Schlegel, and T. W. Oestreich, “Magic Quadrant for Business Intelligence and Analytics Platforms,” 2016. [Online]. Available: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204. [Accessed: 08-Dec-2016].
[7] BARC, “Comparison of the Best Business Intelligence Software Products in 2016,” 2016. [Online]. Available: https://bi-survey.com/business-intelligence-software-comparison. [Accessed: 08-Dec-2016].
[8] B. Lantz, Machine Learning with R, 2nd ed. Packt Publishing Ltd, 2015.
[9] J. Heer, J. M. Hellerstein, and S. Kandel, “Predictive Interaction for Data Transformation,” in CIDR, 2015.
[10] Qlik, “Dashboard, Analysis, Reporting (DAR),” 2013. [Online]. Available: https://community.qlik.com/servlet/JiveServlet/download/38-77929/Technical Paper - DAR - US LETTER.pdf. [Accessed: 20-Oct-2016].
[11] M. O. Ward, G. Grinstein, and D. Keim, Interactive data visualization: foundations, techniques, and applications, 2nd ed. CRC Press, 2015.
[12] W. S. Cleveland and R. McGill, “Graphical perception: Theory, experimentation, and application to the development of graphical methods,” J. Am. Stat. Assoc., vol. 79, no. 387, pp. 531-554, 1984.
[13] S. Few, “Save the pies for dessert,” Vis. Bus. Intell. Newsl., pp. 1-14, 2007.
[14] E. R. Tufte, The Visual Display of Quantitative Information, 2nd ed. Cheshire, CT: Graphics Press, 2010.
[15] B. Shneiderman, “Tree visualization with tree-maps: 2-d space-filling approach,” ACM Trans. Graph., vol. 11, no. 1, pp. 92-99, 1992.