
Project Report - Consumer Financial Protection Bureau

December 2016


DOI:10.13140/RG.2.2.29131.64802

Affiliation: National College of Ireland

Authors:

Stephen Redmond

National College of Ireland

The Consumer Financial Protection Bureau (CFPB) [1] is a United States government organisation focused on consumer protection in the area of financial products. The CFPB handles complaints from consumers about issues they have with financial organisations, brings those issues to the attention of the companies concerned, and assists in getting them resolved. The Bureau tracks a number of metrics, the primary one being a 3-month rolling average, which it compares with the same period in the previous year. The available Bureau data, along with additional public data, was used to answer a number of questions of interest to the Bureau, members of the public, and elected officials.


Data Visualisation

PROJECT REPORT

Stephen Redmond, 15021815 | MSCDA2 | 10th December 2016

Contents
Introduction
  Objectives
  Technical Overview
    Interactive data visualisation
    Advanced analytics and visualisation
Methods and Implementation
  Data Acquisition and Initial Analysis
    CFPB Data
    Consumer Credit data
    Population data
  Extract, Transform and Load (ETL)
  Design
    Dashboard
    Analysis
    Reporting
  Advanced Analytics
    R Analysis
Results and Conclusions
  Answers to questions based on the data
    Is the number of complaints increasing or decreasing over time?
    What products are driving the change in complaint volume?
    What companies are people complaining about the most?
    From which States is the highest volume of complaints coming?
    Are there correlations between different complaint types and market conditions?
  Conclusion
References

Introduction

The Consumer Financial Protection Bureau (CFPB) [1] is a United States government organisation focused on consumer protection in the area of financial products. It was created by the Dodd-Frank Act of 2010 [2], which imposed several regulations on financial institutions. The CFPB handles complaints from consumers about issues they have with financial organisations, brings those issues to the attention of the companies concerned, and assists in getting them resolved. It issues a monthly report showing a high-level snapshot of trends, and it makes anonymised data available to any interested party.

The Bureau tracks a number of metrics, the primary one being a 3-month rolling average of complaint numbers, which it compares with the same period in the previous year. It is also interested in month-to-month trends. Other metrics of interest are the time the Bureau takes to send complaints on to companies (a measure of its own efficiency), the time companies take to respond (tagged as “timely”), and the level of disputed complaints.
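The headline metric can be sketched as follows: a 3-month rolling average of monthly complaint counts, compared with the same three months one year earlier. This is a minimal illustration under stated assumptions, not the Bureau's published calculation; the helper names and the sample counts are hypothetical.

```python
from typing import Dict, Tuple

Month = Tuple[int, int]  # (year, month)


def shift_month(ym: Month, delta: int) -> Month:
    """Return the (year, month) that lies `delta` months away from `ym`."""
    year, month = ym
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def rolling_3m_average(counts: Dict[Month, int], end: Month) -> float:
    """Average monthly count over the three months ending at `end` (inclusive)."""
    months = [shift_month(end, -i) for i in range(3)]
    return sum(counts[m] for m in months) / 3


def year_on_year_change(counts: Dict[Month, int], end: Month) -> float:
    """Percentage change of the 3-month average versus the same period a year earlier."""
    current = rolling_3m_average(counts, end)
    previous = rolling_3m_average(counts, shift_month(end, -12))
    return (current - previous) / previous * 100


# Hypothetical monthly complaint counts, for illustration only.
counts = {
    (2015, 6): 100, (2015, 7): 110, (2015, 8): 90,
    (2016, 6): 120, (2016, 7): 125, (2016, 8): 115,
}
print(rolling_3m_average(counts, (2016, 8)))             # 120.0
print(round(year_on_year_change(counts, (2016, 8)), 1))  # 20.0
```

With these invented counts, the June to August 2016 average of 120 against 100 a year earlier gives a 20% year-on-year increase.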

This project investigates how publicly available data can be used to extend the reporting options beyond simple complaint counts. Bringing in additional public information shows that some complaint types seem to track the consumer credit market while others do not. This insight was discovered using an interactive visualisation tool, and a more detailed examination of the data was then performed using the open-source R programming language.

The available data was used to answer a number of questions of interest to the Bureau, members of the public, and elected officials.

Objectives

The project has the following objectives:

- Acquire data suitable for calculating the metrics issued by the CFPB.
- Acquire additional data to deliver further insight.
- Prepare the data and load it into a suitable data repository.
- Design a number of interactive visualisations to present the data to a business user.
- Create more advanced statistical analyses and present them visually.
- Answer a number of questions based on the data:
  - Is the number of complaints increasing or decreasing over time?
  - What products are driving the change in complaint volume?
  - What companies are people complaining about the most?
  - From which States is the highest volume of complaints coming?
  - Are there correlations between different complaint types and market conditions?

Technical Overview

To achieve the objectives above, it was important to choose technologies that supported the full end-to-end implementation. In particular, the tool used to present the interactive visualisations should fully support the seven categories of interactivity proposed by Yi et al. [3]: Select, Explore, Reconfigure, Encode, Abstract/Elaborate, Filter and Connect.

Interactive data visualisation

The proposed solution also needs to accept data easily from more than one source. For example, it must be able to bring in the CFPB complaint information as well as population information from the US Census Bureau [4] and consumer credit information from the Federal Reserve [5].

Four products were initially examined for this project:

- Tableau
- QlikView
- Qlik Sense
- Microsoft PowerBI

The products selected are consistent with the industry view of the marketplace, as shown by the Gartner Group’s “magic quadrant” report for business intelligence and analytics platforms [6]. An image of this quadrant is shown in Figure 1.

Figure 1. Gartner "magic quadrant" for BI products, 2016, showing Tableau, Qlik and Microsoft as leaders.


Table 1 below compares the features of each product as evaluated for this project.

Table 1. Visualisation product comparison against ETL and Yi’s interactivity categories

Feature

Tableau

QlikView

Qlik Sense

PowerBI

Extract, Transform and Load (ETL)

Limited

Yes

Limited

Select

Yes

Some

Yes

Explore

Yes

Reconfigure

Yes

Encode

Yes

Abstract/Elaborate

Some

Yes

Some

Filter

Some

Yes

Some

Connect

Yes

With coding

Yes

Two of these features were decisive in choosing a product for this project: ETL (used to load data into the tool) and Filter.

In both Tableau and PowerBI, it is quite straightforward to import raw CSV files for analysis. However, if anything more than minor transformation is required, an external pre-processing step must be performed. Both of the Qlik products, on the other hand, come with a rich, fully featured ETL scripting language.

When filtering in Tableau, the default behaviour is to filter only the object to which the filter is associated, and a configuration step is required to make filtering global. Also, when data is filtered in one selector, that filter is not reflected in the other selectors. A user could, for example, select the continent of Europe in one selector box and then the country of Canada in another, and have no visual cue as to why there are no results. PowerBI does not share filtering across separate pages. Further, selections in charts only brush (highlight) the related data; there appears to be no function that allows a brushed selection to be made permanent.
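To make the filtering behaviour described above concrete, the following toy sketch shows global filtering in which a selection in any field restricts the values still possible in every other field, so a user can see immediately that Canada is no longer a valid choice once Europe is selected. It is an illustration under simple assumptions (a flat list of records and exact-match selections), not how Qlik, Tableau or PowerBI implement filtering; the field names and rows are hypothetical.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def apply_selections(rows: List[Row], selections: Dict[str, Set[str]]) -> List[Row]:
    """Keep only rows that match every active selection (a global filter)."""
    return [
        row for row in rows
        if all(row[field] in allowed for field, allowed in selections.items())
    ]


def possible_values(rows: List[Row], selections: Dict[str, Set[str]], field: str) -> Set[str]:
    """Values of `field` still compatible with the selections made in the other fields."""
    others = {f: v for f, v in selections.items() if f != field}
    return {row[field] for row in apply_selections(rows, others)}


# Hypothetical rows, for illustration only.
rows = [
    {"continent": "Europe", "country": "Ireland"},
    {"continent": "Europe", "country": "France"},
    {"continent": "North America", "country": "Canada"},
]
selections = {"continent": {"Europe"}}
print(sorted(possible_values(rows, selections, "country")))  # ['France', 'Ireland']
```

A selector that displays only `possible_values` (or greys out the rest) gives the visual cue that is missing when filters are applied object by object.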

In this evaluation, Qlik Sense scores well in every category. It is also more modern-looking than some of its rivals and delivers a better user experience, a view borne out by the recent BARC survey of business intelligence products [7].

Advanced analytics and visualisation

To create the required advanced analytics, the R programming language was used. R is well suited to such analytical requirements: it is cross-platform, low-cost and easy to use, and it can help with finding insights in data [8].

There is always a trade-off between aggregating more, to remove the influence of low-level counting errors, and the loss of detail that aggregation causes. In this case, it is suitable to aggregate the data to month and product level. The BI product can perform that aggregation and export the result in a format that can be read into R.
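The month-and-product aggregation can be sketched in a few lines. In the project it was done in the BI tool; this stand-alone version is only an illustration, assuming each complaint record carries an ISO-format received date and a product name. The keys `date_received` and `product` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def aggregate_by_month_and_product(records: Iterable[Dict[str, str]]) -> Counter:
    """Count complaints per (YYYY-MM, product) pair."""
    counts: Counter = Counter()
    for rec in records:
        month = rec["date_received"][:7]  # "2016-08-14" -> "2016-08"
        counts[(month, rec["product"])] += 1
    return counts


def to_csv(counts: Counter) -> str:
    """Serialise the aggregate as CSV text that R's read.csv() can load."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "product", "complaints"])
    for (month, product), n in sorted(counts.items()):
        writer.writerow([month, product, n])
    return buffer.getvalue()


# Hypothetical complaint records, for illustration only.
records = [
    {"date_received": "2016-08-01", "product": "Mortgage"},
    {"date_received": "2016-08-15", "product": "Mortgage"},
    {"date_received": "2016-07-03", "product": "Student loan"},
]
print(to_csv(aggregate_by_month_and_product(records)))
```

Collapsing 627,557 complaint rows to one row per month and product keeps the file small while preserving the trend information needed for the correlation analysis.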

Methods and Implementation

This section details the technical implementation of the project. First, the data acquisition and initial analysis are described. The ETL process is then discussed, followed by a description of the design methodology and how it was applied. Finally, the advanced analytics that were performed are discussed.

Data Acquisition and Initial Analysis

Three data sources were used in this project, as shown in Table 2.

Table 2. List of data sources used in the project

Data Source | Acquired from
CFPB Complaints database [1] | http://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data
Federal Reserve Consumer Credit Report (G.19) [5], historical data | https://www.federalreserve.gov/releases/g19/HIST/cc_hist_sa_levels.html
US Census Bureau population estimates [4] | https://www.census.gov/popest/

CFPB Data

The CFPB data contains 14 fields and 627,557 rows. To help analyse its contents, the data was loaded into the Trifacta tool created by Heer et al. [9]. A screenshot of the data in this tool is presented in Figure 2.

Figure 2. Screenshot of the Trifacta tool showing analysis of the Complaints data.

Trifacta allows the data to be explored and information to be discovered. This exploration revealed useful information about each field, as shown in Table 3.

Table 3. Fields in the CFPB data

Field | % Density | # Distinct | Most Common
Product | 100% | | Mortgage (46%)
Sub-product | 70.60% | | Other Mortgage (18%)
Issue | 100% | | Loan.Modification… (25%)
Sub-issue | 39.90% | | Account status (4%)
Consumer complaint narrative | 16.20% | 103632 | n/a
Company public response | 20.50% | | No public response (8%)
Company | 100% | 3847 | Bank of America (13%)
State | 99.20% | | CA (16%)
ZIP code | 99.20% | 27699 | 48382 (0.35%)
Tags | 14% | | Older American (9%)
Consumer consent provided? | 98.10% | | N/A (68%)
Submitted via | 100% | | Web (54%)
Company response to consumer | 100% | | Closed with explanation (75%)
Timely response? | 100% | | Yes (98%)
Consumer disputed? | 93.70% | | No (77%)

The information in this table helped inform the design of the solution.
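The three statistics in Table 3 can be reproduced with a simple profiling pass over one column. This sketch assumes that an empty string marks a missing value, that density is the share of non-empty cells, and that the most-common share is taken against all rows; Trifacta's own definitions may differ, and the sample column is invented.

```python
from collections import Counter
from typing import List, Optional, Tuple


def profile(values: List[str]) -> Tuple[float, int, Optional[Tuple[str, float]]]:
    """Return (% density, # distinct, (most common value, % of all rows))."""
    present = [v for v in values if v != ""]  # empty string treated as missing
    density = 100 * len(present) / len(values) if values else 0.0
    counts = Counter(present)
    if not counts:
        return density, 0, None
    value, n = counts.most_common(1)[0]
    return density, len(counts), (value, 100 * n / len(values))


# Hypothetical State column, for illustration only.
column = ["CA", "TX", "CA", "", "NY", "CA", "FL", "TX"]
print(profile(column))  # (87.5, 4, ('CA', 37.5))
```

Running the same function over every column of the complaint file would yield one row of Table 3 per field.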

Consumer Credit data

The consumer credit table has only four fields, as shown in Table 4.

Table 4. Consumer Credit table

Field | Description
Month | Calendar month.
Revolving | Revolving credit, such as credit cards or overdrafts.
Nonrevolving | Non-revolving credit, such as student loans and motor vehicle loans (G.19 excludes loans secured by real estate, such as mortgages).
Total | The total credit for that month.

Population data

The population data contains only two fields, State and Population. It represents the most up-to-date estimate of state populations from the US Census Bureau (the Vintage 2015 estimates [4]).

Extract, Transform and Load (ETL)

The data was imported into the Qlik Sense in-memory data store using Qlik’s built-in ETL scripting tool. This was relatively straightforward, as the data came from well-formatted text files.

The population data was associated with the complaint data on the State Name field, and the consumer credit (G19) data was associated using the year and month. A screenshot of Qlik Sense’s Data Model viewer is shown in Figure 3.
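Qlik associates tables automatically on shared fields. The two associations described above can be mimicked with plain dictionaries, as in the sketch below, which also derives a complaints-per-person figure of the kind shown on the state map. It assumes the state key has already been harmonised between the complaint and population tables; the per-100,000 scaling, the function names and all table contents are hypothetical.

```python
from typing import Dict, List, Tuple


def complaints_per_100k(complaints: List[Dict[str, str]], population: Dict[str, int]) -> Dict[str, float]:
    """Count complaints per state, join to population on the state key, and scale per 100,000 people."""
    counts: Dict[str, int] = {}
    for rec in complaints:
        counts[rec["state"]] = counts.get(rec["state"], 0) + 1
    # States with no population match are dropped, as in an inner join.
    return {s: 100_000 * n / population[s] for s, n in counts.items() if s in population}


def attach_credit(monthly_complaints: Dict[str, int], credit: Dict[str, float]) -> List[Tuple[str, int, float]]:
    """Pair each month's complaint total with the credit total for the same year-month key."""
    return [(ym, n, credit[ym]) for ym, n in sorted(monthly_complaints.items()) if ym in credit]


# Hypothetical inputs, for illustration only.
complaints = [{"state": "GA"}, {"state": "GA"}, {"state": "CA"}]
population = {"GA": 200_000, "CA": 400_000}
per_capita = complaints_per_100k(complaints, population)
print(per_capita)                           # {'GA': 1.0, 'CA': 0.25}
print(max(per_capita, key=per_capita.get))  # GA
```

Normalising by population is what allows a smaller state to rank above the most populous ones on a per-person map.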

Figure 3. Qlik in-memory data model showing population and consumer credit (G19) data associated with the complaints data.

Design

Qlik, the company that created the Qlik Sense application, proposes a method of designing data applications called DAR (Dashboard, Analysis, Reporting) [10].

This method structures the application with a dashboard for users who need a high-level overview, analysis views for users who need more detail, and report-style tabular views for users who want lower-level information. The interactive visualisation in this project was created following the DAR method.

Dashboard

The dashboard should display the most important metrics that a high-level user would be interested in. Users should be able to see the information they need very quickly, without having to dive into too much detail.

Pre-attentive perception is supported by using both length and colour to encode the main metrics. Keeping unnecessary detail off the dashboard makes the important information quick to find.

The finished dashboard design is shown in Figure 4.

Figure 4. Qlik Sense dashboard showing the four main metrics plus sparklines for trend visualisation, and bar charts showing the top products and companies for complaints.

The dashboard presents the four metrics (average complaint numbers, average time to send, percentage timely and percentage disputed) for the period under examination, June to August 2016. The four bar charts show, at a glance, the difference between the current period and the previous period.
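A sketch of how the four dashboard figures might be computed for one period from complaint-level records. It assumes each record carries a received date and a sent-to-company date, plus the “Timely response?” and “Consumer disputed?” flags from Table 3; the dictionary keys, the exclusion of blank dispute values and the sample rows are illustrative assumptions, not the dashboard's actual expressions.

```python
from datetime import date
from typing import Dict, List


def period_metrics(records: List[Dict], months_in_period: int = 3) -> Dict[str, float]:
    """Compute the four dashboard metrics for one period of complaint records."""
    if not records:
        raise ValueError("no complaints in the period")
    n = len(records)
    days_to_send = [(r["sent"] - r["received"]).days for r in records]
    # Complaints with a blank dispute flag are excluded from the dispute rate (an assumption).
    answered = [r for r in records if r["disputed"] in ("Yes", "No")]
    return {
        "avg_complaints_per_month": n / months_in_period,
        "avg_days_to_send": sum(days_to_send) / n,
        "pct_timely": 100 * sum(r["timely"] == "Yes" for r in records) / n,
        "pct_disputed": (100 * sum(r["disputed"] == "Yes" for r in answered) / len(answered)
                         if answered else float("nan")),
    }


# Hypothetical June to August records, for illustration only.
records = [
    {"received": date(2016, 6, 1), "sent": date(2016, 6, 3), "timely": "Yes", "disputed": "No"},
    {"received": date(2016, 7, 4), "sent": date(2016, 7, 4), "timely": "Yes", "disputed": "Yes"},
    {"received": date(2016, 8, 9), "sent": date(2016, 8, 13), "timely": "No", "disputed": ""},
]
print(period_metrics(records))
```

Running the same function on the matching period of the previous year gives the second bar for each metric.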

The bar chart is ideal for this purpose because, as discussed by Ward et al., people have high visual accuracy for length comparisons [11]. Cleveland and McGill, in their seminal paper on graphical perception, identify judgement of length as one of the elementary perceptual tasks [12]. A pie chart might have been considered; however, this is not a part-to-whole comparison, so a pie would not be appropriate. Few also argues that a bar graph is better for comparing magnitudes [13].

The dashboard also presents sparklines, originally proposed by Tufte as “intense continuous time-series” [14]. These show the user how each metric has changed over the 15 months from June 2015 to August 2016.

Finally, the dashboard gives the user two sorted bar charts showing the top products and top companies. These charts are interactive, and the user can drill into a particular company or product. The user can also change the displayed metric from the number of complaints to the percentage change since the same period last year. A treemap [15] might have been an alternative for this display; however, the percentage change metric can take negative values, which cannot be represented as areas in a treemap.
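The ranking behind the sorted bar charts, and the reason a treemap was ruled out, can be illustrated briefly. The company names and counts below are made up; the compatibility check simply encodes the fact that treemap rectangles represent values as areas, which cannot be negative.

```python
from typing import Dict, List, Tuple


def pct_change(current: Dict[str, int], previous: Dict[str, int]) -> Dict[str, float]:
    """Percentage change per item since the same period last year (items with a non-zero previous count)."""
    return {
        k: 100 * (current[k] - previous[k]) / previous[k]
        for k in current
        if previous.get(k)
    }


def top_n(values: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Items sorted by descending value, as displayed in a sorted bar chart."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


def treemap_compatible(values: Dict[str, float]) -> bool:
    """A treemap sizes rectangles by value, so every value must be non-negative."""
    return all(v >= 0 for v in values.values())


# Hypothetical complaint counts for two comparable periods.
current = {"Company A": 150, "Company B": 80, "Company C": 45}
previous = {"Company A": 120, "Company B": 100, "Company C": 45}
changes = pct_change(current, previous)
print(top_n(changes, 2))            # [('Company A', 25.0), ('Company C', 0.0)]
print(treemap_compatible(changes))  # False, because Company B fell by 20%
```

A bar chart handles the negative value naturally by drawing the bar to the left of (or below) the zero line.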

Analysis

Following the DAR design methodology, several additional analysis screens were designed to allow users to discover more information. Figure 5 shows four of these views.

Figure 5. Four analysis visualisations: a time series trend of complaints, a map showing the states with the highest complaints per person, a scatterplot of population versus complaints, and a correlation of complaints to consumer credit.

These illustrate some of the other common visualisation types identified by Ward et al. [11]: line charts, scatterplots and maps.

Reporting

The final stage of the DAR methodology is reporting. A number of reports were created to allow users to view the information in tabular form. These also allow the user to export data to another format (e.g. Excel). The reports are shown in Figure 6.


Figure 6. Tabular reports showing the four main metrics by both company.

Advanced Analytics

One of the line charts created for the analysis (Figure 5) compares the normalised credit total (using a min/max formula), a figure that has risen steadily over the years covered by the data, with the normalised complaints total.

Selecting different products in this view suggests that the trend in the number of complaints does not always follow the trend in the credit total. To examine this in more detail, the data was exported to a format that could be read into R, and statistical analyses were applied.
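The min/max normalisation used in that chart rescales each series linearly to the range 0 to 1, x' = (x - min) / (max - min), so that a credit total and a complaint count, which differ by orders of magnitude, can share one axis. A minimal sketch with invented input values:

```python
from typing import List


def min_max_normalise(series: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # a flat series carries no trend information
    return [(x - lo) / (hi - lo) for x in series]


credit = [3400.0, 3450.0, 3500.0, 3600.0]  # hypothetical monthly credit totals
complaints = [21000, 19000, 23000, 25000]  # hypothetical monthly complaint counts
print(min_max_normalise(credit))  # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalise(complaints))
```

Because the transform is linear, it changes the scale of each line but not its shape, so diverging trends remain visible after normalisation.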

R Analysis

Once the table of data was loaded into R, it could be iterated over to calculate the correlation and covariance of each metric against the Total Credit value. The results are shown in Table 5.

Table 5. Correlation and Covariance of metrics to Total Credit

Metric | Correlation | Covariance
Other financial service | 0.615158 | 0.022053
Student loan | 0.653834 | 0.041918
Consumer Loan | 0.961389 | 0.091858
Debt collection | 0.492888 | 0.022036
Money transfers | 0.823979 | 0.048367
Payday loan | -0.065520 | -0.003150
Prepaid card | 0.242387 | 0.006837
Bank account or service | 0.681000 | 0.045585
Credit card | 0.595281 | 0.039000
Mortgage | -0.040260 | -0.001990
Credit reporting | 0.929540 | 0.061012
Total Complaints | 0.928317 | 0.070846

It can be difficult to see exactly what is going on from the numbers alone, although Payday loan and Mortgage stand out, with correlation and covariance values close to zero (and slightly negative), suggesting that complaints about these products do not track total credit. As Tufte discusses when presenting Anscombe’s quartet, graphics can be more precise and reveal more than statistics [14]. The R plot outputs for four of these analyses are shown in Figure 7.

Figure 7. Visualisation of consumer credit versus complaints by product, with correlation and covariance calculations. Student loan, Mortgage, Credit reporting and Total for all products are shown.

Results and Conclusions

At the beginning of this report, a number of objectives were established:

- Acquire data suitable for calculating the metrics issued by the CFPB.
- Acquire additional data to deliver further insight.
- Prepare the data and load it into a suitable data repository.
- Design a number of interactive visualisations to present the data to a business user.
- Create more advanced statistical analyses and present them visually.
- Answer a number of questions based on the data.

The required datasets were acquired and loaded into the Qlik Sense in-memory data repository, which allowed a number of highly interactive visualisations to be generated.

To enable the more advanced statistical analyses, the BI tool was used to extract aggregated data for export to the R programming language.

Answers to questions based on the data

A number of questions were identified in the objectives. The results of the analyses answer them as follows:

Is the number of complaints increasing or decreasing over time?

For most products, the number of complaints is increasing over time. However, Mortgage, traditionally one of the most complained-about products, has seen a year-on-year reduction. It would be interesting to examine further what is happening here.

Student loan complaints showed an unusual spike in 2016. This may be related to changes in the way that federal loans, mostly processed by Navient, began to be handled by the CFPB this year.

What products are driving the change in complaint volume?

The largest volume of complaints concerns credit reporting, which also showed a 24.6% year-on-year increase. Complaints in this category have been increasing steadily since reporting began in 2012.

Debt collection is the second most complained-about product, but it shows a 4.9% year-on-year decrease.

What companies are people complaining about the most?

Equifax, Experian and TransUnion are the most complained-about companies, accounting for almost 25% of all complaints in the June to August period. All three have shown an increase in 2016 over 2015. This reflects the fact that these three organisations have by far the biggest share of the credit reporting market.

From which States is the highest volume of complaints coming?

As would be expected, the most complaints come from California, Texas, Florida and New York, as these are the states with the highest populations. However, Georgia shows the highest number of complaints per head of population, which could be investigated further.

Are there correlations between different complaint types and market conditions?

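Overall, there appears to be a very close correlation between the amount of credit outstanding on the market and the number of complaints being made (0.93 for total complaints in Table 5), although Mortgage and Payday loan complaints do not follow this pattern.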

This close correlation and covariance would appear to indicate that companies are being complained about at the same rate as they have been over several years, with no apparent improvement. Further study is needed to see what can be done about this situation.

Conclusion

This project set out to acquire a number of datasets, create an interactive visualisation, and then answer several questions.

It was important to choose the right tool for the interactive visualisation, and the evaluation found that Qlik Sense compared favourably with its competitors, both for the Extract, Transform and Load (ETL) process and against the seven categories of interactivity proposed by Yi et al. [3].

In designing the dashboard, the principle of supporting pre-attentive processing through colour and length was followed, together with best practices recommended in the literature, such as Tufte [14] and Few [13].

In designing the interactive analysis views, following Qlik’s DAR methodology [10], the chart types recommended by Ward et al. [11] (bar charts, line charts, scatterplots and maps) were used.

To perform the more advanced correlation and covariance analysis, the data, aggregated by product and month, was exported to the R programming language. R was used to carry out the statistical analyses and produce visualisations to support them.

The project reveals some interesting results for particular products. The finding that Mortgage complaints are not increasing with the increase in consumer credit is particularly interesting, and additional information will be required to establish the cause.

The increase in complaints about credit reports is concerning, as just three companies command the majority of the market, and these three companies are complained about more often than any other companies in the data. As these three companies dominate this product space, it would be hoped that, over time, the number of complaints would cease to co-vary with the growing market, showing that the companies are improving their processes; this has not happened. Further work is needed to look into the issues being complained about so that this can be improved.

References

[1] “Consumer Financial Protection Bureau,” 2016. [Online]. Available: http://www.consumerfinance.gov/. [Accessed: 14-Oct-2016].
[2] United States Government Publishing Office, “Dodd-Frank Wall Street Reform and Consumer Protection Act,” 2010. [Online]. Available: https://www.gpo.gov/fdsys/pkg/PLAW-111publ203.
[3] J. S. Yi, Y. ah Kang, J. Stasko, and J. Jacko, “Toward a deeper understanding of the role of interaction in information visualization,” IEEE Trans. Vis. Comput. Graph., vol. 13, no. 6, pp. 1224–1231, 2007.
[4] United States Census Bureau, “Vintage 2015 Population Estimates: Population Estimates,” 2016. [Online]. Available: https://www.census.gov/popest/. [Accessed: 15-Oct-2016].
[5] Board of Governors of the Federal Reserve System, “Consumer Credit - G.19,” Economic Research & Data, 2016. [Online]. Available: https://www.federalreserve.gov/releases/g19/. [Accessed: 05-Nov-2016].
[6] J. Parenteau, R. L. Sallam, C. Howson, J. Tapadinhas, K. Schlegel, and T. W. Oestreich, “Magic Quadrant for Business Intelligence and Analytics Platforms,” 2016. [Online]. Available: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204. [Accessed: 08-Dec-2016].
[7] BARC, “Comparison of the Best Business Intelligence Software Products in 2016,” 2016. [Online]. Available: https://bi-survey.com/business-intelligence-software-comparison. [Accessed: 08-Dec-2016].
[8] B. Lantz, Machine Learning with R, 2nd ed. Packt Publishing Ltd, 2015.
[9] J. Heer, J. M. Hellerstein, and S. Kandel, “Predictive Interaction for Data Transformation,” in CIDR, 2015.
[10] Qlik, “Dashboard, Analysis, Reporting (DAR),” 2013. [Online]. Available: https://community.qlik.com/servlet/JiveServlet/download/38-77929/Technical Paper - DAR - US LETTER.pdf. [Accessed: 20-Oct-2016].
[11] M. O. Ward, G. Grinstein, and D. Keim, Interactive Data Visualization: Foundations, Techniques, and Applications, 2nd ed. CRC Press, 2015.
[12] W. S. Cleveland and R. McGill, “Graphical perception: Theory, experimentation, and application to the development of graphical methods,” J. Am. Stat. Assoc., vol. 79, no. 387, pp. 531–554, 1984.
[13] S. Few, “Save the pies for dessert,” Vis. Bus. Intell. Newsl., pp. 1–14, 2007.
[14] E. R. Tufte, The Visual Display of Quantitative Information, 2nd ed. Cheshire, CT: Graphics Press, 2010.
[15] B. Shneiderman, “Tree visualization with tree-maps: 2-d space-filling approach,” ACM Trans. Graph., vol. 11, no. 1, pp. 92–99, 1992.
