There is a group of professionals fundamental to the success of Open Banking who aren’t getting the recognition they deserve: the data scientists. These unsung heroes have spent up to 18 months understanding and analysing the fire hose of transaction data coming out of every bank, perfecting the standardisation and categorisation of that data so that the rest of us can understand and use it.
Making bank account information usable is no small feat. It’s a huge and ongoing challenge, but one that is essential to ensuring the accuracy of the products and services that Open Banking enables.
To celebrate the data scientists of Open Banking, we caught up with Alex Sadler, Head of Data Science, Risk and Compliance at OpenWrks. Alex shares his insight into the types of data returned by banks, the process of turning that messy data into a standardised and categorised format, and the effect that GDPR will have on the financial services market.
I’ve worked in the financial services industry for 15 years, with a background primarily in credit risk at both big corporates and start-up businesses.
Throughout my career, I’ve focused on using sophisticated algorithms to build models that profile people and small businesses in order to assess their risk: essentially, risk profiling.
About three years ago, I joined OpenWrks, where I’ve been profiling consumers as well as directors and their small businesses, helping them understand what information is available about them and what that information means for them, whether that’s via the credit reference agencies or a wider digital profile that includes key areas online and social media.
Now I’m adding Open Banking data to that equation. My team of data scientists develops, builds and continuously improves our models for analysing Open Data, which is gathered from a variety of sources including Open Banking.
I am also responsible for our rules and policies on risks to our business, with customer and data security at the very top of our priority list.
Picture a bank statement in electronic form: you’ve got individual transactions listed, each with a date, a description and an amount. Some banks give people access to more information in their online banking and, if they do, PSD2 requires that extra information to be made available via Open Banking as well. As you can imagine, even this information varies a lot from bank to bank.
On top of that, each bank makes a different amount of historic data available to its customers: some go back three years, some five. So even on a single account, you can very quickly see a significant amount of information.
PSD2 rules allow people and small businesses to give third-party providers (TPPs) like us access to this financial information, but only if those TPPs are regulated by the FCA to do so.
This is the principle of PSD2: to enable transactional information to be passed to businesses that people trust, in a secure and transparent way.
There’s no dressing it up: it’s tricky!
There is an Open Banking standard that defines the order and basic structure that the participating banks (the CMA9) have to use when returning Open Banking data. However, the data content itself is generally not consistent across banks.
As a simple example, each bank might present a date in a slightly different way. The month could be written in letters alongside a two-digit year, or the date could use a four-digit year and a numeric month, separated by slashes or dots. Banks use many different date formats that don’t match one another, and it gets even more complex when it comes to balance types and transaction descriptions.
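To make the date problem concrete, here is a minimal Python sketch of one common approach: trying a list of known formats in turn until one parses. The format list is a hypothetical example rather than any bank’s actual output, and the sketch assumes day-first ordering (UK convention) for numeric dates. In practice, an ambiguous value such as 05/03/2024 would need to be resolved per bank rather than guessed.

```python
from datetime import date, datetime

# Hypothetical formats a bank might use; not taken from any real bank's API.
# Numeric dates are assumed to be day-first (UK convention).
# Note: %b (abbreviated month name) depends on the current locale.
CANDIDATE_FORMATS = [
    "%d-%b-%y",  # 05-Mar-24
    "%d %b %Y",  # 05 Mar 2024
    "%d/%m/%Y",  # 05/03/2024
    "%d.%m.%Y",  # 05.03.2024
    "%Y-%m-%d",  # 2024-03-05
]


def normalise_date(raw: str) -> date:
    """Return a datetime.date by trying each known format in turn."""
    cleaned = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # this format doesn't fit; try the next one
    raise ValueError(f"Unrecognised date format: {raw!r}")


for raw in ["05-Mar-24", "05 Mar 2024", "05/03/2024", "05.03.2024", "2024-03-05"]:
    print(raw, "->", normalise_date(raw).isoformat())
```

Every input above maps to the same ISO date. Anything the list doesn’t recognise raises an error rather than being silently misread.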
This is why we do what we do.
As an Account Information Services Provider (AISP) offering standardisation, OpenWrks integrates with each participating bank: we manage those individual API connections and have created a standardised format for the data. Our API is essentially a one-stop shop where businesses can easily access customer account data, regardless of which bank their customer uses, and bring that data into their own systems and processes in a consistent, structured format.
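A minimal sketch of the standardisation idea follows. It assumes two hypothetical banks whose field names and conventions are invented for illustration: each bank gets its own adapter function, and every adapter produces the same `StandardTransaction` record. This is not OpenWrks’s API or schema, just one way to structure the mapping.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class StandardTransaction:
    booked: date
    description: str
    amount: Decimal  # signed: negative = money out, positive = money in


def _clean(text: str) -> str:
    """Collapse runs of whitespace and upper-case the description."""
    return " ".join(text.split()).upper()


# Hypothetical Bank A: ISO dates, unsigned amount plus a debit/credit flag.
def from_bank_a(raw: dict) -> StandardTransaction:
    sign = -1 if raw["indicator"] == "Debit" else 1
    return StandardTransaction(
        booked=datetime.strptime(raw["bookingDate"], "%Y-%m-%d").date(),
        description=_clean(raw["info"]),
        amount=sign * Decimal(raw["amount"]),
    )


# Hypothetical Bank B: day-first dotted dates, already-signed amount.
def from_bank_b(raw: dict) -> StandardTransaction:
    return StandardTransaction(
        booked=datetime.strptime(raw["date"], "%d.%m.%Y").date(),
        description=_clean(raw["narrative"]),
        amount=Decimal(raw["value"]),
    )


ADAPTERS = {"bank_a": from_bank_a, "bank_b": from_bank_b}


def standardise(bank: str, records: list[dict]) -> list[StandardTransaction]:
    """Map one bank's raw records into the common format."""
    return [ADAPTERS[bank](record) for record in records]


a = standardise("bank_a", [{"bookingDate": "2024-03-05", "info": " Tesco  Stores ",
                            "amount": "12.50", "indicator": "Debit"}])
b = standardise("bank_b", [{"date": "05.03.2024", "narrative": "TESCO STORES",
                            "value": "-12.50"}])
print(a == b)  # True: both banks' records end up identical
```

Keeping bank-specific quirks inside small adapters means downstream code only ever sees one shape of record.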
Categorisation is critical for businesses. It goes far beyond just seeing what’s in a bank statement: you’re able to collate transactions into sets of relevant, useful groups that can then inform practical use cases such as income verification, affordability assessments, budgeting and cash forecasting.
This is the value it adds to most of the products, services and processes we’re seeing: getting to the answer quickly and easily so that people can get the financial products they need most, whether that’s a faster mortgage, more affordable credit or a spending forecast and savings plan. Categorisation makes the transaction data truly usable.
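As a deliberately simple illustration of grouping transactions, the sketch below assigns categories using hypothetical keyword rules and then totals each group. A total like this is the kind of figure an income verification or budgeting step might start from. The interview describes supervised machine learning rather than keyword matching, so treat this as a baseline only.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("salary", ("SALARY", "PAYROLL", "WAGES")),
    ("groceries", ("TESCO", "SAINSBURY", "ALDI")),
    ("utilities", ("BRITISH GAS", "THAMES WATER")),
]


def categorise(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.upper()
    for category, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorised"


def totals_by_category(transactions: list[tuple[str, Decimal]]) -> dict[str, Decimal]:
    """Sum signed amounts per category from (description, amount) pairs."""
    totals = defaultdict(Decimal)  # missing categories start at Decimal("0")
    for description, amount in transactions:
        totals[categorise(description)] += amount
    return dict(totals)


transactions = [
    ("ACME LTD SALARY", Decimal("2500.00")),
    ("TESCO STORES 1234", Decimal("-45.20")),
    ("Sainsburys", Decimal("-30.00")),
    ("BRITISH GAS", Decimal("-80.00")),
]
print(totals_by_category(transactions))
```

Keyword rules like these break easily on unfamiliar merchant names, which is one reason richer, learned models are used in practice.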
The average person is likely to see everyday benefits simply through more streamlined, lower-risk processes. Open Banking replaces arduous tasks such as providing photocopies of bank statements or photographing and sending them electronically, and can even remove the need for a 30-minute income and expenditure conversation.
Having access to this data allows those in financial services to enhance their products and services, making them easier to access, more secure and more scalable.
Having bank-verified income and expenditure data for customers means that any decision a business makes is based on that customer’s most up-to-date and accurate financial position. Any assessments made will be more sustainable and help ensure businesses aren’t putting customers under undue financial stress. And with the forthcoming changes to the FCA handbook on assessing affordability, Open Banking offers businesses a chance to get ahead of the curve from a regulatory standpoint.
As well as these key improvements to data accuracy, there are operational efficiencies. We’ve talked about how Open Banking can help customers spend less time sending in statements or on phone calls, but the same obviously applies to businesses: less time keying in data or chasing customers for information.
It truly is a win-win.
What sets our data analysis apart is twofold: first, we’ve been analysing and organising unstructured, disparate data sets from different sources for years; second, we use layers of analysis and supervised machine learning algorithms to systematically calculate the OpenWrks categorisation.
A key differentiator for us is that, as well as identifying where a transaction has come from or where it is going, we use metadata from the additional Open Banking variables to create facts about the account’s behaviour, which feed into and enhance the categorisation model.
For example, if I want to identify a mortgage payment, I’d expect it to have a certain frequency, probably monthly, and an amount within a range that would be plausible for a mortgage payment. We use this kind of information, along with the expectation that the payment is going to a financial services provider that offers mortgages, to arrive at more accurate categories than we’d get by simply looking for a keyword or a name.
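The mortgage example combines several behavioural "facts" about a payee’s payments. A minimal rule-based sketch of that idea follows. The payee names, amount range and monthly gap window are hypothetical thresholds chosen only for illustration, and a real system would learn from features like these rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from statistics import median

# Hypothetical thresholds and payee names, for illustration only.
MORTGAGE_LENDERS = {"EXAMPLE BUILDING SOCIETY", "SAMPLE HOME LOANS"}
AMOUNT_RANGE = (Decimal("300"), Decimal("5000"))
MONTHLY_GAP_DAYS = (26, 35)
MIN_PAYMENTS = 3


@dataclass(frozen=True)
class Payment:
    paid_on: date
    payee: str
    amount: Decimal  # money leaving the account, as a positive value


def looks_like_mortgage(payments: list[Payment]) -> bool:
    """Combine frequency, amount and payee facts for one payee's payments."""
    if len(payments) < MIN_PAYMENTS:
        return False  # too little history to judge frequency
    ordered = sorted(payments, key=lambda p: p.paid_on)
    gaps = [(later.paid_on - earlier.paid_on).days
            for earlier, later in zip(ordered, ordered[1:])]
    is_monthly = MONTHLY_GAP_DAYS[0] <= median(gaps) <= MONTHLY_GAP_DAYS[1]
    typical_amount = median(p.amount for p in ordered)
    amount_plausible = AMOUNT_RANGE[0] <= typical_amount <= AMOUNT_RANGE[1]
    pays_lender = all(p.payee in MORTGAGE_LENDERS for p in ordered)
    return is_monthly and amount_plausible and pays_lender


history = [
    Payment(date(2024, month, 1), "EXAMPLE BUILDING SOCIETY", Decimal("950.00"))
    for month in (1, 2, 3, 4)
]
print(looks_like_mortgage(history))  # True
```

Using medians rather than exact matches tolerates the odd early or late payment. Requiring all three facts together is what separates this from a plain keyword or name lookup.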
We’ve laid the foundations for AI and machine learning in our models so that they can continuously improve. We’ve also built a feedback mechanism that tells us when a model isn’t accurate, so that we can proactively analyse, iterate and adjust quickly.
Initially, there needs to be human input somewhere along the line, whether from my team or from an end user. But as more and more volume goes through the model, it will become smarter over time and able to make rational improvements on its own.
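One way to picture such a feedback loop is the hypothetical tracker below. It records each predicted category alongside the category a person confirmed, measures the precision of each predicted category, and flags categories that fall below a threshold once enough feedback has arrived. The class name, thresholds and structure are assumptions for illustration, not a description of OpenWrks’s system.

```python
from collections import Counter


class FeedbackTracker:
    """Hypothetical sketch: compare predicted categories with human-confirmed ones."""

    def __init__(self, min_precision: float = 0.9, min_samples: int = 20):
        self.min_precision = min_precision
        self.min_samples = min_samples
        self.predicted = Counter()  # how often each category was predicted
        self.confirmed = Counter()  # how often that prediction was confirmed
        self.corrections: list[tuple[str, str, str]] = []  # candidate training data

    def record(self, description: str, predicted: str, confirmed: str) -> None:
        """Log one prediction and the category a person confirmed for it."""
        self.predicted[predicted] += 1
        if predicted == confirmed:
            self.confirmed[predicted] += 1
        else:
            self.corrections.append((description, predicted, confirmed))

    def precision(self, category: str) -> float:
        """Share of predictions of this category that were confirmed correct."""
        total = self.predicted[category]
        return self.confirmed[category] / total if total else 0.0

    def needs_review(self) -> list[str]:
        """Categories with enough feedback whose precision is below the threshold."""
        return sorted(
            category
            for category, total in self.predicted.items()
            if total >= self.min_samples and self.precision(category) < self.min_precision
        )


tracker = FeedbackTracker(min_precision=0.9, min_samples=4)
for _ in range(4):
    tracker.record("TESCO STORES", "groceries", "groceries")
for confirmed in ("transfer", "transfer", "transfer", "salary", "salary"):
    tracker.record("ACME LTD", "transfer", confirmed)
print(tracker.needs_review())  # ['transfer']
```

The `corrections` list is the kind of labelled data that human input, from a team member or an end user, could feed back into retraining a supervised model.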