Graph databases: the key to foolproof fraud detection?


FEATURE

Emil Eifrem, Neo Technology

Technology has made life more convenient – but it has also made life more insecure. This is especially true given our growing use of cyberspace as the place where we shop, entertain ourselves, manage our life savings and spend our money. As that money has shifted to bits and bytes, criminals have followed, causing a global crisis of fraud and deceit. UK businesses alone lose just under £100bn annually to fraud, according to a report by the University of Portsmouth’s Centre for Counter Fraud Studies and accountancy firm PKF Littlejohn.1 However, the same study found that fraud can be reduced by up to 40% through countermeasures. While no fraud prevention measure can claim to be 100% foolproof, responsible businesses can put procedures in place to dramatically curb risk.

Joining the dots

One of the first and most useful ways to do that is to look past the individual data points to the connections between them – joining the dots to uncover the suspicious picture they form. The problem is that these connections, and the picture they form, typically go unmonitored, even though they are the best place to start uncovering patterns of malicious behaviour. Unfortunately, making connections between the dots (the data) and making sense of these links isn’t as easy as it sounds, and you’re not going to achieve it simply by gathering more data. Meaningful insight can be derived from complex data sets – such as customer banking or online credit card transactions – by looking at them in a new way: one that draws out new connections and hidden patterns visually, building up a picture you were not able to see before.

March 2016

A highly promising technology that delivers precisely this approach is the graph database. Unlike most other ways of viewing data, graph databases have been designed specifically to exploit the relationships in data. That means they can expose patterns that are usually difficult to identify using traditional representations such as tables. A growing number of enterprises, from banks and financial institutions to online retailers, are using them to solve a variety of data problems, and in particular to identify advanced fraud scenarios in real time. Graphs are enabling businesses to develop next-generation fraud detection systems based on connected intelligence – today.

Key points in common

Graph databases, as opposed to traditional (relational) database management systems, excel at managing the kind of highly connected data that today’s digital and social media world is generating in such gigantic volumes. That explains their growing attraction for businesses: last year, analyst group Forrester Research predicted that just over a quarter of enterprises would be using these databases by 2017. Meanwhile, graph databases have quietly been powering the web for some time, with leading consumer and e-commerce sites owing much of their rise to dominance to graph technology that captures and rapidly exploits online data relationships. But while early graph database adopters such as Google and LinkedIn had to build their own in-house graph data stores from scratch, off-the-shelf graph databases are now available to any business wanting to exploit its data.

This is especially true when it comes to fraud. There are various types of online criminality to be addressed – banking fraud, insurance fraud and e-commerce fraud, for instance. What characterises them all is layers of trickery covering up the crime, which can only be uncovered through connected analysis. Experience is showing us that for each of these types of fraud, graph databases can complement existing detection methods, making detection much more feasible and cost-effective.

Stopping cyber-criminals

First-party ‘bust-out’ fraud has a serious financial impact on banking institutions.2 This type of fraud is estimated to account for as much as a quarter or more of total consumer credit write-offs in the United States alone, while it is thought that 10 to 20% of unsecured bad debt at leading US and European banks is down to this form of deception.

You might be surprised to learn that only two people are needed to carry out a bust-out fraud. Fraudster One lives at 1, Sample Street, Sample Town (his real address) and gets a prepaid phone, while Fraudster Two lives at 21, Sample Street, Sample Town (also her real address) and also gets a prepaid phone number. Sharing only a phone number and an address (two pieces of data), they can combine these to create four synthetic identities with fake names, with 4-5 accounts for each synthetic identity – a total of 18 accounts. Assuming an average of £6,000 in credit exposure per account, the bank’s loss could be as high as £108,000, from just two criminals.

Figure 1: ‘Bust-out’ fraud, as defined by Experian.

There are three key reasons why first-party fraud is so hard to identify and block:

1. Fraudsters understand exactly how banking procedures work, and so disguise themselves to look like normal customers.
2. They often target multiple internal departments, such as loans and credit cards, which makes suspicious activity more difficult to detect; fraudsters also frequently work together in networks, staying under the thresholds that would otherwise trigger a red flag, which makes them hard to spot.
3. The fraud is executed very quickly – gangs roll up their fake accounts and disappear in seconds.

In first-party fraud, the only party that actually gets hurt is the bank, so cases of bust-out fraud don’t get the same attention from the media and general public that money-laundering and identity theft cases do, which makes them that much easier to perpetrate quietly. The basis for optimism in all this, however, is that these types of scams are very susceptible to graph-based methods of fraud detection.

Uncovering the crime

Because of the speed with which all of this activity happens, most banking systems do not detect it fast enough to react and block purchases and accounts. First-party fraud, whether it originates from individuals or from more organised crime syndicates, is difficult to detect via traditional application screening and account management processes, as these are not designed to look for the right patterns – in this case, shared identifiers. This is where graph databases come into their own.

Unmasking fraud rings with traditional relational database technologies requires modelling the data as a set of tables and columns, then carrying out a series of complex joins and self-joins. These queries are very complex to build, and also expensive to run. Scaling them to support real-time access creates some major technical challenges, with performance becoming increasingly poor as the size of the ring increases and as the total data set expands. Graph databases have shown themselves to be an ideal tool for overcoming these challenges, while powerful query languages such as Cypher provide simple semantics for detecting fraud rings and navigating connections in memory in real time.

The graph data model (see Figure 2) shows how the data appears to the graph database, and illustrates how fraud rings can be detected by traversing the graph. Existing fraud detection infrastructures can be extended to support ring detection by running appropriate entity link analysis queries in a graph database, and by running these checks at key stages in the customer and account lifecycle: ideally when the account is created, as soon as a credit balance threshold is hit, and when a cheque bounces.
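To make the idea of ring detection by traversal concrete, here is a minimal sketch in plain Python rather than in a graph database or Cypher. It assumes hypothetical toy data built from the two-fraudster example above: identities and their identifiers (phone, address) become nodes, shared identifiers become edges, and a breadth-first search finds connected components. A component holding two or more identities is treated as a candidate ring. The identifier names, the 5/5/4/4 split of the 18 accounts and the function names are illustrative assumptions, not part of any product API.

```python
from collections import defaultdict, deque
from itertools import product

# Hypothetical toy data modelled on the two-fraudster bust-out example:
# each real person contributes one real address and one prepaid phone.
PHONES = ["phone-A", "phone-B"]
ADDRESSES = ["1 Sample Street", "21 Sample Street"]
CREDIT_PER_ACCOUNT = 6_000  # assumed average credit exposure per account (GBP)


def build_synthetic_identities(phones, addresses):
    """Every phone/address pairing becomes one synthetic identity (2 x 2 = 4)."""
    return [
        {"id": f"synthetic-{n}", "phone": phone, "address": address}
        for n, (phone, address) in enumerate(product(phones, addresses), start=1)
    ]


def build_graph(identities):
    """Undirected graph linking each identity node to its identifier nodes."""
    graph = defaultdict(set)
    for ident in identities:
        for kind in ("phone", "address"):
            identifier = f"{kind}:{ident[kind]}"
            graph[ident["id"]].add(identifier)
            graph[identifier].add(ident["id"])
    return graph


def find_rings(graph, identity_ids, min_size=2):
    """Breadth-first search for connected components. A component holding
    min_size or more identities that share identifiers is a candidate ring."""
    seen, rings = set(), []
    for start in sorted(identity_ids):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        members = sorted(n for n in component if n in identity_ids)
        if len(members) >= min_size:
            rings.append(members)
    return rings


if __name__ == "__main__":
    identities = build_synthetic_identities(PHONES, ADDRESSES)
    # Hypothetical split of the 18 accounts across the four identities.
    accounts = {"synthetic-1": 5, "synthetic-2": 5, "synthetic-3": 4, "synthetic-4": 4}
    graph = build_graph(identities)
    for ring in find_rings(graph, {i["id"] for i in identities}):
        n_accounts = sum(accounts[m] for m in ring)
        print(ring, n_accounts, "accounts, exposure GBP", n_accounts * CREDIT_PER_ACCOUNT)
```

Run on the toy data, this reports a single ring containing all four synthetic identities, 18 accounts and an exposure of £108,000, matching the figures in the example. Customers with distinct phones and addresses form no ring. In a graph database the same question would be a short pattern-matching query over stored relationships rather than an in-memory search.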

Real-time graph traversals bound to these types of events can help banks identify probable fraud rings during, or even before, the bust-out itself. This step is well worth taking: the faster banks can see potential fraud, the faster they can stop it. That matters all the more because many fraudsters now operate beyond the capabilities of banks’ existing detection systems. Again, graph technology can help: the time margins for detecting fraud are getting slimmer, increasing the demand for real-time solutions, which graphs are key to enabling.
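Binding traversals to lifecycle events could look roughly like the following sketch. It assumes an in-memory account/identifier graph and hypothetical event names (`account_created`, `credit_threshold_hit`, `cheque_bounced`) standing in for the three checkpoints the text mentions. On each trigger event it runs a depth-limited breadth-first traversal from the account and raises an alert when enough linked accounts are found. The hop limit and alert size are illustrative assumptions; a production system would run the equivalent query inside the graph database.

```python
from collections import defaultdict, deque

# Hypothetical names for the lifecycle events mentioned in the text.
TRIGGER_EVENTS = {"account_created", "credit_threshold_hit", "cheque_bounced"}
ACCOUNT_PREFIX = "account:"


class RingMonitor:
    """In-memory account/identifier graph with a depth-limited traversal that
    runs only when a trigger event arrives (a sketch, not a product API)."""

    def __init__(self, max_hops=4, alert_size=3):
        self.graph = defaultdict(set)
        self.max_hops = max_hops      # bound on traversal depth keeps checks fast
        self.alert_size = alert_size  # linked accounts (incl. the start) to alert

    def add_account(self, account_id, identifiers):
        """Link an account node to identifier nodes such as 'phone:...'."""
        node = ACCOUNT_PREFIX + account_id
        for ident in identifiers:
            self.graph[node].add(ident)
            self.graph[ident].add(node)

    def linked_accounts(self, account_id):
        """Accounts reachable within max_hops, including the starting account.
        Going account -> shared identifier -> account counts as two hops."""
        start = ACCOUNT_PREFIX + account_id
        seen, queue, found = {start}, deque([(start, 0)]), set()
        while queue:
            node, depth = queue.popleft()
            if node.startswith(ACCOUNT_PREFIX):
                found.add(node[len(ACCOUNT_PREFIX):])
            if depth == self.max_hops:
                continue
            for neighbour in self.graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
        return found

    def on_event(self, event_type, account_id):
        """Return an alert dict if the event is a trigger and the account sits
        in a large enough cluster; otherwise return None."""
        if event_type not in TRIGGER_EVENTS:
            return None
        cluster = self.linked_accounts(account_id)
        if len(cluster) >= self.alert_size:
            return {"event": event_type, "account": account_id, "cluster": sorted(cluster)}
        return None


if __name__ == "__main__":
    monitor = RingMonitor()
    monitor.add_account("acc-1", ["phone:A", "address:1 Sample Street"])
    monitor.add_account("acc-2", ["phone:A", "address:21 Sample Street"])
    print(monitor.on_event("account_created", "acc-2"))  # two linked accounts: no alert
    monitor.add_account("acc-3", ["phone:B", "address:21 Sample Street"])
    print(monitor.on_event("cheque_bounced", "acc-3"))   # three linked accounts: alert
```

The depth limit is the design choice that keeps each check cheap enough to run inline with the event: only the neighbourhood of the account that triggered it is explored, rather than the whole graph.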

Connected analysis

The second advantage is that of connected analysis. Criminals attack when they see a security hole. Traditional technologies, while still providing a level of protection, are not designed to detect increasingly clever and complex fraud operations. Graph databases provide a unique ability to crack open a host of key fraud patterns, in real time, either in groups or on an individual basis. That’s because these carefully hidden connections become exposed when viewed by a system designed to manage connected data.

Figure 2: The graph data model.

Real-time graph queries have proved a powerful tool for detecting complex, multi-party fraud, as in the famous ‘Swiss Leaks’ scandal.3 Here, graph databases were used to spot hidden patterns in a vast dataset that would otherwise have gone undetected. Back in 2014, Le Monde investigative reporters Gérard Davet and Fabrice Lhomme found themselves with access to a valuable set of data. The challenge was to translate the data on thousands of HSBC account holders located in more than 20 countries, with connections spread among an even larger number of files. Realising the size of the task in hand, the reporters approached the International Consortium of Investigative Journalists (ICIJ) for assistance – a move that ended up setting off one of the biggest digital cross-border journalistic collaborations ever. Using graph technology, the ICIJ was able to easily visualise the networks around clients and accounts. In the end, the Swiss Leaks revelations were shared with the public and regulators right across the globe – an example of investigative data journalism that led to the project being awarded the prestigious Data Journalism Award (Investigation of the Year category) 2015 by the Global Editors Network. Follow-up stories are still appearing, as even more connections are unearthed. No wonder graph technology has now become an integral part of the ICIJ’s toolkit for all future data-driven probes.

Many types of fraud

There are many types of fraud, however, and graphs work across them all. In the UK, for instance, insurance fraud attracts sophisticated criminal rings that are very effective at circumventing fraud detection measures. Once again, graph databases can be a powerful tool in combating this kind of multi-party fraud. In a typical scenario, rings of fraudsters work together to stage ‘accidents’, complete with fake drivers, fake passengers, fake pedestrians and even fake witnesses, to deceive the insurer.

The next frontier in insurance fraud detection is to use social network analysis to uncover such activity. For example, connected analysis can reveal relationships between people who otherwise act like perfect strangers. As in the bank fraud example, this kind of social network analysis is not a strength of relational databases: discovering the ring requires joining a number of tables in a complex schema (accidents, vehicles, owners, drivers, passengers, pedestrians, witnesses and providers) and joining them together multiple times to uncover the full picture. Because such operations are so complex and costly, particularly for large data sets, this crucial form of analysis is often overlooked with existing technology; here again, the graph approach offers a fresh way forward.

Finally, there is e-commerce fraud conducted by a lone operator. Consider an online transaction with the following identifiers: user ID, IP address, geolocation, a tracking cookie and a credit card number. One would typically expect the relationships between these identifiers to be fairly close to one-to-one, with some variation due to shared machines, families sharing a single credit card, and so on. However, once the number of relationships exceeds a reasonable level, fraud is often the real cause. As in the first-party bank fraud and insurance fraud examples, graph databases are very helpful for carrying out pattern discovery in real time across exactly these kinds of data sets. By putting checks in place and associating them with the appropriate event triggers, this type of scheme can be uncovered before the fraudster is able to inflict significant damage.

The conclusion has to be that graph databases are a powerful tool that should be considered by any financial services firm or e-commerce provider looking for proven new methods and techniques to uncover cyber fraud rings and stop advanced digital scams in real time.
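The e-commerce check described above, where identifiers that should be roughly one-to-one suddenly fan out, can be sketched in a few lines. This plain-Python version assumes each transaction is a dictionary with the five identifier types named in the text, and flags any credit card linked to more than a hypothetical threshold of three distinct values of another identifier. The threshold, field names and sample values are illustrative assumptions; in a graph database this amounts to checking the degree of a card node towards each neighbouring node type.

```python
from collections import defaultdict

# Identifier types listed in the text for a single online transaction.
IDENTIFIER_TYPES = ("user_id", "ip", "geo", "cookie", "card")


def fan_out(transactions, anchor, other):
    """Map each value of the `anchor` identifier to the distinct `other` values seen with it."""
    links = defaultdict(set)
    for tx in transactions:
        links[tx[anchor]].add(tx[other])
    return links


def flag_fan_out(transactions, anchor="card", max_links=3):
    """Flag anchor values (credit cards by default) linked to more than
    max_links distinct values of any other identifier type.
    max_links=3 is an arbitrary, illustrative threshold."""
    flags = {}
    for other in IDENTIFIER_TYPES:
        if other == anchor:
            continue
        for value, linked in fan_out(transactions, anchor, other).items():
            if len(linked) > max_links:
                flags.setdefault(value, {})[other] = len(linked)
    return flags


if __name__ == "__main__":
    # A family sharing one card: a little fan-out, within the threshold.
    family = [
        {"user_id": "u1", "ip": "198.51.100.7", "geo": "London", "cookie": "c1", "card": "card-1"},
        {"user_id": "u2", "ip": "198.51.100.7", "geo": "London", "cookie": "c2", "card": "card-1"},
    ]
    # One card behind six user IDs, IPs and cookies: well beyond one-to-one.
    lone_operator = [
        {"user_id": f"u{i}", "ip": f"203.0.113.{i}", "geo": "Leeds", "cookie": f"c{i}", "card": "card-9"}
        for i in range(10, 16)
    ]
    print(flag_fan_out(family + lone_operator))
```

With this sample data only `card-9` is flagged, with six distinct user IDs, IP addresses and cookies; the shared family card stays below the threshold. Attaching such a check to an event trigger, such as each new transaction, is what would let the scheme be caught before significant damage is done.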

Computer Fraud & Security


About the author

Emil Eifrem is CEO of Neo Technology and co-founder of the Neo4j project (http://neo4j.com). Before founding Neo, he was the CTO of Windh AB, where he headed the development of highly complex information architectures for enterprise content management systems. Committed to sustainable open source, Eifrem sees his role at Neo as steering a balanced path between free availability of powerful graph database solutions and enterprise-level options where mission-critical capability is required. He is a frequent conference speaker and a well-known author and blogger on NoSQL and graph databases.

References

1. ‘The Financial Cost of Fraud’. PKF. Accessed Mar 2016. www.pkflittlejohn.com/the-financial-cost-of-fraud-2015.php.
2. ‘Bust-out fraud’. Experian, 2009. Accessed Mar 2016. www.experian.com/assets/decision-analytics/whitepapers/bust-out-fraud-white-paper.pdf.
3. Ryle, Gerard et al. ‘Banking giant HSBC sheltered murky cash linked to dictators and arms dealers’. ICIJ, 8 Feb 2015. Accessed Mar 2016. www.icij.org/project/swiss-leaks/banking-giant-hsbc-sheltered-murky-cash-linked-dictators-and-arms-dealers.

Online recruitment services: another playground for fraudsters


Sokratis Vidros, University of the Aegean, Greece; Constantinos Kolias, George Mason University, US; Georgios Kambourakis, University of the Aegean

It is increasingly common for someone actively seeking a job online to come across appealing but fake ads offering high wages, flexible hours, teleworking and career growth opportunities. Usually, job ads with such favourable conditions are examples of Online Recruitment Fraud (ORF), which attempts to collect unsuspecting candidates’ personal information. ORF is a relatively new field of variable severity that can escalate quickly into extensive scams.

A survey conducted by FlexJobs in 2015 revealed that for every legitimate job posting there were around 60 fraudulent ones, yet only 48% of applicants stated they were even aware of employment scams while searching for new career opportunities online.1 In addition, 7% of job seekers have been victims of employment scams at least once, despite warnings from the FBI and the Better Business Bureau. Workable, widely used online recruiting software, reported that well-crafted fraudulent job ads for blue-collar or secretarial positions in densely populated countries can collect up to 1,000 resumés per day.2 Interestingly, in 2012 a job seeker received more than 600 resumés in one day after posting a fake job ad on Craigslist in order to identify his competitors.3 That same year, the Australian Bureau of Statistics published a report on personal fraud stating that 6 million people were exposed to several forms of scam, including employment scams, during any given year.4 The report concluded that the Australian economy has been severely affected by this exposure.

Moving to the cloud

Corporate hiring has recently moved to the cloud. LinkedIn was the first service to demonstrate the huge opportunity that lay in building online recruiting tools. Dozens of companies followed and built online automated systems – Applicant Tracking Systems (ATS) – to help organisations recruit talent and job seekers find jobs. These tools made the hiring process more immediate, accurate and cost-efficient.

“Online hiring services attracted the interest of scammers, whose malicious behaviour aiming to steal personal information both inflicts economic damage and harms the reputation of the ATS stakeholders”

On the downside, the increasing adoption of ATS has also led to more incidents