Machine Learning Applied To Cybersecurity

New trends for risk detection based on Machine Learning

For all of us who are dedicated, in one way or another, to cybersecurity, it is clear that we are facing an arms race between cybercriminals and those who defend us from them, the Blue Teams. It is not surprising: a 2019 study shocked us by finding that cybercrime already moves more money than drug trafficking, and the trend continued upward in 2020 and 2021. Only the professionalization of cybercrime, which police authorities have confirmed, explains this.

We can fall into the cliché of thinking that cybercrime is a kind of evil organization, in the style of the Kingpin in Spider-Man. And although this cliché is correct in some cases, a large part of cybercrime is not committed by criminal organizations but is perpetrated by companies or individuals who want to obtain income illegitimately or to harm a company, an organization, or a person.

It is not uncommon to find cybercrime sponsored by a company that wants to increase the operating costs of its main competitor, creating traffic through bots to force it to scale its cloud infrastructure, and thus increase its costs; just as it is not strange to find employees, or former employees, taking advantage of their inside knowledge of the company to undermine its position or harm it in any way.

Of course, there is also the better-known version of this story. A criminal organization impersonates a legitimate user and executes malicious actions on their behalf. This is possible because of user weaknesses, especially among negligent users who do not apply the company’s security policies to their accounts or who use the same password in their social networks as in their corporate accounts, for example.

It is not too complicated to match passwords leaked from cloud services (social networks, accounts on gaming platforms, etc.) against professional accounts, and this is an increasingly common and dangerous entry vector.

As you can guess, not everything is malware or ransomware. Not everything is chaos and destruction. There are many problems that are not as flashy as ransomware but can be even more damaging. Tellingly, a company takes an average of 280 days to identify and contain a data breach, according to a recent IBM study. There is, therefore, a lot of room for improvement when it comes to detecting and stopping data leaks.

Many of these leaks are caused by insiders, legitimate users of an organization who conspire against it for personal gain or to cause harm. Often these insiders are not acting knowingly; they are “puppets” moved by cybercriminals, their accounts generally compromised through poor security policies or user negligence.

To detect this type of attack sooner, we have to stop looking at what things are and start looking at how they behave. From the classic point of view of intrusion detection, a user who legitimately accesses the platform and has not shown a strange behavior pattern from the point of view of authentication (for example, has not had 500 failed authentications in the last minute) is legitimate, and there is nothing more to talk about. It is not a threat. But, as we have already seen, there are legitimate users who behave maliciously. Therefore, we need to look at HOW a user behaves and not just how they authenticate.
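To make the difference between "how a user authenticates" and "how a user behaves" concrete, here is a minimal sketch. It is not a machine learning model and is not taken from any particular product; it is a plain frequency heuristic, and the names `build_baseline` and `rarity_score`, as well as the endpoint paths, are hypothetical. It only illustrates that a correctly authenticated session can still look unusual when compared with the user's own history.

```python
from collections import Counter


def build_baseline(history):
    """Count how often a user has visited each endpoint in past sessions."""
    return Counter(history)


def rarity_score(baseline, session):
    """Fraction of requests in a session that hit endpoints the user has
    never visited before (0.0 = fully familiar, 1.0 = entirely new)."""
    if not session:
        return 0.0
    # Counter returns 0 for unknown keys, so unseen endpoints are easy to spot.
    unseen = sum(1 for endpoint in session if baseline[endpoint] == 0)
    return unseen / len(session)


# Hypothetical history for one user (illustrative data only).
history = ["/inbox", "/inbox", "/calendar", "/inbox", "/profile"]
baseline = build_baseline(history)

# Both sessions are assumed to have authenticated successfully.
normal_session = ["/inbox", "/calendar", "/inbox"]
odd_session = ["/admin/export", "/admin/users", "/inbox", "/admin/export"]

normal_score = rarity_score(baseline, normal_session)
odd_score = rarity_score(baseline, odd_session)
```

A real system would learn a much richer model of behavior, but even this toy score separates the two sessions, which an authentication-only check cannot do.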

And what technology is especially good at detecting behavior patterns? Artificial intelligence in general, and machine learning in particular. This is precisely the approach being taken in the OPOSSUM project.

The approach is simple in its conception but much more complex in its implementation. Basically, the OPOSSUM project acts as a reverse proxy between users and applications. In this way, it is in a privileged position to apply security checks, in the manner of a classic Web Application Firewall (WAF), but it is also in a privileged position to analyze how a user uses a certain application, since all communication between the user and the application necessarily goes through OPOSSUM.

An interesting novelty of this project is in the concept of context. Classically, cybersecurity solutions have focused on analyzing flat data. Let’s take an HTTP request as an example. For a WAF to determine if the request is malicious or not, it analyzes its content in isolation. This is effective in many cases but not for detecting abnormal behavior. It is true that there are products that take into account a set of data, for example, the last 50 requests, but even so, the data on which they are based remains scarce.
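As an illustration of the "set of data" approach mentioned above, the following sketch keeps the last 50 requests per user and derives a few simple features from them. The class name `RequestWindow`, the feature names, and the choice of features are assumptions made for this example, not a description of how any specific product works.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 50  # mirrors the "last 50 requests" example in the text


class RequestWindow:
    """Keeps the most recent requests per user and summarizes them."""

    def __init__(self, size=WINDOW_SIZE):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._windows = defaultdict(lambda: deque(maxlen=size))

    def add(self, user, path, status):
        self._windows[user].append((path, status))

    def features(self, user):
        window = self._windows[user]
        total = len(window)
        if total == 0:
            return {"requests": 0, "distinct_paths": 0, "error_rate": 0.0}
        errors = sum(1 for _, status in window if status >= 400)
        return {
            "requests": total,
            "distinct_paths": len({path for path, _ in window}),
            "error_rate": errors / total,
        }


# Usage with a tiny window so the eviction is easy to follow.
tracker = RequestWindow(size=3)
for path, status in [("/a", 200), ("/b", 404), ("/b", 200), ("/c", 500)]:
    tracker.add("alice", path, status)

summary = tracker.features("alice")  # computed over the last 3 requests only
```

Features like these describe a short stretch of behavior rather than a single request, but they still rely only on what the requests themselves contain.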

The OPOSSUM project, on the other hand, increases the context of the request, enriching the data on which the predictions will be made using external sources of threat intelligence, like Shodan, Spyse, or AlienVault. These platforms add more information to the simple HTTP request, such as whether that IP has been involved in security incidents or whether the payload of a request contains an indicator of compromise.
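The idea of enrichment can be sketched as follows. This is only an illustration under stated assumptions: the two in-memory tables stand in for external threat intelligence feeds, no real platform API is called, and every name here (`BAD_IP_REPUTATION`, `IOC_SUBSTRINGS`, `EnrichedRequest`, `enrich`) is hypothetical rather than part of OPOSSUM.

```python
from dataclasses import dataclass, field

# Hypothetical local stand-ins for external threat-intelligence sources.
# A real deployment would query the providers' own services instead.
BAD_IP_REPUTATION = {"203.0.113.7": ["brute-force", "scanner"]}
IOC_SUBSTRINGS = ["/etc/passwd", "<script>"]


@dataclass
class EnrichedRequest:
    source_ip: str
    path: str
    payload: str
    ip_incidents: list = field(default_factory=list)  # past incidents for the IP
    payload_iocs: list = field(default_factory=list)  # indicators found in payload


def enrich(source_ip, path, payload):
    """Attach reputation and indicator-of-compromise context to a request."""
    incidents = list(BAD_IP_REPUTATION.get(source_ip, []))
    iocs = [marker for marker in IOC_SUBSTRINGS if marker in payload]
    return EnrichedRequest(source_ip, path, payload, incidents, iocs)


suspicious = enrich("203.0.113.7", "/download", "file=../../etc/passwd")
clean = enrich("198.51.100.1", "/", "q=hello")
```

The enriched record, rather than the bare HTTP request, is what a downstream model would then score.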

This enrichment is done in real time, using the Apache Big Data stack (Hadoop, Kafka, Cassandra, etc.) as a technological base, along with other interesting technologies, as shown in the following figure.