The Contribution of Machine Learning to Cyber Security

Introduction

The benefits of artificial intelligence (AI) are now broadly acknowledged, driven by the increasing complexity of contemporary information systems and the ever-growing volume of data they produce. Machine learning (ML) technologies, particularly since the emergence of deep learning, are already being used to address a variety of real-world problems. Machine translation, travel and holiday recommendations, object identification and tracking, and a range of applications in healthcare are notable examples of the practical successes of ML. ML is also rightly regarded as a technology enabler because of the significant potential it has demonstrated when applied to autonomous vehicles and telecommunication networks (Zhang et al., 2022).

Machine learning is a key technology for both present and future information systems, and it is already used in many different fields. The application of ML in cyber security, however, is still in its infancy, and there is a wide gap between research and practice. This gap stems from the current state of the art, which makes it hard to recognise the role that ML plays in cyber security. Unless its benefits and drawbacks are understood by a broad audience, ML’s full potential will never be realised.

Two complementary methods, misuse-based and anomaly-based, can be used to detect cyber threats. The former, also known as signature- or rule-based detection, requires identifying particular “patterns” that correspond to a given threat, on the assumption that future instances of that threat will display the same patterns. The latter requires defining a notion of “normality” and seeks to identify events that deviate from it, on the assumption that such deviations correspond to security incidents. The two methods complement each other: misuse-based approaches are very accurate but can only identify known threats, whereas anomaly-based approaches tend to raise more false alarms but are more effective against novel attacks (Elsisi et al., 2021).
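The contrast between the two methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the signature strings, the choice of request length as the only feature, and the three-standard-deviation threshold are hypothetical and are not taken from any real detection system.

```python
from statistics import mean, stdev

# Hypothetical signatures: substrings assumed to indicate a known attack.
SIGNATURES = ("' OR 1=1", "../../etc/passwd")


def misuse_detect(request):
    """Misuse-based: flag a request that contains any known signature."""
    return any(sig in request for sig in SIGNATURES)


def fit_baseline(lengths):
    """Anomaly-based, step 1: model 'normality' as the mean and standard
    deviation of the lengths of benign requests."""
    return mean(lengths), stdev(lengths)


def anomaly_detect(request, baseline, threshold=3.0):
    """Anomaly-based, step 2: flag a request whose length lies more than
    `threshold` standard deviations away from the benign mean."""
    mu, sigma = baseline
    return abs(len(request) - mu) > threshold * sigma


# Toy lengths of benign requests (made-up numbers).
baseline = fit_baseline([20, 22, 19, 21, 23, 20, 18, 22])

# An attack with no matching signature is missed by the misuse-based check
# but stands out against the learned baseline.
unknown_attack = "A" * 200
print(misuse_detect(unknown_attack), anomaly_detect(unknown_attack, baseline))
```

The misuse-based check never fires on benign traffic unless a signature is badly chosen, but it is blind to anything new. The anomaly-based check catches the unseen payload, at the price of also flagging any benign request that happens to be unusually long or short.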

ML applications for cyber threat detection can rely on either supervised or unsupervised algorithms (schematically represented in Fig. 1). Supervised algorithms can serve as complete detection systems but require labelled data, which must be produced under some degree of human oversight. Unsupervised algorithms need no human in the loop but can only carry out auxiliary tasks. How easy labels are to obtain depends on the type of data being analysed: for example, any layperson can tell a legitimate website from a phishing website, whereas telling benign network traffic from malicious traffic is more difficult.

Figure 1. Pros and Cons of Supervised and Unsupervised ML for Cyber Threat Detection.
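The difference between supervised and unsupervised learning can be made concrete with a small sketch in plain Python. The single feature (imagine the number of links in an email), the sample values, and the label names are all hypothetical; the nearest-centroid classifier and the two-cluster k-means shown here are simple stand-ins chosen for brevity, not the algorithms used by any particular product.

```python
from statistics import mean


def train_nearest_centroid(samples, labels):
    """Supervised: compute one centroid per label from labelled 1-D samples."""
    centroids = {}
    for label in set(labels):
        centroids[label] = mean(x for x, y in zip(samples, labels) if y == label)
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(samples, iterations=10):
    """Unsupervised: split 1-D samples into two clusters (k-means with k=2).
    Assumes the samples contain at least two distinct values."""
    low, high = min(samples), max(samples)
    for _ in range(iterations):
        near_low = [x for x in samples if abs(x - low) <= abs(x - high)]
        near_high = [x for x in samples if abs(x - low) > abs(x - high)]
        low, high = mean(near_low), mean(near_high)
    return near_low, near_high


# Hypothetical feature values and human-provided labels.
samples = [3, 4, 5, 40, 42, 45]
labels = ["benign", "benign", "benign", "phishing", "phishing", "phishing"]

centroids = train_nearest_centroid(samples, labels)
print(predict(centroids, 38))   # a verdict, thanks to the labels
print(two_means(samples))       # two groups, but no indication of which is malicious
```

The supervised model returns a verdict because a human supplied the labels. The clustering step finds the same two groups without any labels, but it cannot say which group is malicious, which is why such output typically supports an analyst or another detector rather than acting as a detector on its own.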

Machine Learning in Malware Detection

One of the most recognisable challenges in cyber security is the fight against malware. Because malware infects a specific device, it is detected by analysing data at the host level, that is, through a host-based intrusion detection system (HIDS); antivirus software can in fact be viewed as a subset of HIDS. A given malware variant is designed for a particular operating system (OS). For more than 20 years, Windows has been the most targeted OS because of its widespread use, although attackers are now increasingly focusing on mobile devices running operating systems such as Android (Annamalai, 2022).

Malware can be detected through static or dynamic analysis. Static analysis seeks to identify malware by examining a given file without running any code. Dynamic analysis examines a piece of software’s behaviour while it is running, typically by deploying it in a controlled environment and monitoring its activity. Both static and dynamic analyses, shown schematically in Fig. 2, can benefit from ML.

Figure 2. Malware Detection via ML.
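As an illustration of static analysis, the sketch below derives a few numeric features from the raw bytes of a file without executing it. The choice of features (file size, byte entropy, and number of printable strings) is an assumption made for this example; a real ML pipeline would use a much richer feature set and would feed the result to a trained classifier, which is not shown here.

```python
import math
import re
from collections import Counter


def byte_entropy(data):
    """Shannon entropy in bits per byte: 0.0 for empty or constant input,
    up to 8.0 when all byte values are equally frequent."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def printable_strings(data, min_len=4):
    """Extract runs of printable ASCII of at least `min_len` characters,
    similar in spirit to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def static_features(data):
    """Build a small feature dictionary from file contents, without running them."""
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "num_strings": len(printable_strings(data)),
    }


# Made-up bytes standing in for the contents of a file.
blob = b"\x00\x01cmd.exe\x00ab\x00http://x.test\xff"
print(static_features(blob))
```

High entropy is often associated with packed or encrypted content, and embedded strings can reveal commands or URLs, which is why features of this kind are a common starting point. Dynamic analysis would instead record what the program does when run in a sandbox, which cannot be reproduced in a few lines.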

Machine Learning in Phishing Detection

Phishing is one of the most frequent ways to infiltrate a target network and remains a serious threat to online security. Modern enterprises must prioritise the early identification of phishing attempts, a task that ML can greatly assist. We distinguish between two uses of ML to detect phishing: detection of phishing websites, where the aim is to identify web pages that are disguised to look like a legitimate website; and detection of phishing emails, which either point to a compromised website or try to elicit a response that includes sensitive information (Geetha & Thilagam, 2021).

The primary distinction between these two applications lies in the type of data being analysed: for emails it is typical to examine the text, header, or attachments, whereas for websites it is more common to study the URL, the HTML code, or even the visual rendering of the page. Both applications are depicted schematically in Fig. 3.

Figure 3. Phishing Detection via ML.
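For the website case, the URL is the cheapest data source to analyse. The sketch below turns a URL into a handful of lexical features using only the Python standard library. The specific features are assumptions chosen for illustration, and the example URLs are made up; the features would normally be passed to a trained classifier rather than interpreted one by one.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Extract simple lexical features from a URL (illustrative selection)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "host_is_ip": host_is_ip,          # raw IP address instead of a domain name
        "has_at_symbol": "@" in url,       # text before '@' can disguise the real host
        "uses_https": parsed.scheme == "https",
    }


print(url_features("http://example.com@evil.test/login"))
```

In the example URL the browser would contact `evil.test`, not `example.com`, which is the kind of disguise these features are meant to expose. Email detection would follow the same pattern with features drawn from the text, header, or attachments.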

Beyond Detection: Additional Roles of Machine Learning in Cybersecurity

Beyond threat detection, there are numerous other roles that ML can fill in cyber security. Modern environments produce enormous amounts of data on a regular basis, and these data may originate from a variety of sources, including other ML models. By using (additional) ML to analyse this data, it is possible to gain insights that improve the security of digital systems. Without loss of generality, these complementary uses of ML can be grouped into four tasks: alert management, raw data analysis, risk exposure assessment, and cyber threat intelligence (Hameed et al., 2021). A schematic representation of these additional tasks is given in Fig. 7.

Figure 7. Additional tasks that can be addressed via ML in cybersecurity.
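To give a feel for the first of these tasks, alert management, the sketch below fuses duplicate alerts into a ranked summary. This is a deliberately simple, non-ML baseline: it only counts identical alerts, whereas an ML-based approach would learn which alerts belong together or which deserve priority. The alert fields `source` and `rule` and the sample values are hypothetical.

```python
from collections import Counter


def fuse_alerts(alerts):
    """Collapse alerts that share the same (source, rule) pair into a single
    entry with a count, ordered from most to least frequent."""
    counts = Counter((alert["source"], alert["rule"]) for alert in alerts)
    return [
        {"source": source, "rule": rule, "count": n}
        for (source, rule), n in counts.most_common()
    ]


# Made-up alerts, as they might arrive from several detectors.
raw_alerts = [
    {"source": "10.0.0.5", "rule": "port-scan"},
    {"source": "10.0.0.5", "rule": "port-scan"},
    {"source": "10.0.0.9", "rule": "brute-force"},
    {"source": "10.0.0.5", "rule": "port-scan"},
]

for entry in fuse_alerts(raw_alerts):
    print(entry)
```

Four raw alerts become two entries, which is the basic effect alert management aims for: fewer, more informative items for the analyst to review.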

The Future of Machine Learning in Cybersecurity

The state of the art can be advanced in countless ways, including by improving current performance, mitigating known problems (such as the difficulty of explaining model decisions), and creating new ML-based cyber security applications (for example, by integrating quantum computing). The recommendations below are each addressed to the stakeholders named in parentheses.

Certification (sovereign entities) – To ensure better transparency and reliability, regulatory bodies must enforce the development and adoption of standardized procedures that certify the performance and robustness of ML systems.

Data Availability (executives and legislative authorities) – To address the shortage of adequate data, companies should be more willing to share data originating in their environments, whereas regulatory authorities should promote such disclosure by defining proper policies and incentives.

Usable Security Research (scientific community) – The peer-review process should facilitate and enforce the inclusion of the material needed to replicate ML experiments. At the same time, such material should be evaluated to ensure its correctness, potentially by a separate set of reviewers with more technical expertise.

Orchestration of Machine Learning (engineers) – Orchestrating complex systems that use (combinations of) ML and non-ML solutions is beneficial for cyber security. Hence, ML engineers and practitioners should clearly highlight how to combine all such components in order to maximize their practical effectiveness.
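A minimal sketch of what such orchestration might look like is given below: a rule-based blocklist is consulted first, and an ML score is used only for cases the rule does not cover. The blocklist entry, the threshold, and in particular the `toy_score` function are hypothetical; `toy_score` is a stand-in for a trained model and has no detection value of its own.

```python
# Hypothetical blocklist maintained outside the ML component.
BLOCKLIST = {"evil.test"}


def toy_score(domain):
    """Stand-in for a trained model: the fraction of digits in the domain name."""
    return sum(ch.isdigit() for ch in domain) / max(len(domain), 1)


def orchestrate(domain, ml_score=toy_score, threshold=0.5):
    """Combine a non-ML rule with an ML score and report which component decided."""
    if domain in BLOCKLIST:
        return "block (rule)"
    if ml_score(domain) >= threshold:
        return "block (ml)"
    return "allow"


for name in ("evil.test", "1234567890.io", "example.com"):
    print(name, "->", orchestrate(name))
```

Returning the name of the component that made the decision is a small design choice that helps with explainability: an analyst can see at once whether a block came from a known rule or from a model's judgement.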

Conclusion

Modern society relies more and more on information technology (IT) systems, including autonomous ones, and these systems are actively exploited by hostile actors. Cyber threats are constantly evolving, and in the near future attackers will have the means to seriously hurt or even kill people. To prevent such incidents and reduce the myriad risks that can affect existing and future IT systems, defensive mechanisms must be able to adapt quickly to changing settings and a dynamic threat landscape. This blog aims to stimulate significant advances in the use of machine learning (ML) in cyber security and to lay the groundwork for a wider deployment of ML solutions to safeguard present and future systems.

References

Annamalai, C. 2022. Factorials, Integers and Multinomial Coefficients and its Computing Techniques for Machine Learning and Cybersecurity. SSRN Electronic Journal. pp. 1–7.

Elsisi, M., Tran, M.-Q., Mahmoud, K., Mansour, D.-E.A., Lehtonen, M. & Darwish, M.M.F. 2021. Towards Secured Online Monitoring for Digitalized GIS Against Cyber-Attacks Based on IoT and Machine Learning. IEEE Access. 9. pp. 78415–78427.

Geetha, R. & Thilagam, T. 2021. A Review on the Effectiveness of Machine Learning and Deep Learning Algorithms for Cyber Security. Archives of Computational Methods in Engineering. 28(4). pp. 2861–2879.

Hameed, S.S., Hassan, W.H., Abdul Latiff, L. & Ghabban, F. 2021. A systematic review of security and privacy issues in the internet of medical things; the role of machine learning approaches. PeerJ Computer Science. 7. e414.

Zhang, Z., Ning, H., Shi, F., Farha, F., Xu, Y., Xu, J., Zhang, F. & Choo, K.-K.R. 2022. Artificial intelligence in cyber security: research advances, challenges, and opportunities. Artificial Intelligence Review. 55(2). pp. 1029–1053.