Not all incidents can be prevented. An incident response capability is therefore necessary for rapidly detecting incidents, minimizing loss and destruction, mitigating the weaknesses that were exploited, and restoring IT services.
Incident response (IR) is the part of our security programme where we deal with, or respond to, adverse events that threaten security. While Business Continuity Planning deals with outages caused by natural disasters, mechanical failures or similar events that stop the business from operating, IR deals with events that impact the security of the organisation. Because performing incident response effectively is a complex undertaking, establishing a successful incident response capability requires substantial planning and resources. Two of the best resources we can use are the SANS Digital Forensics and Incident Response resources and the NIST 800-61r2 guidelines. The NIST guidelines provided the bulk of the research for this post: I went through them, worked to understand them and have simplified them here for easy reference. The full guidelines are here.
Before we go into IR, let's go back to ITIL and the difference between an event and an incident. An event is any observable occurrence in a system or network. Events include a user connecting to a file share, a server receiving a request for a web page, a user sending email, and a firewall blocking a connection attempt. Adverse events are events with a negative consequence, such as system crashes, packet floods, unauthorised use of system privileges, unauthorised access to sensitive data, and execution of malware that destroys data. This guide addresses only adverse events that are computer security related, not those caused by natural disasters, power failures, etc. A computer security incident is a violation or imminent threat of violation of computer security policies, acceptable use policies, or standard security practices.
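To make the distinction concrete, here is a minimal Python sketch of the three definitions above. The class names, field names and the `classify` helper are made up for illustration and are not taken from ITIL or NIST. In practice, deciding whether something violates policy is an analyst's judgement, not a boolean flag.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    EVENT = "event"
    ADVERSE_EVENT = "adverse event"
    INCIDENT = "computer security incident"


@dataclass(frozen=True)
class Observation:
    """Any observable occurrence in a system or network (hypothetical model)."""
    description: str
    negative_consequence: bool = False  # e.g. system crash, destroyed data
    violates_policy: bool = False       # violation, or imminent threat of one


def classify(obs: Observation) -> Classification:
    # A violation (or imminent threat of violation) of security policy
    # makes the occurrence an incident.
    if obs.violates_policy:
        return Classification.INCIDENT
    # A negative consequence without a policy violation is an adverse event.
    if obs.negative_consequence:
        return Classification.ADVERSE_EVENT
    # Everything else is just an event.
    return Classification.EVENT
```

For example, `classify(Observation("user connects to a file share"))` returns `Classification.EVENT`, while an observation flagged with `violates_policy=True` is classified as an incident.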
Types of incidents
There are several different types of incidents we can look at, depending on what their goal is and how they are carried out. The common attack vectors listed in the next section are one way of grouping them.
Why do we need IR?
Having an incident response function becomes ever more vital as our society becomes more connected. With the rapidly growing number of publicly known security vulnerabilities for all types of technology, and with organisations around the world increasingly interconnected, criminals have plenty of ways to attack your systems. Add to that the amount of information and intellectual property companies store on their systems, the transactions and analytics they carry out and the technical links between businesses across the supply chain, and there are many ways a criminal or state organisation can benefit from compromising your networks. The regulatory, financial, reputational and productivity impacts could be catastrophic. The benefits of having incident response capabilities include:
- Reducing the frequency of incidents by effectively securing networks, systems, and applications.
- Defining communication guidelines in advance, because communications often need to occur quickly, so that the appropriate information is shared with the appropriate parties.
- Allowing more prioritised preparations for handling incidents by focusing on being prepared for incidents that use common attack vectors:
  - External/Removable Media: An attack executed from removable media (e.g., flash drive, CD) or a peripheral device.
  - Attrition: An attack that employs brute force methods to compromise, degrade, or destroy systems, networks, or services.
  - Web: An attack executed from a website or web-based application.
  - Email: An attack executed via an email message or attachment.
  - Improper Usage: Any incident resulting from violation of an organisation's acceptable usage policies by an authorised user, excluding the above categories.
  - Loss or Theft of Equipment: The loss or theft of a computing device or media used by the organisation, such as a laptop or smartphone.
  - Other: An attack that does not fit into any of the other categories.
- Making sure unusual activities are investigated faster, by emphasising the importance of incident detection and analysis throughout the organisation.
- Reducing confusion and time lost during an incident, and allowing more effective resource allocation, through written guidelines for prioritising incidents.
- Gaining value from incidents through the lessons learned process, allowing us to iteratively improve our security.
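As a rough illustration of focusing preparation on common attack vectors, the sketch below tallies past incidents by the vector categories listed above. The enum member names and the `most_common_vectors` helper are hypothetical. A real team would pull this data from its ticketing system, not from a hand-built list.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, List, Tuple


class AttackVector(Enum):
    REMOVABLE_MEDIA = "External/Removable Media"
    ATTRITION = "Attrition"
    WEB = "Web"
    EMAIL = "Email"
    IMPROPER_USAGE = "Improper Usage"
    LOSS_OR_THEFT = "Loss or Theft of Equipment"
    OTHER = "Other"


def most_common_vectors(
    past_incidents: Iterable[AttackVector], top_n: int = 3
) -> List[Tuple[AttackVector, int]]:
    """Return the top_n vectors seen in past incidents, most frequent first."""
    return Counter(past_incidents).most_common(top_n)


# Made-up history, used only to show the call.
history = [
    AttackVector.EMAIL, AttackVector.EMAIL, AttackVector.EMAIL,
    AttackVector.WEB, AttackVector.WEB,
    AttackVector.LOSS_OR_THEFT,
]
for vector, count in most_common_vectors(history, top_n=2):
    print(f"{vector.value}: {count}")
```

The vectors at the top of the tally are the ones where playbooks, training and tooling are likely to pay off first.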
So what's needed to establish our incident response capability? In general our organisation should take a policy-based approach to all aspects of information security to make us a well-oiled machine! We can see this approach at work when we have a high-level description of the essential elements of information security, an understanding among users and system administrators of what they can and cannot do, and sanctions for infractions. It improves security by making everybody aware of what they must do and how they must do it. Incident response is no different, and at a minimum it should include the following actions:
- Creating an incident response policy and plan
- Developing procedures for performing incident handling and reporting
- Setting guidelines for communicating with outside parties regarding incidents
- Selecting a team structure for incident response
- Establishing relationships and lines of communication between the incident response team and other groups, both internal (e.g., the legal department) and external (e.g., law enforcement agencies)
- Determining what services the incident response team should provide
- Staffing and training for the incident response team
Incident Response process
There are six phases in the standard SANS Incident Response process.
Phase 1: Preparation.
The preparation phase comes before we have identified an attack. It is the stage where we start layering our defences and security controls to reduce the risk of a (successful) attack, and where we get our policies and procedures created and distributed. It is essential to have all our staff trained in what to do during an incident, and to ensure that any tools or infrastructure our incident response activity relies on are present and available.
Phase 2: Identification.
At this point our intrusion detection, SIEM and other detective controls are in place to identify attacks taking place on our systems. This is where our analysts take in offences/tickets and investigate them to distinguish between false positives and legitimate incidents.
Phase 3: Containment.
If we reach this phase then an active attack is occurring. Before eradicating the attack and beginning our recovery, we must stop it from spreading further throughout our estate. There are several strategies we can follow, but whichever we apply we must be sure to keep a record of any actions taken. The strategies include:
- Shutting down a system
- Disconnecting the infected asset from the network
- Changing the filtering rules of firewalls
- Disabling or deleting compromised accounts
- Increasing monitoring levels
- Setting traps such as honeypots
- Striking back at the attacker's system
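Whichever strategy is used, the record of actions taken can be as simple as a timestamped log. The sketch below is one possible shape for it. The `ActionRecord` and `IncidentLog` names, their fields and the sample entries are all assumptions for illustration, not part of the SANS process or any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ActionRecord:
    timestamp: datetime
    analyst: str
    asset: str
    action: str


@dataclass
class IncidentLog:
    incident_id: str
    actions: List[ActionRecord] = field(default_factory=list)

    def record(self, analyst: str, asset: str, action: str,
               when: Optional[datetime] = None) -> ActionRecord:
        """Append an action, stamped with the current UTC time unless `when` is given."""
        entry = ActionRecord(when or datetime.now(timezone.utc), analyst, asset, action)
        self.actions.append(entry)
        return entry

    def timeline(self) -> List[str]:
        """Return the recorded actions as text lines, oldest first."""
        ordered = sorted(self.actions, key=lambda a: a.timestamp)
        return [
            f"{a.timestamp.isoformat()} {a.analyst} {a.asset}: {a.action}"
            for a in ordered
        ]


# Hypothetical incident and entries, used only to show the call.
log = IncidentLog("INC-0001")
log.record("alice", "web01", "Disconnected from the network",
           when=datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc))
log.record("bob", "fw-edge", "Changed filtering rules",
           when=datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc))
print("\n".join(log.timeline()))
```

Recording who did what, to which asset and when, also gives the lessons learned phase an accurate timeline to work from.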
Phase 4: Eradication.
At this point the attack is no longer spreading and we have an idea of the cause of the incident. We now proceed to eradicate that cause, for example eliminating the virus or worm. At this stage procedures are very important to ensure the eradication is completed as expected.
Phase 5: Recovery.
At this stage our goal is to return the compromised systems to their normal state. We should have recovery procedures in place tailored to potential incidents, but the safest option is a full rebuild of the systems, restoring data from the last backup. Don't forget to patch impacted machines. Again, it is important to keep a record of any action taken and to keep users aware of the recovery status. This communication reduces confusion and rumours and allows us to advise users of major developments that may impact them. We should also consider local laws and internal policies on media contact, with key employees assigned this responsibility.
Phase 6: Lessons Learned.
The lessons learned phase is often neglected, but a post-mortem is an important way to gain an exact understanding of the incident, its timeline and the adequacy of our response. We can see which procedures worked and which did not, what the damage was and what we need to change going forward. This allows us to continually improve what we do and how we do it.
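The six phases above can be sketched as an ordered enumeration. This is only an illustration: the `Phase` and `next_phase` names are invented here. Looping from Lessons Learned back to Preparation is an assumption that reflects the continual improvement described above, not something the process mandates in code.

```python
from enum import IntEnum


class Phase(IntEnum):
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current` in the six-phase cycle."""
    if current is Phase.LESSONS_LEARNED:
        # Assumption: lessons learned feed back into preparation.
        return Phase.PREPARATION
    return Phase(current + 1)


# Walk one full cycle, starting from preparation.
phase = Phase.PREPARATION
for _ in range(len(Phase)):
    print(f"Phase {phase.value}: {phase.name.replace('_', ' ').title()}")
    phase = next_phase(phase)
```

Real incidents rarely move through the phases this cleanly. Containment and identification, for example, often overlap as new affected assets are found.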
So how do we judge whether our incident response is successful? Good security should mean no incidents, but everybody will suffer a breach at some point, so we need another way to measure success. There are a few metrics we can use, including:
- Number of incidents
- Estimated financial loss: the more effective we are at incident response, the less loss we will incur
- Honest self-evaluation as part of the lessons learned phase after each incident
- Average time and resource required per incident
- Documentation and procedure quality by team
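The quantitative metrics in this list are straightforward to compute once each closed incident carries a few numbers. The sketch below assumes a hypothetical `ClosedIncident` record with made-up field names and sample figures. The qualitative items (self-evaluation, documentation quality) are deliberately left out because they do not reduce to a simple calculation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class ClosedIncident:
    incident_id: str
    hours_to_resolve: float  # elapsed time from identification to recovery
    analyst_hours: float     # effort spent by the response team
    estimated_loss: float    # estimated financial loss, in one currency


def summarise(incidents: Sequence[ClosedIncident]) -> Dict[str, float]:
    """Compute the count, total loss and per-incident averages."""
    if not incidents:
        return {"count": 0, "total_estimated_loss": 0.0,
                "mean_hours_to_resolve": 0.0, "mean_analyst_hours": 0.0}
    return {
        "count": len(incidents),
        "total_estimated_loss": sum(i.estimated_loss for i in incidents),
        "mean_hours_to_resolve": mean(i.hours_to_resolve for i in incidents),
        "mean_analyst_hours": mean(i.analyst_hours for i in incidents),
    }


# Made-up figures, used only to show the call.
closed = [
    ClosedIncident("INC-0001", hours_to_resolve=4.0, analyst_hours=10.0, estimated_loss=1000.0),
    ClosedIncident("INC-0002", hours_to_resolve=8.0, analyst_hours=30.0, estimated_loss=3000.0),
]
print(summarise(closed))
```

Tracking these figures over successive periods is what shows whether the response capability is actually improving.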
We will always suffer incidents, but a solid methodology and procedures to guide us when they happen put us in a better place to reduce the damage caused. It can take time for the teams involved to get comfortable with the process, so war gaming and mock events can be vital to improving our efficiency. With this in place, from Preparation through to Lessons Learned, attackers will have a much harder time breaking through our defences and maintaining persistence.