APTs ♥ Your Cloud Data

In today's post we're going to take a look at how harvest.ai has detected nation-state-affiliated attacks using Content Aware UEBA (user and entity behavior analytics). These attacks are interesting because the attackers used compromised user accounts on platforms such as Google for Work, Box, Dropbox and Office 365 to attempt to steal data.

One of the key characteristics of modern sophisticated attacks is the use (or abuse) of legitimate tools by attackers and a distinct reduction in malware. In these attacks there is rarely a single piece of evidence that proves account compromise.

What challenges does this create for organizations? It means that traditional enterprise security, and even UBA solutions that look only at authentications and accesses without understanding content, will have a hard time differentiating between what is merely anomalous and what is malicious, which is a critical step towards stopping attacks before data can be stolen.

Here are some tactics we have seen in recent attacks:

  • Attackers use stolen credentials to search through email and personal and organization shared drives to gain intel and move laterally towards their target
  • Attackers use the stolen credentials to access cloud platforms such as Google, Office 365, Box and Dropbox in addition to on-premises file shares
  • In some cases, attackers may tunnel through the user's endpoint, using their logged-in credentials to defeat MFA (multi-factor authentication) and IP geolocation
  • Attackers access data during normal business hours
  • Attackers sustain access over months

Over 8.2% of all business documents in an average Fortune 1000 company are viewable by all users within the organization.

Fortunately, these same tactics give defenders several opportunities for detection:

  1. Attackers use stolen credentials to search through the organization's email, personal and network share drives to gain intel and move laterally (More on this below!)
  2. Attackers' data access patterns are fundamentally different from those of the legitimate user accounts they have compromised
  3. Attackers often slip up over time and connect from IPs not commonly used by the user account

Why can #1 above be a good thing? An attacker will do everything they can to blend in with the noise and avoid detection, but at the end of the day they need to access information in a way that differs from the target user. By understanding what kinds of information are important to the attacker and what kinds of information each user and group typically accesses (we do this with natural language processing and AI), we can both home in on attacks quickly and reduce false positives.
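To make the idea of "accessing information differently" concrete, here is a minimal sketch that compares the topic mix of a user's recent accesses against their historical baseline. It assumes each document access already carries a topic label from some classifier, and it uses Jensen-Shannon divergence as one plausible distance measure. The function names, the sample labels and the choice of metric are illustrative assumptions, not a description of how our analytics are implemented.

```python
import math
from collections import Counter

def topic_distribution(topics, vocabulary):
    """Turn a list of topic labels into a probability distribution over vocabulary."""
    counts = Counter(topics)
    total = sum(counts.values())
    return [counts[t] / total for t in vocabulary]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical mixes, 1 for disjoint ones."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic labels, one per document access.
baseline = ["HR"] * 60 + ["Finance"] * 40
recent = ["Data Center"] * 7 + ["IT"] * 3

vocabulary = sorted(set(baseline) | set(recent))
score = js_divergence(topic_distribution(baseline, vocabulary),
                      topic_distribution(recent, vocabulary))
print(f"topic shift score: {score:.2f}")
```

A score near 0 means the recent accesses look like the user's usual work. A score near 1 means the account is suddenly reading entirely different kinds of content, which is worth a closer look when combined with other signals.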

APTxx example

In our first year, we've detected two (attributed) nation-state-affiliated attacks with our UBA and DLP analytics. Each of them involved compromising cloud accounts such as Google for Work, Office 365, Box, etc. as part of the attacker's strategy. We believe the attacker behind the compromised account below is an APT that security firm Mandiant first discovered. Let's take a look at a (sanitized) compromised user account for some insight into what sophisticated attackers are looking for and how we can catch them.

Our analytics (Macie) start by learning what kinds of data each user and their peer groups access, as well as access patterns such as where they typically access from and how often. Here's a high-level overview of our compromised user account:

We can quickly see that our analytics have learned that the user typically accesses Finance and HR related documents, which are important to the business but not critical. We can also see that the user has a "Bronze" data access categorization, which is the lowest tier of the Bronze, Silver, Gold, Platinum risk framework that our analytics apply to each user. As we'll see, a lower risk classification just means that the user account typically accesses less business-critical data; it does not mean that an account compromise will not create risk for the organization.

Our AI learns to classify new content and jargon across an organization, getting over one of the biggest roadblocks for traditional DLP

Content-aware behavioral analytics

When looking for changes in behavior of a user or group, we often see attackers use the compromised account's credentials to search for documents that ARE highly risky to the organization. These are also likely to be different kinds of documents than the user typically accesses. What kind of features can we look for to identify a compromised account? For a start:

  • The compromised account may start searching for data shared across the organization
  • This data will likely be different from what the user or their peer group typically accesses
  • This data may be classified by our analytics as very important to the organization
  • The attacker may be connecting from new or different IP ranges
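The indicators listed above can be sketched as simple per-access checks against a learned user profile. This is only an illustration: the `Access` and `UserProfile` records, their field names and the sample values are all hypothetical, and a real system would score these signals statistically rather than as yes/no flags.

```python
from dataclasses import dataclass

@dataclass
class Access:
    doc_topic: str          # topic label assigned to the document
    doc_owner: str          # account that owns the document
    business_value: str     # e.g. "low", "medium", "critical"
    shared_org_wide: bool   # visible to everyone in the organization
    source_network: str     # ISP or network the request came from

@dataclass
class UserProfile:
    user: str
    usual_topics: set       # topics the user or peer group normally reads
    usual_networks: set     # networks the user normally connects from

def indicators(access, profile):
    """Return the name of every indicator that this access trips."""
    checks = {
        "org_wide_data": access.shared_org_wide and access.doc_owner != profile.user,
        "unusual_topic": access.doc_topic not in profile.usual_topics,
        "high_value": access.business_value == "critical",
        "new_network": access.source_network not in profile.usual_networks,
    }
    return [name for name, hit in checks.items() if hit]

profile = UserProfile("alice", {"HR", "Finance"}, {"Office ISP", "Home ISP"})
routine = Access("HR", "alice", "low", False, "Office ISP")
suspect = Access("Data Center", "bob", "critical", True, "Linode")

print(indicators(routine, profile))
print(indicators(suspect, profile))
```

Any one indicator on its own is usually just an anomaly. An access that trips all of them at once is a much stronger sign of a compromised account.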

So we're looking at some fairly high-dimensional features, and it can be difficult to show how our analytics work behind the scenes. To visualize this, we'll use Gephi to build a graph model of the user's accesses and see if we can spot where the compromise happened.

In the past year, we've seen this user access 538 documents from 7 different ISPs. Edges are created between documents when they were accessed within an hour of each other. This allows us to link the nearly 50% of document accesses for which the cloud SaaS provider recorded no source IP address. The graph is colored by the ISP that the data was accessed from. Now we can apply a layout algorithm to group documents accessed at similar times together.
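As a rough sketch of the graph construction described above, the snippet below links documents accessed within an hour of each other and then groups them into connected clusters, so that accesses with no recorded source IP can be associated with the ISPs seen elsewhere in the same cluster. The access log, file names and dates are made up for illustration, and grouping by connected components is a simplifying assumption standing in for the visual layout step.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical access log: (document, access time, ISP or None if no IP was recorded).
accesses = [
    ("timesheet.xlsx", datetime(2016, 3, 1, 9, 5), "Office ISP"),
    ("vacation.doc", datetime(2016, 3, 1, 9, 40), None),
    ("dc-migration.pdf", datetime(2016, 3, 9, 14, 0), "Linode"),
    ("uptime-schedule.xls", datetime(2016, 3, 9, 14, 30), None),
    ("network-diagram.vsd", datetime(2016, 3, 9, 15, 10), None),
]

def build_edges(accesses, window=timedelta(hours=1)):
    """Link two documents when they were accessed within `window` of each other."""
    edges = set()
    for (doc_a, t_a, _), (doc_b, t_b, _) in combinations(accesses, 2):
        if doc_a != doc_b and abs(t_a - t_b) <= window:
            edges.add(frozenset((doc_a, doc_b)))
    return edges

def components(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

edges = build_edges(accesses)
known_isp = {doc: isp for doc, _, isp in accesses if isp}
for group in components([doc for doc, _, _ in accesses], edges):
    isps = {known_isp[d] for d in group if d in known_isp}
    print(sorted(group), "->", sorted(isps))
```

In this toy log, the two documents with no recorded IP in the second cluster end up grouped with an access from Linode, even though one of them was opened more than an hour after the Linode access. An edge list like this can also be written out as CSV and loaded into Gephi for layout and coloring.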

Now that we've grouped documents accessed over similar timelines together, we can look for indications of compromise. For example: Is the user accessing from a risky ISP? What is the business value of the docs the user is looking at? Is the user looking at other users' documents or their own? Fortunately, our analytics are really fast at looking for anomalous changes across multiple features, and have identified a risky access below:

Results of a multi-dimensional analytic (content, business value, time, location, peer anomaly) to find accesses that match patterns of a compromised account searching across an organization to steal data


As you can see above, we have one distinct cluster (0.37% of all docs!) where the user matched on all of our criteria: essentially, the compromised account being used to access very important business data that differs from what the user typically accesses and belongs to other users. Additionally, we see a few connections in there from Linode, a cloud compute provider that the attacker was bouncing their connection through. Gotcha! Now for the good stuff: what is our nation-state-affiliated actor looking for?

The screenshot above shows accesses to some very sensitive content, and a huge change from the types of HR and Financial documents our user typically accesses. The attacker appears to be focused on Data Center Technology and Information Technology topics. Additional changes that indicate compromise are that the documents accessed above are not owned by our user, and that the large increase in the volume of files accessed over a short period of time is unusual for the user.

 


Conclusion

Across Fortune 1000 environments, we see an average of over 8% of all business documents in Google for Work, Office 365, Box and Dropbox being accessible to all users within the domain, and over 1% of those documents being rated by our analytics as business critical. That makes it essential for organizations to take steps to protect the important content that they use in the cloud.

What did we see during the compromise?

  • Our analytics detected and alerted on a distinct shift, after the account compromise, from the user's typical access to HR time sheets and vacation schedules to data center migration plans and system uptime schedules
  • Across industries, we have seen attackers frequently take an interest in IT and data center related content, likely in order to move from cloud knowledge repositories towards the business-specific applications being targeted