AWS is embracing the cybersecurity data lake trend, first highlighted by Snowflake, and intends to industrialize the concept. At its annual conference, AWS announced the preview of Amazon Security Lake, a solution designed to centralize all security data in a petabyte-scale data lake and make it available for analysis from the different tools used by security teams and developers.
This solution stems from several observations and trends in the cybersecurity ecosystem. Security logs and events come from multiple systems and applications. The various tools for monitoring, vulnerability detection, threat detection, protection, and so on must correlate this data. And companies do not deploy a single type of tool but several, depending on the needs of their teams.
Software and data are scattered across different systems that communicate poorly with each other. Worse, a single vendor can offer many products that do not rely on the same data stores. Unifying monitoring and security data in a lake therefore became necessary for players such as Splunk, Datadog, Elastic or Micro Focus.
The advantage for a vendor would be to become the client’s single point of contact: the client would choose solutions from the vendor’s catalog capable of querying data brought together in a single data lake. Although gateways and connectors exist, the data remains in silos most of the time.
However, this calls for some form of consolidation. Some companies don’t want to get rid of their existing tools. One vendor’s solution may be good in one area, while a competitor’s may be better in another. And each of these vendors has set up its own data system.
In The Footsteps Of Snowflake
Some specialists, including Omer Singer, head of cybersecurity strategy at Snowflake, have argued since 2019 for a cybersecurity data lake that is agnostic of the solutions on the market. Snowflake was therefore one of the first cloud data warehouse and data lake vendors to position itself, last June, as the supplier of a “data cloud” dedicated to cybersecurity.
The solution is offered in partnership with Securonix, Tenable, Orca, Panther, and Lacework, which already use Snowflake to store their data. Snowflake Ventures has invested in several cybersecurity vendors, including Securonix, Lacework, and Panther. Two reasons for this. First, Frank Slootman, the company’s CEO and chairman, is also an investor.
Second, this resonates with a broader trend in IT: the verticalization of solutions by industry or domain. Beyond that, monitoring, detection, and protection systems are all built on databases, data warehouses, data lakes, columns, rows, key-value pairs, documents, search and analysis tools, and increasingly machine learning algorithms. This is the conviction of Elastic, which started from enterprise search and went on to become a SIEM vendor.
“Security is a data issue,” Mandy Andress, CISO of Elastic, told MagIT at ElasticON 2022 in Amsterdam. And, fundamentally, a data problem is an infrastructure problem: storage, computing, and network. These issues interest AWS; they are at the center of its activity.
Since security vendors and Snowflake rely primarily on its IaaS and PaaS services and its data stores to run their products, the cloud giant has a firepower that only its most direct competitors can claim. At its re:Invent 2022 event, spokespersons for Amazon’s cloud subsidiary repeated ad nauseam that “security is at the core” of everything it does and that “90% of the functionality” it adds “comes from customer feedback.”
However, Security Lake seems strongly inspired by Snowflake’s offer. Publicly, Omer Singer does not complain about it. “Announcements from re:Invent provide superb validation of the security data lake,” he said on LinkedIn. “This model plays a big role in increasing security programs.”
Although AWS is catching the wave a little later than its competitor, Rod Wallace of AWS sidesteps the question of where the approach comes from. “We talk to our partners and our customers. All of them told us they would like to use a data lake for security data,” he says.
Normalize Security Data “Automatically.”
AWS is bringing together several services, vendors, and frameworks to implement Security Lake. It is based on Amazon S3 object storage in the customer’s AWS account(s), behind their VPC (Security Lake can also be used in multi-region deployments). AWS wants to allow the use of storage tiers (hot, warm, cold). Customers can define their own archiving policies or rely on the Intelligent-Tiering feature of S3.
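To make the idea of hot, warm and cold tiers concrete, here is a minimal sketch of an archiving policy expressed as a standard S3 lifecycle configuration. It only builds the configuration as a Python dictionary; the rule ID, the prefix and the day thresholds are illustrative assumptions, and the article does not say whether Security Lake exposes its tiering settings in exactly this form.

```python
def build_lifecycle_rules(prefix: str, warm_after_days: int,
                          cold_after_days: int, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration: hot -> warm -> cold -> expired.

    The thresholds and the rule ID are illustrative, not AWS defaults.
    """
    # S3 expects objects to spend at least 30 days in Standard storage
    # before a lifecycle transition to STANDARD_IA.
    if not 30 <= warm_after_days < cold_after_days < expire_after_days:
        raise ValueError("expected 30 <= warm < cold < expire (in days)")
    return {
        "Rules": [
            {
                "ID": "security-logs-tiering",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # warm tier: infrequent access
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    # cold tier: archive
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
                # end of retention: objects are deleted
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_lifecycle_rules("logs/", 30, 90, 365), indent=2))
```

A dictionary of this shape is what the boto3 S3 client accepts as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`. Relying on Intelligent-Tiering instead would replace the fixed thresholds with access-pattern-based moves handled by S3 itself.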
“Clients also want to get rid of the retention policies of publishers, who sometimes only store their logs for 30 days,” explains Rod Wallace. Security Lake can already ingest data from fifteen tools, including AWS services (Route 53, CloudTrail, VPC, S3, etc.) and third-party products (Crowdstrike, Cisco Security, CyberArk, Falco, etc.). Security logs and events can be ingested in various ways depending on their origin, using Lambda functions and the Glue data catalog.
Although ingestion is not free, Rod Wallace says that AWS does not apply ingress charges on data extracted from third-party on-premise solutions or applications. “The management of data movement costs is transparent: they manage their costs at their own pace.” Placing the data in the same table format and storing it in object stores is not enough, however. The way the data is represented must also be unified.
The cloud giant had already prepared the ground. In August 2022, it launched, in collaboration with Splunk, the open-source Open Cybersecurity Schema Framework (OCSF) initiative. Around twenty vendors are at the table, including IBM, Okta, Zscaler, JupiterOne, Crowdstrike, Palo Alto, Trend Micro, IronNet, Splunk, Tanium, Sumo Logic, Securonix, and Salesforce.
As a reminder, OCSF is a variant of the ICD data schema originally devised by Symantec. Once it is embedded in the connectors of security tools, they can emit normalized data that follows the same schema. “OCSF is trying to solve a problem that many cybersecurity vendors face when talking to their customers,” he says. “Each software product has its own data model, and it is tough for security teams to decode them. Security Lake is the first data lake to use OCSF.”
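The problem of "each software has its data model" can be illustrated with a small sketch. Two hypothetical vendors report the same failed login in different shapes, and each is mapped to one common structure. The field names of the target structure are loosely modelled on OCSF but are a simplified assumption for illustration, not a validated rendering of the specification; the vendor formats are invented.

```python
from datetime import datetime, timezone

# Hypothetical raw events: the same failed login seen by two different tools.
VENDOR_A_EVENT = {"ts": "2022-11-29T10:15:00Z", "user": "alice",
                  "ip": "10.0.0.5", "result": "FAIL"}
VENDOR_B_EVENT = {"epoch_ms": 1669716900000, "account": "alice",
                  "source": {"addr": "10.0.0.5"}, "ok": False}


def to_epoch_ms(iso_timestamp: str) -> int:
    """Convert an ISO 8601 UTC timestamp ending in 'Z' to epoch milliseconds."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)


def common_event(time_ms: int, user: str, ip: str, success: bool, vendor: str) -> dict:
    """Build the shared representation (simplified, OCSF-inspired field names)."""
    return {
        "class_name": "Authentication",
        "time": time_ms,
        "status": "Success" if success else "Failure",
        "user": {"name": user},
        "src_endpoint": {"ip": ip},
        "metadata": {"product": {"vendor_name": vendor}},
    }


def normalize_vendor_a(event: dict) -> dict:
    """Map vendor A's flat record to the common representation."""
    return common_event(to_epoch_ms(event["ts"]), event["user"], event["ip"],
                        event["result"] != "FAIL", "Vendor A")


def normalize_vendor_b(event: dict) -> dict:
    """Map vendor B's nested record to the common representation."""
    return common_event(event["epoch_ms"], event["account"],
                        event["source"]["addr"], event["ok"], "Vendor B")


if __name__ == "__main__":
    print(normalize_vendor_a(VENDOR_A_EVENT))
    print(normalize_vendor_b(VENDOR_B_EVENT))
```

Once both events share one shape, a single detection rule or query can cover them, and only the `metadata` block still records which tool produced the event.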
With Security Lake, this means normalizing logs to the OCSF schema as part of data ingestion. “The cost of this operation will be minimal.” The normalized data is stored in Parquet format in S3 buckets. It can be queried through Athena, OpenSearch, or AWS partner tools, including IBM Security QRadar XDR and the Splunk and Datadog SIEMs.
In this preview, solutions from 37 vendors and partners can already act as a data source, as a way to process the data, or both. QuickSight, the BI platform from AWS, can be used to create dashboards and reports for the various teams involved. Later, customers could develop machine-learning models and train them on their logs.
Finally, Lake Formation, the AWS governance component presented last year, is used to manage retention policies, roles, and data access for authorized users. The cloud giant also wants to offer features such as masking personal or sensitive data in the logs.
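As an illustration of what querying normalized data could look like, the sketch below builds a SQL statement of the kind Athena accepts, counting failed events per source IP. The table name and the column names (`src_endpoint.ip`, `status`, `time`) are assumptions made for this example; the actual tables created by Security Lake may be named and structured differently.

```python
import re

# Accept only plain "table" or "database.table" identifiers,
# so the name can be interpolated into the statement safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_top_failing_sources_query(table: str, since_epoch_ms: int,
                                    limit: int = 10) -> str:
    """Return a SQL query listing the source IPs with the most failures.

    Column names are hypothetical and assume one normalized schema
    shared by every log source.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table identifier: {table!r}")
    since = int(since_epoch_ms)  # force numeric values before interpolation
    limit = int(limit)
    return (
        "SELECT src_endpoint.ip AS source_ip, COUNT(*) AS failures "
        f"FROM {table} "
        "WHERE status = 'Failure' "
        f'AND "time" >= {since} '
        "GROUP BY src_endpoint.ip "
        "ORDER BY failures DESC "
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # "security_lake.auth_events" is a made-up table name.
    print(build_top_failing_sources_query("security_lake.auth_events",
                                          1669716900000, limit=5))
```

The point of the sketch is that one query covers every source: because all logs share the same columns after normalization, no per-vendor parsing appears in the SQL.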
Centralize First, Analyze Later
Snowflake was quick to show what could be done with the features of its platform. For example, the vendor highlighted its isolated data-sharing environments and the possibility of creating applications that run as close to its engine as possible.
AWS does have integrations with advanced analytics tools planned, but the preview will first need to prove that security data ingestion, normalization, and management work well in Security Lake.