Another Challenge of Big Data Analytics: Data Leak and Spill (Part 1)

In a recent article on Big Data management and trends, Gartner identifies enterprise data as one of the key challenges facing organizations: consolidating data from disparate sources across the extended enterprise and transforming it into critical business intelligence.

“You have many disparate data sources – from your enterprise’s ‘dark data’ and partner, employee, customer and supplier data to public, commercial and social media data – that you need to link and exploit to its fullest value.”

The extended enterprise comprises disparate data sets spread across heterogeneous applications and devices. How can organizations harness all these data sources in one centralized analytics engine? COTS applications claim to do precisely this—one example being SAP Business Warehouse, which aggregates data from SAP and other applications and allows users to run comprehensive reports.

However, a different challenge looms on the horizon as organizations rush toward big data analytics—one that is far less talked about: How do organizations control access to reporting data once it is mined from disparate applications and devices? After all, the same compliance regulations and corporate governance policies should apply to data no matter where it is consumed: in enterprise applications, on partner networks, or after use, when it is consolidated in analytics tools and displayed in reporting interfaces.

The technical challenge is harder than it seems. The rules that govern how data should be accessed, shared, and used are embedded in the business context of the applications where that data originates. When data is mined and aggregated, this critical business context is left behind. How do organizations know how data should be controlled, especially when it is mined from across the extended enterprise?

For instance, assume that a business object is classified as restricted (due to an export compliance or other regulation) in the application where it was created and stored (say, SAP). Controls can be instrumented in that application to ensure proper access and usage. However, when that data is aggregated into reporting and analytics tools, how do you identify restricted data? Do data-level classifications persist from the originating application? Or is data stripped of crucial business context when it is mined and aggregated?
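One way to keep that business context intact is to persist the classification as metadata that travels with each record into the analytics store, rather than stripping it during extraction. Here is a minimal sketch of that idea; the record schema, classification levels, and `extract_for_analytics` function are all illustrative assumptions, not the API of SAP or any real product.

```python
# Illustrative sketch: carry data-level classification along when records
# are extracted from a source application into an analytics store.
from dataclasses import dataclass
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"   # e.g., export-controlled data

@dataclass
class Record:
    key: str
    payload: dict
    classification: Classification  # persisted alongside the data itself

def extract_for_analytics(records):
    """Copy records into the analytics store without stripping classification."""
    return [
        {"key": r.key, **r.payload, "_classification": r.classification.value}
        for r in records
    ]

source = [
    Record("PO-1001", {"amount": 250_000}, Classification.RESTRICTED),
    Record("PO-1002", {"amount": 1_200}, Classification.INTERNAL),
]
rows = extract_for_analytics(source)
print(rows[0]["_classification"])  # -> restricted
```

Because each aggregated row still carries its `_classification` tag, downstream reporting tools can identify restricted data without reaching back into the originating application.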


To make matters worse, most analytics tools allow users to export ad hoc reports to PDF or Excel, so sensitive and restricted data can be widely distributed. In other words, the problem goes beyond the set of users who have access to your reporting and analytics applications. Sensitive data can go into these analytics tools undetected, then be exported out for broad distribution.

While many analytics tools have basic access controls, it is unclear whether they are robust enough to address this challenge. An effective solution would need to be able to:

  • Retain original data-level classification information when data is mined and aggregated from disparate applications and sources.
  • Block access, filter report views, or block export of report information, based on data-level classifications.
  • Apply rights protection to exported report files based on the data they contain, so files are distributed and accessed in accordance with rules and regulations.
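The second requirement—enforcing controls from data-level classifications—can be sketched as follows. This assumes classification tags persisted with each report row; the clearance levels, row schema, and function names are illustrative, not any vendor's actual enforcement API.

```python
# Illustrative sketch: filter report rows by the viewer's clearance level,
# and block export when restricted rows would leave the analytics tool.
CLEARANCE_ORDER = ["public", "internal", "restricted"]  # assumed hierarchy

def filter_report(rows, viewer_clearance):
    """Drop rows classified above the viewer's clearance level."""
    limit = CLEARANCE_ORDER.index(viewer_clearance)
    return [r for r in rows
            if CLEARANCE_ORDER.index(r["_classification"]) <= limit]

def can_export(rows):
    """Block export (e.g., to PDF or Excel) if any restricted row remains."""
    return all(r["_classification"] != "restricted" for r in rows)

report = [
    {"item": "PO-1001", "_classification": "restricted"},
    {"item": "PO-1002", "_classification": "internal"},
]

visible = filter_report(report, "internal")
print(len(visible), can_export(visible))  # -> 1 True
```

The same classification check can drive all three controls above: deny access outright, filter the report view, or gate the export path before a file is generated.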

In part 2 of this series, we take a closer look at each of these requirements.

by Rajesh Rengarethinam, Senior Software Engineering Manager at NextLabs
