Data classification is a critical part of any information security and compliance program. It involves identifying the types of data that an organization stores and processes, and the sensitivity of that data, based on sets of rules. For example, data classification is often used to identify data regulated by compliance standards like HIPAA or GDPR.
Data classification offers multiple benefits. It is invaluable for effectively prioritizing your security controls and ensuring proper protection of your most critical assets — for example, you might encrypt all documents that are classified as “restricted.” It facilitates risk management by helping organizations assess the value of their data and the impact that would be caused if certain types of data were lost, misused or compromised. Data classification also streamlines legal discovery and drives user productivity by making data easier to find.
Finally, it is essential to ensuring compliance with regulations and passing audits, in both the public and private sectors, because it helps organizations protect the privacy of regulated data, such as cardholder data (PCI DSS), health records (HIPAA) or EU residents’ personal data (GDPR). Unfortunately, according to the Netwrix 2020 Data Risk and Security Report, 66% of CISOs and compliance officers are not sure if they store regulated data only in secure locations, even though most of them work in organizations subject to PCI DSS (51%) and GDPR (45%).
Organizations usually design their own data classification models and categories. For instance, U.S. government agencies often define three data types, Public, Secret and Top Secret, while organizations in the private sector usually start by classifying data as restricted, private or public. The best practice is to define an initial data classification model, and later add more granular levels based on your specific data, compliance requirements and other business needs.
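As a rough illustration, the initial three-level model described above could be encoded so that security controls are chosen from the classification label. This is a minimal sketch, not a prescribed implementation: the control names and the mapping in `BASELINE_CONTROLS` are hypothetical and would need to reflect your own policies.

```python
from __future__ import annotations

from enum import IntEnum


class Sensitivity(IntEnum):
    """Initial three-level model from the text; a higher value means more sensitive."""
    PUBLIC = 1
    PRIVATE = 2
    RESTRICTED = 3


# Hypothetical mapping from level to baseline controls (illustrative only).
BASELINE_CONTROLS = {
    Sensitivity.PUBLIC: {"integrity_checks"},
    Sensitivity.PRIVATE: {"integrity_checks", "access_control"},
    Sensitivity.RESTRICTED: {"integrity_checks", "access_control", "encryption_at_rest"},
}


def required_controls(level: Sensitivity) -> set[str]:
    """Return a copy of the baseline controls for a classification level."""
    return set(BASELINE_CONTROLS[level])


def must_encrypt(level: Sensitivity) -> bool:
    """True when the level's baseline includes encryption at rest."""
    return "encryption_at_rest" in BASELINE_CONTROLS[level]
```

Because the levels are ordered, more granular levels can be added later without changing how the lookup works.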
In this article, we will review how to approach data classification based on which regulations and standards your organization is subject to.
Data Classification for Regulations that Protect Personally Identifiable Information (PII)
Personally identifiable information (PII) is data that could be used to identify, contact or locate a specific individual or distinguish one person from another. PII is often defined as a person’s first name or first initial and last name in combination with one or more of the following data elements:
Federal statutes protecting PII include:
To classify PII, it is necessary to determine the following:
The United States General Accounting Office estimates that the identity of 87% of Americans can be determined using a combination of the person’s gender, date of birth and ZIP code. When taken separately, these details might not seem terribly sensitive. However, if a breach of those three elements would likely also compromise the individual’s name, home address, SSN or other personal data, those elements should be considered sensitive.
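One way to see why a combination of fields matters more than each field alone is to measure how many records are unique on that combination. The sketch below is an illustration under stated assumptions: the field names, the `unique_fraction` helper and the sample rows are all made up for this example.

```python
from collections import Counter

# Field names are assumptions; they stand for the three elements named in the text.
QUASI_IDENTIFIERS = ("gender", "date_of_birth", "zip_code")


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Made-up sample rows, for illustration only.
sample = [
    {"gender": "F", "date_of_birth": "1980-02-14", "zip_code": "02139"},
    {"gender": "F", "date_of_birth": "1980-02-14", "zip_code": "02139"},
    {"gender": "M", "date_of_birth": "1975-07-01", "zip_code": "94105"},
    {"gender": "F", "date_of_birth": "1991-11-30", "zip_code": "60601"},
]
print(unique_fraction(sample))  # 0.5
```

A high fraction suggests that the combined fields can single out individuals, so the combination may deserve a more sensitive classification than any one field would get on its own.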
To satisfy the information security requirements of the Federal Information Security Management Act (FISMA), the Computer Security Division of the National Institute of Standards and Technology developed Special Publication 800-53, Security and Privacy Controls for Information Systems and Organizations (NIST 800-53). NIST 800-53 details security and privacy controls for federal information systems and organizations, including how agencies should maintain their systems, applications and integrations in order to ensure confidentiality, integrity and availability. NIST 800-53 is mandatory for all federal agencies. It’s also useful for organizations in the private sector and those seeking to become contractors for any federal agency.
To pass a NIST compliance audit, organizations must categorize their information and information systems by security category in order to apply the necessary cybersecurity resources. NIST recommends using three categories (low impact, moderate impact and high impact), which indicate the potential adverse impact on agency operations, agency assets or individuals if the data is disclosed without authorization by a malicious internal or external actor.
The categorization starts with identification of the information types. Each information type gets a provisional impact value (low, moderate or high) for each security objective (confidentiality, integrity and availability). After the provisional values are adjusted for all information types, each information system is assigned a final security impact level. NIST employs the concept of a “high watermark” when categorizing a system, which means that the overall system is categorized at the highest level across confidentiality, integrity and availability requirements. Thus, if at least one information type is categorized as high, the information system gets the highest impact level.
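The high watermark rule can be sketched in a few lines. This is a minimal illustration, assuming each information type already has an adjusted impact value per security objective; the information type names and values in `example_system` are hypothetical.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Impact values, ordered so that max() picks the most severe one."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def system_impact(information_types):
    """High watermark: the highest impact value across every information type and objective."""
    return max(
        level
        for objectives in information_types.values()
        for level in objectives.values()
    )


# Hypothetical information types and impact values, for illustration only.
example_system = {
    "planning_and_budgeting": {
        "confidentiality": Impact.LOW,
        "integrity": Impact.LOW,
        "availability": Impact.LOW,
    },
    "personnel_records": {
        "confidentiality": Impact.MODERATE,
        "integrity": Impact.HIGH,
        "availability": Impact.LOW,
    },
}
print(system_impact(example_system).name)  # HIGH
```

A single high value for one objective of one information type is enough to raise the whole system to high impact, which is why the per-type adjustment step deserves careful review.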
NIST 800-53 applies to data in systems used to provide services for citizens or administrative and business services. NIST doesn’t give an exact list of information types; rather, it offers recommendations for reviewing information types of interest and considering their classification. Thus, each agency selects its own combination of elements belonging to information types. For example, NIST suggests that the “Planning and Budgeting” information type may include elements like budget formulation, capital planning, and tax and fiscal policy, which in general may have a low impact level for confidentiality, integrity and availability. However, each agency is encouraged to review special factors that might affect impact levels, such as premature public release of a draft budget.
ISO/IEC 27001 is an international standard for the establishment, implementation, maintenance and continuous improvement of an information security management system (ISMS). This voluntary standard is useful for organizations across all industries. During an ISO 27001 audit, organizations need to show that they have a good understanding of what their assets are, the value of each, data ownership, and scenarios of internal use of data.
ISO/IEC 27001 doesn’t specify an exact list of regulated information; it leaves that to each organization. The first step is to determine the scope of the data environment and perform a review of all in-scope data. The scope must consider internal and external threats, interested parties’ requirements, and dependencies between the organization’s activities.
Information classification is critical to ISO 27001 compliance, since the objective is to ensure that information receives an appropriate level of protection. The ISO standard requires companies to perform an information asset inventory and classification, assign information owners, and define procedures for acceptable data use.
The framework doesn’t define a data classification policy or specify which security controls should be applied to the classified data. Rather, section A.8.2 gives the following three-step instructions:
The text of the EU’s General Data Protection Regulation (GDPR) does not use the terms “data inventory” or “mapping,” but these processes are essential to protect personal data and manage a data security program that complies with the data privacy law. For example, data inventory is the first step in complying with the requirement to manage records of processing activities, including establishing the categories of data, the purpose of processing, and a general description of the relevant technical solutions and organizational security measures.
Companies need to review all data assets and understand which of them contain an individual’s personal data. Specifically, the Data Protection Impact Assessment (DPIA) requirement mandates an inventory of all processes that involve the collection, storage, use or deletion of personal data, as well as an assessment of the value or confidentiality of the information and the potential violation of privacy rights or distress individuals might suffer in the event of a security breach.
The GDPR defines personal data as any information that can identify a natural person, directly or indirectly, such as:
To comply with the GDPR, organizations need to incorporate controls like data discovery, data profiling, taxonomies for data sensitivity, and data asset cataloging. To classify data, companies may need to consider the following:
The record-keeping requirements for GDPR compliance are very similar to those described above for ISO 27001 compliance, so following the ISO 27001 approach helps companies meet GDPR requirements as well.
The Payment Card Industry Data Security Standard (PCI DSS) was developed to encourage the protection of cardholder data. It facilitates the broad adoption of consistent data security measures globally through a set of requirements administered by the PCI Security Standards Council (PCI SSC). PCI DSS compliance requirements include technical and operational measures designed to mitigate vulnerabilities and secure personal consumer financial information, such as the credit and debit card data used in payment card transactions.
Payment card information is defined as a credit card number (also referred to as a primary account number or PAN) in combination with one or more of the following data elements:
Data classification is required as part of the regular risk assessment and security categorization process. Cardholder data elements should be classified according to their type, storage permission and required level of protection. This ensures that security controls apply to all sensitive data, and it confirms that all instances of cardholder data in the environment are documented and that no cardholder data exists outside of the defined cardholder data environment.
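Checking that no cardholder data exists outside the defined environment usually involves scanning files or text for strings that look like card numbers. The sketch below shows one common approach, assuming a simple digit pattern combined with the Luhn checksum to cut down on false positives. The `PAN_CANDIDATE` pattern and the helper names are illustrative assumptions, not part of PCI DSS, and a real discovery tool would need to handle many more formats.

```python
from __future__ import annotations

import re

# Candidate PANs: 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pan_candidates(text: str) -> list[str]:
    """Return digit strings in `text` that look like PANs and pass the Luhn check."""
    hits = []
    for match in PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(find_pan_candidates("card 4111 1111 1111 1111 on file"))
```

Any hit found outside the documented environment is a candidate for investigation rather than proof of a violation, since other numeric identifiers can also pass the checksum by chance.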
According to the Netwrix 2020 Data Risk and Security Report, 75% of financial organizations that classify data can detect data misuse in minutes, while those that don’t mostly need days (43%) or months (29%). This highlights the importance of data classification for PCI DSS compliance purposes.
The HIPAA Security Rule establishes baseline administrative, physical and technical safeguards for ensuring the confidentiality, integrity and availability of electronic protected health information (ePHI). Protected health information (PHI) is similar to personally identifiable information, as discussed above. PHI is any individually identifiable health information, including:
ePHI is defined as any protected health information that is stored in or transmitted by electronic media. Electronic storage media include computer hard drives, as well as removable or transportable digital memory media like optical disks and digital memory cards. Transmission media include the internet or private networks. Common examples of ePHI include:
The HIPAA Security Rule requires organizations to ensure the integrity of ePHI and protect it from being altered or destroyed in an unauthorized manner. Therefore, each covered entity or business associate should inventory its ePHI and identify the risks to its confidentiality, availability and integrity. The organization must identify where the ePHI is stored, received, maintained or transmitted. Organizations can gather this data by reviewing past projects, performing interviews, and reviewing documentation.
HIPAA classification guidelines require grouping data according to its level of sensitivity. Classification of data will aid in determining baseline security controls for the protection of data. Organizations can start with a simple three-level data classification:
The major compliance regulations have a lot in common when it comes to data classification. In general, organizations should follow this process: