Exact Data Match
Microsoft Purview can detect sensitive information using several methods:
- Built-in sensitive information types - Ready to use and quick to deploy, with more than 200 available.
- Custom sensitive information types - If the built-in sensitive information types don't meet your needs, you can define your own using regular expressions, predefined functions, keyword dictionaries, or keyword lists, or you can copy one of the built-in types and modify it.
- Trainable classifiers - Machine learning models that analyse data to identify sensitive information. Many pretrained classifiers are ready to use, and you can create your own as needed.
- Named entity recognition - Complex dictionary and pattern-based classifiers that you can use to detect person names, physical addresses, and medical terms and conditions.
- Document fingerprinting - Identifies standard forms that are used throughout your organization, which makes it easier to protect the information in them.
- Optical character recognition (OCR) - OCR scanning enables Microsoft Purview to scan content in images for sensitive information.
- Manually applied labels - Items in your organization that carry a sensitivity label or a retention label, or that have otherwise been classified.
All of the above methods can produce false positives (FPs) and false negatives (FNs).
FP - the service matches content that it shouldn't have matched.
FN - the service misses content that it should have matched.
FPs can have the following effects on an organization:
Too many FPs generate too many alerts for the IT teams to review, and with that many matches and alerts you may miss the ones that are really important. Users may also be prevented from performing legitimate actions.
Tuning the rules to make them less aggressive in order to reduce false positives may result in false negatives.
Remember, false positives can lead to unnecessary panic, while false negatives might leave you blind to real threats. Finding the right balance is key!
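To make the trade-off concrete, here is a small sketch that counts FPs and FNs as a detection rule is made stricter. The scores, the ground-truth labels, and the `count_errors` helper are all made up for illustration; they are not Purview output or a Purview API.

```python
def count_errors(items, threshold):
    """Count false positives and false negatives at a given score threshold.

    Each item is (score, is_sensitive): a detector score and the ground truth.
    An item is flagged when its score is at or above the threshold.
    """
    false_positives = sum(
        1 for score, is_sensitive in items if score >= threshold and not is_sensitive
    )
    false_negatives = sum(
        1 for score, is_sensitive in items if score < threshold and is_sensitive
    )
    return false_positives, false_negatives


# Hypothetical detector scores paired with whether the item is truly sensitive.
SAMPLE = [
    (0.95, True),
    (0.80, True),
    (0.70, False),
    (0.60, True),
    (0.40, False),
    (0.30, False),
]

for threshold in (0.35, 0.65, 0.90):
    fp, fn = count_errors(SAMPLE, threshold)
    print(f"threshold={threshold:.2f} false_positives={fp} false_negatives={fn}")
```

On this toy data, raising the threshold removes false positives but introduces false negatives, which is the balance described above.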
These issues are addressed by EDM.
Exact Data Match - With EDM-based classification, you can create custom Sensitive Information Types (SITs) that specifically reference exact values within a sensitive information database. This database can be refreshed daily and accommodate up to 100 million rows of data. As employees, patients, and clients change over time, your custom SITs remain relevant and up-to-date.
Today we will delve into Exact Data Match and explore how it can help us classify and protect sensitive information that is specific to an organization.
What is EDM-Based Classification:
- Enables custom Sensitive Information Types (SITs) that refer to exact values in a sensitive information database.
- The database can be refreshed daily and hold up to 100 million rows of data.
- As employees, patients, and clients come and go, your custom SITs remain current and relevant.
- All EDM-based SITs are created from scratch.
- You define these SITs to detect items with exact values in the sensitive information database.
Characteristics of Exact Data Match (EDM):
- Dynamic and Easily Refreshed
- Fewer False Positives
- Structured Sensitive Data
- Enhanced Security
- Integration with Microsoft Cloud Services
EDM is supported only in certain locations, so please check the official documentation for detailed information.
EDM SITs can be used in:
- Microsoft Purview Data Loss Prevention
- Auto-labeling (service and client side)
- Microsoft Purview Insider Risk Management policies
- Microsoft Purview eDiscovery
- Microsoft Purview Insider Risk Management
- Microsoft Defender for Cloud Apps
EDM can't be used as a general-purpose sensitive information type. You must have a source table containing the exact values you want to detect.
EDM is available in the following regions.
Asia Pacific, Australia, Brazil, Canada, Europe, France, Germany, India, Japan, Korea, Norway, South Africa, Switzerland, United Arab Emirates, United Kingdom, United States, US DoD, US GCC, US GCCH
Subscription and license requirements:
Subscription and license requirement | Permissions
Office 365 E5 | Global Administrator
Microsoft 365 E5 | Compliance Administrator
M365 Information and Protection | Exchange Administrator
Office 365 Advanced Compliance |
Exact Data Match Architecture
How EDM works
The Microsoft classification engine has an exact data match classifier extension. It detects content that needs an EDM lookup and calls the EDM service to perform that lookup.
EDM works in three main stages.
Detect a candidate: Look for a candidate using the built-in or custom sensitive information type on which the EDM SIT is based.
Database lookup: Query the columns in the datastore to find an exact match for the candidate, along with the matching rows.
Check proximity: Look at the text around the match for supporting elements.
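The three stages can be sketched in a few lines of Python. This is only an illustration of the idea, not how the EDM service is implemented: the table contents, the 9-digit pattern, the 100-character window, and the `find_exact_matches` helper are all assumptions. For readability it compares plain values and only reports a match when a supporting value is found nearby, whereas the real service compares hashes.

```python
import re

# Hypothetical sensitive table: primary value -> supporting values from the same row.
PATIENT_ROWS = {
    "483920175": {"last_name": "rivera", "birth_date": "1984-03-12"},
    "120457896": {"last_name": "okafor", "birth_date": "1990-11-02"},
}

# Stage 1 uses a broad pattern (here: any 9-digit number) to find candidates.
CANDIDATE_PATTERN = re.compile(r"\b\d{9}\b")
PROXIMITY_WINDOW = 100  # characters on each side of the match; illustrative value


def find_exact_matches(text):
    """Return (candidate, supporting_fields_found) for each confirmed match."""
    results = []
    lowered = text.lower()
    for match in CANDIDATE_PATTERN.finditer(text):
        candidate = match.group()

        # Stage 2: look the candidate up in the table; skip it if no row matches.
        row = PATIENT_ROWS.get(candidate)
        if row is None:
            continue

        # Stage 3: look for values from the same row in the surrounding text.
        start = max(0, match.start() - PROXIMITY_WINDOW)
        end = match.end() + PROXIMITY_WINDOW
        nearby = lowered[start:end]
        found = sorted(field for field, value in row.items() if value in nearby)
        if found:
            results.append((candidate, found))
    return results


print(find_exact_matches("Patient Rivera, ID 483920175, order 555555555 shipped."))
```

The order number in the sample text has the same shape as a patient ID, so a pattern-only rule would flag it. Here it is dropped at stage 2 because it isn't in the table.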
- For EDM to work, the service must be able to check for an exact match against company-specific data. Does that mean the sensitive data is in the open? Well, not really!
- You hash the data with a hash function that includes a randomly generated or self-supplied salt value.
- Only the hashed values are uploaded to the service, so your sensitive data is never in the open.
- Hash functions are one-way mathematical functions that take an input and produce a fixed-length output, commonly known as a hash value or hash code.
- Once data is hashed, it cannot be directly reversed to obtain the original input.
- Even a small change in the input results in a completely different hash value, so in practice no other text matches those hashes.
- A hash can be "salted" by adding a fixed value to each string before hashing, which makes the transformation unique to the customer.
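Here is a minimal sketch of salted hashing and hash-based lookup using Python's standard library. SHA-256, the normalization step, and the helper names (`hash_value`, `build_hashed_table`, `is_match`) are assumptions made for illustration; the source does not say which algorithm or normalization EDM uses.

```python
import hashlib
import secrets


def hash_value(value, salt):
    """Return a salted SHA-256 hex digest of a normalized value."""
    normalized = value.strip().lower()  # assumed normalization, for illustration
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


def build_hashed_table(values, salt):
    """Hash every sensitive value locally; only these hashes would be uploaded."""
    return {hash_value(value, salt) for value in values}


def is_match(candidate, hashed_table, salt):
    """Hash a candidate with the same salt and check it against the stored hashes."""
    return hash_value(candidate, salt) in hashed_table


# A randomly generated salt; a self-supplied value would work the same way.
salt = secrets.token_bytes(16)
hashed_table = build_hashed_table(["483920175", "120457896"], salt)

print(is_match("483920175", hashed_table, salt))  # value from the table
print(is_match("483920176", hashed_table, salt))  # one digit changed
```

The lookup side never needs the original values. It hashes each candidate with the same salt and compares the result to the stored hashes.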
How hashing works with EDM
Well, that's all for now. In future posts we will dive deep into the EDM components, how to configure an EDM SIT, common issues, and how to troubleshoot EDM, so stay tuned!