A recent look into the challenges posed by privacy compliance requirements unearthed an interesting patent. It surfaced from the need to anonymize data for several reasons, including compliance (PCI DSS, German Data Privacy Law [BDSG], UK Data Protection Act).
With input from experts in the domain, here is an overview of this unique encryption technology.
Simply randomizing the data won’t help if the data drives a process such as indexing in a database. Breaking the database indexes, or destroying the integrity of the data during de-identification, may render the data useless for a business application or statistical analysis, even if that process never needs to access the specific identifiable fields. So, if you need to de-identify or depersonalize data yet still use it, more thought has to go into the process, both to ensure it cannot be reversed and to ensure that the de-identified data remains usable.
For example, suppose we have two databases, both indexed on a national ID number and containing, e.g., health records. We need to de-identify the personal data and replace it with data that is still meaningful for analysis but truly de-identified. At the same time, we need to retain referential integrity across the internal tables of each database and preserve record relationships across the two independent databases.
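To make the referential-integrity requirement concrete, here is a minimal Python sketch. It is not format-preserving encryption and it is not reversible; it only illustrates why a deterministic, keyed mapping keeps joins intact across two data stores. The key, the sample records and the `pseudonym` helper are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would fetch this from key management.
KEY = b"demo-key-not-for-production"


def pseudonym(national_id: str) -> str:
    """Deterministic keyed token: same ID and same key give the same token."""
    digest = hmac.new(KEY, national_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Two independent "databases", both keyed on the same (made-up) national ID.
health = {"123-45-6789": "asthma", "987-65-4321": "diabetes"}
billing = {"123-45-6789": 120.0, "987-65-4321": 75.5}

# De-identify each store separately, using the same key.
health_deid = {pseudonym(k): v for k, v in health.items()}
billing_deid = {pseudonym(k): v for k, v in billing.items()}

# The cross-database join still works on the de-identified key.
joined = {
    k: (health_deid[k], billing_deid[k])
    for k in health_deid
    if k in billing_deid
}
```

Because the tokens here are hex strings rather than valid ID numbers, a column typed or validated as a national ID would reject them, which is exactly the gap that format-preserving techniques aim to close.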
What if we need this process to be selectively reversible, e.g. for an e-discovery request or a specific analysis where real data is needed? Traditionally, data owners would have to keep some kind of mapping table: an ugly solution, as it simply moves the problem of de-identification to another place.
There are some new techniques that permit de-identification of data in a secure fashion, specifically Format Preserving Encryption (FFX/FFSEM mode AES), which can:
* Encrypt/de-identify data on the fly without changing its length, structure, type or format, and with referential integrity preserved (this is optional, for cases where de-association of record relationships is actually required)
* Serve a dual purpose with a single technique: de-identification or data protection in place (no database schema changes), or protection of data in live systems
* Be reversible or non-reversible by policy
* Support native mainframe, open systems and legacy out-of-production systems
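As a rough illustration of the first point, here is a toy Feistel network over decimal digits in Python. It is a sketch only: it uses HMAC-SHA256 as the round function instead of AES, it is not FFX or FFSEM, and it has had no security analysis. The function names, the round count and the key are assumptions made for the example.

```python
import hashlib
import hmac

ROUNDS = 10  # kept even so the two halves return to their original lengths


def _round_value(key: bytes, rnd: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random number in [0, modulus) derived from one half."""
    msg = bytes([rnd]) + half.encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if not (digits.isascii() and digits.isdigit() and len(digits) >= 2):
        raise ValueError("expected a string of at least two decimal digits")


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    _check(digits)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for rnd in range(ROUNDS):
        m = len(a)
        # Add a keyed value to one half (mod 10^m), then swap the halves.
        c = (int(a) + _round_value(key, rnd, b, 10 ** m)) % 10 ** m
        a, b = b, str(c).zfill(m)
    return a + b


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    _check(digits)
    split = len(digits) // 2
    a, b = digits[:split], digits[split:]
    for rnd in reversed(range(ROUNDS)):
        m = len(b)
        # Undo one round: subtract the same keyed value and swap back.
        prev = (int(b) - _round_value(key, rnd, a, 10 ** m)) % 10 ** m
        a, b = str(prev).zfill(m), a
    return a + b


KEY = b"demo-key-not-for-production"  # hypothetical key
card = "4111111111111111"             # well-known test number, not real data
token = toy_fpe_encrypt(KEY, card)
restored = toy_fpe_decrypt(KEY, token)
```

The token is still a 16-digit string, so it fits the existing column without a schema change, and the same key recovers the original. Dropping the decrypt path (or the key) is one way to picture the "non-reversible by policy" option.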
In this paper on Feistel Finite Set Encryption Mode (FFSEM), the author elaborates on the Pseudo-Random Permutation used in this mode and on the security of the construction. To quote the author: “In many applications, such as encryption of credit card numbers, it is desirable to encrypt items from an arbitrarily sized set onto that same set. Unfortunately, conventional cipher modes such as ECB, CBC, or CTR are unsuitable for this purpose. Feistel Finite Set Encryption Mode (FFSEM) allows encryption of a value ranging from 0..n with resultant ciphertext in that same range. This mode can be used to encrypt fields where the expansion associated with a block cipher is undesirable or the format of the data must be preserved.”
An academic paper on the subject summarizes: “Format-preserving encryption (FPE) encrypts a plaintext of some specified format into a ciphertext of identical format – for example, encrypting a valid credit-card number into a valid credit-card number. The problem has been known for some time, but it has lacked a fully general and rigorous treatment. We provide one, starting off by formally defining FPE and security goals for it. We investigate the natural approach for achieving FPE on complex domains, the “rank-then-encipher” approach, and explore what it can and cannot do. We describe two flavors of unbalanced Feistel networks that can be used for achieving FPE, and we prove new security results for each. We revisit the cycle-walking approach for enciphering on a non-sparse subset of an encipherable domain, showing that the timing information that may be divulged by cycle walking is not a damaging thing to leak.”
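The "rank-then-encipher" approach mentioned in that abstract can be pictured as three steps: map each valid string to an integer rank, encipher the integer within 0..N-1, and map the result back to a string. The Python sketch below uses a made-up format (two uppercase letters, a hyphen, four digits). Note the loud assumption: the integer "cipher" is an affine placeholder with fixed constants, chosen only because it is an easily invertible permutation. It is NOT secure; a real design would put a keyed permutation in its place.

```python
import math

# Hypothetical format: two uppercase letters, a hyphen, four digits ("AB-1234").
N = 26 * 26 * 10_000  # number of valid strings in the format

# Placeholder permutation of 0..N-1. NOT a cipher, for illustration only.
A, B = 1_234_567, 89
assert math.gcd(A, N) == 1  # required for the affine map to be invertible
A_INV = pow(A, -1, N)       # modular inverse (Python 3.8+)


def rank(s: str) -> int:
    """Map a valid string to its index in 0..N-1."""
    valid = (
        len(s) == 7 and s.isascii() and s[2] == "-"
        and s[:2].isalpha() and s[:2].isupper() and s[3:].isdigit()
    )
    if not valid:
        raise ValueError("expected the form AA-0000")
    letters = (ord(s[0]) - 65) * 26 + (ord(s[1]) - 65)
    return letters * 10_000 + int(s[3:])


def unrank(r: int) -> str:
    """Inverse of rank: map an index in 0..N-1 back to a string."""
    letters, digits = divmod(r, 10_000)
    first, second = divmod(letters, 26)
    return f"{chr(65 + first)}{chr(65 + second)}-{digits:04d}"


def encipher_formatted(s: str) -> str:
    return unrank((A * rank(s) + B) % N)


def decipher_formatted(s: str) -> str:
    return unrank((A_INV * (rank(s) - B)) % N)


plain = "AB-1234"
masked = encipher_formatted(plain)
recovered = decipher_formatted(masked)
```

The point of the split is that all the format knowledge lives in `rank` and `unrank`, so the enciphering step only ever has to deal with integers in a range.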
Now the interesting bit: is it implemented, and is it commercial?
Full patent description is available here
And of course, at first look, Wikipedia did not help much in understanding it either, until I had read the background and the two papers.