Are We Disregarding Privacy Rules Because They Are Hard? Part 3 of 3

Shouldn’t This Be Easier By Now?

Eventually, someone in Information Technology or Database Administration gets asked to extract data from a PHI-rich line of business system or data warehouse but deliver it as de-identified data.  Almost any data extraction approach allows for data to be masked, redacted, suppressed or even randomized in some way.  This type of functionality can give us data that is de-identified but often useless for testing, analytics or development.

Since my company, The EDI Project™, was founded in 2001, we have been asked many times to de-identify or anonymize data for testing and development work.  Each time, we wrote custom code for that project.  This code is never transferable to another customer environment and must be redone for every scenario.  If we were doing this every time, we thought, there had to be other companies having the same problem.

It turns out there are tools on the market that extract data from a line of business system or data warehouse and anonymize it so that it is useful and not just de-identified into useless “John Doe” records.

For example, one of the largest integration engines on the market offers this functionality as a $250,000 add-on to their existing, very expensive suite of products.  It is complicated to learn and use, and it needs custom code if multiple systems have to be anonymized the same way (e.g. enrollment, eligibility and claims data have to have matching but anonymized names and dates of birth).

There are other tools in this space that sniff out vast data stores for PHI and attempt to automagically de-identify the data.  Usually this is a masking or data redaction type approach, but even when it is not, many fields are marked as “suspect PHI” and left for human review.  I can’t blame them either.  While Patient Name fields or Date of Birth are easy enough to identify, free form fields can be a nightmare.  Either way, these tools are usually very expensive and often leave the job half done.

There are a lot of cases where certain files, like EDI 837 Claims or maybe an enrollment database, have to be de-identified for a test system.  Perhaps it is an ongoing extract of data from a data warehouse for an analytics study.  This is where, most of the time, the de-identification is either not done (exemption granted) or custom code is deployed (expensive / time consuming).  But technology is supposed to be faster, better and cheaper, isn’t it?

Since we are the guys who are often asked to do the work, we looked at our experience in extraction of health care data to design a tool we would want to use.  No compromises.  We wanted it to be easy to learn and use, and powerful enough to handle big data environments without being a bottleneck to any extraction work.  Finally, it had to be able to anonymize data across multiple sources so that the matching but de-identified data maintained record integrity (i.e. all the records for one patient in the PHI data sources had corresponding records in the de-identified data sources).  Oh yeah – and since the main project being done is already expensive enough, the tool should be inexpensive.
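
To make the record integrity requirement concrete, here is a minimal sketch in Python.  It is not a description of how any particular product works; it only illustrates one common approach, a keyed hash (HMAC) that maps the same real name to the same replacement name no matter which source it comes from.  The key, the name pools and the function names are all assumptions made up for this example.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration; a real key would be generated
# securely and kept outside the de-identified environment.
SECRET_KEY = b"example-key-not-for-production"

# Tiny illustrative pools; a real pool would be far larger to limit collisions.
FIRST_NAMES = ["Alice", "Brian", "Carmen", "David", "Elena", "Frank", "Grace", "Henry"]
LAST_NAMES = ["Archer", "Bennett", "Castillo", "Dawson", "Ellis", "Foster", "Garcia", "Hughes"]


def _normalize(value: str) -> str:
    """Upper-case and collapse whitespace so 'mary ' and 'MARY' match."""
    return " ".join(value.upper().split())


def pseudonymize_name(first: str, last: str) -> tuple[str, str]:
    """Map a real name to a replacement name, the same way every time."""
    material = f"{_normalize(first)}|{_normalize(last)}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, material, hashlib.sha256).digest()
    # Use separate digest bytes to pick the first and last replacement names.
    new_first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    new_last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return new_first, new_last


# The same member, as seen in two different (hypothetical) sources.
enrollment_row = {"first": "Mary", "last": "Smith"}
claims_row = {"first": " mary ", "last": "SMITH"}

anonymized_enrollment = pseudonymize_name(enrollment_row["first"], enrollment_row["last"])
anonymized_claim = pseudonymize_name(claims_row["first"], claims_row["last"])
```

Because the replacement depends only on the key and the normalized name, an enrollment extract and a claims extract processed separately end up with matching anonymized names.  With pools this small, different people will sometimes receive the same replacement, so a real pool would need to be much larger.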

People have been using ETL (Extract, Transform, Load) tools for decades and are familiar with how they work.  Thinking about the “T” in “Transform”, a common thing to do would be to change a date from MMDDYYYY format to DDMMYYYY format.  This type of common transformation logic doesn’t have to be rewritten every time you extract from a new source.  The integrator just picks it from a list when doing mapping work.  Anonymizing PHI should be that simple as well.
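
The pick-it-from-a-list idea can be sketched in a few lines of Python.  The registry name, the mapping layout and the field names below are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def mmddyyyy_to_ddmmyyyy(value: str) -> str:
    """Reformat a date string such as '12311999' into '31121999'."""
    parsed = datetime.strptime(value, "%m%d%Y")
    return parsed.strftime("%d%m%Y")


# A registry of reusable transformations.  The integrator picks one by name
# instead of rewriting the logic for each new source.
TRANSFORMS = {
    "mmddyyyy_to_ddmmyyyy": mmddyyyy_to_ddmmyyyy,
    "uppercase": str.upper,
}


def apply_mapping(record: dict, mapping: dict) -> dict:
    """Apply the chosen transform to each mapped field; copy the rest as-is."""
    result = {}
    for field, value in record.items():
        if field in mapping:
            result[field] = TRANSFORMS[mapping[field]](value)
        else:
            result[field] = value
    return result


# Hypothetical source record and mapping.
source_record = {"member_id": "A1", "dob": "12311999"}
field_mapping = {"dob": "mmddyyyy_to_ddmmyyyy"}

transformed = apply_mapping(source_record, field_mapping)
```

An anonymization routine would simply be another entry in the same registry, selected per field in the same way.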

Functions and drop downs need to be available to anonymize every kind of PHI and handle it according to the special properties for that type of data.  Names are anonymized differently than zip codes.  More specifically, the anonymization routine for a Date of Birth (DOB) is different from the routine for a Date of Service (DOS).  The software should know that already; the integration team or subject matter expert should not have to define it.

As a result, we developed and launched our own Anonymization Engine called “Don’t Redact!™”.  We’re integrators, so we built the tool an integrator would want in order to get this done quickly and easily.  Someone who has experience with integration tools can learn it in an afternoon, and your first sizeable anonymization effort can be deployed in a day or so after learning the ropes.

In the spirit of no compromises and disruptive technology, the Don’t Redact!™ Anonymization Engine is $25,000.

While The EDI Project™ is a professional services organization and we would be happy to deploy the software for you or set up your first live anonymized environment, the tool is well thought out and easy enough that you won’t need any services at all.

Want to find out more?

Part 1: Minimum Necessary or Optional   

Part 2: A False Choice. . . 


Are We Disregarding Privacy Rules Because They Are Hard? Part 2 of 3

A False Choice

Imagine you work at a health insurance company.  Your title is “Claims Examiner” and you spend each day deciding if bills sent from doctors for the insurance company’s members should be paid.  You must be sure the treatments match the diagnosis, the member is eligible for the payment and the amount being asked for is correct.  This work is performed in a “Claims System”.  Claims Systems were one of the first widespread uses of computers in business and have been around for 40 years.  The Claims System is the lifeblood of a health insurance company and seemingly all their other systems are related to it.  The data the Examiner uses to pay or adjust the bills doesn’t need to be obscured in any way because it is part of TPO (treatment, payment or health care operations).

A covered entity may disclose PHI (Protected Health Information) to facilitate treatment, payment, or health care operations (TPO) without a patient’s express written authorization.  Any other disclosure of PHI requires the covered entity to obtain written authorization from the individual.  However, when a covered entity discloses any PHI, it must make a reasonable effort to disclose only the minimum necessary information required to achieve its purpose.

When we talk about Privacy and Security of data, even though Claims Systems have the most information about a patient / member, they are rarely if ever the place where a breach of PHI takes place.  Instead, breaches happen at the edges.  New systems being stood up, test / development systems and ancillary data stores for things like analytics seem to be the places where PHI breaches tend to happen.  In most cases, however, these systems really should not have had PHI at all.

So why did these systems have PHI to begin with?  Usually it is because an exemption was created.

This isn’t a story of malice, indifference or even incompetence.  It is a story of real life choices that are all very reasonable.

Imagine a new system being brought on line for claims or another vital function.  There are outside vendors and subject matter experts helping employees to ensure the environment will be capable and reliable when it replaces the existing system.  But if all the data being used to test is simple and looks like this:

 “John Doe, DOB 1/1/1950, DOS 1/1/2018, 15 Minute Office Visit, Common Cold”

the team will never uncover all the potential problems that come with complicated, real world scenarios.

While the organization knows where the PHI is in the data, de-identifying the real data in a way that keeps those complicated scenarios intact can be a six-month project on its own.  How would one test if the system would be able to find duplicates if names are randomly replaced in the test data?  How can a test Examiner check eligibility if the names in the eligibility file are randomly replaced in a different way than in the test claims data?  If dates are randomized, how would claims be paid for Dates of Service (DOS) that occur before Date of Birth (DOB)?
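
The date problem in particular can be illustrated with a short Python sketch.  One possible approach, shown here under stated assumptions, is to shift every date for a given patient by the same keyed offset instead of randomizing each date on its own.  The key, the one-year window, the patient identifier and the function names are illustrative only, and this is not a description of any particular product.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret key and shift window, for illustration only.
SECRET_KEY = b"example-key-not-for-production"
MAX_SHIFT_DAYS = 365


def patient_offset(patient_id: str) -> timedelta:
    """Derive one fixed offset per patient, between -365 and +365 days."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    days = int.from_bytes(digest[:4], "big") % span - MAX_SHIFT_DAYS
    return timedelta(days=days)


def shift_date(patient_id: str, original: date) -> date:
    """Shift a date by the patient's offset; intervals between dates survive."""
    return original + patient_offset(patient_id)


# The same hypothetical patient appears with a DOB and a DOS.
dob = date(1950, 1, 1)
dos = date(2018, 1, 1)

shifted_dob = shift_date("MEMBER-0001", dob)
shifted_dos = shift_date("MEMBER-0001", dos)
```

Since both dates move by the same number of days, the Date of Service still falls after the Date of Birth and the patient’s age at the time of service is unchanged.  Randomizing each date independently gives no such guarantee.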

Usually an exemption is granted for the testing of the new system that allows previously run, real world PHI data to be used.  This is very reasonable of course and the systems and environments are all secured as they should be.  Either way, this is the type of place a breach happens.  A port is left open, test data is left on a remote machine, or any number of other things happen that can happen to even careful, conscientious people.

Whether the case is a test or development system, or an analytics project that is delayed or never happens while the PHI is scrubbed, this represents a false choice.  We have been dealing with this problem formally for 20 years and realistically even before people started misspelling the HIPAA acronym.  Technology is getting faster, better and cheaper all the time.

So why is this so hard? 

FULL DISCLOSURE: My company, The EDI Project™, has developed a tool to address this problem and I’m not a disinterested party in my recommendation.

Link to Part 1: Minimum Necessary or Optional? 

Link to Part 3: Shouldn’t This Be Easier By Now? 

Are We Disregarding Privacy Rules Because They Are Hard? Part 1 of 3

Minimum Necessary or Optional?

One of the things that continues to excite me about the world of healthcare informatics is the opportunity to reduce the cost of care while providing better care and overall better outcomes.  Often people think in terms of a zero-sum game where reducing the cost of care always reduces care and outcomes.  But the promise of technology is that it can make us more efficient; a man can dig a hole faster, and with more precise dimensions, with a shovel than with his bare hands.


Having the right tool for the right job is important. . . 


Much attention has been paid of late to re-admission rates for hospitals.  Hospital stays are expensive and if a patient is sufficiently recovered from whatever put them there to begin with, they are usually eager to get home to continue to recover in a more familiar environment.  Both parties – the hospital and the patient – often want the stay to end as soon as possible.

But if the patient is released too early, it is always bad news.  At best, they must be re-admitted – often through the emergency room process.  Worse, they could relapse and not make it back to the hospital at all.  Outcomes for patients who are released too early are both worse and more expensive than if they had stayed in the hospital instead of being released.

Certainly, trusting our doctors is a first step, but they are often very busy and under the same pressures to release a patient discussed above.  There are simply too many variables to be perfect at this when practicing medicine.  While experience is a doctor’s most potent weapon, a doctor can only draw on the experience available to them.  Patterns do exist, however, that indicate when additional caution is warranted before deciding to release.  No one doctor could ever amass enough experience to recognize them all though.

Today, there are powerful analytic tools available that can take massive amounts of data and sift through looking for patterns that simply would not or could not be seen otherwise.  Rather than take a sample scenario and examine the data to see if that scenario is more likely to result in a readmission, these tools are capable of comparing millions or billions of situations to each other at the same time.  The result is finding co-morbidities or patterns of care that no one could have ever thought to test out on their own.

These types of comparisons were computational fairy tales just a few years ago but can be done today because of advancements in parallel processing.  The bad news is no matter how good the tools are, they are only as good as the data they have to examine in the first place. . . What if no one can get the data?

Minimum Necessary is the process that is defined in the HIPAA regulations:  When using or disclosing protected health information or when requesting protected health information from another covered entity, a covered entity must make reasonable efforts to limit protected health information to the minimum necessary to accomplish the intended purpose of the use, disclosure or request. 


Next: Part 2: A False Choice. . .

Part 3: Shouldn’t This Be Easier By Now?