The quickest way to tick off a data engineer, after pushing an unannounced change to an API, is to mess with our personal data. Though, frankly, the carelessness with which data is mishandled at many institutions should shock and upset even those with the most basic technical knowledge. The gap between understanding what happened and what should have happened in data breach scenarios is why stories like Sephora's $1.2 million fine or Google's $400 million penalty barely move the needle with the average consumer, client or patient.
Unfortunately for orgs with lax data policies, occasionally the victim of negligence is informed enough to 1) know what occurred and 2) know what to do about it.
In this case, it was the mishandling of my family member's information by a medical institution in the southeast U.S. and a boiler-plate, "we didn't mean to" response that set me off. Months later, it inspired me to pen this cautionary tale; though instead of advising "this may happen to you" I have to warn, with the continued inattention to data governance, "this will most likely happen to you."
For context, I accompanied a family member to a walk-in clinic in a hospital during the holidays for a mild, but annoying, seasonal illness. A month later, they received a call from the clinic referencing a bill, intended for my family member, that had been issued under a different name. The bill was sent to a prior address. Additionally, details of the bill and visit were shared with an email address not associated with the primary patient.
If this happens to you, a medical institution will tell you, as they did me, that this was a "system error" or "our mistake" and attempt to downplay the incident, hoping you're uneducated about your rights as a data consumer.
You should know that 1) it's likely more complex than a "system error," and 2) it's more than a mistake. This is a data breach in violation of state data governance statutes (if applicable) and HIPAA. And let's be clear: just because "data" is in the phrase "data breach," a breach doesn't have to be the result of a technical fault. This was one hospital leader's argument when they refuted our claim of a data breach: "It can't be a data breach if it's not a system or technical error."
As I explained to this individual, a data breach can also be a failure of conduct. For instance, if an audit finds that an employee has "accidentally" stored several gigabytes of customer data locally or has sent sensitive information via an email attachment.
The point of failure, in this case, was also likely human error. When we think about malpractice and worst case scenarios in U.S. hospitals, it's often in some Black Mirror-type hacking scenario. Some black hat takes a critical system offline or, as bad actors have done to several U.S. school districts, locks staff out, only releasing data in exchange for cryptocurrency. And while coordinated hacks can do real monetary and collateral damage, focusing entirely on outside actors ignores a party sitting right in front of us. Sitting and typing, to be precise.
A study in the Journal of Organizational and End User Computing found that 90% of spreadsheets with more than 150 human-entered rows contain at least one error. Add to that another party, the patient, hand-recording their own data on intake forms, and you increase the chances of incorrectly ingesting information. I'm not in healthcare, but I feel better seeing more and more medical facilities requiring sign-in on tablets, because that data only has to be transcribed once before it is stored.
In my case, my family member signed in by physically writing an updated email address, phone number, physical address and other information we in data infrastructure consider personally identifiable information (PII) on an intake form. They then handed the form back to the receptionist, who set it aside. While the mistake likely happened somewhere between that paper being hand-entered by a front-office employee and being handed off to data entry personnel, and this saga might truly end at a typo, there's one more concerning possibility I considered.
The goal of anyone creating and maintaining data infrastructure is to ingest as much (hopefully accurate) data as possible to extract the best insights. This means that if there are gaps in information, both engineers and automated systems will do their best to infer what's missing using conditional logic, i.e. "If attribute x doesn't exist in table z, then replace with attribute y." Or, for a more concrete example, I ingest data that shows me how much my writing earns historically. If a piece of writing doesn't earn anything in a given month, my SQL-powered view defaults to last month's earnings.
In this case, to populate the "email" value, a parameter necessary for patient communication, the system likely saw a blank or NULL in the email field, found that an older email existed, and, fulfilling its duty to provide comprehensive data, simply defaulted to the older value.
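A minimal sketch of the kind of defaulting logic described above. The table name, columns and data are hypothetical illustrations, not any real hospital system; the point is how a standard SQL function like COALESCE silently substitutes a stale value when the current one is missing.

```python
import sqlite3

# In-memory database standing in for a hypothetical patient records system.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient_contacts (
        patient_id INTEGER,
        email      TEXT,   -- NULL: the updated address was never entered
        old_email  TEXT    -- address on file from a visit years earlier
    )
""")
conn.execute(
    "INSERT INTO patient_contacts VALUES (1, NULL, 'parent@example.com')"
)

# COALESCE returns its first non-NULL argument, so a missing current email
# quietly falls back to the decade-old one -- no error, no warning.
row = conn.execute("""
    SELECT COALESCE(email, old_email) AS contact_email
    FROM patient_contacts
    WHERE patient_id = 1
""").fetchone()

print(row[0])  # prints the stale address: parent@example.com
```

Nothing here is a bug in the traditional sense; the query does exactly what it was written to do, which is what makes this failure mode so easy to miss.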
The problematic word is "older." While consumers have begrudgingly accepted that bulk data collection is here to stay, one component of data ingestion that still makes people uncomfortable is the possibility of indefinite storage. Under statutes like the EU's General Data Protection Regulation (GDPR), companies are required to store personal information only for as long as it's relevant to the purpose it was collected for. Though not yet required under U.S. federal law, some U.S. companies have developed internal retention policies, usually of no more than five years, to stay compliant with GDPR. For context, the information in this case had been stored since this individual was a minor, over a decade ago.
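The retention check described above can be sketched in a few lines. The record layout, dates and five-year window are illustrative assumptions; a real policy would purge or re-verify flagged contact details rather than just report them.

```python
from datetime import date, timedelta

# Hypothetical five-year retention window for contact PII.
RETENTION = timedelta(days=5 * 365)

# Illustrative records: one updated over a decade ago, one recent.
records = [
    {"patient_id": 1, "email": "parent@example.com",  "last_updated": date(2012, 6, 1)},
    {"patient_id": 2, "email": "current@example.com", "last_updated": date(2023, 1, 15)},
]

today = date(2024, 1, 1)

# Flag contact details that have outlived the retention window.
stale = [r for r in records if today - r["last_updated"] > RETENTION]
for r in stale:
    print(f"patient {r['patient_id']}: contact info exceeds retention window")
```

Run periodically, even a simple audit like this would have surfaced a decade-old email address long before any system defaulted to it.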
Long-term storage of medical records is helpful, of course, in a diagnostic sense. I'd certainly want a provider to have a record of visits and medications prescribed, for example. But I'm not sure it's necessary to store PII for that long. Phone numbers and emails change. And no method of payment is going to be good 10+ years later. Addresses and consent to share records also change as a patient ages and is no longer required to provide a parent or guardian's information, for example. Either a human or a "system" inferring anything without proper context or validation can lead to unpleasant and possibly dangerous outcomes.
As much as it may seem to be the case, this isn't an "I told you so" to the particular institution that committed this breach. We resolved the matter.
It's more of a reaction to the fact that some institutions, especially those we trust with our most precious information, can still be uneducated and apathetic when it comes to storing personal data. Just as we advocate for our treatment and health in medical facilities, we, as patients, also must advocate for proper treatment of our data.
Because while you'll see a lot of commercials for medical malpractice attorneys, it'll be years before a slick lawyer interrupts your TV viewing and asks: "Hurt in a data breach?"