TL;DR: manipulations previously confined to stock markets are infecting every other corner of humanity. Politics. Insurance. Recruiting. What you read. Who you meet. Where you eat. What you eat. All of it, distilled into data feeds, at Internet scale. And we do not yet have an effective response.


While the role of data (and, by proxy, math and statistics) has obviously grown in importance (machine learning, anyone?), it’s worth reminding ourselves that most of the world does not think quantitatively. (And we certainly don’t make decisions without significant contributions from intuitive, emotional, System 1 thinking.) Only one-third of American adults hold a college degree at all, and fewer than 20% of those degrees are in technical fields. We have more business majors than engineering, math/statistics, and science majors combined. Fewer than 3% of the workforce holds a computing or mathematical role. The dominant quantitative skillsets in the modern economy relate exclusively to finance and accounting; there are more bookkeepers than software developers, more CEOs than statisticians and mathematicians. On the whole, our society doesn’t think in terms of bias and statistical error; we live in a world of first-order effects. Direct cause and effect.

Derivatives exist, though; for that, we have calculus and options trading and machine learning. The small cohort of people who understand these systems lives inside a complex bubble, and a thin veneer of humans does its best to translate findings and outcomes for the larger world. And so: where do “data ethics” belong? What is the moral compass for the scientists and engineers who create, manipulate, and distribute data? How do you distinguish between right and wrong ways of translating highly technical and nuanced information for use by society? How should consumers of data use that data?

We live in a stock market

As a familiar analogy, consider the stock market. Regulations, laws, and penalties influence how knowledge can and cannot be used in public markets. Employees and managers create the “data”; financial analysts and executives “translate” it for the public in high-stakes conference-call dances; and the rest of us “consume” it. Households and holders of 401(k)s rely (for the most part) on the guidance of upstream actors. We have laws and defensive systems in place because experience has delivered economic devastation once or twice per century.

But even so, because so much money is on the table, these markets face constant assaults and manipulation (per NASDAQ: “the best way to think about manipulation is to accept it as part of the market structure”). Wealth and power can corrupt the noblest among us, and so, in an effort to protect society from charlatans, we built guardrails: whistleblower protections, insider-trading regulations, anti-bribery laws, certification boards that serve as gatekeepers to the flow of information, and enforcement agencies like the SEC. But that is not enough, and so we also created collective insurance like the FDIC and SIPC, which provides some marginal coverage against the thieves who find holes in the framework.

We still had Enron, though. And Bernie Madoff. And the S&L crisis of the ’80s. And the banking crisis of ’07-08. And stock-price manipulations. And countless undetected profiteering efforts achieved through advance knowledge of significant data. Which leads to several points:

  1. All illegal (immoral? that’s an absolutism/relativism question) market actions leverage information: data and its manipulation, suppression, or privileged distribution.
  2. Not that long ago, the only significantly profitable data was essentially financial data. The universe of data is much broader now, which means the range of opportunities for “market manipulation via data asymmetry” (i.e. what you can achieve by possessing knowledge unavailable to others) is also much broader.
  3. That data is also ridiculously inexpensive compared to a generation ago. Which means (1) more “bad actors” can participate, and (2) the “big, bad actors” have upgraded the scale of their ambitions.
  4. Everyone is now both an Internet-scale data producer and data consumer. We make decisions all day long based on the information flowing through our data ecosystems. This has perhaps always been true, but advanced computing now allows unknown actors to map between what you see and what you do.
  5. This dynamic of data-driven manipulation already infests essentially all industries, to different degrees. The American medical insurance pyramid thrives entirely because of the opaque nature of data about costs and benefits. Online advertising (“adtech”) suffers billions of dollars in manipulation every year. Music streaming revenues constantly combat “bot streaming.” Sellers on Amazon live or die by their 5-Star review scores, which motivates extreme and creative review-score manipulations. We sacrifice 6% of every residential real estate transaction because the National Association of Realtors has, for over 100 years now, monopolized and defended access to pricing and inventory data.

This all gets a bit heady pretty easily, so a simplification: the actors who once traded on financial data to consolidate wealth can now trade on societal data to consolidate power. We are each an individual stock, bought and sold on a financial exchange.

Why bother with a Code of Ethics?

Judging by the efficacy of humanity’s oldest moral guidebooks, it seems obvious that a Ten Commandments of Data won’t discourage people who crave power from seeking it. Similarly, I’d argue that a Code isn’t about the individual; it’s about warning the collective: bad actors exist, and (assuming you’re not one of them) there are things we should all commit to if we want to defend ourselves against those who don’t have our best interests at heart.

The nature of data has changed. Everyone is painfully aware of the role a credit score plays in their financial lives; people need to understand that scores are now being developed for every possible facet of life. Unlike credit scores, however, we don’t currently have any explicit or protected rights for our day-to-day data-lives, and that’s extremely disempowering. At a minimum, everyone should be intensely concerned about three things:

  1. Who owns your data? More often than not, it’s not you: the data is being “licensed” to you, but is owned by somebody else.
  2. Who controls access to your data? Since you’re not the data owner, do you have any rights related to who can see your data and when? (Probably not.) If somebody “scores” you, do you have a right to know about that? To see your score? To correct inaccuracies?
  3. Who is using your data? How frequently are others using data that describes you? What are they using it for?
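Those three questions become concrete if you sketch how such a score might be assembled. The snippet below is a purely hypothetical illustration: every feature name and weight is invented for this example, and real scoring models are proprietary and far more complex. It simply shows why opacity matters: the person being scored never sees the inputs, the weights, or the output.

```python
# Hypothetical sketch of an opaque "life score" built from everyday
# behavioral signals. All signal names and weights are invented.

def behavioral_score(profile, weights):
    """Weighted sum of behavioral signals, clamped to [0, 1] and then
    mapped onto a 300-850 range (mimicking the familiar credit scale)."""
    raw = sum(weights.get(signal, 0) * value
              for signal, value in profile.items())
    raw = max(0.0, min(1.0, raw))
    return round(300 + raw * 550)

# The subject sees none of this: not the weights...
weights = {"on_time_payments": 0.5, "late_night_purchases": -0.2,
           "address_changes": -0.3, "gym_checkins": 0.2}
# ...not the collected signals...
profile = {"on_time_payments": 0.9, "late_night_purchases": 0.4,
           "address_changes": 0.1, "gym_checkins": 0.5}
# ...and not the resulting number that gates their opportunities.
print(behavioral_score(profile, weights))  # prints 542
```

Unlike a credit score, there is no mandated mechanism to learn that this score exists, inspect it, or dispute an inaccurate input.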

Core principles of any Code of Ethics (including data)

The Association for Computing Machinery has a pretty well-thought-out Code of Ethics for its members. It includes affirmations of:

  • Privacy
  • Do No Harm
  • Honesty
  • Confidentiality
  • Quality of Work
  • Transparency
  • Security

Many other industries (both technical and non-technical) have implemented domain-specific codes of ethics (e.g., AMA, ASCE, SPJ, NAHQ, NAR). They share these same themes.

Data scientists and other technical practitioners involved in creating data about people have a similar duty. We must choose whether to behave like the SEC or the NSA: is our role to advocate for others, or to extract maximum intelligence? More often than not, business leaders lean into strategies of “get all the data, all the time, with maximum permissions, for maximum financial benefit.” That’s the data equivalent of a full-body x-ray, a live cam in your Realtor’s listing, a camcorder in your attorney’s office. I used to find fault with that philosophy, but I’ve come to acknowledge that the competitive pressures of the market can leave a company vulnerable when its competitors have access to information it does not.

What happens next?

Hell, I don’t know. I know what I hope happens next: that the pendulum swings back toward the citizen and the individual, that privacy is acknowledged as a critical expression of liberty, that enforcement mechanisms and penalties emerge that can push against the market forces driving businesses to spy on their customers, that we all become more literate about how the world trades on our behavioral data.

The current trajectory looks nothing like that. For any of it to happen, we need:

  • A change in the data we collect; or,
  • A change in the data we consume.

If you’re a data “practitioner” involved in this pipeline, and you believe we need to change the direction we’re headed, you can influence that right now.