
We need to build machine learning tools to augment machine learning engineers

As the use of analytics proliferates, companies will need to be able to identify models that are breaking bad.

January 11, 2018
Crowd (source: Pixabay)

In this post, I share slides and notes from a talk I gave in December 2017 at the Strata Data Conference in Singapore, offering suggestions to companies that are actively deploying products infused with machine learning capabilities. Over the past few years, the data community has focused on infrastructure and platforms for data collection, including robust pipelines and highly scalable storage systems for analytics. According to a recent LinkedIn report, the top two emerging jobs are “machine learning engineer” and “data scientist.” Companies are starting to staff up to put their data infrastructures to work, and machine learning is going to become more prevalent in the years to come.

data model building and deployment
Figure 1. Slide by Ben Lorica.

As more companies start using machine learning in products, tools, and business processes, let’s take a quick tour of model building, model deployment, and model management. It turns out that once a model is built, deploying and managing it in production requires engineering skills. So much so that earlier this year, we noted that companies have created a new job role—machine learning (or deep learning) engineer—for people tasked with productionizing machine learning models.

deploy and manage data models in production
Figure 2. Slide by Ben Lorica.

Modern machine learning libraries and tools like notebooks have made model building simpler. New data scientists need to make sure they understand the business problem and optimize their models for it. In a diverse region like Southeast Asia, models need to be localized, as conditions and contexts differ across the ASEAN countries.

optimizing a business metric
Figure 3. Slide by Ben Lorica.

Looking ahead to 2018, rising awareness of the impact of bias, and the importance of fairness and transparency, means that data scientists need to go beyond simply optimizing a business metric. We will need to treat these issues seriously, in much the same way we devote resources to fixing security and privacy issues.

machine learning security and privacy bugs
Figure 4. Slide by Ben Lorica.

While there’s no comprehensive checklist one can go through to systematically address issues pertaining to fairness, transparency, and accountability, the good news is that the machine learning research community has started to offer suggestions and some initial steps model builders can take. Let me go through a couple of simple examples.

Imagine you have an important feature (say, distance from a specific location) of a machine learning model. But there are groups in your population (say, high and low income) for which this feature has very different distributions. What could happen is that your model would have disparate impact across these two groups. A relevant example is a pricing model introduced online by Staples: the model suggested different prices based on location of users.

disparate impact
Figure 5. Slide by Ben Lorica.
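One way to make "disparate impact" concrete is to compare the rate of favorable outcomes across groups. The sketch below is a minimal illustration, not something from the talk: the function names, the toy data, and the use of a min/max rate ratio are all assumptions. A commonly cited rule of thumb treats a ratio below 0.8 as a warning sign, but the right threshold depends on the application.

```python
from collections import defaultdict


def positive_rates(groups, decisions):
    """Return the fraction of favorable decisions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(groups, decisions):
    """Ratio of the lowest group rate to the highest (1.0 means parity)."""
    rates = positive_rates(groups, decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group receives favorable decisions, so rates are equal
    return min(rates.values()) / highest


# Hypothetical toy data: 1 = user was offered the lower price.
groups = ["low_income"] * 5 + ["high_income"] * 5
decisions = [1, 0, 0, 0, 0, 1, 1, 1, 0, 0]

rates = positive_rates(groups, decisions)
ratio = disparate_impact_ratio(groups, decisions)
print(rates, round(ratio, 3))
```

In this toy data the low-income group receives the favorable outcome far less often, so the ratio falls well below parity and the model would merit a closer look.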

In 2014, a group of researchers offered a data renormalization method to remove disparate impact:

renormalization to remove disparate impact
Figure 6. Slide by Ben Lorica, with HT to
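To give a feel for what renormalizing a feature can look like, here is a simplified quantile-matching sketch. It is an assumption-laden illustration of the general idea and not the researchers' exact procedure: each value is replaced by the pooled-population value at the same within-group rank, so every group ends up with a similar distribution while the ordering inside each group is preserved. The function name and the toy distances are hypothetical.

```python
def repair_feature(values, groups):
    """Map each value to the pooled value at the same within-group quantile."""
    pooled = sorted(values)
    repaired = [0.0] * len(values)

    # Collect the row indices that belong to each group.
    indices_by_group = {}
    for index, group in enumerate(groups):
        indices_by_group.setdefault(group, []).append(index)

    for indices in indices_by_group.values():
        ordered = sorted(indices, key=lambda i: values[i])
        n = len(ordered)
        for rank, index in enumerate(ordered):
            # Quantile of this row within its own group, in [0, 1].
            quantile = rank / (n - 1) if n > 1 else 0.5
            repaired[index] = pooled[round(quantile * (len(pooled) - 1))]
    return repaired


# Hypothetical distances (km) for two groups with very different distributions.
distances = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
membership = ["a", "a", "a", "b", "b", "b"]

fixed = repair_feature(distances, membership)
group_a = sorted(v for v, g in zip(fixed, membership) if g == "a")
group_b = sorted(v for v, g in zip(fixed, membership) if g == "b")
print(group_a, group_b)
```

After the repair, the feature no longer separates the two groups, which removes one route by which a model could treat them differently. The trade-off is that some predictive signal in the feature is lost.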

Another example has to do with error: once we are satisfied with a certain error rate, aren’t we done and ready to deploy our model to production? Consider a scenario where you have a machine learning model used in health care: in the course of model building, your training data for millennials (in red) is quite large compared to the number of labeled examples from senior citizens (in blue). Since accuracy tends to be correlated with the size of your training set, chances are the error rate for senior citizens will be higher than for millennials.

disproportionate error rate
Figure 7. Slide by Ben Lorica.

For situations like this, a group of researchers introduced a concept called “equal opportunity” that can help alleviate disproportionate error rates by ensuring that the “true positive rates” for the two groups are similar. See their paper and accompanying interactive visualization.
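A quick check in this spirit is to compute the true positive rate separately for each group and look at the gap. The following is a minimal sketch under stated assumptions: the function names, group labels, and toy predictions are hypothetical, and it only measures the gap rather than correcting it.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        raise ValueError("no positive examples to evaluate")
    return sum(predictions_for_positives) / len(predictions_for_positives)


def tpr_by_group(y_true, y_pred, groups):
    """True positive rate computed separately for each group."""
    result = {}
    for group in set(groups):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
        result[group] = true_positive_rate(
            [t for t, _ in rows], [p for _, p in rows]
        )
    return result


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical labels and predictions for two age groups.
groups = ["millennial"] * 5 + ["senior"] * 5
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]

per_group = tpr_by_group(y_true, y_pred, groups)
gap = equal_opportunity_gap(y_true, y_pred, groups)
print(per_group, gap)
```

A gap near zero suggests that people who truly belong to the positive class are identified at similar rates in both groups; a large gap is a cue to collect more data for the underrepresented group or to adjust decision thresholds.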

So, at least for association “bugs,” we have a few items that we should be checking for:

unwarranted associations
Figure 8. Slide by Ben Lorica, with HT to

Discovering unwarranted associations will require tools to augment our data scientists and machine learning engineers. Sometimes the output space for your models will be too big for manual review and inspection. In 2015, Google Photos included an automatic image tagging utility that failed badly in certain situations. Google was strongly criticized for it (and rightfully so), but to their credit, they stepped in and came up with a fix in a timely fashion. This is an example where the output space (the set of possible “tags”) is large enough that problems can easily go undetected. Machine learning engineers could have used QA tools that surface possible problems for manual review before the model is deployed to production.

discovery of association bugs failure
Figure 9. Slide by Ben Lorica, tweet by @jackyalcine.

The original checklist for deploying and managing models in production contains items that are related to some of the issues I’ve been discussing:

  • Monitoring models: In many cases model performance degrades over time, necessitating periodic retraining. Besides monitoring ML or business metrics, it’s also reasonable to include tools that can monitor for unwarranted associations that may start creeping in.
  • Mission-critical apps: As machine learning gets deployed in critical situations, the bar for deployment will get higher. Model reproducibility and error estimates will be needed.
  • Security and privacy: Models that are fair and unbiased may come under attack and start behaving unpredictably. Users and regulators will also start demanding that models be able to adhere to strict privacy protections.
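The monitoring item in the checklist above can be sketched as a rolling accuracy check. This is a minimal illustration under assumptions of my own: the class name, the window size, the baseline, and the tolerance are all hypothetical, and a production monitor would likely track several metrics (including per-group ones) rather than a single accuracy number.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent labeled predictions."""

    def __init__(self, baseline, window=100, tolerance=0.05):
        self.baseline = baseline    # accuracy measured at deployment time
        self.tolerance = tolerance  # allowed drop before raising a flag
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, label):
        """Store whether the latest prediction matched its true label."""
        self.outcomes.append(prediction == label)

    def accuracy(self):
        """Accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag only once the window is full and accuracy has degraded."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


monitor = RollingAccuracyMonitor(baseline=0.9, window=4, tolerance=0.05)
for prediction, label in [(1, 1), (0, 0), (1, 1), (1, 1)]:
    monitor.record(prediction, label)
healthy = monitor.needs_retraining()

# Two wrong predictions push the windowed accuracy down to 0.5.
for prediction, label in [(0, 1), (0, 1)]:
    monitor.record(prediction, label)
degraded = monitor.needs_retraining()
print(healthy, degraded)
```

Waiting for a full window before flagging avoids false alarms right after deployment, at the cost of a short delay before degradation is detected.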

Let’s take the checklist for machine learning engineers and add some first steps to guard against bias.

checklist guarding against bias
Figure 10. Slide by Ben Lorica.

This is for a single model (or a single ensemble of models). As we look ahead, we know that companies will start building machine learning into many products, tools, and business processes. In reality, machine learning engineers will be responsible for many, many models in production:

many models in production
Figure 11. Slide by Ben Lorica.

How do we help our machine learning engineers identify models that are breaking bad? Note that this is similar to a problem that we’ve encountered before. Companies have been building tools (observability platforms) to help them monitor web pages and web services, and some of the bigger companies have been monitoring many time series. In 2013, I wrote about the tools Twitter was using at the time to monitor hundreds of millions of time series.

identify models that are breaking bad
Figure 12. Slide by Ben Lorica, graphic by, used with permission.

As companies deploy hundreds, thousands, and millions of machine learning models, we need tools to augment our data scientists and machine learning engineers. We will need to use machine learning to monitor machine learning! At the end of the day, your staff of experts will still need to look through issues that arise, but they will need at least some automated tools to help them handle the volume of models in production.

use machine learning to monitor models at scale
Figure 13. Slide by Ben Lorica, graphics by, used with permission.
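As a rough illustration of triaging many models automatically, the sketch below flags models whose error rate is an outlier relative to the rest of the fleet. It uses a robust z-score based on the median and the median absolute deviation, which is a simple statistical stand-in for the learned monitors discussed here, not a description of any particular product. The function name, the model names, and the 3.5 threshold are assumptions.

```python
from statistics import median


def flag_outlier_models(metric_by_model, threshold=3.5):
    """Return names of models whose metric is far from the fleet median."""
    values = list(metric_by_model.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        return []  # no spread to compare against
    flagged = [
        name
        for name, value in metric_by_model.items()
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        if 0.6745 * abs(value - center) / mad > threshold
    ]
    return sorted(flagged)


# Hypothetical error rates reported by five production models.
error_rates = {
    "pricing": 0.05,
    "churn": 0.06,
    "fraud": 0.04,
    "search": 0.05,
    "recommendations": 0.30,
}

suspects = flag_outlier_models(error_rates)
print(suspects)
```

A filter like this does not replace expert review; it narrows thousands of models down to a short list that staff can inspect by hand.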

In 2018, we need to treat model fairness, transparency, and explainability much more seriously. The machine learning research community is engaged in these issues, and they are starting to offer suggestions for how to detect problems and how to alleviate problems that arise. Because companies are beginning to roll out machine learning in many settings, we need to build machine learning tools to augment our teams of data scientists and machine learning engineers. We need our human staff to remain at the frontlines, but we need to give them tools to cope with the coming tsunami of models in production.


Ben Lorica

Ben Lorica is the Chief Data Scientist at O'Reilly Media, Inc. and is the Program Director of both the Strata Data Conference and the O'Reilly Artificial Intelligence Conference. He has applied Business Intelligence, Data Mining, Machine Learning and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services.
