Data is the food for AI. In machine learning, and supervised learning in particular, golden labels are key for models to recognize the patterns within the data. However, with real-world data it is usually hard to get a large amount of labeled data, for example in search relevance, news topics, autopilot, etc. Recently, Andrew Ng gave a talk on MLOps, "From Model-centric to Data-centric AI", where he mentioned the idea of moving from Big Data to Good Data. Good data is defined consistently and covers the important cases. It has timely feedback from production data and is sized appropriately.

So now the question…

ML Code Dive is a series of articles that deep dive into machine learning algorithms without any ML libraries, purely with Python and numpy! It not only allows you to better understand the details of the ML models, but also helps you ace Data Scientist/Machine Learning Engineer interviews! You can find all the code here.

K-Nearest Neighbors, aka KNN, is one of the simplest machine learning models. …
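As a rough illustration of the idea, here is a minimal KNN classifier sketch. It is not the code from the series: it uses only the Python standard library (the series itself mentions numpy), and the function name `knn_predict`, the toy data, and the default `k=3` are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [math.dist(point, query) for point in train_points]
    # Indices of the k training points with the smallest distances.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two small, well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
```

Calling `knn_predict(points, labels, (1.1, 1.0))` returns `"a"`, since all three nearest neighbors belong to the first cluster. There is no training step: the model simply stores the data and does all the work at prediction time.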


Recently, reinforcement learning has been successfully applied to different problems like self-driving cars, trading and finance, and playing video games. In this paper, we solve a well-known robotic control problem, the lunar lander problem, using Deep Q-Learning in OpenAI Gym’s LunarLander-v2 environment. The winning agent achieves an average reward of over 266 across 100 test episodes. The paper also shows that different hyper-parameters, like batch size, learning rate and update size, affect both training speed (in episodes) and performance (in rewards).
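The abstract does not spell out the update rule, so the following is only a generic sketch of the standard Q-learning target that Deep Q-Learning regresses toward, not the paper's implementation. The function name `td_targets` and the discount factor `gamma` are assumptions; the Q-values would normally come from a neural network, which is omitted here.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute Q-learning targets r + gamma * max_a' Q(s', a') for a batch.

    rewards:        one reward per sampled transition
    next_q_values:  for each transition, the Q-values of every action in the
                    next state (LunarLander-v2 has four discrete actions)
    dones:          True where the transition ended the episode
    gamma:          discount factor (an assumed value, not from the paper)
    """
    targets = []
    for reward, next_q, done in zip(rewards, next_q_values, dones):
        # Terminal transitions have no future value to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(next_q)
        targets.append(reward + bootstrap)
    return targets
```

The batch size mentioned in the abstract corresponds to how many transitions are passed to a function like this at each training step.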


The Lunar Lander problem is the task of controlling the fire orientation engine to help the lander land in…


TL;DR: this article presents a framework for solving an ML system design case study, and we use UberEats recommendation as an example to show the thinking process.

This is a Machine Learning System Design case study: how do you design the recommendation system for UberEats?

The framework for the ML System Design can be divided into several sections as follows:

  • Goals and Objectives
  • Requirements
  • High-Level Design
  • Evaluation
  • Further Improvements

The very first step is to define the goal and objectives. The goal can be both long-term and short-term, and if there’s a conflict between them, it’s always a bonus to discuss the tradeoff. As for objectives, keep in mind that it is possible to have multiple objectives, and if that is the case, communicate…

Precision, recall, and F1-score may not really work…

The article will describe the traditional metrics for binary classification, why they don’t work, the new cost function, and the implementation of the new loss function in Logistic Regression with sklearn and DNN with TensorFlow.
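For reference, the traditional metrics mentioned above can be computed as in the sketch below. This covers only precision, recall, and F1; the article's new cost function is not shown in this excerpt and is not reproduced here. The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions for this example.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Fall back to 0.0 when a ratio is undefined (an assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Note that all three metrics weigh every false positive equally and every false negative equally, regardless of what each mistake costs.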

Three types of problem statements for recommendation systems.


When we start a data science project, the first thing we should always do is define the problem, that is, translate the business problem into a data science problem. This is not only about dividing a big project into small parts, but also about how we frame the problem, and different framings may lead to varying performance in our final solution.

Recommendation systems exist because of information overload, and we can call them information filtering systems. They greatly influence how we interact with the world: shopping (Amazon, Best Buy), music (Spotify)…

Power personalization with deep learning and pre-trained product embeddings


Deep learning has become a hot topic in recent years, and it benefits many industries like retail, e-commerce, finance, etc. I work at a global retail & e-commerce company with millions of products and customers. In my daily work, I use the power of data and deep learning to provide personalized recommendations for our customers, and recently I tried an embedding-based approach, which performs very well compared with our current algorithms.

In this blog post, I will describe the embedding technique that I developed, and how to implement it in a large-scale machine learning system. Basically, I trained…
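The excerpt does not show how the embeddings are trained or used, so the sketch below illustrates only one common way to use product embeddings once they exist: ranking other products by cosine similarity. This is an assumed technique for illustration, not necessarily the author's method, and the product names and three-dimensional vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(product_id, embeddings, top_n=2):
    """Return the ids of the top_n products closest to `product_id`."""
    query = embeddings[product_id]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != product_id  # never recommend the product itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


# Hypothetical embeddings; real ones would be learned and much larger.
embeddings = {
    "running_shoes": [0.9, 0.1, 0.0],
    "trail_shoes": [0.8, 0.2, 0.1],
    "blender": [0.0, 0.1, 0.9],
    "toaster": [0.1, 0.0, 0.8],
}
```

With these toy vectors, `most_similar("running_shoes", embeddings, top_n=1)` returns `["trail_shoes"]`. At the scale of millions of products, a brute-force scan like this would typically be replaced by an approximate nearest-neighbor index.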

In the big data world, more and more people hope to become data scientists, using the power of data and machine learning to solve real-world problems. It has been almost a year since I graduated and became a data scientist at an e-commerce company. I have realized that there is a deep gap between academic programs related to data mining and real work in industry. So, in this post, I try to summarize five abilities from my work experience that I think are important for differentiating data scientists.

Understanding customers is the key to success for an e-commerce website, and a data-driven approach is one of the best ways to better understand our customers’ needs and wants, and then provide them with a better shopping experience. So here comes the key question: how do we represent a customer in a data science way? In other words, how do we use customer data to represent a customer and feed that representation into a machine learning/deep learning model? A good representation of customers can be applied in many places:

  • customer segmentation
  • customer lifetime value
  • predictive models like CTR prediction, CV…

Louis Wang

Data Scientist, Machine Learning Engineer, Deep learning blogger.
