What I learned from building a pet project in one weekend

Dmytro Kisil
6 min read · Jul 8, 2020

From forming an idea to a service ready for users

Photo by tabitha turner on Unsplash

TL;DR: You can find all the code here.

I had just seen the TensorFlow tutorial on adversarial examples (link) and wanted to implement a similar service, but with technologies I hadn't yet used across the whole stack: GCP, GKE, and Docker.

The idea: build similar functionality, but let a user upload their own photos, and replace the fixed class index (the Labrador from the tutorial) with one chosen dynamically from the uploaded photo's top prediction. Also, make epsilon (the value that regulates how much perturbation is added to an image) an interactive parameter, so the client can change it and get a new image without reloading the page.
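The epsilon mechanic from the tutorial is the fast gradient sign method: add epsilon times the sign of the loss gradient to the image. A minimal NumPy sketch (the signed gradient here is a random stand-in for the real gradient of the loss with respect to the input):

```python
import numpy as np

def apply_perturbation(image, signed_grad, epsilon):
    """FGSM-style update: nudge every pixel by epsilon in the direction
    of the gradient's sign, then clip back to the valid [0, 1] range."""
    adversarial = image + epsilon * signed_grad
    return np.clip(adversarial, 0.0, 1.0)

# Dummy 224x224 RGB image in [0, 1] and a random signed "gradient"
image = np.full((224, 224, 3), 0.5, dtype=np.float32)
signed_grad = np.sign(np.random.randn(224, 224, 3)).astype(np.float32)

adv = apply_perturbation(image, signed_grad, epsilon=0.1)
```

A larger epsilon makes the perturbation more visible but more effective, which is exactly why exposing it as a slider is interesting.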

For the backend, Flask suited best: the project isn't complicated, and we don't even need a database, user registration, etc.

For the frontend, I chose Dash, because I'm familiar with it and knew that implementing all the callbacks shouldn't take too long. While googling about Kubernetes on GCP, I found a helpful tutorial (link) that I would use for the deployment part of the project.

So the idea was formed, and all that remained was to realize it as a service that lets users choose a photo and download a version of it with perturbations.

All the commands I used many times during the project can be found in the HowToUse.txt file.

PROJECT_ID is an environment variable set to the ID of your project (you get it after creating a new project in Google Cloud).

During the project, I needed ImageNet class names together with their IDs: from the uploaded photo we get the top predicted label, and we have to find its class ID to create the perturbations. After some searching, I found two examples; one of them suited best.
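The lookup can be as simple as inverting a label-to-index mapping. A minimal sketch, assuming a JSON file in the same shape as Keras's imagenet_class_index.json (index mapped to [wordnet_id, label]); the two entries below are just examples:

```python
# Assumed format, mirroring Keras's imagenet_class_index.json
class_index = {
    "0": ["n01440764", "tench"],
    "208": ["n02099712", "Labrador_retriever"],
}

# Invert it: predicted label -> integer class id used as the perturbation target
label_to_id = {label: int(idx) for idx, (_, label) in class_index.items()}

label_to_id["Labrador_retriever"]  # -> 208
```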

Implementation

Building the service took around 16 hours. At that point, users could upload photos and see them change when choosing different epsilon values. But the download button worked badly: it returned a resized image and didn't update the image after the first change of epsilon.

Things that had to be fixed for better usability (they took almost another 10 hours over the following week):

Docker optimization

The Docker image was huge, 3.5 GB, so I tried to reduce its size.

I had made a huge mistake: the virtualenv, which lives in the venv folder, was being uploaded into the Docker image (why, how could you miss that?), costing 1.0 GB. I learned how to properly use .dockerignore to keep venv out of the image.
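A .dockerignore along these lines keeps the virtualenv (and other local noise) out of the build context; the exact entries are my assumption, adjust them to your repo:

```
# .dockerignore: never send these to the Docker build context
venv/
__pycache__/
*.pyc
.git/
```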

I used the Python slim image instead of the full one (saving 850 MB), applied an example that removes the cache from the image (another 300 MB), and rewrote the Dockerfile so that installing requirements isn't rerun for unchanged packages on every docker build. I also used the TensorFlow CPU package instead of the full one (350 MB): the machine would only have a CPU, so there's no need to install the GPU version.

After optimization, Docker now builds much faster if you remove the --no-cache-dir flag (because all unchanged packages in requirements.txt are skipped), and the image comes in at 1.0 GB.

Here's what the optimized Dockerfile looks like (feel free to use it; if image size isn't critical for you, you can remove --no-cache-dir and avoid reinstalling all packages again and again on every docker build):

ARG CODE_VERSION="3.8-slim"
FROM python:${CODE_VERSION}
# ARG must be redeclared after FROM to be usable in the build stage
ARG PROJECT_DIR="protect-your-photo-from-recognition"
LABEL maintainer="Dmitriy Kisil <email: logart1995@gmail.com>"
COPY ./requirements.txt ./${PROJECT_DIR}/requirements.txt
WORKDIR /${PROJECT_DIR}
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . /${PROJECT_DIR}
EXPOSE 8050
CMD ["python3", "app.py"]

For example, this particular image builds in around 1 minute at 1.0 GB without the cache, and in 1 second (!) with the cache (at 1.3 GB). For this project, I went with the first approach, without the cache.

Improving comfort when using docker-compose

version: '3'

services:
  web:
    image: 'gcr.io/${PROJECT_ID}/protect-your-photo-from-recognition:v1'
    build: .
    environment:
      # See prints from docker console
      PYTHONUNBUFFERED: 1
      # Add for hot-reload
      FLASK_APP: "app"
      FLASK_ENV: "development"
    ports:
      - "8050:8050"
    volumes:
      - .:/protect-your-photo-from-recognition

A couple of things are worth noting here that you might find useful. In active development you often break something, and print() is very helpful for figuring out what went wrong. But by default, when the container is up, you can't see any print() output from the code in the console. This can be fixed by setting PYTHONUNBUFFERED=1.

Another thing: in the active development phase, many code changes are made in a tiny amount of time. When the container is up, you need to stop it and start it again to see how the changes affect your service. By default that means pressing Ctrl+C to stop the container and running docker-compose up again. With Flask you can improve this a little by setting FLASK_ENV to "development": now, when you change the code, you just save the file where the changes were made, and the server inside the container restarts automatically.

Replacing opencv2 with PIL

Another thing: I couldn't build the image with opencv2 and python:3.8-slim. After struggling a lot, I decided to drop opencv2. That decision required significant changes to the code, because almost all image operations used opencv2. I chose PIL as the alternative: TensorFlow even provides a few functions in the keras.preprocessing.image module that work great with PIL and convert between NumPy arrays and PIL.Image.

Thanks to that, I created a bunch of snippets for converting images from one data type to another using TF with opencv2, and using TF with PIL.

Note: converting to dtype float32 is needed for model.predict(), and to uint8 for showing the image as a base64 string in HTML. The added 4th dimension is also required for making a prediction, because MobileNetV2 expects a 4-dimensional array as input.
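As a sketch of those conversions in plain PIL and NumPy (the project used the keras.preprocessing.image helpers; this snippet skips TensorFlow so it stands alone):

```python
import numpy as np
from PIL import Image

# Dummy RGB image standing in for an uploaded photo
pil_img = Image.new("RGB", (224, 224), color=(128, 64, 32))

# PIL -> float32 NumPy array, plus a 4th (batch) dimension for model.predict()
arr = np.asarray(pil_img, dtype=np.float32)   # shape (224, 224, 3)
batch = arr[np.newaxis, ...]                  # shape (1, 224, 224, 3)

# float32 array -> uint8 -> PIL, e.g. before base64-encoding it for HTML
img_back = Image.fromarray(batch[0].astype(np.uint8))
```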

Fixing bugs

Almost at the end of the project, I discovered that Dash supports more than one Output in a callback, so I rewrote a few functions to reduce the number of calls between them.

I found that the initial way of creating the image after a change of the epsilon slider led to a problem: further slider changes didn't affect the downloaded image properly (the client had to refresh the download link to get the expected photo). So I now remove the previous version of the image and create the new one inside the function that makes the perturbations.
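The fix boils down to recreating the downloadable file inside the perturbation step, so the link never serves a stale image. A hypothetical helper (write_download_file is my name for it, not the project's):

```python
import os
import tempfile

def write_download_file(data: bytes, path: str) -> str:
    """Delete any previous version, then write the fresh perturbed image,
    so the download link always points at the latest epsilon's result."""
    if os.path.exists(path):
        os.remove(path)
    with open(path, "wb") as f:
        f.write(data)
    return path

path = os.path.join(tempfile.mkdtemp(), "perturbed.png")
write_download_file(b"epsilon=0.05 image bytes", path)
write_download_file(b"epsilon=0.10 image bytes", path)  # replaces the stale file
```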

Another improvement: save the shapes of uploaded photos and make that data accessible inside the download function. This allows resizing the photos (which have shape (224, 224, 3) after the MobileNet prediction) back to their initial shapes. Now you can upload a photo of any size and download it back with perturbations and the same shape as before.
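Restoring the original size is then a single PIL resize; a sketch with an assumed helper name:

```python
from PIL import Image

def restore_size(perturbed: Image.Image, original_size: tuple) -> Image.Image:
    """Resize the (224, 224) model-sized image back to the upload's original size."""
    return perturbed.resize(original_size)

perturbed = Image.new("RGB", (224, 224))
restored = restore_size(perturbed, (640, 480))  # the original upload was 640x480
```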

Budget optimization

Initially, Google suggests an n1-standard-1 machine with 1 vCPU and 3.75 GB of RAM, which costs around $25 per month. After looking at node utilization, I found that the app only uses about 800 MB of memory, so I tried other machines. e2-small costs $12 (2 shared vCPUs, 2 GB of RAM) and achieves almost the same performance. e2-micro (2 shared vCPUs, 1 GB of RAM) wouldn't even start with 1 node, so I enabled autoscaling and increased the node count to 2 with a minimum of 1 node (you can't set 0, because the cluster needs at least one working node). But the project still didn't launch on e2-micro, so I chose e2-small. I also tried g1-small, but the project wouldn't launch even with three nodes up.

Other

I also learned how to do rolling updates in GKE. For each improvement, I built an image with a new version and performed a rolling update, so the service doesn't experience downtime while a new image rolls out.

Summary

Yeah! The service is now fully available and works as expected! You can check it out via this link.

Leave a comment below; I want to know what you think about the project, the idea, or the implementation. Have a nice day! =)
