{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title: \"Benchmark GPU vs CPU with TensorFlow\"\n", "date: 2021-02-24\n", "type: technical_note\n", "draft: false\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmark GPU vs CPU & multi-host vs single host\n", "---\n", "\n", "This notebook can be used to benchmark performance using CPU, a single GPU or many GPUs.\n", "\n", "

Tested with TensorFlow 2.4.0

\n", "\n", "

\n", "

Machine Learning on Hopsworks\n", "

\n", "

\n", "\n", "![hops.png](../images/hops.png)\n", "\n", "## The `hops` Python module\n", "\n", "`hops` is a helper library for Hops that facilitates development by hiding the complexity of running applications and interacting with services.\n", "\n", "Have a feature request or encountered an issue? Please let us know on GitHub.\n", "\n", "### Using the `experiment` module\n", "\n", "To run your machine learning code in Hopsworks, the whole program must be placed inside a wrapper function. Everything, from importing libraries to reading data, defining the model and running the program, goes inside that function.\n", "\n", "The `experiment` module provides an API for running Python programs that use frameworks such as TensorFlow, Keras and PyTorch on Hopsworks, on any number of machines and GPUs.\n", "\n", "An Experiment could be a single Python program, which we refer to as an **Experiment**. \n", "\n", "It could also be grid search or genetic hyperparameter optimization, such as differential evolution, which runs several Experiments in parallel. We refer to this as a **Parallel Experiment**. \n", "\n", "Finally, it could use ParameterServerStrategy, CollectiveAllReduceStrategy or MultiWorkerMirroredStrategy, which make multi-machine/multi-GPU training as simple as invoking a function for orchestration. This mode is referred to as **Distributed Training**.\n", "\n", "### Using the `tensorboard` module\n", "The `tensorboard` module allows us to get the log directory where summaries and checkpoints are written for the TensorBoard we will see in a bit. The only function that we currently need to call is `tensorboard.logdir()`, which returns the path to the TensorBoard log directory. 
Furthermore, the content of this directory will be stored as a Dataset in your project's Experiments folder.\n", "\n", "In practice, the directory can also be used to store other data that should be accessible after the experiment has finished.\n", "```python\n", "# Use this module to get the TensorBoard logdir\n", "from hops import tensorboard\n", "tensorboard_logdir = tensorboard.logdir()\n", "```\n", "\n", "### Using the `hdfs` module\n", "The `hdfs` module provides a method to get the path in HopsFS where your data is stored, namely by calling `hdfs.project_path()`. The path resolves to the root path for your project, which is the view that you see when you click `Data Sets` in Hopsworks. To point to where your actual data resides in the project, you need to append the full path from there to your Dataset. For example, if you create a mnist folder in your Resources Dataset, the path to the mnist data would be `hdfs.project_path() + 'Resources/mnist'`.\n", "\n", "```python\n", "# Use this module to get the path to your project in HopsFS, then append the path to your Dataset in your project\n", "from hops import hdfs\n", "project_path = hdfs.project_path()\n", "```\n", "\n", "```python\n", "# Downloading the mnist dataset to the current working directory\n", "from hops import hdfs\n", "mnist_hdfs_path = hdfs.project_path() + \"Resources/mnist\"\n", "local_mnist_path = hdfs.copy_to_local(mnist_hdfs_path)\n", "```\n", "\n", "### Documentation\n", "See the following links to learn more about running experiments in Hopsworks:\n", "\n", "- Learn more about experiments\n", "
\n", "- Building End-To-End pipelines\n", "
\n", "- Give us a star, create an issue or a feature request on Hopsworks GitHub\n", "\n", "### Managing experiments\n", "The Experiments service provides a unified view of all the experiments run using the `experiment` module.\n", "
\n", "As demonstrated in the GIF, it provides general information about each experiment and the resulting metric. Experiments can be visualized in TensorBoard during or after training.\n", "
\n", "
\n", "![experiments.gif](../images/experiments.gif)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def wrapper():\n", "    import tensorflow as tf\n", "    \n", "    # Wrapper for keras_applications, you can import any model you want to try (like ResNet50)\n", "    from tensorflow.keras.applications import ResNet50\n", "\n", "    import numpy as np\n", "    \n", "    from hops import tensorboard\n", "    \n", "    # Utility module for getting number of GPUs accessible by the container (Spark Executor)\n", "    from hops import devices\n", "\n", "    batch_size = 8 # Number of samples per batch (the global batch, split across replicas by MirroredStrategy)\n", "    \n", "    # Image dimensions\n", "    height = 224\n", "    width = 224\n", "    channels = 3\n", "    num_classes = 1000\n", "    \n", "    num_iterations = 5000 # Number of iterations, increase to run longer\n", "    \n", "\n", "    log_dir = tensorboard.logdir()\n", "    \n", "    # Generate synthetic data (can be replaced with real data)\n", "    def input_fn():\n", "        data = np.random.random((batch_size, height, width, channels)).astype(np.float32)\n", "        labels = np.random.random((batch_size, num_classes))\n", "        dataset = tf.data.Dataset.from_tensor_slices((data, labels))\n", "        dataset = dataset.repeat(num_iterations)\n", "        dataset = dataset.batch(batch_size)\n", "        return dataset\n", "    \n", "    tf.keras.backend.set_learning_phase(True)\n", "\n", "    \n", "    # Define distribution strategy\n", "    strategy = tf.distribute.MirroredStrategy()\n", "\n", "\n", "    # Build and compile the model inside the strategy scope so its variables are mirrored\n", "    with strategy.scope():\n", "        model = ResNet50(weights=None, input_shape=(height, width, channels), classes=num_classes)\n", "\n", "        optimizer = tf.keras.optimizers.RMSprop(0.2)\n", "        model.compile(loss='categorical_crossentropy', optimizer=optimizer)\n", "\n", "\n", "    # Write TensorBoard summaries and model checkpoints to the experiment log directory\n", "    callbacks = [\n", "        tf.keras.callbacks.TensorBoard(log_dir=log_dir),\n", "        tf.keras.callbacks.ModelCheckpoint(filepath=log_dir),\n", "    ]\n", "    model.fit(input_fn(),\n", "              verbose=0,\n", "              epochs=3,\n", "              steps_per_epoch=5,\n", "              validation_data=input_fn(),\n", "              callbacks=callbacks\n", "             )\n", "    model.evaluate(input_fn())\n", "    " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finished Experiment \n", "\n", "('hdfs://rpc.namenode.service.consul:8020/Projects/demo/Experiments/application_1594231828166_0163_3', {'metric': None, 'log': 'Experiments/application_1594231828166_0163_3/output.log'})" ] } ], "source": [ "from hops import experiment\n", "experiment.launch(wrapper, local_logdir=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "PySpark", "language": "python", "name": "pysparkkernel" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "mimetype": "text/x-python", "name": "pyspark", "pygments_lexer": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }