{"id":29316,"date":"2022-03-30T11:10:07","date_gmt":"2022-03-30T11:10:07","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=29316"},"modified":"2022-12-23T06:20:00","modified_gmt":"2022-12-23T06:20:00","slug":"what-is-caffe-and-how-it-works-an-overview-and-its-use-cases-2","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/what-is-caffe-and-how-it-works-an-overview-and-its-use-cases-2\/","title":{"rendered":"What is Caffe and How it works? An Overview and Its Use Cases"},"content":{"rendered":"<h3>History &amp; Origin of Caffe<\/h3>\n<p>Deep learning is the new big trend in machine learning. It had many recent successes in computer vision, automatic speech recognition and natural language processing.<\/p>\n<p>The goal of this blog post is to give you a hands-on introduction to deep learning. To do this, we will build a Cat\/Dog image classifier using a deep learning algorithm called convolutional neural network (CNN) and a\u00a0Kaggle dataset.<\/p>\n<p>This post is divided into 2 main parts. The first part covers some core concepts behind deep learning, while the second part is structured in a hands-on tutorial format.<\/p>\n<p>In the first part of the hands-on tutorial (section 4), we will build a Cat\/Dog image classifier using a convolutional neural network from scratch. In the second part of the tutorial (section 5), we will cover an advanced technique for training convolutional neural networks called transfer learning. We will use some Python code and a popular open source deep learning framework called Caffe to build the classifier. Our classifier will be able to achieve a classification accuracy of 97%.<\/p>\n<p>By the end of this post, you will understand how convolutional neural networks work, and you will get familiar with the steps and the code for building these networks.<\/p>\n<h3>What is Caffe<\/h3>\n<p>Caffe is a deep learning framework made with expression, speed, and modularity in mind. 
It is developed by Berkeley AI Research (BAIR) and by community contributors.\u00a0Yangqing Jia\u00a0created the project during his PhD at UC Berkeley. Caffe is released under the\u00a0BSD 2-Clause license.<\/p>\n<p><strong>Expressive architecture<\/strong>\u00a0encourages application and innovation. Models and optimization are defined by configuration without hard-coding. Switch between CPU and GPU by setting a single flag to train on a GPU machine then deploy to commodity clusters or mobile devices.<\/p>\n<p><strong>Extensible code<\/strong>\u00a0fosters active development. In Caffe\u2019s first year, it has been forked by over 1,000 developers and had many significant changes contributed back. Thanks to these contributors the framework tracks the state-of-the-art in both code and models.<\/p>\n<p><strong>Speed<\/strong>\u00a0makes Caffe perfect for research experiments and industry deployment. Caffe can process\u00a0<strong>over 60M images per day<\/strong>\u00a0with a single NVIDIA K40 GPU*. That\u2019s 1 ms\/image for inference and 4 ms\/image for learning and more recent library versions and hardware are faster still. 
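<\/p>
<p>As a quick sanity check on that throughput claim (plain arithmetic, not a benchmark; the numbers come from the sentence above):<\/p>

```python
# 60M images/day on a single GPU works out to ~1.4 ms per image end to end,
# consistent with ~1 ms forward inference plus overhead.
ms_per_day = 24 * 60 * 60 * 1000
images_per_day = 60_000_000
print(ms_per_day / images_per_day)  # 1.44
```

<p>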
We believe that Caffe is among the fastest convnet implementations available.<\/p>\n<h3>How Caffe works (Caffe architecture)<\/h3>\n<p>Classification using a machine learning algorithm has 2 phases:<\/p>\n<ul>\n<li>Training phase: In this phase, we train a machine learning algorithm using a dataset composed of the images and their corresponding labels.<\/li>\n<li>Prediction phase: In this phase, we utilize the trained model to predict labels of unseen images.<\/li>\n<\/ul>\n<p>The training phase for an image classification problem has 2 main steps:<\/p>\n<ol>\n<li>Feature Extraction: In this phase, we utilize domain knowledge to extract new features that will be used by the machine learning algorithm.\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Histogram_of_oriented_gradients\" target=\"_blank\" rel=\"noopener\">HoG<\/a>\u00a0and\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Scale-invariant_feature_transform\" target=\"_blank\" rel=\"noopener\">SIFT<\/a>\u00a0are examples of features used in image classification.<\/li>\n<li>Model Training: In this phase, we utilize a clean dataset composed of the images&#8217; features and the corresponding labels to train the machine learning model.<\/li>\n<\/ol>\n<p>In the prediction phase, we apply the same feature extraction process to the new images and pass the features to the trained machine learning algorithm to predict the label.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-29317 size-large\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/03\/machine-learning-training-prediction-2-1024x353.png\" alt=\"\" width=\"760\" height=\"262\" \/><\/p>\n<p>The main difference between traditional machine learning and deep learning algorithms lies in the feature engineering. In traditional machine learning algorithms, we need to hand-craft the features. By contrast, in deep learning algorithms, feature engineering is done automatically by the algorithm. 
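<\/p>
<p>The training\/prediction split described above can be sketched in a few lines of plain Python. The snippet below is a toy nearest-centroid classifier with a hand-crafted feature step (a stand-in for HoG\/SIFT); the data and names are illustrative only and are not part of Caffe or the Kaggle dataset.<\/p>

```python
import numpy as np

# Toy "images": 4x4 grayscale arrays; bright images are class 1, dark are class 0.
train_images = [np.full((4, 4), v) for v in (0.1, 0.2, 0.8, 0.9)]
train_labels = np.array([0, 0, 1, 1])

def extract_features(image):
    # Hand-crafted feature extraction (the role HoG/SIFT play for real images):
    # here simply the mean and standard deviation of the pixel intensities.
    return np.array([image.mean(), image.std()])

# Training phase: features + labels -> model (here, one centroid per class).
feats = np.stack([extract_features(im) for im in train_images])
centroids = {c: feats[train_labels == c].mean(axis=0) for c in (0, 1)}

# Prediction phase: the SAME feature extraction, then the trained model.
def predict(image):
    f = extract_features(image)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

print(predict(np.full((4, 4), 0.85)))  # 1
```

<p>A real pipeline differs only in scale: richer features, a stronger model, and far more data.<\/p>
<p>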
Feature engineering is difficult, time-consuming, and requires domain expertise. The promise of deep learning is more accurate machine learning with little or no feature engineering.<\/p>\n<div>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-29318 size-large\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2022\/03\/traditional-ml-deep-learning-2-1024x275.png\" alt=\"\" width=\"760\" height=\"204\" \/><\/p>\n<\/div>\n<h3>Use cases of Caffe<\/h3>\n<p>Caffe is being used in\u00a0<b>academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia<\/b>. Yahoo! has also integrated Caffe with Apache Spark to create CaffeOnSpark, a distributed deep learning framework.<\/p>\n<h3>Limitations of Caffe<\/h3>\n<ul>\n<li>Once you venture outside of Caffe\u2019s comfort zone, i.e. convnets, its usability drops significantly. For example, try implementing an RNN for language modeling in Caffe.<\/li>\n<li>All the machinery, e.g. protobuf and layers, 
gets in the way once you try to define your own layer types.<\/li>\n<li>Caffe supports only a few input formats and only one output format, HDF5 (although you can always run it through its Python\/C++\/Matlab interface and get the output data from there).<\/li>\n<li>Multi-GPU training is only partially supported; not all parallelism strategies, such as model and data parallelism, are available.<\/li>\n<li>Many interesting modifications are only available in patches submitted by different people.<\/li>\n<\/ul>\n<h3>Best Alternatives to Caffe<\/h3>\n<div class=\"co8aDb\" role=\"heading\"><b>Top Alternatives to Caffe<\/b><\/div>\n<div class=\"RqBzHd\">\n<ul class=\"i8Z77e\">\n<li class=\"TrT0Xe\">Keras<\/li>\n<li class=\"TrT0Xe\">DeepPy<\/li>\n<li class=\"TrT0Xe\">NVIDIA Deep Learning GPU Training System (DIGITS)<\/li>\n<li class=\"TrT0Xe\">TFLearn<\/li>\n<li class=\"TrT0Xe\">Torch<\/li>\n<li class=\"TrT0Xe\">Clarifai<\/li>\n<li class=\"TrT0Xe\">Microsoft Cognitive Toolkit (formerly CNTK)<\/li>\n<li class=\"TrT0Xe\">AWS Deep Learning AMIs<\/li>\n<\/ul>\n<h3>Best Resources, Tutorials, and Guides for Caffe<\/h3>\n<ol>\n<li><strong><a href=\"https:\/\/www.devopsschool.com\/\">DevOpsSchool<\/a><\/strong><\/li>\n<li><a href=\"https:\/\/www.scmgalaxy.com\/\" target=\"_blank\" rel=\"noopener\"><strong>ScmGalaxy<\/strong><\/a><\/li>\n<li>Edureka<\/li>\n<li>Simplilearn<\/li>\n<\/ol>\n<\/div>\n<h2>Free Video Tutorials of\u00a0Caffe<\/h2>\n<figure class=\"wp-block-embed wp-block-embed-youtube is-type-video is-provider-youtube epyt-figure\"><div class=\"wp-block-embed__wrapper\"><iframe loading=\"lazy\"  id=\"_ytid_48506\"  width=\"760\" height=\"427\"  data-origwidth=\"760\" data-origheight=\"427\" src=\"https:\/\/www.youtube.com\/embed\/Ax9f1zQ_2l8?enablejsapi=1&autoplay=0&cc_load_policy=0&cc_lang_pref=&iv_load_policy=1&loop=0&rel=1&fs=1&playsinline=0&autohide=2&theme=dark&color=red&controls=1&disablekb=0&\" class=\"__youtube_prefs__  no-lazyload\" title=\"YouTube player\"  allow=\"fullscreen; 
accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen data-no-lazy=\"1\" data-skipgform_ajax_framebjll=\"\"><\/iframe><\/div><\/figure>\n<figure class=\"wp-block-embed wp-block-embed-youtube is-type-video is-provider-youtube epyt-figure\"><div class=\"wp-block-embed__wrapper\"><iframe loading=\"lazy\"  id=\"_ytid_61793\"  width=\"760\" height=\"427\"  data-origwidth=\"760\" data-origheight=\"427\" src=\"https:\/\/www.youtube.com\/embed\/p4ohWMhWdrI?enablejsapi=1&autoplay=0&cc_load_policy=0&cc_lang_pref=&iv_load_policy=1&loop=0&rel=1&fs=1&playsinline=0&autohide=2&theme=dark&color=red&controls=1&disablekb=0&\" class=\"__youtube_prefs__  no-lazyload\" title=\"YouTube player\"  allow=\"fullscreen; accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen data-no-lazy=\"1\" data-skipgform_ajax_framebjll=\"\"><\/iframe><\/div><\/figure>\n<h3>Interview Questions and Answer for Caffe<\/h3>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q1<\/i>:<\/span> What is the difference between\u00a0<em>Machine Learning<\/em>\u00a0and\u00a0<em>Deep Learning<\/em>?<br \/>\nAnswer<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_6Rgy\">\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li><strong><em>Machine Learning<\/em><\/strong>\u00a0depends on humans to learn. Humans determine the hierarchy of\u00a0<em>features<\/em>\u00a0to determine the difference between the data input. 
It usually requires more structured data to learn.<\/li>\n<li><strong><em>Deep Learning<\/em><\/strong>\u00a0automates much of the\u00a0<em>feature extraction<\/em>\u00a0piece of the process. It eliminates much of the manual human intervention required.<\/li>\n<li><strong><em>Machine Learning<\/em><\/strong>\u00a0is less dependent on the amount of data as compared to deep learning.<\/li>\n<li><strong><em>Deep Learning<\/em><\/strong>\u00a0requires a lot of data to give high accuracy. It can take thousands or millions of data points, trained for days or weeks, to produce an acceptably accurate model.<\/li>\n<\/ul>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q2<\/i>:<\/span> What are\u00a0<em>Ensemble<\/em>\u00a0methods and how are they useful in Deep Learning?\n<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_VIUx\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li><strong><em>Ensemble<\/em><\/strong>\u00a0methods are used to increase the\u00a0<em>generalization<\/em>\u00a0power of a model. These methods are applicable to both deep learning and traditional machine learning algorithms.<\/li>\n<li>Some ensemble methods introduced in neural networks are\u00a0<strong><em>Dropout<\/em><\/strong>\u00a0and\u00a0<strong><em>Dropconnect<\/em><\/strong>. 
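<p>As a minimal illustration of the\u00a0<em>Dropout<\/em>\u00a0idea in plain NumPy (an inverted-dropout sketch, not Caffe code):<\/p>

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    then rescale the survivors so the expected activation is unchanged."""
    if not training:
        return activations  # at test time the full network is used as-is
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

x = np.ones(1000)
print(np.unique(dropout(x)))  # values are 0.0 (dropped) or 2.0 (kept, rescaled)
```
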
The improvement in the model depends on the type of data and the nature of neural architecture.<\/li>\n<\/ul>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q3<\/i>:<\/span> What advantages does\u00a0<em>Deep Learning<\/em>\u00a0have over\u00a0<em>Machine Learning<\/em>?<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_WUUYm\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li><strong><em>Deep Learning<\/em><\/strong>\u00a0gives a\u00a0<em>better performance<\/em>\u00a0compared to machine learning if the dataset is large enough.<\/li>\n<li><strong><em>Deep Learning<\/em><\/strong>\u00a0does not need the person designing the model to have a lot of\u00a0<em>domain understanding for feature introspection<\/em>. 
Deep learning outshines other methods if there is\u00a0<em>no feature engineering done<\/em>.<\/li>\n<li><strong><em>Deep Learning<\/em><\/strong>\u00a0really shines when it comes to\u00a0<em>complex problems<\/em>\u00a0such as\u00a0<em>image classification<\/em>,\u00a0<em>natural language processing<\/em>, and\u00a0<em>speech recognition<\/em>.<\/li>\n<\/ul>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q4<\/i>:<\/span> Why does the performance of\u00a0<em>Deep Learning<\/em>\u00a0improve as more data is fed to it?<br \/>\n<span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_0eQes\">\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li>One of the key benefits of\u00a0<strong>Deep Learning<\/strong>\u00a0is its ability to perform\u00a0<strong><em>automatic feature extraction<\/em><\/strong>\u00a0from raw data.<\/li>\n<li>As the amount of data fed into the learning algorithm increases, more\u00a0<em>edge cases<\/em>\u00a0are taken into consideration, and hence the algorithm learns to make the right decisions in those edge cases.<\/li>\n<\/ul>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q5<\/i>:<\/span> What is the difference between\u00a0<em>Deep Learning<\/em>\u00a0and\u00a0<em>Artificial Neural Networks<\/em>?\n<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_ZLkWs\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li>When researchers started to 
create\u00a0<strong><em>large<\/em><\/strong>\u00a0artificial neural networks, they started to use the word\u00a0<strong>deep<\/strong>\u00a0to refer to them.<\/li>\n<li>As the term\u00a0<em>deep learning<\/em>\u00a0started to be used, it is generally understood that it stands for artificial neural networks which are\u00a0<strong>deep<\/strong>\u00a0as opposed to\u00a0<strong>shallow<\/strong>\u00a0artificial neural networks.<\/li>\n<li><strong><em>Deep Artificial Neural Networks<\/em><\/strong>\u00a0and\u00a0<strong><em>Deep Learning<\/em><\/strong>\u00a0are generally the\u00a0<em>same thing<\/em>\u00a0and mostly used interchangeably.<\/li>\n<\/ul>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_ZLkWs\">\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<div class=\"my-1 px-2 py-2 rounded hovered 
\">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q6<\/i>:<\/span> How would you choose the\u00a0<em>Activation Function<\/em>\u00a0for a Deep Learning model?\n<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_9hcmY\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li>If the output to be predicted is\u00a0<em>real<\/em>, then it makes sense to use a\u00a0<strong>Linear Activation function<\/strong>.<\/li>\n<li>If the output to be predicted is a\u00a0<em>probability<\/em>\u00a0of a binary class, then a\u00a0<strong>Sigmoid function<\/strong>\u00a0should be used.<\/li>\n<li>If the output to be predicted has\u00a0<em>two classes<\/em>, then a\u00a0<strong>Tanh function<\/strong>\u00a0can be used.<\/li>\n<li><strong>ReLU function<\/strong>\u00a0can be used in many different cases due to its computational simplicity.<\/li>\n<\/ul>\n<div>\n<div class=\"text-center\">\n<div class=\"lazyload-wrapper\"><img decoding=\"async\" class=\"img-fluid img-max\" src=\"https:\/\/gblobscdn.gitbook.com\/assets%2F-LvBP1svpACTB1R1x_U4%2F-LvNWUoWieQqaGmU_gl9%2F-LvO3qs2RImYjpBE8vln%2Factivation-functions3.jpg?alt=media&amp;token=f96a3007-5888-43c3-a256-2dafadd5df7c\" \/><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q7<\/i>:<\/span> How 
do\u00a0<em>Ensemble Systems<\/em>\u00a0help in\u00a0<em>Incremental Learning<\/em>?<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_jTEO\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li><strong>Incremental learning<\/strong>\u00a0refers to the ability of an algorithm to learn from new data that may become available after a\u00a0<em>classifier<\/em>\u00a0has already been generated from a previously available dataset.<\/li>\n<li>An algorithm is said to be an\u00a0<strong>incremental learning algorithm<\/strong>\u00a0if, for a sequence of training datasets, it produces a sequence of hypotheses where the current hypothesis describes all data that have been seen thus far but depends only on previous hypotheses and the current training data.<\/li>\n<li><strong>Ensemble-based systems<\/strong>\u00a0can be used for such problems by training an additional\u00a0<em>classifier<\/em>\u00a0(or an additional ensemble of classifiers) on each dataset that becomes available.<\/li>\n<\/ul>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q9<\/i>:<\/span> How to know whether your model is suffering from the problem of\u00a0<em>Vanishing Gradients<\/em>?<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_EhRrD\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<ul>\n<li>The model will improve\u00a0<em>very slowly<\/em>\u00a0during the training phase, and it is also possible that training stops\u00a0<em>very early<\/em>, meaning that any further training does not improve the model.<\/li>\n<li>The weights closer to the output layer of the model 
would witness more of a change, whereas the layers that occur closer to the input layer would not change much (if at all).<\/li>\n<li>Model weights\u00a0<em>shrink exponentially<\/em>\u00a0and become\u00a0<em>very small<\/em>\u00a0when training the model.<\/li>\n<li>The model weights become\u00a0<code>0<\/code>\u00a0in the training phase.<\/li>\n<\/ul>\n<div class=\"my-1 px-2 py-2 rounded hovered \">\n<div class=\"row justify-content-center align-self-center\">\n<div class=\"col justify-content-center align-self-center my-auto\"><span class=\"_12I73W text-muted\"><i class=\"highlight\">Q10<\/i>:<\/span> How to know whether your model is suffering from the problem of\u00a0<em>Exploding Gradients<\/em>?<\/div>\n<\/div>\n<\/div>\n<div class=\"_2VQuOG _3NT6Zz\">\n<div class=\"_3lkg\">\n<div class=\"mb-2\"><span class=\"h5 highlight font-weight-bold\">Answer<\/span><\/div>\n<div class=\"_2qpc-u\">\n<div class=\"_vRK7Jy _1CFMrL\">\n<p>There are some subtle signs that you may be\u00a0<em>suffering from exploding gradients<\/em>\u00a0during the training of your network, such as:<\/p>\n<ul>\n<li>The model is unable to get traction on your training data (e.g.\u00a0<em>poor loss<\/em>).<\/li>\n<li>The model is\u00a0<em>unstable<\/em>, resulting in large changes in loss from update to update.<\/li>\n<li>The model loss goes to\u00a0<code>NaN<\/code>\u00a0during training.<\/li>\n<\/ul>\n<p>If you have these types of problems, you can dig deeper to see if you have a problem with exploding gradients. 
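<\/p>
<p>For example, one simple way to dig deeper (a hedged sketch in plain NumPy, not a Caffe API; the gradient values below are simulated) is to log the norm of every layer\u2019s gradient at each update and flag values that are huge or non-finite:<\/p>

```python
import numpy as np

def gradient_norms(gradients):
    """L2 norm of each layer's gradient -- a cheap per-update diagnostic."""
    return [float(np.linalg.norm(g)) for g in gradients]

def looks_exploding(norm_history, threshold=1e3):
    # Heuristic: flag the run if any recorded norm is non-finite (NaN/inf)
    # or exceeds the threshold. The threshold is an assumption to tune.
    return any(not np.isfinite(n) or n > threshold for n in norm_history)

healthy = gradient_norms([np.full((3, 3), 0.01), np.full(3, 0.02)])
blown_up = gradient_norms([np.full((3, 3), 1e4), np.full(3, float("nan"))])

print(looks_exploding(healthy))   # False
print(looks_exploding(blown_up))  # True
```

<p>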
There are some less subtle signs that you can use to confirm that you have exploding gradients:<\/p>\n<ul>\n<li>The model weights quickly become very large during training.<\/li>\n<li>The model weights go to\u00a0<code>NaN<\/code>\u00a0values during training.<\/li>\n<li>The error gradient values are consistently above\u00a0<code>1.0<\/code>\u00a0for each node and layer during training.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>History &amp; Origin of Caffe Deep learning is the new big trend in machine learning. It had many recent successes in computer vision, automatic speech recognition and natural language processing. The goal of this blog post is to give you a hands-on introduction to deep learning. To do this, we will build a Cat\/Dog image&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_joinchat":[],"footnotes":""},"categories":[2],"tags":[],"class_list":["post-29316","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/29316","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschoo
l.com\/blog\/wp-json\/wp\/v2\/comments?post=29316"}],"version-history":[{"count":1,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/29316\/revisions"}],"predecessor-version":[{"id":32431,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/29316\/revisions\/32431"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=29316"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=29316"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=29316"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}