1 Introduction

The idea of the Artificial Neural Network (ANN) has been around since 1943, when the first artificial neuron was created by Warren McCulloch and Walter Pitts [9]. The inspiration for ANNs comes from our own brains, which consist of roughly 100 billion neurons. Each neuron works independently and is connected to several thousand of the neurons surrounding it. Neurons receive continuous signals from each other. These signals are integrated or summed together in some way, and if a certain threshold is reached the neuron fires a signal in response [4]. ANNs are built to mimic this structure: a large number of independent processing units are combined to form a powerful tool that has the capability to learn.

Conventional computers solve problems with algorithms by going through instructions and following them step by step. That limits the problems we are able to deal with to those that we already know how to solve. A significant reason for using ANNs is their ability to solve problems that we don't yet know the answer to. They can also find patterns and draw conclusions from complex and imprecise data [1].

In recent years ANNs have experienced a resurgence. This is due in part to new training techniques, the availability of affordable and powerful processors [2], and the abundance of data from the internet. Today ANNs are used in applications as wide-ranging as Google's search engine, Tesla's self-driving cars and Microsoft's Cortana.

2 History

The first artificial neuron and models for ANNs were developed in 1943 by Warren McCulloch and Walter Pitts [9]. Their research formed the foundation for ANN research. In 1949 Donald O. Hebb developed an unsupervised learning method that became known as Hebbian learning [5]. Following this achievement, the Perceptron by Frank Rosenblatt in 1958 attracted a lot of attention as well.
In 1969 Marvin Minsky and Seymour Papert published a paper that pointed out the limitations of ANNs at the time [10]. As a result, research stagnated and interest in ANNs was lost. Interest only picked up again after Paul Werbos developed the backpropagation algorithm in 1974. It solved the problem pointed out in Minsky's and Papert's paper and sped up the training of ANNs.

3 Architecture of ANNs

There are various different architectures for ANNs, but they all follow the same principle: a network consists of a set of neurons that have weighted connections to each other.

3.1 A single neuron

The most fundamental part of an ANN is the neuron, also called a node. It takes inputs from a source (e.g. a specific pixel value) or from other nodes, and each of these inputs has a weight attached to it.

The node first computes the sum of its inputs, each multiplied by its assigned weight (this step is also called the transfer function). It then passes that value through an activation function to generate a value. If that value exceeds a certain threshold, the node will generate an output. Figure 1 shows a graphical depiction of the mechanics of a neuron.

The threshold is a value unique to each node, and it can be changed during a learning procedure. The activation function is used to obtain a non-linear output from the neuron. Three activation functions are used most frequently:

- the sigmoid function, which maps the input value to a value between 0 and 1;
- the tanh function, which maps the input value to a value between -1 and 1;
- the ReLU function, which replaces negative values with zero [6].

Figure 1: Depiction of a single neuron. Source: https://commons.wikimedia.org/wiki/File:ArtificialNeuronModel_english.png

3.2 Feedforward Neural Networks

In a feedforward neural network (FNN) the neurons are grouped in layers: an input layer, n hidden layers and an output layer. The input layer takes in information from the outside and only passes it on to the hidden layers, which are invisible to the outside.
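The following minimal Python sketch is not taken from the cited sources. It makes the neuron of Section 3.1 and the layered, forward-only flow of an FNN concrete. It assumes the common convention of folding the threshold into a bias term that is added to the weighted sum before the activation function. The network size, the weights and the helper names (`neuron`, `layer`, `forward`) are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Maps any input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    """Replaces negative values with zero."""
    return max(0.0, x)

# math.tanh already maps any input to a value between -1 and 1.

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias (the threshold folded in),
    passed through an activation function."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(net)

def layer(inputs, weight_rows, biases, activation):
    """Every neuron in a layer reads the full output of the previous layer."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

def forward(inputs, network):
    """Pass values from the input layer through each layer in order.
    Nothing is fed back, so the connections never form a cycle."""
    values = list(inputs)
    for weight_rows, biases, activation in network:
        values = layer(values, weight_rows, biases, activation)
    return values

# Hypothetical 2-2-1 network with hand-picked weights:
# two tanh hidden neurons and one sigmoid output neuron.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], math.tanh),  # hidden layer
    ([[1.0, -1.0]], [0.0], sigmoid),                       # output layer
]

print(forward([1.0, 0.0], network))
```

The input layer has no weights here because it only passes values on. Training a network like this (Section 4) would amount to adjusting the numbers stored in `network`.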
The hidden layers process the input they receive from the input layer and pass it on to the output layer, which presents the result to the outside world. Each neuron in a layer is only connected to neurons of the next layer in the direction of the output layer. Therefore information only moves forward, and the layers do not form a cycle.

3.2.1 Single-layer perceptron

The simplest form of an FNN is the single-layer perceptron (SLP). It consists of only an input layer and one trainable weight layer; there are no hidden layers. SLPs can only learn linearly separable patterns.

3.2.2 Multilayer perceptron

A multilayer perceptron (MLP), as shown in Figure 2, consists of an input layer, one or more hidden layers and an output layer. MLPs with several hidden layers are also known as deep neural networks. In MLPs the input and hidden layers usually each contain a bias node, which always has an output of 1 and is connected to each node of the following layer. The weights of these connections give every node a trainable offset. Unlike the SLP, the MLP can learn non-linear functions. Learning algorithms like backpropagation are used to train MLPs.

Figure 2: Multilayer perceptron with a hidden layer and bias nodes.

3.3 Recurrent Neural Networks

In a recurrent neural network (RNN) neurons can influence themselves in some way. RNNs do not always have specifically defined input or output neurons. We can differentiate between direct recurrences, indirect recurrences and lateral recurrences.

In networks with direct recurrences the neurons are connected to themselves and can therefore strengthen or suppress themselves. Indirect recurrence networks allow connections towards the input layer, which means that neurons influence themselves indirectly by refeeding their output to the network.
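A direct recurrence can be sketched as a single neuron whose previous output is fed back to itself through a self-weight. This is a minimal illustration only. It assumes a tanh activation and discrete time steps, and the function name and all parameter values are hypothetical. A positive self-weight lets the neuron keep reinforcing its own activity. A negative self-weight makes it counteract its previous output.

```python
import math

def run_self_recurrent_neuron(inputs, w_in, w_self, bias=0.0):
    """Hypothetical neuron with a direct recurrence: at every time step it
    receives the external input plus its own output from the previous step."""
    previous = 0.0  # the neuron has produced no output before the first step
    outputs = []
    for x in inputs:
        previous = math.tanh(w_in * x + w_self * previous + bias)
        outputs.append(previous)
    return outputs

pulse = [1.0, 0.0, 0.0, 0.0]  # a single input pulse followed by silence

# Positive self-weight: the neuron keeps itself active after the pulse.
print(run_self_recurrent_neuron(pulse, w_in=1.0, w_self=0.9))
# Negative self-weight: the neuron works against its previous output, so the sign flips each step.
print(run_self_recurrent_neuron(pulse, w_in=1.0, w_self=-0.9))
# No recurrence: the output drops to zero as soon as the input stops.
print(run_self_recurrent_neuron(pulse, w_in=1.0, w_self=0.0))
```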
With lateral recurrence the neurons are connected within their own layer. They usually suppress neighbouring neurons and strengthen themselves, so that only one neuron of a layer will fire [6].

3.4 Convolutional Neural Networks

Convolutional neural networks (CNNs) are a specialized form of deep FNN. Their development is what started the recent renewed interest in ANNs. Between 2009 and 2012, researchers using CNNs won multiple contests, such as the ImageNet competition [7], beating other algorithms by a significant margin. After these successes, companies started to implement CNNs in their systems.

Typical FNNs ignore spatial structure: in a 2D picture every pixel is treated independently, without taking into account that some pixels are located in a neighbouring area. CNNs, on the other hand, use that information by learning local features. This significantly reduces the amount of pre-processing that is required.

A CNN consists of convolutional layers and pooling layers. Convolutional layers are the main part of CNNs. They consist of a set of independent filters, which apply a convolution operation to the input they receive. The output of each filter is called a feature map. Any neuron in a convolutional layer only processes data from a small area. This is unlike conventional FNNs, where each neuron is connected to every neuron in the next layer. Pooling layers reduce the size of the input they receive, usually by taking the mean or the maximum value within each area [12].

3.5 Hopfield Neural Networks

The inspiration for the Hopfield neural network (HNN) came from particles in a magnetic field, where every particle influences the particles around it to the same degree. Therefore all neurons in HNNs influence each other symmetrically.

The HNN can be seen as a network with associative memory.
This means that an HNN can be trained to memorize certain states and, given an input, will converge towards one of those states. The neurons in HNNs have binary outputs, and the weights connecting two neurons are symmetrical. Unlike networks with direct recurrences, HNNs have no connections from a neuron to itself. HNNs are usually trained with unsupervised training methods like Hebbian learning [6].

4 Learning

As previously mentioned, ANNs have the capability to learn and solve problems. This is what makes ANNs so interesting. However, it is important to choose the right learning method for each particular problem to achieve a satisfying result. There are several ways to train an ANN, and these methods are divided into three general approaches called learning paradigms.

4.1 Supervised learning

In supervised learning the ANN receives sets of inputs together with the desired outputs. When the network is activated with an input, its output can be compared with the correct solution and the weights adjusted accordingly. After completing the training, the network should be able to arrive at correct solutions for unknown, similar input patterns.

4.1.1 Gradient descent

Gradient descent is the most basic learning algorithm. It is an algorithm that minimizes a function, used here as a supervised training method. It starts at an initial value (weight). From there it repeatedly steps in the negative direction of the function's gradient until it finds the parameters that minimize the function.

While gradient descent can be used to solve a lot of problems, it does have some shortcomings. Since the functions are often complex, gradient descent can land in a local minimum and get stuck, and there is no general solution to this problem.
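The local-minimum problem can be illustrated with a minimal Python sketch of plain gradient descent, using the update rule x ← x − η·f'(x). The error function f(x) = x⁴ − 3x² + x is a hypothetical example. It has a global minimum near x = -1.30 and a shallower local minimum near x = 1.13. The learning rate η and the number of steps are assumptions chosen only for this demonstration.

```python
def f(x: float) -> float:
    """Hypothetical error function with a global minimum near x = -1.30
    and a higher local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x

def df(x: float) -> float:
    """Derivative (gradient) of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start: float, learning_rate: float = 0.01, steps: int = 1000) -> float:
    x = start
    for _ in range(steps):
        # Take a step in the negative direction of the gradient.
        x -= learning_rate * df(x)
    return x

for start in (-2.0, 2.0):
    x_min = gradient_descent(start)
    print(f"start={start:+.1f} -> x={x_min:.3f}, f(x)={f(x_min):.3f}")
```

Started at -2.0, the algorithm settles near the global minimum. Started at 2.0, it settles near x = 1.13, where the gradient is also zero. It therefore stops there, even though a lower minimum exists elsewhere.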
We cannot know whether the minimum we found is optimal, so if it is acceptable the training is considered successful. Flat areas in the function mean that the gradient is small. This leads the algorithm to take very small steps, which can slow down training. At the same time, large gradients can cause large steps that could lead to a good minimum being missed.

4.1.2 Backpropagation

Backpropagation of errors, or backpropagation for short, is based on the gradient descent learning algorithm. Therefore all of the advantages and disadvantages of gradient descent apply to backpropagation as well. The backpropagation algorithm was a considerable breakthrough for neural network research, as previous learning algorithms were slow and in part ineffective.

Backpropagation learns from mistakes. It works by initially selecting random weights. Input is then fed to the network, and the output is compared to the desired output. The errors are passed backwards through the network layer by layer, and the weights are adjusted accordingly. This process is repeated until the error is below an acceptable threshold [4].

4.2 Unsupervised learning

Unsupervised learning is the learning method that is closest to how biological neural networks learn. The sets used for training consist only of input patterns, and the network tries to find similarities and patterns that have not been labeled beforehand.

4.2.1 Hebbian learning

Hebbian learning is an unsupervised learning method created by Donald O. Hebb [5]. It is often summarized as "neurons wire together if they fire together" [8] and often forms the basis for other learning methods.

Hebbian learning is the simplest rule for adjusting weights between neurons.
If two neurons activate together, the weight between them increases and the connection becomes stronger. If two neurons activate separately, the weight is reduced and the connection becomes weaker [6].

4.3 Reinforcement learning

In reinforcement learning the network receives feedback that is either positive or negative, comparable to a system of rewards and punishments. Unlike in supervised learning, there are no training sets; instead, the network interacts with an environment via an agent. This process is derived from cognitive science and psychology. The goal of reinforcement learning is to solve a problem such as avoiding an obstacle. The network interacts with the environment and receives feedback on how well it is doing, but it gets no guidance on how to actually solve the problem. Reinforcement learning is therefore a trial-and-error method, and it is very slow. The training is considered done when the sum of the rewards is as high as possible over a long period of time.

A reinforcement learning system consists of three parts: a policy, a reward function and a value function. It can additionally have a model of the environment. The policy determines the action that the agent will take in a certain situation. The reward function determines the reward the agent receives for an action it has taken. It does so by taking in a state and rewarding the agent based on an assessment of that state: a high reward if the state is good, and a low reward or a punishment otherwise [6]. The value function estimates the total reward the agent can expect to accumulate in the long run, starting from a given state.

4.4 Online and offline learning

Learning can be further split into online and offline learning. In online learning, adjustments are made according to the error after every input. In offline learning, the adjustments happen only after a complete input set has been passed through and the errors have accumulated [6].

5 Challenges for ANNs

5.1 Hardware

ANNs require significant amounts of computational processing power as well as large amounts of memory and storage.
This has been an immense problem since the concept was first thought of, and it was one of the key points in Minsky's and Papert's paper [10]. However, constantly improving hardware and the use of GPUs for ANNs have cut training times down significantly [2].

6 Applications of ANNs

ANNs are often used for prediction and forecasting due to their ability to analyze large amounts of data. Examples include sales forecasting, medical diagnosis, financial market prediction and risk management. They are also used for pattern recognition [11] and sequence recognition.

6.1 ANNs in Astronomy

In December 2017, NASA in cooperation with Google announced the discovery of another planet orbiting the star Kepler-90. The discovery was made with the help of deep learning and CNNs, which helped the researchers sift through the enormous amounts of data collected so far [12].

6.2 ANNs in Games

ANNs have been used to automate game playing. The most prominent example is Google DeepMind's AlphaGo, which uses deep neural networks and has recently been used to beat world champion Go players.

6.3 ANNs in Medicine

ANNs are currently widely researched in various areas of medicine, e.g. radiology, urology, cardiology and oncology. They have been used to detect different forms of cancer, such as breast cancer and lung cancer, and often achieve an accuracy of over ninety percent [3].

7 Conclusion

ANNs are an old concept, but they have always been relevant, and now they are more important than ever. Their ability to learn and solve tasks that we have no solution for makes them incredibly useful. It is also why they are used in a very wide range of fields.

There are several different types of networks and learning methods. These have to be chosen carefully to suit each specific problem, but if this is done correctly they become a very powerful problem-solving tool.
There are still limitations to what an ANN can do. However, with constantly improving computational processing power and great interest from researchers, the potential of ANNs seems nearly limitless.