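Squashing function (the logistic sigmoid, which maps any real number into $(0, 1)$):
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$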
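Given some data $\{(x_i, y_i)\}_{i=1}^{n}$ with labels $y_i \in \{0, 1\}$, our goal is to minimize the loss function $L(w)$, the negative log-likelihood:
$$L(w) = -\sum_{i=1}^{n} \left[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\left(1 - \sigma(w^\top x_i)\right) \right]$$
where $\sigma$ is the squashing function. The issue is that there is no closed-form solution for the minimizer $w^*$, but fortunately, $L(w)$ is convex in $w$. Every local minimum is therefore a global minimum, so we can use local search to find the minimum of this function, via gradient descent.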
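To make the gradient descent idea concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not stated in the notes: the squashing function is taken to be the logistic sigmoid, the loss is the mean negative log-likelihood with labels in {0, 1} (averaging only rescales the loss and does not move the minimizer), and the names `sigmoid`, `loss`, `gradient`, `gradient_descent`, the step size `lr`, and the toy data are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def loss(w, xs, ys):
    # Mean negative log-likelihood for labels in {0, 1} (assumed form).
    # Per example: log(1 + e^z) - y*z, with log(1 + e^z) computed stably.
    total = 0.0
    for x, y in zip(xs, ys):
        z = dot(w, x)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(xs)


def gradient(w, xs, ys):
    # Gradient of the mean loss: average of (sigmoid(w.x) - y) * x.
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sigmoid(dot(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return [gj / len(xs) for gj in g]


def gradient_descent(xs, ys, lr=0.1, steps=2000):
    # Start at w = 0 and repeatedly step against the gradient.
    w = [0.0] * len(xs[0])
    history = [loss(w, xs, ys)]
    for _ in range(steps):
        g = gradient(w, xs, ys)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
        history.append(loss(w, xs, ys))
    return w, history


# Toy data (hypothetical): first feature is a constant 1 acting as a bias.
# The labels are deliberately not linearly separable, so a finite
# minimizer exists.
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
ys = [0, 0, 1, 0, 1, 1]

w, history = gradient_descent(xs, ys)
print("weights:", w)
print("initial loss:", history[0], "final loss:", history[-1])
```

Because the loss is convex, the loss recorded in `history` should shrink toward the global minimum as long as the step size is small enough; a step size that is too large can overshoot and make the loss increase instead.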