How to Escape Saddle Points Efficiently
A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see...
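As a point of reference for what "escaping a saddle point" means, here is a minimal NumPy sketch (my own illustration, not the algorithm analyzed in the post): on the toy function f(x, y) = x^2 - y^2, plain gradient descent started on the x-axis converges to the saddle at the origin, while adding a tiny random perturbation whenever the gradient is nearly zero lets the iterates escape along the -y^2 direction.

```python
import numpy as np

# Illustrative sketch only (not the algorithm from the post): on the toy
# saddle f(x, y) = x^2 - y^2, plain gradient descent started on the x-axis
# converges to the saddle at the origin, while a small random perturbation
# applied when the gradient is nearly zero escapes it.

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def run_gd(p0, steps=200, lr=0.1, perturb=False, rng=None):
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        g = grad(p)
        if perturb and np.linalg.norm(g) < 1e-6:
            p += rng.normal(scale=1e-3, size=2)  # small random kick near the saddle
            g = grad(p)
        p -= lr * g
    return p

rng = np.random.default_rng(0)
print("plain GD     ->", run_gd([1.0, 0.0]))                         # stuck near (0, 0)
print("perturbed GD ->", run_gd([1.0, 0.0], perturb=True, rng=rng))  # |y| grows: escaped
```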
Unsupervised learning, one notion or many?
Unsupervised learning, as the name suggests, is the science of learning from unlabeled data. A look at the Wikipedia page shows that this term has many interpretations: (Task A) Learning a distribution...
Do GANs actually do distribution learning?
This post is about our new paper, which presents empirical evidence that current GANs (Generative Adversarial Nets) are quite far from learning the target distribution. Previous posts had introduced...
Generalization Theory and Deep Nets, An introduction
Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become interested in the generalization mystery: why do trained deep nets perform well on...
Proving generalization of deep nets via compression
This post is about my new paper with Rong Ge, Behnam Neyshabur, and Yi Zhang which offers some new perspective into the generalization mystery for deep nets discussed in my earlier post. The new paper...
Can increasing depth serve to accelerate optimization?
“How does depth help?” is a fundamental question in the theory of deep learning. Conventional wisdom, backed by theoretical studies (e.g. Eldan & Shamir 2016; Raghu et al. 2017; Lee et al. 2017;...
Limitations of Encoder-Decoder GAN architectures
This is yet another post about Generative Adversarial Nets (GANs), and based upon our new ICLR’18 paper with Yi Zhang. A quick recap of the story so far. GANs are an unsupervised method in deep...
Deep-learning-free Text and Sentence Embedding, Part 1
Word embeddings (see my old post1 and post2) capture the idea that one can express “meaning” of words using a vector, so that the cosine of the angle between the vectors captures semantic similarity....
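The similarity measure mentioned above is just the cosine of the angle between two vectors; a minimal sketch with made-up 3-dimensional vectors (real embeddings such as GloVe or word2vec vectors have hundreds of dimensions and are learned from corpora):

```python
import numpy as np

def cosine(u, v):
    # cos(angle) = <u, v> / (||u|| * ||v||); values near 1 indicate semantic similarity
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d vectors purely for illustration; real word embeddings are learned.
king  = np.array([0.80, 0.65, 0.10])
queen = np.array([0.75, 0.70, 0.20])
apple = np.array([0.10, 0.20, 0.90])

print(cosine(king, queen))  # high: related words
print(cosine(king, apple))  # low: unrelated words
```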
Deep-learning-free Text and Sentence Embedding, Part 2
This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings. The previous post described the SIF embedding, which uses a simple...
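For readers who have not seen that post, the sketch below is a rough reconstruction of the SIF recipe as described there: a frequency-weighted average of word vectors per sentence, followed by removal of the common component along the first singular vector. The `word_vec` table, `word_freq` dictionary, and the constant `a` are placeholders to be supplied by the reader.

```python
import numpy as np

def sif_embeddings(sentences, word_vec, word_freq, a=1e-3):
    """Rough sketch of the SIF idea: each sentence vector is a smooth
    inverse-frequency weighted average of its word vectors, and the
    projection onto the first singular vector of the resulting matrix
    (the "common component") is then removed."""
    X = []
    for sent in sentences:
        words = [w for w in sent.split() if w in word_vec]
        weights = np.array([a / (a + word_freq[w]) for w in words])  # weight a / (a + p(w))
        vecs = np.array([word_vec[w] for w in words])
        X.append(weights @ vecs / len(words))
    X = np.array(X)
    u = np.linalg.svd(X, full_matrices=False)[2][0]  # first right-singular vector
    return X - np.outer(X @ u, u)
```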
When Recurrent Models Don't Need to be Recurrent
In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks were once the tool of choice, now models like the...
Simple and efficient semantic embeddings for rare words, n-grams, and...
Distributional methods for capturing meaning, such as word embeddings, often require observing many examples of words in context. But most humans can infer a reasonable meaning from very few or even a...
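As a baseline for the kind of few-shot inference the post is about, the simplest thing one can do is average the embeddings of the words surrounding the few observed occurrences of the rare word; the sketch below shows only that naive baseline (the method in the post refines this idea), with `contexts` and `word_vec` as illustrative placeholders.

```python
import numpy as np

def embed_from_contexts(contexts, word_vec):
    """Naive baseline: embed an unseen word as the average of the
    embeddings of the words surrounding its few occurrences."""
    vecs = [word_vec[w] for sent in contexts for w in sent.split() if w in word_vec]
    return np.mean(vecs, axis=0)

# e.g. contexts = ["a small marsupial found on islands near perth",
#                  "tourists photograph the smiling creature"]
# rare_vec = embed_from_contexts(contexts, word_vec)
```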
Understanding optimization in deep learning by analyzing trajectories of...
Neural network optimization is fundamentally non-convex, and yet simple gradient-based algorithms seem to consistently solve such problems. This phenomenon is one of the central pillars of deep...
The search for biologically plausible neural computation: A...
This is the second post in a series reviewing recent progress in designing artificial neural networks (NNs) that resemble natural NNs not just superficially, but on a deeper, algorithmic level. In...
Contrastive Unsupervised Learning of Semantic Representations: A...
Semantic representations (aka semantic embeddings) of complicated data types (e.g. images, text, video) have become central in machine learning, and also crop up in machine translation, language...
Is Optimization a Sufficient Language for Understanding Deep Learning?
In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task at hand, and then optimizing this function using some variant of...
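The two-step recipe referred to here, pick an objective and then run a gradient method on it, is easy to make concrete; a minimal NumPy sketch on a synthetic least-squares problem (illustrative numbers only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # synthetic inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy targets

# Step 1: define the objective/cost function for the task.
def loss(w):
    return np.mean((X @ w - y) ** 2)

def grad(w):
    return 2 * X.T @ (X @ w - y) / len(y)

# Step 2: optimize it with (a variant of) gradient descent.
w = np.zeros(5)
for _ in range(500):
    w -= 0.1 * grad(w)

print(loss(w))  # ends up near the noise level (~0.01)
```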
Landscape Connectivity of Low Cost Solutions for Multilayer Nets
A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions (those with training cost almost zero) even starting from a random...
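One way to make the connectivity question concrete in code is to evaluate the training loss along the straight line between two independently found solutions. The sketch below only probes linear paths (the post concerns the existence of more general low-cost paths); `loss`, `params_a`, and `params_b` are assumed to come from two separately trained networks.

```python
import numpy as np

def loss_along_line(loss, params_a, params_b, num_points=11):
    """Evaluate the training loss at evenly spaced points on the straight
    line between two flattened parameter vectors. A large spike in the
    middle means the two solutions are not *linearly* connected; the post
    asks whether more general low-cost paths between them exist."""
    ts = np.linspace(0.0, 1.0, num_points)
    return [(float(t), loss((1 - t) * params_a + t * params_b)) for t in ts]
```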
Understanding implicit regularization in deep learning by analyzing...
Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value of the training objective does not reliably capture...
Ultra-Wide Deep Nets and Neural Tangent Kernel (NTK)
(Crossposted at CMU ML.) Traditional wisdom in machine learning holds that there is a careful trade-off between training error and generalization gap. There is a “sweet spot” for the model complexity...
Exponential Learning Rate Schedules for Deep Learning (Part 1)
This blog post concerns our ICLR20 paper on a surprising discovery about learning rate (LR), the most basic hyperparameter in deep learning. As illustrated in many online blogs, setting LR too small...
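For concreteness, an exponential schedule simply multiplies the LR by a fixed factor every step; the snippet below is only a definition with illustrative numbers, not the training setup studied in the paper (whose main point, roughly, is that ordinary training with normalization and weight decay is equivalent to such a schedule with a growth factor above 1).

```python
# Exponential LR schedule: lr_t = lr_0 * alpha**t for some fixed factor alpha.
# alpha < 1 decays the LR over time; the paper studies schedules with alpha > 1.
def exponential_lr(lr0, alpha, t):
    return lr0 * alpha ** t

# e.g. 5% growth per step starting from 0.1 (illustrative numbers only):
for t in range(0, 100, 20):
    print(t, exponential_lr(0.1, 1.05, t))
```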
An equilibrium in nonconvex-nonconcave min-max optimization
While there has been incredible progress in convex and nonconvex minimization, a multitude of problems in ML today are in need of efficient algorithms to solve min-max optimization problems. Unlike...
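To see one way min-max differs from pure minimization, consider the bilinear toy problem min_x max_y xy: naive simultaneous gradient descent-ascent (just a baseline for illustration, not the equilibrium notion or algorithm proposed in the post) spirals away from the solution (0, 0) instead of converging to it.

```python
import numpy as np

# Toy min-max problem: min_x max_y f(x, y) with f(x, y) = x * y.
# Simultaneous gradient descent-ascent is the naive baseline here; it
# spirals away from the equilibrium (0, 0) instead of converging to it.
x, y, lr = 1.0, 1.0, 0.1
for t in range(50):
    gx, gy = y, x                      # df/dx = y, df/dy = x
    x, y = x - lr * gx, y + lr * gy    # descend in x, ascend in y
    if t % 10 == 0:
        print(t, np.hypot(x, y))       # distance from (0, 0) keeps growing
```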