
- Backprop and systolic arrays
- TensorFlow meets PyTorch with new Eager mode
- Optimizing deeper networks with KFAC in PyTorch.
- Queues in TensorFlow
- ICLR 2015
- Stochastic Gradient Methods 2014
- Deep Learning Internship at Google, Summer 2014
- Summer Intern opening
- The Average Font
- Interesting papers coming up at NIPS'11
- Shapecatcher
- Google1000 dataset
- b-matching as improvement of kNN
- Google Internship in Vision/ML
- Don't test for exact equality of floating point numbers
- notMNIST dataset
- Making self-contained Unix programs with CDE
- Google+ ML people
- Embracing non-determinism
- Machine Learning opportunities at Google
- Neural Networks making a come-back?
- Another ML blog
- Going to Google
- Linear Programming for Maximum Independent Set
- Perils of floating point arithmetic

https://medium.com/@yaroslavvb/backprop-and-systolic-arrays-24e925d2050

Medium post

Medium post. (I'm getting too much comment spam on Blogger, so I'll probably use Medium or something else from now on, and just link here.)

I gave an introductory talk on queues at the TensorFlow meetup in SF yesterday. Here are the slides and the notebook: https://github.com/yaroslavvb/stuff/tree/master/queues_talk

Some ICLR posters that caught my eye: A very simple-to-implement idea that gives impressive results. They force two groups of units to be uncorrelated by penalizing their cross-covariance. When the first group is also forced to model classes, the second group automatically models the "style". The problem of separating out "style" has been studied for a while, see Tenenbaum's "
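To make the penalty concrete, here is a minimal sketch of one way to score how correlated two groups of units are: half the sum of squared cross-covariances over a batch of examples. The function name, the 1/2 scaling, and the list-of-lists input layout are assumptions made for illustration, not details taken from the poster.

```python
def cross_covariance_penalty(group_a, group_b):
    """Half the sum of squared cross-covariances between two groups of units.

    group_a and group_b are lists with one activation vector per example;
    both must describe the same examples in the same order (an assumption
    of this sketch).
    """
    n = len(group_a)
    dim_a, dim_b = len(group_a[0]), len(group_b[0])
    mean_a = [sum(row[i] for row in group_a) / n for i in range(dim_a)]
    mean_b = [sum(row[j] for row in group_b) / n for j in range(dim_b)]

    penalty = 0.0
    for i in range(dim_a):
        for j in range(dim_b):
            # Covariance between unit i of group A and unit j of group B.
            cov = sum(
                (group_a[k][i] - mean_a[i]) * (group_b[k][j] - mean_b[j])
                for k in range(n)
            ) / n
            penalty += cov * cov
    return 0.5 * penalty


activations = [[0.0], [1.0], [2.0], [3.0]]
print(cross_covariance_penalty(activations, activations))  # identical groups
print(cross_covariance_penalty(activations, [[1.0], [-1.0], [-1.0], [1.0]]))
```

Adding a term like this to a training loss pushes the two groups toward zero covariance; in a real network it would be computed on mini-batch activations inside an autodiff framework rather than on Python lists.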

Last week I attended the Stochastic Gradient Methods workshop held at UCLA's IPAM. Surprisingly, there's still quite a bit of activity and many unsolved questions around what is, essentially, minimizing a quadratic function. In 2009 Strohmer and Vershynin rediscovered an algorithm used for solving linear systems of equations from 1970 -- Kaczmarz method, and showed that this algorithm is a form of
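For readers who have not seen the Kaczmarz method, a minimal sketch of the randomized variant follows: pick a row of the system with probability proportional to its squared norm, then project the current estimate onto the hyperplane that row defines. The function name, the iteration count, and the fixed seed are assumptions for illustration, and the sketch assumes a consistent system with no all-zero rows.

```python
import random


def randomized_kaczmarz(rows, rhs, iterations=2000, seed=0):
    """Approximately solve rows @ x = rhs by repeated row projections."""
    rng = random.Random(seed)
    x = [0.0] * len(rows[0])
    # Squared row norms serve as sampling weights and as the step normalizer.
    sq_norms = [sum(v * v for v in row) for row in rows]
    indices = range(len(rows))
    for _ in range(iterations):
        i = rng.choices(indices, weights=sq_norms)[0]
        residual = rhs[i] - sum(a * xj for a, xj in zip(rows[i], x))
        step = residual / sq_norms[i]
        # Project x onto the hyperplane {z : <rows[i], z> = rhs[i]}.
        x = [xj + step * a for a, xj in zip(rows[i], x)]
    return x


# 2x + y = 5 and x + 3y = 10 have the solution x = 1, y = 3.
print(randomized_kaczmarz([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))
```

Each update touches a single equation, which is what makes the method look like a stochastic, one-example-at-a-time procedure on a least-squares objective.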

We have a couple of internship openings for someone to train deep neural nets to find and extract interesting things in StreetView imagery. The ideal person would come and push the envelope of what's possible with a large amount of training data (billions of labeled image examples for some tasks) and a large amount of computation power (essentially unlimited when you parallelize). If you are

We are looking for a summer intern to apply Deep Learning techniques to the problem of reading text in the wild. More details here

I came across this post where the author created a font by averaging together all fonts on his machine. I thought it would be cool to do the same for all fonts on the internet -- here's the average of about 375k distinct fonts. It's interesting that shapes are clearly seen even though fonts on the web are quite noisy; here's a random sample of things that make up the A above

There are a number of accepted papers whose camera-ready versions have been posted already. Here are the ones I found interesting. I'll give a further update on these after the conference.

- Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, P. Krähenbühl, V. Koltun
- Fast and Accurate k-means For Large Datasets, M. Shindler, A. Wong, A. Meyerson
- Hashing Algorithms for

Here's a cool tool I stumbled across reading John Cook's blog -- Shape Catcher looks up a Unicode value from a drawing of a character. Apparently it uses Shape Context features. This motivated me to put together another dataset. Unlike notMNIST, this one focuses on the tail end of Unicode: it is 370k bitmaps representing 29k Unicode values, grouped by Unicode Unicode 370k

This is a dataset of scans of 1000 public domain books that was released to the public at ICDAR 2007. At the time there was no public serving infrastructure, so few people actually got the 120GB dataset. It has since been hosted on Google Cloud Storage and made available for public download http://commondatastorage.googleapis.com/books/icdar2007/README.txt http://

Below is an illustration of b-matching from the (Huang, Jebara AISTATS 2007) paper. You start with a weighted graph, and the goal is to connect each v to k u's so as to minimize total edge cost. If the v's represent labelled datapoints, the u's unlabeled ones, and the weights correspond to distances, this works as a robust version of the kNN classifier (k=2 in the picture) because it prevents any datapoint from exhibiting too

My group has intern openings for winter and summer. Winter may be too late (but if you really want winter, ping me and I'll find out feasibility). We use OCR for Google Books, frames from YouTube videos, spam images, unreadable PDFs encountered by the crawler, images from Google's StreetView cameras, Android and a few other areas. Recognizing individual character candidates is a key step in OCR

A discussion came up on Guido van Rossum's Google Plus post. It comes down to the fact that 2.1 is not exactly represented as a floating point number. Internally it's 2.0999999999999996, and this causes unexpected behavior. These kinds of issues often come up. The confusion is caused by treating floating point numbers as exact numbers, and expecting calculations with them to produce results
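A small sketch of the advice in the title, using the standard library's `math.isclose` in place of `==`. The helper name `nearly_equal` and the tolerance values are assumptions picked for illustration; suitable tolerances depend on the computation.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because neither operand nor the sum is exactly
# representable in binary floating point.
print(total == 0.3)

# Comparing within a tolerance succeeds.
print(math.isclose(total, 0.3))


def nearly_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Tolerance-based equality; abs_tol matters when comparing against 0."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


print(nearly_equal(total, 0.3))
```

The absolute tolerance is there because a purely relative test can never succeed when one side is exactly zero.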

I've taken some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts. Here are some examples of letter "A". Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case -- logistic regression on top of stacked auto-encoder with fine-tuning gets

In the old days you could statically link your program and run it on another Unix station without worrying about dependencies. Unfortunately static linking no longer works, so you need to make sure that your target platform has the right libraries. For instance, in order to get Matlab compiled code running on a server, you have to copy over libraries and set environment variables as specified

Google+ seems to have a fair number of Machine Learning people. I was able to track down 50 people I've met at conferences by starting at Andrew McCallum's circles. If you add me on Google Circles, I'll assume you came from this blog and add you to my "Machine Learning" circle

Computers are supposed to be deterministic. This is often the case for single processor machines. However, as you scale up, guaranteeing determinism becomes increasingly expensive. Even on single processor machines you are facing non-determinism on a semi-regular basis. Here are some examples: Bugs + poor OS memory control that allows programs to read uninitialized memory. A recent example for me was

Google is hiring and there are lots of opportunities to do Machine Learning-related work here. Kevin Murphy is applying Bayesian methods to video recommendation, Andrew Ng is working on a neural network that can run on millions of cores, and that's just the tip of the iceberg that I've discovered working here for the last 3 months. There is machine learning work in both "researcher" and "engineer"

Five years ago I ran some queries on Google Scholar to see trends in the number of papers that mention a particular phrase. The number of hits for each year was divided by the number of hits for "machine learning". Back then it looked like NNs started gaining in popularity with the invention of back-propagation in the 1980s, peaked in 1993 and went downhill from there. Since then, there's been several

I just noticed that Justin Domke has a blog -- he's one of the strongest researchers in the field of graphical models. I first came across his dissertation when looking for a way to improve loopy-Belief Propagation based training. His thesis gives one such idea -- instead of maximizing the fit of an intractable model, and using BP as an intermediate step, maximize the fit of BP marginals directly.

I've accepted an offer from Google and will be joining their Tesseract team next week. I first got interested in OCR when I faced a project at my previous job involving OCR of outdoor scenes and found it to be a very complex task, yet highly rewarding because it's easy to make incremental progress and see your learners working. Current state-of-the-art OCR tools are not at human level of reading,

Maximum independent set, or "maximum stable" set, is one of the classical NP-complete problems described in Richard Karp's 1972 paper "Reducibility Among Combinatorial Problems". Other NP-complete problems often have a simple reduction to it; for instance, p.3 of Tony Jebara's "MAP Estimation, Message Passing, and Perfect Graphs" shows how MAP inference in an arbitrary MRF reduces to Maximum Weight

A recent discussion on stackoverflow brought up the issue of results of floating point arithmetic being non-reproducible. A reader asked what one could do to guarantee that the result of a floating point computation is always the same, and Daniel Lichtblau, a veteran developer at the kernel group of WRI, replied that "it is impossible with current hardware and software". One problem is that IEEE 754
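One easy way to see results change is to reorder a sum. Floating point addition is not associative, so anything that regroups operations (a parallel reduction, a different compiler optimization) can alter the last bits. This is a sketch of that one general effect, not necessarily the specific issue the post goes on to describe; the variable names are illustrative.

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, grouped two different ways.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)
print(left_to_right, right_to_left)

# math.fsum returns the correctly rounded sum, so it does not depend on
# the order of its inputs.
print(math.fsum(values) == math.fsum(reversed(values)))
```

An order-independent summation such as `math.fsum` removes this particular source of variation within one program, at some cost in speed; it does not address differences between hardware or math libraries.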