I think the GeForce TITAN is great and it is widely used in Machine Learning (ML). In ML, single precision is enough in most cases.
More detail on the performance of the GTX line (currently GeForce 10) can be found on Wikipedia, here.
Other sources around the web support this claim. Here is a quote from doc-ok in 2013 (permalink).
For comparison, an “entry-level” $700 Quadro 4000 is significantly slower than a $530 high-end GeForce GTX 680, at least according to my measurements using several Vrui applications, and the closest performance-equivalent to a GeForce GTX 680 I could find was a Quadro 6000 for a whopping $3660.
Specific to ML, including deep learning, there is a Kaggle forum discussion dedicated to this subject (Dec 2014, permalink), which goes over comparisons between the Quadro, GeForce, and Tesla series:
Quadro GPUs aren't for scientific computation, Tesla GPUs are. Quadro
cards are designed for accelerating CAD, so they won't help you to
train neural nets. They can probably be used for that purpose just
fine, but it's a waste of money.
Tesla cards are for scientific computation, but they tend to be pretty
expensive. The good news is that many of the features offered by Tesla
cards over GeForce cards are not necessary to train neural networks.
For example, Tesla cards usually have ECC memory, which is nice to
have but not a requirement. They also have much better support for
double precision computations, but single precision is plenty for
neural network training, and they perform about the same as GeForce
cards for that.
One useful feature of Tesla cards is that they tend to have a lot
more RAM than comparable GeForce cards. More RAM is always welcome if
you're planning to train bigger models (or use RAM-intensive
computations like FFT-based convolutions).
If you're choosing between Quadro and GeForce, definitely pick
GeForce. If you're choosing between Tesla and GeForce, pick GeForce,
unless you have a lot of money and could really use the extra RAM.
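If you want to know how much memory your own card offers before deciding, it is easy to query. Here is a minimal sketch, assuming PyTorch with CUDA support is installed (the quoted post does not mention any particular framework):

```python
# Minimal sketch: report the memory available on each visible GPU, since the
# quote above points to RAM as the main practical advantage of Tesla cards.
# Assumes PyTorch with CUDA support is installed (not part of the quoted post).
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible")
```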
NOTE: Be careful which platform you are working on and what its default precision is. For example, here in the CUDA forums (August 2016), one developer owns two Titan X's (GeForce series) and doesn't see a performance gain in any of their R or Python scripts. This is diagnosed as a result of R defaulting to double precision, which runs worse on the new GPUs than on their CPU (a Xeon processor). Tesla GPUs are cited as having the best double-precision performance. In that thread, converting all numbers to float32 improves the runtime from 12.437s with nvBLAS to 0.324s with gmatrix+float32 on one TITAN X (see the first benchmark). Quoting from this forum discussion:
Double precision performance of Titan X is pretty low.
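You can see this effect on your own card by timing the same matrix multiplication in double and in single precision. Below is a minimal sketch, assuming PyTorch with a CUDA-capable GPU; the forum benchmark itself was done in R with nvBLAS and gmatrix, not Python:

```python
# Minimal sketch: compare float64 vs float32 matrix multiplication on the GPU.
# On GeForce cards such as the Titan X, the float64 run is expected to be far
# slower, which is the effect described in the forum discussion above.
# Assumes PyTorch with CUDA support; the original benchmark used R, not Python.
import time
import torch

def time_matmul(dtype, n=4096, repeats=10):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()          # make sure setup work has finished
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    torch.cuda.synchronize()          # wait for the launched kernels to finish
    return (time.perf_counter() - start) / repeats

print(f"float64: {time_matmul(torch.float64):.4f} s per matmul")
print(f"float32: {time_matmul(torch.float32):.4f} s per matmul")
```

Note that, like R, NumPy defaults to float64, so explicitly requesting float32 (as in the benchmark above) matters if you want the single-precision speed of a GeForce card.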