Does the training algorithm recognise that I don't care about transparent pixels? Or will it perform better if I superimpose the characters over different backgrounds?
The more "noise" your training images contain, the more robust the detector will be, but the longer it will take to train. This is where your negative samples come into play: the more negative samples you have, covering as wide a range as possible, the more robust your detector becomes. That said, if you have a particular use case in mind, I would suggest skewing your training sets slightly to match it. The detector will be less robust in general but much better in your application.
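If you do decide to superimpose the characters over different backgrounds, the compositing step could look roughly like the sketch below. It is plain Python on nested lists of pixel tuples so that it runs without any imaging library; the function name `composite_over` and the pixel layout are assumptions made for illustration, not part of any training tool.

```python
def composite_over(sprite_rgba, background_rgb):
    """Alpha-blend an RGBA sprite over an RGB background of the same size.

    Both arguments are lists of rows; each row is a list of pixel tuples.
    Hypothetical helper for illustration only.
    """
    result = []
    for sprite_row, background_row in zip(sprite_rgba, background_rgb):
        row = []
        for (r, g, b, a), background_pixel in zip(sprite_row, background_row):
            alpha = a / 255  # 0.0 = fully transparent, 1.0 = fully opaque
            blended = tuple(
                round(alpha * fg + (1 - alpha) * bg)
                for fg, bg in zip((r, g, b), background_pixel)
            )
            row.append(blended)
        result.append(row)
    return result


# One opaque red pixel and one fully transparent pixel over a dark background.
sprite = [[(255, 0, 0, 255), (0, 0, 0, 0)]]
background = [[(10, 20, 30), (10, 20, 30)]]
sample = composite_over(sprite, background)
```

Running the same sprite over many different backgrounds gives you positive samples in which only the character itself stays constant.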
Should I include images where each character is shown with different prefixes and suffixes, or should I just treat each character individually?
If you want to detect individual letters, then train individually. If you train it to detect "ABC" and you only want "A", it is going to start getting mixed messages. Simply train each letter ("A", "B", etc.) on its own, and your detector should be able to pick out each individual letter in larger images.
Should I include images where the character is scaled up and down? I gather the algorithm pretty much ignores size, and scales everything down for efficiency anyway?
I don't believe this is correct. AFAIK the Haar algorithm cannot scale a trained image down. So if you train all your images on 50x50 letters but the letters in your images are 25x25, you won't detect them. If you train and detect the other way round (small training images, larger letters in the target image), however, you will get results. Start small and let the algorithm scale the size up for you.
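To make the "start small" point concrete, here is a minimal sketch of which object sizes a multi-scale search can cover, assuming the search only ever grows from the trained window size by a fixed factor. The function `window_sizes` and its parameters are hypothetical and only model the idea; this is not the detector's actual implementation.

```python
def window_sizes(trained_size, image_side, scale_factor=1.1):
    """List the window sizes searched, from the trained size upward.

    Assumes the search starts at the trained size and multiplies by
    scale_factor until the window no longer fits in the image.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    size = float(trained_size)
    while size <= image_side:
        sizes.append(round(size))
        size *= scale_factor
    return sizes


# Trained on 50x50: nothing below 50 is ever searched, so 25x25 letters are missed.
trained_large = window_sizes(50, 100, scale_factor=1.25)

# Trained on 20x20: the search grows through 25 and beyond.
trained_small = window_sizes(20, 100, scale_factor=1.25)
```

Under this assumption, the smallest entry in each list is always the trained size, which is why training on small samples leaves the most room for the detector to work with.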