Given sufficient constraints on the pattern (Is it repetitive at all? Is the symmetry group constant? What is the range of the number of periods in the image?), you may be able to find the pattern's unit cell by correlation.
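A minimal sketch of the correlation step, assuming a plain rectangular repeat aligned with the image axes and a grayscale NumPy array as input. The function name `estimate_period` and the `min_period` parameter are illustrative choices of mine, not from any library. It collapses the image to a 1D profile and picks the strongest autocorrelation peak:

```python
import numpy as np

def estimate_period(gray, axis=1, min_period=4):
    """Estimate the repeat period (in pixels) along one image axis.

    Assumption: the pattern repeats on an axis-aligned grid.
    axis=1 gives the horizontal period, axis=0 the vertical one.
    """
    # Collapse the image to a 1D profile by averaging over the other axis.
    profile = gray.mean(axis=1 - axis).astype(float)
    profile -= profile.mean()
    n = profile.size
    # Linear (non-circular) autocorrelation via a zero-padded FFT.
    spec = np.fft.rfft(profile, 2 * n)
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    # Skip very small lags (the peak around lag 0) and look for the
    # strongest remaining peak up to half the profile length.
    search = acf[min_period : n // 2]
    return min_period + int(np.argmax(search))
```

Because the raw (biased) autocorrelation shrinks with increasing lag, the first repeat tends to win over its multiples. `min_period` has to be larger than the width of the peak around lag 0, which depends on how smooth the pattern is.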
Then think of stacking the tiles along a new axis. Calculate the median of each color channel along that axis. Take the resulting 2D image of median values as a template and tile it back over the original image.
Calculate the differences. Large differences indicate Lego.
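The stacking, median and difference steps could look roughly like this. It is only a sketch under strong assumptions: the tile size is already known, the tiles sit on an axis-aligned grid starting at the top-left pixel, and there is no perspective distortion. The name `median_template_diff` is made up for illustration:

```python
import numpy as np

def median_template_diff(img, tile_h, tile_w):
    """Return (median tile, per-pixel difference map) for a tiled image.

    Assumption: tiles of size tile_h x tile_w start at pixel (0, 0).
    Any partial tiles at the right/bottom border are cropped away.
    """
    h, w = img.shape[:2]
    ny, nx = h // tile_h, w // tile_w
    crop = img[: ny * tile_h, : nx * tile_w].astype(float)
    # Treat grayscale input as a single-channel image.
    crop = crop.reshape(ny * tile_h, nx * tile_w, -1)
    # Split both image axes into (tile index, position inside tile).
    tiles = crop.reshape(ny, tile_h, nx, tile_w, -1)
    # Bring the tile indices to the front and merge them: the new stack axis.
    stack = tiles.transpose(0, 2, 1, 3, 4).reshape(ny * nx, tile_h, tile_w, -1)
    # Per-pixel, per-channel median over all tiles is the template.
    template = np.median(stack, axis=0)
    # Spread the template back over the image and compare.
    tiled = np.tile(template, (ny, nx, 1))
    diff = np.abs(crop - tiled).sum(axis=2)
    return template, diff
```

Thresholding `diff` then gives candidate Lego pixels. The median only yields a clean template if, at each position inside the tile, fewer than half of the tiles are covered by Lego.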
Possible refinement: remove the outliers (Lego), then estimate and finally remove a trend in the differences caused by variations in lighting or vignetting.
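One way to sketch this refinement, assuming a 2D map of signed differences and assuming the lighting trend is smooth enough to be modelled by a quadratic surface (my modelling choice, not something the images guarantee). Outliers are rejected with a median-absolute-deviation rule and the surface is refit a few times. The name `remove_trend` and its parameters are hypothetical:

```python
import numpy as np

def remove_trend(diff, k=3.0, n_iter=3):
    """Fit a quadratic surface to the inliers of `diff` and subtract it.

    Assumption: lighting/vignetting varies slowly, so a 2D quadratic
    is an adequate trend model. Returns (detrended map, fitted trend).
    """
    h, w = diff.shape
    yy, xx = np.mgrid[0:h, 0:w]
    x = xx.ravel() / max(w - 1, 1)
    y = yy.ravel() / max(h - 1, 1)
    # Design matrix for a + b*x + c*y + d*x^2 + e*x*y + f*y^2.
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    d = diff.ravel().astype(float)
    keep = np.ones(d.size, dtype=bool)
    for _ in range(n_iter):
        # Least-squares fit using only the pixels currently seen as inliers.
        coef, *_ = np.linalg.lstsq(A[keep], d[keep], rcond=None)
        resid = d - A @ coef
        # Robust spread estimate; pixels far from the surface are outliers.
        med = np.median(resid[keep])
        mad = np.median(np.abs(resid[keep] - med))
        keep = np.abs(resid - med) <= k * 1.4826 * mad + 1e-9
    trend = (A @ coef).reshape(h, w)
    return resid.reshape(h, w), trend
```

After detrending, a single global threshold on the residual is more meaningful, since a bright corner of the image no longer looks like a difference.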
EDIT:
It works quite well even with only two tiles: I can look at two tiles at the same time by controlling the convergence angle of my eyes (without losing focus), so that my visual cortex does the correlation and a kind of error detection (non-matching parts). The Lego pieces appear to pop out.
EDIT 2:
I tried the same with your second image (edges). The correlation works well (lock-in to the right convergence angle) and clusters of differences are somewhat marked, but without the color and low-frequency information, no objects pop out.
Thus, edge detection should not be the first step, except perhaps to increase the precision in coping with perspective and distortion. Jointly estimating the pattern period and the distortion field is a problem that is already solved in image stitching. Doing both jointly may not be necessary for your problem (fixed camera position, fixed focus and zoom settings).