I guess that I simply view this behavior differently. All of the cases shown here appear to be "singular" in the sense that there are zero derivatives of the function on the set of interest. This is known to cause trouble: even Newton's method loses its fast convergence at such points.
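To illustrate the remark about Newton's method, here is a minimal Python sketch (not Maple, and not taken from the thread). The function name newton_steps, the starting point, and the tolerance are my own choices for illustration. It compares a simple root of x^2-1 with the double root of x^2, where the derivative vanishes at the root.

```python
def newton_steps(f, df, x0, tol=1e-10, max_iter=200):
    """Return the number of Newton iterations until the step size drops below tol.

    Hypothetical helper for illustration only.
    """
    x = x0
    for k in range(1, max_iter + 1):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return k
    return max_iter


# Simple root at x = 1: the derivative 2x is nonzero there.
simple = newton_steps(lambda x: x * x - 1.0, lambda x: 2.0 * x, 2.0)

# Double root at x = 0: the derivative 2x vanishes at the root,
# so each Newton step only halves the error.
double = newton_steps(lambda x: x * x, lambda x: 2.0 * x, 2.0)

print("simple root:", simple, "iterations")
print("double root:", double, "iterations")
```

The singular case should need several times as many iterations, since the error is only halved at each step instead of roughly squared.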
If you plot acer's example as the graphs of the functions f[a](x,y) = (x-y)^2+a intersecting the xy-plane in three dimensions, this becomes clearer. (The zero level sets, i.e. the lines of intersection, disappear on one side of a=0.) The contrast with g[a](x,y)=x-y is stark. (The zero level set of g[a](x,y)=0, again a line of intersection, does not disappear near a=0.)
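The same picture can be checked without plotting. Writing t = x-y, the zero set of f[a] is the set of lines x-y = c with c^2 + a = 0. The gradient of f[a] is 2(x-y)*(1,-1), which vanishes on the line x=y, while the gradient of x-y is (1,-1) and never vanishes. Below is a rough Python sketch with function names of my own. Since g[a] as written above does not depend on a, the code assumes a hypothetical perturbation g[a](x,y) = x-y+a purely for comparison.

```python
import math


def f_zero_offsets(a):
    """Offsets c such that (x-y)^2 + a = 0 holds on the line x - y = c."""
    if a > 0:
        return []            # no real solutions: the lines have disappeared
    if a == 0:
        return [0.0]         # the single (double) line x = y
    r = math.sqrt(-a)
    return [-r, r]           # two parallel lines x - y = -r and x - y = r


def g_zero_offsets(a):
    """Offsets c for the zero set of the ASSUMED perturbation g_a = x - y + a."""
    return [-a]              # always exactly one line, for every a


for a in (-0.01, 0.0, 0.01):
    print(a, len(f_zero_offsets(a)), len(g_zero_offsets(a)))
```

For f[a] the number of lines changes from two to one to none as a passes through 0, whereas the nonsingular g[a] keeps exactly one line throughout.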