@Christopher2222 I don't expect Laurent to tip his hand :) High-end GPUs currently have about 512 cores (shaders), organized into blocks of 16–48 cores per streaming multiprocessor. All cores in a multiprocessor should be executing the same code. Each multiprocessor interleaves up to 32 threads per core, so roughly 16384 simultaneous threads is about the maximum today.
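Just to make the arithmetic behind that ceiling explicit (the figures are the ones above, not an official spec):

```python
cores = 512             # shaders across the whole GPU, per the numbers above
threads_per_core = 32   # hardware threads interleaved on each core

# Total simultaneously resident threads under these assumptions.
max_threads = cores * threads_per_core
print(max_threads)  # 16384
```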
The GPUs in notebooks and most desktops are considerably less powerful at the moment. For example, my notebook has a GeForce 320M (MCP89) with 48 cores. It does about 120 GFLOPS single precision, whereas a Core i7 does around 90. The newer mobile GPUs are twice as fast, so you can see GPU performance starting to pull away from CPUs.
The catch is that you have to use single-precision floats (24-bit mantissa) to get good performance. There's a strong case for moving floating-point algorithms to GPUs now and using iterative methods to recover the lost precision. For general computer algebra there is a slightly longer window, because CPUs do 64-bit integer arithmetic natively.
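The "iterative methods to recover precision" idea is classic mixed-precision iterative refinement. Here is a minimal sketch of it using NumPy on the CPU (no GPU involved, and the function name is mine): solve Ax = b entirely in single precision — the part you would offload to the GPU — then compute residuals in double precision and correct. For a reasonably well-conditioned matrix the single-precision solves converge to double-precision accuracy in a few iterations.

```python
import numpy as np

def refine_solve(A, b, iters=5):
    """Solve Ax = b with single-precision solves, double-precision residuals."""
    A32 = A.astype(np.float32)
    # Initial solve entirely in single precision (the "GPU" part).
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x  # residual in double precision
        # Correction solved in single precision again
        # (a real version would cache the LU factorization).
        dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += dx
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = rng.standard_normal(100)
x = refine_solve(A, b)
print(np.linalg.norm(b - A @ x))  # far smaller than a pure float32 solve gives
```

All the heavy O(n^3) work happens in float32, where the GPU is fast; the O(n^2) residual in float64 is what buys the precision back.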
That's only for the algorithms where GPUs make sense, however: basically, dense algorithms. Dense polynomials and linear algebra, graphs, simulations, etc. For anything sparse or structured, GPUs are hard to use, and you want threads on a multicore CPU. The multithreading going into Maple now is focused on those cases because it's a good investment now and it won't be obsolete later, although I expect GPUs to push the applicability of dense algorithms out very far. I'm sure in 10 years we'll be shocked by what we can compute. We'll be there.