I am looking to return all the substrings of a string that match a particular pattern (regexp). This is a very common text processing operation.
StringTools[RegMatch] returns boolean about whether the given pattern exists somewhere in the string. That's a start, but it is not efficient to hunt around for where the match is in order to extract the characters.
StringTools[Search] and [SearchAll] can return position information, but they work on exact matches of the pattern space, not on regular expressions. I could pre-generate all the possible matching sequences of up to some length and search for those exactly, but that is inconvenient at best and pretty much useless if there is a .* or .+ in the regular expression (as the space of potential matches grows exponentially.)
StringTools[RegSub] can be used to find one match by using something like ".*(REALPATTERN).*" = "\1" but since this does not extract any position information, one would at the very least then have to StringTools[Search] of the exact match to determine the position, then take a substring of the original string, and continue searching. And if you are going to do that, you might as well just use StringTools[RegMatch] with an optional third argument of a symbol to return the matched characters, instead of fooliing around with deleting everything else using RegSub.
Have I overlooked a possibility?
Example of use:
I would like to search "variable %1 needs to be of the form %3+2.9 in order to use routine %2 to find the fizturl", extracting the "%1", "%3", "%2". This would be regular expression "%[0-9]+". Notice that there might be other digits in the string that I do not want to extract, so I cannot just search for digits. I also cannot just remove all characters "[^0-9%]" as that would leave digits and would, for example, merge together the %3+2.9 into %329 leaving post-processing to extract the true pattern matches.
For what I am doing at the moment, I do not need to know the positions those occured at, just the matches.