Question: Efficient Set Membership Test

Is there any useful preprocessing -- or a better data structure -- for efficient set membership testing?

I have a list X of 1.5 million hashes (30-digit compressions of arrays) and want to find an element which belongs to both this set and another set. The other set Y has 270 billion elements so I will generate just a fraction of these (a sample size of a few hundred million should be enough to give a collision). Will membership testing be faster if I first sort X? Or is the sort operation too expensive to see any visible benefit?

Please Wait...