I would be perfectly happy if this were only implemented for new breaches and if it didn't tell me the exact tagged email address.
Under these conditions you would only need to canonicalize email addresses as they come in, and the rest of the code would work as-is: convert to lowercase, strip generic subaddress tags (the "+something" part), add special cases for Gmail dots and Yahoo hyphens, store the result, and that would be it.
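A minimal sketch of that ingest-time canonicalization, assuming the rules listed above. The function name and the provider domain sets are hypothetical, and treating everything after the first "-" as a tag for Yahoo (its disposable "base-keyword" addresses) is an assumption, not a documented guarantee:

```python
# Hypothetical provider lists; a real deployment would need the full set of
# Gmail/Yahoo domains (e.g. regional Yahoo domains), which is an open assumption here.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}
YAHOO_DOMAINS = {"yahoo.com"}


def canonicalize_email(address: str) -> str:
    """Return a canonical form of an email address for breach matching (sketch)."""
    # Lowercase the whole address. Strictly, the local part may be case-sensitive,
    # but in practice providers ignore case.
    address = address.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")

    # Generic subaddressing: drop everything from the first '+' in the local part.
    local = local.split("+", 1)[0]

    if domain in GMAIL_DOMAINS:
        # Gmail ignores dots in the local part.
        local = local.replace(".", "")
    elif domain in YAHOO_DOMAINS:
        # Assumed Yahoo disposable form "base-keyword": keep only "base".
        local = local.split("-", 1)[0]

    return f"{local}@{domain}"
```

Because only the canonical form is stored, a later lookup canonicalizes the query the same way and matches without ever revealing which tagged variant appeared in the breach.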