Lead with an example:
Search for pwned.hibp@gmail.com and (if this were a real account) pwnedhibp@gmail.com, and you'd get different results. Yet for all intents and purposes the leaked data all belongs to you, because Gmail delivers both addresses to the same account.
Would it be possible to define canonicalizers for the most popular email providers that remove superfluous things such as periods from the address, both in the breach corpus and when searching?
I realize that some users like to use addresses such as pwned.hibp+salesConference@ to track down which party exposed their data in a breach, but I feel that use case is not as broadly useful as giving people a full breach list.
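To make the request concrete, here is a rough sketch of what such a canonicalizer might look like. It is not based on how the service actually stores or looks up addresses; the function name `canonicalize_email`, the `GMAIL_DOMAINS` set, and the `strip_plus_tag` flag are all hypothetical. The only provider rule it encodes is Gmail's handling of periods and `+` suffixes, as described in the post linked under References.

```python
# Hypothetical rule table: only gmail.com is listed; other providers
# would need their own, separately verified rules.
GMAIL_DOMAINS = {"gmail.com"}


def canonicalize_email(address: str, strip_plus_tag: bool = True) -> str:
    """Return a canonical form of `address` for lookup purposes.

    Addresses on providers without a known rule are returned with only
    the domain lowercased, since their local parts may be significant.
    """
    address = address.strip()
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        # Not a recognizable address; leave it untouched.
        return address

    domain = domain.lower()
    if domain in GMAIL_DOMAINS:
        # Assumption: Gmail treats the local part case-insensitively.
        local = local.lower()
        if strip_plus_tag:
            # Drop everything from the first "+" onward (the tracking tag).
            local = local.split("+", 1)[0]
        # Gmail ignores periods in the local part.
        local = local.replace(".", "")
    return f"{local}@{domain}"


if __name__ == "__main__":
    for raw in ("pwned.hibp@gmail.com", "pwned.hibp+salesConference@gmail.com"):
        print(raw, "->", canonicalize_email(raw))
```

The same function would have to be applied both when loading a breach into the corpus and when handling a search, so the two sides always agree. Keeping `strip_plus_tag` as a switch leaves room for the tracking use case mentioned above.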
References:
- https://gmail.googleblog.com/2008/03/2-hidden-ways-to-get-more-from-your.html