Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as email@example.com where foo is a unique identifier for the site. They do this so that if they begin getting spammed, they can identify the source their email came from.
At the moment, HIBP treats this as a totally unique email address, so if I've searched for the parent email address without the "+" syntax, it won't be found. This idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was firstname.lastname@example.org that was exposed in the XYZ breach.
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address, so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be.
Edit: To add to this idea, Robert's comment about a period in the email is also very valid. I'd want to be very clear about the ubiquity of this practice across mail providers, but it's certainly a good suggestion and worth further investigation.
Rick Aspden (insanityinside) commented
Since the introduction of limits on the number of email addresses that can be searched per domain, the system thinks I have 14 email addresses at my domain that have been pwned, when in reality it's maybe 3 (one of which is an alias for myself!). That's because I use a variety of +tags on my mail to track where I've been pwned from (using a Google Workspace account, not that the email host hugely matters). I totally understand getting people to pay up if they're using a domain for commercial purposes, but when I'm specifically "creating" these addresses as potential honeypots, it's not too helpful!
Al. C. commented
I mean: if you send an email to your gmail.com address that is "email@example.com" or you send an email to "firstname.lastname@example.org" or you send to "email@example.com", the recipient is absolutely the same. Using haveibeenpwned I noticed that entering the address in the different forms shown above gives different results. For example, the first address can be reported as "pwned" and the second and the third not, which might be confusing for users.
I hope my example is clear. I strongly suggest improving the "search" field for gmail.com addresses so that the "." is not treated as a distinguishing character.
Ben Blank commented
As someone who has habitually used +addresses since creating my Gmail account, this would be a very desirable feature for me (as would Gmail's "optional dot", thanks to poorly-written email scraping bots). Using a suffixed address for each account I create (e.g. "[me]+firstname.lastname@example.org" for this site) has allowed me to track bad actors who share my email address without my consent.
I understand that there are some technical challenges here, as both "+" and "." aren't indicated to have special behavior in any standard I'm aware of. Any email host which assigns special meaning to them is therefore an exception. This could potentially be managed by maintaining a list of hosts which are known to assign special meaning to characters, though that would be an additional (if hopefully small) maintenance burden for HIBP.
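The host-list idea above could be sketched roughly as follows. This is only an illustration: `HOST_RULES`, `canonicalize`, and the rule entries are hypothetical names and values, not a vetted list of provider behaviour, and hosts not in the table are deliberately left untouched.

```python
# Hypothetical per-host rules. The entries are illustrative assumptions,
# not a verified list of how any provider treats "+", "-" or ".".
HOST_RULES = {
    "gmail.com": {"tag_separator": "+", "ignore_dots": True},
    "example.org": {"tag_separator": "-", "ignore_dots": False},
}


def canonicalize(address: str) -> str:
    """Return the base address for known hosts; leave other hosts' local parts as-is."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" found, so do not guess
    domain = domain.lower()
    rules = HOST_RULES.get(domain)
    if rules is not None:
        separator = rules["tag_separator"]
        if separator:
            # Drop everything from the first tag separator onwards.
            local = local.split(separator, 1)[0]
        if rules["ignore_dots"]:
            local = local.replace(".", "")
    return f"{local}@{domain}"
```

Only the listed hosts get special treatment, which is where the maintenance burden mentioned above comes from: every new host needs an entry.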
Alternatively, the plus sign specifically could be assumed to always have the special meaning of introducing an "irrelevant" postfix. While it's certainly not true of *all* email hosts, there are even today a large number of sites which disallow that character appearing in an email address at all, either because they incorrectly believe that it's an invalid character or simply due to encoding issues. Add to that the fact that Gmail's use of the character has become popular among other hosts as well and it may be reasonable to simply assume that "+" only appears in email addresses as a special character.
Ben Andrews commented
I use - syntax instead of +, but same basic concept. The challenge is that now that Have I Been Pwned's new monetization model is based on accounts-per-domain, this strategy has run up my number of pwned accounts beyond the free tier, even though it's a domain used only by a single person; each data breach is a separate account, and my domain has 19 breached accounts already as a result. Soon, I'll be in the $15/month enterprise tier, which I don't think was the intention.
It would be great if addresses with + or - in them were collapsed for billing purposes. As it stands, I've lost access to data about my account unless I sign up for one of the commercial tiers.
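A minimal sketch of what "collapsed for billing purposes" might mean, assuming that everything from the first "+" or "-" in the local part is a tag. The function name `count_distinct_accounts` is hypothetical, and nothing here reflects how HIBP actually counts accounts.

```python
import re

# Assumption: everything from the first "+" or "-" in the local part is a tag.
_TAG = re.compile(r"[+-].*$")


def count_distinct_accounts(addresses):
    """Count breached addresses on a domain after collapsing tagged variants."""
    bases = set()
    for address in addresses:
        local, _, domain = address.lower().rpartition("@")
        bases.add((_TAG.sub("", local), domain))
    return len(bases)


breached = [
    "me+adobe@example.com",
    "me-linkedin@example.com",
    "me@example.com",
    "alias@example.com",
]
print(count_distinct_accounts(breached))  # 2
```

Treating "-" as a separator is the riskier half of this assumption: a genuine address such as first-last@example.com would be collapsed to "first", so a real implementation would probably need the domain owner to opt in.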
Kay CeeWot commented
I have used +aliases for 6-8 years, and have seen only two items from an identifiable source return to me from a nefarious party.
Kaley Schenk commented
One of the main things I use plus aliasing for is to identify potential data breaches or sketchy information sharing practices based on what email address is receiving scam/phishing emails. This would be an excellent extension of that function.
I currently do not use plus aliasing because it is not supported by HIBP. If this was supported I would start to use + when signing up for online services. I believe the percentage of people using this feature in their usernames would increase if it was supported by HIBP.
Lee Brotherston commented
Just to add my 2c....
Although the numbers are low in the samples that you look at, I would suggest that the comment below regarding these being likely to be people with security responsibilities is likely to hold true.
I think that this is a fairly ubiquitous feature of mail providers these days (with the notable exception, I think, of the default Exchange setup) even if it's not adopted by users that often.
I'm not sure what the underlying stack is to gauge how much effort would be involved (e.g. a change to an SQL query vs needing to retrospectively update a bunch of metadata that's generated at import time, etc.) but I would suggest that there are a couple of implementation routes for this:
- Add this to the search facility, so that searching for email@example.com looks up both the provided address and firstname.lastname@example.org
- Leave search as-is, but set up +stuff in the notifications much like whole domain notifications. This way, presumably, there is no need to retrospectively update data; it is just handled during the import of new data.
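The two routes above might look something like this in outline. Both helpers (`search_terms` and `matches_subscription`) are hypothetical and only handle the "+" case; how the real search and import pipeline are built is unknown, as noted above.

```python
def search_terms(address):
    """Route 1: search for the address as typed, plus its base form if it has a +tag."""
    local, sep, domain = address.rpartition("@")
    terms = [address]
    if sep and "+" in local:
        terms.append(local.split("+", 1)[0] + "@" + domain)
    return terms


def matches_subscription(subscribed_base, imported_address):
    """Route 2: at import time, does a new record belong to a subscriber's base address?"""
    local, sep, domain = imported_address.rpartition("@")
    if not sep:
        return False
    stripped = local.split("+", 1)[0] + "@" + domain
    return stripped.lower() == subscribed_base.lower()
```

Route 1 only adds one extra lookup per search. Route 2 touches nothing that is already stored, which is the appeal described in the second bullet.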
Michael H commented
The sooner this is implemented, the better.
Mike Williams commented
While the absolute number of people affected is small, consider that they are more likely to be people with security responsibilities elsewhere and as such are valuable vectors.
I did an inventory of my gmail "+" and "." variants. I'm currently using 67 variants of my email address.
Joe Kirwin commented
Lead with an example:
Search for email@example.com and (if this was a real account) firstname.lastname@example.org and you'd get different results. Yet for all intents and purposes it's all your data that has been leaked, as they are the same account.
Would it be possible to define some canonicalizers for very popular email providers that remove superfluous things such as periods from the address both in the breach corpus and when searching?
I realize that there could be some user that likes to leverage things like pwned.hibp+salesConference@ to track down which party exposed some breach, but I feel like that use case is not as broadly useful as giving people a full breach list.
+1 for a needed feature. Though using "+..." and/or additional dot(s) is a small (but growing) percentage of email usage, breaches involving those accounts may be a disproportionately higher risk to security-savvy folks. They are the most likely to use these techniques but may also be juicier targets.
CB has an excellent solution for this below. Cleaning up the input addresses on entry into the HIBP database, before hashing, would handle most variations. With the addition of recording which +tags were removed, and the positions of periods, the data would be comprehensive with very little compromise.
It may make sense to hash both versions. E.g. "HIBP found an exact match" and "the following variations were found in breach database" are both useful.
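As a rough sketch of hashing both versions while recording what was stripped: the record layout, the name `index_record`, and the choice of SHA-256 are all assumptions made for illustration. Dots are removed for every domain here, which would only be correct for hosts that really ignore them.

```python
import hashlib


def index_record(address):
    """Build a hypothetical index entry holding an exact and a canonical hash."""
    lowered = address.lower()
    local, _, domain = lowered.rpartition("@")
    base, plus, tag = local.partition("+")
    # Remember where the dots were so the original form can be reported later.
    dot_positions = [i for i, ch in enumerate(base) if ch == "."]
    canonical = base.replace(".", "") + "@" + domain
    return {
        "exact_hash": hashlib.sha256(lowered.encode()).hexdigest(),
        "canonical_hash": hashlib.sha256(canonical.encode()).hexdigest(),
        "removed_tag": tag if plus else None,
        "dot_positions": dot_positions,
    }


tagged = index_record("jane.doe+shop@example.com")
plain = index_record("janedoe@example.com")
print(tagged["canonical_hash"] == plain["canonical_hash"])  # True
print(tagged["exact_hash"] == plain["exact_hash"])          # False
```

A lookup on `exact_hash` would give the "exact match" result and a lookup on `canonical_hash` the "variations" result, with `removed_tag` identifying which alias was exposed.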
I would also expand this request to simply being able to search @domain.com
You can use the feature to generate emails on the fly without them having an associated account, similar to the + syntax.
Unfortunately, that would also mean searching potentially hundreds of addresses one at a time to see if they've been compromised.
If this idea gets implemented, please make it work only for "validated" emails. I don't want people to be able to type my email and see every variation of it. Hopefully we get this feature in the future =) Thank you Troy for all your hard work.
It's interesting that this is the most requested feature by far, but the FAQ makes it sound like it's unimportant. If we're all using this website, it's a given that we're more security and privacy aware than others and we will use all tools available to us, such as the plus tag and using different spacing (such as: my-email, my.email, myemail, m.y.e.mail).
It would also be nice to see an example of the international phone number for those of us that are not familiar with that format
I am not sure if this is a duplicate, but here goes... It would be nice if I could provide a base address (like email@example.com) and HIBP reported hits for:
1) any + variant of the base address (first-last+aNYstRing@gmail.com)
2) any valid dot format (i.e., firstname.lastname@example.org and variants)
3) a base name with user-supplied dots, handled without disabling #2 (i.e., email@example.com)
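The three cases above can be sketched as a single comparison, assuming Gmail-style rules where dots and anything after "+" are ignored. `is_variant` and `_normalize` are hypothetical helper names, and the addresses in the usage lines are made up.

```python
def _normalize(address):
    """Lowercase, drop any +tag, and remove dots from the local part."""
    local, _, domain = address.lower().rpartition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return local, domain


def is_variant(base, candidate):
    """True if candidate is a + or dot variant of base under the assumed rules."""
    return _normalize(base) == _normalize(candidate)


print(is_variant("first-last@gmail.com", "first-last+aNYstRing@gmail.com"))  # True
print(is_variant("firstlast@gmail.com", "f.irst.last@gmail.com"))            # True
print(is_variant("first.last@gmail.com", "firstlast+x@gmail.com"))           # True
```

Because both sides are normalized, dots typed into the base address (case 3) do not stop the dot variants in case 2 from matching.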