Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as troyhunt+foo@hotmail.com, where foo is a unique identifier for the site. They do this so that if they start getting spammed, they can identify which site their address leaked from.
At the moment, HIBP treats this as a totally unique email address, so if I search for the parent email address without the "+" syntax, it won't be found. This idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was troyhunt+bar@hotmail.com that was exposed in the XYZ breach.
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in them, which is 0.03% of entries. Even that number is a bit high, as it includes junk entries. I'm definitely not ruling this idea out (it's still planned); I just wanted to give a sense of how useful it would be.
Edit: To add to this idea, Robert's comment about a period in the email address is also very valid. I'd want to be very clear about how widespread this practice is across mail providers, but it's certainly a good suggestion and worth further investigation.
-
wrengr commented
I use plus addressing all the time. Even though plus addresses show up only rarely in breaches, IMO it's still important for queries to support them (because misses are just as important as hits). It seems like an easy enough thing to implement, given that RFC 5233 specifies the behavior and HIBP already has infrastructure for handling multiple queries for domain searches.
Since it was mentioned in the edit: my wife doesn't use plus addressing, but she regularly uses periods to achieve a similar effect. This one's a lot trickier to rule on, since different email domains handle it differently. But given the popularity of Gmail, I think it's worth supporting; it's just not clear how high a priority it should be or how much work it would take to do well.
-
jason commented
I own my own domain and run my own mail server. I hand out a unique email address to each website so I limit the damage from email address leaks. The change to per-address pricing makes it prohibitively expensive for me as an individual to check my leaked addresses. It would be great to find a way for people in my situation to use this service.
-
Jonathan Dowland commented
Sadly, my pre-existing use of + addresses (hundreds, one per interaction) and HIBP’s new pricing structure have locked me out of HIBP’s free tier. I would be prepared to pay for HIBP, but treating plus addresses as distinct users puts me in the top pricing tier, which is not practical.
-
Joe Atzberger commented
Refer to RFC 5233 section 4: "Subaddress comparisons".
The HIBP implementation for this would be straightforward. Instead of the search index being the literal address value, it should be the normalized *deliverable* address. For example, instead of "person+foobar@example.com", the index would contain "person@example.com".
Search input should be normalized the same way and therefore find all subaddress variations.
There is some variation in delimiter between providers, but pragmatically there are only a few predominant styles. At a minimum, the UI could prompt: "X+Y@z.co looks like a subaddress; do you want to include X@z.co in your search?"
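A minimal sketch of that normalization, assuming "+" is the delimiter. The function names are hypothetical, not anything HIBP exposes. The same function would be applied to addresses at import time and to the search input, and the second helper covers the UI-prompt idea.

```python
from typing import Optional


def normalize_address(address: str, delimiter: str = "+") -> str:
    """Drop the subaddress ("detail") so only the deliverable user@domain remains."""
    local, at, domain = address.strip().rpartition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {address!r}")
    # Everything from the first delimiter up to the "@" is treated as the detail.
    user = local.split(delimiter, 1)[0]
    # Domain names are case-insensitive, so fold them; the local part is left as-is.
    return f"{user}@{domain.lower()}"


def subaddress_suggestion(address: str, delimiter: str = "+") -> Optional[str]:
    """Return the base address to offer in a UI prompt, or None if there is no subaddress."""
    local = address.strip().rpartition("@")[0]
    if delimiter in local:
        return normalize_address(address, delimiter)
    return None


print(normalize_address("person+foobar@example.com"))  # person@example.com
print(subaddress_suggestion("X+Y@z.co"))  # X@z.co
```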
-
Scott commented
I want to raise one concern regarding the lack of this feature. If your email (Gmail) account itself is breached, a malicious actor can use ANY alias they can think of to send mail and validate email access. Combined with quick deletion, it's very probable the user will never know the alias existed.
That alone should warrant this feature.
That said, we could also ask Google to provide clear auditing of all email aliases ever used or seen. But I doubt they would do that; it could easily become a "spam" problem inside their Gmail database.
Just as they show the last 10 IP/login accesses, they could list the email aliases used in inbound mail over the last 7 days. There are a lot of emails, so making this a time-window list gives the user a chance to see entries before they disappear. I'm not sure Google is taking feature requests for Gmail, but maybe some folks could push for something like this?
-
Kirsten de Waard commented
I use aliases all the time, but it is hard to check whether my email addresses have been hacked, because the search requires the entire email address (more than 50 in my case).
-
Rick Aspden (insanityinside) commented
Since the introduction of limits on the number of email addresses for domain searches, the system thinks I have 14 pwned email addresses at my domain when in reality it's maybe 3 (one of which is an alias for myself!). That's because I use a variety of +tags on my mail to track where I've been pwned from (via a Google Workspace account, though the email host hardly matters). I totally understand getting people to pay up if they're using a domain for commercial purposes, but when I'm specifically "creating" these addresses as potential honeypots, it's not too helpful!
-
Al. C. commented
I mean: if you send an email to "john.doe@gmail.com", "johndoe@gmail.com" or "j.o.h.n.doe@gmail.com", the recipient is exactly the same. Using Have I Been Pwned, I noticed that entering the address in these different forms gives different results. For example, the first address can be reported as "pwned" while the second and third are not, which might confuse users.
I hope my example is clear. I strongly suggest improving the search field for gmail.com addresses so that the "." is ignored.
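A sketch of the search-side fix described above, assuming gmail.com is the only dot-insensitive domain. That domain set and the `search_key` name are illustrative assumptions; other providers, and custom domains hosted on Google, may treat dots differently.

```python
# Assumption: only gmail.com is treated as dot-insensitive in this sketch.
DOT_INSENSITIVE_DOMAINS = frozenset({"gmail.com"})


def search_key(address: str) -> str:
    """Normalize an address so dot variants of one Gmail mailbox compare equal."""
    local, at, domain = address.strip().rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()
    if domain in DOT_INSENSITIVE_DOMAINS:
        # Drop every "." and fold case; Gmail ignores both in the local part.
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"


variants = ["john.doe@gmail.com", "johndoe@gmail.com", "j.o.h.n.doe@gmail.com"]
print({search_key(v) for v in variants})  # {'johndoe@gmail.com'}
```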
-
Ben Blank commented
As someone who has habitually used +addresses since creating my Gmail account, this would be a very desirable feature for me (as would Gmail's "optional dot", thanks to poorly-written email scraping bots). Using a suffixed address for each account I create (e.g. "[me]+uservoice@gmail.com" for this site) has allowed me to track bad actors who share my email address without my consent.
I understand that there are some technical challenges here, as neither "+" nor "." has special behavior defined in any standard I'm aware of. Any email host that assigns special meaning to them is therefore an exception. This could be managed by maintaining a list of hosts known to assign special meaning to these characters, though that would be an additional (if hopefully small) maintenance burden for HIBP.
Alternatively, the plus sign specifically could be assumed to always have the special meaning of introducing an "irrelevant" suffix. While that's certainly not true of *all* email hosts, even today a large number of sites disallow the character in an email address entirely, either because they incorrectly believe it's invalid or simply due to encoding issues. Add to that the fact that Gmail's use of the character has been adopted by other hosts as well, and it may be reasonable to simply assume that "+" only appears in email addresses as a special character.
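The two options above can be combined in one lookup: consult a host table first, then optionally fall back to treating "+" as a tag separator everywhere. This is a sketch under stated assumptions; the table entries (including the hypothetical self-hosted "example.org") and the `assume_plus` flag are illustrative only.

```python
from typing import Dict, Optional

# Illustrative table only. gmail.com's "+" convention is well known; "example.org"
# stands in for a hypothetical self-hosted domain configured to use "-".
KNOWN_DELIMITERS: Dict[str, str] = {
    "gmail.com": "+",
    "example.org": "-",
}


def base_address(address: str, assume_plus: bool = True) -> str:
    """Strip a tag using a per-host delimiter, optionally defaulting to "+"."""
    local, at, domain = address.strip().rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()
    delimiter: Optional[str] = KNOWN_DELIMITERS.get(domain)
    if delimiter is None and assume_plus:
        delimiter = "+"  # second option: treat "+" as a tag separator on every host
    if delimiter is None:
        return f"{local}@{domain}"  # first option only: unknown host, leave as-is
    return f"{local.split(delimiter, 1)[0]}@{domain}"


print(base_address("me+uservoice@gmail.com"))  # me@gmail.com
print(base_address("me+x@unknown.net", assume_plus=False))  # me+x@unknown.net
```

With `assume_plus=False` only listed hosts are touched, which is the safer but higher-maintenance route.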
-
Ben Andrews commented
I use - syntax instead of +, but it's the same basic concept. The challenge is that now that Have I Been Pwned's new monetization model is based on accounts per domain, this strategy has pushed my number of pwned accounts beyond the free tier, even though the domain is used by only a single person; each data breach is a separate account, and my domain already has 19 breached accounts as a result. Soon I'll be in the $15/month enterprise tier, which I don't think was the intention.
It would be great if addresses with + or - in them were collapsed for billing purposes. As it stands, I've lost access to data about my account unless I sign up for one of the commercial tiers.
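For billing, the collapse could be as simple as counting distinct tag-stripped mailboxes. This sketch assumes both "+" and "-" are tag delimiters on the domain, as in the setup described above; the function names are hypothetical.

```python
from typing import Iterable, Set


def strip_tag(local: str, delimiters: str) -> str:
    """Cut the local part at the earliest delimiter found (if any)."""
    positions = [local.find(d) for d in delimiters if d in local]
    return local[: min(positions)] if positions else local


def distinct_mailboxes(addresses: Iterable[str], delimiters: str = "+-") -> Set[str]:
    """Collapse tagged addresses into the set of underlying mailboxes."""
    mailboxes: Set[str] = set()
    for address in addresses:
        local, at, domain = address.strip().rpartition("@")
        if not at:
            continue  # skip junk rows that are not addresses
        # Lower-casing the local part is a pragmatic assumption, not an RFC rule.
        mailboxes.add(f"{strip_tag(local, delimiters).lower()}@{domain.lower()}")
    return mailboxes


pwned = ["ben+shop@example.net", "ben-forum@example.net", "Ben@example.net", "alice+news@example.net"]
print(len(pwned), "addresses ->", len(distinct_mailboxes(pwned)), "mailboxes")  # 4 addresses -> 2 mailboxes
```

One trade-off: treating "-" as a delimiter everywhere would also merge legitimately hyphenated mailboxes (e.g. "mary-jane@" into "mary@"), so a per-domain delimiter setting would likely be safer.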
-
Kay CeeWot commented
I have used +aliases for 6-8 years, and have seen only two items from an identifiable source return to me from a nefarious party.
-
Kaley Schenk commented
One of the main things I use plus aliasing for is identifying potential data breaches or sketchy information-sharing practices, based on which email address receives scam/phishing emails. This would be an excellent extension of that function.
-
Kyle commented
I currently do not use plus aliasing because it is not supported by HIBP. If this was supported I would start to use + when signing up for online services. I believe the percentage of people using this feature in their usernames would increase if it was supported by HIBP.
-
Lee Brotherston commented
Just to add my 2c....
Although the numbers in the samples you looked at are low, I would suggest that the comment below, about these likely being people with security responsibilities, holds true.
I think that this is a fairly ubiquitous feature of mail providers these days (with the notable exception, I think, of the default Exchange setup) even if it's not adopted by users that often.
I'm not sure what the underlying stack is, so I can't gauge how much effort would be involved (e.g. a change to an SQL query vs. needing to retrospectively update a bunch of metadata that's generated at import time), but I would suggest a couple of implementation routes:
- Add this to the search facility, so that a search for user@domain.com returns results for both the provided address and user+*@domain.com
- Leave search as-is, but set up +stuff in the notifications, much like whole-domain notifications. This way, presumably, there's no need to retrospectively update data; it's handled during the import of new data.
-
Michael H commented
The sooner this is implemented, the better.
-
Mike Williams commented
While the absolute number of people affected is small, consider that they are more likely to be people with security responsibilities elsewhere and as such are valuable vectors.
-
Technophile commented
I did an inventory of my gmail "+" and "." variants. I'm currently using 67 variants of my email address.
-
Joe Kirwin commented
Lead with an example:
Search for pwned.hibp@gmail.com and (if this were a real account) pwnedhibp@gmail.com, and you'd get different results. Yet for all intents and purposes it's all your data that has been leaked, as they are the same account.
Would it be possible to define some canonicalizers for very popular email providers that remove superfluous things such as periods from the address both in the breach corpus and when searching?
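One way to read this suggestion: keep a registry of per-provider canonicalizers and apply the same one when indexing the breach corpus and when handling a search. A sketch assuming an in-memory index; the registry, function names and breach names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple


def gmail_local(local: str) -> str:
    # Gmail ignores dots (and letter case) in the local part.
    return local.replace(".", "").lower()


# Hypothetical registry: provider domain -> local-part canonicalizer.
CANONICALIZERS: Dict[str, Callable[[str], str]] = {"gmail.com": gmail_local}


def canonical(address: str) -> str:
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()
    fix = CANONICALIZERS.get(domain)
    return f"{fix(local) if fix else local}@{domain}"


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (address, breach) rows by canonical address at import time."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for address, breach in rows:
        index[canonical(address)].add(breach)
    return index


def search(index: Dict[str, Set[str]], address: str) -> Set[str]:
    """Canonicalize the query the same way, so every variant hits one key."""
    return set(index.get(canonical(address), set()))


index = build_index([("pwned.hibp@gmail.com", "BreachA"), ("pwnedhibp@gmail.com", "BreachB")])
print(sorted(search(index, "pwnedhibp@gmail.com")))  # ['BreachA', 'BreachB']
```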
I realize that some users like to leverage things like pwned.hibp+salesConference@ to track down which party exposed a breach, but I feel that use case is not as broadly useful as giving people a full breach list.
References:
- https://gmail.googleblog.com/2008/03/2-hidden-ways-to-get-more-from-your.html
-
Technophile commented
+1 for a needed feature. Though using "+..." and/or additional dot(s) is a small (but growing) percentage of email usage, breaches involving those accounts may be a disproportionately higher risk to security-savvy folks. They are the most likely to use these techniques but may also be juicier targets.
-
P commented
CB has an excellent solution for this below. Cleaning up the input addresses on entry into the HIBP database, before hashing, would handle most variations. With the addition of recording which +tags were removed and the positions of periods, the data would be comprehensive with very little compromise.
It may make sense to hash both versions. E.g. "HIBP found an exact match" and "the following variations were found in breach database" are both useful.
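A rough sketch of this idea, assuming a dot-insensitive set containing only gmail.com and SHA-256 as the hash. Both are arbitrary choices for illustration, not how HIBP stores data, and the class and function names are hypothetical. Sightings are keyed by the hash of the canonical address, while each one keeps the exact-address hash, the removed +tag and the removed dot positions, so a lookup can separate "exact match" from "variations found".

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DOT_INSENSITIVE = {"gmail.com"}  # assumption: only providers verified to ignore dots


def _h(value: str) -> str:
    # SHA-256 is an arbitrary choice for this sketch.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Sighting:
    breach: str
    exact_hash: str                 # hash of the address exactly as it appeared
    tag: Optional[str]              # removed "+tag", if any
    dot_positions: Tuple[int, ...]  # positions of removed dots in the user part


def decompose(address: str) -> Tuple[str, str, Optional[str], Tuple[int, ...]]:
    """Return (canonical, exact, tag, dot_positions) for an address."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()
    exact = f"{local}@{domain}"
    user, plus, tag = local.partition("+")
    dots: Tuple[int, ...] = ()
    if domain in DOT_INSENSITIVE:
        dots = tuple(i for i, ch in enumerate(user) if ch == ".")
        user = user.replace(".", "")
    return f"{user}@{domain}", exact, (tag if plus else None), dots


class BreachStore:
    def __init__(self) -> None:
        self._by_canonical: Dict[str, List[Sighting]] = defaultdict(list)

    def add(self, address: str, breach: str) -> None:
        canonical, exact, tag, dots = decompose(address)
        self._by_canonical[_h(canonical)].append(Sighting(breach, _h(exact), tag, dots))

    def lookup(self, address: str) -> Tuple[List[str], List[Sighting]]:
        """Split results into exact matches and other variations of the same mailbox."""
        canonical, exact, _, _ = decompose(address)
        exact_hash = _h(exact)
        sightings = self._by_canonical.get(_h(canonical), [])
        exact_hits = [s.breach for s in sightings if s.exact_hash == exact_hash]
        variations = [s for s in sightings if s.exact_hash != exact_hash]
        return exact_hits, variations


store = BreachStore()
store.add("jane.doe+shop@gmail.com", "XYZ")
store.add("janedoe@gmail.com", "ABC")
exact, variations = store.lookup("janedoe@gmail.com")
print(exact, [(v.breach, v.tag, v.dot_positions) for v in variations])
# ['ABC'] [('XYZ', 'shop', (4,))]
```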