Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as troyhunt+foo@hotmail.com where foo is a unique identifier for the site. They do this so that if they begin getting spammed, they can identify the source their email came from.
At the moment, HIBP treats this as a totally unique email address, so if I search for the parent email address without the "+" syntax, it won't be found. This idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was troyhunt+bar@hotmail.com that was exposed in the XYZ breach.
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address, so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be.
Edit: To add to this idea, Robert's comment about a period in the email is also very valid. I'd want to be very clear about the ubiquity of this practice across mail providers, but it's certainly a good suggestion and worth further investigation.

-
Mike commented
Well, if anyone would know it'd be you. Thanks for your willingness to engage!
-
Mike, you'd be surprised at how mainstream the HIBP user base is, largely because of how much press it gets in the general media. But even if I was off by a factor of 10 (which I'm almost certainly not), in an incident like River City Media, the percentage of people using this pattern rounds to 0% even with 2 decimal points of precision!
I understand this is important to the people using it, but I need to look at the impact from the effort and at present, it remains near non-existent.
-
Mike commented
Troy, I'd argue that your user base is not represented by the data in breaches. Obviously a very small percentage of people in the world use a + in their email (as evidenced by your research). But, I'd wager that a much larger percentage of people using HIBP do.
I assume your hope is that even the most technologically illiterate users come to HIBP. However, I imagine that most users are already security conscious and don't fall into that group.
-
Since Antonios has left a comment and I've also just loaded the largest data set ever into HIBP, I thought I'd add a current figure to the discussion here:
0.0038% was the percentage of people with a + in their email address in the River City Media spam list. 1 in every 26k people is a hard ROI to justify when there's a fair bit of work to invest!
I'll keep monitoring the use of this pattern, but as of now, it remains *exceptionally* rare.
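For anyone who wants to check the arithmetic behind the figures in this thread, here is a minimal sketch. The helper name `one_in_n` is made up for illustration; the numbers are the ones quoted above for the Adobe dump and the River City Media list.

```python
def one_in_n(percentage: float) -> int:
    """Convert a percentage into a '1 in every N' figure."""
    return round(100 / percentage)


# Adobe breach: rows containing "+" as a share of all rows.
adobe_share = 49_905 / 152_989_508 * 100
print(f"Adobe: {adobe_share:.2f}% of rows")

# River City Media: 0.0038% of addresses contained "+".
print(f"River City Media: 1 in every {one_in_n(0.0038):,}")
```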
-
Antonios Chariton commented
From your blogs I think you are using a key-value data structure, which means that when a query comes in, your data store needs an exact *key* to find the value (whether it has been breached or not). That's probably the best data structure for HIBP since it can scale infinitely, however it will not allow you to query troy+*@hunt.com. I guess the only way to address that is to either canonicalize the data as you add it, by removing everything after "+" (or ".", or "-"), which means this will only work with new data sets, or change the table schema / contents of "Value", which is very unlikely to happen. Another solution would be to create a new "table" with all e-mails with "+", ".", or "-", and then query both when someone requests information, only this time you format the "Value" of those "Keys" accordingly. Although it may seem like a lot of work, the earlier it is done, the better, as it will include more datasets.
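To make the second-table idea concrete, here is a rough sketch that uses plain dictionaries as stand-ins for the key-value store. Everything in it is an assumption for illustration: the sample data, the names `base_address`, `build_alias_table` and `lookup`, and the choice to handle only "+" tags. It is not how HIBP actually stores anything. Note that building the alias table for existing data means walking every key once, whereas for new data sets it could be filled in at load time.

```python
def base_address(address: str) -> str:
    """Drop a "+tag" from the local part (only "+" is handled in this sketch)."""
    local, _, domain = address.lower().rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


def build_alias_table(store: dict) -> dict:
    """Second "table": base address -> set of tagged keys present in the store."""
    table = {}
    for key in store:
        base = base_address(key)
        if base != key:
            table.setdefault(base, set()).add(key)
    return table


def lookup(address: str, store: dict, alias_table: dict) -> dict:
    """Query both tables using exact-key reads only, no wildcard scan."""
    key = address.lower()
    base = base_address(key)
    candidates = {key, base} | alias_table.get(base, set())
    return {c: store[c] for c in sorted(candidates) if c in store}


# Hypothetical contents of the existing exact-match store.
breaches = {
    "troy@hunt.com": ["Adobe"],
    "troy+shop@hunt.com": ["XYZ"],
    "someone@example.com": ["Adobe"],
}
aliases = build_alias_table(breaches)
print(lookup("troy@hunt.com", breaches, aliases))
```

Each result keeps the exact tagged key, so the response can still say which alias appeared in which breach.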
-
Henrik commented
Unfortunately there are some online services that don't accept email addresses with a + sign. I had problems with 2 services in the last month.
-
To Kem's question, we're *always* talking tiny percentages. I just checked the last set of data I loaded which was a spam list and only 0.009% of emails used the + syntax.
This is something I still want to add folks, but it'll be to the benefit of a tiny percentage of the community.
-
Kem Jones commented
Tony,
I'm in the 0.03% and have been for years. It's been a fantastic way to identify abusers of my email address. I'd love to see this feature implemented and would be glad to help any way I can.
The 0.03% stats appear to be from November 2014. Do you have more recent stats? (Today is November 29, 2016.) I'm curious if more people have caught on to this technique these days...
Thanks,
Kem
-
Since there's a comment in here about "go read the RFC", here's the RFC that describes subaddressing: https://tools.ietf.org/html/rfc5233
-
Wout Mertens commented
I would be perfectly happy if this were only implemented for new breaches and if it didn't tell me the exact tagged e-mail address.
Under these conditions you would only need to canonicalize email addresses as they come in and the rest of the code would work as-is. Convert to lowercase, strip generic subfields, add a special case for gmail dots and yahoo hyphens, store it and that would be it.
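Roughly, the steps above could look like the sketch below. The function name `canonicalize` and the two domain sets are made up for illustration, and which providers actually honour each convention is an assumption that would need checking per provider.

```python
# Hypothetical rule sets: which domains ignore dots or use hyphen tags is an
# assumption made for this sketch, not a verified list.
DOT_INSENSITIVE_DOMAINS = {"gmail.com", "googlemail.com"}
HYPHEN_TAG_DOMAINS = {"yahoo.com"}


def canonicalize(address: str) -> str:
    """Lowercase, strip the generic "+" subfield, then apply provider special cases."""
    local, _, domain = address.strip().lower().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Generic subaddressing: drop everything from the first "+".
    local = local.split("+", 1)[0]
    if domain in DOT_INSENSITIVE_DOMAINS:
        # Gmail-style special case: dots in the local part are ignored.
        local = local.replace(".", "")
    elif domain in HYPHEN_TAG_DOMAINS:
        # Yahoo-style special case: drop everything from the first "-".
        # What remains is the disposable-address prefix, which may differ
        # from the main account name.
        local = local.split("-", 1)[0]
    return f"{local}@{domain}"


print(canonicalize("TroyHunt+foo@Hotmail.com"))
print(canonicalize("e.mail@gmail.com"))
```

Only the canonical form is stored here, so this matches the condition above that the exact tagged address is not reported back.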
-
Anonymous commented
Would it not be possible to just add a, "This email address has tags," checkbox to the search, with a small tooltip telling confused people what it is? That way the extra code to search for tagged addresses never gets executed for the 99% of people it's not relevant for.
-
Nick commented
You should not be deliberately programming data leaks into HIBP!
Just because Google started to support this "+tag" idea does not undo the fact that + has been a valid char left of the @ in internet email addresses since there have been internet email addresses. Go read the RFCs.
Now, if you want to do it on a domain-by-domain basis after verifying that a domain that supports "+tag" otherwise does not allow + on the left of @ in any other email address, all power to you, but please do not do it for all domains "because Google supports it". The internet has already had far too much non-standards conformant cr*p to deal with because of attitudes like that in past (when variously Sun, IBM and MS have been the problem actors).
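A domain-by-domain version might look something like this sketch. The allowlist contents and the name `search_key` are hypothetical; the point is only that addresses on unverified domains pass through untouched, "+" included.

```python
# Hypothetical allowlist: domains verified to treat "+tag" as a subaddress.
PLUS_TAG_DOMAINS = frozenset({"gmail.com", "googlemail.com"})


def search_key(address: str) -> str:
    """Strip "+tag" only for allowlisted domains; leave every other address as-is."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain in PLUS_TAG_DOMAINS:
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"


print(search_key("jane+news@gmail.com"))  # tag stripped
print(search_key("a+b@example.org"))      # kept whole: "+" may be part of the real address
```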
-
Cory Charlton commented
I've recently started switching registrations to use the "+" syntax and would love it if this feature was supported. As it stands, I am adding subscriptions for every "+" variation I have used (or remember using).
-
Dylan Katz commented
I use this feature quite often, but have been hesitant to use it recently due to haveibeenpwned's not supporting it. This would be an awesome function.
-
Adding both the full address with the "+" and a normalised one without it would be an option, but it's difficult to do retrospectively and would mean enumerating back through hundreds of millions of records (I can't just query for everything containing a plus due to the data structure).
I'll continue to monitor this and if it either becomes easier to implement or more popular with users (it's still *extremely* rare) then I'll reassess.
-
Anonymous commented
How about using the canonical e-mail as an index field, and adding a second e-mail field containing the actual email from the breach?
This way, if guillaume+hello@whatever.com is found in a breach, create a record in the database for "guillaume@whatever.com".
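As a small sketch of that layout, assuming an in-memory index and handling only "+" tags (the class `BreachIndex` and the helper `canonical_key` are invented for illustration):

```python
from collections import defaultdict


def canonical_key(address: str) -> str:
    """Index field: lowercased address with any "+tag" removed."""
    local, _, domain = address.lower().rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


class BreachIndex:
    """Records keyed by canonical address; each record keeps the actual address seen."""

    def __init__(self):
        self._records = defaultdict(list)

    def add(self, address: str, breach: str) -> None:
        # Second field: the address exactly as it appeared in the breach (lowercased).
        self._records[canonical_key(address)].append((address.lower(), breach))

    def search(self, address: str) -> list:
        # .get avoids creating empty entries for addresses that were never added.
        return list(self._records.get(canonical_key(address), []))


index = BreachIndex()
index.add("guillaume+hello@whatever.com", "XYZ")
print(index.search("guillaume@whatever.com"))
```

Searching for either the parent address or any tagged variant lands on the same record, and the stored second field still shows which alias was exposed.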
-
Anjor commented
This is a neat idea, and I'd really like to see it implemented in HIBP.
Having said that, I also question how far should HIBP go and adjust for the quirks of individual email providers. Generic sub-addressing is a standard (rfc5233) and implemented by nearly every provider, but going all the way would be much more involved than that.
A few examples of such quirks:
1) AFAIK, dot addressing is something specific to gmail. Other email providers (like Hotmail) treat JaneDoe@outlook.com and Jane.Doe@outlook.com differently.
2) For plus addressing, Yahoo allows users to create a separate prefix for disposable addresses e.g. JaneDoe@yahoo.com will have addresses of type "CustomPrefix-foo@yahoo.com" and "CustomPrefix-bar@yahoo.com" (Also note the - instead of +)
3) For domain equivalence, gmail.com and googlemail.com are equivalent, but contrary to what one of the comments below claims, JaneDoe@outlook.com and JaneDoe@hotmail.com are different accounts.
Generic plus addressing is a good idea, but I feel implementing support for every possible email address equivalence might not be as useful for the effort involved, especially when the other problems are much more easily solvable by registering just one more account in HIBP.
-
The problem with a domain alias is that it's even less predictable than the "+" pattern. There's no assurance whatsoever that all domains will continue to work consistently in an interchangeable fashion nor is there necessarily a canonical list of them for each email provider.
I'd also be surprised if many people actually used them in that way. Per the stats here, use of the "+" syntax is extremely rare and I can only imagine that domain substitution is even less so.
-
Paul M Edwards commented
Another consideration is domain aliasing. For example, these are equivalent:
JohnDoe@gmail.com
John.Doe@googlemail.com
As can be:
JaneDoe@outlook.com
JaneDoe@hotmail.com
JaneDoe@msn.com
JaneDoe@live.com
-
Robert Headley commented
I just posted a similar suggestion. Google also uses periods in the email address for filtering, so email@gmail.com and e.mail@gmail.com will go to the same user, but like the + syntax, it allows you to filter those messages.