Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as troyhunt+foo@hotmail.com where foo is a unique identifier for the site. They do this so that if they begin getting spammed, they can identify the source their email came from.
At the moment, HIBP treats this as a totally unique email address, so if I've searched for the parent email address without the "+" syntax, it won't be found. This idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was troyhunt+bar@hotmail.com that was exposed in the XYZ breach.
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address, so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be.
Edit: To add to this idea, Robert's comment about a period in the email is also very valid. I'd want to be very clear about the ubiquity of this practice across mail providers, but it's certainly a good suggestion and worth further investigation.
-
Mike Chu commented
I use a plus added to my user portion of my email address eg FirstLast+SomeUntrustedService@mailfence.com. This helps me identify email list sharing or breaches.
As a HIBP visitor, I would like the ability to search my email address and automatically include records that might have the plus symbol and trailing alpha numerics so I can see any pwnage.
Example searching for test@example.com should also return test+service123@example.com
Thanks for considering
Mike Chu
-
Mario commented
Why do the sign in and sign up boxes in the Google account page suggest that users, including myself, are without a choice other than to incorporate these *_ +, as if it wasn't an option for me? Asking everyone from Verizon, Sprint Boost, to family members, then reaching out to Google for direction? And nada... what then? I've lost my mind, completely! Or ignorant to technology and to the internet, one of the two, "ignorant" 'cause I just ain't that detached. A lack of knowledge, as said for better terminology. I knew my account was compromised and did what I was told. Change password and email. Or both in my case. Every person, either in person, by phone, or browsing help search and trying to explain in the describe issues box. I'm not that ignorant to it anymore. I kept asking. Kept reading and browsing and learning. It is actually leading me to be safer online, gain more wisdom, engage in more productive and more meaningful usage online, join in with social media societies and help where I can. I'm just trying to get answers to my online hurts, habits and hang-ups. Learning comes with mistakes. Especially when you've got to help yourself. In the matter, peace be to you.
-
Bruce Korb commented
An excellent idea!!
-
Juan commented
It would be a really good addition. I've been using this feature for a while now, but it becomes unmanageable to check each alias.
-
Koen commented
I would love to see this feature added.
Btw my ISP does not use the plus sign, but rather has two other options, either xyz@mylogin.myprovider.nl or mylogin-xyz@myprovider.nl
-
Rich Bo commented
I use the plus syntax for all my logins, it'd be great to have this supported in HIBP. I have dozens (if not hundreds) of unique email addresses that I would never know may be impacted by a breach, leaving me exposed. On one hand, I'm more secure in the masses (myname+thissite@gmail.com will have a different login and password than myname+thatsite@gmail.com) so a single breach doesn't mean I need to change hundreds of passwords. But locally I'm less secure because I never know if myname+thissite@gmail.com was part of a breach, so my password doesn't get updated.
-
John Venice commented
Paul's comment from last year seems the most feasible. An additional column in the database for an original email address (with the alias, x+y@gmail.com) and your existing email address as a normalized email (x@gmail.com). If you use the 2 repos mentioned by Not That Hard? you could then normalize on input to the database. You could even run against the existing database and just swap out the emails for normalized ones and copy the originals to the new column. Search feature then handles searches with the normalize and strip functions so it's searching for the same thing that would be found in the database.
That provides a solution to resolve the normalization of input to the database without the alias, storing the original address involved, and searching for the email by the normalized and stripped email address. Then anyone searching x+y@gmail.com or x+z@googlemail.com will in fact be searching for x@gmail.com, providing them with correct and adjusted results. You simply need to render the alias out in the results by pulling the data and offering it up next to the breach.
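A minimal sketch of that two-column idea, using an in-memory SQLite table. The table name, column names, the `normalize` helper and the `DOMAIN_ALIASES` table are all made up for illustration, and stripping "+" for every domain is an assumption that real provider rules may not support.

```python
import sqlite3

# Hypothetical domain alias table; real provider rules would need checking.
DOMAIN_ALIASES = {"googlemail.com": "gmail.com"}

def normalize(address: str) -> str:
    """Lowercase, map known domain aliases, and drop any "+tag" suffix."""
    local, _, domain = address.strip().lower().rpartition("@")
    domain = DOMAIN_ALIASES.get(domain, domain)
    local = local.split("+", 1)[0]  # assumption: "+" is a tag delimiter everywhere
    return f"{local}@{domain}"

def load_breach(db, breach, addresses):
    """Store the address exactly as leaked, next to its normalized form."""
    db.executemany(
        "INSERT INTO breached (original_email, normalized_email, breach) "
        "VALUES (?, ?, ?)",
        [(a, normalize(a), breach) for a in addresses],
    )

def search(db, address):
    """Normalize the query the same way, then report which alias was exposed."""
    rows = db.execute(
        "SELECT original_email, breach FROM breached "
        "WHERE normalized_email = ? ORDER BY original_email",
        (normalize(address),),
    )
    return rows.fetchall()

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE breached (original_email TEXT, normalized_email TEXT, breach TEXT)"
)
load_breach(db, "XYZ", ["x+y@gmail.com", "other@example.com"])
load_breach(db, "ABC", ["x+z@googlemail.com"])

print(search(db, "x@gmail.com"))
# [('x+y@gmail.com', 'XYZ'), ('x+z@googlemail.com', 'ABC')]
```

Searching for x+q@googlemail.com returns the same two rows here, because the query goes through the same `normalize` call as the stored data.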
Most people in tech expect a breach to leak an email at some point. It'd be nice to know where from and which alias. Also happens that all people in tech rely on your site to some degree, so the people standing to benefit the most are also your biggest advocates. We're all thankful for all the hard work and time you've put into this, so please don't take any of the comments as a knock on you or the site, it's just a feature that could make a huge difference to many.
-
Chris Zuber commented
Some means of normalizing emails would definitely be a great addition. Not just for the plus, but also for case and dot variations.
Having managed a website with a lot of elderly users, I know how often email addresses can be something like "First.Last@provider.tld" and that these are often inconsistent in case or dot usage.
-
Alex Guenser commented
I have been migrating some of my site information to using the + syntax, so I can better trace who gets my email from where. But I am very wary of using this too frequently, in case I can no longer check automatically whether my email and password have been leaked!
-
St. Mueller commented
Like others I have been using Yahoo's disposable addresses frequently. This useful mechanism made good sense because it has kept my main email address safe so far. The down side is that I cannot possibly imagine how to test all 141 addresses manually through HIBP and I do not know enough about working with the API to do it on my own.
Ever since I first heard about HIBP, I was hoping for such a possibility and I am hopeful that this discussion leads to some useful outcome. My first idea was to feed a list of email addresses into HIBP and that the results are sent back to these email addresses. I am sure there are things that I am overlooking but it seems reasonable that mainly the owner of the address in question should be the one informed.
Many, many thanks to the people behind this valuable tool!
-
Tom Ryder commented
Please do this; it would be very useful for me (i.e., +1). I like the idea of normalizing the emails into some canonical form as they appear in the databases.
-
Best change for HIBP commented
I'd love to see this. Maybe this year ?
Also - how many searches / subscriptions are for e-mails with +foo in user names? Maybe this will be a better argument for implementing?
-
Not that hard? commented
What about just normalizing emails? There are libraries out there that can do that with all the different rules that exist (e.g. + for gmail/outlook, - for yahoo, remove "." for gmail, change domain name for googlemail.com/protonmail.com (pm.com), etc.). Example libraries:
https://github.com/naile/canonical-emails
https://github.com/soundcloud/normailize
-
Martin commented
Lot of good comments and suggestions here. There is just one thing here that I miss. For the proof of the email address to support this you could do an e-mail verification. I understand that getting the UI right for people to specify whatever their settings are would be tricky, but let's say you could select "I have e-mails with plus suffixes", then the verification e-mail would be sent to `your.mail+generated_random_suffix@your.provider` and if you are able to verify it, at least the notification for new pwns could be working. Of course the variations would have to have their own select box or something, but that allows for incremental feature addition when people would request their particular settings. That would also show how many people actually care about anything else than "+suffix" format without the need to implement everything at once. Of course I would also love the `breachedaccount` API to support this, but I understand that is more complex to achieve.
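The verification step described above could look roughly like this. It is only a sketch: `start_verification`, `confirm` and the in-memory `pending` dict are hypothetical names, actually sending the mail is left out, and it assumes the same random token is placed in the link inside the verification e-mail.

```python
import secrets

# Hypothetical in-memory store: token -> base address awaiting confirmation.
pending = {}

def start_verification(address: str, delimiter: str = "+"):
    """Build the suffixed probe address the verification mail would be sent to."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    token = secrets.token_hex(8)  # 16 random hex characters
    pending[token] = address
    return f"{local}{delimiter}{token}@{domain}", token

def confirm(token: str):
    """Called when the link is clicked; returns the verified base address or None."""
    return pending.pop(token, None)

probe, token = start_verification("your.mail@your.provider")
print(probe)           # your.mail+<random suffix>@your.provider
print(confirm(token))  # your.mail@your.provider
```

If the mail sent to the probe address arrives and the link is clicked, that shows both that the provider delivers suffixed mail to this mailbox and that the person asking controls it. Other schemes (for example a "-" delimiter) would only need a different `delimiter` argument.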
One more thing that I have noticed here is that when Troy ran the stats, only the plus sign addresses were calculated, not the .dot syntax or other suffix/prefix combinations. And that is fine: getting better stats is as impossible as implementing this completely, because if you can do one, you can do the other. However, that shows that a partial, temporary implementation is better than nothing.
-
Anonymous commented
I think handling of these is important. I've actually used various iterations of a Gmail email address with different period placement to give the appearance of unique email addresses. This is usually for opting out of sites that publish my information using multiple versions of my name (such as whitepages.com, peoplesmart.com, etc.) where they often only accept one opt out request for an email.
An easy solution to the problem of periods used or not used at any location in a username would be to simply do a comparison of email addresses without any periods. In other words, if replace(SearchedUsername,".","")=replace(PwnedUserName,".","") then there is a match. This could be implemented such that it applies automatically to email domains known to support this (such as gmail.com), and otherwise only when the option is checked by the user. That way, you're not checking every possible iteration. It might be more efficient to have a second field in the database for a "period-less" email username, but the trade-off would of course be more storage.
Similarly, the + system could be implemented in a similar way, where the plus and everything after are stripped off for searching purposes. I've never used that system, but have been looking at it recently. It seems to me that, in most cases, the + tag is not really needed, as the sites the email was used on should often bring that into context. But, as another user suggested, you could also allow searching of the + tags as well simply so a user who received a positive result on their base email username could then check using specific tags they use.
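A rough sketch of that comparison, with the period rule applied automatically only for a listed domain and otherwise only when the option is checked. The `DOT_INSENSITIVE_DOMAINS` set and the function names are assumptions for illustration, as is treating "+" as a tag on every domain.

```python
# Assumption: domains where periods in the username are known to be ignored.
DOT_INSENSITIVE_DOMAINS = {"gmail.com"}

def comparison_key(address: str, ignore_dots: bool = False):
    """Reduce an address to the (username, domain) pair used for matching."""
    local, _, domain = address.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]  # drop the plus and everything after it
    if ignore_dots or domain in DOT_INSENSITIVE_DOMAINS:
        local = local.replace(".", "")  # the "period-less" username
    return local, domain

def is_match(searched: str, pwned: str, ignore_dots: bool = False) -> bool:
    """True when both addresses reduce to the same key."""
    return comparison_key(searched, ignore_dots) == comparison_key(pwned, ignore_dots)

print(is_match("johndoe@gmail.com", "john.doe@gmail.com"))                 # True
print(is_match("j.doe@example.com", "jdoe@example.com"))                   # False
print(is_match("j.doe@example.com", "jdoe@example.com", ignore_dots=True))  # True
print(is_match("test@example.com", "test+service123@example.com"))         # True
```

Storing the result of `comparison_key` in a second field at load time would avoid recomputing it per search, at the storage cost mentioned above.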
-
/dev/rob0 commented
Without reading the whole comments here (just the first page, sorry), there really is no way to do this. The Postfix MTA allows multiple, configurable delimiters. And you can never be sure that a delimiter character IS a delimiter. I can create an address like this: <"Eat @ Joe's One-Stop + Get Gas"@example.com>, and it would be perfectly valid. The localpart can contain ANY printable 7-bit ASCII characters from decimal 32 to 127 inclusive; the limit is max 63 characters. Don't take my word for it, see RFC 5322.
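To illustrate the point, here is a small sketch of a deliberately conservative rule: strip a tag only when the domain is in a table of known delimiters and the local part is not quoted, and otherwise leave the address alone. The `KNOWN_DELIMITERS` table and `strip_tag` are hypothetical, and the single entry in the table is just an example.

```python
# Hypothetical per-domain table; a real one would have to be maintained by hand.
KNOWN_DELIMITERS = {"gmail.com": "+"}

def strip_tag(address: str) -> str:
    """Remove a tag only where the delimiter convention is known for the domain."""
    local, sep, domain = address.rpartition("@")
    delimiter = KNOWN_DELIMITERS.get(domain.lower())
    if not sep or delimiter is None or local.startswith('"'):
        # Unknown convention, or a quoted local part where "+" may be literal.
        return address
    base = local.split(delimiter, 1)[0]
    return f"{base}@{domain}" if base else address

quoted = '"Eat @ Joe\'s One-Stop + Get Gas"@example.com'
print(strip_tag(quoted) == quoted)          # True: left untouched
print(strip_tag("troyhunt+foo@gmail.com"))  # troyhunt@gmail.com
print(strip_tag("user+tag@example.org"))    # user+tag@example.org
```

This does not answer the objection in general; it only shows that a rule limited to known domains never rewrites an address like the quoted one.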
-
Steve Work commented
I get the complexity concern, and that '+' and '.' don't cover the bases (qmail used '-', etc.), and that broad-population email sampling shows low use. Sounds like Troy gets that the population of HIBP users isn't representative so broad numbers don't necessarily apply, and that it's pretty easy to know the full set of aliases at my end and check them all. But it sure seems like there's middle ground between flat "no" and covering all possible cases deterministically. It might be fun to work through some of them.
-
Anonymous commented
I know this will likely never get implemented, and that using aliases to prevent spam is easily circumvented by spammers stripping it again.
But it was such an innocent feature to use, without any consequence, and HIBP is the first time I've had an issue with it, and now my security is slightly worse for it. Is there no middle ground?
-
Anonymous commented
I use the plus syntax for all my logins, it'd be great to have this supported in HIBP
-
Anonymous commented
I'd love to have this implemented to filter out '.' and '+foo' for email domains that allow this, specifically @gmail.com
Wouldn't it be quite easy to follow a recipe like this:
In a breach Troy could do the following for email addresses which are known to implement such a decoration scheme:
- add another record where all decorations have been removed but don't count it to the number of breached email addresses
* found in a breach: john.doe.the.greatest+somesite@gmail.com
-> add also johndoethegreatest@gmail.com as kind of hidden record for breach, too
When john.doe.the.greatest@gmail.com subscribes to be notified, he should get the notification that john.doe.the.greatest+somesite@gmail.com appeared in a breach!
I understand there is some complexity involved here and email decoration might not be used by many, but those who do it on purpose, do it with something in mind.
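The recipe above could be sketched like this. Everything here is illustrative: the `BreachIndex` class, the `DECORATING_DOMAINS` set and the `undecorate` helper are made-up names, and the Gmail rules are assumed rather than checked.

```python
from collections import defaultdict

# Assumption: domains known to implement "." and "+foo" decoration.
DECORATING_DOMAINS = {"gmail.com"}

def undecorate(address: str) -> str:
    """Remove decorations, but only for domains known to support them."""
    local, _, domain = address.lower().rpartition("@")
    if domain in DECORATING_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"

class BreachIndex:
    def __init__(self):
        self.hidden = defaultdict(list)      # undecorated -> [(exposed, breach)]
        self.subscribers = defaultdict(set)  # undecorated -> addresses as entered
        self.counts = {}                     # breach -> number of real addresses
        self.notifications = []              # (subscriber, exposed, breach)

    def subscribe(self, address: str) -> None:
        self.subscribers[undecorate(address)].add(address)

    def load_breach(self, breach: str, addresses: list) -> None:
        # Hidden records are not counted towards the breach total.
        self.counts[breach] = len(set(addresses))
        for exposed in addresses:
            key = undecorate(exposed)
            self.hidden[key].append((exposed, breach))
            for subscriber in sorted(self.subscribers.get(key, ())):
                self.notifications.append((subscriber, exposed, breach))

index = BreachIndex()
index.subscribe("john.doe.the.greatest@gmail.com")
index.load_breach(
    "XYZ",
    ["john.doe.the.greatest+somesite@gmail.com", "someone@example.com"],
)
print(index.notifications)
print(index.counts["XYZ"])  # 2
```

The notification carries the decorated address as it appeared in the breach, so the subscriber can tell which alias was exposed.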