Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as email+foo@example.com, where foo is a unique identifier for the site. They do this so that if they begin getting spammed, they can identify the source the spammer got their address from.
At the moment, HIBP treats this as a totally unique email address, so if I search for the parent email address without the "+" syntax, it won't be found. The idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was firstname.lastname@example.org that was exposed in the XYZ breach.
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address, so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be.
Edit: To add to this idea, Robert's comment about a period in the email is also very valid. I'd want to be very clear about the ubiquity of this practice across mail providers, but it's certainly a good suggestion and worth further investigation.
Lots of good comments and suggestions here. There is just one thing I miss. To prove that an email address supports this, you could do an e-mail verification. I understand that getting the UI right for people to specify whatever their settings are would be tricky, but let's say you could select "I have e-mails with plus suffixes"; the verification e-mail would then be sent to `email@example.com`, and if you are able to verify it, at least the notifications for new pwns could work. Of course the other variations would need their own select box or something, but that allows features to be added incrementally as people request their particular settings. It would also show how many people actually care about anything other than the "+suffix" format, without the need to implement everything at once. Of course I would also love the `breachedaccount` API to support this, but I understand that is more complex to achieve.
One more thing I have noticed is that when Troy ran the stats, only the plus sign addresses were counted, not the .dot syntax or other suffix/prefix combinations. That is fine: getting better stats is as impossible as implementing this completely, because if you can do one, you can do the other. However, it shows that a partial, temporary implementation is better than nothing.
I think handling of these is important. I've actually used various iterations of a Gmail email address with different period placement to give the appearance of unique email addresses. This is usually for opting out of sites that publish my information using multiple versions of my name (such as whitepages.com, peoplesmart.com, etc.) where they often only accept one opt out request for an email.
An easy solution to the problem of periods used or not used at any location in a username would be to simply compare the email addresses without any periods. In other words, if replace(SearchedUsername,".","")=replace(PwnedUserName,".","") then there is a match. This could be implemented such that it applies automatically to email domains known to support this (such as gmail.com), and otherwise only when the option is checked by the user. That way, you're not checking every possible iteration. It might be more efficient to have a second field in the database for a "period-less" email username, but the trade-off would of course be more storage.
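As a rough illustration of that comparison, here is a minimal Python sketch. It is not how HIBP works; the function names, the `opted_in` flag, and the one-entry domain list are all assumptions made for the example, and lowercasing the address is a further simplification.

```python
# Domains assumed (for illustration only) to ignore periods in the local part.
DOT_INSENSITIVE_DOMAINS = {"gmail.com"}


def strip_periods(address: str, opted_in: bool = False) -> str:
    """Return a comparison key with periods removed from the local part.

    Periods are removed only for domains in DOT_INSENSITIVE_DOMAINS, or for
    any domain when the user has explicitly opted in.
    """
    # Split at the last "@" so the domain is never altered.
    local, _, domain = address.strip().lower().rpartition("@")
    if domain in DOT_INSENSITIVE_DOMAINS or opted_in:
        local = local.replace(".", "")
    return f"{local}@{domain}"


def same_account(searched: str, pwned: str, opted_in: bool = False) -> bool:
    """Equivalent of replace(Searched,".","") = replace(Pwned,".","")."""
    return strip_periods(searched, opted_in) == strip_periods(pwned, opted_in)
```

Storing the output of `strip_periods` in a second column would correspond to the "period-less" field mentioned above: one extra value per row in exchange for an indexed lookup instead of a per-row replace.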
The + system could be implemented in a similar way, where the plus and everything after it are stripped off for searching purposes. I've never used that system, but have been looking at it recently. It seems to me that, in most cases, the + tag is not really needed, as the sites the email was used on should often bring that into context. But, as another user suggested, you could also allow searching of the + tags, simply so that a user who received a positive result on their base email username could then check using the specific tags they use.
Without having read all the comments here (just the first page, sorry), there really is no way to do this. The Postfix MTA allows multiple, configurable delimiters. And you can never be sure that a delimiter character IS a delimiter. I can create an address like this: <"Eat @ Joe's One-Stop + Get Gas"@example.com>, and it would be perfectly valid. A quoted localpart can contain ANY printable 7-bit ASCII character from decimal 32 to 126 inclusive (quotes and backslashes must be escaped); the limit is max 64 characters. Don't take my word for it, see RFC 5322 (the length limit is in RFC 5321).
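A small sketch of the stripping step, in Python. The function name and the configurable `delimiter` parameter are assumptions for illustration. The tag is returned rather than thrown away, so a result could still say which alias was exposed.

```python
from typing import Optional, Tuple


def split_plus_tag(address: str, delimiter: str = "+") -> Tuple[str, Optional[str]]:
    """Split 'user+tag@domain' into ('user@domain', 'tag').

    Returns (address, None) when the local part has no delimiter, when the
    address has no "@", or when nothing precedes the delimiter.
    """
    local, at, domain = address.rpartition("@")
    # Only the first delimiter separates the base from the tag.
    base, found, tag = local.partition(delimiter)
    if not at or not found or not base:
        return address, None
    return f"{base}@{domain}", tag
```

For example, `split_plus_tag("jane+shop@example.com")` gives `("jane@example.com", "shop")`. This simple split does not understand quoted local parts, so it is only suitable for domains where the delimiter is known to be an alias marker.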
Steve Work commented
I get the complexity concern, and that '+' and '.' don't cover the bases (qmail used '-', etc.), and that broad-population email sampling shows low use. Sounds like Troy gets that the population of HIBP users isn't representative so broad numbers don't necessarily apply, and that it's pretty easy to know the full set of aliases at my end and check them all. But it sure seems like there's middle ground between flat "no" and covering all possible cases deterministically. It might be fun to work through some of them.
I know this will likely never get implemented, and that using aliases to prevent spam is easily circumvented by spammers stripping it again.
But it was such an innocent feature to use, without any consequence, and HIBP is the first time I've had an issue with it, and now my security is slightly worse for it. Is there no middle ground?
I use the plus syntax for all my logins; it'd be great to have this supported in HIBP.
I'd love to have this implemented to filter out '.' and '+foo' for email domains that allow this, specifically @gmail.com
Wouldn't it be quite easy to follow a recipe like this?
For email addresses in a breach that are known to implement such a decoration scheme, Troy could do the following:
- add another record where all decorations have been removed, but don't count it towards the number of breached email addresses
- example: found in a breach: firstname.lastname@example.org
-> also add email@example.com as a kind of hidden record for the breach
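The recipe above could be sketched roughly as follows in Python. Everything here is hypothetical: the `BreachIndex` class, the `undecorate` rules ("+tag" and periods), and the idea of passing in a set of domains known to support decoration are assumptions for the example, not a description of HIBP's storage.

```python
from collections import defaultdict


def undecorate(address: str) -> str:
    """Strip '+tag' and periods from the local part (assumed decoration rules)."""
    local, _, domain = address.lower().rpartition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


class BreachIndex:
    def __init__(self, decorating_domains):
        self.decorating_domains = set(decorating_domains)
        self.exact = set()               # addresses as they appear in the breach
        self.hidden = defaultdict(set)   # undecorated key -> original addresses

    def _decorates(self, address: str) -> bool:
        return address.rpartition("@")[2] in self.decorating_domains

    def add(self, address: str) -> None:
        address = address.lower()
        self.exact.add(address)
        if self._decorates(address):
            # Hidden record: searchable, but not part of the breach count.
            self.hidden[undecorate(address)].add(address)

    @property
    def breached_count(self) -> int:
        return len(self.exact)

    def lookup(self, address: str) -> list:
        address = address.lower()
        hits = {address} & self.exact
        if self._decorates(address):
            hits |= self.hidden.get(undecorate(address), set())
        return sorted(hits)
```

Because `lookup` returns the original addresses, the person searching can see which decorated alias was in the breach, while `breached_count` still reflects only the real rows.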
I understand there is some complexity involved here and email decoration might not be used by many, but those who do it on purpose, do it with something in mind.
There are also a million ways to do this, e.g. not using a syntax at all, or just prefixing or appending the service name. Not sure how you could accommodate all of them.
I just voted in favor, but I understand Troy's perspective.
better enable .dot syntax for google mail accounts!
I understand the problems with complexity of the + (stripping useful information, etc) but what about periods? If my email is
I can put periods anywhere in that email address and they will all resolve to me.
You're not adding a suffix, just inserting periods. How complex would it be to detect variations like that? Perhaps a way for a user to opt-in, noting "please check for period variations on my email address"? That way you're not scanning an astronomical number of permutations against a list. For example, knowing I've opted in and my email starts with "pr" look for all breached emails starting with the letters "pr" and strip the periods from them in a temporary list, and compare with mine?
The reason I ask is because I've used a lot of period variations and can't always remember the way I've used them. I only got notification of a recent breach from a vendor themselves, not HIBP. I could add as many as I could remember to the HIBP mailing list, but don't want to stuff your database with that if this will be added at some point.
IT should work now! commented
Maybe as an alternative you could allow a 'batch' search feature, where I could load all my <name>+<website>@gmail.com addresses and get a report with a summary of matched results for each?
The onus would be on me to maintain my list of addresses (not hard, a CSV export from 1Password would do it).
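A batch check along these lines could look like the Python sketch below. The `lookup` callable is a stand-in for whatever actually queries breaches for one address, and the CSV column name is a parameter because no particular export format is assumed here.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def batch_report(
    csv_text: str,
    column: str,
    lookup: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each distinct address in `column` to the breaches `lookup` returns."""
    report: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        address = (row.get(column) or "").strip()
        # Skip blank cells and addresses that were already checked.
        if address and address not in report:
            report[address] = list(lookup(address))
    return report
```

Addresses with an empty list in the report had no match, which gives the per-address summary described above.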
No change to status as of now, complexity remains the same as does prevalence of use.
This feature is by far the most wanted feature for HIBP on UserVoice.
Not many people use email filtering that way, but almost everyone who does is a tech-y person who uses your website and services.
It may not be significant overall but I believe it's a more significant trend within your own users.
What are your thoughts on this feature as of now?
Robert Reiser commented
I am using mail extensions (the "+" syntax) extensively to protect myself against spammers and data leaks, and therefore would like to see this feature supported. As it stands, I would have to submit approx. 150 different email addresses to "Notify me" in order to keep myself updated.
Perhaps a different, simpler approach could work? If someone queries a regular email address (e.g. firstname.lastname@example.org), just perform the query as usual, by searching for an exact match. If someone queries an address in the format "email@example.com", could you do a regex search against the database(s)?
With this approach, it might be easier for both the affected users and the site operators to achieve something acceptable for both sides. Thanks for consideration.
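To make the regex idea concrete, here is a minimal Python sketch. It assumes the special query format means "this base address plus any +suffix"; the function names are made up for the example, and a plain list stands in for the database.

```python
import re


def plus_alias_pattern(base_address: str) -> "re.Pattern[str]":
    """Build a regex matching base_address with an optional '+tag' suffix."""
    local, _, domain = base_address.rpartition("@")
    # re.escape keeps characters such as "." in the address literal.
    return re.compile(
        rf"^{re.escape(local)}(?:\+[^@]*)?@{re.escape(domain)}$",
        re.IGNORECASE,
    )


def find_aliases(base_address: str, breached_addresses):
    """Return every breached address that is the base or a '+tag' alias of it."""
    pattern = plus_alias_pattern(base_address)
    return [a for a in breached_addresses if pattern.match(a)]
```

Note that a regex scan like this touches every row, so on a large data set it would likely need to be combined with an indexed prefix filter or a precomputed base-address column.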
AK Prashant commented
I do use email alias services from email service providers (ESPs) like GMail, Outlook & Yahoo. All three have different implementation of alias email address w.r.t. usage of syntax and domain names. Such inconsistencies among ESPs have already been commented here.
I was wondering why these popular ESPs aren't using the domain search (https://haveibeenpwned.com/DomainSearch) or API (https://haveibeenpwned.com/API/v2) to inform their end-users of possible breaches of pwned websites. This is similar to how domain administrators are expected to use HIBP.
For example: Yahoo informing me that my private details stored at Adobe were unintentionally exposed and stating that it got this information as-is from HIBP.
Advantages of ESP directly using the services of HIBP instead of the end-user:
1. The ESP is better equipped to inform the end-user with an enumerated list of breaches associated with each of their alias email addresses (Yahoo allows up to 500 disposable email addresses to be created by a user) and also the primary email address. Hence, the handling of exposed alias addresses is implemented by the respective ESP and not HIBP. Users also need not register all their alias email addresses with HIBP.
• FirstNameLastName@yahoo.com was pwned in LinkedIn, Twitter, Yahoo.
• BaseName-Adobe@yahoo.com was pwned in Adobe.
• BaseName-FreshMenu@yahoo.com was pwned in FreshMenu.
2. I read https://www.troyhunt.com/the-legitimisation-of-have-i-been-pwned/ Unlike Amazon or Opentable, most of the internet-related service providers (IRSP) are neither proactive in handling nor do they provide timely alerts to the user of possible breaches. Example: FreshMenu decided NOT to notify its impacted customers. https://www.thenewsminute.com/article/data-breach-freshmenu-leaked-data-110k-users-2016-co-didn-t-inform-users-88195 This might be due to multiple reasons like economic reasons to reduce cost, avoid time spent in pacifying impacted customers, reduce the chances of start-up losing to competition in a new & growing market, weaker Data Protection law and enforcement in India. So it would be better if the appropriate ESP using the services of HIBP alerts the user irrespective of the breached IRSP doing it or not.
3. The total number of ESPs (free, govt., educational & corporate) is smaller than that of all other varied kind of IRSP. Hence the efforts of HIBP, ESPs and IRSP are better channelized in mitigating the effects of a breach and each of them doing what they do the best.
Stéphane Gourichon commented
# Failure and success of the suffix idea
## Failure of firstname.lastname@example.org option
I agree with Troy here, I see no clean easy handling of + suffix.
For example, whitelisting "providers known to use +syntax" obviously fails with independent sites.
Actually, the suffix trick is really (and should be) an implementation detail, not something that can be detected and handled.
For anyone not convinced: you have to know that the character triggering suffix behavior is not standardized. In Postfix, for example, you can set whatever character you want.
And indeed I chose an alternate character because of so many braindead sites that forbid the + character when registering.
## Success of the *@mydomain.org or *@mysubdomain.majorprovider.com option
If the e-mail provider really allows you a full subdomain, so that you can receive mail at e.g. email@example.com or one of the other options, then just use HIBP Domain Search!
Now, where is the form to register for domain notification?
The feature is mentioned on https://haveibeenpwned.uservoice.com/forums/275398-general/suggestions/15109419-include-email-addresses-or-some-info-for-domain yet not seen on main site.
Thank you for any hint.
Joshua Bowden commented
While I am an avid user of the '+' aliases along with a wildcard address (my own domain on Gmail) and would support a functionality like this, I've read the comments and have to agree with Troy's stance on this one: there are a number of good reasons why it's not worth the investment.
Perhaps a simple solution is to do this on the user end - A simple program to read all the 'to' addresses with a '+' (or just every to field which reached the mailbox, which would support wildcard users) in them via Gmail API and then using HIBP API to check each of those addresses?
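The address-collecting half of that idea can be sketched with the Python standard library alone. How the raw messages are fetched (for instance via the Gmail API) and how each address is then checked against the HIBP API are left out on purpose; the function name and the choice of the To and Cc headers are assumptions for the example.

```python
from email import message_from_string
from email.utils import getaddresses


def plus_recipients(raw_messages):
    """Collect every To/Cc address whose local part contains a '+'."""
    found = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        # getaddresses parses display names and comma-separated lists.
        for _name, addr in getaddresses(headers):
            local = addr.rpartition("@")[0]
            if "+" in local:
                found.add(addr.lower())
    return sorted(found)
```

Dropping the `"+" in local` test would collect every recipient address instead, which is the variant that would also cover wildcard users.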
Banban Daudau commented
"Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be."
I guess running those stats took way more work than a simple implementation would (removing the +... part before running the db search for '+%'). Also, did you check whether the Adobe breach was a good choice for running this stat? If I'm not wrong, the + feature started to appear in 2002 and most Adobe accounts were created before that time (I really may be wrong, please verify).
Please add the + (and the period) for people who intentionally use it as a way of automatically filtering their email (me) or locating where spam could have originated.