Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as troyhunt+foo@hotmail.com where foo is a unique identifier for the site. They do this so that if they begin getting spammed, they can identify the source their email came from.
At the moment, HIBP treats this as a totally unique email address, so if I search for the parent email address without the "+" syntax, it won't be found. This idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was troyhunt+bar@hotmail.com that was exposed in the XYZ breach.
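As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not how HIBP works; the function names, the sample rows and the choice to lowercase addresses are all assumptions made for the example. It searches by parent address but still reports the exact alias that appeared in each breach.

```python
from collections import defaultdict


def split_plus_alias(address):
    """Return (parent_address, tag); tag is None when the local part has no '+'."""
    local, _, domain = address.rpartition("@")
    base, plus, tag = local.partition("+")
    # Lowercasing is an assumption of this sketch, not a rule of the email standards.
    return f"{base}@{domain}".lower(), (tag if plus else None)


def search_parent(parent, breached):
    """Map breach name -> exact addresses whose parent equals `parent`.

    `breached` is an iterable of (address, breach_name) pairs (made-up data).
    """
    hits = defaultdict(list)
    for address, breach in breached:
        if split_plus_alias(address)[0] == parent.lower():
            hits[breach].append(address)  # keep the alias exactly as it was exposed
    return dict(hits)


rows = [
    ("troyhunt+bar@hotmail.com", "XYZ"),
    ("troyhunt@hotmail.com", "Adobe"),
    ("someone@example.com", "XYZ"),
]
print(search_parent("troyhunt@hotmail.com", rows))
```

Because the stored address is never rewritten, the "+bar" part stays visible in the result.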
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be.
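For anyone who wants to check the percentage, the arithmetic from the two figures quoted above is short enough to write out. The variable names are just for this example.

```python
total_rows = 152_989_508  # rows in the Adobe dump, as quoted above
plus_rows = 49_905        # addresses containing a "+", as quoted above

share = plus_rows / total_rows
print(f"{share:.4%}")                 # share of rows with a "+"
print(round(total_rows / plus_rows))  # roughly one address in this many
```

That works out to about 0.03%, or roughly one address in every 3,066.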
Edit: To add to this idea, Robert's comment about a period in the email is also very valid. I'd want to be very clear about the ubiquity of this practice across mail providers, but it's certainly a good suggestion and worth further investigation.
-
No change to status as of now, complexity remains the same as does prevalence of use.
-
Anonymous commented
Hello Troy,
This feature is by far the most wanted feature for HIBP on UserVoice.
Not many people use email filtering that way, but almost everyone who does is a tech-y person who uses your website and services. It may not be significant overall, but I believe it's a more significant trend within your own users.
What are your thoughts on this feature as of now?
-
Robert Reiser commented
I am using mail extensions (the "+" syntax) extensively to protect myself against spammers and data leaks, and therefore would like to see this feature supported. As things stand, I would have to submit approx. 150 different email addresses to "Notify me" in order to keep myself updated.
Perhaps a different, simpler approach could work? If someone queries a regular email address (e.g. first.last@domain.com), just perform the query as usual, by searching for an exact match. If someone queries an address in the format "first.last+@domain.com", could you do a regex search against the database(s)?
With this approach, it might be easier for both the affected users and the site operators to achieve something acceptable for both sides. Thanks for consideration.
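The query format suggested above could be sketched roughly as follows. This is only an illustration in Python: `alias_pattern`, `search` and the in-memory list of addresses are hypothetical, and a real database would need its own indexing strategy rather than a scan.

```python
import re


def alias_pattern(query):
    """Build a regex for 'local+@domain' queries; return None for ordinary addresses."""
    local, _, domain = query.rpartition("@")
    if not local.endswith("+"):
        return None
    base = local[:-1]
    # Match the bare address or any '+suffix' variant of it, escaping dots etc.
    return re.compile(
        rf"^{re.escape(base)}(\+[^@]*)?@{re.escape(domain)}$", re.IGNORECASE
    )


def search(query, addresses):
    """Exact match for normal queries, regex match for 'local+@domain' queries."""
    pattern = alias_pattern(query)
    if pattern is None:
        return [a for a in addresses if a.lower() == query.lower()]
    return [a for a in addresses if pattern.match(a)]


stored = [
    "first.last@domain.com",
    "first.last+shop@domain.com",
    "firstxlast+shop@domain.com",
    "first.last+shop@other.com",
]
print(search("first.last+@domain.com", stored))
print(search("first.last@domain.com", stored))
```

Only queries that end the local part with "+" trigger the wider match, so ordinary searches behave exactly as they do today.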
-
AK Prashant commented
I do use email alias services from email service providers (ESPs) like GMail, Outlook & Yahoo. All three have different implementations of alias email addresses w.r.t. usage of syntax and domain names. Such inconsistencies among ESPs have already been commented on here.
I was wondering why these popular ESPs aren't using the domain search (https://haveibeenpwned.com/DomainSearch) or API (https://haveibeenpwned.com/API/v2) to inform their end users of possible breaches of pwned websites. This is similar to how domain administrators are expected to use HIBP.
For example: Yahoo informing me that my private details stored at Adobe were unintentionally exposed and stating that it got this information as-is from HIBP.
Advantages of the ESP directly using the services of HIBP instead of the end user:
1. The ESP is better equipped to inform the end user with an enumerated list of breaches associated with each of their alias email addresses (Yahoo allows up to 500 disposable email addresses to be created by a user) and also the primary email address. Hence, exposed alias addresses are handled by the respective ESP and not HIBP. The user also need not register every alias email address with HIBP.
• FirstNameLastName@yahoo.com was pwned in LinkedIn, Twitter, Yahoo.
• BaseName-Adobe@yahoo.com was pwned in Adobe.
• BaseName-FreshMenu@yahoo.com was pwned in FreshMenu.
2. I read https://www.troyhunt.com/the-legitimisation-of-have-i-been-pwned/ Unlike Amazon or Opentable, most of the internet-related service providers (IRSP) are neither proactive in handling nor do they provide timely alerts to the user of possible breaches. Example: FreshMenu decided NOT to notify its impacted customers. https://www.thenewsminute.com/article/data-breach-freshmenu-leaked-data-110k-users-2016-co-didn-t-inform-users-88195 This might be due to multiple reasons like economic reasons to reduce cost, avoid time spent in pacifying impacted customers, reduce the chances of start-up losing to competition in a new & growing market, weaker Data Protection law and enforcement in India. So it would be better if the appropriate ESP using the services of HIBP alerts the user irrespective of the breached IRSP doing it or not.
3. The total number of ESPs (free, govt., educational & corporate) is smaller than that of all the other varied kinds of IRSP. Hence the efforts of HIBP, ESPs and IRSPs are better channelized in mitigating the effects of a breach, with each of them doing what they do best.
-
Stéphane Gourichon commented
# Failure and success of the suffix idea
## Failure of my.email+anysuffix@anydomain.tld option
I agree with Troy here, I see no clean easy handling of + suffix.
For example, whitelisting "providers known to use +syntax" obviously fails with independent sites.
Actually, the suffix trick is really (and should be) an implementation detail, not something that can be detected and handled.
For anyone not convinced: the character that triggers the suffix behavior is not standardized. In postfix, for example, you can set whatever character you want.
And indeed I chose an alternate character because of so many braindead sites that forbid the + character when registering.
## Success of the *@mydomain.org or *@mysubdomain.majorprovider.com option
If the e-mail provider really allows you a full subdomain, you can receive mail at e.g. postmaster@mysubdomain.majorprovider.com or one of the other options, then just use HIBP Domain Search!
Now, where is the form to register for domain notification?
The feature is mentioned on https://haveibeenpwned.uservoice.com/forums/275398-general/suggestions/15109419-include-email-addresses-or-some-info-for-domain yet not seen on main site.
Thank you for any hint.
-
Joshua Bowden commented
While I am an avid user of the '+' aliases along with a wildcard address (my own domain on Gmail) and would support a functionality like this, I've read the comments and have to agree with Troy's stance on this one: there are a number of good reasons why it's not worth the investment.
Perhaps a simple solution is to do this on the user end - A simple program to read all the 'to' addresses with a '+' (or just every to field which reached the mailbox, which would support wildcard users) in them via Gmail API and then using HIBP API to check each of those addresses?
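A very rough Python sketch of that user-side idea follows. Fetching the "To" headers from the Gmail API and calling the HIBP API are deliberately left out; `lookup` is a placeholder the reader would have to supply, and both function names are made up for this example. Only the header parsing uses a real library call (`email.utils.getaddresses` from the standard library).

```python
from email.utils import getaddresses


def recipient_aliases(to_headers, mailbox_domain):
    """Collect the distinct recipient addresses at `mailbox_domain` from raw To headers."""
    found = set()
    for _name, addr in getaddresses(to_headers):
        addr = addr.lower()
        if addr.endswith("@" + mailbox_domain.lower()):
            found.add(addr)
    return sorted(found)


def check_all(addresses, lookup):
    """Call `lookup(address)` for each address; keep only those with results.

    `lookup` is a stand-in for whatever breach check the user wires up.
    """
    results = {}
    for address in addresses:
        breaches = lookup(address)
        if breaches:
            results[address] = breaches
    return results


headers = [
    "Me <me+shop@example.com>, Other <other@elsewhere.net>",
    "me+news@example.com",
]
fake_results = {"me+shop@example.com": ["Shop"]}  # made-up data for the demo
aliases = recipient_aliases(headers, "example.com")
print(check_all(aliases, lambda a: fake_results.get(a, [])))
```

Collecting every address at the mailbox's domain, rather than only those with a "+", also covers the wildcard case mentioned above.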
-
Banban Daudau commented
"Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be."
I guess running those stats took way more work than a simple implementation would (removing the +... part before running the db search, or searching for '+%'). Also, did you check whether the Adobe breach was a good choice for this stat? If I'm not wrong, the + feature started to appear in 2002 and most Adobe accounts were created before then (I really may be wrong, please verify).
-
parra commented
Please add the + (and the period) for people who intentionally use it as a way of automatically filtering their email (me) or locating where spam could have originated.
-
Addi Tiff commented
Maybe, someday, haveibeenpwned.com will even be able to convert (a) to @ !?
Amazing what computers can do, right?
-
R commented
Add the + (plus) and . (period) already, please.
I've always done this. If you want to be respected by the techies who use it and who tell their users to do it, then you will support this function, and more users will follow the advice of their techies.
-
Graham Bull commented
I've been using Gmail with pluses for years. It's annoying when sites don't support this - and there are still plenty of them.
I'm now considering doing something like what Stephen Turner does - e.g. linkedin@example.com, without using pluses.
Not as good as HIBP supporting pluses, but it does allow you to use HIBP's domain search facility.
-
Titus commented
I have to check every unique address separately. Why? Instead of just xxxx@dds.nl, my provider also supplies <enter_anything_here>@xxxx.dds.nl.
I give every company its "own" address. I use this to automatically filter my email, move it to folders, and block addresses that have been leaked. I can also prove to a company that they are the source of the breach.
But as it is even more specific than + addressing, I have little hope for functionality like <wildcard>@xxxx.dds.nl.
I can imagine the possibility for abuse. Maybe this is acceptable when wildcard is only allowed when there are 3 parts after the @?
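The "3 parts after the @" rule could be checked with something like the Python below. This is only a sketch: `wildcard_allowed` is a made-up name and "*" stands in for <wildcard>. Note that a plain label count treats a registrable domain under a multi-label suffix such as co.uk as if it were a personal subdomain, so it would not stop abuse on its own.

```python
def wildcard_allowed(pattern):
    """Allow '*@a.b.c' style patterns only when the domain has at least three labels."""
    local, _, domain = pattern.rpartition("@")
    labels = [label for label in domain.split(".") if label]
    return local == "*" and len(labels) >= 3


print(wildcard_allowed("*@xxxx.dds.nl"))  # personal subdomain: three labels
print(wildcard_allowed("*@dds.nl"))       # the whole provider: two labels
```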
-
Stephen Turner commented
As an addition to this, I use the format linkedin.com@example.com with my domains. Searching the data for *.com@* or other TLDs may be useful to help identify the sources of the breach.
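A search like *.com@* can be tried out with shell-style wildcards from Python's standard library. The sketch below is only an illustration of the pattern; `site_tagged` and the sample addresses are invented for the example.

```python
from fnmatch import fnmatchcase


def site_tagged(addresses, pattern="*.com@*"):
    """Return addresses whose local part ends with a site-like name such as 'linkedin.com'."""
    return [a for a in addresses if fnmatchcase(a.lower(), pattern)]


sample = [
    "linkedin.com@example.com",
    "bob@example.com",
    "adobe.com@example.org",
]
for address in site_tagged(sample):
    # The local part is the likely source of the leak.
    print(address.split("@", 1)[0], "->", address)
```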
-
If everything after the + is stripped, that information is no longer available to the owner of the address. For example, if I load a spam list and someone used "+netflix" then they no longer know it came from Netflix. Yes, they'd have to explicitly check that address, but many people also have domain-wide searches and this would screw that up.
In short, nothing yet has changed with this idea: the pattern is still at very close to 0% usage and the same barriers still exist.
-
Anonymous commented
I'm not sure who loses information about where the breach came from. Could you clarify that?
And I think that if you choose to implement a feature to check against these cases, you will have to do it on a provider by provider basis anyways. Like you mentioned earlier, some providers have different rules (I'll keep in mind that outlook also has this awesome feature).
What about this?
When checking out a breach, instead of just stripping when there's + syntax, you create another column, e.g. "base_email", and if the email uses + syntax and is from a provider that is known to use + syntax, assign a stripped version of the email, else just the normal email address (or None)?
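To make the extra-column idea concrete, here is a small Python and SQLite sketch. Everything in it is hypothetical: the table layout, the `PLUS_PROVIDERS` list (illustrative only, not an authoritative list of providers), the function names and the sample rows. The original address stays in its own column and only `base_email` is derived.

```python
import sqlite3

# Illustrative only: which providers honour '+' is an assumption of this sketch.
PLUS_PROVIDERS = {"gmail.com", "outlook.com"}


def base_email(address):
    """Strip '+suffix' only for providers on the list; otherwise return the address as-is."""
    local, _, domain = address.lower().rpartition("@")
    if domain in PLUS_PROVIDERS and "+" in local:
        return f"{local.split('+', 1)[0]}@{domain}"
    return address.lower()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE breached (email TEXT, base_email TEXT, breach TEXT)")
rows = [
    ("alice+netflix@gmail.com", "Netflix"),
    ("alice+shop@example.org", "Shop"),
    ("alice@gmail.com", "Adobe"),
]
conn.executemany(
    "INSERT INTO breached VALUES (?, ?, ?)",
    [(email, base_email(email), breach) for email, breach in rows],
)


def notify_matches(subscriber):
    """Find breached rows for a subscriber by base_email, returning the original alias."""
    cur = conn.execute(
        "SELECT email, breach FROM breached WHERE base_email = ? ORDER BY email",
        (subscriber.lower(),),
    )
    return cur.fetchall()


print(notify_matches("alice@gmail.com"))
```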
And then later when you're sending out notifications, also check the base_email?
-
Then you lose the information about where the breach likely came from which, in cases like the last breach, is very important to people. Plus, applying this to one sole email provider feels exceptionally dirty and misses the same pattern used by other providers (i.e. outlook.com).
-
Anonymous commented
What if you just stripped everything after the "+" and maybe the dots, but only for Gmail addresses?
Since they're the largest email provider that actually ignores dots and everything after the plus (I think).
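A Gmail-only version of this could look like the Python sketch below. It assumes, as the comment does, that gmail.com ignores dots and anything after the plus; `gmail_key` and `group_by_key` are names invented for the example. Grouping by the normalised key shows how several stored addresses can collapse into one.

```python
from collections import defaultdict


def gmail_key(address):
    """Normalise gmail.com addresses (drop '+suffix' and dots); leave others untouched."""
    local, _, domain = address.lower().rpartition("@")
    if domain != "gmail.com":
        return address.lower()
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def group_by_key(addresses):
    """Group the original addresses under their normalised key."""
    groups = defaultdict(list)
    for address in addresses:
        groups[gmail_key(address)].append(address)
    return dict(groups)


print(group_by_key([
    "john.smith@gmail.com",
    "johnsmith+a@gmail.com",
    "j.o.h.n@outlook.com",
]))
```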
There is a possibility of getting multiple entries by doing so, because besides finding out who added you to a mailing list and filtering mail, another use of the plus and dots is registering multiple accounts with the same email address on one site.
-
seizedengine commented
Adding my vote to this. It's completely understandable that it is a significant development effort for a very small percentage of people; however, that group would appreciate it greatly. I think the users who do plus aliasing are also a group that is strongly security aware and more likely to be subscribers to HIBP. In fact, in posting this comment I am using a plus-aliased account.
-
Anonymous commented
I think people who use + emails are both more likely to use haveibeenpwned and less likely to have their passwords compromised due to being more selective about the websites they use.
-
It's not that simple Paul, there's a lot of other downstream impact by now having more data in the database than was originally in the breach. There are other processes this feeds into not to mention the way it changes the search for the reasons I've already mentioned.
At this point in time, the fact remains that this pattern is used by almost nobody based on the data I'm seeing in the breaches. I'll keep assessing it and I *would* like to do this at some point, but it'd be a very bad ROI on the effort right now.