Enable search and notifications for email addresses using the "+" syntax
A lot of people use a syntax such as email+foo@example.com, where foo is a unique identifier for the site. They do this so that if they begin getting spammed, they can identify the source their address was taken from.
At the moment, HIBP treats this as a totally unique email address, so if I search for the parent email address without the "+" syntax, it won't be found. The idea is to ensure that searches and notifications recognise the syntax and return addresses that are logically still the same account.
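To make "logically still the same account" concrete, here is a minimal sketch of splitting an address into its parent address and "+" tag. The function name split_plus_alias is made up for illustration, and it assumes the tag is everything after the first "+" in the local part, which is the common convention but not something every mail provider guarantees.

```python
from typing import Optional, Tuple


def split_plus_alias(address: str) -> Tuple[str, Optional[str]]:
    """Return (parent_address, tag); tag is None when there is no "+" alias."""
    # Split on the last "@" so only the local part is inspected for a "+".
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: %r" % address)
    base, plus, tag = local.partition("+")
    # No "+" at all, or nothing in front of it: treat the address as its own parent.
    if not plus or not base:
        return address, None
    return base + "@" + domain, tag
```

With this, a search for either form could be resolved to the same parent address before the lookup happens.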
One thing HIBP would also need to do is specify which account alias was in the breach or paste. For example, I would want to know that it was firstname.lastname@example.org that was exposed in the XYZ breach.
Edit: Just to put the value of this into context, I've just run some stats on the Adobe breach. Of the 152,989,508 rows in the dump, only 49,905 email addresses have a "+" in the address, so that's 0.03% of entries. That number is also a bit high as it includes junk entries. I'm definitely not ruling this idea out - it's still planned - I just wanted to give a sense of how useful it would be.
Edit: To add to this idea, Robert's comment about a period in the email is also very valid. I'd want to be very clear about the ubiquity of this practice across mail providers, but it's certainly a good suggestion and worth further investigation.
The problem with a domain alias is that it's even less predictable than the "+" pattern. There's no assurance whatsoever that all domains will continue to work consistently in an interchangeable fashion nor is there necessarily a canonical list of them for each email provider.
I'd also be surprised if many people actually used them in that way. Per the stats here, use of the "+" syntax is extremely rare and I can only imagine that domain substitution is even less so.
Paul M Edwards commented
Robert Headley commented
That's along the lines of what I've been considering, Nate. The main challenge is that when someone searches for an address without a plus in it, I need to be able to pull back the address WITH the plus in it. This means that I need to store a reference in the table without the plus, which the current data scheme doesn't support.
Right now, there's a key and then a comma delimited list of impacted breaches. I'd need to add the account with the plus alongside the breach and not only that, but there could be MANY instances of accounts with a plus on the same breach for the same user. For example, let's say that against email@example.com on Adobe I need to support firstname.lastname@example.org and email@example.com. Now we're talking about a collection of related addresses against the occurrence of the master address next to the breach.
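As a rough illustration of that shape change, the sketch below keeps the comma delimited breach list and hangs a collection of plus addresses off each breach. The field names (key, breaches, aliases) and the example addresses are hypothetical and not HIBP's actual storage format.

```python
# Hypothetical record: the master address as the key, the existing
# comma delimited breach list, plus a per-breach collection of the
# "+" addresses that were actually seen in that breach.
record = {
    "key": "email@example.com",
    "breaches": "Adobe,XYZ",
    "aliases": {
        "Adobe": ["email+foo@example.com", "email+bar@example.com"],
    },
}


def breaches_with_aliases(rec):
    """Map each breach name to the list of plus addresses found in it."""
    return {
        breach: rec.get("aliases", {}).get(breach, [])
        for breach in rec["breaches"].split(",")
    }
```

A record with no "aliases" entry still works, so rows for addresses that never used a plus would not need to change.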
I don't mind adding an extra table query for the rare instance where the plus symbol is used, and I also don't mind iterating back through every existing row for a bulk update of some kind. The main thing is that for the 99%+ of searches where there's no plus, I don't want to add an overhead for those guys.
Nate Kerkhofs commented
slight correction: in 2b, "it" refers to the main table.
Nate Kerkhofs commented
I think a possible solution would be the following:
1. add an extra table for addresses with a + in them, with a relationship to the main table;
2. when an address with a + is added, do 2 things:
a. add the address WITHOUT the + to the main table;
b. add the address WITH the plus to the second table with a link to it.
3. When a user queries an address that has a stored plus value, you can retrieve all values from the other table and show them somewhere to the user (for example, below the breach it's from).
This means you store all mails in the same method, namely always without the plus, but you can easily (single query) retrieve the value with the plus. It also seems to me like it's something that can quite easily be implemented (I think I could implement it myself, and I'm only a junior developer). The biggest downsides:
1. it takes slightly longer to import the breaches because you have to transform the pluses;
2. if the address has a + in there, you need an extra query to retrieve them, slightly affecting performance and possibly costing extra;
3. you need to develop a one-time transformation which can deal with the existing records (probably the most complicated part).
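The steps above could be sketched roughly as follows. This is only an illustration under assumptions: it uses an in-memory SQLite database for convenience (HIBP's real storage is described in this thread as a key with a comma delimited breach list, not SQL), and the table names, column names and the has_alias flag are all invented here. The flag is what lets a search skip the second query when no plus address was ever stored for that account.

```python
import sqlite3


def base_address(address):
    """Strip a "+tag" from the local part; other addresses pass through."""
    local, _, domain = address.rpartition("@")
    base, plus, _tag = local.partition("+")
    return base + "@" + domain if plus and base else address


def open_store():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE main (address TEXT, breach TEXT, has_alias INTEGER,
                           PRIMARY KEY (address, breach));
        CREATE TABLE plus_alias (base TEXT, breach TEXT, alias TEXT,
                                 PRIMARY KEY (base, breach, alias));
    """)
    return db


def add_breached(db, address, breach):
    base = base_address(address)
    # Step 2a: the main table always stores the address WITHOUT the plus.
    db.execute("INSERT OR IGNORE INTO main VALUES (?, ?, 0)", (base, breach))
    if base != address:
        # Step 2b: store the address WITH the plus, linked to the main row.
        db.execute("UPDATE main SET has_alias = 1 WHERE address = ? AND breach = ?",
                   (base, breach))
        db.execute("INSERT OR IGNORE INTO plus_alias VALUES (?, ?, ?)",
                   (base, breach, address))


def search(db, address):
    """Return {breach: [plus addresses seen in that breach]} for the account."""
    base = base_address(address)
    rows = db.execute("SELECT breach, has_alias FROM main WHERE address = ? "
                      "ORDER BY breach", (base,)).fetchall()
    result = {breach: [] for breach, _flag in rows}
    # Step 3: only run the extra query when a plus address was stored.
    if any(flag for _breach, flag in rows):
        for breach, alias in db.execute(
                "SELECT breach, alias FROM plus_alias WHERE base = ? "
                "ORDER BY breach, alias", (base,)):
            result[breach].append(alias)
    return result
```

One thing this sketch does not capture is whether the bare address itself also appeared in a breach alongside its plus variants. If that distinction matters, it would need its own marker.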
Totally right @anonymous, it's resolving the relationship between x+y@ and x@ that's the tricky bit. It'd be easy just to allow the breach to be found when searching for x@ (I'd just add it as a standalone record), but there's no construct at present to turn that around and advise the user that x@ was breached by virtue of x+y@ being breached.
I would love to have this feature. I would then start using the "+" syntax, knowing that HIBP detects it too.
Good idea Troy :)