Admin Troy Hunt (CEO / Founder, Have I Been Pwned)
My feedback
-
4 votes
There’s no guarantee that any password corpus (even when in plain text) will go into Pwned Passwords; it’s usually just very large corpuses of largely unseen passwords. Best bet is to just hit the k-anonymity API on demand rather than manually downloading the hashes.
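For readers unfamiliar with the k-anonymity model, here is a minimal sketch of the client side. It assumes the public range endpoint at `https://api.pwnedpasswords.com/range/{prefix}`, which takes the first 5 hex characters of the SHA-1 hash and returns `SUFFIX:COUNT` lines. The function names and the User-Agent string are illustrative, not part of any official client.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_sha1(password: str) -> tuple[str, str]:
    """Return the 5-character prefix and 35-character suffix of the SHA-1 hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body: str, suffix: str) -> int:
    """Find the suffix in a 'SUFFIX:COUNT' response body; 0 means not found."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


def fetch_range(prefix: str) -> str:
    """Download every suffix sharing the prefix (only the prefix leaves the client)."""
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "hibp-range-example"},  # hypothetical client name
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def times_seen(password: str) -> int:
    """On-demand check: hash locally, query by prefix, match the suffix locally."""
    prefix, suffix = split_sha1(password)
    return count_in_range(fetch_range(prefix), suffix)
```

Only `times_seen` touches the network; the full hash and the password itself never leave the caller's machine.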
I can foresee adding more information about how passwords are stored, but IMHO it needs more information than simply plain text / hashed / encrypted. MD5 / SHA-1 etc are all hashed, but (usually) easily cracked, even when salted. But even bcrypt is crackable when passwords are weak so in a case like the recent Spoutible incident where *very* weak passwords were allowed, bcrypt still poses risk. Part of my hesitancy in reporting how passwords are stored is it may lead people to make assumptions about the protection provided which, per the examples above, may not be consistent with reality.
I’ll leave this suggestion here as I do think it’s a good one, I just wanted to give some context around why it’s not quite as straightforward as it may seem.
-
1 vote
It looks like the native browser email address validator is rejecting the emoji version. Try submitting this: https://jsfiddle.net/troyhunt/u10mpzef/
That's at least part of the problem. There's also then a problem with sending a verification email to one of the 4 pre-loaded ones, but for some reason that's only happening in the production environment and not for me locally (I'm wondering if there's a culture difference in the string comparison).
This is going to require some time so am going to leave it here pending for now, sorry there wasn't a quick fix.
This *should* be ok, let me delve into it a bit, apologies if you get some emails from the system in the process...
-
1 vote
I've rephrased this idea to describe how the password is stored. If it's hashed, what algorithm is used and what work factor was applied (if any).
As for the examples you provided, they're not in HIBP as data breaches so definitely out of scope.
-
7 votes
Hi Dirk, thanks for taking the time to write this. I'm going to leave it open, but TBH I think it's highly unlikely this will be built primarily because it makes HIBP responsible for communicating directly to people within an organisation. These comms are usually pretty carefully controlled by those who are responsible for infosec within the org. Let's see if it gets any further interest.
-
38 votes
The order is alphabetical rather than "arbitrary". I'll look at sorting by breach date and the date added to HIBP in the future if there's sufficient demand.
-
4 votes
I've renamed this to be more generic. Ledger is one of nearly 500 incidents in HIBP with many of them having inconsistently complete data across individual records.
The chances of this idea being implemented are very low: there's a *huge* amount of parsing overhead involved compared to the current model of simply running a regex over a file to extract email addresses. It also fundamentally changes the data structure as each email address now requires what's effectively a two-dimensional array of breaches and the fields exposed in each breach. That then also flows through to the UX with the need to represent that paradigm on the website, then to the API where the additional dimension also needs to be represented.
In short: very high overhead and very unlikely to happen. Best bet for now is to contact Ledger (or other breached service) and ask them directly what data was compromised for your account.
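To make the difference concrete, here is a rough sketch of the two shapes described above. The regex, the function name and the sample field names are all illustrative assumptions; this is not HIBP's actual parser or schema.

```python
import re

# Deliberately simple, hypothetical pattern: good enough to pull addresses
# out of a dump without understanding the dump's format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Current model: one regex pass, no knowledge of columns or fields."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


# Current shape: each address maps to a flat list of breaches.
current_model = {
    "alice@example.com": ["Ledger"],
}

# Requested shape: each address maps to breaches, and each breach maps to the
# fields exposed for that one record. Filling this in means parsing every
# file format individually rather than running a single regex over it.
per_record_model = {
    "alice@example.com": {
        "Ledger": ["Email addresses", "Phone numbers"],
    },
}
```

The second dictionary is the "two-dimensional" structure: the extra level is what would then have to be carried through the website and the API.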
-
1 vote
I'll leave this here for now, but the main barrier is that I want to be sure anyone receiving information about aliases on a domain has control of that domain *now*, not just when they did the original search. Obviously people come and go from organisations and I don't want a situation where someone who previously had access then left can still pull results.
-
3 votes
What do you mean by "Why do I have to become a hacker to keep myself safe"?
-
1 vote
Yes, stay tuned to this thread: https://twitter.com/troyhunt/status/1164291579705610240
-
10 votes
Can you expand on what you mean by "helpful for notifications"? Even with a filter, you'd still need to run the same number of queries and the data returned is small and compressed, plus you can pull the date of the incident from the API that lists the breaches and easily filter the returned records that way.
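The client-side filtering suggested here could look something like the sketch below. It assumes the breach list has already been downloaded and that each entry carries `Name` and `BreachDate` (an ISO date string) as in the published breach model; the function name and the second sample entry are made up for illustration.

```python
from datetime import date


def breaches_since(account_breach_names, all_breaches, cutoff):
    """Keep only the account's breaches whose incident date is on or after cutoff."""
    dates = {
        breach["Name"]: date.fromisoformat(breach["BreachDate"])
        for breach in all_breaches
    }
    return [
        name
        for name in account_breach_names
        if name in dates and dates[name] >= cutoff
    ]


# Sample data standing in for the two API responses.
all_breaches = [
    {"Name": "Adobe", "BreachDate": "2013-10-04"},
    {"Name": "ExampleCorp", "BreachDate": "2019-06-01"},  # hypothetical entry
]
account_breach_names = ["Adobe", "ExampleCorp"]

recent = breaches_since(account_breach_names, all_breaches, date(2018, 1, 1))
```

The number of per-account queries is unchanged; the date lookup comes from the one breach list, which can be fetched once and reused.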
-
2,634 votes
No change to status as of now, complexity remains the same as does prevalence of use.
If everything after the + is stripped, that information is no longer available to the owner of the address. For example, if I load a spam list and someone used "+netflix" then they no longer know it came from Netflix. Yes, they've had to explicitly check that address but many people also have domain-wide searches and this would screw that up.
In short, nothing yet has changed with this idea: the pattern is still at very close to 0% usage and the same barriers still exist.
Then you lose the information about where the breach likely came from which, in cases like the last breach, is very important to people. Plus, applying this to one sole email provider feels exceptionally dirty and misses the same pattern used by other providers (e.g. outlook.com).
It's not that simple Paul, there's a lot of other downstream impact by now having more data in the database than was originally in the breach. There are other processes this feeds into not to mention the way it changes the search for the reasons I've already mentioned.
At this point in time, the fact remains that this pattern is used by almost nobody based on the data I'm seeing in the breaches. I'll keep assessing it and I *would* like to do this at some point, but it'd be a very bad ROI on the effort right now.
To David's comments, this shows how tricky the situation is; there's the spec, the practices by various mail providers and then the patterns people generally use. I'm very cautious about making assumptions on these as they may not always hold true under all circumstances which then means ending up with a kludge of provider-specific hacks (e.g. always ignore the dot in Gmail addresses). I'm sure everyone can see the challenge and even if solved, there's still just that tiny percentage of people for whom it would make any difference at all.
Mike, you'd be surprised at how mainstream the HIBP user base is, largely because of how much press it gets in the general media. But even if I was off by a factor of 10 (which I'm almost certainly not), in an incident like River City Media, the percentage of people using this pattern rounds to 0% even with 2 decimal places of precision!
I understand this is important to the people using it, but I need to look at the impact from the effort and at present, it remains near non-existent.
Since Antonios has left a comment and I've also just loaded the largest data set ever into HIBP, I thought I'd add a current figure to the discussion here:
0.0038% was the percentage of people with a + in their email address in the River City Media spam list. 1 in every 26k people is a hard ROI to justify when there's a fair bit of work to invest!
I'll keep monitoring the use of this pattern, but as of now, it remains *exceptionally* rare.
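The conversion from that percentage to "1 in every 26k" is quick to check. The helper name below is just for illustration.

```python
def one_in(percent: float) -> int:
    """Convert a percentage of users into a '1 in N' figure."""
    return round(100 / percent)


river_city_media = one_in(0.0038)   # about 26 thousand
rounded_to_two_places = round(0.0038, 2)  # the figure at 2 decimal places
```

Here `river_city_media` comes out at 26316, and `rounded_to_two_places` is 0.0, which matches the earlier remark that the figure rounds to 0% at two decimal places.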
To Kem's question, we're *always* talking tiny percentages. I just checked the last set of data I loaded which was a spam list and only 0.009% of emails used the + syntax.
This is something I still want to add folks, but it'll be to the benefit of a tiny percentage of the community.
Since there's a comment in here about "go read the RFC", here's the RFC that describes subaddressing: https://tools.ietf.org/html/rfc5233
Adding both the full address with the "+" and a normalised one without it would be an option, but it's difficult to do retrospectively and would mean enumerating back through hundreds of millions of records (I can't just query for everything containing a plus due to the data structure).
I'll continue to monitor this and if it either becomes easier to implement or more popular with users (it's still *extremely* rare) then I'll reassess.
The problem with a domain alias is that it's even less predictable than the "+" pattern. There's no assurance whatsoever that all domains will continue to work consistently in an interchangeable fashion nor is there necessarily a canonical list of them for each email provider.
I'd also be surprised if many people actually used them in that way. Per the stats here, use of the "+" syntax is extremely rare and I can only imagine that domain substitution is even less so.
That's along the lines of what I've been considering Nate, the main challenge is that when someone searches for an address without a plus in it, I need to be able to pull back the address WITH the plus in it. This means that I need to store a reference in the table without the plus which the current data scheme doesn't support.
Right now, there's a key and then a comma-delimited list of impacted breaches. I'd need to add the account with the plus alongside the breach and not only that, but there could be MANY instances of accounts with a plus on the same breach for the same user. For example, let's say that against foo@bar.com on Adobe I need to support foo+1@bar.com and foo+2@bar.com. Now we're talking about a collection of related addresses against the occurrence of the master address next to the breach.
I don't mind adding an extra table query for the rare instance where the plus symbol is used and I also don't mind iterating back through every existing row for a bulk update of some kind, the main thing is that for the 99%+ of searches where there's no plus, I don't want to add an overhead for those guys.
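As a rough illustration of the "collection of related addresses" idea, the sketch below keeps the plus-addressed variants in a separate alias index keyed by the master address and then by breach. The function names and the in-memory dictionaries are hypothetical stand-ins, not HIBP's storage design.

```python
from collections import defaultdict


def split_subaddress(address: str):
    """Return (master address, tag); tag is None when there is no '+'."""
    local, _, domain = address.rpartition("@")
    base_local, plus, tag = local.partition("+")
    return f"{base_local}@{domain}", (tag if plus else None)


def build_alias_index(records):
    """Map master address -> breach -> set of plus-addressed variants.

    `records` is an iterable of (address, breach_name) pairs. Addresses
    without a '+' never enter the index, so it only grows for the rare
    plus-addressed records.
    """
    index = defaultdict(lambda: defaultdict(set))
    for address, breach in records:
        address = address.lower()
        master, tag = split_subaddress(address)
        if tag is not None:
            index[master][breach].add(address)
    return index


records = [
    ("foo@bar.com", "Adobe"),
    ("foo+1@bar.com", "Adobe"),
    ("foo+2@bar.com", "Adobe"),
]
alias_index = build_alias_index(records)
adobe_aliases = sorted(alias_index["foo@bar.com"]["Adobe"])
```

Keeping the aliases out of the main key and breach-list row is one way to leave the common no-plus lookup untouched, at the cost of a second lookup whenever aliases need to be shown.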
Totally right @anonymous, it's resolving the relationship between x+y@ and x@ that's the tricky bit. It'd be easy just to allow the breach to be found when searching for x@ (I'd just add it as a standalone record), but there's no construct at present to turn that around and advise the user that x@ was breached by virtue of x+y@ being breached.
Admin Troy Hunt (CEO / Founder, Have I Been Pwned) shared this idea
7 votes
Hi Scott, the V2 API is definitely rate limited as described here: https://haveibeenpwned.com/API/v2
Have I incorrectly stated it's not somewhere? I'll fix that if so.
-
1 vote
Which API? If it's the one to pull back breaches for a single email address, don't you already know the email address as you've just sent it in the API request?
-
63 votes
Thanks for the suggestion, I've renamed the title to reflect what you're requesting in the body.
-
11 votes
This was completed a while ago but I neglected to update the idea here. See the API docs page: https://haveibeenpwned.com/API/v2#BreachesForAccount
Adding the ?truncateResponse=true query string returns just the name attribute of the breach, for example: https://haveibeenpwned.com/api/v2/breachedaccount/test@example.com?truncateResponse=true
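A small sketch of building that request URL programmatically, using only the V2 path and the `truncateResponse` parameter shown above. The helper name is illustrative, and the sketch only constructs the URL; it does not send a request.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://haveibeenpwned.com/api/v2"


def breached_account_url(account: str, truncate: bool = False) -> str:
    """Build the breachedaccount URL, optionally asking for names only."""
    # Keep '@' readable to match the documented example; other reserved
    # characters in the account are percent-encoded.
    url = f"{API_BASE}/breachedaccount/{quote(account, safe='@')}"
    if truncate:
        url += "?" + urlencode({"truncateResponse": "true"})
    return url


example_url = breached_account_url("test@example.com", truncate=True)
```

With `truncate=True` the result matches the example URL above; without it the query string is omitted and the full breach objects would be returned.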
Admin Troy Hunt (CEO / Founder, Have I Been Pwned) shared this idea
10 votes
Are you getting notifications for when an email on the domain appears in an incident? This should save you running it periodically.
In terms of filtering, perhaps try the Excel export option then use the filters in there.
-
2 votes
"who is running the website, where are the servers, what you do collect and do not collect."
All of this is already in the FAQs. If you're saying that your company can't use the service because the title of the page is "FAQs" and not "T&Cs" then no, this is not a "feature" I'll implement, it's a bureaucratic problem with your company!
If I've misunderstood and there's specific information missing from the site then please let me know what it is, but if it's merely "there is no page called T&Cs" then this may not be the right service for you.
What are you actually looking for in terms which is not already documented on the site? Give me some more detail and I'll see if I can fill the gaps.
-
7 votes
Could you expand on this further please - what additional info would you like to see? It looks like the askmein.com address is just pulling the description I already publish.
-
3 votes
Right, so the trick then is establishing the criteria for "likely fake". One way could be a high correlation with a previous paste based on the prevalence of the same emails in both pastes. This would mean taking the emails from the new paste and seeing if a certain percentage already exist in an existing paste. At present this would be quite laborious as I'd need to check them one by one and we're sometimes talking 10k emails in a paste. Either that or re-architect things to make searching like this easier.
Out of curiosity, how much is this happening? I mean how often do you get an email notification and then conclude it's probably redundant with another paste? I'm just trying to get a sense of the scale of the issue.
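The correlation check described above could be sketched like this, assuming both pastes are already available as in-memory collections of addresses (which is exactly the part that is laborious today). The function names and the 0.9 threshold are arbitrary choices for illustration.

```python
def overlap_ratio(new_paste, existing_paste) -> float:
    """Fraction of addresses in the new paste that already appear in the old one."""
    new = {email.lower() for email in new_paste}
    old = {email.lower() for email in existing_paste}
    if not new:
        return 0.0
    return len(new & old) / len(new)


def likely_redundant(new_paste, existing_pastes, threshold: float = 0.9) -> bool:
    """Flag the new paste if any earlier paste covers at least `threshold` of it."""
    return any(
        overlap_ratio(new_paste, existing) >= threshold
        for existing in existing_pastes
    )


new_paste = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
old_paste = ["A@example.com", "b@example.com", "c@example.com"]
ratio = overlap_ratio(new_paste, old_paste)
```

Set intersection makes the comparison itself cheap; the real cost is in fetching the earlier pastes' addresses in bulk rather than checking them one by one.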
Originally I thought you might even be talking about identifying duplicate pastes (which happens a bit) and there are various angles to that. One thing I keep coming back to though is that even if a paste is duplicate or fake, people usually still want to know how their details are being used. In fact that's one of the other ideas currently in progress - notify people when their info appears on a paste I can't verify or may even be fake.
Would you prefer not to know when your email appears if it may be fake? Or know but be notified that it can't be verified and may be fake?
Hey Josh, thanks for the idea! Tell me more about how you think this feature would be used - would it identify that perhaps a paste is fake due to the high correlation with existing breaches? Is it to try and get more confidence around the legitimacy of a paste?
The problem with implying that hashing (which *isn't* encryption!) could lead to complacency is equally relevant if only flagging plain text passwords in the fashion you've suggested. SHA-1 (even with a salt) is *almost* equally as useless as plain text so by that rationale should also be flagged. Then what about bcrypt with low work factors? Or high work factors but poor password strength controls?
This is precisely why services recommend (or even mandate) password changes when passwords of any kind are breached. I'm all for adding more details about the hashing algorithm as it's relevant for those who understand what that means, but the only guidance I want to give to end users after passwords are exposed is "change them immediately".