Posted on January 17, 2019

The Collection1 data breach

A lot of people this week have heard about a massive breach containing email addresses and passwords, commonly referred to as the collection1 breach (after the name of the folder it was originally in) – either from getting an email alert from the haveibeenpwned site or seeing a news or social media post about it.

What’s interesting is the debate that’s been going on, mostly amongst those in the cyber security area, about whether this breach should have been publicised. To save me repeating the details, there are 3 very good articles that I think cover all the main points. The first is by Troy Hunt, who runs the haveibeenpwned service –

Graham Cluley –

And Brian Krebs –

Key details from all that I’ve read so far are that the data is a collated mix from various sources, probably put together to be sold on for use in things like phishing attacks. Given that there are invalid email addresses in the data, it possibly includes lists that people have generated from, for example, common names and nicknames and common address formats for some domains.

Also, a key point: it’s not known which accounts the passwords relate to (for good reasons explained on his site, Troy doesn’t list the data together in these cases). So if your password is in the pwned passwords list and you don’t use a different password for every online account where your email address is the login name, it isn’t possible to know where the password was breached from (or indeed whether it was someone else using the same password as you). It’s important to note that the pwned password check is only useful for identifying passwords that shouldn’t be used because they’re potentially on lists that are available online. If a password you currently use is listed then you should of course change it as a precaution.
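As a rough illustration of why a hit in the password check says nothing about which account it came from, here is a minimal Python sketch of a lookup by password hash alone. It assumes the publicly documented Pwned Passwords range format, where you send only the first 5 characters of the password’s SHA-1 hash and get back lines of `SUFFIX:COUNT`. The function names are my own for illustration, and fetching the listing over the network is deliberately left out.

```python
import hashlib


def sha1_prefix_suffix(password):
    """Split the password's upper-case SHA-1 hex digest into (first 5, remaining 35) characters."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(password, range_body):
    """Look the password up in a 'SUFFIX:COUNT' listing (assumed format).

    Returns how many times the password appears in the listing, or 0 if it
    is not there. No email address or account name is involved at any
    point, so a match cannot say where the password was used.
    """
    _, suffix = sha1_prefix_suffix(password)
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

The only input is the password itself, which is why a non-zero count means “don’t use this password” rather than “this particular account of yours was compromised”.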

Back to the collection1 data, all of this means that the existence of your email address in the breach (the site will tell you which breaches your email address has been in) isn’t necessarily an issue. Also, it sounds like the data is quite old, estimated as dating from between 2008 and 2015, and that there are already more recent and larger versions around (see Brian Krebs’s article).

The debate around the uploading of the data to the site (resulting in a lot of people who are signed up to the service, or who heard about it, finding their email address listed) is whether this was actually a good thing. Personally I think it was, if for no other reason than the fact that it’s what the site is for, and so not uploading the data would have gone against that.

The issue is that a lot of people (some coming across the site for the first time) don’t understand the full background and are panicking, especially if they put their email account password into the password check and assume this means their email account has been compromised.

Now we could argue that if this results in people changing their passwords it’s a good thing, but in trying to see both sides of the argument I also agree that too many of these incidents can promote apathy amongst people as they get to the point where it’s so common to hear about data breaches they just become accepted as part of daily life.

I think this is quite an interesting debate. Without being too judgemental, those of us who have worked in the IT profession for a certain length of time know full well that users aren’t great at reading details properly and are good at misunderstanding things that aren’t explained very clearly and concisely. Some of the media articles on this breach haven’t helped, as journalists inevitably put a sensationalist spin on it.

In summary I do agree with the decision by Troy to upload the data but I can see where people who have an issue with it are coming from too. I’ll welcome any comments/thoughts from others.
