Over the past few weeks we’ve noticed a lot of discussion about the legality of scraping the web for data. The issue has been surrounded by ambiguity and confusion, so we thought it was time to clear things up, at least when it comes to how our team works.
What is web scraping?
Web scraping is the process of gathering data from websites, usually with a program that emulates the way a person finds information on the web. It’s useful because it turns the large amount of unstructured data available on the web into structured information that can be stored and analyzed.
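To make “unstructured data into structured information” concrete, here is a minimal sketch using only Python’s standard-library `html.parser`. It is not a description of our own pipeline: the sample markup, the CSS class names (`name`, `headline`, `location`) and the `parse_profile` helper are all hypothetical, and the sketch parses an HTML string already in memory rather than fetching anything from the network.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collects the text of elements whose CSS class appears in `fields`.

    Hypothetical example: `fields` maps a class name to an output key.
    Assumes the target elements contain plain text (no nested markup).
    """

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.current = None  # (tag, output key) of the element being read
        self.record = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.fields:
                self.current = (tag, self.fields[cls])

    def handle_endtag(self, tag):
        # Stop collecting once the tracked element closes.
        if self.current and self.current[0] == tag:
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            key = self.current[1]
            self.record[key] = (self.record.get(key, "") + " " + text).strip()


def parse_profile(html, fields):
    """Turn a page's raw HTML into a flat dict of the requested fields."""
    parser = ProfileParser(fields)
    parser.feed(html)
    parser.close()
    return parser.record


# Hypothetical sample page and field mapping.
SAMPLE = """
<div class="profile">
  <h1 class="name">Jane Doe</h1>
  <p class="headline">Data engineer</p>
  <span class="location">Berlin</span>
</div>
"""
FIELDS = {"name": "name", "headline": "headline", "location": "location"}

print(parse_profile(SAMPLE, FIELDS))
```

Run as-is, this prints `{'name': 'Jane Doe', 'headline': 'Data engineer', 'location': 'Berlin'}`: the same facts a person would read off the page, now in a form that can be stored or analyzed.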
Here’s why we’re all a little confused
We live in the age of social media and, as a result, more and more people are choosing to make information about themselves available on social media platforms. These platforms, such as LinkedIn and Facebook, typically require you to register and log in before you can access the data they contain. Their data is restricted, and logging in to export data from these platforms is usually a violation of their terms of service.
So far so simple.
The confusion tends to arise when you consider that some social media profiles are, in fact, available via search engines, whether or not you’re logged in to the platform in question. This is when people become, understandably, concerned that the data is restricted despite being visible.
Here’s how we can clear things up
When we first began developing our profile service, we knew the public web represented a vast resource of data that could prove invaluable to our customers. We also knew that to use that resource effectively, we had to fully understand all the issues involved in gathering the data. So we put together a team to research the nitty-gritty details until we were fully educated on the issues at hand.
And here’s why you don’t have to worry
Our services were developed with every detail our team uncovered in mind. We’ve worked very hard to ensure our methods are 100% legitimate by focusing entirely on the “public web.”
So what does that mean exactly? When we say we pull information from the “public web,” we mean the part of the web that is indexed and made available via search engines: the data that is readily available without registering for or logging in to any website. In the specific case of our profile service, we mean the data that individual users of social media platforms have actively chosen to make public.
This choice makes the issue entirely transparent. When users choose to make their profiles public, they are deciding not to restrict their data to the platform they are using, choosing instead to share it with the wider web via search engines.
Once the data has been made public in this way, we have the right to gather it without ever approaching any legal “gray areas,” so you can use our services without any concerns at all.