Collecting Data from the Web – Is it legal?

Over the past few weeks we’ve noticed a lot of chatter about the legality of scraping the web for data. The issue has been shrouded in ambiguity and confusion, so we thought it was about time to clear things up – at least when it comes to how our team works.

What is web scraping?

Web scraping is the process of gathering data from websites, usually with a program that emulates how a human finds information on the web. It’s useful because it turns the large amount of unstructured data available on the web into structured information that can be stored and analyzed.
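To make the "unstructured data into useful information" step concrete, here is a minimal Python sketch using only the standard library. It assumes the page has already been downloaded as an HTML string and that the fields of interest are marked with class names such as "name" and "title". Both that markup and the extract helper are hypothetical illustrations, not how our service works internally; real pages vary widely.

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collects the text of elements whose class attribute matches a wanted field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)  # hypothetical class names to look for
        self.current = None        # field currently being read, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        # Exact match on the whole class attribute (a simplification).
        cls = dict(attrs).get("class")
        if cls in self.fields:
            self.current = cls

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self.current = None

    def handle_data(self, data):
        # Keep non-empty text only while inside a wanted element.
        if self.current and data.strip():
            self.record[self.current] = data.strip()


def extract(html, fields):
    """Turn a page of HTML into a dict of field name -> text."""
    parser = FieldParser(fields)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.record
```

For example, extract('<h1 class="name">Jane Doe</h1><p class="title">Data Analyst</p>', ["name", "title"]) returns {"name": "Jane Doe", "title": "Data Analyst"}. Keeping the parsing separate from the downloading means the same extraction can be applied to any page you are permitted to collect.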

Here’s why we’re all a little confused

We live in the age of social media, and more and more people are choosing to make information about themselves available on social media platforms. These platforms (e.g. LinkedIn and Facebook) typically require you to register and log in before you can access the data they contain. Their data is restricted, and logging in to export it is usually a violation of their terms of service.

So far so simple.

The confusion tends to arise because some social media profiles are in fact available via search engines, whether or not you’re logged into the platform in question. Understandably, people then worry that this data is restricted too, despite being visible.

Here’s how we can clear things up

When we first began developing our profile service, we knew the public web was a vast resource of data that could prove invaluable to our customers. We also knew that to use that resource effectively, we had to fully understand the issues involved in gathering the data. So we put together a team to research all the nitty-gritty details and get fully educated on the issues at hand.

And here’s why you don’t have to worry

We developed our services with every detail our team discovered in mind. We’ve worked very hard to ensure our methods are 100% legitimate by focusing entirely on the “public web.”

So what does that mean exactly? When we say we pull information from the “public web”, we mean the part of the web that is indexed and made available via search engines: data that is readily available without registering for or logging into any website. In the specific case of our profile service, we mean the data that individual users of social media platforms have actively chosen to make public.

This choice is what makes the issue clear-cut. When users make their profiles public, they are choosing not to restrict their data to the platform they use, and instead to share it with the wider web via search engines.

Once data is made public in this way, we have the right to gather it without approaching any legal “gray areas”, so you can use our services without any concerns at all.


NewsVisual: Who will benefit from WhatsApp’s windfall?

Yesterday’s news about the unprecedented price Facebook paid to acquire the WhatsApp messaging service left us wondering: what are the people behind WhatsApp going to do with all that money?!

Will they donate some of it? And if so, who will the beneficiaries be? Here’s an image from Prospect Visual that shows why Stanford might have been especially happy reading the news yesterday…

Brian Acton is a WhatsApp co-founder, reportedly now a billionaire. And who is Jim Goetz? A prominent WhatsApp board member and investor. A couple of days ago we posted an article on Stanford’s highly successful year of fundraising: US $1 billion in 2012–2013. Looks like they’ve got a chance to carry that momentum into 2014!

NewsVisual: Facebook and WhatsApp – they’ve always been friends!

This morning it was announced that Facebook has agreed to buy WhatsApp for US $19 billion. WhatsApp is a popular, fast-growing mobile messaging platform, and the deal is Facebook’s largest-ever acquisition, bigger than any acquisition previously made by Google, Apple or Microsoft.

Experts have offered a variety of reasons why the acquisition is unsurprising. For a different angle on the story, see this image from Prospect Visual showing the relationships between individuals at the two companies.

And here are a few more details on how Jonathan G Heiliger and Jim Goetz are connected:

It turns out that Facebook and Whatsapp have been friends for a long time…