Webscraping

Hi,

I have an internal site (SSRS, actually) that I can display in an iframe. By its nature, the site contains data that is auto-generated. There’s simply nothing there for an API or URL parameter to ‘grab/manipulate’. It would be very nice if there were a way to get some of that data from the website into SuiteCRM. Does anyone know about this? Is there an implementation?

Thanks,

Lars

This can usually be done easily in PHP with a simple curl request followed by a smart grep.

But of course it depends on the structure of that site’s output. If it’s complex you’re better off with some kind of DOM parsing.
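
As a rough illustration of the DOM parsing route, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes. It assumes the data you want sits in an ordinary HTML table; the function name extractTableRows and the sample markup are made up for the example, and the XPath expression would need adjusting to whatever the real report actually outputs.

```php
<?php
// Hypothetical helper (not part of SuiteCRM): returns the cell text of every
// table row found in an HTML string, one array of cells per row.
function extractTableRows(string $html): array
{
    $doc = new DOMDocument();

    // Generated report markup is often not strictly valid HTML, so collect
    // libxml warnings internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Assumption: the values of interest live in <td>/<th> cells of a <table>.
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td|./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

// Made-up sample markup standing in for the real page.
$sample = '<html><body><table>'
    . '<tr><th>Region</th><th>Total</th></tr>'
    . '<tr><td>North</td><td>42</td></tr>'
    . '</table></body></html>';

print_r(extractTableRows($sample));
```

The advantage over a regular expression is that nested tags, attributes and whitespace inside the cells don't break the extraction.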

Thanks PGR,

I assume you mean on a Linux deployment? It’s been a while since I used Linux, but aren’t curl and grep Linux shell commands? Unfortunately I’m using a Windows one :( (It’s more difficult, but the company I’m with won’t go for Linux, despite the stability, security, and ease. It would make everything simpler, really.)

Lars

No, I was thinking you could do it from within PHP. SuiteCRM already uses PHP’s curl extension, and PHP provides several functions for working with regular expressions (preg_match and friends).

Some examples:
https://github.com/salesagility/SuiteCRM/search?q=curl_init&unscoped_q=curl_init

https://github.com/salesagility/SuiteCRM/search?q=preg_match&unscoped_q=preg_match

This can easily be done in a way that works on both Windows and Linux.
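
To make the idea concrete, here is a minimal sketch of the "curl request followed by a regular expression" approach in plain PHP. The helper names fetchPage and extractLabelledNumber, the label "Open orders" and the sample markup are all assumptions for illustration; the pattern would have to be adapted to the real page. Authentication is not handled here, and an internal report server will probably need it.

```php
<?php
// Hypothetical helper: fetch a page body using PHP's curl extension.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    return $body;
}

// Hypothetical helper: return the first number that follows a label,
// skipping any HTML tags in between, or null if nothing matches.
function extractLabelledNumber(string $html, string $label): ?string
{
    $pattern = '/' . preg_quote($label, '/') . '\s*:?\s*(?:<[^>]+>\s*)*(\d[\d.,]*)/i';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

// Made-up sample markup; in real use this would be fetchPage($reportUrl).
$sample = '<div><span>Open orders:</span> <b>17</b></div>';
var_dump(extractLabelledNumber($sample, 'Open orders'));
```

Nothing here calls a shell command, so it behaves the same on Windows and Linux as long as the curl extension is enabled in PHP.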