SaaS to the Rescue – Part 1: Interfacing to Third-Party Publishers

Corporate IT groups are very good at a lot of things, but developing and managing systems that deal with external content repositories isn’t one of them, especially when those repositories hold valuable research content licensed from third-party providers on anything less than an enterprise-wide basis. That’s why a SaaS-based “portal” solution for “strategic research” such as market research, competitive intelligence, product development, and technology research makes good sense, and why some of the world’s largest research-driven companies have chosen to go that route.

What makes strategic research portals problematic (and downright scary) for corporate IT to build and manage?  There are a host (no pun intended) of issues to consider, which I’ll take up one at a time over the next few weeks in this blog.

For starters, there’s the issue of interfacing to multiple publishing systems at third-party content providers in order to aggregate the content for indexing. Third-party business and technical research publishers have widely diverse publishing systems. Some publish in HTML on password- and firewall-secured websites. Some publish in PDF using a variety of content management systems. Some publish in Microsoft Office formats including PowerPoint, Excel, and Word. Some combine many of these, with multiple versions of the same report or “kits” of documents on a topic. Publishers also release content on widely different schedules. Some publish at a defined time each day, for example, 2:00 a.m. Others publish continuously throughout the day.

The methods for transferring the content to the aggregation point also differ from publisher to publisher. Some want the aggregator to use secure FTP, some want the aggregator to crawl a special site or part of a site, and some use RSS page links. And every publisher has a unique set of metadata for its content. There is little or no overlap or commonality among different publishers’ systems, formats, schedules, aggregation processes, and metadata. Furthermore, the specifics of research content publishers’ systems are in constant flux as they change their publishing practices to suit their business needs. What worked yesterday for aggregating a given publisher’s content may not work today.
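
To make that fragmentation concrete, here is a minimal sketch of how an aggregator might record each publisher's quirks and map its metadata onto one common schema. Everything in it is an assumption for illustration only: the publisher names, the field names, the three "common" fields, and the PublisherProfile and normalize helpers are hypothetical and do not describe any real publisher's feed or any particular product.

```python
from dataclasses import dataclass
from typing import Dict

# Fields the portal index expects for every document (hypothetical common schema).
COMMON_FIELDS = ("title", "published", "format")


@dataclass(frozen=True)
class PublisherProfile:
    """Everything that varies from one publisher to the next."""
    name: str
    transfer: str               # e.g. "sftp", "crawl", or "rss"
    schedule: str               # e.g. "daily" or "continuous"
    field_map: Dict[str, str]   # publisher's field name -> common field name


def normalize(profile: PublisherProfile, record: Dict[str, str]) -> Dict[str, object]:
    """Map one publisher record onto the common schema and flag any gaps."""
    out: Dict[str, object] = {"publisher": profile.name}
    for source_name, common_name in profile.field_map.items():
        if source_name in record:
            out[common_name] = record[source_name]
    # List the common fields this publisher did not supply.
    out["missing"] = [f for f in COMMON_FIELDS if f not in out]
    return out


# Two made-up publishers with different transfer methods and metadata names.
acme = PublisherProfile(
    "Acme Research", "sftp", "daily",
    {"Headline": "title", "PubDate": "published", "FileType": "format"},
)
globex = PublisherProfile(
    "Globex Insights", "rss", "continuous",
    {"dc_title": "title", "dc_date": "published"},
)

print(normalize(acme, {"Headline": "Widget Market 2024",
                       "PubDate": "2024-03-01", "FileType": "pdf"}))
print(normalize(globex, {"dc_title": "Sensor Trends", "dc_date": "2024-03-02"}))
```

Even in this toy version, the second publisher supplies no format field, so its record comes back with a gap that somebody has to notice and handle. Multiply that by every publisher, format, and schedule, and the scale of the job becomes clear.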

IT departments are neither technically nor culturally prepared to deal with such a fragmented, disjointed, and ever-shifting content world. IT departments usually control every aspect of a technical system operating at their enterprises. They select the platforms and run them, define the content formats, set the schedules, and tightly control all changes to those factors. When they encounter multiple external parties whose solutions all differ from the enterprise’s own and from each other, the IT departments simply have no starting point to work with and no realistic hope of getting to end-of-job within any planning horizon. In a few cases, IT has tried to impose common practices on the research publishers, only to be rebuffed, as the content publishers have little interest in supporting different publishing practices for each of their enterprise clients. As one IT manager told us, “We met with a leading research publisher and told them that we wanted to access their content in a specific way.  Basically, they laughed and told us our petty concerns were of no consequence.”

A SaaS-based research portal solutions provider can solve these problems. Such a provider will devote the staff and energy to developing custom aggregation solutions for each publisher that conform to each peculiar facet of that publisher’s systems and practices. The SaaS provider will also continuously monitor each publisher’s content publishing systems and adjust as necessary when those systems change. Because it does not ask the publishers to change anything on their end, the resulting aggregation environment, though complex, succeeds in sweeping in the content for indexing as it is produced, from as many sources as are needed to support the research needs of an enterprise.
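
As one small illustration of what that continuous monitoring could look like, the sketch below compares the metadata fields a publisher has historically sent against the fields in a newly received record and reports any drift. It is only a sketch under stated assumptions: the detect_drift and needs_attention helpers and the sample field names are hypothetical, and a real monitoring setup would also have to watch formats, schedules, and transfer methods.

```python
from typing import Dict, Iterable, Set


def detect_drift(expected_fields: Iterable[str],
                 record: Dict[str, str]) -> Dict[str, Set[str]]:
    """Report fields that disappeared from, or newly appeared in, a record."""
    expected = set(expected_fields)
    seen = set(record)
    return {
        "missing": expected - seen,      # publisher stopped sending these
        "unexpected": seen - expected,   # publisher started sending these
    }


def needs_attention(drift: Dict[str, Set[str]]) -> bool:
    """True when the publisher's metadata no longer matches what we expect."""
    return bool(drift["missing"] or drift["unexpected"])


# Hypothetical example: the publisher renamed "PubDate" to "ReleaseDate".
expected = ["Headline", "PubDate", "FileType"]
todays_record = {"Headline": "Widget Market 2024",
                 "ReleaseDate": "2024-03-01",
                 "FileType": "pdf"}

drift = detect_drift(expected, todays_record)
if needs_attention(drift):
    print("Publisher metadata changed:", drift)
```

A check like this only raises the flag. Someone still has to work out what the publisher changed and update the aggregation process to match, which is exactly the ongoing effort a dedicated provider signs up for.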

NEXT TIME: Implementing security in a strategic research portal