Business search engine designed for market intelligence
Northern Light’s business search engine is designed explicitly for business and market intelligence research. It was built from the ground up to exploit the attributes commonly found in market research reports, business documents, competitive intelligence analysis, and industry news.
Because we own the search engine, we can shape and evolve it to fit our clients’ precise use cases, delivering a level of business search service that far exceeds the capabilities of general-purpose engines. Northern Light’s search engine was introduced in 1997 and has been continuously enhanced, with a cumulative investment of over a million hours of engineering effort. Today it serves the business research demands of the largest custom search implementations worldwide.
The business search engine indexes
When it comes to business, technology, and industry-specific research, Northern Light’s business search engine provides the most extensive and useful set of search indexes available on the market. Every document in any repository indexed and searched by Northern Light is processed with the following:
The stemmed index, common to most good search engines today, is where Northern Light starts and most of the others stop. Good practice begins by detecting the dominant language of the document being indexed, including the CJK languages (Chinese, Japanese, and Korean). Stemming then controls for singulars and plurals, word tense, gerunds, and participles – freeing users from having to think about alternative word forms. Northern Light indexes every word of every document.
Unstemmed index makes it possible for users of our business search engine to specify the exact wording they want to be used in a literal search. This capability is useful when the ‘helpfulness’ of stemming control gets in the way of the user’s intention – much like when auto-correct on email or text misses the point. Because Northern Light produces an unstemmed index as well as a stemmed index for every document, our users have the ability to mix and match the approaches in any query. For example, you can search on “developing countries” in the unstemmed index and not get hits on “developed country.”
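The difference between the two indexes can be illustrated with a minimal sketch. This is not Northern Light's implementation: `toy_stem`, `build_indexes`, and `search_all` are hypothetical names, the stemmer is a crude suffix stripper rather than a real stemming algorithm, and the query is a simple all-terms match rather than a true phrase search.

```python
from collections import defaultdict

def toy_stem(word):
    """Very crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

def build_indexes(docs):
    """Build a stemmed and an unstemmed inverted index over the same documents."""
    stemmed, unstemmed = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            unstemmed[word].add(doc_id)          # exact word form
            stemmed[toy_stem(word)].add(doc_id)  # normalized word form
    return stemmed, unstemmed

def search_all(index, terms, normalize=lambda w: w):
    """Return the documents that contain every term (boolean AND)."""
    sets = [index.get(normalize(t), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

docs = {
    1: "aid for developing countries",
    2: "a developed country",
}
stemmed, unstemmed = build_indexes(docs)

# Stemmed search matches both documents; literal search matches only document 1.
stemmed_hits = search_all(stemmed, ["developing", "countries"], toy_stem)
literal_hits = search_all(unstemmed, ["developing", "countries"])
```

Keeping both indexes side by side is what lets a single query treat some terms literally and others by stem.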
Proximity index records the location of every word and every concept in every document in the repository. With this index users can specify that search terms be near each other with or without a specific order – forcing close association with words without having to think of all the different ways things might be phrased. The proximity index is a key factor in Northern Light search engine’s ability to search for phrases of any length.
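As a rough illustration of how a proximity (positional) index supports "near" and phrase queries, consider the sketch below. It is a generic textbook-style structure under simplifying assumptions (whitespace tokenization, words only, no concepts), not Northern Light's index format; `build_positional_index` and `near` are hypothetical names.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def near(index, first, second, max_gap, ordered=False):
    """Doc ids where `second` occurs within `max_gap` words of `first`.

    With ordered=True, `second` must come after `first`.
    """
    hits = set()
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    for doc_id in first_postings.keys() & second_postings.keys():
        for p1 in first_postings[doc_id]:
            for p2 in second_postings[doc_id]:
                distance = p2 - p1
                if ordered and 0 < distance <= max_gap:
                    hits.add(doc_id)
                elif not ordered and 0 < abs(distance) <= max_gap:
                    hits.add(doc_id)
    return hits

docs = {
    1: "the company announced a strategic partnership with a supplier",
    2: "partnership talks stalled despite the strategic review",
    3: "strategic goals were set long before any partnership was discussed",
}
index = build_positional_index(docs)

# An ordered gap of 1 behaves like a two-word phrase search.
phrase_hits = near(index, "strategic", "partnership", max_gap=1, ordered=True)
# An unordered gap of 6 accepts either order, as long as the words are close.
nearby_hits = near(index, "strategic", "partnership", max_gap=6)
```

Here the phrase-style query matches only document 1, while the looser query also matches document 2, where the words appear in the opposite order.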
Concept index records the concepts that have been identified by our text analytics solution, MI Analyst, using a taxonomy of tens of thousands of industry-specific concepts relating to business, technology, and corporate strategy. The concept index features concepts that are related to the industry of the client, such as clinical trials for pharmaceutical companies and information technologies for IT companies, as well as business strategy concepts like new products, strategic partnerships, and acquisitions.
Name index and acronym index are two indexes that further boost the effectiveness of Northern Light’s business search engine: one contains only terms that have initial capital letters (e.g., Target), and the other contains only fully capitalized acronyms, such as abbreviations of company names (e.g., HP). Being able to specify upper case for searched terms can eliminate most of the spurious hits for search terms that are both company names and regular English words.
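A minimal sketch of the idea follows, assuming a simple rule of "first letter capitalized" for names and "all letters capitalized" for acronyms. The function name `build_case_indexes` is hypothetical, and a production system would also need to handle cases this sketch ignores, such as ordinary words capitalized at the start of a sentence.

```python
import re
from collections import defaultdict

def build_case_indexes(docs):
    """Return (name_index, acronym_index), keyed by the token in its original case."""
    names, acronyms = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[A-Za-z]+", text):
            if len(token) > 1 and token.isupper():
                acronyms[token].add(doc_id)   # fully capitalized, e.g. HP
            elif token[0].isupper():
                names[token].add(doc_id)      # initial capital, e.g. Target
    return names, acronyms

docs = {
    1: "shares of Target rose after strong holiday sales",
    2: "the team missed its sales target",
    3: "printers from HP gained share",
}
names, acronyms = build_case_indexes(docs)

# Searching the name index for "Target" finds the retailer, not the common noun.
company_hits = names.get("Target", set())
acronym_hits = acronyms.get("HP", set())
```

Document 2 mentions a lowercase "target" and is therefore absent from the name index, which is how the spurious hit is avoided.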
Relevance ranking optimized for business, industry and technology research
Unlike general purpose enterprise search engines that must serve many dissimilar use cases, Northern Light’s business search engine has the luxury of knowing that our users will be professionals doing business, technology, and industry-specific market research on large repositories of market intelligence reports and news stories. This foreknowledge allows Northern Light to optimize our relevance ranking for these purposes. Northern Light’s relevance ranking balances the following factors, and can be customized to meet the needs of our clients:
Number of times the query terms are in the document
More is better and our search engine rewards more substantive documents typical of research repositories. But this factor has declining incremental impact as the number of mentions goes up to avoid overweighting really long documents.
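One common way to get this kind of declining incremental impact is a logarithmic term-frequency weight. The sketch below shows that generic technique only; the source does not say which function Northern Light uses, and `tf_weight` is a hypothetical name.

```python
import math

def tf_weight(count):
    """Logarithmic term-frequency weight: grows with count, but ever more slowly."""
    return 1.0 + math.log(count) if count > 0 else 0.0

# Each additional mention adds less than the one before it.
gain_second_mention = tf_weight(2) - tf_weight(1)
gain_101st_mention = tf_weight(101) - tf_weight(100)
```

Going from one mention to two adds far more weight than going from 100 to 101, so a very long document cannot dominate simply by repeating a term.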
Number of times the query terms are in the document relative to the length of the document
This number is a “density” measure, which can be a very useful weighting. It is generally better to have 25 hits on the query terms in a 5-page document than 10 hits on the query terms in a 200-page document.
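The arithmetic behind that comparison can be worked through in a few lines. The figure of 500 words per page is an assumption made purely for illustration, and `density` is a hypothetical helper, not a documented Northern Light function.

```python
WORDS_PER_PAGE = 500  # assumed average, for illustration only

def density(hits, pages):
    """Query-term hits per 1,000 words of document text."""
    return 1000 * hits / (pages * WORDS_PER_PAGE)

short_report = density(hits=25, pages=5)    # 25 hits in 2,500 words
long_report = density(hits=10, pages=200)   # 10 hits in 100,000 words
```

Under this assumption the 5-page document has 10 hits per 1,000 words and the 200-page document has 0.1, so the shorter document is 100 times denser in the query terms.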
Word order and proximity of the query terms in the document
To our knowledge, Northern Light is the only search solution that considers word order and proximity heavily in relevance ranking. Other search engines probably don’t do this because it requires both a proximity index and a phrase index, and both requirements can place a heavy burden on query processing time. By contrast, the Northern Light Search Engine features a proprietary method of considering word order and proximity in relevance ranking that has no user-perceivable impact on query response time. This capability is a major advantage for Northern Light in producing superior search results.
Search terms in the document metadata
The Northern Light business search engine recognizes the importance of metadata and boosts documents with search-term hits in metadata such as document titles. This boost takes place even if the search terms as entered by the user did not explicitly include a title metadata search.
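A simple way to picture a metadata boost is a fixed bonus added whenever a query term also appears in the title. The sketch below makes that assumption; the bonus value of 2.0 and the function name `score` are hypothetical, and the actual weighting used by Northern Light is not described in the source.

```python
def score(query_terms, title, body, title_boost=2.0):
    """Body hit count plus a fixed bonus for each query term found in the title."""
    title_words = set(title.lower().split())
    body_words = body.lower().split()
    total = 0.0
    for term in query_terms:
        total += body_words.count(term)   # ordinary full-text hits
        if term in title_words:
            total += title_boost          # applied without an explicit title search
    return total

query = ["biosimilars", "pricing"]
body = "pricing pressure on biosimilars continues"
titled = score(query, "Biosimilars pricing outlook", body)
untitled = score(query, "Quarterly outlook", body)
```

Both documents have the same body text, but the one whose title contains the query terms scores higher (6.0 versus 2.0 in this sketch).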
Inverse document frequency of the search terms
The inverse document frequency (IDF) of the search terms measures how rare each search term is in the database as a whole and rewards documents that are relatively richer in the relatively rarer search terms. The rare words are better discriminators for relevance ranking purposes.
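As a worked example, the sketch below computes IDF with one widely used smoothed formula, log((1 + N) / (1 + df)) + 1, where N is the number of documents and df is the number of documents containing the term. The exact variant used by Northern Light is not stated in the source, so treat the formula and the tiny four-document collection as assumptions.

```python
import math

def idf(term, docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = sum(1 for words in docs if term in words)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

# Each document is represented as the set of words it contains.
docs = [
    {"market", "report", "biosimilars"},
    {"market", "report", "pricing"},
    {"market", "news", "pricing"},
    {"market", "report", "strategy"},
]
common = idf("market", docs)       # appears in all 4 documents
rare = idf("biosimilars", docs)    # appears in only 1 document
```

The term that appears everywhere gets the minimum weight of 1.0, while the rare term gets roughly 1.92, so a document matching the rare term is rewarded more.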
Date of the document
All things being equal, the Northern Light business search engine ranks more recent documents higher than older documents. In the default setting, this factor is weak and may have little impact because date sort and date-range select are both supported and usually preferable options for using dates. However, the impact of date boost can be turned up for clients who want to bias the results to more recent documents.
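A tunable recency factor of this kind can be sketched as a multiplier that decays with document age. The exponential half-life form, the default strength of 0.1, and the name `date_boost` are all assumptions chosen for illustration; the source does not describe the actual function.

```python
from datetime import date

def date_boost(published, today, strength=0.1, half_life_days=365):
    """Score multiplier between 1 and 1 + strength; newest documents get the most."""
    age_days = max((today - published).days, 0)
    return 1.0 + strength * 0.5 ** (age_days / half_life_days)

today = date(2024, 1, 1)
fresh = date_boost(date(2024, 1, 1), today)                 # weak default boost
year_old = date_boost(date(2023, 1, 1), today)              # half of that boost
strong = date_boost(date(2024, 1, 1), today, strength=1.0)  # boost turned up
```

With the weak default, a brand-new document gains only about 10 percent over a very old one, whereas raising `strength` makes recency a much larger part of the ranking.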
Northern Light’s search engine easily scales to enterprise class applications. For example, our largest client has 70,000 users on their SinglePoint application and sends us 500,000 user search queries per month. We handle the entire load on just one enterprise server that is “on vacation” from a utilization viewpoint.
All of these factors go into Northern Light’s ability to create search results of extraordinary quality and utility for market intelligence efforts. Clients who have compared Northern Light’s search results with those of commercially available enterprise search engines have repeatedly reported that Northern Light’s results are observably and significantly better for market research use, both in gathering competitive intelligence and in performing market strategy analysis.
These search results are then “read” by Northern Light’s market intelligence tool, the MI Analyst text analytics engine, for automated meaning extraction and concept identification – a capability that further refines and strengthens the search engine’s primary functions.