
Best Practices for Writing for Machine Learning

In 2022, machine learning has made significant inroads into enterprise applications, including knowledge management systems for market and competitive intelligence. Given this reality, the content indexed and stored in such systems needs to be “machine learning friendly.” Documents need to be written in such a way that an algorithm can read and digest them, as Northern Light does with our automatically generated Insights Reports.

Across our universe of 250,000 market intelligence professionals using traditional search processes (manually scanning up and down a search result page), users on average download and read only one document per search and therefore fail to acquire the insights from the documents they do not download. However, when users start by reading the machine-authored Insights Report, a summary of important ideas within each document, they have access to insights from all the best documents on the search result instead of just one. Interestingly, users end up downloading and reading more documents when the truly important passages from all of the best documents are presented to them, so the Insights Report becomes an important vehicle for additional content consumption.

How Automated “Insights Reports” Are Created

Northern Light’s machine learning works as follows to automatically summarize the important ideas in the documents on a search result:

1. The process extracts all the “summary worthy sentences,” which are usually declarative sentences making a statement ending with a period, from all the documents on the search results page.
2. The machine draws a graph of those sentences that measures how related each sentence is to all the others and how important each sentence’s relationship partners are. The graph looks for the overlap of “features” between the sentences. Features are groups of words that like to hang out together in the group of documents being considered.
3. The graph determines that one sentence is most important. The machine picks it for the report.
4. The graph determines that another sentence is the second most important. The machine picks it for the report.
5. The process continues until the target number of sentences for the report is reached. For Northern Light, this is currently 40 sentences.
6. The machine first orders the report sentences as they appear within each document, and then orders the document sentence groupings in the order the documents appear on the search result. There is “competition” between sentences for a place on the Insights Report based on how important the graph determines that they are.
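
The six steps above can be illustrated with a small sketch. This is not Northern Light's implementation: the source does not name the ranking algorithm, so the sketch assumes a PageRank-style score over a sentence graph, and it substitutes plain shared words for the richer “features” described in step 2. All function names, the stopword list, and the damping value are hypothetical choices made for the illustration.

```python
import re
from itertools import combinations

# Assumed stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "as", "in", "is", "of", "on", "that", "the", "to"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_summary_worthy(sentence):
    """Step 1: keep only sentences that end with a period."""
    return sentence.endswith(".")

def features(sentence):
    """Stand-in for 'features': the set of non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}

def rank_sentences(sentences, damping=0.85, iterations=50):
    """Step 2: score each sentence by its overlap with other well-connected sentences."""
    feats = [features(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = float(len(feats[i] & feats[j]))
    totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / totals[j] * scores[j]
                for j in range(n)
                if totals[j] > 0
            )
            for i in range(n)
        ]
    return scores

def build_report(documents, target=40):
    """Steps 3 to 6: pick the top sentences, then restore document order."""
    candidates = []  # (document index, sentence index, sentence)
    for d, text in enumerate(documents):
        for s, sentence in enumerate(split_sentences(text)):
            if is_summary_worthy(sentence):
                candidates.append((d, s, sentence))
    if not candidates:
        return []
    scores = rank_sentences([c[2] for c in candidates])
    top = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:target]
    return [sentence for _, _, sentence in sorted(candidates[i] for i in top)]
```

Calling `build_report(documents, target=40)` with a list of document texts, given in search result order, returns up to 40 sentences. Questions, exclamations, and sentences that share few words with the rest of the result set rarely survive the “competition.”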

Below is a sample search result and then the automated Insights Report written by Northern Light machine learning algorithms on that search result.  

[Figure: sample search result and the automated Insights Report written by Northern Light machine learning algorithms on that search result]

Writing for the Machine

Here are Northern Light’s suggestions for writing market intelligence reports in the machine learning age.

Write in simple declarative sentences that have noun subjects and predicate verbs that express whole ideas within each sentence. Declarative sentences always end with a period as the terminator, never a question mark or exclamation mark.

Don’t say: The study showed how fast [CompanyName] [ProductName] is. The result: blazing speed!

  • The machine uses sentences as a fundamental unit of idea organization. It will not automatically associate a word in one sentence with an object in another sentence, even in an adjacent sentence. There is no overlap in the words in the two sentences above and the machine will not relate them directly to one another in the graphs. The second sentence is not a complete sentence in that it has no verb as a predicate and it will be ignored by our application as a fragment.

Do Say: The study showed that [CompanyName] [ProductName] is really fast and has blazing speed. 
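
The lack of overlap between the two “Don’t say” sentences can be checked with a few lines of code. This is a minimal sketch under stated assumptions: shared words stand in for the graph's “features,” and the stopword list and function names are hypothetical, not part of our application.

```python
import re

# Assumed stopword list: common words that carry no topical meaning.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "that", "how", "has", "of", "to"}

def content_words(sentence):
    """Lowercase the sentence and keep only its non-stopword tokens."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}

def shared_words(first, second):
    """Words that would let a graph connect the two sentences."""
    return content_words(first) & content_words(second)

dont_1 = "The study showed how fast [CompanyName] [ProductName] is."
dont_2 = "The result: blazing speed!"
do_say = ("The study showed that [CompanyName] [ProductName] "
          "is really fast and has blazing speed.")

print(shared_words(dont_1, dont_2))   # -> set()
print(sorted(content_words(do_say)))  # all of the key terms live in one sentence
```

Apart from the stopword “the,” the two original sentences share nothing, while the rewritten sentence carries the product, the study, and the speed claim together.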

Avoid questions whenever possible, since the answer usually follows the question in another sentence or in tables or lists. We don’t usually care about the questions; it is the answers we want. Put the answers into context with a simple declarative sentence ending in a period.

Don’t say: How often do users update their antivirus program? 23% said daily, 67% said weekly, and 10% said “been meaning to do that but never get around to it.”

  • The machine won’t be able to associate the answers with the question because there is no overlap of words (the graph won’t connect them). And because the answers contain such generic words and figures, the machine will consider the answers less important.

Do Say: Most users update their antivirus programs weekly (67%), some daily (23%), and some never at all (10%).
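
The effect of the sentence terminator can be sketched as a simple filter. This is an illustration only, assuming a naive splitter and a rule that keeps sentences ending in a period; the function names are hypothetical, the “Don’t say” answer is shortened for the example, and real extraction would also check for a subject and a predicate verb.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def declarative_candidates(text):
    """Keep only sentences terminated by a period."""
    return [s for s in split_sentences(text) if s.endswith(".")]

dont_say = ("How often do users update their antivirus program? "
            "23% said daily, 67% said weekly, and 10% said never.")
do_say = ("Most users update their antivirus programs weekly (67%), "
          "some daily (23%), and some never at all (10%).")

print(declarative_candidates(dont_say))
print(declarative_candidates(do_say))
```

In the first case the question is dropped and the answer survives without the words “users,” “update,” or “antivirus” that give it meaning. In the second case the whole finding is kept as one sentence.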

Avoid putting the important ideas in lists, or at least only in lists.

Don’t say: The key factors in enterprise cloud vendor selection are:

  1. Implementation assistance
  2. Near instant response to increased server needs
  3. Uptime history
  4. Cost
  • The machine will not be able to associate the list with the key sentence above the list that provides context because the words do not overlap.

Do Say: The key factors in enterprise cloud vendor selection are implementation assistance, near instant response to increased server needs, uptime history, and cost.

If you need a list or just want one for readability, repeat the findings in a sentence in notes, a sidebar, an appendix, etc., using the principles of simple declarative sentences ending with periods.

Avoid pronouns (he, she, they, it) and pronoun-like words (company, firm) whenever possible.

Don’t say: We interviewed key executives in the field such as Frank Smith, SVP of Cloud Computing. He said cloud computing rocks.

  • The machine may not be able to find the antecedent of “He” in all cases and if the wrong antecedent is found and substituted, you have just generated fake news. If we stick to the words as they are written for the automated summary you will get: “He said cloud computing rocks,” leaving the reader of the automated summary to wonder who “He” is.

Do Say: We conducted interviews of executives in the field, and Frank Smith, SVP of Cloud Computing, said cloud computing rocks.

Don’t Say: [CompanyName] had a good quarter market share wise. The company said cloud computing is winning one big new client after another.

  • While not formally a pronoun, company has pronoun-like qualities in that we don’t know who the company is if we only have the second sentence above to work with.

Do Say: [CompanyName] increased its market share during the quarter as cloud computing won one big account after another. 
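
A writer can screen a draft for these words before publishing. The following is a hedged sketch of such a check, not a feature of our application: the word lists come from the guideline above, and the function name is hypothetical.

```python
import re

# Word lists taken from the guideline above; extend them as needed.
PRONOUNS = {"he", "she", "they", "it"}
PRONOUN_LIKE = {"company", "firm"}

def flag_vague_references(sentence):
    """Return the pronouns and pronoun-like words found in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sorted({w for w in words if w in PRONOUNS or w in PRONOUN_LIKE})

print(flag_vague_references("He said cloud computing rocks."))
# -> ['he']
print(flag_vague_references(
    "The company said cloud computing is winning one big new client after another."))
# -> ['company']
print(flag_vague_references(
    "[CompanyName] increased its market share during the quarter "
    "as cloud computing won one big account after another."))
# -> []
```

A flagged sentence is not necessarily wrong, but it is a sentence that will lose its meaning if it appears alone in an automated summary.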

Restrain the urge to write with creative expression.

Don’t say: The study shows that compared to XYZ’s servers, [CompanyName] [ProductName] are quicker than a greased pig.

  • Other writers won’t use the greased pig expression and the sentence will be considered less important by the machine.

Do Say: The study shows [CompanyName] [ProductName] were faster than XYZ’s servers by a large margin.

Use standard vocabulary words that other writers are likely to use.

Don’t say: Geospatial information systems are an essential element of driverless car technology.

  • Other writers might call it GPS and if so the above sentence will be considered less important.

Do Say: GPS is an essential element of driverless car technology.

If you use bullet points, make them complete sentences with noun subjects and a period as the terminator.

  • As previously mentioned, the machine looks for sentences as a fundamental unit of idea organization. It won’t interpret incomplete phrases that are bullet points as meaningful.

Don’t write with social media conventions like hashtags, especially hashtags as subjects or objects in noun phrases.

Don’t say: “#IBM bring the technology and #Accenture bring the industry expertise” – sums up why #partnering is so important i… https://t.co/fCGDJrXXM9

  • By the way, the above is a recent tweet by an IDC analyst. In our implementation, which is not tuned for social media, we look for actual nouns and proper nouns as the subjects of sentences and we would conclude this sentence does not have a real subject.

Do Say: IBM bringing the technology and Accenture bringing the industry expertise sums up why partnering is so important. 

When using PowerPoint, put any essential oral commentary and the interpretation of graphical elements into the notes field on each slide using the above principles of simple declarative sentences ending with periods. Or append slides that use text sentences that recap the findings in simple declarative sentences ending in periods.

If you put a conclusion or comment in a text box on a slide with a graphical element, use a complete sentence ending with a period in the text box.

Conclusion

With the incorporation of machine learning into knowledge management applications, it has become necessary to create market and competitive intelligence research content for consumption by the human reader and the AI algorithm.  While the human brain and computer function differently, it is possible to write documents in a style that works for both.  The tips enumerated above are a guide to a writing style and structure that will maximize the reach and impact of your research documents.
