Home
Welcome to the Knowledge Base wiki!
Knowledge Base: Leverage IBM Cloud, Watson services, Data Science Experience and Open source technologies to derive insights from unstructured text content generated in various business domains.
Build a knowledge graph from documents.
One of the biggest challenges in the industry today is how to make machines understand the data in documents the way a human understands the context and intent of a document by reading it. The first step is to convert the unstructured information (free-floating text and text in tables) into a more structured format that can be processed further. That is where graphs play a major role in giving shape and structure to the unstructured information present in documents.
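As a toy illustration (not code from this pattern), facts extracted from text can be expressed as subject-relation-object triples, which map directly onto a graph of nodes and edges. The triples below are hand-written stand-ins for what an NLP pipeline would produce:

```python
# Minimal sketch: representing extracted facts as a graph of triples.
# The triples here are illustrative placeholders, not real pipeline output.
triples = [
    ("Watson NLU", "extracts", "entities"),
    ("entities", "become", "graph nodes"),
    ("relations", "become", "graph edges"),
]

# Build a simple adjacency representation: node -> list of (relation, node).
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

print(graph["Watson NLU"])  # [('extracts', 'entities')]
```

Once facts are in this triple form, loading them into a graph database such as OrientDB (or any graph store) is a straightforward mapping of subjects/objects to vertices and relations to edges.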
Cognitive
There is a lot of unstructured text content that is generated in any domain - software development lifecycle, finance, healthcare, social media etc. Valuable insights can be generated by analyzing the unstructured text content and correlating the information across various document sources.
This composite code pattern uses Watson Natural Language Understanding, Python NLTK, OrientDB, Node-RED and IBM Data Science Experience to build a complete analytics solution that generates insights for informed decision making.
By Neha Setia
This composite pattern uses a combination of other individual code patterns to derive insights from unstructured text content across various data sources. It is intended for developers who want a head start in building a complete end-to-end solution for such insights.
This composite pattern demonstrates a methodology to derive insights with IBM Cloud, Watson services, Python NLTK, OrientDB and IBM Data Science Experience using the following code patterns:

- The unstructured text data (free-floating text and HTML tables) that needs to be analyzed and correlated is extracted from the .docx files using custom Python code.
- The text is classified using Watson NLU and also tagged using the code pattern - Extend Watson text classification.
- The text is correlated with text from other sources using the code pattern - Correlate documents.
- The results are filtered using custom Python code.
- The knowledge graph is constructed.
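The tagging step above augments Watson NLU with rule-based, domain-specific entities. A simplified sketch of that idea, in the spirit of the Extend Watson text classification pattern, is a lookup against a configurable domain dictionary (the dictionary entries and document text below are illustrative, not taken from the pattern's configuration files):

```python
# Simplified sketch of dictionary-based entity tagging. A real run would
# merge these tags with the entities returned by Watson NLU.
domain_entities = {
    "jupyter": "TOOL",
    "spark": "TOOL",
    "ibm cloud": "PLATFORM",
}

def tag_entities(text, dictionary):
    """Return sorted (term, type) pairs for dictionary terms found in text."""
    lowered = text.lower()
    return sorted(
        (term, etype)
        for term, etype in dictionary.items()
        if term in lowered
    )

doc = "The notebooks run on Jupyter with managed Spark on IBM Cloud."
print(tag_entities(doc, domain_entities))
# [('ibm cloud', 'PLATFORM'), ('jupyter', 'TOOL'), ('spark', 'TOOL')]
```

In the actual pattern the dictionary lives in a configuration file, so the same code can be pointed at a different domain by swapping the configuration rather than changing the code.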
- IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost-effective apps and services with high reliability and fast speed to market.
- Watson Natural Language Understanding: An IBM Cloud service that can analyze text to extract metadata from content, such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles.
- Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
Title - Walkthrough on building a knowledge base by mining information stored in documents.
This composite code pattern is designed to give a detailed walkthrough to developers who are keen on building a domain-specific knowledge graph. It addresses all aspects of the task, from the challenges one can come across while building the knowledge graph and how to resolve them, to how to fine-tune the pattern to meet specific requirements. It makes use of Watson NLU, the code pattern [Extend Watson text classification](https://developer.ibm.com/code/patterns/extend-watson-text-classification/) to augment the entities picked by Watson NLU, and the Correlate documents pattern to augment the relations picked by Watson NLU. In effect, it combines the best of both worlds: rule-based tagging and dynamic Watson NLU. The results are then filtered to meet the needs of the domain.
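The final filtering step keeps only the results relevant to the domain before they go into the graph. One common approach is a relevance threshold; the sketch below assumes a result shaped like Watson NLU's JSON response (an `entities` list with `text`, `type`, and `relevance` fields), with illustrative values and an illustrative 0.5 cutoff:

```python
# Hedged sketch of filtering NLU results by relevance before graph
# construction. The response values and the threshold are illustrative.
nlu_result = {
    "entities": [
        {"text": "IBM Cloud", "type": "Company", "relevance": 0.9},
        {"text": "insights", "type": "Concept", "relevance": 0.3},
    ]
}

def filter_entities(result, threshold=0.5):
    """Keep only entities whose relevance meets the threshold."""
    return [e for e in result["entities"] if e["relevance"] >= threshold]

print([e["text"] for e in filter_entities(nlu_result)])  # ['IBM Cloud']
```

Lowering the threshold trades precision for recall; the right value depends on how noisy the domain documents are.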
View the entire Knowledge graph Journey, including demos, code, and more!
- [Watson NLU](https://natural-language-understanding-demo.ng.bluemix.net/)
- [Watson Studio](https://dataplatform.ibm.com/)
- [Python NLTK](https://www.nltk.org/)
- [Ultimate Guide to Understand & Implement Natural Language Processing](https://www.analyticsvidhya.com/blog/2017/01/ultimate-guide-to-understand-implement-natural-language-processing-codes-in-python/)