Developer Guide

Setup

Set up a Python virtual environment with Python >= 3.7

Clone this repository

   git clone https://github.com/amundra02/ai_pipeline.git

Activate the virtual environment and install the required packages
```
   pip install -r requirements.txt
```

Check the requirements file first in case you already have everything installed. Most of the dependencies are common libraries.

Pipeline Buckets

Data Connections
Data Preprocessing
Feature Engineering
Algorithm Selection
Training Infrastructure
Model Deployment
Continuous Improvement

Data Connections

Source Files

Cloud Helper
- Parses the configuration file and creates the resources needed to connect to the cloud.
- Creates the COS client, COS resource, and Cloudant client instances.
Download Data
Upload Data

Methods

Get Cos Client Instance

Response

 client = get_cos_client()

Parameter	Description
client	cos client instance

Get Cos resource Instance

Response

 resource = get_cos_resource()

Parameter	Description
resource	cos resource instance

Get cloudant instance and database to fetch data

Response

 cloudant, db = get_cloudant_client()

Parameter	Description
cloudant	Cloudant client instance; provides access to the Cloudant DB
db	Name of the database from which documents are queried

Get Cos Bucket to upload processed data

Response

 bucket_name = get_upload_bucket()

Parameter	Description
bucket_name	Cos Bucket name

Get Cloudant database name to upload processed metadata

Response

 db_name = get_cloudant_processed_db()

Parameter	Description
db_name	Cloudant database name

Read Image From COS

Converts the downloaded streaming body object to a NumPy ndarray.

Request

Parameter	Description
client	cos client instance
bucket	cos bucket name from where data is fetched
file	file name to fetch

Response

 image = read_image(client, bucket, file)

Returns

Parameter	Description
image	file fetched from cos bucket in a numpy array

Download data from IBM Cloud Object storage

Downloads data from the COS bucket, up to the requested limit.

Request

Parameter	Description
limit	Maximum number of documents to return. Possible values: limit ≥ 0

Response

 metadata, image_data, labels = get_data_ibm_cos(limit)

Parameter	Description
metadata	List of metadata files
image_data	List of images (numpy array)
labels	List of labels, one per image

Download processed data from IBM Cloud Object storage

Request

Parameter	Description
limit	Maximum number of documents to return. Possible values: limit ≥ 0

Response

 metadata, image_data, labels, annotations = get_data_ibm_cos(limit)

Parameter	Description
metadata	List of metadata files
image_data	List of images (numpy array)
labels	List of labels, one per image
annotations	Annotation details for each image object

Create and upload metadata document for processed image file to Cloudant database

Request

Parameter	Description
metadata	metadata of image to be uploaded
annotation_meta	Annotation details for image object

Response

 response = upload_metadata(metadata, annotation_meta)

Parameter	Description
response	api response of post call

Write Image to COS

Converts the NumPy ndarray image data into an Image object and stores it in the COS bucket.

Request

Parameter	Description
client	cos client instance
bucket	cos bucket name where data is uploaded
file	file name to upload
image	image data to be uploaded

Response

    write_image_cos(client, bucket, file, image)

Data Preprocessing

Source Files:

Data Preprocessing
- Data resizing: resizes data to a specified width and height using OpenCV.
Data Annotation
- Bounding boxes: bounding box for images containing a single object.
- Methods: adaptive thresholding, Canny edge detection, contour detection

Methods

Resize image by specifying width, height, and interpolation method

Resize the input image with the given parameters.

Request

Parameter	Description
image	Input image file
width	Output image width
height	Output image height
interpolation_method	OpenCV interpolation method

Response

 resized_image = resize(image, width, height, interpolation_method)

Parameter	Description
resized_image	Resized image

Get Resized Data

Resize the input data as per the specification

Request

Parameter	Description
width	Output image width
height	Output image height
interpolation_method	OpenCV interpolation method

Response

 image_resize = ImageResize(width, height, interpolation_method)
 metadata, resized_data, labels = image_resize.get_resized_data()

Parameter	Description
metadata	List of metadata files
resized_data	List of resized images (numpy array)
labels	List of labels, one per image

Find Contour in an Image

This method finds all the contours in an input image using the selected method. It uses OpenCV functions to remove noise, detect edges, perform adaptive thresholding, and detect contours.

Request

Parameter	Description
image	Input image
method	Contour detection method. Possible values: adaptive thresholding (0), edge detection (1). Default: 0

Response

 contours = find_contours(image, 0)

Parameter	Description
contours	detected contours

Draw bounding rectangle on an object in an image

Finds the coordinates of the rectangle which contains the object in a given contour and draws the rectangle on an input image.

Request

Parameter	Description
contours	detected contours of an image
image	Input image
method	Contour detection method. Possible values: adaptive thresholding (0), edge detection (1). Default: 0

Response

 drawn_image, coordinates = draw_bounding_rectangle(contours, image, 0)

Parameter	Description
drawn_image	Image with rectangle on the object
coordinates	Coordinates of the drawn rectangle in the form <x, y, w, h>
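
The `<x, y, w, h>` form describes a rectangle by its top-left corner plus its width and height. The following is a minimal pure-Python sketch of how such a rectangle can be derived from a set of contour points. `bounding_rectangle` is a hypothetical helper, not the guide's `draw_bounding_rectangle` (which relies on OpenCV), and it assumes inclusive pixel extents, so a contour spanning columns 10 to 45 has a width of 36.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]


def bounding_rectangle(points: Iterable[Point]) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) of the smallest axis-aligned box covering the points.

    Hypothetical helper for illustration; extents are treated as inclusive.
    """
    pts = list(points)
    if not pts:
        raise ValueError("contour has no points")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x, y = min(xs), min(ys)          # top-left corner
    w = max(xs) - x + 1              # inclusive width in pixels
    h = max(ys) - y + 1              # inclusive height in pixels
    return x, y, w, h


# Illustrative contour points (made-up values).
contour = [(12, 30), (40, 28), (45, 60), (10, 55)]
rect = bounding_rectangle(contour)
```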

Create the annotation details and upload the processed data

Generates metadata for the processed image and uploads the new metadata document to the Cloudant database that holds the processed meta files.

Request

Parameter	Description
metadata	metadata file of an image
image	Processed image file
label	Label of processed image
coordinates	Annotation coordinates of image

Response

 upload_processed_image(metadata, image, label, coordinates)

Get annotated data

Get the annotated processed data

Response

 annotation = Annotation()
 annotated_data = annotation.get_annotated_data()

Feature Engineering

Algorithm Selection

Training Infrastructure

Source File: Prepare Training

Methods

Split the downloaded data into Train & Validation

Splits the data into training and testing folders using scikit-learn's train/test split with a test size of 20%.
Creates a label file for each image file.
Creates a file listing all the labels.

Request

Parameter	Description
metadata	List of metadata files
image_data	List of images (numpy array)
labels	List of labels, one per image
annotations	Annotation details for each image object

Response

  split_tarin_test_data(metadata, image_data, labels, annotations)
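
Because metadata, images, labels, and annotations are parallel lists, one way to keep them aligned is to split a shared list of indices and apply it to each list. The sketch below uses the standard library's `random` module as a stand-in for scikit-learn's train/test split, purely to stay dependency-free. `split_indices` and the sample data are hypothetical, not part of this repository.

```python
import random
from typing import List, Tuple


def split_indices(n_items: int, test_size: float = 0.2,
                  seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n_items-1 and return (train_indices, test_indices)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # seeded, so the split is repeatable
    n_test = round(n_items * test_size)    # 20% of the items by default
    return indices[n_test:], indices[:n_test]


# Illustrative parallel lists (made-up values).
image_names = [f"img_{i}.jpg" for i in range(10)]
labels = ["cat", "dog"] * 5

train_idx, test_idx = split_indices(len(image_names))
train_images = [image_names[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]
test_images = [image_names[i] for i in test_idx]
test_labels = [labels[i] for i in test_idx]
```

Applying the same index lists to every parallel list guarantees that an image never ends up in a different split than its label or annotation.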

Create Yolo label file

Creates a label file for each image file, with one line per object in the format <object-class> <x_center> <y_center> <width> <height>.
The label file has the same name as the image.

Request

Parameter	Description
annotation	Annotation coordinates for object in an image
filename	Name of the file to be created
image	Image object
label_id	Label id of object label

Response

  create_yolo_label_file(annotation, filename, image, label_id)
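
As a rough illustration of what such a label file looks like, the sketch below writes one. It assumes the common Darknet convention of one line per object, `<class_id> <x_center> <y_center> <width> <height>`, with all four box values already normalized to the image size. `write_yolo_label` is a hypothetical helper with a simpler signature than `create_yolo_label_file`.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def write_yolo_label(image_path: str, label_id: int,
                     box: Tuple[float, float, float, float]) -> Path:
    """Write a one-object label file next to the image and return its path.

    Hypothetical helper: `box` is (x_center, y_center, width, height),
    already normalized to the 0..1 range.
    """
    label_path = Path(image_path).with_suffix(".txt")  # same name as the image
    x_center, y_center, width, height = box
    line = f"{label_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n"
    label_path.write_text(line, encoding="utf-8")
    return label_path


# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    label_file = write_yolo_label(str(Path(tmp, "cat_001.jpg")), 0,
                                  (0.5, 0.5, 0.25, 0.4))
    label_name = label_file.name
    label_content = label_file.read_text(encoding="utf-8")
```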

Convert annotation coordinates in Yolo format

Converts the standard annotation coordinates of an object to YOLO format: <x_center> <y_center> <width> <height>, each relative to the image dimensions.

Request

Parameter	Description
coordinates	coordinates for object in an image
width	image width
height	image height

Response

  x, y, w, h = get_yolo_format_annotations(coordinates, width, height)

Parameter	Description
x	x_center relative to width of image
y	y_center relative to height of image
w	width of object relative to width of image
h	height of object relative to height of image
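
The conversion itself is simple arithmetic. A minimal sketch, assuming the input coordinates use the `<x, y, w, h>` pixel form described for bounding rectangles earlier in this guide (top-left corner plus size). `to_yolo_format` is a hypothetical name, not the repository's `get_yolo_format_annotations`.

```python
from typing import Tuple


def to_yolo_format(coordinates: Tuple[float, float, float, float],
                   width: int, height: int) -> Tuple[float, float, float, float]:
    """Convert a pixel box (x, y, w, h) to normalized (x_center, y_center, w, h)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    x, y, w, h = coordinates
    x_center = (x + w / 2) / width    # box center, relative to image width
    y_center = (y + h / 2) / height   # box center, relative to image height
    return x_center, y_center, w / width, h / height


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
yolo_box = to_yolo_format((100, 50, 200, 100), width=400, height=200)
```

Every returned value lies between 0 and 1 as long as the box stays inside the image, which makes the annotation independent of any later resizing.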

Create file with all the classes

Creates an obj.names file which contains all the available classes in a data sample.

Request

Parameter	Description
classes	Set of all the available object classes

Response

  create_class_names_file(classes)
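
A names file is typically one class name per line, with the line position serving as the numeric class id. The sketch below assumes that layout and sorts the classes so ids stay stable between runs; the sorting choice and the `write_class_names` helper are assumptions for illustration, not this repository's behavior.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def write_class_names(classes: Iterable[str], path: str) -> Dict[str, int]:
    """Write one class name per line and return a name -> class id mapping."""
    ordered = sorted(set(classes))   # deterministic order, duplicates removed
    Path(path).write_text("".join(f"{name}\n" for name in ordered),
                          encoding="utf-8")
    return {name: idx for idx, name in enumerate(ordered)}


with tempfile.TemporaryDirectory() as tmp:
    names_path = Path(tmp, "obj.names")
    class_ids = write_class_names({"dog", "cat", "horse"}, str(names_path))
    names_content = names_path.read_text(encoding="utf-8")
```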

Append content of a directory in a file

Lists the contents of the given data directory in a file. This is used to list all the train and test file names with the jpg extension, which are an input to the YOLO algorithm.
Each filename is listed with its path relative to the darknet directory.

Request

Parameter	Description
content_path	data directory
filename	filename where all the content will be listed

Response

  append_dir_content_in_file(content_path, filename)
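
The listing step can be sketched with the standard library alone. `append_image_paths` below is a hypothetical helper that takes the base directory as an explicit third argument (the guide's function takes only two parameters), and the directory layout in the usage example is made up.

```python
import os
import tempfile
from pathlib import Path


def append_image_paths(content_path: str, list_file: str, base_dir: str) -> int:
    """Append every .jpg in content_path to list_file, relative to base_dir."""
    images = sorted(Path(content_path).glob("*.jpg"))
    with open(list_file, "a", encoding="utf-8") as handle:
        for image in images:
            # Store the path relative to base_dir, using forward slashes.
            relative = Path(os.path.relpath(image, base_dir)).as_posix()
            handle.write(relative + "\n")
    return len(images)


with tempfile.TemporaryDirectory() as base:
    train_dir = Path(base, "data", "train")
    train_dir.mkdir(parents=True)
    for name in ("a.jpg", "b.jpg", "a.txt"):   # the .txt label file is skipped
        (train_dir / name).touch()
    list_path = Path(base, "train.txt")
    count = append_image_paths(str(train_dir), str(list_path), base)
    listed = list_path.read_text(encoding="utf-8").splitlines()
```

Opening the list file in append mode means repeated calls add to the file rather than overwrite it, so a stale list should be deleted before regenerating it.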

Create a file which contains the training details for Yolo

Appends the location of the Train.txt file, which contains the paths to all the training files
Appends the location of the Test.txt file, which contains the paths to all the validation files
Appends the location of the classes (obj.names) file, which contains all the class names
Appends the location of the backup directory, which will be used for training backups

Request

Parameter	Description
backup_dir	Path of backup directory

Response

  append_training_details(backup_dir)
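
The four locations above map naturally onto the `train`, `valid`, `names`, and `backup` keys of a Darknet data file. A minimal sketch, assuming a `key = value` line per entry; `write_training_details`, its signature, and all paths in the usage example are hypothetical.

```python
import tempfile
from pathlib import Path


def write_training_details(data_file: str, train_list: str, valid_list: str,
                           names_file: str, backup_dir: str) -> None:
    """Append the four training locations to data_file as `key = value` lines."""
    entries = [
        ("train", train_list),    # list of training image paths
        ("valid", valid_list),    # list of validation image paths
        ("names", names_file),    # class names, one per line
        ("backup", backup_dir),   # where training checkpoints are saved
    ]
    with open(data_file, "a", encoding="utf-8") as handle:
        for key, value in entries:
            handle.write(f"{key} = {value}\n")


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp, "obj.data")
    write_training_details(str(data_path), "data/Train.txt", "data/Test.txt",
                           "data/obj.names", "backup/")
    data_content = data_path.read_text(encoding="utf-8")
```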

Download Pretrained weights for Yolo Custom training

Downloads the weight file from the darknet repository.

Response

  download_pretrained_weights()

Model Deployment

Continuous Improvement