Parser Guidelines

Introduction

These guidelines describe the practices to follow when building an instrument parser for LDAS with any Java-based tool.

Pre-requisites

To get started with parser creation, you are required to be familiar with the following items:

  1. Talend Open Studio for Data Integration, version 8.0.1 (any other Java-based tool can be used instead).
  2. Java 17, or at least Java basics, and Python (latest version).

You should also have access to LDAS v2023.1.0 with a license and the ‘Instrument Super User’ role, which is needed to use the Instrument integration component of LDAS.


Parser execution in LDAS

LDAS picks up data from an instrument, transforms it with the parser, and moves the result on for downstream use. The steps are as follows:

  1. Build the parser files as per the requirements.
  2. Upload the parser on the "Create Instrument" page of LDAS.
  3. LDAS receives the instrument's input data through a file, an agent, or an API.
  4. LDAS places the input file in the working directory.
  5. LDAS initiates the activity and executes the parser.
  6. The parser generates the output files in the location configured in LDAS.
  7. LDAS moves the output files on for the final process (External System Push and Archive).

If you are building a parser to support the steps above, you must develop it in accordance with the expectations set out in this guideline.


Building a parser for LDAS

This section covers the components needed to build a parser that LDAS accepts and runs as expected.

Convention and folder structure

Once the parser is built, compress its class files into an archive with a relevant name and the “.zip” extension.

Follow the naming convention and folder structure described below.

  1. Naming Convention

    The expected name for the zipped parser folder is "<ParserName>_<VersionNumber>", where the parser name and version number are separated by ‘_’. No special characters other than ‘_’ are allowed in the name.

    For example, if the name of the parser is “Bioreactor”, then the zipped parser folder should be “Bioreactor_0.1”, where “Bioreactor” is the parser name and “0.1” is the version separated by ‘_’.

  2. Folder Structure

    The expected structure for the parser folder is provided below.

  • <ParserName>_<VersionNumber>.zip
    • <ParserName>_<VersionNumber>
      • <ParserName>

On extraction, the zipped parser folder must contain a folder named "<ParserName>_<VersionNumber>". Within this folder, another folder named "<ParserName>" must be present.

Example:

If the zipped parser folder is “Bioreactor_0.1.zip”, unzipping it should yield a folder “Bioreactor_0.1” containing a subfolder called “Bioreactor” (as shown in the image below).

Figure 1 Parser zipping format

Note: If the parser is not in the above format and structure, LDAS will not execute the parser.
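
The naming and folder rules above can be checked before a parser is uploaded. The following is a minimal Java sketch, not part of LDAS: the class `ParserPackageCheck` and its methods are hypothetical, and the accepted pattern (an alphanumeric parser name and a version made of digits separated by dots, as in “0.1”) is an assumption drawn from the “Bioreactor_0.1” example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an LDAS API) for checking a parser package before upload. */
final class ParserPackageCheck {
    // Assumption: alphanumeric parser name, '_' separator, dotted numeric version, ".zip" extension.
    private static final Pattern ZIP_NAME =
            Pattern.compile("([A-Za-z0-9]+)_([0-9]+(?:\\.[0-9]+)*)\\.zip");

    /** Returns the inner folder expected inside the zip, e.g. "Bioreactor_0.1/Bioreactor/". */
    static String expectedInnerFolder(String zipFileName) {
        Matcher m = ZIP_NAME.matcher(zipFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected parser zip name: " + zipFileName);
        }
        String parserName = m.group(1);
        String version = m.group(2);
        return parserName + "_" + version + "/" + parserName + "/";
    }

    /** True if at least one zip entry path lies inside the expected inner folder. */
    static boolean hasExpectedStructure(String zipFileName, List<String> entryNames) {
        String prefix = expectedInnerFolder(zipFileName);
        // Normalise backslashes in case the archive was created with Windows-style paths.
        return entryNames.stream()
                .anyMatch(name -> name.replace('\\', '/').startsWith(prefix));
    }
}
```

The entry names can be read from the archive with `java.util.zip.ZipFile`. If your parser names or version numbers follow a different scheme, adjust the pattern accordingly.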

Input Parameters

When a parser is integrated with LDAS, LDAS passes the following input parameters to the parser’s main method.


Parameter Name | Description | Purpose
file_name | Name of the input file. | Stores the name of the input file, which is generated by the instrument or placed by the user.
temp_path | The path of the working directory in which LDAS places the input files. | Whenever an input file is received, LDAS places it in this working directory. The parser accesses the input file from this location.
fileSeparator | The delimiter used as the file path separator. | The parser must use this character as the file path separator. On Windows “\” is used; on Linux, “/”.

Table 1 List of parameters available in LDAS
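
How these parameters appear in the `String[] args` of the main method is not stated in this guideline. The sketch below assumes each one arrives as a `name=value` token and skips tokens without an `=` (for example, a flag preceding each pair). The class `ParserArgs` and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: collects name=value tokens from the main method's arguments. */
final class ParserArgs {
    /** Assumption: each LDAS parameter is passed as a single "name=value" token. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                // Split on the first '=' only, so values may themselves contain '='.
                params.put(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        return params;
    }

    /** Returns the named parameter or fails fast with a clear message. */
    static String require(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing input parameter: " + name);
        }
        return value;
    }
}
```

A main method could then call `ParserArgs.require(params, "temp_path")`, and likewise for `file_name` and `fileSeparator`, before doing any work, so that a missing parameter is reported immediately.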

Input Directory

LDAS creates a folder named “input” inside the working directory (temp_path) and places the input files in that folder for the parser to pick up.

  • Working directory
    • input
    • output

The parser must be configured to read the “temp_path” parameter from LDAS and use it to find the input files.

Example:

Figure 1 Input folder structure
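
As an illustration of how the three input parameters combine, the sketch below builds the full path of the input file from temp_path, fileSeparator, and file_name. The class `InputLocator` is a hypothetical name, and the handling of a trailing separator on temp_path is a defensive assumption rather than documented LDAS behaviour.

```java
/** Hypothetical helper: builds the input file path from the LDAS input parameters. */
final class InputLocator {
    static String inputFilePath(String tempPath, String fileSeparator, String fileName) {
        // Drop a trailing separator so the result never contains a doubled separator.
        String base = tempPath.endsWith(fileSeparator)
                ? tempPath.substring(0, tempPath.length() - fileSeparator.length())
                : tempPath;
        // LDAS places input files in the "input" folder inside the working directory.
        return base + fileSeparator + "input" + fileSeparator + fileName;
    }
}
```

For a Linux working directory `/data/work` and an input file `Routine_Analysis.csv` (both illustrative values), this yields `/data/work/input/Routine_Analysis.csv`.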

Output directory

LDAS creates a folder named “output” inside the working directory (temp_path).

Inside this output folder a “json” folder is created, which contains the output files in a zipped format as shown below.

  • Working directory
    • output
      • json
        • Output files in zipped format

LDAS will process the output files further only if it finds a file with the “.zip” extension inside the json folder.

Files that need to be pushed to an external system endpoint, and any files generated by the parser for archiving within LDAS, should be kept within this output zip.

Figure 2 Zip file location

The output files should sit at the top level of the zip, so that they are present immediately after extracting it (as shown in the image below). Some examples of output files are result, metadata, and response files.

Figure 3 Zip file contents
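
One way a parser could produce this layout is sketched below with the standard `java.util.zip` classes. The class `OutputZipper` and the `zipName` argument are hypothetical; this guideline does not prescribe a name for the zip file. Each file is added under its bare file name, so it sits at the top level of the archive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: zips parser output files into <workingDir>/output/json/. */
final class OutputZipper {
    static Path zipOutputs(Path workingDir, String zipName, List<Path> outputFiles)
            throws IOException {
        Path jsonDir = workingDir.resolve("output").resolve("json");
        // Safe whether or not the folders already exist.
        Files.createDirectories(jsonDir);
        Path zipPath = jsonDir.resolve(zipName);
        try (OutputStream out = Files.newOutputStream(zipPath);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : outputFiles) {
                // Use only the file name so the entry sits at the zip root, not in a subfolder.
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return zipPath;
    }
}
```

Two output files with the same file name would collide at the zip root, so a parser using this approach needs to give each output file a distinct name.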

Data push to external system

Parsed instrument data can be pushed by LDAS into more than one connected external system. The file containing this data must have the extension “.result”. To push data to multiple external systems, the parser must create multiple .result files, and the mapping must be configured within LDAS. The .result file should have its content in the format below to push the data to the target external system.

Figure 4 result file format


When data is pushed into LDAS Archival, the metadata for each file must be provided in a file with the extension “.metadata”. The metadata file must have the same name as the input file, so that the metadata is mapped to the right input file. For example, to capture metadata for an input file named “Routine_Analysis”, the metadata file must be named “Routine_Analysis.metadata”.

The metadata file should have its contents in the below format:

Figure 5 metadata file format

Whenever the instrument Initiation Method is set to ‘Pull from Instrument’ on the Create Instrument page of LDAS and ‘Return Response’ is set to ‘Yes’, an output file with the extension “.response” must be present.

Figure 6 Create Instrument page


Only when this .response file is present among the output files will the system be able to send the response as expected.

The .response file should have its contents in the below format:

Figure 7 The response file formats
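
The file-naming rules for .result, .metadata, and .response files can be verified before the outputs are zipped. The sketch below is a hypothetical check, not an LDAS feature. It assumes that a .result file and a metadata file are always wanted, which may not hold for every instrument, and that the metadata name is the input file name with “.metadata” appended unchanged. It checks names only and says nothing about file contents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-zip check of output file names (contents are not inspected). */
final class OutputFileCheck {
    static List<String> findProblems(List<String> outputFileNames,
                                     String inputFileName,
                                     boolean returnResponse) {
        List<String> problems = new ArrayList<>();
        // Data pushed to an external system must be in a file with the .result extension.
        if (outputFileNames.stream().noneMatch(n -> n.endsWith(".result"))) {
            problems.add("no .result file found");
        }
        // Assumption: the metadata file is named exactly <input file name>.metadata.
        String metadataName = inputFileName + ".metadata";
        if (!outputFileNames.contains(metadataName)) {
            problems.add("missing " + metadataName);
        }
        // A .response file is needed only when 'Return Response' is set to Yes.
        if (returnResponse && outputFileNames.stream().noneMatch(n -> n.endsWith(".response"))) {
            problems.add("no .response file found although Return Response is Yes");
        }
        return problems;
    }
}
```

An empty list means the names satisfy the rules as interpreted here. A parser might write any reported problems to its log before exiting.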

Parser environment

When a parser is built according to the naming conventions and folder structure explained in this guideline, the innermost folder <ParserName> should contain a batch (.bat) file or a shell (.sh) file, and an executable jar.

In a Windows environment, LDAS runs the parser using the batch (.bat) file; in a Linux environment, LDAS runs the parser using the shell script (.sh) file.

The batch and shell scripts should contain the parameters and the path of the libraries used to build the parser.

Figure 8 Location of batch and shell script to run the parser

Parser log file

You are expected to ensure that a parser log file is always generated for every parser developed and integrated with LDAS. This log file is expected to be present within the working directory.

  • Working directory
    • parser.log

The log information can be used to debug the parser execution.
The file must be named “parser.log” and placed in the working directory, so that LDAS can access it.

Figure 9 Location of parser.log file
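
A parser written in Java could create this file with the standard `java.util.logging` package, as in the minimal sketch below. The class `ParserLog` and the logger name "parser" are hypothetical. The sketch assumes the working directory is the value received in temp_path, as described in Table 1.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Hypothetical helper: opens a logger that writes to <temp_path>/parser.log. */
final class ParserLog {
    static Logger open(String tempPath) throws IOException {
        Path logFile = Path.of(tempPath, "parser.log");
        // FileHandler treats '%' in the path as a pattern escape, so this assumes
        // the working directory path contains no '%' characters.
        FileHandler handler = new FileHandler(logFile.toString(), true); // true = append
        handler.setFormatter(new SimpleFormatter()); // plain text instead of the XML default
        Logger logger = Logger.getLogger("parser");
        logger.setUseParentHandlers(false); // keep log records out of the console
        logger.addHandler(handler);
        return logger;
    }
}
```

Close the logger's handlers before the parser exits. While a `FileHandler` is open it keeps a `parser.log.lck` lock file next to the log, and closing the handler removes it.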