Specialized Data Factory Processing
Takeaway
For some very specific use cases, when a high level of customization is required for security or performance reasons, the Product-Live Data Factory platform can delegate specific task processing to a third-party infrastructure. The use cases we have encountered so far are as follows:
- Connect to a specific system that is not publicly accessible
- Process highly sensitive information that cannot be exposed to the outside world
- Execute very intensive processing that requires very specific hardware
- Handle specific tasks that are not natively available in the Data Factory platform
Overview
The Data Factory platform can delegate the processing of specific tasks to a third-party infrastructure. In that case, the Data Factory platform is only responsible for orchestrating the tasks within a job; for a custom task, it is up to the task creator to execute it and return the result to the platform.
Custom Task
A custom task is a task that is not natively managed by the Data Factory platform. It is up to the task creator to execute it and return the result to the platform.
Example
The following example illustrates the execution of a job containing a custom task. In this example, the job is composed of two tasks: the first is a native task (for example, an exhaustive export of the data stored in one particular table), and the second is a custom task (for example, the integration of the data exported by the first task into a specific system).
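As an illustration only, such a job could be declared along the following lines. The schema and the task type identifiers shown here are hypothetical placeholders, not the actual Data Factory job format:

```typescript
// Hypothetical sketch of the two-task job described above; the real job
// schema and task type names are defined by the Data Factory platform.
const job = {
  name: "export-and-integrate",
  tasks: [
    {
      // Native task, executed by the platform: exhaustive export of a table.
      type: "table/export", // placeholder task type
      table: "products",
    },
    {
      // Custom task, executed by your own application, which integrates the
      // exported data into the target system and reports the result back.
      type: "custom", // placeholder task type
      name: "integrate-into-target-system",
    },
  ],
};
```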
Requirements
To be able to use the specialized processing feature, the following requirements must be met:
- You must have access to the Data Factory platform, a pipeline, and a valid token allowing you to interact with the platform.
- You need to be able to develop the application that will perform the desired task (an example is available below), and also to manage the execution of this code in a production environment (whether on your own servers or on a cloud infrastructure provider such as Azure or AWS).
Implementation
As described above, interoperability between your applications and the Product-Live Data Factory platform is achieved via our APIs. The only interactions required are the following:
- Retrieving a task to be processed by your application
- Returning the produced result to our services
Of course, you can use the language of your choice to perform these two operations. You can generate an SDK using the OpenAPI definition of our API, or use our Node.js/TypeScript SDK.
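The sketch below shows what these two interactions could look like in TypeScript using a simple polling loop. The base URL, endpoint paths, and payload shapes are assumptions made for illustration, not the actual Data Factory API; refer to the OpenAPI definition for the real contract.

```typescript
// Minimal polling worker. All endpoint paths, payload shapes, and the base
// URL below are hypothetical placeholders: consult the OpenAPI definition
// of the Data Factory API for the actual contract.
const API_BASE_URL = "https://api.example.com"; // placeholder, not the real API URL
const API_TOKEN = process.env.DATA_FACTORY_TOKEN ?? "";

interface PendingTask {
  id: string;
  input: unknown; // data handed over by the previous task in the job
}

// Interaction 1: retrieve a task waiting to be processed by your application.
async function fetchPendingTask(): Promise<PendingTask | null> {
  const response = await fetch(`${API_BASE_URL}/tasks/pending`, {
    headers: { Authorization: `Bearer ${API_TOKEN}` },
  });
  if (response.status === 204) return null; // nothing to do right now
  return (await response.json()) as PendingTask;
}

// Interaction 2: communicate the produced result back to the platform.
async function submitResult(taskId: string, result: unknown): Promise<void> {
  await fetch(`${API_BASE_URL}/tasks/${taskId}/result`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(result),
  });
}

// Your business logic, e.g. pushing the exported data into the target system.
async function handleTask(input: unknown): Promise<unknown> {
  // ...integration-specific work goes here...
  return { status: "DONE" };
}

// Poll, process, report, repeat.
async function run(): Promise<void> {
  for (;;) {
    const task = await fetchPendingTask();
    if (task) {
      await submitResult(task.id, await handleTask(task.input));
    } else {
      await new Promise((resolve) => setTimeout(resolve, 5_000)); // back off
    }
  }
}

run().catch(console.error);
```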
A sample implementation using the NestJS framework and the TypeScript language is available here (Product-Live/data-factory-task-example).