Azure Cloud Services Utilised in the PII Platform

The PII platform uses several Azure cloud services for PII recognition, machine learning, and overall system performance. Integrating these services supports accuracy, scalability, and compliance with data privacy regulations.

Asynchronous Task-Based Architecture for the Integration Layer

The integration layer handles independent tasks asynchronously. Tasks such as fetching file contents, processing data, and updating results are placed on a task queue managed by Celery. Worker nodes pick up and execute these tasks, which allows parallel processing and non-blocking operations. The architecture calls external APIs, such as the Relativity One REST API, to fetch and update data, while the backend handles core logic such as PII detection. The key layers of this architecture are as follows:

Task Executor and Queue Management

  • Celery: The integration utilises Celery, an asynchronous task queue/job queue system, to manage and execute tasks. Celery is responsible for queuing tasks related to fetching file contents from Relativity One, processing these files, and updating the results back to Relativity One.

  • Task Queuing with RabbitMQ: Tasks such as fetching file contents, analysing files, and pushing results are placed in a queue, with RabbitMQ serving as the message broker. Celery workers then pick up these tasks and execute them asynchronously, which distributes the load across workers.
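
The queue setup described above could be expressed as a Celery configuration module. The sketch below is only an illustration, not the platform's actual configuration: the broker host, credentials, queue names, and task names are all hypothetical. Only the setting names themselves (broker_url, task_serializer, accept_content, task_routes, task_acks_late, worker_prefetch_multiplier) are standard Celery settings.

```python
# celeryconfig.py: a hypothetical Celery configuration module.
# A Celery app can load it with app.config_from_object("celeryconfig").

# RabbitMQ as the message broker (host, user, password, vhost are placeholders).
broker_url = "amqp://pii_user:change-me@rabbitmq.internal:5672/pii"

# Serialise task messages as JSON and accept nothing else.
task_serializer = "json"
accept_content = ["json"]

# Route each kind of task to its own queue so that workers for each stage
# can be scaled separately. Task and queue names are illustrative assumptions.
task_routes = {
    "pii.tasks.fetch_file_contents": {"queue": "fetch"},
    "pii.tasks.analyse_file": {"queue": "analyse"},
    "pii.tasks.push_results": {"queue": "push"},
}

# Acknowledge a message only after the task finishes, so the broker can
# redeliver it if a worker dies mid-run.
task_acks_late = True

# Let each worker process reserve one message at a time, which spreads
# long-running tasks more evenly across workers.
worker_prefetch_multiplier = 1
```

With routing like this, a worker dedicated to one stage could be started with the standard -Q option, for example `celery -A <app> worker -Q fetch`, where `<app>` is whatever module defines the Celery application.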

API Interactions

  • Fetching File Contents: The PII platform makes API calls to the Relativity One REST API to fetch the contents of the files that need to be analysed. These API calls are initiated by tasks queued in Celery.

  • Processing Files: Once the file contents are fetched, the PII platform processes these files to identify any PII. This processing is handled within the PII platform's backend.

  • Pushing Results: After analysing and detecting PII, the results are pushed back to Relativity One. This involves updating the metadata tags on the documents in Relativity One to reflect the presence of PII.
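
The three interactions above can be sketched as one fetch, analyse, and push cycle. This is a minimal illustration under stated assumptions: RelativityClient is a hypothetical interface and does not reproduce the real Relativity One REST API, the two regular expressions are toy stand-ins for the platform's PII detection, and the tag names contains_pii and pii_types are invented for the example.

```python
import re
from typing import Protocol


class RelativityClient(Protocol):
    """Hypothetical client interface; not the real Relativity One REST API."""

    def fetch_file_contents(self, document_id: str) -> str: ...

    def update_metadata(self, document_id: str, tags: dict) -> None: ...


# Toy patterns standing in for the platform's real PII detection logic.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> dict[str, list[str]]:
    """Return the matches per PII type, omitting types with no match."""
    found = {label: pattern.findall(text) for label, pattern in PII_PATTERNS.items()}
    return {label: matches for label, matches in found.items() if matches}


def analyse_document(client: RelativityClient, document_id: str) -> dict:
    """Fetch one document, detect PII, and push the resulting tags back."""
    text = client.fetch_file_contents(document_id)
    findings = detect_pii(text)
    tags = {"contains_pii": bool(findings), "pii_types": sorted(findings)}
    client.update_metadata(document_id, tags)
    return tags


class InMemoryClient:
    """In-memory stand-in for the external API, useful for local testing."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files
        self.metadata: dict[str, dict] = {}

    def fetch_file_contents(self, document_id: str) -> str:
        return self.files[document_id]

    def update_metadata(self, document_id: str, tags: dict) -> None:
        self.metadata[document_id] = tags


if __name__ == "__main__":
    client = InMemoryClient({"doc-1": "Contact jane@example.com, SSN 123-45-6789"})
    print(analyse_document(client, "doc-1"))
```

Keeping the client behind an interface means the analysis step can be exercised without network access, and the real API wrapper can be swapped in later.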

Workflow

  • Task Initiation: A user action or a scheduled job initiates a task to fetch file contents from Relativity One.

  • Task Queuing: The task is queued in Celery, awaiting execution by a Celery worker.

  • File Fetching: The Celery worker picks up the task and makes an API call to Relativity One to fetch the file contents.

  • File Analysis: The fetched file contents are processed by the PII platform to identify any PII.

  • Result Pushing: After processing, another task is queued to push the analysis results back to Relativity One. This task updates the document metadata to reflect the analysis and PII detection performed.
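
The workflow steps above can be sketched with the Python standard library alone. Note the substitution: queue.Queue and threads stand in here for RabbitMQ and Celery workers, so this shows the shape of the flow rather than the platform's implementation. The document IDs, the FAKE_FILES store, and the "@" check used as PII detection are all placeholders.

```python
import queue
import threading

# Placeholder for documents held in Relativity One.
FAKE_FILES = {
    "doc-1": "Please email jane@example.com",
    "doc-2": "No personal data here",
}


def run_pipeline(doc_ids, num_workers=2):
    """Queue one fetch task per document and return the pushed results."""
    task_queue = queue.Queue()
    pushed = {}
    pushed_lock = threading.Lock()

    def fetch_and_analyse(doc_id):
        text = FAKE_FILES[doc_id]   # stands in for the API call that fetches contents
        has_pii = "@" in text       # stands in for real PII detection
        # Result pushing is queued as a separate follow-up task.
        task_queue.put((push_results, (doc_id, has_pii)))

    def push_results(doc_id, has_pii):
        with pushed_lock:           # stands in for the metadata update call
            pushed[doc_id] = {"contains_pii": has_pii}

    def worker():
        while True:
            item = task_queue.get()
            try:
                if item is None:    # sentinel: no more work for this worker
                    return
                func, args = item
                func(*args)
            finally:
                task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # Task initiation and queuing: one fetch task per document.
    for doc_id in doc_ids:
        task_queue.put((fetch_and_analyse, (doc_id,)))

    task_queue.join()               # wait for fetch tasks and their follow-ups
    for _ in threads:
        task_queue.put(None)        # tell each worker to stop
    for thread in threads:
        thread.join()
    return pushed


if __name__ == "__main__":
    print(run_pipeline(["doc-1", "doc-2"]))
```

Because each fetch task enqueues its follow-up before it is marked done, waiting on the queue covers both stages. In a Celery deployment the same two-stage flow would be expressed with Celery's own task primitives instead.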

Benefits of Asynchronous Task-Based and Multi-Layered Architectures

  • Efficiency: Asynchronous processing ensures that tasks are executed without blocking other operations, leading to better resource utilisation and reduced latency.

  • Fault Tolerance: Decoupling task initiation from execution helps isolate failures. If a task fails, it can be retried or handled without impacting the entire system.

  • Load Distribution: Tasks are distributed across multiple workers, preventing bottlenecks and ensuring balanced workload management.

  • Modularity: The system's components are decoupled, making it easier to maintain and update individual parts without affecting the whole system.

  • Improved Throughput: Parallel task execution enhances overall throughput, allowing the system to process more tasks in less time.

  • Flexibility: Tasks can be prioritised, scheduled, and managed independently, providing flexibility in how operations are handled.

  • Resource Management: The system allocates tasks to available workers, which makes efficient use of system resources.
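
The fault tolerance point above, retrying a failed task without affecting the rest of the system, can be illustrated with a small retry helper that uses exponential backoff. This is a plain-Python sketch of the idea, not the platform's code: TransientError, run_with_retry, and make_flaky_task are hypothetical names. Celery tasks provide comparable built-in options (autoretry_for, retry_backoff, max_retries), which a real deployment would more likely use.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying, such as a timeout."""


def run_with_retry(task, max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Run task(), retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt. Once max_retries retries
    have been used, the last error is re-raised to the caller.
    """
    attempt = 0
    while True:
        try:
            return task()
        except TransientError:
            if attempt >= max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1


def make_flaky_task(failures):
    """Build a task that fails `failures` times and then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("temporary failure")
        return "pushed"

    return task


if __name__ == "__main__":
    # Fails twice, then succeeds on the third call.
    print(run_with_retry(make_flaky_task(failures=2)))
```

Only the failing task is retried; other queued tasks continue to be processed by the remaining workers in the meantime.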