Data Extractions: Step-by-step

Learn how to streamline your workflows by extracting relevant values from files.

Step 1: Upload your application form

The first step is to send the files to Herald. This can be accomplished using the [.h-endpoint-link]POST /files[.h-endpoint-link] endpoint.

Note that this endpoint requires the request body to be formatted as [.h-code]multipart/form-data[.h-code] instead of [.h-code]application/json[.h-code].

Here is an example to demonstrate this using the [.h-code]curl[.h-code] utility:

  
curl --request POST \
  --url https://sandbox.heraldapi.com/files \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer api-key-goes-here' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/path/to/your/file.pdf
  

The endpoint also accepts an optional [.h-code]type[.h-code] field in the request body, which you can leave null for this workflow.
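If you prefer to upload from Python rather than curl, the same multipart request can be sketched with the standard library alone. The helper name [.h-code]build_upload_request[.h-code] is hypothetical, the host and the [.h-code]file[.h-code] form field come from the curl example above, and the per-part content type is a guess based on the file extension; treat this as an illustration rather than an official client.

```python
import mimetypes
import uuid
import urllib.request
from pathlib import Path

BASE_URL = "https://sandbox.heraldapi.com"  # sandbox host from the curl example


def build_upload_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST /files request."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex
    # Assumption: the server accepts a guessed MIME type for the file part.
    part_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    # A single form part named "file", mirroring `--form file=@...` in curl.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
        f"Content-Type: {part_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/files",
        data=head + file_path.read_bytes() + tail,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned request to [.h-code]urllib.request.urlopen[.h-code] and read the JSON body of the response.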

Retrieve the corresponding file id from the API response.

Example response:

POST /files

{
  "file": {
    "id": "d7c4579a-5450-4e79-bcfc-e918b3c8a564",
    "format": "pdf",
    "file_name": "herald_quote_summary_d7c4579a-5450-4e79-bcfc-e918b3c8a564",
    "text": "Application Prefill",
    "created_at": "2022-08-11",
    "size": 2470,
    "status": "available",
    "associations": null
  }
}
 

Repeat this for every file from which you wish to extract values. A single data extraction can accept multiple files, and a file can be reused across separate extractions in any combination.
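When you upload several files, you will want to keep each returned id for the next step. A minimal sketch, assuming each response body has the shape shown in the example response above (the helper name [.h-code]file_id_from_response[.h-code] is hypothetical):

```python
import json


def file_id_from_response(response_body: str) -> str:
    """Return the file id from a POST /files response body."""
    return json.loads(response_body)["file"]["id"]


# Stand-ins for the raw JSON bodies returned by two separate uploads.
upload_responses = [
    '{"file": {"id": "d7c4579a-5450-4e79-bcfc-e918b3c8a564", "status": "available"}}',
    '{"file": {"id": "601048b5-1e50-433d-8f1c-292ad7c7eb7a", "status": "available"}}',
]
file_ids = [file_id_from_response(body) for body in upload_responses]
```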

Step 2: Create a data extraction

Create a data extraction with the [.h-endpoint-link]POST /data_extractions[.h-endpoint-link] endpoint, listing the uploaded files in the file_ids field. Optionally include the context field to provide any relevant context for the data extraction. In this example, we specify the line of business for which we wish to extract parameters.

Example request:

POST /data_extractions

{
  "file_ids": [
    "d7c4579a-5450-4e79-bcfc-e918b3c8a564",
    "601048b5-1e50-433d-8f1c-292ad7c7eb7a"
  ],
  "context": "This is a Cyber Quote"
}
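The request body above can also be assembled programmatically. This is a small sketch with a hypothetical helper, [.h-code]build_extraction_payload[.h-code]; it only uses the two fields described in this step and omits context when none is given.

```python
import json
from typing import Optional


def build_extraction_payload(file_ids: list[str], context: Optional[str] = None) -> str:
    """Serialize the JSON body for POST /data_extractions."""
    if not file_ids:
        raise ValueError("at least one file id is required")
    payload: dict = {"file_ids": list(file_ids)}
    if context is not None:
        # context is optional, so it is only sent when provided.
        payload["context"] = context
    return json.dumps(payload)
```

Send the resulting string as the request body with a [.h-code]Content-Type[.h-code] of [.h-code]application/json[.h-code].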
 

You should expect a response like the following, where status is pending and both risk_values and coverage_values are null. This response indicates that your files are being processed.

Example response:

POST /data_extractions

{
  "data_extraction": {
    "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
    "status": "pending",
    "file_ids": [
      "d7c4579a-5450-4e79-bcfc-e918b3c8a564",
      "601048b5-1e50-433d-8f1c-292ad7c7eb7a"
    ],
    "risk_values": null,
    "coverage_values": null,
    "context": "This is a Cyber Quote",
    "created_at": "2023-10-11T21:51:52.737Z",
    "updated_at": "2023-10-11T21:51:52.737Z"
  }
}
 


Step 3: Get your extraction results

Once the data extraction has been processed (expect a wait of roughly 30 seconds), you can send a request to [.h-endpoint-link]GET /data_extractions/{data_extraction_id}[.h-endpoint-link] with the id of the data extraction to retrieve the results. You can learn when processing has finished either by polling this endpoint periodically or by listening for webhooks.
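If you choose polling, a loop along the following lines is one way to do it. This is a sketch under stated assumptions: [.h-code]wait_for_extraction[.h-code] and its parameters are hypothetical, [.h-code]fetch[.h-code] is any function you supply that performs the GET request and returns the parsed JSON, and any status other than pending is treated as final because only pending and available appear in this guide.

```python
import time
from typing import Callable


def wait_for_extraction(
    fetch: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the data extraction is no longer pending, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        extraction = fetch()["data_extraction"]
        # Assumption: every status other than "pending" is terminal.
        if extraction["status"] != "pending":
            return extraction
        if time.monotonic() >= deadline:
            raise TimeoutError("data extraction is still pending")
        sleep(interval_s)
```

Passing the request as a callable keeps the loop independent of the HTTP client you use, and the injectable [.h-code]sleep[.h-code] makes it easy to exercise without real delays.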

The response body will include the set of risk and coverage values that have been extracted based on applicable information contained in the files.

Example response:

GET /data_extractions/{data_extraction_id}

{
  "data_extraction": {
    "id": "7aca2557-584f-40de-bbf0-9bdbee7c9fd5",
    "status": "available",
    "file_ids": [
      "d7c4579a-5450-4e79-bcfc-e918b3c8a564",
      "601048b5-1e50-433d-8f1c-292ad7c7eb7a"
    ],
    "risk_values": [
      {
        "value": "ACME Inc.",
        "risk_parameter_id": "rsk_m4p9_insured_name"
      },
      {
        "risk_parameter_id": "rsk_jsy2_primary_address",
        "value": {
          "line1": "100 Main St",
          "line2": null,
          "line3": null,
          "city": "Somerville",
          "state": "MA",
          "postal_code": "02144",
          "country_code": "USA",
          "organization": null
        }
      },
      {
        "value": 3000000,
        "risk_parameter_id": "rsk_vrb1_total_annual_revenue"
      },
      {
        "value": "yes",
        "risk_parameter_id": "rsk_7ahp_has_domain"
      },
      {
        "value": "example.com",
        "risk_parameter_id": "rsk_dy7r_domain_names"
      }
    ],
    "coverage_values": null,
    "context": "This is a Cyber Quote",
    "created_at": "2024-10-24T14:57:43.624Z",
    "updated_at": "2024-10-24T14:59:12.031Z"
  }
}
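Since risk_values is a list, it can be convenient to index it by parameter id before using it. A minimal sketch, assuming the response shape shown above (the helper name [.h-code]risk_values_by_id[.h-code] is hypothetical); a null risk_values, as seen while the extraction is pending, yields an empty mapping.

```python
def risk_values_by_id(data_extraction: dict) -> dict:
    """Map each risk_parameter_id to its extracted value."""
    return {
        item["risk_parameter_id"]: item["value"]
        for item in (data_extraction.get("risk_values") or [])
    }


# A trimmed version of the example response above.
extraction = {
    "status": "available",
    "risk_values": [
        {"value": "ACME Inc.", "risk_parameter_id": "rsk_m4p9_insured_name"},
        {"value": 3000000, "risk_parameter_id": "rsk_vrb1_total_annual_revenue"},
    ],
    "coverage_values": None,
}
values = risk_values_by_id(extraction)
```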
 

Now you are ready to apply your extracted values to an application! Visit the applications data extractions page to learn more about applying extracted values to an application.

[Optional] View all extractions associated with a file

In the event you created multiple extractions on the same file object (for example, when you were unsatisfied with the results from an earlier extraction), you have the option to review all historical extractions associated with a file by querying the [.h-endpoint-link]GET /data_extractions[.h-endpoint-link] endpoint with a file id.

If you expect a large number of extractions associated with the file, you can also include a limit and a page parameter in the request to paginate the results.
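As an illustration, the query string for such a request could be built as follows. The query parameter name [.h-code]file_id[.h-code] is an assumption (this guide only says the endpoint is queried "with a file id"), the helper name is hypothetical, and [.h-code]limit[.h-code] and [.h-code]page[.h-code] follow the names used above; check the endpoint reference for the exact names.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://sandbox.heraldapi.com"  # sandbox host from the curl example


def list_extractions_url(
    file_id: str, limit: Optional[int] = None, page: Optional[int] = None
) -> str:
    """Build the GET /data_extractions URL for one file, with optional pagination."""
    params: dict = {"file_id": file_id}  # assumed parameter name
    if limit is not None:
        params["limit"] = limit
    if page is not None:
        params["page"] = page
    return f"{BASE_URL}/data_extractions?{urlencode(params)}"
```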