Data Extractions
Extract information from uploaded files
Documents are widely used in the quote-and-bind workflow and across the insurance industry. For example, many brokers collect application PDFs as part of the quote process. Others require insureds to submit documents such as financial statements or location summaries as part of the application. As a result, brokers typically already have much of the data needed to create quotes in these documents.
However, these documents are often inconsistently structured and hard for both machines and people to read, which makes manually extracting data from them a challenge when creating a quote.
The data extractions feature lets you extract data from documents for easy use with Herald’s API.
Extraction use cases
When Herald performs an extraction, it takes a document (e.g., a PDF) as its source and outputs structured data. Here are two common use cases for Herald’s data extractions endpoints.
- Gathering information to fill an application for submission to Herald. In this case, Herald takes a set of files as the source and, using Herald’s library for reference, extracts all of the information that can be used to populate risk and coverage parameters. These extracted values can then be applied to a Herald application, which can be submitted once it is complete.
- Extracting data to populate into a client’s system. The goal of this extraction is to store structured data in a client system of record. In these cases, the data may or may not be applied to a Herald application.
Data Extractions Flow
Herald uses AI-based extractions. Our AI agent reads and interprets a set of documents and maps the data into the target data model (Herald parameters or a client’s data model). Additionally, a user can provide context to the AI agent, which can influence model choice and how values are extracted.
A Data Extraction can have the following statuses:
What files can Herald extract data from
Herald supports extraction from a wide variety of file types to enable your desired workflow.
Supported File Formats:
- IMAGES: png, jpg, bmp, gif, jpeg, svg
- DOCS: abw, cgm, cwk, doc, docx, dot, dotm, hwp, key, lwp, mw, mcw, pages, pbd, ppt, pptm, pot, potm, potx, rtf, sda, sdd, sdp, sdw, sgl, sti, sxi, sxw, stw, sxg, txt, uof, uop, uot, vor, wpd, wps, xml, pdf
- SPREADSHEETS: xlsx, xls, xlsb, xlw, csv, dif, sylk, slk, prn, numbers, et, ods, fods, uos1, uos2, dbf, wk1, wk2, wk3, wk4, wks, 123, wq1, wq2, wb1, wb2, wb3, qpw, xlr, eth, tsv
- OTHER: json
Herald explicitly denies file types that pose security risks.
Denied File Types:
- zip, xlsm
[.icon-circle-blue][.icon-circle-blue] For additional file type and format needs, please reach out to your Herald Customer Success representative.
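As a convenience, the supported and denied lists above can be turned into a client-side pre-check that runs before a file is uploaded. The following is a minimal sketch, not part of Herald's API: the function name `classify_file` is hypothetical, and it looks only at the file extension, whereas Herald's own validation may inspect more than that.

```python
from pathlib import Path

# Extensions copied from the supported-format lists in this section.
IMAGES = "png jpg bmp gif jpeg svg".split()
DOCS = (
    "abw cgm cwk doc docx dot dotm hwp key lwp mw mcw pages pbd ppt pptm "
    "pot potm potx rtf sda sdd sdp sdw sgl sti sxi sxw stw sxg txt uof uop "
    "uot vor wpd wps xml pdf"
).split()
SPREADSHEETS = (
    "xlsx xls xlsb xlw csv dif sylk slk prn numbers et ods fods uos1 uos2 "
    "dbf wk1 wk2 wk3 wk4 wks 123 wq1 wq2 wb1 wb2 wb3 qpw xlr eth tsv"
).split()
OTHER = ["json"]

SUPPORTED = frozenset(IMAGES + DOCS + SPREADSHEETS + OTHER)
DENIED = frozenset({"zip", "xlsm"})


def classify_file(filename: str) -> str:
    """Return "denied", "supported", or "unknown" based on the extension only.

    Hypothetical helper: this is a local pre-check, not Herald's validation.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in DENIED:
        return "denied"
    if ext in SUPPORTED:
        return "supported"
    return "unknown"
```

A file classified as "unknown" is not necessarily rejected by Herald; it only means the extension does not appear in either list above.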
How data extraction works
Upload files via the [.h-code]POST[.h-code] [.h-endpoint-link]/files[.h-endpoint-link] endpoint. Then create a data extraction using the [.h-code]POST[.h-code] [.h-endpoint-link]/data_extractions[.h-endpoint-link] endpoint. After the extraction is complete, Herald returns the extracted data mapped to Herald parameters. You will not receive data that cannot be mapped to Herald parameters or that fails our input validation.
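The two calls described above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the base URL, the bearer-token header, the multipart field name `file`, and the `file_ids` body field are all assumptions not confirmed by this page, so check the endpoint reference for the actual request shapes. The helpers only build the requests; they do not send them.

```python
import json
import uuid
import urllib.request

BASE_URL = "https://sandbox.heraldapi.com"  # assumed base URL for illustration
API_KEY = "YOUR_API_KEY"                    # placeholder credential


def build_upload_request(filename, content, base_url=BASE_URL, api_key=API_KEY):
    """Build (without sending) a multipart POST /files request.

    Assumption: the file is sent in a form field named "file".
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/files",
        data=head + content + tail,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def build_extraction_request(file_ids, base_url=BASE_URL, api_key=API_KEY):
    """Build (without sending) a POST /data_extractions request.

    Assumption: uploaded files are referenced through a "file_ids" list.
    """
    payload = {"file_ids": list(file_ids)}  # hypothetical field name
    return urllib.request.Request(
        url=f"{base_url}/data_extractions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Each request could then be sent with `urllib.request.urlopen(request)`. The identifier passed to `build_extraction_request` would come from the upload response, whose exact shape is not described here.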