Unstructured Market Structure
There’s now a whole market of vendors selling toolsets for unstructured data. Amazon Web Services (AWS) offers an entire menu of functions in this space. Amazon Comprehend is a natural language processing and machine learning service that can extract entities and key phrases and determine sentiment from text in multiple languages. AWS positions this service alongside the Amazon Transcribe speech-to-text service, the quirkily named Amazon Rekognition image and video analysis service and Amazon Textract, which extracts text and data from scanned documents and images.
Given the breadth of AWS services in this market, it would be reasonable to expect similar, but differently angled, proprietary versions of these functions from the other major cloud hyperscalers. Microsoft’s Azure Cosmos DB is a globally distributed, multi-model database that can manage structured, semi-structured and unstructured data. This cloud-native database might be used alongside the playfully named Azure Blob Storage, an object storage service designed to hold large amounts of unstructured data such as images, videos, documents and other binary data. Also from Microsoft, Azure AI Document Intelligence uses machine learning to automatically extract text, key-value pairs, tables and structures from documents.
Not to be left out, Google Cloud Platform also works at this level. The cloud and search giant points to its BigQuery brand and the object tables function within it. “Object tables provides a structured record interface for unstructured data stored in Google Cloud Storage. This enables [users] to directly run analytics and machine learning on images, audio, documents and other file types using existing frameworks like SQL and remote functions natively in BigQuery itself,” noted Google Cloud’s Gaurav Saxena and Thibaud Hottelier when the product launched a couple of years back.
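To make the object tables idea concrete, here is a minimal local sketch of the same pattern. It assumes nothing about how BigQuery works internally. Each file under a directory becomes one row of metadata in a SQL table, so ordinary SQL can then query a pile of unstructured files. The column names loosely echo the metadata columns BigQuery documents for object tables (`uri`, `content_type`, `size`, `md5_hash`, `updated`). The function `build_object_table` and the use of SQLite are hypothetical illustrations, not Google’s API.

```python
import hashlib
import mimetypes
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_object_table(root: Path) -> sqlite3.Connection:
    """Hypothetical helper: index every file under `root` as one structured row.

    A local analogy to an object table, not BigQuery itself.
    """
    conn = sqlite3.connect(":memory:")
    # Column names loosely mirror BigQuery object table metadata columns.
    conn.execute(
        "CREATE TABLE objects (uri TEXT PRIMARY KEY, content_type TEXT, "
        "size INTEGER, md5_hash TEXT, updated TEXT)"
    )
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        content_type, _ = mimetypes.guess_type(path.name)
        updated = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        conn.execute(
            "INSERT INTO objects VALUES (?, ?, ?, ?, ?)",
            (
                path.resolve().as_uri(),  # as_uri() needs an absolute path
                content_type or "application/octet-stream",
                len(data),
                hashlib.md5(data).hexdigest(),
                updated.isoformat(),
            ),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Usage: two "unstructured" files become rows that plain SQL can group.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scan.png").write_bytes(b"\x89PNG fake")
        (root / "notes.txt").write_text("hello")
        conn = build_object_table(root)
        query = "SELECT content_type, COUNT(*) FROM objects GROUP BY content_type"
        for row in conn.execute(query):
            print(row)
        conn.close()
```

In BigQuery itself, the equivalent step is a `CREATE EXTERNAL TABLE` statement that points at Cloud Storage URIs. The sketch only mirrors the underlying idea of one structured record per unstructured object.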
An IT Sub-Discipline Of Its Own
Given how prominently these services feature at the major cloud providers, and the range of toolsets available from more specialized players, working with unstructured data is clearly a more pressing need than it once was. Often referred to as enterprise content management (ECM), this discipline is certainly growing in the combined shadow of big data analytics and the rise of artificial intelligence.
The natural evolution for a data market like this is the arrival of services aligned to specific industry verticals. Hyland is known for its work in unstructured data management across the healthcare industry, but it treads a careful line with its messaging, because the company clearly wants to be seen as applicable to all use cases. The company says Hyland Content Intelligence turns unstructured data into actionable, AI-ready content. Its Knowledge Enrichment service (currently in beta), which arrived in 2025, is among the star players.
Related technologies are also available from IBM, in the form of Watson Discovery for unstructured search and AI, from Elastic for indexing and querying unstructured text and logs, and from Cloudera for Hadoop-based data lake services across unstructured and semi-structured data. Add Databricks, Collibra, Alation, Palantir and Varonis, to name but a mouthful, and it is clear that a lot of structure is being applied to the unstructured data space.
Black Box Blind Spots
“Unstructured data remains a black box for most organizations, [especially] as it becomes critical for AI and business operations,” said Jay Limburn, chief product officer at Ataccama. “Without a way to structure, govern and trust that information, enterprises risk missing the full value of their data.”
Limburn points to his firm’s Ataccama One platform as a means to combine data quality, governance, observability, lineage and master data management. Ataccama One is now available on Snowflake Marketplace as a new integration with Document AI, a Snowflake AI feature that uses Arctic-TILT, a proprietary large language model used to extract data from documents.
This fusion of data structuring services is billed as a means of turning unstructured content, such as contracts, invoices and PDFs, into structured data by running models directly within Snowflake. Businesspeople can use natural language prompts, such as “What is the effective date of the contract?”, which are then processed by Snowflake to create structured outputs written directly into Snowflake tables.
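The prompt-to-table flow described above can be sketched in a few lines. This is a hedged illustration, not Snowflake’s implementation. It makes four assumptions:

- Each question put to the model comes back as JSON, with the answer field mapped to a list of candidate answers that each carry a `value` and a confidence `score`.
- The best candidate is kept only if it clears a hypothetical 0.5 threshold.
- Each document produces one row.
- The field names, the sample response and the helper functions are all hypothetical, and SQLite stands in for a Snowflake table.

```python
import json
import sqlite3

# Hypothetical extraction output for one contract; this shape is an assumption.
SAMPLE_RESPONSE = json.dumps({
    "__documentMetadata": {"ocrScore": 0.97},
    "effective_date": [{"score": 0.93, "value": "2024-03-01"}],
    "counterparty": [{"score": 0.88, "value": "Acme Ltd"}],
    "termination_notice_days": [{"score": 0.41, "value": "30"}],
})

# Hypothetical fields the business user asked about, in table-column order.
FIELDS = ("effective_date", "counterparty", "termination_notice_days")


def best_answer(candidates, min_score):
    """Return the highest-scoring value, or None if nothing clears min_score."""
    best = max(candidates, key=lambda c: c["score"], default=None)
    if best is None or best["score"] < min_score:
        return None
    return best["value"]


def to_row(doc_name, response_json, min_score=0.5):
    """Flatten one JSON extraction result into a tuple: doc_name, then FIELDS."""
    parsed = json.loads(response_json)
    # Keys not listed in FIELDS (such as the metadata block) are ignored.
    return (doc_name,) + tuple(
        best_answer(parsed.get(field, []), min_score) for field in FIELDS
    )


def load_rows(conn, rows):
    """Create the target table if needed and insert the flattened rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contracts (doc_name TEXT, effective_date TEXT, "
        "counterparty TEXT, termination_notice_days TEXT)"
    )
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_rows(conn, [to_row("contract_001.pdf", SAMPLE_RESPONSE)])
    print(conn.execute("SELECT * FROM contracts").fetchall())
    conn.close()
```

The sketch stores low-confidence answers as NULL rather than guessing. That is one reasonable design choice: a data quality or governance layer can then flag the gaps for human review instead of silently loading weak extractions.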
Unstructured AI Services
Where does the unstructured marketplace go next? If we accept the proposition that AI services are partly responsible for the surge in this sector (or let’s at least call it a sub-surge in a sub-sector), then we might actually see AI services themselves starting to shoulder the responsibility for structuring our unstructuredness.
Given the current debate over whether chat-based AI services will take over browser search – and the fact that OpenAI offers GPT-based APIs for text extraction, summarization, semantic intent analysis and classification – that might be exactly what happens.
– Adrian Bridgwater, Published Courtesy of Forbes.