Why Unstructured Data Is Sorting Itself Out

Lego pieces for sale at a Lego Store in Annapolis, Maryland, on April 7, 2025. (Photo by Jim Watson/AFP via Getty Images)

Information without order is chaos. Trying to work with data that has no structure or form is rather like watching the white-noise fuzz on an unconnected television set: shapes seem almost familiar, but nothing resolves into a recognizable picture. Unstructured data inside organizations appears to be full of energy, but it is weighed down by an inertia that keeps it from being useful, primarily because it has no obvious home (application) to belong to.

What Is Unstructured Data?

To define the term, let’s start with its opposite. Structured data includes spreadsheets with their formalized rows and columns, “form-based” data resources where we know the fields in a document and therefore know what values to expect, and, of course, relational databases, the purest form of an ordered and structured data repository. Unstructured data, by contrast, is non-tabular: it spans records of phone calls and voicemails, raw video that has yet to be meta-tagged to describe its contents, blogs and web pages, emails and social media posts in all their forms.

Some data that may appear structured (such as sensor data from surveillance and internet of things devices) is still essentially unstructured. For example, 6,000 temperature readings and gyroscope movement records aren’t necessarily structured just because they are numbered in sequence; they need to be extracted, parsed, deduplicated and manipulated before they are structured enough for productive use. In many cases, unstructured data is regarded as an untapped source of real business context, but it is often the hardest to bring into line, the hardest to govern and the toughest to operationalize.
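As a rough illustration of those steps, here is a minimal Python sketch. It assumes a hypothetical line format for a device’s raw output (a sequence number, a temperature in Celsius and three gyroscope axes); real devices emit their own formats, and the names `Reading` and `structure_readings` are illustrative only, not part of any product or library.

```python
import re
from dataclasses import dataclass

# Hypothetical raw format, one reading per line, e.g.:
# "seq=17 temp=21.4C gyro=(0.02,-0.01,0.00)"
NUM = r"-?\d+(?:\.\d+)?"
LINE_PATTERN = re.compile(
    rf"seq=(?P<seq>\d+)\s+temp=(?P<temp>{NUM})C\s+"
    rf"gyro=\((?P<gx>{NUM}),(?P<gy>{NUM}),(?P<gz>{NUM})\)"
)


@dataclass(frozen=True)
class Reading:
    seq: int
    temp_c: float
    gyro: tuple[float, float, float]


def structure_readings(raw_lines):
    """Extract, parse and deduplicate raw sensor lines into ordered records."""
    by_seq = {}
    for line in raw_lines:
        match = LINE_PATTERN.search(line)  # extract: ignore lines that are not readings
        if match is None:
            continue
        seq = int(match["seq"])
        if seq in by_seq:  # deduplicate: keep the first reading for each sequence number
            continue
        by_seq[seq] = Reading(  # parse: text fields become typed values
            seq=seq,
            temp_c=float(match["temp"]),
            gyro=(float(match["gx"]), float(match["gy"]), float(match["gz"])),
        )
    # manipulate: return records in sequence order, ready to load into a table
    return [by_seq[s] for s in sorted(by_seq)]


raw = [
    "seq=2 temp=21.6C gyro=(0.01,0.00,-0.02)",
    "seq=1 temp=21.4C gyro=(0.02,-0.01,0.00)",
    "--- link reset ---",
    "seq=2 temp=21.6C gyro=(0.01,0.00,-0.02)",  # duplicate transmission
]
for row in structure_readings(raw):
    print(row)
```

Only once the noise line is dropped, the duplicate removed and the values typed does the stream become something a spreadsheet or relational table could hold.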

Technology analyst house IDC refers to this unclassified morass of information as the “unseen data conundrum” and estimates that unstructured data now makes up “the majority of enterprise information.” IDC also suggests that its volume is growing by 55% each year. These data blind spots are thought to create operational risk and could undermine the value of AI. The issue is pressing now because organizations are using unstructured data to power large language models and retrieval-augmented generation applications.
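To put the growth figure in perspective, a simple compounding calculation helps. Assuming (purely for illustration) that the 55% rate holds constant, the volume is multiplied by 1.55 every year, and the doubling time $t_2$ follows from:

$$
V(t) = V_0 \,(1.55)^{t}, \qquad (1.55)^{t_2} = 2 \;\Rightarrow\; t_2 = \frac{\ln 2}{\ln 1.55} \approx \frac{0.693}{0.438} \approx 1.58 \text{ years}
$$

At that rate, the store of unstructured data would double roughly every 19 months and grow about ninefold over five years ($1.55^5 \approx 8.9$).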


© 2025 Open Data News Wire. Use Our Intel. All Rights Reserved. Washington, D.C.