
Metadata Extraction/Indexing

You may dynamically extract metadata information from files using Velocity code. For information on how metadata is extracted on upload, or how it displays on file content, please see the File Metadata documentation.

Searching Metadata Fields

Backend Search

Once you have uploaded your files, the metadata fields and values are stored in the database and also indexed using ElasticSearch. You can search the Metadata fields using the Content Search feature in the dotCMS backend UI:

  1. Go to the Content tab.
  2. Select Type: File Asset.
    • This displays all the file content you have created or uploaded.
  3. Click Advanced on the left side bar.
  4. Enter the information you wish to search for in the Metadata field.

[Image: File Search]

For example, to search for JPG images enter: contentType:image/jpeg. To search for all PDF documents enter: contentType:application/pdf.

[Image: Searching for Images]


You may search file metadata using ElasticSearch by adding a +FileAsset.metaData.[fieldname] term to the query:

+contentType:FileAsset +FileAsset.metaData.[fieldname]:*[value]*

For example, to search for all JPEG images, you can use the following search term:

+contentType:FileAsset +FileAsset.metaData.contentType:*image/jpeg*
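
As a minimal sketch, a query of this form can also be run from Velocity with the $dotcontent.pull call used later on this page. The PDF content type is taken from the backend search example above; the limit of 10 and the "modDate desc" sort order are illustrative assumptions, not required values.

```velocity
## Sketch: list up to 10 PDF files, most recently modified first.
## The limit (10) and the sort order are illustrative assumptions.
<ul>
#foreach($file in $dotcontent.pull("+contentType:FileAsset +FileAsset.metaData.contentType:*application/pdf*", 10, "modDate desc"))
    <li>$file.title ($file.metaData.contentType)</li>
#end
</ul>
```

The wildcards around the value follow the pattern shown above, so the term matches any contentType value that contains the given string.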

Accessing File Contents

You can also search inside document contents using the content: keyword. This works both in backend Content search and in ElasticSearch queries.

The following example searches for all files that have the words: “direct access to our” inside the document content from the dotCMS backend:

[Image: Searching for PDF Documents]

The following search terms perform the same search within an ElasticSearch query:

+contentType:FileAsset +FileAsset.metaData.content:*direct access to our*
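
The same content search might be issued from Velocity roughly as follows. This is a sketch only: $phrase is a hypothetical variable introduced here for illustration, and the limit of 5 and the sort order are assumptions.

```velocity
## Sketch: find files whose extracted content contains a phrase.
## $phrase is a hypothetical variable; the limit (5) and sort are assumptions.
#set($phrase = "direct access to our")
#foreach($file in $dotcontent.pull("+contentType:FileAsset +FileAsset.metaData.content:*${phrase}*", 5, "modDate desc"))
    <p>$file.title</p>
#end
```

Keeping the phrase in a variable makes it easier to reuse the query with different search text, since Velocity interpolates ${phrase} inside the double-quoted query string.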

Displaying Metadata Fields and Values

You may access metadata fields from any files retrieved using a content pull in your Widgets and other Velocity code, and you may create ElasticSearch queries using the metadata fields.

Note: The metadata keys and values for each file are stored as a JSON string.

In your code, you can access the metadata in three different ways:

  • Retrieve and process the complete Metadata string (as a JSON string):
  • Loop through all individual Metadata fields:
    #foreach($field in $file.metaData.entrySet())
        $field.key : $field.value
    #end
  • Access individual Metadata fields by name and value:
    $file.metaData.contentType


The following example searches for the two most recently modified JPEG image files and displays their metadata, first by looping through all fields and then by accessing individual fields by name.

#foreach($file in $dotcontent.pull("+contentType:FileAsset +metaData:image/jpeg",2,"modDate desc"))
    <h2>File: $file.title</h2>
    ## Loop through all metadata keys and values
    <ul>
    #foreach($field in $file.metaData.entrySet())
        <li><b>$field.key</b> $field.value</li>
    #end
    </ul>
    <h3>Print each value separately:</h3>
    <p>Content type: $file.metaData.contentType</p>
    <p>File Size: $file.metaData.fileSize</p>
#end

The following image displays the results of a widget using the above code on a sample site:

[Image: File Listing Widget]
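
The listing example above assumes every file carries the metadata keys it prints. A more defensive variant, sketched here using standard Velocity features (the quiet reference notation $! and an #if check), avoids printing a raw reference when a key is absent. The "unknown" fallback text is an illustrative choice.

```velocity
## Sketch: same query as the listing example, with guards for missing keys.
#foreach($file in $dotcontent.pull("+contentType:FileAsset +metaData:image/jpeg",2,"modDate desc"))
    <h2>File: $file.title</h2>
    ## $! renders nothing (instead of the literal reference) when the key is missing
    <p>Content type: $!file.metaData.contentType</p>
    #if($file.metaData.fileSize)
        <p>File Size: $file.metaData.fileSize</p>
    #else
        ## Fallback text is an illustrative assumption
        <p>File Size: unknown</p>
    #end
#end
```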