Searching in Enonic CMS
The search functionality in Enonic CMS was completely rewritten for version 4.3. As a result, the options for creating specialized searches improved and most types of searches got a performance boost, while full backward compatibility was maintained. (Version 4.4 has a few more performance improvements, but otherwise nothing has changed, so while this article was written for v.4.3, it also applies to v.4.4.)
This article will first explain the way searching was done in v.4.1 and v.4.2, then dig into the new and cool functionality that was added in v.4.3.
First, a definition:
- Indexing: The process of storing data in a way that makes the data searchable.
Search configuration
All content that is stored in Enonic CMS is indexed when it is stored. How it is indexed depends on the configuration of the content type. When defining the content type, the indexing of each field has to be specified, for instance like this:
<indexparameters>
<index xpath="contentdata/teaser"/>
<index xpath="contentdata/heading"/>
<index xpath="contentdata/article/preface"/>
<index xpath="contentdata/article/text"/>
</indexparameters>
This is a part of the configuration XML from the definition of an article. It specifies four fields to be indexed: the teaser and the heading, then the preface and text of the main article.
If the indexparameters node is empty, none of the fields of the content will be indexed:
<indexparameters/>
Since ‘contentdata’ is a rather long word that is repeated often, Enonic CMS has an abbreviation for it, ‘data’, which can be used both in searches and in the configuration. So, for instance, we could write the second line of the article configuration above like this:
<index xpath="data/teaser"/>
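To see which fields a given configuration would index, the XML can be read with a few lines of Python. This is only a sketch: the function name `indexed_fields` is made up for illustration, and it assumes that ‘data/’ is a plain prefix alias for ‘contentdata/’, as described above.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<indexparameters>
  <index xpath="contentdata/teaser"/>
  <index xpath="data/heading"/>
</indexparameters>
"""

def indexed_fields(config_xml):
    """Return the xpath of every <index> element, with 'data/' written out in full."""
    root = ET.fromstring(config_xml.strip())
    paths = []
    for index in root.iter("index"):
        xpath = index.get("xpath")
        # Assumption: 'data/' is simply shorthand for 'contentdata/'.
        if xpath.startswith("data/"):
            xpath = "contentdata/" + xpath[len("data/"):]
        paths.append(xpath)
    return paths

print(indexed_fields(CONFIG))
print(indexed_fields("<indexparameters/>"))  # empty node: no fields listed
```

An empty indexparameters node yields an empty list, which matches the rule that none of the fields of the content are then indexed.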
Executing a search
All the examples in this article, except the ORDER BY part, may be tested in the filter field of the advanced search for the archive, with modifications for your own data model.
Basic searching
To search for a text string in the article, this search would normally be good enough:
contentdata/article/text CONTAINS '<SEARCH_STRING>'
However, if all the indexed fields should be searched, each of the indexed fields must be a part of the search, like this:
data/teaser CONTAINS '<SEARCH_STRING>' OR data/heading CONTAINS '<SEARCH_STRING>' OR data/article/preface CONTAINS '<SEARCH_STRING>' OR data/article/text CONTAINS '<SEARCH_STRING>'
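Writing out one clause per field quickly gets repetitive, so the expression may be generated instead. The Python helper below is a minimal sketch of that idea. The name `search_all_fields` is hypothetical, and the sketch assumes the search string contains no single quotes, since quote escaping is not covered in this article.

```python
def search_all_fields(fields, search_string, operator="CONTAINS"):
    """Build one '<field> <operator> '<string>'' clause per field, joined by OR."""
    # Assumption: search_string contains no single quotes (no escaping is done).
    clauses = [f"{field} {operator} '{search_string}'" for field in fields]
    return " OR ".join(clauses)

FIELDS = [
    "data/teaser",
    "data/heading",
    "data/article/preface",
    "data/article/text",
]

query = search_all_fields(FIELDS, "Enonic")
print(query)
```

The resulting string has the same shape as the expression above, with Enonic in place of the search string placeholder.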
Search operators
CONTAINS is the most common matching operator, since it is the one that most resembles the type of searches that are executed in search engines. However, there are many more ways to create specific and intelligent search expressions:
- = (equals) : Requires an exact match between the indexed data and the search string.
- != (not equal) : Matches everything that is not exactly equal.
- < (less than) : May be used with numerical, string and date values. For strings it matches values that would appear before the search string in an alphabetical sort. For dates, all dates before the given date are matched.
- <= (less than or equal) : Same as less than, but also includes matches where the value exactly matches the search string.
- > (greater than) : May be used with numerical, string and date values. For strings, it matches values that would appear after the search string in an alphabetical sort. For dates, all dates after the given date are matched.
- >= (greater than or equal) : Same as greater than, but also includes matches where the value exactly matches the search string.
- IN : Matches a set of values, given as a comma-separated list inside a pair of parentheses.
- CONTAINS : Matches all values where the search string appears anywhere within the indexed value.
- STARTS WITH : Matches all values that start with the specified search string.
- ENDS WITH : Matches all values that end with the specified search string.
- LIKE : Similar to CONTAINS, but percentage signs have to be inserted as wildcards.
  - LIKE '%<SEARCH_STRING>%' is the same as CONTAINS.
  - LIKE '<SEARCH_STRING>%' is the same as STARTS WITH.
  - LIKE '%<SEARCH_STRING>' is the same as ENDS WITH.
  - LIKE '<SEARCH_STRING>' is the same as = (equals).
  - The percentage sign may also be placed inside the SEARCH_STRING, like this: data/name LIKE 'John % Bull', which should return any person named John Bull, regardless of middle name.
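The wildcard rules for LIKE can be illustrated by translating a pattern into a regular expression. The Python sketch below only models the behaviour described in the list. It is not how Enonic CMS implements LIKE, the name `like_matches` is made up, and it assumes LIKE is case-insensitive in the same way as the other searches.

```python
import re

def like_matches(pattern, value):
    """Return True if value matches a LIKE pattern where '%' means 'any text'."""
    # Escape each literal part, then join the parts with '.*' in place of '%'.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    # Assumption: matching ignores case, like the other searches.
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(regex, value, flags=flags) is not None

print(like_matches("%Bull%", "John Bull Jr"))             # behaves like CONTAINS
print(like_matches("John%", "John Bull"))                 # behaves like STARTS WITH
print(like_matches("%Bull", "John Bull"))                 # behaves like ENDS WITH
print(like_matches("John Bull", "John Bull"))             # behaves like equals
print(like_matches("John % Bull", "John Quincy Bull"))    # wildcard in the middle
```

Note that in this strict reading the pattern 'John % Bull' contains two spaces, so a value with no middle name at all ("John Bull") would not match in the sketch. Whether Enonic CMS behaves the same way is not stated here.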
These matching operators may be combined in different ways using the following logical operators:
- AND : Both the expressions on the left side and the right side of the AND must match in order for the content to be a part of the result.
- OR : At least one of the expressions on the left side and the right side of the OR must match in order for the content to be a part of the result.
- NOT : A NOT placed in front of any expression will exactly reverse the result, so that content that would otherwise be ignored is now included and vice versa.
Finally, there is a keyword, ORDER BY, that determines the order in which the resulting content is returned. Attaching ‘ORDER BY’ followed by a field name sorts the result according to that field. If the field can contain many equal values, several fields may be specified in a comma-separated list. ‘DESC’, short for descending, may be attached after each field to reverse the ordering. This is especially useful with date values, to get the newest content first. ‘ASC’, short for ascending, may also be used, but since that is the default, it has no effect on the result.
Here are some examples:
data/heading STARTS WITH 'Enonic' AND data/modifieddate > '2008-01-01'
Result: All content modified in 2008 or later where the heading starts with the word Enonic. Note: The searches are not case sensitive, so it may also match headings starting with ‘ENONIC’ or other case variations.
(data/age IN (13, 14, 15, 16, 17, 18, 19) OR data/age >= 67) AND (data/firstname = 'Kim' OR data/firstname = 'Kimberly') ORDER BY data/age DESC, data/lastname
Result: Teenagers and retired people (strictly speaking, people who have turned 67) whose first name is Kim or Kimberly. The oldest people are listed first, and those of the same age are ordered alphabetically by their last name.
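To make the combination of parentheses, IN, OR and a two-level ORDER BY concrete, the same logic can be written out in Python against a small in-memory list. This is a sketch of what the expression means, not of how Enonic CMS evaluates it. The sample records and the names `matches` and `run_query` are invented for illustration.

```python
# Invented sample data; the keys mirror the fields used in the query above.
people = [
    {"firstname": "Kim", "lastname": "Olsen", "age": 70},
    {"firstname": "kimberly", "lastname": "Berg", "age": 15},
    {"firstname": "Kim", "lastname": "Andersen", "age": 15},
    {"firstname": "Kim", "lastname": "Hansen", "age": 40},
    {"firstname": "Ola", "lastname": "Dahl", "age": 16},
]

def matches(person):
    """(age IN (13..19) OR age >= 67) AND (firstname = 'Kim' OR firstname = 'Kimberly')"""
    age_ok = person["age"] in range(13, 20) or person["age"] >= 67
    # Searches are not case sensitive, so compare in lower case.
    name_ok = person["firstname"].lower() in ("kim", "kimberly")
    return age_ok and name_ok

def run_query(records):
    hits = [p for p in records if matches(p)]
    # ORDER BY data/age DESC, data/lastname: negate age for descending order.
    return sorted(hits, key=lambda p: (-p["age"], p["lastname"].lower()))

result = [(p["lastname"], p["age"]) for p in run_query(people)]
print(result)
```

The 40-year-old is dropped by the age condition and Ola by the name condition. The two 15-year-olds share the same age, so the second sort field decides their order.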
Date functions
Dates are handled differently from strings, so three functions are supplied for working with them:
- now() : Returns the current time for comparisons.
- today() : Returns the current date. More specifically, it returns the current date with a timestamp of midnight. This is important since most databases do not operate with pure dates, but will include a timestamp on all dates. Because of this, it is important to note that the equals operator will only match if the timestamp is also equal. Normally it is, because the default timestamp in the databases is midnight.
- date() : Parses the specified date and converts it to a date for comparisons. Note that the only date format supported by Enonic CMS is YYYY-MM-DD. This format was chosen because dates written this way may be compared or sorted alphabetically and still give the correct result.
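The difference between the three functions is easiest to see with actual values. The Python sketch below mimics them with the standard datetime module. The names `cms_today` and `cms_date` are hypothetical stand-ins, and the sketch assumes that date() yields a midnight timestamp, in line with the description of today().

```python
from datetime import datetime, time

def cms_today(now):
    """Stand-in for today(): the current date with a timestamp of midnight."""
    return datetime.combine(now.date(), time.min)

def cms_date(text):
    """Stand-in for date(): parse YYYY-MM-DD (assumed to mean midnight)."""
    return datetime.strptime(text, "%Y-%m-%d")

# A fixed 'current time' plays the role of now() so the example is repeatable.
now = datetime(2008, 6, 15, 14, 30)

print(cms_today(now) == cms_date("2008-06-15"))  # equal: both are midnight
print(now == cms_date("2008-06-15"))             # differs: the timestamp is 14:30

# YYYY-MM-DD strings sort alphabetically into the same order as the real dates.
texts = ["2008-12-01", "2008-01-15", "2007-06-30"]
print(sorted(texts) == sorted(texts, key=cms_date))
```

This is why an equals comparison against now() will rarely match a stored date, while a comparison against today() normally will.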
New features in version 4.3
So far, we have been discussing the search functionality of v.4.1 and v.4.2. All this will work in v.4.3, but in addition we have some new cool features.
Indexing standard fields
First we have some fields that are automatically indexed, even if the ‘indexparameters’ node in the configuration XML is empty. These are:
- contenttypename : The name of the content type the content belongs to.
- created : The time and date the content was created.
- owner/key : The database key of the user who created the content, or who has later taken ownership of the content.
- timestamp : The time and date of the last change to the content.
- modifier/key : The database key of the user who made the last change to the document.
- publishfrom : The time and date when the content was / will be published.
- publishto : The time and date when the content was / will be taken offline.
- status : An integer value indicating the progress of the document. 0 = draft, 1 = waiting for approval, 2 = approved, 3 = archived.
- title : The title or heading of the document.
- fulltext : Content of binary data. More information in the next section.
Using these fields, all kinds of date-limited searches can be made. It’s also very useful to be able to search in the title of all content in the archive, even content which may have no other textual information stored, like images. A final idea on how to use these fields is restricting the search to a specific content type across all archives.
Fulltext searches
Another new feature in v.4.3 is indexing of binary data. Not only are binary files indexed with the standard fields mentioned above, but for those types of files where it makes sense, like PDF or Microsoft Office documents, Enonic CMS tries to extract the text content of the file and index it for searches.
For such documents, the indexed data is stored in a field called simply ‘fulltext’, so a search like this will find all binary documents with the word Enonic in them:
fulltext CONTAINS 'Enonic'
Summary of new features
All the new fields may be combined with the fields that have been specifically indexed, in any way you can imagine. Here are some examples:
data/heading STARTS WITH 'Enonic' AND timestamp > date('2008-01-01')
Result: All content created or modified in 2008 or later where the heading starts with the word Enonic.
(contenttypename = 'person' AND data/person/education CONTAINS 'artificial intelligence') OR (contenttypename = 'article' AND data/article/text CONTAINS 'artificial intelligence') OR fulltext CONTAINS 'artificial intelligence'
Result: Most content in the archive that has anything to do with artificial intelligence, based on the assumption that for this imaginary archive, the most likely place to find such content is either in the education information about a person, in the main text of an article or in binary documents.
(publishfrom > now() OR publishto < now()) AND data/article/text CONTAINS 'Enonic'
Result: Content that is not currently published and whose article text contains the word Enonic.
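The date part of the last expression reads more easily when written as a small predicate. The Python sketch below spells it out. The function name `is_offline` is made up, and the sketch assumes both dates are always set, since how Enonic CMS treats content without a publishfrom or publishto value is not covered in this article.

```python
from datetime import datetime

def is_offline(publishfrom, publishto, now):
    """publishfrom > now() OR publishto < now()"""
    # Assumption: both dates are present; missing values are not handled here.
    not_yet_published = publishfrom > now
    already_taken_offline = publishto < now
    return not_yet_published or already_taken_offline

now = datetime(2008, 6, 15, 12, 0)

scheduled = is_offline(datetime(2008, 7, 1), datetime(2008, 8, 1), now)
expired = is_offline(datetime(2008, 1, 1), datetime(2008, 2, 1), now)
live = is_offline(datetime(2008, 6, 1), datetime(2008, 7, 1), now)
print(scheduled, expired, live)
```

Content scheduled for the future and content whose publishing period has ended both count as offline, while content inside its publishing period does not.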