Leading manufacturing companies have made it a corporate priority to reduce product costs, increase engineering productivity, and speed time to market through parts reuse. However, PDM systems are designed for version and revision control and often don't support reuse-oriented search or knowledge-based methods, which makes their parts libraries difficult to use.
Full-text search helps solve this problem by indexing documents and parts so that they can be searched by their attributes and by the contents of their occurrences. Users who need to find something quickly in ProjectLink or PDMLink can then simply type the name of what they are looking for and immediately see a list of alternative documents, parts, and assemblies.
In this article we discuss how to integrate Apache Lucene, a full-featured text search engine library written entirely in Java, with ProjectLink and PDMLink. This open-source library is suitable for nearly any application that requires full-text search, especially cross-platform applications. It is available as a free download from www.apache.org.
Using Digester to Parse an XML File
To get started, we use Commons Digester to parse a simple XML file. Digester provides a high-level interface for mapping XML documents to Java objects and can be customized to map to Windchill WT objects. (Digester requires a few additional Java libraries, including an XML parser compatible with either SAX 2.0 or JAXP 1.1.)
XML Example of an Assembly (CSV or XML files imported from PDMLink)
The listing below contains entries for the parts of an assembly. To demonstrate the handling of elements with and without attributes, this example gives the <part> element a type attribute and leaves all other elements without attributes.
<?xml version='1.0' encoding='utf-8'?>
<assembly-part>
<part type="part1">
<attribute_1>ATTRIBUTE_1</attribute_1>
<attribute_2>ATTRIBUTE_2</attribute_2>
<attribute_3>ATTRIBUTE_3</attribute_3>
...
<attribute_n>ATTRIBUTE_N</attribute_n>
</part>
<part type="part2">
<attribute_1>ATTRIBUTE_1</attribute_1>
<attribute_2>ATTRIBUTE_2</attribute_2>
<attribute_3>ATTRIBUTE_3</attribute_3>
...
<attribute_n>ATTRIBUTE_N</attribute_n>
</part>
</assembly-part>
Parse XML Data
Using Digester to parse the XML document is very simple. Most of the setup happens in the main() method. The first rule tells Digester to create an instance of the WTAssemblyParser class when the pattern <assembly-part> is found. Because <assembly-part> is the root element of the XML file, this rule is the first to be triggered when we run Digester on our file.
The next rule instructs Digester to create an instance of class Part (a WTPart could be created instead, as an extended object) when it finds the <part> child element under the <assembly-part> parent.
Our WTAssemblyParser class also contains several rules of the following kind: each one instructs Digester to invoke a setter such as setAttribute_1() on the Part instance, using the text enclosed by the corresponding element (here, <attribute_1>) as the method parameter.
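Digester's pattern-to-rule mechanism can be sketched with the JDK's own SAX parser. The following is a minimal illustration, not Digester's API: it tracks the current element path, creates a Part when the path is assembly-part/part, and passes the body text of each child element to a setter. MiniDigesterSketch, the Part class, and its generic setAttribute(name, value) method are hypothetical simplifications of the setAttribute_1(), setAttribute_2(), ... setters described above.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** JDK-only sketch of Digester-style rules (Digester itself is not used here). */
class MiniDigesterSketch {

    /** Hypothetical stand-in for Part; in Windchill a WTPart-based class could be used. */
    static class Part {
        String type;
        final Map<String, String> attributes = new LinkedHashMap<>();

        // Plays the role of setAttribute_1(), setAttribute_2(), ... in the Digester version
        void setAttribute(String name, String value) {
            attributes.put(name, value);
        }
    }

    static List<Part> parse(String xml) {
        final List<Part> parts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>(); // current element path
            private final StringBuilder text = new StringBuilder(); // body text of current element
            private Part current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path.addLast(qName);
                text.setLength(0);
                // Rule for pattern "assembly-part/part": create a new Part object
                if (String.join("/", path).equals("assembly-part/part")) {
                    current = new Part();
                    current.type = atts.getValue("type");
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String p = String.join("/", path);
                if (current != null && p.startsWith("assembly-part/part/")) {
                    // Rule for "assembly-part/part/<child>": body text becomes the setter argument
                    current.setAttribute(qName, text.toString().trim());
                } else if (p.equals("assembly-part/part")) {
                    parts.add(current); // similar in spirit to Digester's addSetNext rule
                    current = null;
                }
                path.removeLast();
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse assembly XML", e);
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml = "<assembly-part>"
                + "<part type=\"part1\"><attribute_1>ATTRIBUTE_1</attribute_1>"
                + "<attribute_2>ATTRIBUTE_2</attribute_2></part>"
                + "<part type=\"part2\"><attribute_1>ATTRIBUTE_1</attribute_1></part>"
                + "</assembly-part>";
        for (Part part : parse(xml)) {
            // prints "part1 {attribute_1=ATTRIBUTE_1, attribute_2=ATTRIBUTE_2}",
            // then "part2 {attribute_1=ATTRIBUTE_1}"
            System.out.println(part.type + " " + part.attributes);
        }
    }
}
```

In the real parser, Digester performs this path bookkeeping itself once the rules are registered, so the application code only declares which pattern maps to which object and setter.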
Now we can use Lucene to create indices for WTObjects. There are four fundamental Lucene classes for indexing text: (1) IndexWriter, (2) Analyzer, (3) Document, and (4) Field. The IndexWriter class creates and updates indexes. Before text is added to an index, it is passed through an Analyzer, which is in charge of extracting indexable tokens from the text. Lucene comes with a few different Analyzer implementations.
These implementations differ in what they do: some skip stop words (frequently used words that don't help distinguish one document from another, such as a, an, the, in, and on), while others convert all tokens to lowercase so that searches are case-insensitive. For specific needs, we create our own analyzers, such as a WTPartAnalyzer or WTDocumentAnalyzer. An index consists of a set of documents, and each document consists of one or more fields.
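To make the analyzer's job concrete, here is a small Lucene-free sketch of what a stop-word-removing, lowercasing analyzer does to a part description. ToyAnalyzer is a hypothetical name; a real WTPartAnalyzer would extend Lucene's Analyzer class rather than return a List. The stop-word list contains only the examples given above, and splitting on any character that is not a letter or digit is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustration only: not Lucene's Analyzer API. */
class ToyAnalyzer {
    // Stop words from the text: frequent words that don't distinguish one document from another
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "in", "on"));

    /** Splits text into tokens, lowercases them, and drops stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Assumption: every run of non-letter, non-digit characters is a delimiter
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields a leading "" when the text starts with a delimiter
            }
            // Lowercase first, so "The" is caught as a stop word and matching is case-insensitive
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Bolt in a Hex-Head Housing")); // prints [bolt, hex, head, housing]
    }
}
```

Because the same analysis should be applied when indexing and when building queries, a custom analyzer is usually shared by both sides of the integration.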
Lucene-Based WTParts Indexer
Let's consider a simple scenario in which we add a single part entry, with all its fields, to the index.
The first parameter of IndexWriter's constructor specifies the directory where the index should be stored. The second parameter provides the Analyzer implementation that should be used for preprocessing the text; the one used here uses whitespace as the delimiter when tokenizing the input. The last parameter is a boolean flag: when true, it tells IndexWriter to create a new index in the specified directory, overwriting any existing index there; when false, it tells IndexWriter to add Documents to the existing index instead.
We then create a blank Document and add several text Fields to it. After the Document is populated, we add it to the index through the IndexWriter instance. Finally, we close the index. Closing the IndexWriter is important because it ensures that all index changes are saved to disk.
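The structure described above (an index made of Documents, each made of named Fields, with Field text split into terms) can be sketched in memory without Lucene. ToyIndex and its methods are hypothetical; they mirror the idea of IndexWriter adding Documents, not its API. Like the whitespace-delimited analyzer described above, the sketch splits field values on whitespace and does not lowercase them, so a lookup for "Bolt" succeeds while one for "bolt" does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory sketch (no Lucene): documents of named fields plus an inverted index. */
class ToyIndex {
    // Stored fields, one map per document; the list position is the document number
    private final List<Map<String, String>> documents = new ArrayList<>();
    // Inverted index: field name -> term -> numbers of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    /** Adds one document and returns its document number. */
    int addDocument(Map<String, String> fields) {
        int docNum = documents.size();
        documents.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Whitespace tokenization, no lowercasing
            for (String term : field.getValue().trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(term, k -> new TreeSet<>())
                        .add(docNum);
            }
        }
        return docNum;
    }

    /** Numbers of the documents whose given field contains exactly this term. */
    Set<Integer> docsWith(String field, String term) {
        Map<String, Set<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(terms.get(term));
    }

    Map<String, String> document(int docNum) {
        return documents.get(docNum);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> part = new LinkedHashMap<>(); // hypothetical sample part entry
        part.put("name", "Hex Bolt");
        part.put("type", "part1");
        int doc = index.addDocument(part);
        System.out.println(index.docsWith("name", "Bolt").contains(doc)); // true
        System.out.println(index.docsWith("name", "bolt").isEmpty());     // true: nothing lowercased
    }
}
```

Lucene keeps the same kind of term-to-document mapping (an inverted index) in the index directory rather than in HashMaps, which is what makes term lookups fast.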
Integrating the Digester and Lucene Tools
Now that you know how to use these tools on their own, we can combine the two classes we've written, using Digester to handle XML parsing and Lucene to handle indexing. Let's look at selected parts of the combined class in more detail. Just as in the WTParts indexer above, we need to open the Lucene index for writing using IndexWriter. We pass in the path to the index directory, the Analyzer that processes all data being indexed, and a create flag: true creates a new index (overwriting any existing one in that directory), while false opens the existing index in append mode.
// IndexWriter to use for adding parts to the index
// (flag == true: create a new index; flag == false: append to the existing one)
writer = new IndexWriter(indexDir, analyzer, flag);
Using the addPart(Part) Method to Add Documents to the Index
The modified addPart(Part) method shown below now creates a fresh instance of the Lucene Document every time it is called. After the Document is populated with data from the Part instance that is passed into the method, it is added to the index through an instance of IndexWriter .
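Document wtDocument = new Document();
/*
* We can provide an additional interface to the Lucene Document, extended from ...
*/
// ... add Fields built from the Part's attributes to wtDocument ...
writer.addDocument(wtDocument);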
Finally, at the end of the main() method, the index is optimized and closed to ensure that all Parts added are indeed written to the index on the disk.
// optimize and close the index
writer.optimize();
writer.close();
Using Lucene Classes to Search Text
Now that we can create a Lucene index from an assembly's part entries encoded in XML, we can search it. Here, we run a query that looks for all documents containing the term "Zane" in the field called name.
1. Search the Assembly Index Created with the Lucene Indexer
// org.apache.lucene
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
// java
import java.io.IOException;
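/**
* <code>WTAssemblySearcher</code> class provides a simple
* example of searching with Lucene. The index being searched
* is called "assembly-part" and would normally be located in an
* appropriate directory; in this example it is stored in a temp folder.
*/
public class WTAssemblySearcher
{
public static void main(String[] args) throws IOException
{
    // Assumed index location: <java.io.tmpdir>/assembly-part
    String indexDir = new java.io.File(System.getProperty("java.io.tmpdir"), "assembly-part").getPath();
    IndexSearcher searcher = new IndexSearcher(indexDir);  // open the existing index
    Query query = new TermQuery(new Term("name", "Zane")); // term "Zane" in field "name"
    Hits hits = searcher.search(query);                    // all matching Documents
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("name"));       // print each hit's name field
    }
    searcher.close();
}
}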
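The IndexSearcher class is used for accessing an existing index. The argument passed to its constructor is the path to the directory where the index is stored. Lucene provides a few different query types, the simplest being TermQuery, which matches documents whose given field contains exactly the given term. The call to IndexSearcher's search(Query) method executes the search against the index and returns the matching Documents in an instance of Hits. (Hits belongs to the older Lucene API used throughout this article; it was removed in Lucene 3.0, where results are returned as TopDocs.)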
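TermQuery's matching rule can be illustrated with a short Lucene-free sketch. ToyTermSearch is hypothetical and, unlike Lucene, it scans every stored document instead of looking the term up in an inverted index; it also returns matches in index order, whereas Lucene ranks hits by score. The part names are invented sample data, and the sketch assumes field values were tokenized on whitespace without lowercasing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Lucene-free sketch of TermQuery semantics (linear scan, no inverted index). */
class ToyTermSearch {

    /** Returns the documents whose field contains exactly the given token. */
    static List<Map<String, String>> search(List<Map<String, String>> docs, String field, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> document : docs) {
            String value = document.get(field);
            // Exact token comparison: no lowercasing, stemming, or prefix matching
            if (value != null && Arrays.asList(value.trim().split("\\s+")).contains(term)) {
                hits.add(document);
            }
        }
        return hits;
    }

    /** Builds a hypothetical document with a single "name" field. */
    static Map<String, String> doc(String name) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("name", name);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("Zane Bracket"));
        docs.add(doc("Zanesville Clamp"));
        docs.add(doc("zane shim"));
        for (Map<String, String> hit : search(docs, "name", "Zane")) {
            System.out.println(hit.get("name")); // prints "Zane Bracket" only
        }
    }
}
```

"Zanesville Clamp" does not match because a term query does no prefix matching, and "zane shim" does not match because nothing was lowercased. Lowercasing both at index time (with an analyzer like those discussed earlier) and when building the query term removes the second restriction.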
For more advanced search design, you can use other Lucene capabilities and integrate them with the Windchill API (some of this is already done in PDMLink and ProjectLink). For instance, Lucene supports several other query types: boolean queries, phrase queries, wildcard queries, and so on. Lucene also lets you search multiple indices at once, as well as indices located on remote computers (distributed search). Another useful feature is Lucene's QueryParser, which supports a powerful and user-friendly query syntax that can be integrated into the Windchill WTQuery framework.
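Boolean and wildcard queries both reduce to operations on the sets of documents that contain each term. The sketch below, again Lucene-free and with hypothetical names and sample data, combines such sets with AND, OR, and NOT, and expands a trailing-wildcard query such as zane* by walking a sorted term dictionary. It shows the idea only; Lucene's BooleanQuery and WildcardQuery classes work against the index itself and rank the matches, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Lucene-free sketch of boolean and trailing-wildcard queries over term postings. */
class ToyQueries {

    /** AND: documents that appear in both sets. */
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** OR: documents that appear in either set. */
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    /** NOT: documents in a that do not appear in b. */
    static Set<Integer> andNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    /** Trailing wildcard ("prefix*"): union of the sets of all terms starting with prefix. */
    static Set<Integer> prefix(TreeMap<String, Set<Integer>> terms, String prefix) {
        Set<Integer> result = new TreeSet<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the first key >= prefix
        for (Map.Entry<String, Set<Integer>> entry : terms.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // every later key sorts after this one, so none can match
            }
            result.addAll(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary for a "name" field: term -> document numbers
        TreeMap<String, Set<Integer>> terms = new TreeMap<>();
        terms.put("bolt", new TreeSet<>(Arrays.asList(0, 2)));
        terms.put("bracket", new TreeSet<>(Arrays.asList(1)));
        terms.put("hex", new TreeSet<>(Arrays.asList(0)));
        terms.put("zane", new TreeSet<>(Arrays.asList(1)));
        terms.put("zanesville", new TreeSet<>(Arrays.asList(2)));

        System.out.println(and(terms.get("bolt"), terms.get("hex")));    // [0]
        System.out.println(or(terms.get("bracket"), terms.get("hex")));  // [0, 1]
        System.out.println(andNot(terms.get("bolt"), terms.get("hex"))); // [2]
        System.out.println(prefix(terms, "zane"));                       // [1, 2]
    }
}
```

Keeping the term dictionary sorted is the design choice that makes prefix expansion cheap: matching terms form one contiguous range, so the scan can stop at the first term that no longer matches.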